Speech_synthesizer

Speech synthesis

Artificial production of human speech

Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.^[1] The reverse process is speech recognition.

Audio sample (automatic announcement): a synthetic voice announcing an arriving train in Sweden.

Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely "synthetic" voice output.^[2]
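The concatenative approach described above can be sketched in a few lines. This is a minimal illustration, not the method of any particular synthesizer: the unit database `UNIT_DB`, the `tone` helper, the 16 kHz sampling rate, and the linear crossfade at unit boundaries are all assumptions made for the example, and sine tones stand in for recorded speech units.

```python
import math

SAMPLE_RATE = 16000  # assumed sampling rate in Hz


def tone(freq_hz, duration_s):
    """Stand-in for a recorded unit: a plain sine tone (not real speech)."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit database. A real system would store recorded phones,
# diphones, words, or whole sentences instead of tones.
UNIT_DB = {
    "the": tone(220, 0.10),
    "train": tone(330, 0.20),
    "arrives": tone(440, 0.25),
}


def concatenate(unit_names, fade_samples=80):
    """Join stored units, blending each boundary with a linear crossfade."""
    output = []
    for name in unit_names:
        unit = UNIT_DB[name]
        # Overlap is zero for the first unit, so it is copied unchanged.
        overlap = min(fade_samples, len(output), len(unit))
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight of the incoming unit
            idx = len(output) - overlap + i
            output[idx] = (1 - w) * output[idx] + w * unit[i]
        output.extend(unit[overlap:])
    return output


speech = concatenate(["the", "train", "arrives"])
```

Storing whole words, as in this toy database, limits the output to sequences of those words, which mirrors the trade-off noted above: larger units sound better within a narrow domain, while smaller units such as diphones cover more text at the cost of more joins.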

The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written words on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s.

Overview of a typical TTS system

A text-to-speech system (or "engine") is composed of two parts:^[3] a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end—often referred to as the synthesizer—then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations),^[4] which is then imposed on the output speech.
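The front-end steps described above can be sketched as a toy pipeline. This is a simplified illustration under stated assumptions, not a description of any real engine: the tables `ABBREVIATIONS`, `DIGITS`, and `LEXICON`, the ARPAbet-style phoneme symbols, the digit-by-digit reading of numbers, and the use of punctuation as the only prosodic boundary cue are all invented for the example.

```python
# Hypothetical lookup tables, invented for illustration only.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
LEXICON = {  # toy ARPAbet-style pronunciations
    "doctor": ["D", "AA", "K", "T", "ER"],
    "smith": ["S", "M", "IH", "TH"],
    "has": ["HH", "AE", "Z"],
    "two": ["T", "UW"],
    "cats": ["K", "AE", "T", "S"],
}


def to_phonemes(word):
    """Grapheme-to-phoneme step by dictionary lookup; unknown words are flagged."""
    return LEXICON.get(word, [f"<unk:{word}>"])


def front_end(text):
    """Return a list of prosodic phrases, each a list of (word, phonemes) pairs."""
    phrases, current = [], []
    for token in text.lower().split():
        # Text normalization: expand known abbreviations before looking at
        # punctuation, so the period in "Dr." is not taken as a boundary.
        if token in ABBREVIATIONS:
            current.append(ABBREVIATIONS[token])
            continue
        boundary = token[-1] in ",;.!?"
        core = token.strip(",;.!?")
        if core.isdigit():
            # Simplification: digits are read one by one ("12" -> "one two").
            current.extend(DIGITS[int(d)] for d in core)
        elif core:
            current.append(core)
        # Prosodic phrasing: punctuation closes the current phrase.
        if boundary and current:
            phrases.append(current)
            current = []
    if current:
        phrases.append(current)
    return [[(w, to_phonemes(w)) for w in phrase] for phrase in phrases]


for phrase in front_end("Dr. Smith has 2 cats, right?"):
    print(phrase)
```

The nested list returned by `front_end` plays the role of the symbolic linguistic representation: words with phonetic transcriptions, grouped into phrases. A back-end would then turn this into sound, and in some systems it would first compute pitch and duration targets for each phrase.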


This article uses material from the Wikipedia article Speech_synthesizer, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] [1]
Allen, Jonathan; Hunnicutt, M. Sharon; Klatt, Dennis (1987). From Text to Speech: The MITalk system. Cambridge University Press. ISBN 978-0-521-30641-6.

[2] [2]
Rubin, P.; Baer, T.; Mermelstein, P. (1981). "An articulatory synthesizer for perceptual research". Journal of the Acoustical Society of America. 70 (2): 321–328. Bibcode:1981ASAJ...70..321R. doi:10.1121/1.386780.

[3] [3]
van Santen, Jan P. H.; Sproat, Richard W.; Olive, Joseph P.; Hirschberg, Julia (1997). Progress in Speech Synthesis. Springer. ISBN 978-0-387-94701-3.

[4] [4]
Van Santen, J. (April 1994). "Assignment of segmental duration in text-to-speech synthesis". Computer Speech & Language. 8 (2): 95–128. doi:10.1006/csla.1994.1005.

[Helsinki-5] [5]
History and Development of Speech Synthesis, Helsinki University of Technology, Retrieved on November 4, 2006

[6] [6]
Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine ("Mechanism of the human speech with description of its speaking machine", J. B. Degen, Wien). (in German)

[7] [7]
Mattingly, Ignatius G. (1974). Sebeok, Thomas A. (ed.). "Speech synthesis for phonetic and phonological models" (PDF). Current Trends in Linguistics. 12. Mouton, The Hague: 2451–2487. Archived from the original (PDF) on 2013-05-12. Retrieved 2011-12-13.

[8] [8]
Klatt, D (1987). "Review of text-to-speech conversion for English". Journal of the Acoustical Society of America. 82 (3): 737–93. Bibcode:1987ASAJ...82..737K. doi:10.1121/1.395275. PMID 2958525.

[9] [9]
Lambert, Bruce (March 21, 1992). "Louis Gerstman, 61, a Specialist In Speech Disorders and Processes". The New York Times.

[10] [10]
"Arthur C. Clarke Biography". Archived from the original on December 11, 1997. Retrieved 5 December 2017.

[11] [11]
"Where "HAL" First Spoke (Bell Labs Speech Synthesis website)". Bell Labs. Archived from the original on 2000-04-07. Retrieved 2010-02-17.

[12] [12]
Anthropomorphic Talking Robot Waseda-Talker Series Archived 2016-03-04 at the Wayback Machine

[13] [13]
Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol" (PDF). Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346. Archived (PDF) from the original on 2022-10-09.

[14] [14]
Zheng, F.; Song, Z.; Li, L.; Yu, W. (1998). "The Distance Measure for Line Spectrum Pairs Applied to Speech Recognition" (PDF). Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP'98) (3): 1123–6. Archived (PDF) from the original on 2022-10-09.

[ieee-15] [15]
"List of IEEE Milestones". IEEE. Retrieved 15 July 2019.

[ItakuraHistory-16] [16]
"Fumitada Itakura Oral History". IEEE Global History Network. 20 May 2009. Retrieved 2009-07-21.

[17] [17]
Billi, Roberto; Canavesio, Franco; Ciaramella, Alberto; Nebbia, Luciano (1 November 1995). "Interactive voice technology at work: The CSELT experience". Speech Communication. 17 (3): 263–271. doi:10.1016/0167-6393(95)00030-R.

[18] [18]
Sproat, Richard W. (1997). Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. Springer. ISBN 978-0-7923-8027-6.

[19] [19]
TSI Speech+ & other speaking calculators

[20] [20]
Gevaryahu, Jonathan. "TSI S14001A Speech Synthesizer LSI Integrated Circuit Guide". [dead link]

[21] [21]
Breslow, et al. US 4326710: "Talking electronic game", April 27, 1982

[22] [22]
Voice Chess Challenger

[23] [23]
Gaming's most important evolutions Archived 2011-06-15 at the Wayback Machine, GamesRadar

[24] [24]
Adlum, Eddie (November 1985). "The Replay Years: Reflections from Eddie Adlum". RePlay. Vol. 11, no. 2. pp. 134-175 (160-3).

[25] [25]
Szczepaniak, John (2014). The Untold History of Japanese Game Developers. Vol. 1. SMG Szczepaniak. pp. 544–615. ISBN 978-0992926007.

[NewYorkTimes-26] [26]
Metz, Cade (2020-08-20). "Ann Syrdal, Who Helped Give Computers a Female Voice, Dies at 74". The New York Times. Retrieved 2020-08-23.

[27] [27]
Kurzweil, Raymond (2005). The Singularity is Near. Penguin Books. ISBN 978-0-14-303788-0.

[28] [28]
Taylor, Paul (2009). Text-to-speech synthesis. Cambridge, UK: Cambridge University Press. p. 3. ISBN 9780521899277.

[29] [29]
Alan W. Black, Perfect synthesis for all of the people all of the time. IEEE TTS Workshop 2002.

[30] [30]
John Kominek and Alan W. Black. (2003). CMU ARCTIC databases for speech synthesis. CMU-LTI-03-177. Language Technologies Institute, School of Computer Science, Carnegie Mellon University.

[31] [31]
Julia Zhang. Language Generation and Speech Synthesis in Dialogues for Language Learning, masters thesis, Section 5.6 on page 54.

[32] [32]
William Yang Wang and Kallirroi Georgila. (2011). Automatic Detection of Unnatural Word-Level Segments in Unit-Selection Speech Synthesis, IEEE ASRU 2011.

[33] [33]
"Pitch-Synchronous Overlap and Add (PSOLA) Synthesis". Archived from the original on February 22, 2007. Retrieved 2008-05-28.

[34] [34]
T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. van der Vrecken. The MBROLA Project: Towards a set of high quality speech synthesizers of use for non commercial purposes. ICSLP Proceedings, 1996.

[35] [35]
Muralishankar, R; Ramakrishnan, A.G.; Prathibha, P (2004). "Modification of Pitch using DCT in the Source Domain". Speech Communication. 42 (2): 143–154. doi:10.1016/j.specom.2003.05.001.

[36] [36]
"Education: Marvel of The Bronx". Time. 1974-04-01. ISSN 0040-781X. Retrieved 2019-05-28.

[37] [37]
"1960 - Rudy the Robot - Michael Freeman (American)". cyberneticzoo.com. 2010-09-13. Retrieved 2019-05-23.

[38] [38]
New York Magazine. New York Media, LLC. 1979-07-30.

[39] [39]
The Futurist. World Future Society. 1978. pp. 359, 360, 361.

[40] [40]
L.F. Lamel, J.L. Gauvain, B. Prouts, C. Bouhier, R. Boesch. Generation and Synthesis of Broadcast Messages, Proceedings ESCA-NATO Workshop and Applications of Speech Technology, September 1993.

[41] [41]
Dartmouth College: Music and Computers Archived 2011-06-08 at the Wayback Machine, 1993.

[42] [42]
Examples include Astro Blaster, Space Fury, and Star Trek: Strategic Operations Simulator

[43] [43]
Examples include Star Wars, Firefox, Return of the Jedi, Road Runner, The Empire Strikes Back, Indiana Jones and the Temple of Doom, 720°, Gauntlet, Gauntlet II, A.P.B., Paperboy, RoadBlasters, Vindicators Part II, Escape from the Planet of the Robot Monsters.

[44] [44]
John Holmes and Wendy Holmes (2001). Speech Synthesis and Recognition (2nd ed.). CRC. ISBN 978-0-7484-0856-6.

[:0-45] [45]
Lucero, J. C.; Schoentgen, J.; Behlau, M. (2013). "Physics-based synthesis of disordered voices" (PDF). Interspeech 2013. Lyon, France: International Speech Communication Association: 587–591. doi:10.21437/Interspeech.2013-161. S2CID 17451802. Retrieved Aug 27, 2015.

[:1-46] [46]
Englert, Marina; Madazio, Glaucya; Gielow, Ingrid; Lucero, Jorge; Behlau, Mara (2016). "Perceptual error identification of human and synthesized voices". Journal of Voice. 30 (5): 639.e17–639.e23. doi:10.1016/j.jvoice.2015.07.017. PMID 26337775.

[47] [47]
"The HMM-based Speech Synthesis System". Hts.sp.nitech.ac.j. Retrieved 2012-02-22.

[48] [48]
Remez, R.; Rubin, P.; Pisoni, D.; Carrell, T. (22 May 1981). "Speech perception without traditional speech cues" (PDF). Science. 212 (4497): 947–949. Bibcode:1981Sci...212..947R. doi:10.1126/science.7233191. PMID 7233191. Archived from the original (PDF) on 2011-12-16. Retrieved 2011-12-14.

[arxivmello-49] [49]
Valle, Rafael (2020). "Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens". arXiv:1910.11997 [eess].

[towardds-50] [50]
Chandraseta, Rionaldi (2021-01-19). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on 2021-01-21. Retrieved 2021-01-23.

[automaton-51] [51]
Kurosawa, Yuki (2021-01-19). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる". AUTOMATON. Archived from the original on 2021-01-19. Retrieved 2021-01-19.

[Denfaminicogamer-52] [52]
Yoshiyuki, Furushima (2021-01-18). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に". Denfaminicogamer. Archived from the original on 2021-01-18. Retrieved 2021-01-18.

[53] [53]
"Generative AI comes for cinema dubbing: Audio AI startup ElevenLabs raises pre-seed". Sifted. January 23, 2023. Retrieved 2023-02-03.

[:12-54] [54]
Ashworth, Boone (April 12, 2023). "AI Can Clone Your Favorite Podcast Host's Voice". Wired. Retrieved 2023-04-25.

[55] [55]
WIRED Staff. "This Podcast Is Not Hosted by AI Voice Clones. We Swear". Wired. ISSN 1059-1028. Retrieved 2023-07-25.

[:34-56] [56]
Wiggers, Kyle (2023-06-20). "Voice-generating platform ElevenLabs raises $19M, launches detection tool". TechCrunch. Retrieved 2023-07-25.

[57] [57]
Bonk, Lawrence. "ElevenLabs' Powerful New AI Tool Lets You Make a Full Audiobook in Minutes". Lifewire. Retrieved 2023-07-25.

[58] [58]
Zhu, Jian (2020-05-25). "Probing the phonetic and phonological knowledge of tones in Mandarin TTS models". Speech Prosody 2020. ISCA: 930–934. arXiv:1912.10915. doi:10.21437/speechprosody.2020-190. S2CID 209444942.

[59] [59]
Smith, Hannah; Mansted, Katherine (April 1, 2020). Weaponised deep fakes: National security and democracy. Vol. 28. Australian Strategic Policy Institute. pp. 11–13. ISSN 2209-9689.

[60] [60]
Lyu, Siwei (2020). "Deepfake Detection: Current Challenges and Next Steps". 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). pp. 1–6. arXiv:2003.09234. doi:10.1109/icmew46912.2020.9105991. ISBN 978-1-7281-1485-9. S2CID 214605906. Retrieved 2022-06-29.

[Audio_deepfake_:0-61] [61]
Diakopoulos, Nicholas; Johnson, Deborah (June 2020). "Anticipating and addressing the ethical implications of deepfakes in the context of elections". New Media & Society. 23 (7) (published 2020-06-05): 2072–2098. doi:10.1177/1461444820925811. ISSN 1461-4448. S2CID 226196422.

[62] [62]
Murphy, Margi (20 February 2024). "Deepfake Audio Boom Exploits One Billion-Dollar Startup's AI". Bloomberg.

[Audio_deepfake_:10-63] [63]
Chadha, Anupama; Kumar, Vaibhav; Kashyap, Sonu; Gupta, Mayank (2021), Singh, Pradeep Kumar; Wierzchoń, Sławomir T.; Tanwar, Sudeep; Ganzha, Maria (eds.), "Deepfake: An Overview", Proceedings of Second International Conference on Computing, Communications, and Cyber-Security, Lecture Notes in Networks and Systems, vol. 203, Singapore: Springer Singapore, pp. 557–566, doi:10.1007/978-981-16-0733-2_39, ISBN 978-981-16-0732-5, S2CID 236666289, retrieved 2022-06-29

[Audio_deepfake_:11-64] [64]
"AI gave Val Kilmer his voice back. But critics worry the technology could be misused". Washington Post. ISSN 0190-8286. Retrieved 2022-06-29.

[65] [65]
Etienne, Vanessa (August 19, 2021). "Val Kilmer Gets His Voice Back After Throat Cancer Battle Using AI Technology: Hear the Results". PEOPLE.com. Retrieved 2022-07-01.

[66] [66]
Newman, Lily Hay. "AI-Generated Voice Deepfakes Aren't Scary Good—Yet". Wired. ISSN 1059-1028. Retrieved 2023-07-25.

[67] [67]
"Speech synthesis". World Wide Web Consortium.

[68] [68]
"Blizzard Challenge". Festvox.org. Retrieved 2012-02-22.

[69] [69]
"Smile -and the world can hear you". University of Portsmouth. January 9, 2008. Archived from the original on May 17, 2008.

[70] [70]
"Smile – And The World Can Hear You, Even If You Hide". Science Daily. January 2008.

[71] [71]
Drahota, A. (2008). "The vocal communication of different kinds of smile" (PDF). Speech Communication. 50 (4): 278–287. doi:10.1016/j.specom.2007.10.001. S2CID 46693018. Archived from the original (PDF) on 2013-07-03.

[72] [72]
Muralishankar, R.; Ramakrishnan, A. G.; Prathibha, P. (February 2004). "Modification of pitch using DCT in the source domain". Speech Communication. 42 (2): 143–154. doi:10.1016/j.specom.2003.05.001.

[73] [73]
Prathosh, A. P.; Ramakrishnan, A. G.; Ananthapadmanabha, T. V. (December 2013). "Epoch extraction based on integrated linear prediction residual using plosion index". IEEE Trans. Audio Speech Language Processing. 21 (12): 2471–2480. doi:10.1109/TASL.2013.2273717. S2CID 10491251.

[TI_will_exit_dedicated_speech-synthesis_chips,_transfer_products_to_Sensory-74] [74]
EE Times. "TI will exit dedicated speech-synthesis chips, transfer products to Sensory". June 14, 2001. Archived 2012-05-28 at the Wayback Machine.

[75] [75]
"1400XL/1450XL Speech Handler External Reference Specification" (PDF). Archived from the original (PDF) on 2012-03-24. Retrieved 2012-02-22.

[demo-76] [76]
"It Sure Is Great To Get Out Of That Bag!". folklore.org. Retrieved 2013-03-24.

[77] [77]
"Amazon Polly". Amazon Web Services, Inc. Retrieved 2020-04-28.

[78] [78]
Miner, Jay; et al. (1991). Amiga Hardware Reference Manual (3rd ed.). Addison-Wesley Publishing Company, Inc. ISBN 978-0-201-56776-2.

[79] [79]
Devitt, Francesco (30 June 1995). "Translator Library (Multilingual-speech version)". Archived from the original on 26 February 2012. Retrieved 9 April 2013.

[Narrator-80] [80]
"Accessibility Tutorials for Windows XP: Using Narrator". Microsoft. 2011-01-29. Archived from the original on June 21, 2003. Retrieved 2011-01-29.

[microsoft.com-81] [81]
"How to configure and use Text-to-Speech in Windows XP and in Windows Vista". Microsoft. 2007-05-07. Retrieved 2010-02-17.

[82] [82]
Jean-Michel Trivi (2009-09-23). "An introduction to Text-To-Speech in Android". Android-developers.blogspot.com. Retrieved 2010-02-17.

[83] [83]
Andreas Bischoff, The Pediaphon – Speech Interface to the free Wikipedia Encyclopedia for Mobile Phones, PDA's and MP3-Players, Proceedings of the 18th International Conference on Database and Expert Systems Applications, Pages: 575–579 ISBN 0-7695-2932-1, 2007

[84] [84]
"gnuspeech". Gnu.org. Retrieved 2010-02-17.

[85] [85]
"Smithsonian Speech Synthesis History Project (SSSHP) 1986–2002". Mindspring.com. Archived from the original on 2013-10-03. Retrieved 2010-02-17.

[GoogleLearningTransferToTTS2018-86] [86]
Jia, Ye; Zhang, Yu; Weiss, Ron J. (2018-06-12), "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis", Advances in Neural Information Processing Systems, 31: 4485–4495, arXiv:1806.04558

[Baidu2018-87] [87]
Arık, Sercan Ö.; Chen, Jitong; Peng, Kainan; Ping, Wei; Zhou, Yanqi (2018), "Neural Voice Cloning with a Few Samples", Advances in Neural Information Processing Systems, 31, arXiv:1802.06006

[BBC2019-88] [88]
"Fake voices 'help cyber-crooks steal cash'". bbc.com. BBC. 2019-07-08. Retrieved 2019-09-11.

[WaPo2019-89] [89]
Harwell, Drew (2019-09-04). "An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft". Washington Post. Retrieved 2019-09-08.

[Thi2016-90] [90]
Thies, Justus (2016). "Face2Face: Real-time Face Capture and Reenactment of RGB Videos". Proc. Computer Vision and Pattern Recognition (CVPR), IEEE. Retrieved 2016-06-18.

[Suw2017-91] [91]
Suwajanakorn, Supasorn; Seitz, Steven; Kemelmacher-Shlizerman, Ira (2017), Synthesizing Obama: Learning Lip Sync from Audio, University of Washington, retrieved 2018-03-02

[Batch042020-92] [92]
Ng, Andrew (2020-04-01). "Voice Cloning for the Masses". deeplearning.ai. The Batch. Archived from the original on 2020-08-07. Retrieved 2020-04-02.

[93] [93]
Brunow, David A.; Cullen, Theresa A. (2021-07-03). "Effect of Text-to-Speech and Human Reader on Listening Comprehension for Students with Learning Disabilities". Computers in the Schools. 38 (3): 214–231. doi:10.1080/07380569.2021.1953362. hdl:11244/316759. ISSN 0738-0569. S2CID 243101945.

[94] [94]
Triandafilidi, Ioanis I.; Tatarnikova, T. M.; Poponin, A. S. (2022-05-30). "Speech Synthesis System for People with Disabilities". 2022 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF). St. Petersburg, Russian Federation: IEEE. pp. 1–5. doi:10.1109/WECONF55058.2022.9803600. ISBN 978-1-6654-7083-4. S2CID 250118756.

[95] [95]
Zhao, Yunxin; Song, Minguang; Yue, Yanghao; Kuruvilla-Dugdale, Mili (2021-07-27). "Personalizing TTS Voices for Progressive Dysarthria". 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). Athens, Greece: IEEE. pp. 1–4. doi:10.1109/BHI50953.2021.9508522. ISBN 978-1-6654-0358-0. S2CID 236982893.

[96] [96]
"Evolution of Reading Machines for the Blind: Haskins Laboratories' Research as a Case History" (PDF). Journal of Rehabilitation Research and Development. 21 (1). 1984.

[97] [97]
"Speech Synthesis Software for Anime Announced". Anime News Network. 2007-05-02. Retrieved 2010-02-17.

[98] [98]
"Code Geass Speech Synthesizer Service Offered in Japan". Animenewsnetwork.com. 2008-09-09. Retrieved 2010-02-17.

[towardds3-99] [99]
Chandraseta, Rionaldi (2021-01-19). "Generate Your Favourite Characters' Voice Lines using Machine Learning". Towards Data Science. Archived from the original on 2021-01-21. Retrieved 2021-01-23.

[automaton2-100] [100]
Kurosawa, Yuki (2021-01-19). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる". AUTOMATON. Archived from the original on 2021-01-19. Retrieved 2021-01-19.

[Denfaminicogamer2-101] [101]
Yoshiyuki, Furushima (2021-01-18). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に". Denfaminicogamer. Archived from the original on 2021-01-18. Retrieved 2021-01-18.

[:162-102] [102]
"Now hear this: Voice cloning AI startup ElevenLabs nabs $19M from a16z and other heavy hitters". VentureBeat. 2023-06-20. Retrieved 2023-07-25.

[103] [103]
"Sztuczna inteligencja czyta głosem Jarosława Kuźniara. Rewolucja w radiu i podcastach". Press.pl (in Polish). April 9, 2023. Retrieved 2023-04-25.

[:13-104] [104]
Ashworth, Boone (April 12, 2023). "AI Can Clone Your Favorite Podcast Host's Voice". Wired. Retrieved 2023-04-25.

[105] [105]
Knibbs, Kate. "Generative AI Podcasts Are Here. Prepare to Be Bored". Wired. ISSN 1059-1028. Retrieved 2023-07-25.

[106] [106]
Suciu, Peter. "Arrested Succession Parody On YouTube Features 'Narration' By AI-Generated Ron Howard". Forbes. Retrieved 2023-07-25.

[107] [107]
Fadulu, Lola (2023-07-06). "Can A.I. Be Funny? This Troupe Thinks So". The New York Times. ISSN 0362-4331. Retrieved 2023-07-25.

[:2-108] [108]
Kanetkar, Riddhi. "Hot AI startup ElevenLabs, founded by ex-Google and Palantir staff, is set to raise $18 million at a $100 million valuation. Check out the 14-slide pitch deck it used for its $2 million pre-seed". Business Insider. Retrieved 2023-07-25.

[:02-109] [109]
"AI-Generated Voice Firm Clamps Down After 4chan Makes Celebrity Voices for Abuse". www.vice.com. January 30, 2023. Retrieved 2023-02-03.

[110] [110]
"Usage of text-to-speech in AI video generation". elai.io. Retrieved 10 August 2022.

[111] [111]
"AI Text to speech for videos". synthesia.io. Retrieved 12 October 2023.

[112] [112]
Bruno, Chelsea A (2014-03-25). Vocal Synthesis and Deep Listening (Master of Music thesis). Florida International University. doi:10.25148/etd.fi14040802.


