What is Speech Synthesis?

Kristina Choi

Last Modified Date: February 04, 2024

Speech synthesis is the artificial production of human speech. A computer that converts text to speech is one kind of speech synthesizer.

The earliest forms of speech synthesis were implemented through machines designed to function like the human vocal tract. The speaking machine created by Wolfgang von Kempelen in the 1700s is an example. With this device, speech was produced through a kitchen bellows, a bagpipe reed and a clarinet bell. The bellows was designed to act like a lung, while the glottis (the area of the vocal cords) was represented by the bagpipe reed. The clarinet bell served as the mouth.

Operation of the device was completely manual. The right hand controlled a series of levers while the left hand manipulated the clarinet bell (mouth). There was also the option of plugging the ‘nostrils’ to produce a less nasal sound. As long as the controls were used properly, air flowed through the machine, and the way that airflow was shaped determined the sounds produced.

Subsequent speaking machines throughout the 18th and 19th centuries maintained this setup, though there were improvements. For example, in the mid-1800s, Joseph Faber created a speaking machine that could receive input through a keyboard and a pedal. The machine was also inventive in its presentation, as the sound came out through an artificial ‘face.’

In the 20th century, innovations in electronics took speech synthesis in a more powerful direction. Although the premise of imitating the human vocal tract remained the same, early 20th-century speaking machines could produce better sounds because the input was more precise.

However, it wasn’t until the advent of computers that speech synthesis could be used outside of the entertainment arena. This is mainly because speech synthesizers could be implemented in software instead of as a separate machine. Additionally, with computers as an aid, speech synthesis could take on a different form: using human voices as the main source of sound.

This form of speech synthesis is known as concatenative synthesis. The process works by joining short recordings of human speech end to end. The resulting sound is generally more natural and pleasing to the ear. This is in contrast to programs that use articulatory synthesis, where speech is replicated through a computerized model of the vocal tract.
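The joining step in concatenative synthesis can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the unit names, the sample rate, and the use of sine tones as stand-ins for recorded speech are all hypothetical, and a real system would draw on a large inventory of recorded human speech with far more careful smoothing at the joins.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def make_unit(freq_hz, duration_s):
    """Stand-in for a recorded speech unit: a plain sine tone."""
    n = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory; a real system stores recorded human speech.
UNITS = {
    "HH": make_unit(200, 0.05),
    "EH": make_unit(300, 0.10),
    "L": make_unit(250, 0.05),
    "OW": make_unit(350, 0.10),
}


def concatenate(unit_names, fade=40):
    """Join units end to end, crossfading `fade` samples at each boundary."""
    out = []
    for name in unit_names:
        unit = UNITS[name]
        if not out:
            out.extend(unit)
            continue
        k = min(fade, len(out), len(unit))
        start = len(out) - k
        # Blend the tail of the output with the head of the next unit
        # so the join does not produce an abrupt jump in the waveform.
        for i in range(k):
            w = (i + 1) / (k + 1)
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


samples = concatenate(["HH", "EH", "L", "OW"])
```

The crossfade at each boundary is one simple way to soften the seams between units. The choice of 40 samples is arbitrary here, not a recommended value.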

Commercial speech synthesizers can employ either concatenative or articulatory methods, but both achieve the same objective: giving people an opportunity to hear text. This is especially helpful in situations where reading is impractical or impossible.

In the business world, such situations are very common, especially for telephone transactions. Without text-to-speech (TTS) alternatives, business owners would have to spend money hiring more customer service personnel. Synthesized solutions reduce this cost, since the speaking is done by a computer, not a human being.

Synthesized speech also plays a role in daily life, especially for individuals with disabilities. Talking clocks, dictionaries and other devices can make things easier for people who have trouble seeing or reading. Synthesized speech can even give a voice to individuals who cannot speak at all. Stephen Hawking, the famous physicist, is a prominent example. After Lou Gehrig’s disease left him unable to speak, Hawking used a voice synthesizer to communicate with people.

There are also TTS applications available to assist people with various computer activities. To obtain these types of applications, most users will have to buy separate software or download patches. The latter option is usually free, depending on the operating system or word processing program being used. However, a person who buys separate software may get access to a higher-quality system. Examples include Natural Reader 7 and Text Aloud 2.

Ultimately, speech synthesis is a technology that has changed how people communicate. In a sense, it gives text a life of its own. It also gives the world an opportunity to hear the thoughts of brilliant individuals who would otherwise have been voiceless.