What is Voice Recognition?

R. Kayne

Last Modified Date: February 17, 2024

Voice recognition, or speech recognition, is a computer technology that accepts audio input, rather than keyboard input, for entering data. Speaking into a microphone, for example, produces the same result as typing the words manually. Simply stated, voice recognition software is designed with an internal database of recognizable words or phrases. The program matches the audio signature of speech with corresponding entries in the database.
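The matching step described above can be sketched as a nearest-match lookup. This is a minimal illustration only: the three-number "signatures", the `TEMPLATES` table, and the `recognize` function are all hypothetical, and real systems work with far richer acoustic features and statistical models.

```python
import math

# Hypothetical database: each known word is stored with a made-up
# three-number "audio signature". Real signatures are much larger.
TEMPLATES = {
    "yes": (0.9, 0.2, 0.4),
    "no": (0.1, 0.8, 0.3),
    "stop": (0.5, 0.5, 0.9),
}


def recognize(signature):
    """Return the database word whose stored signature is closest to the input."""
    return min(TEMPLATES, key=lambda word: math.dist(signature, TEMPLATES[word]))


# A signature that is close to, but not identical to, the stored "yes" entry.
print(recognize((0.85, 0.25, 0.35)))
```

Because the lookup always returns the closest entry, a word that is not in the database is still mapped to something, which is one reason a small vocabulary can produce confident but wrong results.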

Though turning speech into text might sound easy, it is an extremely difficult task. The problem lies in the virtually infinite array of individual speech patterns and accents, compounded by the natural human tendency to run words together.

Voice recognition software allows a computer user to enter information by speaking into a microphone rather than typing.

Various models of speech recognition software are used for an array of applications, from personal dictation to commercial automated call routing, from aiding the disabled to sports and news event subtitling. Each model behaves differently and has its own capabilities and boundaries.

Voice recognition programs that require the user to "train" the software to recognize his or her particular patterns of speech are called speaker-dependent systems. Individuals commonly use these types of programs at home or at the office. Email, memos, letters, data and text can be input by speaking into a microphone.

Some voice recognition systems, called discrete speech systems, require the user to speak clearly and slowly and to separate words. Continuous speech systems are designed to understand a more natural mode of speaking.

Smartphones are equipped with voice recognition software that can be used to speak commands and instructions.

Discrete speech systems are widely used for customer service routing. Such a system is speaker-independent but understands only a small pool of words or phrases. The caller is asked a question with a limited set of answers, usually "yes" or "no." After receiving an answer, the system advances the caller to the next level. If the caller replies with an answer the system does not recognize, the automated response is usually, "Sorry, I didn't understand you; please try again," followed by a repeat of the question and the available answers. This type of voice recognition is also referred to as grammar constrained recognition.
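The routing behavior described above can be sketched in a few lines. This is a minimal sketch that assumes the caller's reply has already been turned into text; the `GRAMMAR` table, the queue names, and the `route_reply` function are hypothetical and chosen only for illustration.

```python
# Hypothetical grammar for one menu step: the only replies it accepts,
# each mapped to the next step in the call flow.
GRAMMAR = {
    "yes": "billing_queue",
    "no": "main_menu",
}

REPROMPT = "Sorry, I didn't understand you; please try again."


def route_reply(transcript: str) -> str:
    """Return the next step, or the reprompt if the reply is outside the grammar."""
    word = transcript.strip().lower().rstrip(".!?")
    return GRAMMAR.get(word, REPROMPT)


print(route_reply("Yes"))
print(route_reply("I'd like to talk about my bill"))
```

Anything outside the small pool of accepted replies falls through to the reprompt, which is what makes the system simple and also what makes it rigid.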

Continuous speech is a more sophisticated form of voice recognition software, wherein the caller can speak naturally to explain a problem or request a service. This program is designed to pick out key words or phrases and make a statistical best guess as to what the customer wants. Speaking plainly aids the program in identifying the need. This type of system has a far more extensive database than discrete speech systems and is also referred to as natural language recognition.
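The idea of picking out key words and making a best guess can be illustrated with a toy scorer. This is a sketch under stated assumptions: the input is already text, the `INTENT_KEYWORDS` lists and the `guess_intent` function are hypothetical, and a simple share of keyword hits stands in for the statistical models a real system would use.

```python
from collections import Counter

# Hypothetical keyword lists, one per service the caller might want.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "payment", "refund"},
    "technical_support": {"broken", "error", "internet", "outage"},
    "cancel_service": {"cancel", "close", "terminate"},
}


def guess_intent(utterance: str):
    """Return (best_intent, share_of_keyword_hits), or (None, 0.0) if nothing matched."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    scores = Counter()
    for intent, keywords in INTENT_KEYWORDS.items():
        scores[intent] = sum(1 for w in words if w in keywords)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    intent, hits = scores.most_common(1)[0]
    return intent, hits / total


print(guess_intent("My internet is broken and I want a refund"))
```

In this toy version, a sentence that mentions several needs yields a lower share for the winning intent, which a caller-facing system could use as a cue to ask a follow-up question.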

Automatic Speech Recognition (ASR) is a model of voice recognition designed for dictation. This software differs from the models described above in that it does not strive to understand what is being said, only to identify the words spoken. Since many words in the English language sound alike, mistakes are easily made. ASR software is often found on digital voice recorders.
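The sound-alike problem can be made concrete with a small lookup. This is only an illustration: the `HOMOPHONES` groups and the `sound_alikes` function are hypothetical, and the list is a tiny hand-picked sample rather than a real pronunciation dictionary.

```python
# Hypothetical sample of English words that sound the same when spoken.
HOMOPHONES = [
    {"their", "there", "they're"},
    {"to", "too", "two"},
    {"right", "write"},
]


def sound_alikes(word: str) -> list:
    """Return the other words a sound-only recognizer could confuse with this one."""
    w = word.lower()
    for group in HOMOPHONES:
        if w in group:
            return sorted(group - {w})
    return []


print(sound_alikes("there"))
print(sound_alikes("microphone"))
```

A program that identifies words by sound alone has no basis for choosing among the members of a group, which is why dictated text still needs proofreading.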

Discussion Comments

SauteePan

April 18, 2011

@Bhutan - I agree and find that that is the main problem with the voice recognition software. However, once the software becomes accustomed to your voice by recording your speech patterns it can make this software a real blessing.

It can save you so much time and I know for me my productivity doubles, but I still have to proofread the text because occasionally it can make a mistake.

It is a good idea to look at voice recognition software reviews before you buy one of these programs, because some have more features than others and they do not all cost the same.

Bhutan

April 17, 2011

@Anon63415 - I really don’t know the answer to your question, but I did want to say that I have used voice recognition software and for the most part it can save you time by simply using the voice recognition microphone and dictating what you would like the system to transcribe.

The problem is in the typing. Sometimes even the best voice recognition software will get confused with words that sound the same and if you don’t keep an eye on the transcription as the words are printing on the screen it may misuse words and make your paragraph not make sense.

It takes a while for the voice recognition software to recognize your words correctly, which is the biggest frustration with the software.

anon63415

February 1, 2010

What came first: the TV program Star Trek, which had heads-up displays, or the technology?