The term voice recognition can refer to either of two computing technologies: forensic voice identification or speech-to-text conversion. This article addresses the latter.
Voice recognition, or speech recognition in this case, is a computer technology that utilizes audio input for entering data rather than a keyboard. Speaking into a microphone, for example, produces the same result as typing words manually with a keyboard. Simply stated, voice recognition software is designed with an internal database of recognizable words or phrases. The program matches the audio signature of speech with corresponding entries in the database.
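The matching step described above can be pictured with a minimal Python sketch. It assumes each word's "audio signature" has already been reduced to a short list of numbers and simply picks the closest stored entry. The feature values, the `DATABASE` contents and the `recognize` function are all invented for illustration; real products rely on far richer statistical models than this nearest-match lookup.

```python
import math

# Hypothetical database: each known word maps to a made-up audio signature.
DATABASE = {
    "yes": (0.9, 0.1, 0.3),
    "no": (0.2, 0.8, 0.5),
    "stop": (0.4, 0.4, 0.9),
}

def distance(a, b):
    """Euclidean distance between two signatures of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(signature, database):
    """Return the database word whose signature is closest to the input."""
    return min(database, key=lambda word: distance(signature, database[word]))

# A slightly "noisy" version of the stored signature for "yes".
heard = (0.85, 0.2, 0.25)
best = recognize(heard, DATABASE)
print(best)
```

Because the spoken input never matches a stored entry exactly, the sketch chooses the nearest one, which is also why similar-sounding words are easy to confuse.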
Though turning speech into text might sound easy, it is an extremely difficult task. The problem lies in the virtually infinite array of individual speech patterns and accents, compounded by the natural human tendency to run words together.
An illustration of the inherent challenges of voice recognition software appears on a T-shirt created by Apple researchers. The shirt reads, "I helped Apple wreck a nice beach." When spoken aloud, it sounds like, "I helped Apple recognize speech."
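The same ambiguity can be shown in a few lines of Python. The T-shirt phrases are only a close match in sound, so this sketch uses a simpler classic pair, "ice cream" and "I scream," which share one stream of sounds. The phoneme symbols and the tiny `LEXICON` are simplified assumptions made up for the example, not taken from any real recognition product.

```python
# Hypothetical pronunciation lexicon using simplified phoneme symbols.
LEXICON = {
    "I": ("AY",),
    "ice": ("AY", "S"),
    "cream": ("K", "R", "IY", "M"),
    "scream": ("S", "K", "R", "IY", "M"),
}

def segmentations(phonemes, lexicon):
    """Return every way to split a phoneme stream into lexicon words."""
    if not phonemes:
        return [[]]  # one way to split nothing: no words at all
    results = []
    for word, pron in lexicon.items():
        n = len(pron)
        # Does this word's pronunciation match the start of the stream?
        if tuple(phonemes[:n]) == pron:
            for rest in segmentations(phonemes[n:], lexicon):
                results.append([word] + rest)
    return results

# Run-together speech: no pauses mark where one word ends and the next begins.
stream = ("AY", "S", "K", "R", "IY", "M")
readings = sorted(" ".join(words) for words in segmentations(stream, LEXICON))
print(readings)
```

Both readings fit the sounds equally well, so a program needs extra information, such as the surrounding words, to choose between them.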
Various models of voice recognition software are used for an array of applications, from personal dictation to commercial automated call routing, from aiding the disabled to sports and news event subtitling. Each model behaves differently and has its own capabilities and boundaries.
Voice recognition programs that require the user to "train" the software to recognize his or her individual speech patterns are called speaker-dependent systems. Individuals commonly use these types of programs at home or at the office. Email, memos, letters, data and text can be input by speaking into a microphone.
Some voice recognition systems, called discrete speech systems, require the user to speak clearly and slowly and to separate words. Continuous speech systems are designed to understand a more natural mode of speaking.
Discrete speech voice recognition systems are widely used for customer service routing. The system is speaker independent, but understands only a small pool of words or phrases. The caller is asked a question and given a limited set of answers, usually "yes" or "no." After receiving an answer, the system moves the caller to the next level. If the caller gives an answer the system does not expect, the automated response is usually, "Sorry, I didn't understand you; please try again," with a repeat of the question and available answers. This type of voice recognition is also referred to as grammar-constrained recognition.
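The call flow above can be sketched in Python as follows. This is an assumption-laden illustration: the `GRAMMAR` table, the route names, the retry limit and the fallback to a live operator are all hypothetical, and the caller's replies are given as text that is presumed to have been recognized already.

```python
RETRY_MESSAGE = "Sorry, I didn't understand you; please try again."

# Hypothetical grammar: the only answers the system accepts, and where each leads.
GRAMMAR = {"yes": "billing_menu", "no": "main_menu"}

def route_call(replies, grammar=GRAMMAR, max_attempts=3):
    """Walk through a caller's replies until one matches the grammar.

    Returns the chosen route plus the retry messages that were played.
    Falling back to a live operator is an assumption made for this sketch.
    """
    played = []
    for reply in replies[:max_attempts]:
        answer = reply.strip().lower()
        if answer in grammar:
            return grammar[answer], played
        played.append(RETRY_MESSAGE)  # answer is outside the small word pool
    return "live_operator", played

route, messages = route_call(["maybe later", "Yes"])
print(route, len(messages))
```

Keeping the accepted answers to a short list is what lets such a system work for any caller without training.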
Continuous speech is a more sophisticated form of voice recognition software, wherein the caller can speak naturally to explain a problem or request a service. This program is designed to pick out key words or phrases and make a statistical best guess as to what the customer wants. Speaking plainly aids voice recognition in identifying the need. This type of system has a far more extensive database than discrete speech systems and is also referred to as natural language recognition.
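A rough Python sketch of this keyword-spotting idea follows. The intents, the keyword lists and the scoring rule (the share of matched keywords that belong to the winning intent) are assumptions chosen for illustration; commercial systems use much larger vocabularies and more elaborate statistics.

```python
from collections import Counter

# Hypothetical intents and the key words that hint at each one.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "payment", "refund"},
    "technical_support": {"broken", "error", "internet", "connection"},
    "cancel_service": {"cancel", "close", "terminate"},
}

def guess_intent(utterance):
    """Return (intent, confidence) as a best guess from spotted key words."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    scores = Counter()
    for intent, keywords in INTENT_KEYWORDS.items():
        scores[intent] = sum(1 for w in words if w in keywords)
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # no key words heard, so no guess is made
    intent, hits = scores.most_common(1)[0]
    return intent, hits / total

guess = guess_intent("My internet connection is broken and I want a refund")
print(guess)
```

The caller's other words are ignored, which is why plain speech that contains the expected key words tends to be routed more reliably.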
Automatic Speech Recognition (ASR) is a model of voice recognition designed for dictation. This software differs from the models described above in that it does not strive to understand what is being said, only to identify the words spoken. Since many words in the English language sound alike, mistakes are easily made. However, major companies like Microsoft are investing in voice recognition, and Bill Gates' own prediction has ASR understanding continuous speech by the year 2011. ASR software is often found on digital voice recorders.
Dominant players in voice recognition software have been ScanSoft and Nuance, with the former company acquiring the latter. Smaller players include Fonix Speech, Aculab and Verbio, among others, with major corporations like IBM and the aforementioned Microsoft also investing in the technology. Though many still feel it is more trouble to train software and correct mistakes than to simply use a keyboard, a time is coming when voice recognition software will likely close that gap. Augmenting keyboards with the option of speech input will probably become commonplace.
Voice recognition software is gaining popularity as it becomes more sophisticated. It is especially useful in business, where it can replace a live operator to funnel calls, disseminate information, take orders and perform other highly useful functions. However, it is also gaining favor as a desktop application, helped along by renowned software like ScanSoft's Dragon NaturallySpeaking and IBM's ViaVoice.
SauteePan
Post 3
@Bhutan - I agree, and I find that is the main problem with voice recognition software. However, once the software becomes accustomed to your voice by recording your speech patterns, it can be a real blessing.
It can save you so much time, and I know that for me my productivity doubles, but I still have to proofread the text because it occasionally makes a mistake.
It is a good idea to look at voice recognition software reviews before you buy one of these programs, because some have more features than others and they do not all cost the same.
Bhutan
Post 2
@Anon63415 - I really don’t know the answer to your question, but I did want to say that I have used voice recognition software, and for the most part it can save you time: you simply use the microphone and dictate what you would like the system to transcribe. The problem is in the typed output. Even the best voice recognition software will sometimes confuse words that sound the same, and if you don’t keep an eye on the transcription as the words appear on the screen, it may use the wrong words and leave your paragraph making no sense. It takes a while for the software to recognize your words correctly, which is the biggest frustration with it.
anon63415
Post 1
Which came first: the TV program Star Trek, which had heads-up displays, or the technology?