What is Speech Compression?

Mary McMahon

Last Modified Date: February 17, 2024

Speech compression is the compression of audio data that consists of speech. Speech is a distinctive form of audio data, with a number of needs that must be addressed during compression to ensure that the result is intelligible and reasonably pleasant to listen to. A number of software programs have been designed specifically with speech compression in mind, including programs that can perform additional functions such as encrypting the compressed data for security.

Raw audio data can take up a great deal of memory. Compression re-encodes the data so that it occupies less space. This frees up room in storage, and it matters even more when data is transmitted over a network. On a mobile phone network, for example, speech compression allows more users to be accommodated at a given time because each call needs less bandwidth. Likewise, speech compression is important for teleconferencing and other applications; sending data is expensive, and anything that reduces the volume of data to be sent can help to cut costs.
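To make the storage and bandwidth argument concrete, here is a rough back-of-the-envelope sketch. The numbers are assumptions chosen for illustration, not figures from any particular network: uncompressed mono audio at 8,000 samples per second and 16 bits per sample, a hypothetical speech codec running at 8,000 bits per second, and a hypothetical 1 Mbit/s link. Packet overhead is ignored.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bits per second needed for uncompressed (PCM) audio."""
    return sample_rate_hz * bits_per_sample * channels


def size_bytes(bitrate_bps: int, seconds: int) -> int:
    """Storage needed for a recording of the given length."""
    return bitrate_bps * seconds // 8


def calls_per_link(link_bps: int, per_call_bps: int) -> int:
    """How many simultaneous one-way streams fit in a link (overhead ignored)."""
    return link_bps // per_call_bps


RAW_BPS = pcm_bitrate(8000, 16)   # assumed raw format: 8 kHz, 16-bit, mono
CODED_BPS = 8000                  # assumed bitrate of a hypothetical speech codec
LINK_BPS = 1_000_000              # assumed link capacity

print("raw bitrate (bit/s):", RAW_BPS)
print("one minute raw (bytes):", size_bytes(RAW_BPS, 60))
print("one minute coded (bytes):", size_bytes(CODED_BPS, 60))
print("streams per link, raw:", calls_per_link(LINK_BPS, RAW_BPS))
print("streams per link, coded:", calls_per_link(LINK_BPS, CODED_BPS))
```

Under these assumptions the same link carries many times more compressed streams than raw ones, which is the effect described above.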

Speech is a relatively simple and widely studied type of audio data, which makes it easy to compress in some ways. However, it is important to ensure that compression retains the integrity of the speech. If the data becomes distorted in some way, it can be difficult to understand, and it can also be hard to listen to. Thus, speech compression needs to be performed in a way that retains the key qualities of the data. It is easy for speech to sound “wrong” to a listener, interfering with understanding of the transmitted data.
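One classic way to shrink speech while keeping it intelligible is mu-law companding, used in traditional digital telephony. The article does not name a specific method, so the sketch below is only one illustrative technique. It applies the continuous mu-law formula to samples in the range -1.0 to 1.0 and stores each as an 8-bit code; it is a simplified sketch and does not reproduce the exact bit layout of the telephony standard. The function names are made up for this example.

```python
import math

MU = 255  # companding constant used by mu-law telephony


def mu_law_encode(x: float) -> int:
    """Map a sample in [-1.0, 1.0] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    # Logarithmic curve: quiet samples get finer resolution than loud ones.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mu_law_decode(code: int) -> float:
    """Approximate inverse of mu_law_encode."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


for sample in (0.0, 0.01, 0.1, 0.5, -0.9):
    code = mu_law_encode(sample)
    restored = mu_law_decode(code)
    print(f"{sample:+.3f} -> code {code:3d} -> {restored:+.4f}")
```

The restored values are close to, but not identical to, the originals. That small, controlled error is the trade-off a lossy speech coder makes in exchange for using fewer bits per sample.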

Programs which handle the creation of audio files may have a compression option available. After recording or generating the raw audio file, people can choose between a number of parameters to get the file compressed to a more manageable size. Speech compression can also be done on the fly, as when people use cell phones and the network compresses the data while generating a data signal so that people can talk in real time.

If the data also needs to be encrypted, this may be done in real time or in a second pass over the compressed data. In this case, someone who wants to hear the speech will need to decrypt the data and then run it through a program capable of reading the compressed data; such a program may be embedded in a piece of equipment such as a secured phone.
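The order of operations described here (compress, then encrypt; decrypt, then decompress) can be sketched as follows. Two loud assumptions apply: `zlib` is a general-purpose lossless compressor standing in for a real speech codec, and the "cipher" is a toy XOR keystream built from SHA-256 purely for illustration. It is not a vetted encryption scheme, and a real system would use an established cryptographic library. All function names are hypothetical.

```python
import hashlib
import zlib


def toy_keystream(key: bytes, length: int) -> bytes:
    """Illustration only: SHA-256 over key + counter. NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR data with the toy keystream; applying it twice restores the data."""
    stream = toy_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(raw_audio: bytes, key: bytes) -> bytes:
    """First pass compresses, second pass encrypts the compressed bytes."""
    return xor_with_key(zlib.compress(raw_audio), key)


def recover(blob: bytes, key: bytes) -> bytes:
    """Receiver side: decrypt first, then decompress."""
    return zlib.decompress(xor_with_key(blob, key))


raw = bytes(range(64)) * 50          # stand-in for recorded audio bytes
secret = b"shared secret (example)"  # hypothetical pre-shared key
blob = protect(raw, secret)
print(len(raw), "bytes raw ->", len(blob), "bytes compressed and encrypted")
print("round trip ok:", recover(blob, secret) == raw)
```

Compressing before encrypting is the natural order because well-encrypted data looks random and no longer contains the patterns a compressor relies on.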

Mary McMahon

Ever since she began contributing to the site several years ago, Mary has embraced the exciting challenge of being an EasyTechJunkie researcher and writer. Mary has a liberal arts degree from Goddard College and spends her free time reading, cooking, and exploring the great outdoors.

Discussion Comments

nony

September 5, 2011

@hamje32 - I know that there are a lot of algorithms out there to compress speech. While I can appreciate that the manner of compression is similar to other forms of compression like video compression, I think that speech compression is easier.

Simply put, audio doesn’t take up as much space as video. I realize that for Internet phone calls you still need to compress it, but even with dial-up connections that offered limited bandwidth I was able to make VOIP calls. I could not, by contrast, easily stream video until I got broadband Internet.

NathanG

September 4, 2011

@hamje32 - I think what you’ve described is what they call lossless compression. Basically, they make a file smaller without losing any of the original data.

I believe this would be necessary with speech compression, otherwise parts of your conversation would cut off; this would be unacceptable.

With video or image compression, however, you can afford some loss of the image and still have acceptable quality in the end result.

hamje32

September 4, 2011

I believe one thing that makes speech compression more effective is the rise of digital communications. This is where they take your speech and convert it to digital form.

In digital form, it becomes nothing more than a sequence of numbers. These sequences of numbers may display various patterns, and the speech coding algorithm uses these patterns to aid in its compression.
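As a small illustration of "patterns in the numbers", consider that neighboring audio samples tend to be close in value. The sketch below uses delta encoding, which stores the change from one sample to the next instead of the sample itself. This is just one simple pattern-based idea, not a claim about what any particular VOIP codec does, and the 200 Hz test tone is an assumed stand-in for real speech.

```python
import math


def delta_encode(samples):
    """Store each sample as its difference from the previous one."""
    previous = 0
    deltas = []
    for s in samples:
        deltas.append(s - previous)
        previous = s
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for d in deltas:
        total += d
        samples.append(total)
    return samples


# Assumed test signal: a 200 Hz tone sampled 8000 times per second.
samples = [round(10000 * math.sin(2 * math.pi * 200 * n / 8000)) for n in range(80)]
deltas = delta_encode(samples)

print("largest sample magnitude:", max(abs(s) for s in samples))
print("largest delta magnitude: ", max(abs(d) for d in deltas))
print("round trip ok:", delta_decode(deltas) == samples)
```

Because the differences span a much smaller range than the samples themselves, they can be stored with fewer bits each, which is the kind of saving a pattern-aware coder exploits.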

That’s why VOIP is so much better quality nowadays than it was in the early days. I have VOIP, and I can tell you that the concept is quite simple. They take your voice signal, convert it to digital format, and then chop it up and reassemble it on the other end.

They can compress it so that they have less information to transmit. Once reassembled, it’s decoded back into the same format it was in to begin with.
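The "chop it up and reassemble it" step can be sketched like this. The frame size is an assumption (160 bytes, which would be 20 ms of 8 kHz, 8-bit mono audio), and the packet layout, a sequence number plus payload, is a hypothetical simplification rather than any real VOIP protocol.

```python
import random

FRAME_BYTES = 160  # assumption: 20 ms of 8 kHz, 8-bit mono audio


def packetize(audio: bytes, frame_bytes: int = FRAME_BYTES):
    """Split audio into (sequence_number, payload) packets."""
    return [
        (seq, audio[start:start + frame_bytes])
        for seq, start in enumerate(range(0, len(audio), frame_bytes))
    ]


def reassemble(packets) -> bytes:
    """Put packets back in order by sequence number and join the payloads."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


audio = bytes(range(256)) * 4        # stand-in for encoded speech
packets = packetize(audio)
random.Random(0).shuffle(packets)    # simulate out-of-order arrival
print(len(packets), "packets; round trip ok:", reassemble(packets) == audio)
```

The sequence numbers are what let the receiving end restore the original order even when the network delivers packets out of order.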