What Is Lossless Data Compression?

Ray Hawk

Last Modified Date: January 27, 2024

Lossless data compression is a method of storing files, and of combining them into archives, so that they take up less storage space than they otherwise would, without losing any of the information the data contains. Lossy compression, by contrast, reduces file size by approximating the data, so restoration yields only a close facsimile of the original file contents. Algorithms for lossless data compression are essentially sets of rules for encoding information in fewer bits while retaining the ability to restore the data exactly to its original form.
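The defining property, that decompression returns exactly the original bytes, can be checked directly. The following is a minimal sketch using Python's standard zlib module; the sample text is made up for illustration, and the sizes printed will depend on the input.

```python
import zlib

# Made-up sample data; repetition is what gives the compressor room to save space.
original = b"the quick brown fox jumps over the lazy dog. " * 200

compressed = zlib.compress(original, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

# Lossless means the restored bytes are identical to the original bytes.
assert restored == original
print(len(original), "bytes ->", len(compressed), "bytes")
```

A lossy method would fail the equality check, because it only promises something close to the original.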

Common file types that use lossless data compression include ZIP archives, which originated on IBM PC-compatible computers, and gzip files, which originated on Unix systems. Image file formats such as the Graphics Interchange Format (GIF) and Portable Network Graphics (PNG) also use it, while Bitmap (BMP) files are usually stored uncompressed, although the format allows optional run-length encoding. Data compression algorithms also vary with the type of file being compressed, with common variations for text, audio, and executable program files.
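As a rough illustration of the difference between the two archive types, the sketch below builds a ZIP archive holding two files and a gzip stream holding one, all in memory, using Python's standard zipfile and gzip modules. The file names and contents are hypothetical and exist only for this example.

```python
import gzip
import io
import zipfile

# Hypothetical file names and contents, used only for illustration.
files = {
    "notes.txt": b"lossless compression keeps every byte. " * 100,
    "data.csv": b"id,value\n" + b"1,42\n" * 500,
}

# ZIP: an archive that bundles several files, each compressed separately.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Reopen the same in-memory archive and extract every member.
with zipfile.ZipFile(buffer) as archive:
    extracted = {name: archive.read(name) for name in archive.namelist()}

# gzip: compresses a single stream of bytes rather than a set of files.
gz_bytes = gzip.compress(files["notes.txt"])
gz_restored = gzip.decompress(gz_bytes)

# Both formats restore the original bytes exactly.
assert extracted == files
assert gz_restored == files["notes.txt"]
```

Because gzip handles only one stream, Unix users typically bundle files with a separate tool such as tar before compressing them.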


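The two main categories of algorithms for lossless data compression are those that build a statistical model of the input data and those that map the data to bit strings. Commonly used algorithms in the first group include the Burrows-Wheeler transform (BWT), the LZ77 algorithm published by Abraham Lempel and Jacob Ziv in 1977, and the Prediction by Partial Matching (PPM) method. Strictly speaking, LZ77 is a dictionary method that replaces repeated strings with references to earlier occurrences, and BWT is a reversible reordering that makes data easier to compress. Mapping algorithms frequently employed include Huffman coding and arithmetic coding, both of which spend fewer bits on more probable symbols.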
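To make the mapping idea concrete, here is a minimal sketch of Huffman coding in Python. It is a teaching example rather than a production encoder: the function names are invented for this illustration, the bits are kept as a text string of "0" and "1" characters instead of being packed into bytes, and the code table would have to be stored alongside the output in any real file format.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each symbol in text to a prefix-free bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Edge case: a single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; their symbols get one more leading bit.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free codes make this match unambiguous
            out.append(reverse[current])
            current = ""
    return "".join(out)

message = "abracadabra"
codes = huffman_codes(message)
bits = encode(message, codes)
assert decode(bits, codes) == message
print(len(message) * 8, "bits as 8-bit characters ->", len(bits), "bits encoded")
```

In this message the letter "a" occurs most often, so it receives the shortest code, which is where the savings come from.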

Some of the algorithms are available as open source tools, while others are proprietary and patented, though some of those patents have now expired. Compression methods are also sometimes applied to data they were not designed for. Because a method tuned for one kind of data can perform poorly on another, storing mixed content in a single file can degrade part of it. For instance, if an image that contains text is compressed with a lossy image method rather than a lossless one, the text can be harder to read once the file is restored. Scanners and software that employ grammar induction can extract meaning from text stored along with image files by applying what is known as latent semantic analysis (LSA).

Another form of mapping algorithm for lossless data compression is the universal code, which maps positive integers to variable-length bit strings. It is more flexible than Huffman coding because it doesn't require knowing the maximum integer value ahead of time. Huffman coding and arithmetic coding generally produce better compression ratios when the symbol frequencies are known, however. Efforts are also underway to produce universal data compression methods, meaning algorithms that work well across a variety of sources.
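One well-known universal code is Elias gamma coding, chosen here only as an example since the article does not name a specific code. The sketch below assumes well-formed input and, for readability, represents bits as a text string; the function names are invented for this illustration.

```python
def elias_gamma_encode(n):
    """Encode a positive integer as an Elias gamma bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]  # binary digits of n, always starting with a 1
    # Prefix with one zero per digit after the first, so the decoder knows the length.
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codewords into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the leading zeros of this codeword
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values

numbers = [1, 2, 5, 17, 1000]
stream = "".join(elias_gamma_encode(n) for n in numbers)
assert elias_gamma_decode(stream) == numbers
```

No maximum value is fixed in advance: larger integers simply get longer codewords, while small integers stay short.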


Discussion Comments

MrMoody

December 25, 2011

@Mammmood - One thing I can tell you is that some file formats require lossless compression while others don’t. For example, I mess around with video editing applications, and sometimes I edit the audio separately.

For most audio I can choose lossy compression – in other words, I lose some data but it’s not a big deal. This results in a decent quality sound file that isn’t too big.

However, if I want crystal clear sound I go with lossless audio. There is a tradeoff, though: I get a bigger sound file. So these are things you have to keep in mind.

Mammmood

December 24, 2011

@SkyWhisperer - I’m a programmer but I’ve never bothered to learn the technical details of how the different compression algorithms work. I had a general idea but didn’t know the specifics.

However, I have used compression tools in my applications. I found open source compression libraries that I basically plug into my applications, and they work just fine. I only need to learn a few function calls to make them work.

I guess you could say the whole thing is a black box to me, but the article is quite useful in explaining the different algorithms that are out there.

SkyWhisperer

December 24, 2011

@Charred - Have you ever had a chance to look at how big a bitmap file is? These things are huge! There is not a lot of compression going on there. Bitmaps are not considered lossy data – basically, you need every pixel restored to its original value.

I asked someone to explain to me how Huffman data compression works. Basically it counts how often each character appears and then assigns shorter codes to the characters that show up the most and longer codes to the rare ones.

That’s the basics of it from what I understand. It is then able to squeeze more data in a little less space and then extract it when needed to its original size.

Charred

December 23, 2011

Lossless file compression is one of the biggest contributions to computer science in my opinion, especially during the early days of the Internet revolution.

Before broadband came along we were all chugging along on dial-up modems at a low baud rate. There would have been no way for me to download things like Doom or other game applications if the files had not been compressed.

It would have to be lossless data compression too, otherwise the file simply wouldn’t work when it was extracted. Nowadays with broadband you have more speed but you still need the compression so you can bundle the files together into a package for a single download.