Lossless data compression is a computer method of storing files and combining them into archives that takes up less physical space in memory than the files would otherwise without losing any information the data contains in the process. Lossy compression, by contrast, reduces file size with approximations of the data, and restoration is of a close facsimile to the original file contents. Algorithms used for lossless data compression are essentially a set of streamlined rules or instructions for encoding the information using fewer bits of memory while still retaining the ability to restore the data to its original format without alteration.
Some common file types that use lossless data compression include the International Business Machines (IBM) computer-based zip and Unix computer-based gzip file archives. Also used are image file formats such as the graphic interchange format (GIF), portable network graphics (PNG), and Bitmap (BMP) files. Data compression algorithms also vary based on the file type being compressed, with common variations for text, audio, and executable program files.
The two main categories of algorithms for lossless data compression are based on a statistical model of input data and a mapping model of bit strings in a data file. Routine statistical algorithms used are the Burrows-Wheeler transform (BWT), the Abraham Lempel and Jacob Ziv (LZ77) algorithm published in 1977, and the Prediction by Partial Matching (PPM) method. Mapping algorithms frequently employed include the Huffman coding algorithm and Arithmetic coding.
Some of the algorithms are open source tools and others are proprietary and patented, though patents on some have also now expired. This may result in compression methods sometimes being applied to the wrong file format. Due to the fact that certain data compression methods are incompatible with each other, storing mixed files can often degrade a component of a file. For instance, an image file with text that is compressed can show degradation in the readability of the text once restored. Scanners and software that employ grammar induction can extract meaning from text stored along with image files by applying what is known as latent semantic analysis (LSA).
Another form of mapping algorithm method for lossless data compression is the use of universal code. More flexible to use than Huffman coding, it doesn't require knowledge of maximum integer values ahead of time. Huffman coding and Arithmetic coding do produce better data compression rates, however. Efforts are also underway to produce universal data compression methods that would create algorithms that work well for a variety of sources.