What is a Hash Function?

M. McGee

Last Modified Date: February 29, 2024

A hash function is a method used in data organization and error checking. A piece of data of any size is run through a mathematical algorithm that reduces it to a small number, called a hash value. This number is used as part of the catalog that allows a computer to find that specific piece of information later. A good hash function should give a result small enough to be easy to use while producing different results for different pieces of data as often as possible. Because there are far more possible inputs than small outputs, some duplicates are unavoidable, but they should be rare. A hash function also provides basic error checking, as a corrupted copy and an intact copy of the same data should yield different results when hashed.

In a computer database, it is typically easier to keep track of storage locations with numbers rather than text. Numbers are compact, quick to compare and sort, and can be used directly as positions in a table. As a result, numbers are often assigned to locations containing variable information within a computer’s database. These numbers may be arbitrary or representative of the information.

Arbitrary numbers are simply assigned based on position in the computer’s memory or the order in which the data was saved. Saving information this way is common in smaller databases or in places where the data doesn’t change very often. In larger databases, or ones that change frequently, re-indexing takes more and more time until the approach is no longer efficient.

Representative numbers are where the hash function comes in. The information, regardless of what it contains, is translated into numbers. These numbers are fed into a mathematical function that outputs a small number, typically an integer. Ideally, every location in that part of the database will have its own unique result. When two or more locations produce the same result, known as a collision, a program could bring up the wrong information unless it detects and handles the duplicated hash.
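The process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names simple_hash and build_index, the multiplier 31, and the table size of 16 are hypothetical choices made for this example, not a scheme used by any particular database.

```python
def simple_hash(text, table_size=16):
    """Translate text into numbers, then reduce them to a small integer."""
    value = 0
    for byte in text.encode("utf-8"):
        # Multiply-and-add mixes each byte into the running value;
        # the modulus keeps the running value within 32 bits.
        value = (value * 31 + byte) % (2 ** 32)
    # Fold the result into the range 0 .. table_size - 1.
    return value % table_size


def build_index(keys, table_size=16):
    """Group keys by hash value; a bucket holding several keys is a collision."""
    buckets = {}
    for key in keys:
        buckets.setdefault(simple_hash(key, table_size), []).append(key)
    return buckets


if __name__ == "__main__":
    index = build_index(["apple", "banana", "cherry", "date"])
    for slot, keys in sorted(index.items()):
        print(slot, keys)
```

With only 16 possible results, any set of more than 16 keys is guaranteed to contain at least one collision, which is why the sketch stores a list of keys per slot rather than a single key.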

Hash functions have other uses as well. Large amounts of highly repetitive data can be broken down into smaller values, which is especially useful when looking for repeated sequences in large data sets. For instance, deoxyribonucleic acid (DNA) is made up of just four different bases. When segments of DNA are reduced to hash values, the places where two strands match or differ become clear simply from comparing two short columns of numbers.
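As a rough sketch of the DNA comparison, the Python below splits each sequence into fixed-size blocks and reduces every block to one integer. The function names, the block size of 4, and the base-to-digit mapping are assumptions made for illustration; real sequence-analysis tools use more elaborate schemes.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed mapping for this sketch


def block_hashes(sequence, k=4):
    """Hash each non-overlapping block of k bases to a small integer."""
    hashes = []
    for start in range(0, len(sequence) - k + 1, k):
        value = 0
        for base in sequence[start:start + k]:
            # Treat the block as a base-4 number, one digit per base.
            value = value * 4 + BASE_CODE[base]
        hashes.append(value)
    return hashes


def differing_blocks(seq_a, seq_b, k=4):
    """Return the positions of blocks whose hash values differ."""
    pairs = zip(block_hashes(seq_a, k), block_hashes(seq_b, k))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    print(block_hashes("ACGTACGTACGT"))
    print(differing_blocks("ACGTACGTACGT", "ACGTACGAACGT"))
```

Because each block is read as a base-4 number, two different blocks of the same length never share a value here. That property is specific to this small example; general-purpose hash functions trade it away to handle inputs of any size.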

Another area in which hash functions are useful is error checking. When information is first hashed, the value is recorded as part of the location’s index. When that information is needed later, it is retrieved along with that value. If the program rehashes the information and gets a different result, then corruption occurred at some point. The corruption is usually in the data rather than in the stored hash, as a corrupted hash would likely have prevented the data from being found in the first place.
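The store-then-recheck procedure might look something like the following Python sketch. It uses the CRC-32 checksum from Python's standard zlib module as the hash; the store and retrieve helpers and the dictionary used as the index are hypothetical names introduced only for this example.

```python
import zlib


def store(index, key, data):
    """Record the data together with the hash value computed at save time."""
    index[key] = (data, zlib.crc32(data))


def retrieve(index, key):
    """Rehash the stored data and compare against the recorded value."""
    data, recorded = index[key]
    if zlib.crc32(data) != recorded:
        raise ValueError("hash mismatch: data for %r is corrupted" % key)
    return data


if __name__ == "__main__":
    index = {}
    store(index, "greeting", b"hello world")
    print(retrieve(index, "greeting"))

    # Simulate corruption: the data changes but the recorded hash does not.
    _, recorded = index["greeting"]
    index["greeting"] = (b"hello w0rld", recorded)
    try:
        retrieve(index, "greeting")
    except ValueError as err:
        print(err)
```

A checksum like this catches accidental damage, but it is not designed to resist deliberate tampering; that would call for a cryptographic hash instead.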