Learn something new every day More Info... by email
A consensus sequence is a set of proteins, or nucleotides in deoxyribonucleic acid (DNA), that appears regularly. The amino acids adenine, guanine, cytosine, and thymine make up genetic sequences in DNA. Molecular biologists often use statistics to predict where a certain sequence of them will be, or to understand where particular molecules tend to bind. Where the amino acid sequences remain the same and where they tend to vary can be represented by a formula. In the case of a consensus promoter sequence, a particular type of enzyme can bind to sites of similarly sequenced proteins.
Geneticists, like researchers in many scientific disciplines, often use substitutions to simplify complex systems. There are so many amino acid bases and genes in the body that scientists cannot count them unless there is some general system for doing so. A consensus sequence can appear in many locations in DNA as well in various living things. The similarities and differences that tend to occur can be indicated by a formula.
Statistically, scientists can classify genetic sequences to look for patterns. Repeating patterns, called sequence motifs, are generally used to represent genetic areas that control specific biological processes. Consensus sequences can also offer insight into how proteins are synthesized or how molecules are guided within a cell.
In the notation of a consensus sequence, the location of some nucleotides can show that they are always in the represented location. It can also be indicated that one nucleotide or another can be there. In this case, how frequently an amino acid appears, in place of another, is generally not stated. A graphical model is sometimes used to indicate this frequency, by increasing or decreasing the size of symbols. Some software programs can generate sequence logos automatically.
Often, a consensus sequence matches up with a recognized protein binding site. To accurately depict sequences on the genome, mathematical formulas are often used. These include statistical formulas such as logarithms and numerical values, which can be positive or negative, to represent the location of genetic information. Processes in the genome for normal biological functions, as well as those related to diseases, can be analyzed in this way.
The mathematical representations of a consensus sequence generally provide a model of DNA and amino acid patterns. An exact picture is typically not provided. The sequences, however, can help scientists relate the functional aspects of different parts of the genome to evolutionary patterns of organisms.