Biclustering is a data mining technique that arranges information in a matrix and groups the rows and the columns of that matrix at the same time. The aim is to let the computer sift through a large amount of data in a single pass, rather than clustering the rows and the columns in separate steps as single clustering methods do. Biclustering is a general heading for a class of data mining techniques rather than a single method; many different algorithms fall under this category, including block clustering, the Plaid model, coupled two-way clustering, and interrelated two-way clustering.
To understand the importance of biclustering, one must first understand the general concept of data mining. Data mining means taking a large pile of data, such as information dumped from a company's main database, and sorting through it to identify trends and other useful patterns. This type of analysis can reveal patterns that would not become evident through casual study, such as consumer purchasing trends and stock market fluctuations. Data mining can be conducted manually by a human analyst, or electronically using some type of data mining algorithm; that is where biclustering comes into play.
During the process of data mining, the computer conducting the analysis will attempt to group related pieces of information together. This process is known as "clustering." Clustering lets the computer recognize when two or more pieces of information are related to one another and place them together in a matrix. Normally, the grouping is applied to either the rows or the columns of the matrix, but only one at a time.
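As a rough illustration of clustering along one dimension only, the sketch below groups the rows of a small matrix while leaving the columns untouched. The function name `cluster_rows`, the distance threshold, and the sample data are all assumptions made for this example; real data mining tools use more sophisticated methods.

```python
def cluster_rows(matrix, threshold):
    """Greedy one-way clustering of rows (illustrative only).

    A row joins the first cluster whose first member is within
    `threshold` of it, measured as the sum of absolute differences.
    Otherwise the row starts a new cluster.
    """
    clusters = []  # each cluster is a list of row indices
    for i, row in enumerate(matrix):
        for members in clusters:
            leader = matrix[members[0]]
            distance = sum(abs(a - b) for a, b in zip(row, leader))
            if distance <= threshold:
                members.append(i)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([i])
    return clusters


# Hypothetical data: rows could be customers, columns could be products.
data = [
    [1, 1, 9, 9],
    [1, 2, 9, 8],
    [7, 7, 0, 0],
    [8, 7, 0, 1],
]

row_clusters = cluster_rows(data, threshold=3)
print(row_clusters)  # [[0, 1], [2, 3]]
```

Only the rows are grouped here; every column is treated as equally relevant to every row, which is the limitation that biclustering is meant to address.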
Biclustering does away with this limitation by enabling the computer to group both the rows and the columns at the same time. This can make the clustering process more efficient, but it can also result in differently arranged matrices depending on the particular algorithm being used. For example, an algorithm that looks for groups with constant values along the rows and one that looks for groups with constant values along the columns will generate different-looking matrices from exactly the same values. There is no one "right" way to cluster the data; it all depends on the particular situation and the preferences of the individual conducting the data mining.
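The difference between the constant-row and constant-column patterns can be sketched in a few lines. This is a minimal illustration, not a biclustering algorithm: it only checks whether a chosen set of rows and columns forms one pattern or the other. The helper names and the sample matrix are assumptions made for this example.

```python
def submatrix(matrix, rows, cols):
    """Pick out the cells at the chosen rows and columns."""
    return [[matrix[r][c] for c in cols] for r in rows]


def has_constant_rows(block):
    """True if every row of the block holds a single repeated value."""
    return all(len(set(row)) == 1 for row in block)


def has_constant_columns(block):
    """True if every column of the block holds a single repeated value."""
    return all(len(set(col)) == 1 for col in zip(*block))


# Hypothetical data containing one pattern of each kind.
data = [
    [5, 5, 0, 5],
    [2, 2, 7, 2],
    [4, 9, 3, 1],
    [4, 9, 3, 8],
]

# Rows 0-1 over columns 0, 1, 3: each row repeats one value.
row_pattern = submatrix(data, rows=[0, 1], cols=[0, 1, 3])

# Rows 2-3 over columns 0-2: each column repeats one value.
col_pattern = submatrix(data, rows=[2, 3], cols=[0, 1, 2])

print(has_constant_rows(row_pattern), has_constant_columns(row_pattern))
print(has_constant_rows(col_pattern), has_constant_columns(col_pattern))
```

The same matrix yields two different groupings depending on which pattern is sought, and each grouping uses only a subset of the rows and a subset of the columns.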