What Is Data Mining Classification?

Emma G.

Last Modified Date: February 27, 2024

Data mining classification is one step in the process of data mining. It is used to assign items to predefined groups based on certain key characteristics. There are several techniques used for data mining classification, including nearest neighbor classification, decision tree learning, and support vector machines.

Data mining is a method researchers use to extract patterns from data. Generally, a representative sample is chosen from the pool of data and then manipulated and analyzed to find patterns. In addition to data mining classification, researchers may also use clustering, regression, and rule learning to analyze the data.

There are several algorithms that can be used in data mining classification. Nearest neighbor classification is one of the simplest. It relies on a training set, which is a set of example data whose correct groups are already known. To classify a new item, the computer finds the training example closest in value to the input and assigns the new item to that example's group.
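As a rough illustration of nearest neighbor classification, the Python sketch below labels a new point with the group of its single closest training example. The training values, the group names, and the function name `nearest_neighbor` are all made up for this example, and straight-line (Euclidean) distance is an assumed choice of what "closest" means.

```python
import math

# Hypothetical training set: (height_cm, weight_kg) paired with a known group.
training_set = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]

def nearest_neighbor(point, training):
    """Return the group of the training example closest to `point`."""
    # math.dist computes the straight-line distance between two points.
    _, closest_label = min(
        training, key=lambda example: math.dist(point, example[0])
    )
    return closest_label

print(nearest_neighbor((178.0, 80.0), training_set))  # prints "large"
```

Because every new item is compared against every training example, this simple version slows down as the training set grows.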

Decision tree learning uses a branching model to classify the data. The computer basically asks a series of questions about the data. If the answer to the first question is true, it asks question 2a. If the answer is false, it asks question 2b. When drawn out, this method forms a tree of branching paths.
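The branching idea can be sketched in Python as follows. This tree is written by hand purely for illustration. In real decision tree learning, the questions and their order would be worked out from training data. The questions, class names, and the `classify` function are hypothetical.

```python
# Hypothetical tree. Each internal node is (question, branch_if_true, branch_if_false).
# Each leaf is a plain string naming the class.
tree = (
    "has_feathers",                                   # question 1
    ("can_fly", "flying bird", "flightless bird"),    # question 2a
    ("lives_in_water", "fish", "land animal"),        # question 2b
)

def classify(item, node):
    """Follow the branches until a leaf (a class name) is reached."""
    while not isinstance(node, str):
        question, if_true, if_false = node
        node = if_true if item[question] else if_false
    return node

penguin = {"has_feathers": True, "can_fly": False, "lives_in_water": False}
print(classify(penguin, tree))  # prints "flightless bird"
```

Note that the penguin is never asked whether it lives in water, because the answer to the first question sent it down the other branch.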

Naive Bayes classification relies on probability. It asks a series of questions about each piece of data and then uses the answers to determine the probability that the data belong in a particular classification. This is different from decision tree learning because the answer to the first question does not influence which question will be asked next.
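A small Python sketch of this idea is shown below. The email-style questions, the six training examples, and the function names are invented for illustration. The sketch also assumes add-one (Laplace) smoothing so that an answer never seen in a class does not force its probability to zero.

```python
from collections import Counter, defaultdict

# Hypothetical training data: answers to two yes/no questions plus the known class.
training = [
    ({"contains_link": True,  "known_sender": False}, "spam"),
    ({"contains_link": True,  "known_sender": False}, "spam"),
    ({"contains_link": False, "known_sender": False}, "spam"),
    ({"contains_link": False, "known_sender": True},  "ham"),
    ({"contains_link": True,  "known_sender": True},  "ham"),
    ({"contains_link": False, "known_sender": True},  "ham"),
]

def train(examples):
    """Count how often each class and each (class, question, answer) occurs."""
    class_counts = Counter(label for _, label in examples)
    answer_counts = defaultdict(Counter)  # (label, question) -> Counter of answers
    for answers, label in examples:
        for question, answer in answers.items():
            answer_counts[(label, question)][answer] += 1
    return class_counts, answer_counts

def class_probabilities(answers, class_counts, answer_counts):
    """Return the probability of each class given all of the answers."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # how common the class is overall
        for question, answer in answers.items():
            seen = answer_counts[(label, question)][answer]
            # Add-one smoothing over the 2 possible answers (True or False).
            score *= (seen + 1) / (count + 2)
        scores[label] = score
    normalizer = sum(scores.values())
    return {label: score / normalizer for label, score in scores.items()}

class_counts, answer_counts = train(training)
new_email = {"contains_link": True, "known_sender": False}
probabilities = class_probabilities(new_email, class_counts, answer_counts)
print(max(probabilities, key=probabilities.get))  # prints "spam"
```

Every question is asked of every item, and each answer contributes its own independent factor to the final probability, which is the "naive" part of the method.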

More complicated methods of data mining classification include neural networks and support vector machines. These methods are computer-based models that would be difficult to carry out by hand. Neural networks are often used in artificial intelligence programming because they are loosely modeled on the human brain. A neural network passes information through a series of connected nodes that find patterns and then classify the information.
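To give a feel for how information passes through nodes, here is a very small Python sketch of a network with two hidden nodes and one output node. The weights are fixed by hand for illustration, whereas a real neural network would learn them from training data. The function names and the task (outputting 1 when exactly one of two inputs is 1) are assumptions chosen to keep the example tiny.

```python
def step(value):
    """Simple activation: the node fires (1) only if its total input is positive."""
    return 1 if value > 0 else 0

def node(inputs, weights, bias):
    """One node: weighted sum of its inputs plus a bias, passed through step()."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def classify(x1, x2):
    # Hidden layer: one node detects "at least one input is on",
    # the other detects "both inputs are on".
    either_on = node([x1, x2], [1.0, 1.0], -0.5)
    both_on = node([x1, x2], [1.0, 1.0], -1.5)
    # Output node: fires when at least one input is on, but not both.
    return node([either_on, both_on], [1.0, -1.0], -0.5)

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, classify(*pair))
```

No single node could make this classification on its own. It only works because the output node combines the patterns found by the two hidden nodes.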

Support vector machines use training samples to build a model that will classify information, usually visualized as a scatter plot with a boundary line drawn through the widest possible gap between categories. When new information is fed into the machine, it is plotted on the graph. The data are then classified based on which side of the boundary the information falls on. In its basic form, this method chooses between only two options, although several two-option models can be combined to handle more categories.
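The following Python sketch shows one simple way a two-category linear support vector machine could be trained. It assumes a straight-line boundary and uses repeated small corrections (sub-gradient descent on the hinge loss), which is only one of several training approaches. The sample points, the learning rate, the regularization strength, and the function names are all hypothetical.

```python
# Hypothetical training samples: (x, y) points labeled +1 or -1.
samples = [
    ((2.0, 2.0), 1), ((3.0, 3.0), 1), ((3.0, 2.0), 1),
    ((-2.0, -2.0), -1), ((-3.0, -3.0), -1), ((-2.0, -3.0), -1),
]

def train_linear_svm(data, learning_rate=0.1, regularization=0.01, epochs=200):
    """Fit a boundary w.x + b = 0 by sub-gradient descent on the hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in data:
            margin = label * (w[0] * x1 + w[1] * x2 + b)
            # The regularization term shrinks w slightly, which favors a wide gap.
            w = [wi * (1 - learning_rate * regularization) for wi in w]
            if margin < 1:
                # The point is inside the gap or on the wrong side: push the boundary.
                w[0] += learning_rate * label * x1
                w[1] += learning_rate * label * x2
                b += learning_rate * label
    return w, b

def predict(point, w, b):
    """Classify by which side of the boundary the point falls on."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1

w, b = train_linear_svm(samples)
print(predict((4.0, 4.0), w, b))    # prints 1
print(predict((-4.0, -4.0), w, b))  # prints -1
```

Only the samples nearest the boundary ever trigger a correction here, which reflects the idea that the boundary is determined by the points closest to the gap.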

Discussion Comments

hamje32

yesterday

@nony - All I know about neural networks is that they enable a computer to “learn.” As a neural network receives inputs it begins to notice, if you will, patterns about the data, and that enables it to learn.

I can certainly see how this would be useful in both machine learning and data mining. After all what are we trying to do with data mining? We are trying to turn data into information.

So this involves some learning, doesn’t it? I think computers are better at noticing patterns than we are. But I can’t say I’ve seen neural networks in use, only that it makes sense in principle.

nony

December 23, 2011

@Charred - Has anyone ever seen how these neural networks actually work in relation to data mining? I’ve never really understood neural networks and can’t wrap my head around how they would work with data mining.

Frankly, it sounds like a bit of overkill. I’ve heard of neural networks being used in artificial intelligence and computer game engines so I don’t understand how they would fit into the data mining methodologies.

Charred

December 23, 2011

@allenJo - I would think that the nearest neighbor algorithm would be the easiest of the data mining concepts to work with. My understanding is that with this algorithm you are just taking a record and comparing it with other records. As a result you find sets of related data and then you can cluster them.

At a company I worked for one of the reporting people used this method for his analysis and he produced clustering diagrams which showed the related information. It was interesting to look at and seemed to produce meaningful information.

Also, I think from a programming perspective it was the easiest of the algorithms, since it just involved a simple comparison. The only problem I saw was that it would do well for small data sets but would be impractical for larger ones, where you were, for example, comparing thousands upon thousands of records.

allenJo

December 22, 2011

I’ve worked with data mining tools in the past to find patterns in mountains of customer sales and revenue data. In my opinion, Naive Bayes classification is the most interesting and useful tool.

It’s like playing a game of 21 questions where you ask one question after another, whereby you gradually limit the number of possible answers for what kind of data you’re looking at. The final answer in this case would be the classification of what the data might belong to.

The difference of course is that you don’t ask 21 questions – you might ask more or less, depending on the data that you’re working with.