What is Web Data Mining?

More than ever, businesses and individuals alike use the World Wide Web to conduct a host of business and personal transactions. As a result, companies are increasingly employing Web data mining tools and techniques to improve their bottom lines and grow their customer bases. Web data mining is the process of collecting and summarizing data from a Web site’s hyperlink structure, page content, or usage logs in order to identify patterns. Using Web data mining, a company can identify a potential competitor, improve customer service, or target customer needs and expectations. A government agency may also use a Web data mining application to try to uncover terrorist threats or other criminal activities.

Some common Web data mining techniques include Web content mining, Web usage mining, and Web structure mining. Web content mining examines the subject matter of a Web site. For example, Web content miners may analyze a site’s text, images, audio, and video. In practice, they typically focus on a site’s textual information more than its other features. Natural language processing and information retrieval are two fields whose techniques are often used by Web content miners.
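As a rough illustration of text-focused content mining, the sketch below strips the markup from a page and counts its most frequent words. It uses only Python's standard library. The sample page, the small stopword list, and the names TextExtractor and top_terms are assumptions made up for this example, not part of any particular mining product.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Tiny illustrative stopword list; a real tool would use a much larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}


def top_terms(html, n=3):
    """Return the n most frequent non-stopword terms in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z]+", text)
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical page used only for demonstration.
sample_page = """
<html><head><title>Hiking boots</title><style>p { color: red; }</style></head>
<body><h1>Hiking boots for winter</h1>
<p>Our winter boots keep feet warm. Boots ship free.</p>
<script>var boots = 1;</script></body></html>
"""

print(top_terms(sample_page, n=1))  # [('boots', 4)]
```

Word counts like these are only a starting point. Natural language processing and information retrieval methods build on them to group pages by topic or rank them against a query.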


Web usage mining is usually an automated process in which Web servers record user requests in server access logs and mining tools report on the access patterns found there. A company may, for example, use a Web usage mining tool to analyze server access logs and user registration information in order to create a more effective Web site structure.

Web structure mining studies the node and connection structure of Web sites, where pages are the nodes and hyperlinks are the connections. It can be useful in identifying similarities and relationships that exist among different Web sites. Web structure mining often involves uncovering patterns from hyperlinks or extracting the document structure of a Web page.
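The two ideas above can be sketched in a few lines of Python. The first function counts successful page requests in an access log, assuming the log follows the widely used Common Log Format. The second counts how many pages link to each page in a small link graph. The log lines, the link graph, and the function names are hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def page_hits(log_lines):
    """Usage mining: count successful (status 200) requests per page."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


def inbound_links(link_graph):
    """Structure mining: count how many distinct pages link to each page."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


# Hypothetical log entries (addresses come from a documentation-only range).
sample_log = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '192.0.2.1 - - [10/Oct/2023:13:57:12 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '192.0.2.3 - - [10/Oct/2023:13:58:40 +0000] "GET /missing.html HTTP/1.1" 404 512',
]

# Hypothetical site map: each page mapped to the pages it links to.
sample_links = {
    "/index.html": ["/products.html", "/about.html"],
    "/about.html": ["/products.html", "/index.html"],
    "/products.html": ["/index.html"],
}

print(page_hits(sample_log).most_common(1))      # [('/products.html', 2)]
print(inbound_links(sample_links)["/about.html"])  # 1
```

A page with many hits but few inbound links may be hard to reach through normal navigation, which is the kind of finding that could prompt a change to the site structure.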

Two general data mining techniques that Web data miners can employ are association analysis and regression. Association analysis helps uncover noteworthy relationships buried in large data sets, such as items or pages that tend to appear together. Regression is a statistical technique in which a mathematical formula is fitted to past data and then used to predict future results, such as profit margins, house values, or sales figures.
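Both techniques can be shown on toy data. In the sketch below, pair_rules is a simplified form of association analysis: it looks only at pairs of pages and reports the support (share of sessions containing both pages) and confidence (share of sessions with the first page that also contain the second). fit_line is ordinary least-squares regression with a single predictor. The function names, the sessions, and the numbers are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5):
    """Return (a, b, support, confidence) for each rule a -> b over page pairs."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = sorted(set(session))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical visitor sessions: the set of pages each visitor viewed.
sample_sessions = [
    {"home", "boots", "checkout"},
    {"home", "boots"},
    {"home", "socks"},
    {"boots", "checkout"},
]

for a, b, support, confidence in pair_rules(sample_sessions):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")

# Hypothetical monthly figures: month number versus units sold.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 5 + intercept)  # forecast for month 5: 11.0
```

Production tools typically use more general algorithms that handle larger item sets and several predictors, but the support, confidence, and fitted-line ideas are the same.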

Data mining software vendors offer Web data mining tools that can pull out predictive information from large quantities of data. Businesses often use these software mining tools to analyze specific data sets regarding consumer behavior. Using the results of the data analysis, companies are able to forecast future business trends.


Discuss this Article

Post 5

One way to think about data mining is to think about what it would be like to be under the same kind of surveillance in real life.

You walk into a store and they immediately know who you are, where you live and what you bought last time.

As you walk around the store, a record is kept of every item you look at, every size that you choose and how long you linger in certain places.

When you pay, all of your data is stored and kept on file for next time, as if they just kept your credit card behind the counter. All of this adds up to a really weird experience. Data mining is not an innocent thing.

Post 4

After reading this article I am a little freaked out. I don't normally worry about this kind of stuff, but thinking about someone tracking everything I see and do online just gives me a weird feeling.

So I am wondering if there is a way to stop data mining or to hide your identity when you are online. I am not a big tech girl but this seems like something that would probably exist. Maybe I can download a program or avoid visiting certain sites.

Any tips you guys have would be great. I love the internet. I want to feel safe and comfortable when I am online.

Post 3

Data mining has lots of issues related to it that go beyond just privacy concerns. There is the very real chance that data mining will lead to a more limited internet experience where you simply walk down a path predetermined for you by the world's biggest companies.

The more you use the internet and the more data that is mined, the more the content you see gets specifically tailored to you. This affects the ads you see, the songs that get recommended, the news you read and the sites that get linked.

This is all fine and good to a point, but eventually you will end up with nothing, hearing only your own views, watching only movies that are just like the ones you like. You become a robot. I know this sounds kind of wacky, but in the future there are real reasons to be concerned.

Post 2

@JessicaLynn - I don't see what's so creepy about it. Companies are just trying to market products to you that you might actually want to buy!

I think it's actually pretty ingenious. Simply by using web site data mining, companies have access to a whole bunch of free information. They can use this information to redesign their websites, or amp up their marketing. The possibilities are endless and the data is totally free to the company!

Post 1

I think web data mining is just a little bit creepy. I know certain search engines compile information about searching habits, for example.

One search engine I use a lot is also responsible for a lot of the ads you see on websites. Imagine my surprise when I noticed the ads I was being shown were relevant to websites I visit a lot or something I had recently searched for!

I know it's not that big of a deal, but it still feels a little "big brother is watching" if you know what I mean.

