What is a Web Crawler?

Heather Kaefer

A web crawler is a relatively simple automated program, or script, that methodically scans or "crawls" through Internet pages to create an index of the data it's looking for. These programs are often written for a single use, but they can be programmed for long-term use as well. The program has several uses, perhaps the most popular being search engines that use it to provide web surfers with relevant websites. Other users include linguists, market researchers, and anyone trying to search for information on the Internet in an organized manner. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer. Crawler programs can be purchased on the Internet or from many companies that sell computer software, and they can be downloaded to most computers.
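The basic crawl-and-index idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary is a made-up, in-memory stand-in for real websites, and the `crawl` function is a hypothetical name, not part of any particular crawler product. A real crawler would download each page over the network instead.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch these pages over HTTP instead.
PAGES = {
    "http://example.com/": (
        "welcome to the example home page",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": (
        "page a discusses web crawlers",
        ["http://example.com/b"],
    ),
    "http://example.com/b": (
        "page b links back home",
        ["http://example.com/"],
    ),
}


def crawl(start_url, pages):
    """Visit every page reachable from start_url once.

    Returns an index mapping each word to the set of URLs containing it.
    """
    index = {}
    seen = {start_url}          # URLs already queued, so none is visited twice
    queue = deque([start_url])  # pages waiting to be visited, oldest first
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # unknown page: nothing to read
        text, links = pages[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://example.com/", PAGES)
print(sorted(index["crawlers"]))
```

Looking up a word in `index` then returns the pages that mention it, which is roughly what a search engine does with the data its crawler collects.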

Web crawlers and other similar technologies use algorithms, step-by-step computational procedures, which are the keys to producing targeted results in searches.

Common Uses

There are various uses for web crawlers, but essentially a web crawler may be used by anyone seeking to collect information from the Internet. Search engines frequently use web crawlers to collect information about what is available on public web pages. Their primary purpose is to collect data so that when Internet surfers enter a search term on their site, they can quickly provide the surfer with relevant websites. Linguists may use a web crawler to perform a textual analysis; that is, they may comb the Internet to determine which words are commonly used today. Market researchers may use a web crawler to determine and assess trends in a given market.
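As a rough illustration of the linguistic use, the sketch below counts how often each word appears in text that a crawler has already collected. The sample sentences and the `word_frequencies` name are assumptions made up for this example; real textual analysis would involve far more text and more careful word splitting.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word appears across a list of page texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical text gathered by a crawler.
crawled_texts = [
    "The crawler reads the page.",
    "The page links to another page.",
]
counts = word_frequencies(crawled_texts)
print(counts.most_common(2))
```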

Web crawlers scan through Internet pages to create an index of data.

Web crawling is an important method for collecting data on, and keeping up with, the rapidly expanding Internet. A vast number of web pages are added every day, and existing information is constantly changing. A web crawler gives search engines and other users a way to keep their databases up to date. Web crawlers can also be put to illegal uses, such as hacking a server to obtain more information than is freely given.

How it Works

When a search engine's web crawler visits a web page, it "reads" the visible text, the hyperlinks, and the content of the various tags used in the site, such as keyword-rich meta tags. Using the information gathered by the crawler, a search engine will then determine what the site is about and index the information. The website is then included in the search engine's database and its page ranking process.
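The "reading" step might look something like the following sketch, which uses `HTMLParser` from Python's standard library to pull the visible text, the hyperlinks, and the meta tags out of a page. The `PageReader` class and the sample HTML are invented for illustration; how any particular search engine actually parses pages is not described here.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects visible text, hyperlinks, and meta tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values of <a> tags
        self.meta = {}         # meta name -> content, e.g. "keywords"
        self.text_parts = []   # pieces of visible text
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only runs and anything inside script/style blocks.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


# Hypothetical page used only for this example.
SAMPLE_HTML = """
<html><head>
<meta name="keywords" content="crawler, spider, index">
<style>p { color: red; }</style>
</head><body>
<p>Crawlers follow <a href="/about">links</a> between pages.</p>
</body></html>
"""

reader = PageReader()
reader.feed(SAMPLE_HTML)
text = " ".join(reader.text_parts)
print(text)
print(reader.links)
print(reader.meta)
```

The links tell the crawler where to go next, while the text and meta tags are what get passed along for indexing.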

Web crawlers may operate one time only, say for a particular one-time project. If the purpose is long-term, as is the case with search engines, web crawlers may be programmed to comb through the Internet periodically to determine whether there have been any significant changes. If a site is experiencing heavy traffic or technical difficulties, the spider may be programmed to note that and revisit the site later, hopefully after the technical issues have subsided.
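One way to picture periodic revisits is a queue ordered by when each site is next due. The sketch below is a simplified assumption, not a description of any real search engine: the one-day and 15-minute intervals, the `run_schedule` and `fake_fetch` names, and the example URLs are all hypothetical, and time is simulated with plain numbers rather than a real clock.

```python
import heapq

REVISIT_INTERVAL = 24 * 60 * 60  # assumed: recheck healthy sites once a day (seconds)
RETRY_DELAY = 15 * 60            # assumed: retry troubled sites after 15 minutes


def next_visit_time(now, fetch_ok):
    """Healthy sites wait a full interval; troubled sites are retried sooner."""
    return now + (REVISIT_INTERVAL if fetch_ok else RETRY_DELAY)


def run_schedule(urls, fetch, start_time, rounds):
    """Make `rounds` visits, always taking the URL that is due soonest.

    Returns a log of (time due, url, whether the fetch succeeded).
    """
    queue = [(start_time, url) for url in urls]
    heapq.heapify(queue)
    log = []
    for _ in range(rounds):
        due, url = heapq.heappop(queue)
        ok = fetch(url)
        log.append((due, url, ok))
        heapq.heappush(queue, (next_visit_time(due, ok), url))
    return log


# Simulated fetch: the "busy" site fails on its first visit only.
attempts = {}


def fake_fetch(url):
    attempts[url] = attempts.get(url, 0) + 1
    return not (url == "http://busy.example/" and attempts[url] == 1)


log = run_schedule(
    ["http://busy.example/", "http://ok.example/"],
    fake_fetch,
    start_time=0,
    rounds=3,
)
for entry in log:
    print(entry)
```

In this run the busy site is retried after 15 simulated minutes, while the healthy site is not due again for a full day.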

Web crawlers can be operated for a particular one-time project.

Discussion Comments

anon991693

How do I create my own web-spider?

vittu

What is the link between a web crawler and a master which assigns map tasks to the mappers?

anon108933

I want to set up a website that gathers information from various other sites, so that my website becomes a reference point, more akin to online news. Can I do this with a web crawler? If so, how?

anon107578

If you're interested in web crawling, you should try 80legs. They have free web crawling available, but you can buy some more powerful services for decent prices.

anon106962

Yeah, there are several third-party web crawlers you can use to crawl sites and gather data. 80legs is a good one: the free plan lets you crawl 100,000 pages, and more options are available. Mozenda is pricey (5,000 pages for $99), but it's got a nice user interface tool.

We use these to crawl some sites as part of our business strategy at work. Some techies from our development group got us started with them.

anon104280

What are the basic differences between Google search and a web crawler?

cvpkarthik

What is a crawler? Please give me an idea. Where is it used? In programming?

stellabiz

Dhananjay: I'm running a small business and using a web crawler called Mozenda for data gathering and marketing research. It's really simple and not very expensive. I think it can be used for everything from extensive data mining for corporations to personal use (comparison shopping, researching colleges, etc.). I'm actually a bit addicted to it.

anon73138

does anyone know what blp_bbot is?

anon69510

There are some third-party services for web crawling.

anon66011

Is a web crawler used to download complete sites automatically so they can be read offline? Please reply soon. It's urgent.

Dhananjay

Who are the actual users of web crawlers other than search engines? What are the uses of a web crawler in day-to-day Internet surfing?

anon55873

How do they index the data? I'm sure an index is necessary.

anon52425

Very well written. :)

anon49895

Depending on whether or not the e-mail supports HTML formatting, you could always try doing this:

Send Mail

You can alter the subject as well. Just change "hello" to whatever you'd like, and "again" to whatever you like. The %20 is the code for a space. So if you would like it to say something like "E-mail to the webmaster", it would be subject=E-mail%20to%20the%20webmaster. Best of luck.

bettylou

I have a webstore. I just learned how to do a signature in my e-mails so my webstore is at the bottom. But it is not blue in color like most e-mail links are. Can someone tell me what I need to do, so people can just click on the signature and get to the website?

Thank you, Betty
