What is a Web Crawler?

Heather Kaefer

Last Modified Date: January 27, 2024

A web crawler is a relatively simple automated program, or script, that methodically scans or "crawls" through Internet pages to create an index of the data it's looking for. These programs are usually made to be used only once, but they can also be programmed for long-term use. The program has several uses, perhaps the most popular being search engines, which use it to provide web surfers with relevant websites. Other users include linguists and market researchers, or anyone trying to gather information from the Internet in an organized manner. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer. Crawler programs can be purchased online or from many companies that sell computer software, and they can be downloaded to most computers.
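The methodical scan described above is often implemented as a breadth-first traversal: start from a seed page, follow its links, and remember which pages have already been queued so none is fetched twice. The Python sketch below illustrates that idea under simplifying assumptions. `FAKE_WEB` is a hypothetical in-memory stand-in for real web pages (a real crawler would download each page over HTTP), and links are pulled out with a simple regular expression that only understands double-quoted `href` values, rather than a full HTML parser.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical stand-in for the Internet: URL -> HTML.
# A real crawler would fetch each page over HTTP instead of looking it up here.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": "no links here",
}

# Simplified link extraction (double-quoted href only); real HTML needs a parser.
HREF_RE = re.compile(r'href="([^"]+)"')


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed; returns URLs in the order visited."""
    frontier = deque([seed])  # URLs waiting to be visited
    seen = {seed}             # URLs already queued, so none is visited twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # page missing or fetch failed: skip it
            continue
        visited.append(url)
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)  # resolve relative links such as "/a"
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl("http://example.com/", FAKE_WEB.get))
# ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

The `max_pages` limit is one way to make the same loop serve a one-time job; a long-running crawler would keep refilling the frontier instead of stopping.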

Common Uses

Web crawlers and similar technologies use algorithms, which are the keys to producing targeted results in searches.

There are various uses for web crawlers, but essentially a web crawler may be used by anyone seeking to collect information from the Internet. Search engines frequently use web crawlers to collect information about what is available on public web pages. Their primary purpose is to collect data so that when Internet surfers enter a search term on the search engine's site, it can quickly provide them with relevant websites. Linguists may use a web crawler to perform textual analysis; that is, they may comb the Internet to determine which words are commonly used today. Market researchers may use a web crawler to identify and assess trends in a given market.
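The linguist's use case comes down to counting words across the text a crawler has collected. Assuming the pages have already been fetched and reduced to plain text, a minimal Python sketch using `collections.Counter` might look like this. The function name and the tokenizer (lowercase runs of letters and apostrophes) are illustrative assumptions; serious textual analysis would need far more careful tokenization.

```python
import re
from collections import Counter


def word_frequencies(pages):
    """Count how often each word appears across a collection of page texts."""
    counts = Counter()
    for text in pages:
        # Crude tokenizer for illustration: lowercase, keep letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


pages = [
    "The crawler reads the page.",
    "The page links to another site.",
]
print(word_frequencies(pages).most_common(2))  # [('the', 3), ('page', 2)]
```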

Web crawlers scan through Internet pages to create an index of data.

Web crawling is an important method for collecting data on, and keeping up with, the rapidly expanding Internet. Vast numbers of new web pages are added every day, and existing information is constantly changing. Web crawlers give search engines and other users a way to regularly ensure that their databases are up to date. Web crawlers can also be put to illegal uses, such as breaking into a server to extract more information than is freely given.
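One simple way to tell whether a stored copy of a page is out of date is to keep a fingerprint (hash) of each page's content and compare it with the fingerprint of a fresh download. The sketch below assumes that approach and uses SHA-256 from Python's `hashlib`; the function names are hypothetical. Real crawlers may instead, or additionally, use HTTP conditional requests (`If-Modified-Since`, `ETag`) so that unchanged pages need not be downloaded in full.

```python
import hashlib


def fingerprint(html):
    """Return a stable fingerprint of a page's content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def pages_that_changed(index, fetched):
    """Compare freshly fetched pages with stored fingerprints.

    index:   dict mapping URL -> fingerprint from the previous crawl (updated in place)
    fetched: dict mapping URL -> HTML from the current crawl
    Returns the URLs that are new or whose content differs.
    """
    changed = []
    for url, html in fetched.items():
        fp = fingerprint(html)
        if index.get(url) != fp:  # new page, or content differs from last time
            changed.append(url)
            index[url] = fp
    return changed


index = {}
print(pages_that_changed(index, {"http://example.com/": "v1"}))  # ['http://example.com/'] (new)
print(pages_that_changed(index, {"http://example.com/": "v1"}))  # [] (unchanged)
print(pages_that_changed(index, {"http://example.com/": "v2"}))  # ['http://example.com/'] (edited)
```

An exact hash flags any byte-level change, including trivial ones such as a rotating timestamp; detecting only significant changes would need a fuzzier comparison.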

How it Works

Web crawlers can be operated for a particular one-time project.

When a search engine's web crawler visits a web page, it "reads" the visible text, the hyperlinks, and the content of the various tags used on the site, such as keyword-rich meta tags. Using the information gathered by the crawler, the search engine then determines what the site is about and indexes the information. The website is then included in the search engine's database and its page-ranking process.
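The reading and indexing steps can be sketched together. A common data structure for the indexing step is an inverted index, a map from each word to the pages that contain it. The Python sketch below uses only the standard library's `html.parser` to pull out visible text, link targets, and `<meta name="keywords">` entries, then builds such an index. The class and function names, the sample pages, and the choice to treat meta keywords like ordinary page words are illustrative assumptions, not how any particular search engine works; real engines also weigh and rank what they find.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collect visible text, link targets, and meta keywords from one page."""

    def __init__(self):
        super().__init__()
        self.text = []      # visible text fragments
        self.links = []     # href values of <a> tags
        self.keywords = []  # entries from <meta name="keywords">
        self._skip = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only text between tags.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def read_page(html):
    reader = PageReader()
    reader.feed(html)
    reader.close()  # flush any text still buffered by the parser
    return reader


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text or keywords contain it."""
    index = defaultdict(set)
    for url, html in pages.items():
        reader = read_page(html)
        words = re.findall(r"[a-z0-9']+", " ".join(reader.text + reader.keywords).lower())
        for word in words:
            index[word].add(url)
    return index


pages = {
    "http://example.com/": (
        '<html><head><meta name="keywords" content="spiders, bots">'
        "<script>var x = 1;</script></head>"
        '<body><p>Web crawlers index pages.</p><a href="/about">About us</a></body></html>'
    ),
    "http://example.com/about": "<p>About web crawlers</p>",
}

index = build_index(pages)
print(sorted(index["crawlers"]))  # ['http://example.com/', 'http://example.com/about']
print(read_page(pages["http://example.com/"]).links)  # ['/about']
```

Answering a search is then a dictionary lookup per query word, which is what lets a search engine respond quickly instead of rescanning pages.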

A web crawler may run only once, for example for a particular one-time project. If its purpose is long-term, as is the case with search engines, it may be programmed to comb through the Internet periodically to determine whether there have been any significant changes. If a site is experiencing heavy traffic or technical difficulties, the spider may be programmed to note that and revisit the site later, ideally after the technical issues have subsided.
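Periodic revisits, plus a retry later for struggling sites, can be sketched as a small scheduler. In this Python sketch, which is an illustrative assumption rather than a description of any real search engine, a site is revisited after a fixed interval when a visit succeeds, and after an exponentially growing delay (1, 2, 4, ... time units, capped) when a visit fails because of heavy traffic or technical trouble. The class name, parameters, and units (plain numbers standing in for hours) are hypothetical.

```python
import heapq


class RevisitScheduler:
    """Hypothetical revisit planner: regular recrawls, with back-off on failures."""

    def __init__(self, normal_interval=24.0, retry_base=1.0, max_retry=48.0):
        self.normal_interval = normal_interval  # delay between routine revisits
        self.retry_base = retry_base            # first retry delay after a failure
        self.max_retry = max_retry              # cap on the retry delay
        self.failures = {}                      # site -> consecutive failed visits
        self.queue = []                         # min-heap of (due_time, site)

    def record_visit(self, site, now, ok):
        """Record a visit outcome and schedule the next visit to the site."""
        if ok:
            self.failures[site] = 0
            delay = self.normal_interval
        else:
            # Heavy traffic or an outage: wait longer after each consecutive failure.
            n = self.failures.get(site, 0) + 1
            self.failures[site] = n
            delay = min(self.retry_base * 2 ** (n - 1), self.max_retry)
        heapq.heappush(self.queue, (now + delay, site))

    def due(self, now):
        """Remove and return every site whose next visit time has arrived."""
        ready = []
        while self.queue and self.queue[0][0] <= now:
            ready.append(heapq.heappop(self.queue)[1])
        return ready


s = RevisitScheduler()
s.record_visit("example.com", now=0, ok=False)  # first failure: retry at t=1
s.record_visit("example.org", now=0, ok=True)   # success: revisit at t=24
print(s.due(1))   # ['example.com']
s.record_visit("example.com", now=1, ok=False)  # second failure: retry at t=3
print(s.due(3))   # ['example.com']
print(s.due(24))  # ['example.org']
```

Doubling the wait after each failure keeps the crawler from adding load to a site that is already struggling, while the cap ensures the site is still checked eventually.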

Discussion Comments

anon991693

July 9, 2015

How do I create my own web-spider?

vittu

September 6, 2011

What is the link between a web crawler and a master which assigns map tasks to the mappers?

anon108933

September 5, 2010

I want to set up a website that has information from various sites on it, so that my website becomes a reference point, more akin to online news. Can I do this with a web crawler? If so, how?

anon107578

August 30, 2010

If you're interested in web crawling, you should try 80legs. They have free web crawling available, but you can buy some more powerful services for decent prices.

anon106962

August 27, 2010

Yeah, there are several third-party web crawlers you can use to crawl sites and gather data. 80legs is a good one: the free plan lets you crawl 100,000 pages, and more options are available. Mozenda is pricey (5,000 pages for $99), but it's got a nice user interface tool.

We use these to crawl some sites as part of our business strategy at work. Some techies from our development group got us started with them.

anon104280

August 16, 2010

What are the basic differences between Google search and a web crawler?

cvpkarthik

July 20, 2010

What is a crawler? Please give me an idea. Where is it used? Programming?

stellabiz

May 24, 2010

Dhananjay- I'm running a small business and using a web crawler called Mozenda for data gathering and marketing research. It's really simple and not very expensive. I think it can be used for everything from extensive data mining for corporations to personal use (comparison shopping or researching colleges etc). I'm actually a bit addicted to it.

anon73138

March 25, 2010

Does anyone know what blp_bbot is?

anon69510

March 8, 2010

There are some third-party services for web crawling.

anon66011

February 17, 2010

Is a web crawler used to download complete sites automatically, so they can be read offline? Please reply soon; it's urgent.

Dhananjay

January 24, 2010

Who are the actual users of web crawlers, other than search engines? What are the uses of web crawlers in day-to-day Internet surfing?

anon55873

December 10, 2009

How do they index the data? I'm sure an index is necessary.

anon52425

November 14, 2009

Very well written. :)

anon49895

October 23, 2009

Depending on whether or not the e-mail supports HTML formatting, you could always try doing this:

Send Mail

You can alter the subject as well. Just change "hello" and "again" to whatever you'd like. The %20 is the URL encoding for a space. So if you would like the subject to say something like "E-mail to the webmaster", it would be subject=E-mail%20to%20the%20webmaster. Best of luck.

bettylou

April 13, 2009

I have a webstore. I just learned how to do a signature in my e-mails so my webstore is at the bottom. But it is not blue in color like most e-mail links are. Can someone tell me what I need to do, so people can just click on the signature and get to the website?

Thank you, Betty
