What is a Web Crawler?


A web crawler is a relatively simple automated program, or script, that methodically scans or "crawls" through Internet pages to create an index of the data it is looking for. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer.
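As a rough illustration of what "methodically scanning" can mean, here is a minimal sketch in Python. It assumes a tiny in-memory stand-in for the web (the `TOY_WEB` dictionary and its example.com addresses are made up for this example), visits each reachable page once in breadth-first order, and builds a simple word index. A real crawler would fetch pages over the network instead.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download each page over HTTP instead.
TOY_WEB = {
    "http://example.com/": (
        "welcome to the home page",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": (
        "page a mentions crawlers",
        ["http://example.com/b"],
    ),
    "http://example.com/b": (
        "page b links back home",
        ["http://example.com/"],
    ),
}


def crawl(start_url, web):
    """Visit every page reachable from start_url once (breadth-first)
    and return an index mapping each word to the set of URLs using it."""
    index = {}
    seen = {start_url}          # URLs already queued, so no page is visited twice
    queue = deque([start_url])  # URLs waiting to be visited
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # dead link: nothing to read
        text, links = web[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://example.com/", TOY_WEB)
print(sorted(index["crawlers"]))
```

The `seen` set is what keeps the crawl from looping forever when pages link back to one another.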

A web crawler can be used for many purposes. The most common use associated with the term is in search engines, which use web crawlers to collect information about what is available on public web pages. Their primary purpose is to collect data so that when Internet surfers enter a search term on the search engine's site, the engine can quickly provide them with relevant web sites.

When a search engine's web crawler visits a web page, it "reads" the visible text, the hyperlinks, and the content of the various tags used in the site, such as keyword-rich meta tags. Using the information gathered by the crawler, a search engine will then determine what the site is about and index the information. The website is then included in the search engine's database and its page ranking process.
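The "reading" step can be sketched with Python's standard `html.parser` module. This is only an illustration, not how any particular search engine does it: the `PageReader` class and the sample page are invented for the example. It collects the hyperlinks, the named meta tags, and the text outside `<script>` and `<style>` blocks (which, in this simple version, includes the page title).

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects links, named meta tags, and text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []       # href values of <a> tags
        self.meta = {}        # meta name -> content
        self.text_parts = []  # text found outside <script>/<style>
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


# Made-up sample page for the sketch.
SAMPLE_HTML = """<html><head>
<title>Garden Tools</title>
<meta name="keywords" content="garden, tools, spades">
<style>body { color: black; }</style>
</head><body>
<h1>Spades and Forks</h1>
<p>See our <a href="/spades">spades</a> page.</p>
</body></html>"""

reader = PageReader()
reader.feed(SAMPLE_HTML)
reader.close()

print(reader.links)
print(reader.meta)
print(" ".join(reader.text_parts))
```

The links found this way are what a crawler would queue up to visit next, while the text and meta tags are what the search engine would index.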

Search engines, however, are not the only users of web crawlers. Linguists may use a web crawler to perform a textual analysis; that is, they may comb the Internet to determine what words are commonly used today. Market researchers may use a web crawler to determine and assess trends in a given market. Web crawlers also have nefarious uses, such as harvesting e-mail addresses for spam. In the end, a web crawler may be used by anyone seeking to collect information on the Internet.
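The linguistic use mentioned above comes down to counting words across the collected pages. A minimal sketch, assuming the page text has already been gathered (the two sample sentences and the `word_frequencies` helper are invented for illustration):

```python
import re
from collections import Counter


def word_frequencies(pages):
    """Count how often each word appears across a list of page texts."""
    counts = Counter()
    for text in pages:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Stand-in for text a crawler has already collected.
pages = [
    "The spider crawls the web.",
    "A web crawler indexes the web.",
]
counts = word_frequencies(pages)
print(counts.most_common(3))
```

This simple pattern only handles unaccented English letters; a real study would need a more careful definition of a "word".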

A web crawler may run only once, say for a particular one-time project. If its purpose is long term, as is the case with search engines, it may be programmed to comb through the Internet periodically to determine whether there have been any significant changes. If a site is experiencing heavy traffic or technical difficulties, the spider may be programmed to note that and revisit the site later, ideally after the technical issues have subsided.
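One way to picture this revisit behavior is a queue of pages ordered by their next visit time. The sketch below is a simulation under stated assumptions: the daily revisit interval, the 15-minute retry delay, the example addresses, and the `fake_fetch` function standing in for a real download are all made up for illustration.

```python
import heapq

REVISIT_INTERVAL = 24 * 3600  # assumed: recheck a healthy page once a day
RETRY_DELAY = 15 * 60         # assumed: retry a troubled site after 15 minutes


def run_schedule(urls, fetch, start_time, end_time):
    """Simulate crawl scheduling between start_time and end_time (seconds).

    fetch(url, now) returns True on success and False when the site is
    busy or down. Returns a list of (time, url, succeeded) visit records.
    """
    queue = [(start_time, url) for url in urls]
    heapq.heapify(queue)  # earliest scheduled visit comes out first
    log = []
    while queue and queue[0][0] <= end_time:
        now, url = heapq.heappop(queue)
        ok = fetch(url, now)
        log.append((now, url, ok))
        # Healthy pages wait a full interval; troubled ones are retried sooner.
        delay = REVISIT_INTERVAL if ok else RETRY_DELAY
        heapq.heappush(queue, (now + delay, url))
    return log


def fake_fetch(url, now):
    # Hypothetical stand-in for an HTTP request: the "busy" site
    # fails for the first half hour and works afterwards.
    return not (url == "http://busy.example" and now < 1800)


log = run_schedule(
    ["http://ok.example", "http://busy.example"], fake_fetch, 0, 3600
)
for record in log:
    print(record)
```

In this simulated hour the healthy site is visited once, while the busy site is retried every 15 minutes until it responds.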

Web crawling is an important method for collecting data on, and keeping up with, the rapidly expanding Internet. A vast number of web pages are added every day, and information is constantly changing. A web crawler is a way for search engines and other users to regularly ensure that their databases are up to date.
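Keeping a database up to date means noticing which pages are new or have changed since the last crawl. One common approach, shown here only as a sketch, is to store a hash of each page and compare it with a hash of the freshly fetched copy. The `fingerprint` and `find_changed` helpers and the sample pages are assumptions made for this example.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest identifying this exact page text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def find_changed(stored, fresh):
    """Return the URLs that are new or whose content has changed.

    stored: URL -> fingerprint saved during the previous crawl
    fresh:  URL -> page text fetched during the current crawl
    """
    return sorted(
        url for url, text in fresh.items()
        if stored.get(url) != fingerprint(text)
    )


# Made-up data: what the previous crawl saw and what the current one sees.
previous_pages = {
    "http://example.com/": "welcome",
    "http://example.com/news": "old headline",
}
stored = {url: fingerprint(text) for url, text in previous_pages.items()}

fresh = {
    "http://example.com/": "welcome",             # unchanged
    "http://example.com/news": "new headline",    # changed
    "http://example.com/about": "about this site" # newly added page
}
print(find_changed(stored, fresh))
```

Only the pages reported as changed need to be re-indexed, which matters when most of the web stays the same between visits.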


4
Depending on whether or not the e-Mail supports HTML formatting, you could always try doing this:

Send Mail

You can alter the subject as well. Just change "hello" to whatever you'd like, and "again" to whatever you like. The %20 is the code for a space. So if you would like it to say something like: E-mail to the webmaster, it would be subject=E-mail%20to%20the%20webmaster. Best of luck.

- anon49895
2
I have a webstore. I just learned how to do a signature in my e-mails so my webstore is at the bottom. But it is not blue in color like most e-mail links are. Can someone tell me what I need to do, so people can just click on the signature and get to the website?

Thank you, Betty

- bettylou




Written by Heather Kaefer
Last Modified: 23 October 2009

copyright © 2003 - 2009
conjecture corporation