What is a Screen Scraper?

Article Details
  • Written By: Carrie Grosvenor
  • Edited By: Bronwyn Harris
  • Last Modified Date: 27 August 2016
  • Copyright Protected: 2003-2016 Conjecture Corporation

A screen scraper is a computer program that collects character-based data from the display output of another program. Screen scrapers can extract the data they are looking for and present it in a richer format, such as with graphs or tables, or simply index the data for storage. There are many other names for a screen scraper, including web site scraper, content miner, web site ripper, web extractor, automated data collector, and HTML scraper.

A screen scraper searches through the code of a website and filters out the markup that exists only to give the page an attractive presentation in the end user's browser. That markup is necessary to view the page in its intended layout, but a scraper is looking only for useful data. The data is collected and presented as a simple database, without the bells and whistles the original HTML code provided.
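As a rough illustration of that filtering step, the sketch below uses Python's standard html.parser module to keep only the text of table cells and discard everything else. The class name TableScraper, the helper scrape_table, and the sample page are made up for this example; a real scraper would fetch the page over the network and would need to cope with far messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (headings, layout whitespace) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def scrape_table(html):
    """Return the table data in `html` as a list of rows (hypothetical helper)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page: presentation markup wrapped around a small data table.
SAMPLE_PAGE = """
<html><body style="font-family: serif">
  <h1 class="banner">Price list</h1>
  <table>
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td><b>Widget</b></td><td>4.50</td></tr>
    <tr><td>Gadget</td><td><span class="sale">7.25</span></td></tr>
  </table>
</body></html>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_PAGE):
        print(row)
```

The heading, the bold tags, and the styling attributes all disappear; only the rows of data remain, ready to be stored or reformatted.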

A good example of a screen scraper in action is a search engine spider. These spiders access hundreds of thousands of websites, each of which contains numerous pages. The keyword data from these sites is collected and indexed, then ultimately presented to the end user as search engine results.
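The collect-and-index idea can be sketched with a tiny inverted index, which maps each keyword to the pages that contain it. This is only a toy under stated assumptions: the function names build_index and search, the example.com URLs, and the page text are all invented here, and real search engines use far more elaborate ranking and storage.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        results &= set(index.get(word, []))
    return sorted(results)


# Invented stand-ins for text already scraped from three pages.
PAGES = {
    "http://example.com/a": "Ancient stone circles of Britain",
    "http://example.com/b": "Stone masonry tools and techniques",
    "http://example.com/c": "Circles and other shapes",
}

if __name__ == "__main__":
    index = build_index(PAGES)
    print(search(index, "stone circles"))
```

A query is answered from the index alone, without revisiting the pages, which is what makes it practical to search a large number of sites quickly.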

Most screen scrapers scour the HTML of a website to get their information, but they can also examine content embedded in or generated by client-side scripts such as JavaScript. Server-side code such as PHP is never sent to the browser, so a scraper sees only the output it produces. The data that is mined can then be presented as HTML itself, so that the user can access it with their web browser, or stored as text data that can be accessed by the user offline.
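The two output routes mentioned above, HTML for the browser or plain text for offline use, might look something like the following sketch. It assumes the scraped data is already available as rows of strings; the function names to_html_table and to_csv_text are hypothetical, and CSV is just one reasonable choice of text format.

```python
import csv
import io
from html import escape


def to_html_table(rows):
    """Render rows (lists of values) as a minimal HTML table."""
    lines = ["<table>"]
    for row in rows:
        # escape() keeps characters such as & and < from breaking the markup.
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


def to_csv_text(rows):
    """Render the same rows as CSV text suitable for saving to a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    data = [["Item", "Price"], ["Widget", "4.50"]]
    print(to_html_table(data))
    print(to_csv_text(data))
```

Writing the CSV text to a file gives the user a copy that opens in any spreadsheet program without a network connection.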

Businesses use screen scrapers to mine data from a variety of keyword-related websites in order to generate graphs, charts, spreadsheets, and comparison data for reports and presentations. The screen scraper saves an extraordinary amount of time, since an employee doing the same task would have to search for relevant sites, click on links, and browse each site individually to find and record the data they need. A screen scraper can also be used when information is stored on a system that can no longer be accessed due to compatibility issues with newer hardware or software.

Screen scrapers can be both a blessing and a curse for site owners and web surfers. While they absolutely provide a functional service for businesses, search engines, and others, a screen scraper can also be used for less than altruistic purposes. For example, companies or individuals who use spam as an advertising method can use a screen scraper to mine e-mail addresses from websites.

While a screen scraper can be a handy tool, there is some debate among the web community over the legality and ethics of using one. Copyright issues become blurry when a screen scraper extracts someone's hard work and presents it in another format for another website, and sites that depend on advertising to generate revenue lose out when their ads are discarded by the screen scraper. As a result, some website owners have begun to implement tools that will prevent their sites from being scraped.
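The article does not name the tools site owners use. One widely used convention, offered here only as an example, is a robots.txt file that tells automated visitors which paths are off limits. Compliance is voluntary, so this deters only well-behaved scrapers. The sketch below uses Python's standard urllib.robotparser module to check a URL against such a file; the helper name may_scrape, the user agent string, and the sample rules are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content: every crawler is asked to stay out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_scrape(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/data.html"):
        url = "http://example.com" + path
        print(url, may_scrape(SAMPLE_ROBOTS_TXT, "ExampleScraper", url))
```

A considerate scraper would run a check like this before requesting each page and skip anything the site owner has disallowed.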

Discuss this Article

anon200423
Post 3

I used SmokeDoc - a very useful screen scraping tool.

anon105222
Post 2

Beta2 version of ScrapePro Web Scraper Designer has been released. You can use it for free.
