A screen scraper is a computer program that collects character-based data from the display output of another program. Screen scrapers can extract the data they are looking for and present it in a richer format, such as graphs or tables, or simply index the data for storage. A screen scraper goes by many other names, including web site scraper, content miner, web site ripper, web extractor, automated data collector, and HTML scraper.
A screen scraper parses the source code of a web page and filters out the markup that exists only to give the page its appearance in the browser. That markup is necessary to render the page in its intended layout, but the scraper is looking only for useful data. The data is collected and stored in a simple structured form, such as a database table, without the presentation that the original HTML provided.
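The filtering step described above can be sketched with Python's standard html.parser module. This is a minimal illustration, not a description of how any particular scraper works: the TextExtractor class, the extract_text function, and the sample page are all hypothetical names and data made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps visible text, drops tags, scripts, and styles."""

    SKIP = {"script", "style"}  # elements whose content is never useful data

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []      # the extracted text fragments, in page order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace-only runs and anything inside a skipped element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the list of non-empty text fragments found in the HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Made-up sample page: a styled table holding two product rows.
SAMPLE_HTML = """<html><head><style>td { color: red; }</style></head>
<body><table>
<tr><td class="name">Widget</td><td class="price">9.99</td></tr>
<tr><td class="name">Gadget</td><td class="price">24.50</td></tr>
</table></body></html>"""

print(extract_text(SAMPLE_HTML))
```

For the sample page, only the four cell values survive; the style rule and all of the table markup are discarded. A real scraper would usually go one step further and use the tag names or attributes to decide which fragments belong to which field.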
Search engine spiders are a good example of a screen scraper in action. These spiders access hundreds of thousands of websites, each of which contains numerous pages. The keyword data from these sites is collected and indexed, then ultimately presented to the end user as search engine results.
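The collect-and-index step can be illustrated with a small inverted index, which maps each keyword to the pages that contain it. This is only a sketch under simple assumptions: real search engines are far more elaborate, and the build_index and search functions, the PAGES data, and the example.com URLs are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # A page matches only if it appears in the posting set of every query word.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Made-up text that a spider might have already extracted from two pages.
PAGES = {
    "https://example.com/a": "Screen scrapers collect data",
    "https://example.com/b": "Spiders collect and index pages",
}

index = build_index(PAGES)
print(search(index, "collect"))       # both pages mention "collect"
print(search(index, "collect data"))  # only the first page has both words
```

Building the index once and answering each query with set intersections is what lets results come back quickly, since no page text has to be reread at search time.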
Businesses use screen scrapers to mine data from websites related to a set of keywords in order to generate graphs, charts, spreadsheets, and comparison data for reports and presentations. A screen scraper saves a great deal of time, since an employee doing the same task would have to search for relevant sites, click on links, and browse each site individually to find and record the data. A screen scraper can also be used when information is stored on an older system that can no longer be accessed directly because of compatibility issues with newer hardware or software.
Screen scrapers can be both a blessing and a curse for site owners and web surfers. While they provide a useful service for businesses, search engines, and others, they can also be used for less than altruistic purposes. For example, companies or individuals who use spam as an advertising method can use a screen scraper to mine e-mail addresses from websites.
While a screen scraper can be a handy tool, there is some debate in the web community over the legality and ethics of using one. Copyright issues become blurry when a screen scraper extracts someone's hard work and presents it in another format on another website, and sites that depend on advertising for revenue lose out when their ads are discarded by the scraper. As a result, some website owners have begun to implement tools that prevent their sites from being scraped.