A scraper site is a website which pulls content from other sources and republishes it, typically without attribution. Such sites are maintained for a variety of reasons, and they are of great concern to many legitimate content producers on the Internet because they pose a number of problems. Most scraper sites violate copyright law by reprinting content without consent and without crediting the author, and they also wreak havoc on search engine results and site rankings, which can make it difficult for Internet users to find the sites they actually want to see.
The key feature of a scraper site is that it uses automated means to harvest content from other sites. The practice of harvesting content is known as “scraping,” and it can be accomplished in a number of ways, from downloading entire sites to pulling content out of feeds generated in RSS, XML, and Atom formats for the benefit of readers who want to subscribe to a site rather than visiting it constantly to check for new material. Once scraped, the content is lifted verbatim and republished on the new site.
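To illustrate how little effort feed scraping takes, here is a minimal sketch in Python using only the standard library. The feed content below is invented for the example; a real scraper would fetch a site's actual RSS feed over HTTP instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# A small inline RSS 2.0 feed standing in for a real site's feed.
# (Hypothetical example content; a scraper would download this over HTTP.)
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First Post</title>
      <link>https://example.com/first-post</link>
      <description>Original article text.</description>
    </item>
    <item>
      <title>Second Post</title>
      <link>https://example.com/second-post</link>
      <description>More original text.</description>
    </item>
  </channel>
</rss>"""

def scrape_feed(feed_xml):
    """Return (title, link, description) tuples for every item in an RSS feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items

for title, link, _ in scrape_feed(SAMPLE_FEED):
    print(title, link)
```

Because the feed already delivers each article's title, link, and body text in a structured form, a scraper needs no understanding of the site's layout; the harvested items can be republished verbatim with a few more lines of code.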
Most scraper sites are maintained for the purpose of generating advertising revenue through advertisements linked with the site. People may innocently search for something, land on the scraper site, and then click the ads out of confusion. Scraper sites are also used in link farming, a practice which involves the maintenance of several sites which all link to each other, thereby inflating search engine rankings.
When content is stolen, it frustrates the original creator both because it violates copyright law and because the scraper site may deprive the original content owner of revenue. Many webmasters use a variety of techniques in an attempt to defeat scraper sites, and some have called for action on the part of search engines and advertising companies, asking them to delist scraper sites or make them less profitable so that the practice is less appealing.
In cases where a scraper site does credit the creator, this can also harm the creator by making it look as though his or her site is in a “bad neighborhood,” with a large number of spammy links rather than links from respected sites. As a result, rankings in search engines may fall, and the site owner may be powerless to do anything about it, since site owners cannot control who links to them.
Getting a scraper site to remove copyrighted content can be extremely challenging, as many such sites use layers of subterfuge to conceal their owners. Some frustrated webmasters go directly to the company which hosts the scraper site, citing copyright violations and requesting an immediate removal of the disputed content.
Technically, search engines and news aggregation sites could also be considered scraper sites. However, since these sites are maintained for the public good and because their use of material falls under fair use guidelines, these sites are generally not lumped with harmful scraper sites.
I don't get it. The fact is, every single site is a "scraper" site unless it was the one breaking the news. But sites don't break news; companies break news. So the first blog to post the news from the company gets to be OK, but all the rest are considered scrapers?