Are You Being Scraped? Should You Care?
Written by Nick Stamoulis on Thursday, 3 of January, 2008 at 1:35 pm
You’ve seen ‘em. The scraper sites that are stealing your content. They’re a big pain in the you-know-what. You know it, I know it, they know it. But does it matter?
It might.
Scraper sites exist to make money. But how do they make money off of your content? It depends.
Some scraper sites will steal your content and place AdSense ads on those pages to gain the revenue from people clicking on those ads. Other scraper sites will sell advertising. But how do they get your content?
How Content Scrapers Operate
There are bots that will crawl your site looking for specific keywords. When they find them they scrape that content and place it on the blog or website of the person who programmed the bot. This is called content theft and it’s a big, big problem. Just visit some forums and you’ll see webmasters cursing up a storm about this stolen content. But what can you do about it?
Some of those scraper sites are banking on bloggers clicking through from their admin areas to see who is linking to them. They may not have any PR or clout at all and exist only to get the traffic from the people whose sites they are scraping from. These scrapers aren’t really the problem. They’re a nuisance, but if you don’t click their ads then they likely aren’t making any money.
Identifying The Real Problem Scrapers
The problem sites are the scrapers that have a high PageRank (PR). A high PageRank signals credibility to the search engines. Therefore, it’s possible that the search engines will crawl those sites before they crawl yours and they’ll be credited with original content even though they stole it from you. If that is the case then you have something to worry about.
You’ll have to download the Google toolbar in order to see what the PR of those sites is. I recommend that you visit every content scraper site you find that has your content on it without your approval. Send them a letter asking them to remove your content. If they do not do so then find out who their ISP is and report them. You should also report to the search engines that someone is scraping your content and give them the URL of the page that copied it.
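If you have a lot of suspect pages to check, a small script can help you decide which ones really copied you before you start sending letters. The sketch below is only one possible approach and is not something the search engines or any tool mentioned here provide: it breaks your article and the suspect page's text into overlapping five-word sequences and reports what fraction of your sequences show up on the other page. The function names (`shingles`, `containment`) and the five-word window are assumptions made for illustration, and the sketch assumes you have already pasted or saved the text of both pages.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    # Lowercase and keep only word-like tokens so punctuation and
    # capitalization changes do not hide a copy.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as one sequence.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original, suspect, size=5):
    """Fraction of the original's word sequences found in the suspect text."""
    mine = shingles(original, size)
    if not mine:
        return 0.0
    theirs = shingles(suspect, size)
    return len(mine & theirs) / len(mine)


if __name__ == "__main__":
    # Hypothetical sample text, standing in for your post and a scraped page.
    my_post = "Scraper sites exist to make money off of content they did not write"
    scraped_page = "Cheap deals here " + my_post + " click our ads"
    print(containment(my_post, scraped_page))
```

A score near 1 means nearly all of your text appears on the suspect page, even if it is wrapped in ads or other filler. A score near 0 means the page probably only shares a few common phrases with yours. Where to draw the line in between is a judgment call.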
How To Fix The Scraping Problem
Those high PR sites scraping your content may have other content that you can’t see. They could have a blog or article directory in a folder on that site that isn’t linked to the scraped content. It’s invisible to human eyes, but the search engines can see it. That content could cause their site to be crawled more often than yours, in which case, if they get crawled before you do, they’ll get credited for original content and you’ll be dinged for duplicate content. That’s your real problem. And to fix that you have to communicate with the search engines to let them know that your site is the originator of the content. Be patient, though. You can rectify the situation with the search engines, but it doesn’t happen in two hours.
Category: Content Development, PageRank, SEO, Search Engines
Comment by Jonathan Bailey
Made Thursday, 3 of January, 2008 at 4:40 pm
Though I am very happy to see SEO blogs such as yours addressing this issue in more detail, there are several inaccuracies in this post that I have to address.
First, the vast majority of scraping does not happen from the site itself, but rather, the RSS feed. RSS scraping is much easier, much more reliable and much faster than site scraping. I don’t know of any spammers that scrape from HTML save a few that do so from Google search results.
Second, the way to deal with content theft is not to notify the search engines, but get the work removed. You can file a DMCA notice against the infringing site and get them shut down if they are hosted in the U.S. There are similar laws in many other Western nations.
Similarly, you can file a DMCA notice with Google, Yahoo!, etc. to get the infringing material removed from those indexes.
There’s a lot to know and learn about fighting content theft. However, it can be done and it is easy to do.
If I can help you in any way with it, please let me know. I am always available to assist.
Thank you again for addressing this issue!
Comment by Nick Stamoulis
Made Saturday, 5 of January, 2008 at 8:35 am
Thanks Jonathan. Your site is a great resource. Yes, filing a DMCA notice is a good thing to do, but if you are just concerned with getting your content removed from a site then you don’t have to go that far. Contacting the search engines is necessary if you want to be credited with original content. Otherwise, they’ll continue to index the scrapers who manage to steal your ranking. Your distinction between stealing from RSS vs. HTML is a very good distinction that I missed. Thank you.
Comment by Gireesh
Made Wednesday, 9 of January, 2008 at 7:52 am
Content scraping is usually done by sites that have no business value. The content is scraped from high-profile websites. What’s the solution?
Gireesh Kumar Sharma
Sr. Content Writer
Recognize, Nourish and Retain Talent
E-Mail: gksharma@saigun.com
EmpXtrack
www.EmpXtrack.com
www.Saigun.com
+91 120 431 5560/ 431 5561