Should You Ask Permission Before ‘Scraping’ Content?
Content scraping has a bad reputation, but it isn’t necessarily bad ALL the time. Before you decide to use a blog post or content from an RSS feed on your own website, you should first think about the implications. Not every webmaster will appreciate it.
RSS feeds were established to allow for easier content distribution. The idea is to distribute your content to readers without those readers having to visit your website to do so. Therefore, it seems, you should be able to use content from an RSS feed on your own website without asking permission. Right?

Actually, that is technically correct. But there are some courtesies that you might extend. First, never, please, NEVER, use someone else’s content, even if from an RSS feed or article distribution site, without giving them attribution (usually done with a link back to their website). If you use someone else’s content without permission and without giving attribution then that is content theft. Shame on you.
However, scraping content from an RSS feed to use on your website is OK if you are linking back to the site from which the content originated. You must give the author and the publisher credit for using their work. But should you ask for permission?
Understand that I’m no lawyer so this is not legal advice. You should speak to your own legal counsel, but the way I see it is like this: If content exists in the form of an RSS feed, there is an implied consent to use it. You can run the feed through your own website as long as it links back to the source. No permission needed. But it would be a professional courtesy to send an e-mail to the content’s publisher and author asking for their permission to do so. You might technically be on sound legal ground if you don’t (again, speak to your attorney), but what will the social effects be? Some webmasters do frown on others using their content without express permission. So it will always work to your benefit to seek permission first.
Before you send an e-mail asking for permission to use content from an RSS feed, look on the website for terms of use. Some publishers will publish their terms on their site so it’s worth a few extra minutes to look and see if you find them.
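For anyone wiring this up themselves, here is a minimal sketch of what "use the feed, but link back" might look like in Python. It is only an illustration, not a recommended or legally vetted approach: the function name `attributed_excerpts`, the `SAMPLE_FEED` text, and the example.com addresses are all made up for the example, and it assumes a simple RSS 2.0 feed whose descriptions are plain text. It shows a short excerpt rather than the full item and skips anything it cannot link back to.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical feed used only for illustration (not a real site).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item>
      <title>First Post</title>
      <link>https://example.com/first-post</link>
      <description>This is the opening paragraph of the post. It goes on for a while after that.</description>
    </item>
  </channel>
</rss>"""


def attributed_excerpts(feed_xml, max_chars=60):
    """Return one HTML snippet per feed item: a short excerpt plus a link back."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title", default="")
    snippets = []
    for item in channel.findall("item"):
        link = item.findtext("link", default="")
        if not link:
            continue  # no way to credit the source, so do not republish
        title = item.findtext("title", default="")
        text = item.findtext("description", default="")
        # Keep only the start of the item, not the whole story.
        excerpt = text[:max_chars].rstrip()
        if len(text) > max_chars:
            excerpt += "..."
        snippets.append(
            "<p>{} <a href=\"{}\">Read the rest of \"{}\" at {}</a></p>".format(
                html.escape(excerpt),
                html.escape(link),
                html.escape(title),
                html.escape(source),
            )
        )
    return snippets


for snippet in attributed_excerpts(SAMPLE_FEED):
    print(snippet)
```

Real feeds often put HTML in the description and may use Atom instead of RSS, so treat this as a starting point only. None of it replaces checking the publisher's terms of use or asking first.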





Always ask for permission. No exceptions. Again, as the blog post stated, you might be able to “legally” get away with it and you might not be sued by someone who does not have the time to deal with it, but take my word for it, you will be disliked . . . a lot, if you do not ask permission.
Part of the issue is this. According to fair use, if I quote a paragraph of your content and link back to you for the rest of the story, then I’m doing it the right way, even though the AP is suing people who do just that because they want the law to change.
If I am using your whole story as content, then whether I link to you or not, it isn’t really fair use anymore. I may link back to the source, but I already gave the reader the whole story, which means I benefit, the reader benefits, but the content originator does not benefit.
Hope that helps.
@Chris McElroy – Thanks for reading and the comment! Very good point about the AP case and the additional information…thanks!
If you scrape a site that contains data, and that data is public in the first place, as opposed to stories, it’s a different story. What’s copyright protected, more or less, is the aggregation of the data into a work: a book, a database, a collection of web pages. But since that data is public, you could have obtained it in different ways, and in fact the data belongs to whoever consented to give it first, probably not the site that published it.
For instance, if I scrape Google Earth’s meta layers, and other sources, to build a data set for some kind of encyclopedia, I think I might possibly stand a chance in court, with a good IP lawyer and a big budget. Google Earth’s terms of use probably bar users from doing just that, but then again these terms might not hold in court if the information was public in the first place and was provided freely by other parties.
Ownership of blog posts is a simple issue in a sense, comparable to copyrights on newspaper articles, academic papers, even music and videos (ahem). “Public” data exposed via privately owned APIs and user interfaces on the web is trickier and will likely generate lots of court battles, then lobbying and lawmaking, and then international confrontations. What happens when a Chinese firm starts aggressively ‘scraping’ US sites for ‘public’ information about ‘private’ US people?
@bugzapp – Great points about referencing public data vs. a story…thanks for your comment!