Google Now Indexes Scanned Documents

Written by Nick Stamoulis on Friday, 31 of October, 2008 at 12:19 pm

Google can now index scanned documents: .pdf files and other images of printed text. Wow!

This is great news. Until now, humans (people like you and me) had no problem reading scanned documents online, but the search engines did. Now you can scan your entire library of technical manuals and possibly rank for search terms within them. At least, in theory.

From Google’s official blog:

Consider a circle. Should it be read as a zero, the letter ‘O’, just a circle, or the ring from my coffee cup? People learn to answer this kind of question very quickly, but for the computer it is a painstaking and error-prone process.

Check it out:

Here’s a link to the SERP.

Now view the .pdf document.

It’s incredible that the huge title across the top of the page in the .pdf file is the actual title of the document in the SERP, just like on a web page. How can you make that work for your website? Any ideas?

Category: SEO
3 Comments

Comment by pattypat

Made Friday, 31 of October, 2008 at 1:03 pm

Personally, I don’t find this to be an improvement. Some of the documents on my site have been indexed by Google, but because the PDFs come from a poor-quality source (microfilm), whatever Google has interpreted is nothing like the real title. On top of that, I have a very well-indexed website and I want my visitors to view it as a whole, not just a PDF picked at random and shown outside its normal page setup.

I have nothing against the idea; I just want Google to give webmasters a way to opt out of this indexing. I haven’t found anything in the FAQ that tells me how to stop it.

Comment by frde

Made Friday, 31 of October, 2008 at 10:31 pm

Why not just use robots.txt?
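
For anyone who wants to try the robots.txt suggestion, here is a rough sketch rather than a definitive recipe. It assumes the scanned PDFs live under a hypothetical /scans/ directory on a placeholder domain (example.com), and it uses Python's standard-library robots.txt parser to check what such a rule would block. That parser does plain path-prefix matching, so it may not behave exactly like Google's own crawler. Also keep in mind that robots.txt controls crawling, so a blocked URL that is linked from elsewhere could in principle still appear in results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the folder
# where the scanned PDFs are stored (the /scans/ path is an assumption).
ROBOTS_TXT = """\
User-agent: *
Disallow: /scans/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and the file names below are placeholders.
    for url in (
        "https://example.com/scans/manual.pdf",
        "https://example.com/index.html",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "Googlebot", url))
```

Running the script prints, for each placeholder URL, whether a crawler honoring these rules would be allowed to fetch it. Swap in your own paths to sanity-check a rule before you publish it.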

[...] couple of days ago there was announcement on Search Engine Optimization Journal that Google can now crawl and index scanned documents. But can you drive traffic to those pages with PPC and expect your quality score to register the [...]

