Google Can Crawl Forms And Drop-Down Boxes
Written by Nick Stamoulis on Sunday, April 13, 2008
As this post from Matt Cutts illustrates, Google is now starting to crawl drop-down boxes:
Now Google is finding ways to crawl through forms and drop-down boxes. We only do this for a small number of high-quality sites right now, and we’re very cautious and careful to do the crawling politely and abide by robots.txt. If you’d prefer that Google not crawl urls like this, you can use robots.txt to block the urls that would be discovered by crawling through a form. But I hope that the dialog above is a pretty good example of why this new discovery method can be helpful to webmasters.
Two points in this snippet deserve attention. First, Google is currently crawling forms and drop-down boxes only for “high-quality” sites. I’m not sure exactly what “high quality” means here, but my guess is that it doesn’t mean every site online. I’d also conjecture that it means something akin to Google’s published quality guidelines for webmasters: if you run PPC campaigns, your landing pages must meet a certain level of quality, and if you don’t, quality probably means the same thing it means for determining PageRank and rankings within the SERPs. That’s just a guess.
The second critical point is that you can use robots.txt to keep Google from crawling the URLs it would discover through forms and drop-down boxes. This probably matters more for forms than for drop-down boxes. Most drop-down boxes only link to pages that are not password protected. If some of your menu items do point to password-protected pages and you don’t want the search engines requesting them, list those URLs in your robots.txt file. Keep in mind that robots.txt only asks well-behaved crawlers to stay away; it does not restrict access, so the password protection itself is what keeps those pages private. With forms, it is possible that form information saved as HTML documents on a server could be crawled, which could sacrifice the privacy of your form users or compromise critical information you are trying to protect. That’s a good time to use robots.txt. It’s worth researching this in more depth for your own site.
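To make the robots.txt idea concrete, here is a minimal sketch in Python that checks whether URLs a crawler might discover through a form would be blocked. The paths are assumptions for illustration only: `/search` stands in for a form's action URL and `/members/` for a password-protected area, and `www.example.com` is a placeholder domain. The check uses Python's standard `urllib.robotparser`, which does simple prefix matching and may not mirror every detail of how Google interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /search is assumed to be the form's action URL,
# and /members/ is an assumed password-protected area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /members/
"""


def is_crawlable(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # URLs a crawler could generate by submitting a form or choosing
    # an option from a drop-down box (all hypothetical).
    candidates = [
        "https://www.example.com/search?category=shoes",
        "https://www.example.com/members/profile.html",
        "https://www.example.com/about.html",
    ]
    for url in candidates:
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(f"{status}: {url}")
```

Running a check like this against your own robots.txt before publishing it is a quick way to confirm that the form-generated URLs you care about are actually covered by a Disallow rule.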
Category: Search Engines




