It turns out there are quite a few differences between Google’s regular web crawler and the Google News crawler. And though very few of us will ever see our content included in Google News, it still seems like a good idea to make our content conform to its technical requirements. Here are a few of them, in Google’s own words:
- In order for our crawler to correctly gather your content, each article needs to link to a page dedicated solely to that article. We’re unable to index articles from news sections which consist of one long page rather than a series of links that lead to articles on individual pages.
- If your articles are located in a drop-down box, we won’t be able to crawl them. Google News is unable to crawl articles only accessible through a drop-down menu.
- Google News does not recognize or follow Flash, graphic/image or JavaScript links which link to articles. Our automated crawler is best able to crawl plain text HTML links.
- Google News doesn’t crawl articles in PDF format, although this content is included on Google Web Search. Our automated crawler is currently best able to crawl plain text HTML sites.
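As a rough self-check, here is a minimal Python sketch that scans the HTML of a news section page and flags the kinds of links the requirements above rule out: `javascript:` and `onclick` links, fragment-only links that point back into one long page, PDF articles, links with no text (typically image links), and drop-down `<option>` entries. This is a heuristic of our own, not Google’s crawler logic. The class name `LinkAuditor`, the sample URLs, and the `openStory` function are hypothetical. It uses only the standard library `html.parser` and `urllib.parse` modules, and it can only see links present in the raw HTML, not links that scripts generate at runtime.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAuditor(HTMLParser):
    """Sort the links on a section page into plain HTML links and likely problems.

    Hypothetical helper: a rough heuristic based on the Google News
    requirements, not a reproduction of Google's crawler.
    """

    def __init__(self):
        super().__init__()
        self.plain_links = []     # <a href> links with visible text
        self.problems = []        # (reason, target) pairs
        self._in_select = False
        self._anchor_href = None  # href of the <a> currently open, if any
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._in_select = True
        elif tag == "option" and self._in_select:
            # Articles reachable only through a drop-down menu are not crawled.
            self.problems.append(("drop-down option", attrs.get("value", "")))
        elif tag == "a":
            self._anchor_href = attrs.get("href")
            self._anchor_text = []
            if self._anchor_href is None and "onclick" in attrs:
                # No href at all: the link only works through JavaScript.
                self.problems.append(("JavaScript onclick link", attrs["onclick"]))

    def handle_data(self, data):
        # Collect visible text only while inside an <a href="...">.
        if self._anchor_href is not None:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False
        elif tag == "a" and self._anchor_href is not None:
            self._classify(self._anchor_href, "".join(self._anchor_text).strip())
            self._anchor_href = None

    def _classify(self, href, text):
        if href.strip().lower().startswith("javascript:"):
            self.problems.append(("JavaScript link", href))
        elif href.startswith("#"):
            # Fragment-only links jump within the same page, i.e. the
            # "one long page" layout rather than one page per article.
            self.problems.append(("in-page anchor", href))
        elif urlparse(href).path.lower().endswith(".pdf"):
            self.problems.append(("PDF article", href))
        elif not text:
            # No text inside the link, usually an image-only link.
            self.problems.append(("image or empty link", href))
        else:
            self.plain_links.append(href)


if __name__ == "__main__":
    # Hypothetical section page mixing good and bad link styles.
    page = """
    <a href="/news/city-council-vote.html">City council vote</a>
    <a href="javascript:openStory(42)">Budget story</a>
    <a href="#story-2">Second story</a>
    <a href="/reports/annual.pdf">Annual report</a>
    <a href="/news/photo-essay.html"><img src="thumb.jpg"></a>
    <select><option value="/news/sports.html">Sports</option></select>
    """
    auditor = LinkAuditor()
    auditor.feed(page)
    print("Plain links:", auditor.plain_links)
    for reason, target in auditor.problems:
        print(f"{reason}: {target}")
```

Running it against a section page lists the plain text links separately from the ones worth reworking; anything in `problems` is a candidate for a plain `<a href>` link to a dedicated HTML article page.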