Allowing search engines to spider 'hidden' content - best practice?

 
I'll side with Adam on this.

The decision is not how to completely hide your content, but how difficult you want to make it for users to get it for free, so that paying or registering becomes a no-brainer.

It is not cloaking as long as you show the SAME content to spiders you would show to users.

The common setup works as follows:
  • Keep the site completely open to everyone, including spiders.
  • Set up a user detection script, NOT a spider detection script, that pushes users who have not authenticated to a registration or login page. Authenticated users get full normal access anyway.
  • For a better user experience, and to comply with Google's "First Click Free" guidelines, set that script up to allow free access to a certain number of pages (one or more) when people arrive via search engines.
  • You can set noarchive ("no cache") robots meta tags or, nowadays, use the X-Robots-Tag HTTP header to keep the content out of the Google cache so that people can't reach it that way. Many sites don't block the search engine cache, though, so the content often stays accessible there. Google seems to prefer that the noarchive option is not used; I presume this is because its tools look at how pages change over time, which won't work as well if pages can't be cached. The average internet user probably doesn't know the cache link even exists anyway.
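
As a rough illustration of the setup above, here is a minimal Python sketch of the decision such a user detection script might make. It is only one possible reading: the host names, the `FREE_CLICKS` value, and the function names are assumptions made up for the example, and treating an on-site referrer as the sign of a browsing user is a guess at how such scripts work, not something either site below has confirmed. Note that it never looks for spiders; requests with no referrer simply get the page.

```python
from urllib.parse import urlparse

# All values below are hypothetical, chosen only for illustration.
SITE_HOST = "www.example.com"
SEARCH_HOSTS = ("google.", "bing.", "yahoo.")
FREE_CLICKS = 1  # pages allowed for free per visitor arriving from search


def _host(url):
    """Lower-cased host of a URL, without a leading 'www.'."""
    host = urlparse(url or "").netloc.lower()
    return host[4:] if host.startswith("www.") else host


def came_from_search(referrer):
    """True when the Referer header points at a listed search engine."""
    return _host(referrer).startswith(SEARCH_HOSTS)


def decide(authenticated, referrer, free_views_used):
    """Return 'serve' to show the page or 'login' to push to login."""
    if authenticated:
        return "serve"
    if came_from_search(referrer):
        # First Click Free: allow a limited number of search arrivals.
        return "serve" if free_views_used < FREE_CLICKS else "login"
    if _host(referrer) == _host("http://" + SITE_HOST):
        # Assumption: an on-site referrer means a user clicking around.
        return "login"
    # No referrer (direct visits, most crawlers): the site stays open.
    return "serve"


def cache_headers(block_cache):
    """Optional header asking search engines not to keep a cached copy."""
    return {"X-Robots-Tag": "noarchive"} if block_cache else {}
```

For example, `decide(False, "https://www.google.com/search?q=x", 0)` serves the page, while the same visitor then clicking an internal link gets pushed to login. Referrers can be spoofed, so this only raises the effort needed to read for free; it does not lock the content down.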

Here's how a couple of major websites (NOT low-level spam sites) do it, seemingly without any issues:

The Washington Post
http://www.washingtonpost.com
- all articles are accessible for free when accessed via search engine results, after which a click on any link prompts authentication.

Webmaster World
http://www.webmasterworld.com
- a certain number of articles are accessible for free, after which authentication is prompted no matter how you access the pages, including via search results. It seems to be time based: the authentication prompt resets back to free access after a short period.
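
A time-based allowance like the one described here could be sketched roughly as follows. This is a guess at the general technique (a counter per visitor that resets after a fixed window), not Webmaster World's actual code; the class name, the limit, and the window length are all hypothetical.

```python
import time


class FreeViewMeter:
    """Counts free page views per visitor within a fixed time window."""

    def __init__(self, limit=3, window_seconds=3600, clock=time.time):
        self.limit = limit            # hypothetical number of free views
        self.window = window_seconds  # hypothetical reset period
        self.clock = clock            # injectable so it can be tested
        self._views = {}              # visitor_id -> (window_start, count)

    def allow(self, visitor_id):
        """Record a view; return False once the allowance is used up."""
        now = self.clock()
        start, count = self._views.get(visitor_id, (now, 0))
        if now - start >= self.window:
            # Window expired: the visitor is back to free access.
            start, count = now, 0
        if count >= self.limit:
            self._views[visitor_id] = (start, count)
            return False
        self._views[visitor_id] = (start, count + 1)
        return True
```

In practice the visitor id would come from a cookie or an IP address and the counts would live in shared storage rather than in memory, but the reset logic stays the same.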

Both use server-side 302 temporary redirects to push users to the login/registration pages; once the user authenticates, they get referred back to the article.
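
The redirect-and-return flow can be sketched along these lines. The `/login` path and the `next` query parameter are assumptions for the example; neither site's real URLs or parameter names are known from the outside.

```python
from urllib.parse import parse_qs, urlencode, urlparse

LOGIN_PATH = "/login"  # hypothetical login/registration page


def login_redirect(requested_path):
    """Build a 302 response that remembers where the user was going."""
    location = LOGIN_PATH + "?" + urlencode({"next": requested_path})
    return 302, {"Location": location}


def after_login_target(login_url, default="/"):
    """Work out where to send the user once they have authenticated."""
    query = parse_qs(urlparse(login_url).query)
    target = query.get("next", [default])[0]
    # Only accept local paths, so the login page can't be abused
    # to bounce users to another site.
    if not target.startswith("/") or target.startswith("//"):
        return default
    return target
```

A 302 rather than a 301 fits here because the article has not moved; the redirect only applies to this visitor until they log in.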

Google needs to find a balanced approach to indexing and supplying results from content behind walled gardens, because there will only be more of them in the future as publishers of quality content look to monetise it properly. "First Click Free" and various major websites' successful implementations of it show that this is possible, and that it is a reasonable interim solution for Google, internet users and publishers.


Edward Cowell (Teddie)
Neutralize (**)
http://www.neutralize.com
 