Allowing search engines to spider 'hidden' content - best practice?

 
Yes, this is definitely along the lines of what we'd ideally like to do.

The relevant Google News page (http://www.google.com/support/newspub/bin/answer.py?answer=40543) strongly suggests that they don't actually mind you selectively allowing the Googlebot through a paywall. From a publisher's perspective that provides the best of both worlds (our content gets indexed while remaining fully protected), but it clearly isn't going to create a spectacular user experience for anyone clicking through from a search result and hitting the paywall.
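For what it's worth, letting the Googlebot through only works if we can tell the real crawler from someone who has just copied its User-Agent string. As far as I know, Google's advice is to do a reverse DNS lookup on the requesting IP, check that the hostname is under googlebot.com or google.com, and then confirm with a forward lookup that the hostname resolves back to the same IP. Here is a rough sketch of that check; the function name and the injectable lookup parameters are my own, added so the logic can be exercised without touching the network.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-then-forward DNS check for a claimed Googlebot request.

    reverse_lookup and forward_lookup are hypothetical hooks for testing;
    by default they use the standard library resolver.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda name: socket.gethostbyname_ex(name)[2])
    try:
        hostname = reverse_lookup(ip)
        # The reverse record must sit under one of Google's crawler domains.
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # The forward lookup must resolve back to the IP that made the request.
        return ip in forward_lookup(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror: treat lookup failures as unverified.
        return False


# Example with fake resolvers (192.0.2.10 is a documentation-only address).
fake_reverse = lambda addr: "crawl-192-0-2-10.googlebot.com"
fake_forward = lambda name: ["192.0.2.10"]
print(is_verified_googlebot("192.0.2.10", fake_reverse, fake_forward))
```

In practice the result would need caching, since two DNS lookups per request is not something we'd want on every page view.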

However, I'm concerned about the technical logistics of "First Click Free", namely how trivial it is to circumvent. Anyone with a rudimentary grasp of IT can install a browser extension to spoof their HTTP referer, disable cookies, or even (with a bit more effort) make consecutive HTTP requests appear to come from different IP addresses. Faced with that combination of techniques, it becomes impossible to distinguish between someone who has genuinely just landed from a search result and someone who has done 30 seconds of browser configuration and is now trawling the site downloading all the content.
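To make the worry concrete, here is a minimal sketch of the kind of referer-based check I imagine a First Click Free implementation relying on. The function name `is_first_click` and the list of search hosts are hypothetical and not taken from any real implementation; the point is only that the server has nothing to go on except a header the client chose to send.

```python
from urllib.parse import urlparse

# Hypothetical list of search engine hosts; a real deployment would differ.
SEARCH_HOSTS = ("google.com", "bing.com")


def is_first_click(headers):
    """Naive check: treat the request as a search click-through if the
    Referer header points at a known search engine host."""
    referer = headers.get("Referer", "")
    host = urlparse(referer).hostname or ""
    return any(host == h or host.endswith("." + h) for h in SEARCH_HOSTS)


# A genuine visitor arriving from a search results page.
genuine = {"Referer": "http://www.google.com/search?q=paywall"}

# A scraper that sets the same header on every request and sends no cookies.
spoofed = {"Referer": "http://www.google.com/search?q=anything"}

# A direct visit with no referer at all.
direct = {}

print(is_first_click(genuine), is_first_click(spoofed), is_first_click(direct))
```

The genuine and spoofed requests are indistinguishable to this check, and with cookies disabled and the source IP varying there is no other per-visitor state to fall back on.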

There's always the old argument that publishers can do no better than provide a mild deterrent, and that anyone who's determined enough to crack the system is essentially welcome to invest the time and effort necessary to do so, but here we're talking about a really small effort versus a really big payoff. It's one thing for, say, the Washington Post to allow First Click Free on the grounds that their content is essentially ephemeral and largely advertising-funded in the first place, but E-consultancy's content has a high and lasting value that we can't afford to compromise so readily.

Does anyone have any technical insights about how to safely release useful chunks of content to users arriving from search engine results, without also making it very easy for any half-competent user to subvert this system to his advantage? The fundamentally anonymous and stateless nature of the web makes this a very difficult problem to solve reliably. We are extremely keen to provide the richest possible user experience, but preferably not at the expense of our content assets!
 