Excessive Bandwidth/MS Search
Technical Director at Box UK
30 November 2004 12:05pm
Just a quick note of caution - if anyone’s noticing a huge increase in their bandwidth over the last month, it’s probably due to the new Microsoft search spider.
We’ve had a 10x increase in bandwidth since Oct 10th (and the associated cost that entails). After some investigation, we’ve noticed that the spider has got ’trapped’ in one of our sites that uses sessions - the spider is getting a new session for each request (and hence a new session id), and therefore continues to spider forever...
Other spiders can cope with session ids (even if it's just to ignore pages/sites that use them), so I'm a little concerned that MS didn't put much consideration into this widely used technique...
So, if anyone’s using session ids in their URLs, I’d suggest switching them off for the MS spider (with some user-agent sniffing), or otherwise pre-empting the hell of the MS search spider.
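For anyone who wants a starting point, the user-agent sniffing could look something like the sketch below. It is only an illustration: the pattern list, the `sid` parameter name and the `build_url` helper are all assumptions rather than anything from our actual code, so check your own logs for the strings that really turn up.

```python
import re

# Assumed substrings for crawler user agents; adjust to what your logs show.
CRAWLER_UA_PATTERN = re.compile(r"(ms search|msnbot|googlebot|slurp)", re.IGNORECASE)


def is_crawler(user_agent):
    """Return True if the user-agent string looks like a known crawler."""
    return bool(user_agent) and CRAWLER_UA_PATTERN.search(user_agent) is not None


def build_url(path, session_id, user_agent):
    """Append the session id to a link only for ordinary visitors.

    'sid' is a hypothetical parameter name; crawlers get the plain path so
    that every request does not produce a brand new URL for them to follow.
    """
    if is_crawler(user_agent):
        return path
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}sid={session_id}"
```

With this, `build_url("/products", "abc123", "MS Search 4.0 Robot")` gives back the plain `/products`, while a normal browser still gets the session id appended. Bear in mind that user agents can be spoofed, so this only helps with spiders that identify themselves.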
Dan
Web Consultant at architxt.net
30 November 2004 15:37pm
This is quite amazing. The spider must have been very active, like some trapped animal gone crazy!
If all this traffic had a (considerably) negative impact on your server's performance then could it be classified as a denial of service attack?
On 12:05:01 30 November 2004 Dan Zambonini wrote:
Technical Director at Box UK
30 November 2004 17:28pm
It is a bit like a trapped animal - something extremely dangerous... Possibly something rabid, or something with lots of bugs all over it.
We're still looking into it. The session ids might be a red herring, but it's definitely something to do with dynamically generated URLs (possibly URLs that store the previously viewed section, or some other kind of per-session information).
Either way, from what I can see, it's spidering the same pages over and over and over again, without recognising that they are the same page (albeit with different internal links/URLs).
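To illustrate what "recognising that they are the same page" might involve on the crawler side, here is a rough sketch. I have no idea how the MS spider actually works, so this is purely an assumption about one common approach: strip session-style parameters from the URL before checking whether it has been seen, and also keep a hash of the page body. The parameter names and the `SeenPages` class are made up for the example.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of per-session query parameters to ignore.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonical_url(url):
    """Drop session-style parameters and sort the rest, so that two URLs
    differing only by session id map to the same key."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    kept.sort()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


class SeenPages:
    """Remember canonical URLs and body digests of pages already fetched."""

    def __init__(self):
        self._urls = set()
        self._digests = set()

    def is_new(self, url, body):
        """Return True only the first time a URL or body is encountered."""
        key = canonical_url(url)
        digest = hashlib.sha1(body).hexdigest()
        if key in self._urls or digest in self._digests:
            return False
        self._urls.add(key)
        self._digests.add(digest)
        return True
```

Note that the body hash only catches byte-identical pages. If the internal links differ on each request, as they seem to here, the URL canonicalisation is the part doing the work, and a real crawler would need something smarter for the content comparison.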
I'll keep you posted.
CEO at Econsultancy
01 December 2004 10:45am
Mmm... I'm intrigued by the 'denial of service' angle. Could it be that there will be court cases coming against search engines sometime soon for excessive site crawling? Loss of revenue due to slowed sites, for example?
I guess for the time being no-one would make such complaints because they're keen to get the SEO rankings and so want to be indexed. However, I certainly know that the search spiders (especially Google and MSN) completely knacker our site when they visit - it's a bit of a love/hate relationship...
Ashley
Fndr at Majestic12.co.uk
02 December 2004 18:46pm
> If all this traffic had a (considerably) negative impact on your
> server's performance then could it be classified as a denial
> of service attack?
Unlikely - a DoS attack implies intent to take the server down, and that intent is not present in this case. That does not excuse Microsoft for failing to design their crawler to avoid loops like that.
regards
Alex
Director at SciVisum.co.uk
03 December 2004 18:21pm
Well, I hate to say 'told you so', but on Nov 11th I did...see the Times newspaper:
http://business.timesonline.co.uk/article/0,,9075-1354322,00.html
But hey, don't you just love Microsoft!
Nagging thought though Dan - you're *sure* it's an MS spider....
Deri
SciVisum.co.uk
Web application testing specialists
On 12:05:01 30 November 2004 Dan Zambonini wrote:
>Just a quick note of caution - if anyone’s noticing
>a huge increase in their bandwidth over the last month,
>it’s probably due to the new Microsoft search
>spider.
>
>We’ve had a 10x increase in bandwidth since Oct 10th
>(and the associated cost that entails).
Technical Director at Box UK
03 December 2004 18:32pm
It reports the MS Search 4.0 Robot user agent, but I'll check the IP address range - could well be a spoofer. I'll let you know - good thinking.
Dan
Fndr at Majestic12.co.uk
03 December 2004 18:43pm
The Times has a cheek to compare search engines while at the same time banning every robot except Google from their own site! Here is their robots.txt, the file that tells robots (those that support the standard) which parts of the site they cannot access - http://business.timesonline.co.uk/robots.txt
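For anyone unfamiliar with how such a file is interpreted, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content below is a made-up example of an "only Google allowed" policy, not a copy of the Times file, and the `allowed` helper and example.com URL are just for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything, everyone else nothing.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this example file, `allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", "http://example.com/article")` is true, while any other robot name falls through to the `*` rule and is refused. Of course this only restrains robots that choose to obey the file.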
Director at SciVisum.co.uk
06 December 2004 11:27am
The IP address is vital - can you let us know ASAP?
See the discussion here from a year ago, of some dodgy guys using the MS search bot User Agent on their bot:
http://www.webmasterworld.com/forum97/34.htm
Deri
Technical Director at Box UK
06 December 2004 13:30pm
194.6.120.101
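Checking an address like this against a list of known ranges can be sketched with Python's standard `ipaddress` module. The helper name is made up, and the ranges you pass in are entirely up to you: this thread does not establish which ranges the real MS spider uses, so you would need to get those from a source you trust.

```python
import ipaddress


def ip_in_ranges(ip, cidr_ranges):
    """Return True if ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


# Placeholder range from the documentation block (192.0.2.0/24), used here
# only to show the call; it is NOT a real crawler range.
PLACEHOLDER_RANGES = ["192.0.2.0/24"]
```

For example, `ip_in_ranges("194.6.120.101", PLACEHOLDER_RANGES)` is false, simply because the placeholder range does not contain that address. A request carrying a crawler's user agent from an address outside the operator's published ranges would be a strong hint that it is a spoofer.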