Development · August 21, 2023

Telling Genuine SEO Crawlers from Fake Ones

About SEO crawlers

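SEO (Search Engine Optimization) is the practice of optimizing a site's content, structure, and other factors to raise its ranking on search engine results pages (SERPs) and thereby increase organic search traffic. A crawler is an automated program that a search engine uses to browse web pages across the Internet, index their content, and supply the information shown on its results pages.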

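A legitimate search engine crawler must satisfy both of the following conditions:

  1. It sends the vendor's documented User-Agent string.
  2. Its requests come from an IP address inside the vendor's trusted ranges.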
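The rule is an AND: a scraper can copy any User-Agent string, so condition 1 alone proves nothing, and condition 2 is what ties the request to the vendor. The sketch below shows only that combination. The function name `is_genuine_crawler` is hypothetical, and the networks in the usage example are RFC 5737 documentation ranges used as placeholders, not real vendor ranges.

```python
import ipaddress
from typing import Iterable


def is_genuine_crawler(
    user_agent: str,
    client_ip: str,
    ua_token: str,
    trusted_networks: Iterable[str],
) -> bool:
    """Hypothetical helper: both conditions must hold; either alone is easy to fake."""
    # Condition 1: the User-Agent carries the vendor's documented token
    if ua_token.lower() not in user_agent.lower():
        return False
    # Condition 2: the source IP falls inside one of the vendor's trusted ranges
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net, strict=False) for net in trusted_networks)


# Usage with placeholder ranges (192.0.2.0/24 is a documentation network):
ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
print(is_genuine_crawler(ua, "192.0.2.7", "bingbot", ["192.0.2.0/24"]))
```

In practice the trusted networks come from the vendor's published list or from an rDNS check, as described in the IP ranges section.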

User-Agent strings

Vendor, purpose, and example User-Agent strings (W.X.Y.Z is a placeholder for the Chrome version number, which changes over time):

Bing, web crawler (bingbot):
  Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36
  Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
  Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Bing, ad crawler (AdIdxBot):
  Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
  Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
  Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)

Bing, web page preview:
  Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36
  Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

Bing, Office preview (MicrosoftPreview):
  Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) Chrome/W.X.Y.Z Safari/537.36
  Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview)

360, web crawler (360Spider):
  Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36; 360Spider
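Since the Chrome version inside these strings changes, matching on the stable crawler token (bingbot, adidxbot, MicrosoftPreview, 360Spider) is more robust than comparing whole strings. The sketch below only answers which crawler a request claims to be; it covers condition 1 and still needs the IP check. The token-to-vendor map and the name `claimed_vendor` are illustrative assumptions built from the examples above, not an official list.

```python
import re
from typing import Optional

# Tokens taken from the User-Agent examples above (lowercase keys), mapped to the
# vendor grouping used in that list. Extend with other vendors' tokens as needed.
CRAWLER_TOKENS = {
    "bingbot": "Bing",
    "adidxbot": "Bing",
    "microsoftpreview": "Bing",
    "360spider": "360",
}

# Whole-word, case-insensitive match on any known token
_TOKEN_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, CRAWLER_TOKENS)) + r")\b", re.IGNORECASE
)


def claimed_vendor(user_agent: str) -> Optional[str]:
    """Return the vendor this User-Agent claims to be, or None if it claims no known crawler."""
    match = _TOKEN_RE.search(user_agent)
    return CRAWLER_TOKENS[match.group(1).lower()] if match else None


print(claimed_vendor("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
```

A non-None result is only a claim; the vendor it names tells you which IP list or rDNS suffix to verify against next.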

IP ranges

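Most vendors' crawler IPs support reverse DNS (rDNS) lookups. For example, the rDNS hostname of a Bing crawler IP falls under the official msn.com domain.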
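An rDNS answer on its own is not proof, because whoever controls an IP block also controls its PTR records and can point them at any hostname. The usual fix is forward-confirmed reverse DNS: look up the hostname for the IP, check that it ends in the vendor's domain, then resolve that hostname and confirm it maps back to the same IP. The sketch below assumes the allowed suffixes come from the rDNS column of the table that follows; `verify_by_rdns` is a hypothetical name, and the lookups are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def _reverse_lookup(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist); keep the hostname
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> List[str]:
    # getaddrinfo resolves A and AAAA records; element [4][0] is the address string
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def verify_by_rdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _reverse_lookup,
    forward: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IP must round-trip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # Step 1: the PTR hostname must be the vendor domain or a subdomain of it
    suffixes = [s.lower() for s in allowed_suffixes]
    if not any(host == s or host.endswith("." + s) for s in suffixes):
        return False
    # Step 2: the hostname must resolve back to the original IP; a forged PTR
    # record fails here because the vendor's DNS will not return the forger's IP
    try:
        resolved = {ipaddress.ip_address(a) for a in forward(host)}
    except OSError:
        return False
    return ipaddress.ip_address(ip) in resolved
```

For a request claiming to be bingbot, a call such as `verify_by_rdns(client_ip, ["search.msn.com"])` performs both live lookups. This check does not apply to vendors whose crawlers do not support rDNS (360 in the table), which have to be verified against their published IP list instead.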

Vendor | IP range list | rDNS signature
Bing | https://www.bing.com/toolbox/bingbot.json | search.msn.com
Google | https://developers.google.com/search/apis/ipranges/googlebot.json | googlebot
360 | https://www.so.com/help/spider_ip.html | rDNS not supported
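For vendors that publish a machine-readable list, the IP check can be done offline against a cached copy. The sketch below assumes both JSON files use the layout of Google's googlebot.json, a top-level `prefixes` array whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key; verify that against the files you actually download. The function names are hypothetical. The 360 page is HTML, so its ranges would have to be maintained by hand.

```python
import ipaddress
import json
import urllib.request
from typing import List, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_prefixes(json_text: str) -> List[Network]:
    """Assumed layout: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}."""
    data = json.loads(json_text)
    networks: List[Network] = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            # strict=False tolerates prefixes written with host bits set
            networks.append(ipaddress.ip_network(prefix, strict=False))
    return networks


def fetch_prefixes(url: str, timeout: float = 10.0) -> List[Network]:
    """Download a vendor range file (e.g. the bingbot.json or googlebot.json URL above) and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prefixes(resp.read().decode("utf-8"))


def ip_in_ranges(ip: str, networks: Sequence[Network]) -> bool:
    addr = ipaddress.ip_address(ip)
    # Testing an IPv4 address against an IPv6 network (or vice versa) just yields False
    return any(addr in net for net in networks)
```

Because the published lists change, refreshing the cached copy periodically (for example once a day) rather than fetching on every request keeps the check fast without letting it drift too far out of date.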