Introduction to SEO Crawlers
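SEO (Search Engine Optimization) is the process of improving a website's ranking on search engine results pages (SERPs) by optimizing its content, structure, and other factors, thereby increasing organic search traffic. A crawler is an automated program that search engines use to browse web pages, index their content, and supply the information shown on search engine results pages.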
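A legitimate search engine crawler must meet two conditions:
- It sends the User-Agent string published by the vendor
- Its requests originate from an IP address within the vendor's trusted ranges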
User-Agent Overview
Vendor | Purpose | User-Agent example |
Bing | Web crawler | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 |
Bing | Web crawler | Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
Bing | Web crawler | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
Bing | Ad crawler (AdIdxBot) | Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
Bing | Ad crawler (AdIdxBot) | Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
Bing | Ad crawler (AdIdxBot) | Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm) |
Bing | Web page preview | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 |
Bing | Web page preview | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) |
Bing | Office preview | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) Chrome/W.X.Y.Z Safari/537.36 |
Bing | Office preview | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; MicrosoftPreview/2.0; +https://aka.ms/MicrosoftPreview) |
360 | Web crawler | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36; 360Spider |
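As a rough illustration of the first condition, the sketch below matches a request's User-Agent header against the product tokens that appear in the examples above (`bingbot`, `adidxbot`, `MicrosoftPreview`, `360Spider`). The function name `identify_crawler` and the label strings are hypothetical, chosen only for this example. Matching is by case-insensitive substring, which is an assumption about how strict the check should be.

```python
import re
from typing import Optional

# Product tokens taken from the User-Agent examples listed above.
# The labels on the right are hypothetical names used only in this sketch.
CRAWLER_TOKENS = {
    "bingbot": "Bing web crawler or page preview",
    "adidxbot": "Bing ad crawler",
    "microsoftpreview": "Microsoft Office preview",
    "360spider": "360 web crawler",
}

# One case-insensitive pattern that matches any of the known tokens.
_TOKEN_RE = re.compile(
    "|".join(re.escape(token) for token in CRAWLER_TOKENS),
    re.IGNORECASE,
)


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return a label for the first known crawler token found, else None."""
    match = _TOKEN_RE.search(user_agent)
    if match is None:
        return None
    return CRAWLER_TOKENS[match.group(0).lower()]
```

Because the web crawler and the web page preview examples both carry the `bingbot/2.0` token, this check cannot tell them apart. A User-Agent header is also trivial to forge, so a match here only shows that the request claims to be a crawler.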
IP Ranges
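Most vendors' crawler IPs support reverse DNS (rDNS) lookup. For example, the rDNS record of a Bing crawler IP resolves to a hostname under the official msn.com domain.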
Vendor | IP ranges | rDNS marker string |
Bing | https://www.bing.com/toolbox/bingbot.json | search.msn.com |
Google | https://developers.google.com/search/apis/ipranges/googlebot.json | googlebot |
360 | https://www.so.com/help/spider_ip.html | Not supported |
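The second condition could be checked along the following lines. This is a minimal sketch under several assumptions: the published JSON files are assumed to contain a `prefixes` list whose entries hold an `ipv4Prefix` or `ipv6Prefix` key; `SAMPLE_RANGES` uses reserved documentation addresses rather than real crawler ranges; and the helper names are hypothetical. The forward lookup in `rdns_matches` is an extra confirmation step that the text above does not describe.

```python
import ipaddress
import socket

# Stand-in for a downloaded ranges file. The "prefixes" layout is an
# assumption, and these are documentation addresses, not crawler ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}


def load_networks(ranges: dict) -> list:
    """Convert the prefix entries into ipaddress network objects."""
    networks = []
    for entry in ranges.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_networks(ip: str, networks: list) -> bool:
    """Return True if the address falls inside any of the networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def rdns_matches(
    ip: str,
    marker: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Check that the rDNS hostname contains the marker string and that
    the hostname resolves back to the same IP (extra step, IPv4 only
    with the default forward resolver)."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if marker.lower() not in hostname.lower():
        return False
    try:
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False
    return ip in forward_ips
```

For a vendor such as 360, which does not support rDNS, only the range check applies. The substring test mirrors the marker strings in the table. Matching on a full domain suffix would be stricter, since a substring can also appear in a hostname the vendor does not control.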