Preface

This is one article in a series of Scrapy study notes. This chapter covers Link Extractors.

This article is the author's original work; please credit the source when reposting.

Introduction

Link extractors are objects whose only purpose is to extract links from web pages (scrapy.http.Response objects) which will be eventually followed.

In other words, a link extractor takes a Response object and pulls the links out of that page so they can be used (typically followed) later.

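Scrapy ships with a built-in scrapy.linkextractors.LinkExtractor. You can also write your own link extractor to suit your needs by implementing a very simple interface: a single public method, extract_links(response), which takes a Response object as its argument and returns a list of scrapy.link.Link objects.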

Built-in Link Extractors

Link extractors classes bundled with Scrapy are provided in the scrapy.linkextractors module.

The default link extractor is LinkExtractor, which is the same class as LxmlLinkExtractor:

from scrapy.linkextractors import LinkExtractor

LxmlLinkExtractor

class scrapy.linkextractors.lxmlhtml.LxmlLinkExtractor(allow=(), deny=(), allow_domains=(), deny_domains=(), deny_extensions=None, restrict_xpaths=(), restrict_css=(), tags=('a', 'area'), attrs=('href', ), canonicalize=False, unique=True, process_value=None, strip=True)

For details, see https://doc.scrapy.org/en/latest/topics/link-extractors.html