Scrapy Crawler Learning Series, Part 14: Exceptions

Preface

This is one article in the Scrapy learning series. This chapter covers Scrapy's exceptions.

This article is the author's original work. Please credit the source when reposting.

Built-in Exceptions at a Glance

DropItem

exception scrapy.exceptions.DropItem

This is the exception that an item pipeline stage must raise to stop processing an Item. Whenever a pipeline stage needs to discard an item partway through processing, it raises DropItem. For more information, see Part 9 of this series, Item Pipeline.
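As a minimal sketch of how a pipeline stage might raise DropItem, consider the following. The pipeline name PriceRequiredPipeline and the 'price' field are hypothetical, chosen only for illustration, and the fallback class exists only so the snippet can run where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:  # stand-in so the sketch runs without Scrapy installed
    class DropItem(Exception):
        pass


class PriceRequiredPipeline:
    """Hypothetical pipeline: discards items that have no 'price' value."""

    def process_item(self, item, spider):
        if not item.get("price"):
            # Raising DropItem stops later pipeline stages from seeing this item.
            raise DropItem("missing price")
        # Returned items continue to the next pipeline stage.
        return item
```

An item with a price passes through unchanged, while an item without one is dropped and never reaches the stages that follow.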

CloseSpider

exception scrapy.exceptions.CloseSpider(reason='cancelled')

Raised from a spider callback to request that the current spider be closed and stop running.

Parameters: reason (str) – the reason for closing

    from scrapy.exceptions import CloseSpider

    def parse_page(self, response):
        # response.body is bytes, so test against a bytes literal
        if b'Bandwidth exceeded' in response.body:
            raise CloseSpider('bandwidth_exceeded')

IgnoreRequest

exception scrapy.exceptions.IgnoreRequest

This exception can be raised by the Scheduler or any downloader middleware to indicate that the request should be ignored.
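One way a downloader middleware might use this exception is sketched below. The class BlockedDomainMiddleware and the host ads.example.com are hypothetical, and the fallback class exists only so the snippet can run where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:  # stand-in so the sketch runs without Scrapy installed
    class IgnoreRequest(Exception):
        pass


class BlockedDomainMiddleware:
    """Hypothetical downloader middleware that skips requests to blocked hosts."""

    blocked = ("ads.example.com",)  # assumed block list, for illustration only

    def process_request(self, request, spider):
        # Simple substring check of each blocked host against the request URL.
        if any(host in request.url for host in self.blocked):
            raise IgnoreRequest(f"blocked: {request.url}")
        # Returning None lets the request continue through the download chain.
        return None
```

The substring check keeps the sketch short. A real middleware would more likely parse the URL and compare host names exactly.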

NotConfigured

exception scrapy.exceptions.NotConfigured

A component can raise this exception to indicate that it is unavailable and will remain disabled. The components that can do so are:

  • Extensions
  • Item pipelines
  • Downloader middlewares
  • Spider middlewares

The exception must be raised in the component’s __init__ method.
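The rule above could look roughly like this in an extension. The class StatsDumpExtension and the setting name MYEXT_ENABLED are assumptions made for this sketch, and the fallback class exists only so the snippet can run where Scrapy is not installed.

```python
try:
    from scrapy.exceptions import NotConfigured
except ImportError:  # stand-in so the sketch runs without Scrapy installed
    class NotConfigured(Exception):
        pass


class StatsDumpExtension:
    """Hypothetical extension that only activates when a setting enables it."""

    def __init__(self, enabled):
        if not enabled:
            # Raised during construction, so the component stays disabled.
            raise NotConfigured("MYEXT_ENABLED is off")
        self.enabled = True

    @classmethod
    def from_crawler(cls, crawler):
        # MYEXT_ENABLED is an assumed setting name used for illustration.
        return cls(crawler.settings.getbool("MYEXT_ENABLED"))
```

Because the check lives in __init__, the exception surfaces while the component is being built, before any crawling starts.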

NotSupported

exception scrapy.exceptions.NotSupported

This exception is raised to indicate an unsupported feature.
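As a rough illustration, your own code could raise NotSupported in the same spirit. The helper check_scheme and the set SUPPORTED_SCHEMES are hypothetical and not part of Scrapy, and the fallback class exists only so the snippet can run where Scrapy is not installed.

```python
from urllib.parse import urlparse

try:
    from scrapy.exceptions import NotSupported
except ImportError:  # stand-in so the sketch runs without Scrapy installed
    class NotSupported(Exception):
        pass

SUPPORTED_SCHEMES = {"http", "https"}  # assumed for this sketch


def check_scheme(url):
    """Hypothetical helper: reject URLs whose scheme this code cannot handle."""
    scheme = urlparse(url).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise NotSupported(f"unsupported URL scheme: {scheme!r}")
    return scheme
```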