Class Summary |
AcceptRevisitProcessor |
Set a URI to be revisited by the ARFrontier. |
ContentBasedWaitEvaluator |
A WaitEvaluator that compares the CrawlURI's content type to a configurable
regular expression. |
CrawlStateUpdater |
A step, late in the processing of a CrawlURI, for updating the per-host
information that may have been affected by the fetch. |
FrontierScheduler |
Schedule with the Frontier the CandidateURIs carried by the passed
CrawlURI. |
ImageWaitEvaluator |
A specialized ContentBasedWaitEvaluator. |
LinksScoper |
Determine which extracted links are within scope. |
LowDiskPauseProcessor |
Processor module which uses 'df -k', where available and with
the expected output format (on Linux), to monitor available
disk space and pause the crawl if free space on monitored
filesystems falls below certain thresholds. |
RejectRevisitProcessor |
Set a URI to not be revisited by the ARFrontier. |
SupplementaryLinksScoper |
Run the CandidateURI links carried by the passed CrawlURI through a filter
and 'handle' any that are rejected. |
TextWaitEvaluator |
A specialized ContentBasedWaitEvaluator. |
WaitEvaluator |
A processor that determines when a URI should be revisited next. |
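
The LowDiskPauseProcessor entry above describes reading 'df -k' output and comparing free space against thresholds. The following is a minimal sketch of that idea only, not the processor's actual implementation: the class name DfSketch, both method names, and the sample figures are hypothetical, and the parser assumes the usual six-column Linux layout with mount points that contain no spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of the package summarized above.
public class DfSketch {

    /**
     * Parse the text printed by `df -k` into a map of
     * mount point -> available kilobytes. Assumes the Linux layout:
     * Filesystem, 1K-blocks, Used, Available, Use%, Mounted on.
     */
    public static Map<String, Long> parseAvailableKb(String dfOutput) {
        Map<String, Long> available = new LinkedHashMap<>();
        String[] lines = dfOutput.split("\\R");
        // Start at 1 to skip the header row.
        for (int i = 1; i < lines.length; i++) {
            String[] cols = lines[i].trim().split("\\s+");
            if (cols.length < 6) {
                continue; // not the expected format; ignore the line
            }
            try {
                available.put(cols[5], Long.parseLong(cols[3]));
            } catch (NumberFormatException e) {
                // Available column was not numeric; ignore the line.
            }
        }
        return available;
    }

    /** True when the mount is known and its free space is below the threshold. */
    public static boolean shouldPause(Map<String, Long> availableKb,
                                      String mount, long thresholdKb) {
        Long free = availableKb.get(mount);
        return free != null && free < thresholdKb;
    }

    public static void main(String[] args) {
        // Made-up sample output, for illustration only.
        String sample =
            "Filesystem     1K-blocks     Used Available Use% Mounted on\n"
          + "/dev/sda1       41152832 30111044   8928360  78% /\n"
          + "/dev/sdb1      103081248 99000000    400000  99% /crawls\n";
        Map<String, Long> free = parseAvailableKb(sample);
        System.out.println(shouldPause(free, "/crawls", 500_000L));
        System.out.println(shouldPause(free, "/", 500_000L));
    }
}
```

Lines that do not match the expected layout are skipped rather than treated as errors, in keeping with the "where available and with the expected output format" caveat in the description.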