Package org.archive.crawler.framework

Interface Summary
AlertManager Manager for application alerts.
Frontier An interface for URI Frontiers.
Frontier.FrontierGroup Generic interface representing the internal groupings of a Frontier's URIs -- usually queues.
FrontierHostStatistics An optional interface that Frontiers can implement to provide information about specific hosts.
FrontierMarker A marker is a pointer to a place somewhere inside a frontier's list of pending URIs.
StatisticsTracking An interface for objects that want to collect statistics on running crawls.
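
The FrontierMarker description above (a pointer into a frontier's list of pending URIs) can be illustrated with a small paging sketch. This is only an illustration of the idea: `MarkerSketch`, `Marker`, `PendingList`, and `page` are hypothetical names invented for this example, not the Heritrix types or their method signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these are NOT the real Heritrix types or signatures.
public class MarkerSketch {

    /** Stand-in marker: remembers where the next page of pending URIs starts. */
    static class Marker {
        int nextIndex = 0;      // index of the next pending URI to report
        boolean hasNext = true; // false once the end of the list was reached
    }

    /** Stand-in frontier that holds a fixed list of pending URIs. */
    static class PendingList {
        private final List<String> pending;

        PendingList(List<String> pending) {
            this.pending = new ArrayList<>(pending);
        }

        /** Returns up to max URIs starting at the marker, then advances the marker. */
        List<String> page(Marker marker, int max) {
            int end = Math.min(marker.nextIndex + max, pending.size());
            List<String> out = new ArrayList<>(pending.subList(marker.nextIndex, end));
            marker.nextIndex = end;
            marker.hasNext = end < pending.size();
            return out;
        }
    }

    public static void main(String[] args) {
        PendingList frontier = new PendingList(
                List.of("http://example.com/a", "http://example.com/b", "http://example.com/c"));
        Marker marker = new Marker();
        // Walk the pending list two URIs at a time until the marker is exhausted.
        while (marker.hasNext) {
            System.out.println(frontier.page(marker, 2));
        }
    }
}
```

Keeping the position in a separate marker object lets a caller inspect a long pending list in pages without the frontier having to track per-caller state.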
 

Class Summary
AbstractTracker A partial implementation of the StatisticsTracking interface.
Checkpointer Runs checkpointing.
CrawlController CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl.
CrawlScope A CrawlScope instance defines which URIs are "in" a particular crawl.
Filter Base class for filter classes.
Processor Base class for URI processing classes.
ProcessorChain This class groups together a number of processors that logically fit together.
ProcessorChainList A list of all the ProcessorChains.
Scoper Base class for Scopers.
ToePool A collection of ToeThreads.
ToeThread One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
WriterPoolProcessor Abstract implementation of a file pool processor.
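
The summaries above describe a worker that asks for URIs, passes each one through a group of processors, and repeats until told to stop. A minimal sketch of that loop follows. All names here (`ToeLoopSketch`, `SimpleFrontier`, `Step`, `Chain`, `Worker`) are hypothetical stand-ins invented for this example and do not reproduce the actual Frontier, Processor, ProcessorChain, or ToeThread classes. As a simplifying assumption, the worker exits when the frontier is empty instead of waiting for more work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these types are the real Heritrix classes.
public class ToeLoopSketch {

    /** Stand-in for a frontier: hands out pending URIs one at a time. */
    static class SimpleFrontier {
        private final Queue<String> pending = new ArrayDeque<>();

        SimpleFrontier(List<String> seeds) {
            pending.addAll(seeds);
        }

        /** Returns the next pending URI, or null when nothing is left. */
        synchronized String next() {
            return pending.poll();
        }
    }

    /** Stand-in for a processor: one step applied to each URI. */
    interface Step {
        void process(String uri, List<String> log);
    }

    /** Stand-in for a processor chain: runs its steps in insertion order. */
    static class Chain {
        private final List<Step> steps = new ArrayList<>();

        Chain add(Step step) {
            steps.add(step);
            return this;
        }

        void run(String uri, List<String> log) {
            for (Step step : steps) {
                step.process(uri, log);
            }
        }
    }

    /** Stand-in for a worker thread: ask for a URI, process it, repeat. */
    static class Worker implements Runnable {
        private final SimpleFrontier frontier;
        private final Chain chain;
        private final List<String> log;
        private volatile boolean shouldStop = false;

        Worker(SimpleFrontier frontier, Chain chain, List<String> log) {
            this.frontier = frontier;
            this.chain = chain;
            this.log = log;
        }

        /** Asks the worker to finish after the URI it is currently processing. */
        void stop() {
            shouldStop = true;
        }

        @Override
        public void run() {
            while (!shouldStop) {
                String uri = frontier.next();
                if (uri == null) {
                    break; // simplification: exit instead of waiting for more work
                }
                chain.run(uri, log);
            }
        }
    }

    /** Runs one worker on the calling thread and returns what the steps logged. */
    static List<String> crawl(List<String> seeds) {
        List<String> log = new ArrayList<>();
        Chain chain = new Chain()
                .add((uri, l) -> l.add("fetch " + uri))
                .add((uri, l) -> l.add("extract " + uri));
        new Worker(new SimpleFrontier(seeds), chain, log).run();
        return log;
    }

    public static void main(String[] args) {
        crawl(List.of("http://example.com/", "http://example.org/"))
                .forEach(System.out::println);
    }
}
```

In this sketch a pool of workers would amount to several `Worker` instances sharing one `SimpleFrontier`, each started on its own thread. The log list would then need to be thread-safe, which the single-threaded example does not require.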
 



Copyright © 2003-2011 Internet Archive. All Rights Reserved.