Uses of Interface
org.archive.crawler.datamodel.FetchStatusCodes

Packages that use FetchStatusCodes
org.archive.crawler.datamodel   
org.archive.crawler.fetcher   
org.archive.crawler.framework   
org.archive.crawler.frontier   
org.archive.crawler.postprocessor   
org.archive.crawler.prefetch   
org.archive.crawler.processor   
org.archive.crawler.writer   
 

Uses of FetchStatusCodes in org.archive.crawler.datamodel
 

Classes in org.archive.crawler.datamodel that implement FetchStatusCodes
 class CrawlServer
          Represents a single remote "server".
 class CrawlSubstats
          Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (e.g. a queue).
 class CrawlURI
          Represents a candidate URI and the associated state it collects as it is crawled.
 

Uses of FetchStatusCodes in org.archive.crawler.fetcher
 

Classes in org.archive.crawler.fetcher that implement FetchStatusCodes
 class FetchDNS
          Processor to resolve 'dns:' URIs.
 class FetchFTP
          Fetches documents and directory listings using FTP.
 class FetchHTTP
          HTTP fetcher that uses the Apache Jakarta Commons HttpClient library.
 

Uses of FetchStatusCodes in org.archive.crawler.framework
 

Classes in org.archive.crawler.framework that implement FetchStatusCodes
 class ToeThread
          One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
 class WriterPoolProcessor
          Abstract implementation of a file pool processor.
 

Uses of FetchStatusCodes in org.archive.crawler.frontier
 

Classes in org.archive.crawler.frontier that implement FetchStatusCodes
 class AbstractFrontier
          Shared facilities for Frontier implementations.
 class AdaptiveRevisitFrontier
          A Frontier that will repeatedly visit all encountered URIs.
 class BdbFrontier
          A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs.
 class DomainSensitiveFrontier
          Deprecated. As of release 1.10.0. Replaced by BdbFrontier and QuotaEnforcer.
 class WorkQueueFrontier
          A common Frontier base using several queues to hold pending URIs.
 

Uses of FetchStatusCodes in org.archive.crawler.postprocessor
 

Classes in org.archive.crawler.postprocessor that implement FetchStatusCodes
 class CrawlStateUpdater
          A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch.
 class FrontierScheduler
          'Schedules' with the Frontier the CandidateURIs carried by the passed CrawlURI.
 class LinksScoper
          Determine which extracted links are within scope.
 

Uses of FetchStatusCodes in org.archive.crawler.prefetch
 

Classes in org.archive.crawler.prefetch that implement FetchStatusCodes
 class PreconditionEnforcer
          Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
 class Preselector
          If set to recheck the crawl's scope, gives a yes/no on whether a CrawlURI should be processed at all.
 class QuotaEnforcer
          A simple quota enforcer.
 class RuntimeLimitEnforcer
          A processor to enforce runtime limits on crawls.
 

Uses of FetchStatusCodes in org.archive.crawler.processor
 

Classes in org.archive.crawler.processor that implement FetchStatusCodes
 class BeanShellProcessor
          A processor which runs a BeanShell script on the CrawlURI.
 class CrawlMapper
          A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
 class HashCrawlMapper
          Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey.
 class LexicalCrawlMapper
          A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
 

Uses of FetchStatusCodes in org.archive.crawler.writer
 

Classes in org.archive.crawler.writer that implement FetchStatusCodes
 class ARCWriterProcessor
          Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.
 class WARCWriterProcessor
          WARCWriterProcessor.
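
Every class listed above implements FetchStatusCodes directly, presumably so that the interface's status constants can be referenced without qualification. The following is a minimal sketch of that pattern only. ExampleStatusCodes, ExampleProcessor, and the constant names and values are hypothetical stand-ins for illustration, not the actual members of FetchStatusCodes.

```java
// Hypothetical stand-in for a constants interface such as FetchStatusCodes.
// Names and values are illustrative assumptions, not Heritrix's real codes.
interface ExampleStatusCodes {
    int EX_SUCCESS = 1;          // fields in an interface are implicitly public static final
    int EX_CONNECT_FAILED = -2;
}

// Implementing the interface brings its constants into scope unqualified.
class ExampleProcessor implements ExampleStatusCodes {
    static String describe(int status) {
        if (status == EX_SUCCESS) {
            return "success";
        }
        if (status == EX_CONNECT_FAILED) {
            return "connect failed";
        }
        return "unknown (" + status + ")";
    }
}

public class ConstantInterfaceDemo {
    public static void main(String[] args) {
        // Code that does not implement the interface must qualify the constant.
        System.out.println(ExampleProcessor.describe(ExampleStatusCodes.EX_SUCCESS));
        System.out.println(ExampleProcessor.describe(ExampleStatusCodes.EX_CONNECT_FAILED));
    }
}
```

A class that does not implement the interface can still reach the constants through the interface name or a static import, so implementing it is a convenience rather than a requirement.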
 



Copyright © 2003-2011 Internet Archive. All Rights Reserved.