Uses of FetchStatusCodes in org.archive.crawler.datamodel |
---|
Classes in org.archive.crawler.datamodel that implement FetchStatusCodes | |
---|---|
class | CrawlServer: Represents a single remote "server". |
class | CrawlSubstats: Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (e.g., a queue). |
class | CrawlURI: Represents a candidate URI and the associated state it collects as it is crawled. |
Uses of FetchStatusCodes in org.archive.crawler.fetcher |
---|
Classes in org.archive.crawler.fetcher that implement FetchStatusCodes | |
---|---|
class | FetchDNS: Processor to resolve 'dns:' URIs. |
class | FetchFTP: Fetches documents and directory listings using FTP. |
class | FetchHTTP: HTTP fetcher that uses Apache Jakarta Commons HttpClient library. |
Uses of FetchStatusCodes in org.archive.crawler.framework |
---|
Classes in org.archive.crawler.framework that implement FetchStatusCodes | |
---|---|
class | ToeThread: One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise. |
class | WriterPoolProcessor: Abstract implementation of a file pool processor. |
Uses of FetchStatusCodes in org.archive.crawler.frontier |
---|
Classes in org.archive.crawler.frontier that implement FetchStatusCodes | |
---|---|
class | AbstractFrontier: Shared facilities for Frontier implementations. |
class | AdaptiveRevisitFrontier: A Frontier that will repeatedly visit all encountered URIs. |
class | BdbFrontier: A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs. |
class | DomainSensitiveFrontier: Deprecated. As of release 1.10.0, replaced by BdbFrontier and QuotaEnforcer. |
class | WorkQueueFrontier: A common Frontier base using several queues to hold pending URIs. |
Uses of FetchStatusCodes in org.archive.crawler.postprocessor |
---|
Classes in org.archive.crawler.postprocessor that implement FetchStatusCodes | |
---|---|
class | CrawlStateUpdater: A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch. |
class | FrontierScheduler: 'Schedule' with the Frontier CandidateURIs being carried by the passed CrawlURI. |
class | LinksScoper: Determine which extracted links are within scope. |
Uses of FetchStatusCodes in org.archive.crawler.prefetch |
---|
Classes in org.archive.crawler.prefetch that implement FetchStatusCodes | |
---|---|
class | PreconditionEnforcer: Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages. |
class | Preselector: If set to recheck the crawl's scope, gives a yes/no on whether a CrawlURI should be processed at all. |
class | QuotaEnforcer: A simple quota enforcer. |
class | RuntimeLimitEnforcer: A processor to enforce runtime limits on crawls. |
Uses of FetchStatusCodes in org.archive.crawler.processor |
---|
Classes in org.archive.crawler.processor that implement FetchStatusCodes | |
---|---|
class | BeanShellProcessor: A processor which runs a BeanShell script on the CrawlURI. |
class | CrawlMapper: A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers). |
class | HashCrawlMapper: Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey. |
class | LexicalCrawlMapper: A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers). |
Uses of FetchStatusCodes in org.archive.crawler.writer |
---|
Classes in org.archive.crawler.writer that implement FetchStatusCodes | |
---|---|
class | ARCWriterProcessor: Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format. |
class | WARCWriterProcessor: WARCWriterProcessor. |
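
Every class listed on this page implements FetchStatusCodes. The sketch below illustrates what implementing a constants-only interface means in Java, on the assumption (not stated on this page) that FetchStatusCodes mainly declares status-code constants. DemoStatusCodes, DemoFetcher, and the constant names and values are hypothetical stand-ins for illustration, not the Heritrix definitions.

```java
class StatusCodesDemo {

    // Hypothetical constants-only interface. Fields declared in an interface
    // are implicitly public, static, and final.
    interface DemoStatusCodes {
        int S_DEMO_UNATTEMPTED = 0;
        int S_DEMO_CONNECT_FAILED = -2;
        int S_DEMO_TIMEOUT = -4;
    }

    // Hypothetical implementing class. It inherits the constants and can use
    // them without qualifying them with the interface name.
    static class DemoFetcher implements DemoStatusCodes {
        private int status = S_DEMO_UNATTEMPTED;

        // Record a failed fetch, distinguishing timeouts from other failures.
        void recordFailure(boolean timedOut) {
            status = timedOut ? S_DEMO_TIMEOUT : S_DEMO_CONNECT_FAILED;
        }

        int getStatus() {
            return status;
        }

        // In this sketch, negative codes denote failures.
        boolean isFailure() {
            return status < 0;
        }
    }

    public static void main(String[] args) {
        DemoFetcher fetcher = new DemoFetcher();
        System.out.println("initial status: " + fetcher.getStatus());
        fetcher.recordFailure(true);
        System.out.println("after timeout: " + fetcher.getStatus()
                + ", failure=" + fetcher.isFailure());
    }
}
```

Because a class-use page lists every implementing class, a constants-only interface tends to produce long "implements" lists like the ones above even when the classes share no behavior.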