Packages that use CoreAttributeConstants | |
---|---|
org.archive.crawler.admin | Contains classes that the web UI uses to monitor and control crawls. |
org.archive.crawler.datamodel | |
org.archive.crawler.deciderules | Provides classes for a simple decision rules framework. |
org.archive.crawler.deciderules.recrawl | |
org.archive.crawler.extractor | |
org.archive.crawler.fetcher | |
org.archive.crawler.filter | |
org.archive.crawler.framework | |
org.archive.crawler.frontier | |
org.archive.crawler.io | |
org.archive.crawler.postprocessor | |
org.archive.crawler.prefetch | |
org.archive.crawler.processor.recrawl | |
org.archive.crawler.util | |
org.archive.crawler.writer | |
Uses of CoreAttributeConstants in org.archive.crawler.admin |
---|
Classes in org.archive.crawler.admin that implement CoreAttributeConstants | |
---|---|
class |
SeedRecord
Record of all interesting info about the most-recent processing of a specific seed. |
Uses of CoreAttributeConstants in org.archive.crawler.datamodel |
---|
Classes in org.archive.crawler.datamodel that implement CoreAttributeConstants | |
---|---|
class |
CandidateURI
A URI, discovered or passed-in, that may be scheduled. |
class |
CrawlURI
Represents a candidate URI and the associated state it collects as it is crawled. |
Uses of CoreAttributeConstants in org.archive.crawler.deciderules |
---|
Classes in org.archive.crawler.deciderules that implement CoreAttributeConstants | |
---|---|
class |
ExceedsDocumentLengthTresholdDecideRule
|
class |
NotExceedsDocumentLengthTresholdDecideRule
|
Uses of CoreAttributeConstants in org.archive.crawler.deciderules.recrawl |
---|
Classes in org.archive.crawler.deciderules.recrawl that implement CoreAttributeConstants | |
---|---|
class |
IdenticalDigestDecideRule
Applies the configured decision to any CrawlURI whose prior-history content digest matches that of the latest fetch. |
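The digest comparison described for IdenticalDigestDecideRule can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the class's actual implementation: `DigestMatchSketch`, its `Decision` enum, and the `decide` method are hypothetical names, and returning `PASS` when there is no match (or no prior digest) is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch: apply a configured decision only when the digest
// recorded for the previous fetch equals the digest of the latest fetch.
final class DigestMatchSketch {
    enum Decision { ACCEPT, REJECT, PASS }

    static Decision decide(byte[] priorDigest, byte[] latestDigest, Decision configured) {
        // Without both digests there is nothing to compare, so express no opinion.
        if (priorDigest == null || latestDigest == null) {
            return Decision.PASS;
        }
        // Byte-wise comparison of the two digests.
        return Arrays.equals(priorDigest, latestDigest) ? configured : Decision.PASS;
    }

    public static void main(String[] args) {
        byte[] prior = {1, 2, 3};
        byte[] same = {1, 2, 3};
        byte[] changed = {1, 2, 4};
        System.out.println(decide(prior, same, Decision.REJECT));
        System.out.println(decide(prior, changed, Decision.REJECT));
    }
}
```

Keeping the configured decision as a parameter mirrors the description above, where the rule applies whatever decision it was configured with rather than a fixed one.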
Uses of CoreAttributeConstants in org.archive.crawler.extractor |
---|
Classes in org.archive.crawler.extractor that implement CoreAttributeConstants | |
---|---|
class |
AggressiveExtractorHTML
Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regexp, and then by the JavaScript speculative link regexp. |
class |
ChangeEvaluator
This processor compares the CrawlURI's current content digest with the one from a previous crawl. |
class |
ExtractorCSS
This extractor parses URIs from CSS files. |
class |
ExtractorDOC
This class allows the caller to extract href-style links from Word 97-format Word documents. |
class |
ExtractorHTML
Basic link-extraction, from an HTML content-body, using regular expressions. |
class |
ExtractorHTTP
Extracts URIs from HTTP response headers. |
class |
ExtractorImpliedURI
An extractor for finding 'implied' URIs inside other URIs. |
class |
ExtractorJS
Processes Javascript files for strings that are likely to be crawlable URIs. |
class |
ExtractorPDF
Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs. |
class |
ExtractorSWF
Processes SWF (Flash/Shockwave) files for strings that are likely to be crawlable URIs. |
class |
ExtractorUniversal
A last-ditch extractor that looks at the raw byte code and tries to extract anything that looks like a link. |
class |
ExtractorURI
An extractor for finding URIs inside other URIs. |
class |
ExtractorXML
A simple extractor which finds HTTP URIs inside XML/RSS files, inside attribute values and simple elements (those with only whitespace + HTTP URI + whitespace as contents). |
class |
JerichoExtractorHTML
Improved link-extraction from an HTML content-body using jericho-html parser. |
class |
TrapSuppressExtractor
Pseudo-extractor that suppresses link-extraction of likely trap pages, by noticing when content's digest is identical to that of its 'via'. |
Uses of CoreAttributeConstants in org.archive.crawler.fetcher |
---|
Classes in org.archive.crawler.fetcher that implement CoreAttributeConstants | |
---|---|
class |
FetchDNS
Processor to resolve 'dns:' URIs. |
class |
FetchFTP
Fetches documents and directory listings using FTP. |
class |
FetchHTTP
HTTP fetcher that uses Apache Jakarta Commons HttpClient library. |
Uses of CoreAttributeConstants in org.archive.crawler.filter |
---|
Classes in org.archive.crawler.filter that implement CoreAttributeConstants | |
---|---|
class |
HTTPMidFetchUnchangedFilter
A mid fetch filter for HTTP fetcher processors. |
Uses of CoreAttributeConstants in org.archive.crawler.framework |
---|
Classes in org.archive.crawler.framework that implement CoreAttributeConstants | |
---|---|
class |
ToeThread
One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise. |
class |
WriterPoolProcessor
Abstract implementation of a file pool processor. |
Uses of CoreAttributeConstants in org.archive.crawler.frontier |
---|
Subinterfaces of CoreAttributeConstants in org.archive.crawler.frontier | |
---|---|
interface |
AdaptiveRevisitAttributeConstants
Defines static constants for the Adaptive Revisiting module; the constants are data keys in the CrawlURI AList. |
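The arrangement this page lists, an interface of static key constants that classes implement and a subinterface that extends it, can be sketched as below. This is only an illustration: `DemoCoreKeys`, `DemoRevisitKeys`, `DemoRevisitProcessor`, and the key strings are hypothetical stand-ins, not the actual members of CoreAttributeConstants or AdaptiveRevisitAttributeConstants, and a plain `Map` stands in for the CrawlURI AList.

```java
import java.util.HashMap;
import java.util.Map;

final class ConstantInterfaceSketch {
    // Hypothetical stand-in for a core constants interface.
    interface DemoCoreKeys {
        String DEMO_FETCH_BEGAN = "demo-fetch-began";
    }

    // Hypothetical subinterface that adds module-specific keys.
    interface DemoRevisitKeys extends DemoCoreKeys {
        String DEMO_WAIT_INTERVAL = "demo-wait-interval";
    }

    // Implementing the interface lets the class refer to every inherited
    // key without qualifying it.
    static class DemoRevisitProcessor implements DemoRevisitKeys {
        void record(Map<String, Object> attributes, long fetchBegan, long waitMillis) {
            attributes.put(DEMO_FETCH_BEGAN, fetchBegan);
            attributes.put(DEMO_WAIT_INTERVAL, waitMillis);
        }

        long nextVisit(Map<String, Object> attributes) {
            long began = (Long) attributes.get(DEMO_FETCH_BEGAN);
            long wait = (Long) attributes.get(DEMO_WAIT_INTERVAL);
            return began + wait;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new HashMap<>();
        DemoRevisitProcessor processor = new DemoRevisitProcessor();
        processor.record(attributes, 1_000L, 500L);
        System.out.println(processor.nextVisit(attributes));
    }
}
```

Sharing key names through an interface keeps writers and readers of the attribute map in agreement on the exact strings, which is presumably why so many of the classes on this page implement the constants interface.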
Classes in org.archive.crawler.frontier that implement CoreAttributeConstants | |
---|---|
class |
AbstractFrontier
Shared facilities for Frontier implementations. |
class |
AdaptiveRevisitFrontier
A Frontier that will repeatedly visit all encountered URIs. |
class |
AdaptiveRevisitHostQueue
A priority based queue of CrawlURIs. |
class |
BdbFrontier
A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs. |
class |
DomainSensitiveFrontier
Deprecated. As of release 1.10.0, replaced by BdbFrontier and QuotaEnforcer. |
class |
WorkQueueFrontier
A common Frontier base using several queues to hold pending URIs. |
Uses of CoreAttributeConstants in org.archive.crawler.io |
---|
Classes in org.archive.crawler.io that implement CoreAttributeConstants | |
---|---|
class |
LocalErrorFormatter
|
class |
RuntimeErrorFormatter
Runtime exception log formatter. |
class |
UriErrorFormatter
Formatter for 'uri-errors.log', of URIs so malformed they could not be instantiated. |
class |
UriProcessingFormatter
Formatter for 'crawl.log'. |
Uses of CoreAttributeConstants in org.archive.crawler.postprocessor |
---|
Classes in org.archive.crawler.postprocessor that implement CoreAttributeConstants | |
---|---|
class |
AcceptRevisitProcessor
Sets a URI to be revisited by the ARFrontier. |
class |
ContentBasedWaitEvaluator
A WaitEvaluator that compares the CrawlURI's content type to a configurable regular expression. |
class |
CrawlStateUpdater
A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch. |
class |
ImageWaitEvaluator
A specialized ContentBasedWaitEvaluator. |
class |
RejectRevisitProcessor
Sets a URI to not be revisited by the ARFrontier. |
class |
TextWaitEvaluator
A specialized ContentBasedWaitEvaluator. |
class |
WaitEvaluator
A processor that determines when a URI should be revisited next. |
Uses of CoreAttributeConstants in org.archive.crawler.prefetch |
---|
Classes in org.archive.crawler.prefetch that implement CoreAttributeConstants | |
---|---|
class |
PreconditionEnforcer
Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages. |
Uses of CoreAttributeConstants in org.archive.crawler.processor.recrawl |
---|
Classes in org.archive.crawler.processor.recrawl that implement CoreAttributeConstants | |
---|---|
class |
FetchHistoryProcessor
Maintains a history of fetch information inside the CrawlURI's attributes. |
Uses of CoreAttributeConstants in org.archive.crawler.util |
---|
Classes in org.archive.crawler.util that implement CoreAttributeConstants | |
---|---|
class |
CrawledBytesHistotable
|
Uses of CoreAttributeConstants in org.archive.crawler.writer |
---|
Classes in org.archive.crawler.writer that implement CoreAttributeConstants | |
---|---|
class |
ARCWriterProcessor
Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format. |
class |
Kw3WriterProcessor
Processor module that writes the results of successful fetches to files on disk. |
class |
MirrorWriterProcessor
Processor module that writes the results of successful fetches to files on disk. |
class |
WARCWriterProcessor
WARCWriterProcessor. |