Uses of Interface
org.archive.crawler.datamodel.CoreAttributeConstants

Packages that use CoreAttributeConstants
org.archive.crawler.admin Contains classes that the web UI uses to monitor and control crawls. 
org.archive.crawler.datamodel   
org.archive.crawler.deciderules Provides classes for a simple decision rules framework. 
org.archive.crawler.deciderules.recrawl   
org.archive.crawler.extractor   
org.archive.crawler.fetcher   
org.archive.crawler.filter   
org.archive.crawler.framework   
org.archive.crawler.frontier   
org.archive.crawler.io   
org.archive.crawler.postprocessor   
org.archive.crawler.prefetch   
org.archive.crawler.processor.recrawl   
org.archive.crawler.util   
org.archive.crawler.writer   
 

Uses of CoreAttributeConstants in org.archive.crawler.admin
 

Classes in org.archive.crawler.admin that implement CoreAttributeConstants
 class SeedRecord
          Record of all interesting info about the most-recent processing of a specific seed.
 

Uses of CoreAttributeConstants in org.archive.crawler.datamodel
 

Classes in org.archive.crawler.datamodel that implement CoreAttributeConstants
 class CandidateURI
          A URI, discovered or passed-in, that may be scheduled.
 class CrawlURI
          Represents a candidate URI and the associated state it collects as it is crawled.
 

Uses of CoreAttributeConstants in org.archive.crawler.deciderules
 

Classes in org.archive.crawler.deciderules that implement CoreAttributeConstants
 class ExceedsDocumentLengthTresholdDecideRule
           
 class NotExceedsDocumentLengthTresholdDecideRule
           
 

Uses of CoreAttributeConstants in org.archive.crawler.deciderules.recrawl
 

Classes in org.archive.crawler.deciderules.recrawl that implement CoreAttributeConstants
 class IdenticalDigestDecideRule
          Rule applies configured decision to any CrawlURIs whose prior-history content-digest matches the latest fetch.
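
The digest comparison described for IdenticalDigestDecideRule can be sketched as below. This is an illustration only, not the Heritrix implementation: the class name, the method name, and the representation of the fetch history as a list of digest strings (most recent first) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

public class IdenticalDigestSketch {

    /**
     * Hypothetical helper: returns true when the two most recent entries
     * in a digest history (index 0 = latest fetch) are equal and non-null.
     */
    public static boolean hasIdenticalDigest(List<String> digestHistory) {
        if (digestHistory == null || digestHistory.size() < 2) {
            return false; // no prior fetch to compare against
        }
        String latest = digestHistory.get(0);
        String previous = digestHistory.get(1);
        return latest != null && latest.equals(previous);
    }

    public static void main(String[] args) {
        // Digest values here are made-up placeholders.
        System.out.println(hasIdenticalDigest(Arrays.asList("sha1:AAA", "sha1:AAA")));
        System.out.println(hasIdenticalDigest(Arrays.asList("sha1:AAA", "sha1:BBB")));
    }
}
```

A real rule would then apply its configured decision when the comparison succeeds; that part is omitted here.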
 

Uses of CoreAttributeConstants in org.archive.crawler.extractor
 

Classes in org.archive.crawler.extractor that implement CoreAttributeConstants
 class AggressiveExtractorHTML
          Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regexp, and then by the JavaScript speculative link regexp.
 class ChangeEvaluator
          This processor compares the CrawlURI's current content digest with the one from a previous crawl.
 class ExtractorCSS
          This extractor parses URIs from CSS files.
 class ExtractorDOC
          This class allows the caller to extract href-style links from Word 97-format Word documents.
 class ExtractorHTML
          Basic link-extraction, from an HTML content-body, using regular expressions.
 class ExtractorHTTP
          Extracts URIs from HTTP response headers.
 class ExtractorImpliedURI
          An extractor for finding 'implied' URIs inside other URIs.
 class ExtractorJS
          Processes JavaScript files for strings that are likely to be crawlable URIs.
 class ExtractorPDF
          Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs.
 class ExtractorSWF
          Processes SWF (Flash/Shockwave) files for strings that are likely to be crawlable URIs.
 class ExtractorUniversal
          A last-ditch extractor that will look at the raw byte code and try to extract anything that looks like a link.
 class ExtractorURI
          An extractor for finding URIs inside other URIs.
 class ExtractorXML
          A simple extractor that finds HTTP URIs inside XML/RSS files, inside attribute values and simple elements (those with only whitespace + HTTP URI + whitespace as contents).
 class JerichoExtractorHTML
          Improved link-extraction from an HTML content-body using jericho-html parser.
 class TrapSuppressExtractor
          Pseudo-extractor that suppresses link-extraction of likely trap pages, by noticing when content's digest is identical to that of its 'via'.
 

Uses of CoreAttributeConstants in org.archive.crawler.fetcher
 

Classes in org.archive.crawler.fetcher that implement CoreAttributeConstants
 class FetchDNS
          Processor to resolve 'dns:' URIs.
 class FetchFTP
          Fetches documents and directory listings using FTP.
 class FetchHTTP
          HTTP fetcher that uses Apache Jakarta Commons HttpClient library.
 

Uses of CoreAttributeConstants in org.archive.crawler.filter
 

Classes in org.archive.crawler.filter that implement CoreAttributeConstants
 class HTTPMidFetchUnchangedFilter
          A mid-fetch filter for HTTP fetcher processors.
 

Uses of CoreAttributeConstants in org.archive.crawler.framework
 

Classes in org.archive.crawler.framework that implement CoreAttributeConstants
 class ToeThread
          One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
 class WriterPoolProcessor
          Abstract implementation of a file pool processor.
 

Uses of CoreAttributeConstants in org.archive.crawler.frontier
 

Subinterfaces of CoreAttributeConstants in org.archive.crawler.frontier
 interface AdaptiveRevisitAttributeConstants
          Defines static constants for the Adaptive Revisiting module defining data keys in the CrawlURI AList.
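
The classes listed on this page pick up attribute keys by implementing the interface, and a subinterface can add further keys. The sketch below shows that pattern in isolation. Every name in it (the two interfaces, the processor class, and both key strings) is a hypothetical stand-in, not an actual member of CoreAttributeConstants or AdaptiveRevisitAttributeConstants, and a plain Map stands in for the CrawlURI AList.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantInterfaceSketch {

    /** Hypothetical stand-in for a core constants interface: keys only. */
    public interface AttributeKeys {
        String A_EXAMPLE_FETCH_TIME = "example-fetch-time"; // made-up key
    }

    /** Hypothetical stand-in for a subinterface that adds module-specific keys. */
    public interface RevisitKeys extends AttributeKeys {
        String A_EXAMPLE_WAIT_INTERVAL = "example-wait-interval"; // made-up key
    }

    /** An implementing class can refer to the inherited constants unqualified. */
    public static class ExampleProcessor implements RevisitKeys {
        // A plain map used here in place of a URI's attribute list.
        private final Map<String, Object> attributes = new HashMap<>();

        public void record(long fetchTime, long waitMillis) {
            attributes.put(A_EXAMPLE_FETCH_TIME, fetchTime);
            attributes.put(A_EXAMPLE_WAIT_INTERVAL, waitMillis);
        }

        public Object get(String key) {
            return attributes.get(key);
        }
    }

    public static void main(String[] args) {
        ExampleProcessor processor = new ExampleProcessor();
        processor.record(1000L, 5000L);
        // Readers outside the class qualify the key with the interface name.
        System.out.println(processor.get(AttributeKeys.A_EXAMPLE_FETCH_TIME));
    }
}
```

Because interface fields are implicitly public, static, and final, every implementer shares the same key strings, which is why so many unrelated classes can appear in a listing like this one.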
 

Classes in org.archive.crawler.frontier that implement CoreAttributeConstants
 class AbstractFrontier
          Shared facilities for Frontier implementations.
 class AdaptiveRevisitFrontier
          A Frontier that will repeatedly visit all encountered URIs.
 class AdaptiveRevisitHostQueue
          A priority-based queue of CrawlURIs.
 class BdbFrontier
          A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues) and pending URIs.
 class DomainSensitiveFrontier
          Deprecated. As of release 1.10.0, replaced by BdbFrontier and QuotaEnforcer.
 class WorkQueueFrontier
          A common Frontier base using several queues to hold pending URIs.
 

Uses of CoreAttributeConstants in org.archive.crawler.io
 

Classes in org.archive.crawler.io that implement CoreAttributeConstants
 class LocalErrorFormatter
           
 class RuntimeErrorFormatter
          Runtime exception log formatter.
 class UriErrorFormatter
          Formatter for 'uri-errors.log', of URIs so malformed they could not be instantiated.
 class UriProcessingFormatter
          Formatter for 'crawl.log'.
 

Uses of CoreAttributeConstants in org.archive.crawler.postprocessor
 

Classes in org.archive.crawler.postprocessor that implement CoreAttributeConstants
 class AcceptRevisitProcessor
          Set a URI to be revisited by the ARFrontier.
 class ContentBasedWaitEvaluator
          A WaitEvaluator that compares the CrawlURI's content type to a configurable regular expression.
 class CrawlStateUpdater
          A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch.
 class ImageWaitEvaluator
          A specialized ContentBasedWaitEvaluator.
 class RejectRevisitProcessor
          Set a URI to not be revisited by the ARFrontier.
 class TextWaitEvaluator
          A specialized ContentBasedWaitEvaluator.
 class WaitEvaluator
          A processor that determines when a URI should be revisited next.
 

Uses of CoreAttributeConstants in org.archive.crawler.prefetch
 

Classes in org.archive.crawler.prefetch that implement CoreAttributeConstants
 class PreconditionEnforcer
          Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
 

Uses of CoreAttributeConstants in org.archive.crawler.processor.recrawl
 

Classes in org.archive.crawler.processor.recrawl that implement CoreAttributeConstants
 class FetchHistoryProcessor
          Maintains a history of fetch information inside the CrawlURI's attributes.
 

Uses of CoreAttributeConstants in org.archive.crawler.util
 

Classes in org.archive.crawler.util that implement CoreAttributeConstants
 class CrawledBytesHistotable
           
 

Uses of CoreAttributeConstants in org.archive.crawler.writer
 

Classes in org.archive.crawler.writer that implement CoreAttributeConstants
 class ARCWriterProcessor
          Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.
 class Kw3WriterProcessor
          Processor module that writes the results of successful fetches to files on disk.
 class MirrorWriterProcessor
          Processor module that writes the results of successful fetches to files on disk.
 class WARCWriterProcessor
          WARCWriterProcessor.
 



Copyright © 2003-2011 Internet Archive. All Rights Reserved.