org.archive.crawler.datamodel
Interface CoreAttributeConstants

All Known Subinterfaces:
AdaptiveRevisitAttributeConstants
All Known Implementing Classes:
AbstractFrontier, AcceptRevisitProcessor, AdaptiveRevisitFrontier, AdaptiveRevisitHostQueue, AggressiveExtractorHTML, ARCWriterProcessor, BdbFrontier, CandidateURI, ChangeEvaluator, ContentBasedWaitEvaluator, CrawledBytesHistotable, CrawlStateUpdater, CrawlURI, DomainSensitiveFrontier, ExceedsDocumentLengthTresholdDecideRule, ExtractorCSS, ExtractorDOC, ExtractorHTML, ExtractorHTTP, ExtractorImpliedURI, ExtractorJS, ExtractorPDF, ExtractorSWF, ExtractorUniversal, ExtractorURI, ExtractorXML, FetchDNS, FetchFTP, FetchHistoryProcessor, FetchHTTP, HTTPMidFetchUnchangedFilter, IdenticalDigestDecideRule, ImageWaitEvaluator, JerichoExtractorHTML, Kw3WriterProcessor, LocalErrorFormatter, MirrorWriterProcessor, NotExceedsDocumentLengthTresholdDecideRule, PreconditionEnforcer, RejectRevisitProcessor, RuntimeErrorFormatter, SeedRecord, TextWaitEvaluator, ToeThread, TrapSuppressExtractor, UriErrorFormatter, UriProcessingFormatter, WaitEvaluator, WARCWriterProcessor, WorkQueueFrontier, WriterPoolProcessor

public interface CoreAttributeConstants

CrawlURI attribute keys used by the core crawler classes.

Author:
gojomo

Field Summary
static java.lang.String A_ANNOTATIONS
          shorthand string tokens indicating notable occurrences, separated by commas
static java.lang.String A_CONTENT_DIGEST
          content digest
static java.lang.String A_CONTENT_TYPE
          Extracted MIME type of fetched content; should be set immediately by fetching module if possible (rather than waiting for a later analyzer)
static java.lang.String A_CREDENTIAL_AVATARS_KEY
          Key to get credential avatars from A_LIST.
static java.lang.String A_DELAY_FACTOR
          Multiplier of last fetch duration to wait before fetching another item of the same class (e.g. host)
static java.lang.String A_DISTANCE_FROM_SEED
           
static java.lang.String A_DNS_FETCH_TIME
           
static java.lang.String A_DNS_SERVER_IP_LABEL
           
static java.lang.String A_ETAG_HEADER
          header name (and AList key) for ETag
static java.lang.String A_FETCH_BEGAN_TIME
           
static java.lang.String A_FETCH_COMPLETED_TIME
           
static java.lang.String A_FETCH_HISTORY
          fetch history array
static java.lang.String A_FORCE_RETIRE
          flag indicating the containing queue should be retired
static java.lang.String A_FTP_CONTROL_CONVERSATION
           
static java.lang.String A_FTP_FETCH_STATUS
           
static java.lang.String A_HERITABLE_KEYS
          Key to (optional) attribute specifying a list of keys that are passed to CandidateURIs that 'descend' (are discovered via) this URI.
static java.lang.String A_HTML_BASE
           
static java.lang.String A_HTTP_BIND_ADDRESS
          local override of origin bind address
static java.lang.String A_HTTP_PROXY_HOST
          local override of proxy host
static java.lang.String A_HTTP_PROXY_PORT
          local override of proxy port
static java.lang.String A_HTTP_TRANSACTION
           
static java.lang.String A_LAST_MODIFIED_HEADER
          header name (and AList key) for last-modified timestamp
static java.lang.String A_LOCALIZED_ERRORS
           
static java.lang.String A_META_ROBOTS
           
static java.lang.String A_MINIMUM_DELAY
          Minimum delay before fetching another item of the same class (e.g. host).
static java.lang.String A_MIRROR_PATH
          Defined for org.archive.crawler.writer.MirrorWriterProcessor.
static java.lang.String A_PREREQUISITE_URI
           
static java.lang.String A_REFERENCE_LENGTH
          reference length (content length or virtual length)
static java.lang.String A_RETRY_DELAY
           
static java.lang.String A_RRECORD_SET_LABEL
           
static java.lang.String A_RUNTIME_EXCEPTION
           
static java.lang.String A_SOURCE_TAG
          a 'source' (usually a URI) that's inherited by discovered URIs
static java.lang.String A_STATUS
          key for status (when in history)
static java.lang.String A_WRITTEN_TO_WARC
          name of warc file where uri had records written
static java.lang.String HEADER_TRUNC
           
static java.lang.String LENGTH_TRUNC
           
static java.lang.String TIMER_TRUNC
           
static java.lang.String TRUNC_SUFFIX
          Fetch truncation codes present in CrawlURI annotations.
 

Field Detail

A_CONTENT_TYPE

static final java.lang.String A_CONTENT_TYPE
Extracted MIME type of fetched content; should be set immediately by fetching module if possible (rather than waiting for a later analyzer)

See Also:
Constant Field Values

A_DELAY_FACTOR

static final java.lang.String A_DELAY_FACTOR
Multiplier of last fetch duration to wait before fetching another item of the same class (e.g. host)

See Also:
Constant Field Values

A_MINIMUM_DELAY

static final java.lang.String A_MINIMUM_DELAY
Minimum delay before fetching another item of the same class (e.g. host). Even if lastFetchTime*delayFactor is less than this, this period will be waited.

See Also:
Constant Field Values
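
The interaction between A_DELAY_FACTOR and A_MINIMUM_DELAY described above can be sketched as follows. This is a minimal illustration, not Heritrix code: the class name, method name, parameter names, and the choice of milliseconds as the unit are all assumptions made for the example.

```java
// Sketch only: PolitenessDelaySketch and waitMillis are hypothetical names,
// and milliseconds are an assumed unit; none of this is Heritrix API.
class PolitenessDelaySketch {

    /**
     * @param lastFetchDurationMs how long the previous fetch took
     * @param delayFactor         value stored under the A_DELAY_FACTOR key
     * @param minimumDelayMs      value stored under the A_MINIMUM_DELAY key
     * @return time to wait before fetching another item of the same class
     */
    static long waitMillis(long lastFetchDurationMs, double delayFactor, long minimumDelayMs) {
        long scaled = Math.round(lastFetchDurationMs * delayFactor);
        // The minimum always applies, even when the scaled duration is shorter.
        return Math.max(scaled, minimumDelayMs);
    }

    public static void main(String[] args) {
        System.out.println(waitMillis(200, 5.0, 3000));  // prints 3000 (minimum wins)
        System.out.println(waitMillis(1000, 5.0, 3000)); // prints 5000 (scaled duration wins)
    }
}
```

Taking the larger of the two values keeps a fast-responding host from being revisited sooner than the minimum delay allows.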

A_RRECORD_SET_LABEL

static final java.lang.String A_RRECORD_SET_LABEL
See Also:
Constant Field Values

A_DNS_FETCH_TIME

static final java.lang.String A_DNS_FETCH_TIME
See Also:
Constant Field Values

A_DNS_SERVER_IP_LABEL

static final java.lang.String A_DNS_SERVER_IP_LABEL
See Also:
Constant Field Values

A_FETCH_BEGAN_TIME

static final java.lang.String A_FETCH_BEGAN_TIME
See Also:
Constant Field Values

A_FETCH_COMPLETED_TIME

static final java.lang.String A_FETCH_COMPLETED_TIME
See Also:
Constant Field Values

A_HTTP_TRANSACTION

static final java.lang.String A_HTTP_TRANSACTION
See Also:
Constant Field Values

A_FTP_CONTROL_CONVERSATION

static final java.lang.String A_FTP_CONTROL_CONVERSATION
See Also:
Constant Field Values

A_FTP_FETCH_STATUS

static final java.lang.String A_FTP_FETCH_STATUS
See Also:
Constant Field Values

A_RUNTIME_EXCEPTION

static final java.lang.String A_RUNTIME_EXCEPTION
See Also:
Constant Field Values

A_LOCALIZED_ERRORS

static final java.lang.String A_LOCALIZED_ERRORS
See Also:
Constant Field Values

A_ANNOTATIONS

static final java.lang.String A_ANNOTATIONS
shorthand string tokens indicating notable occurrences, separated by commas

See Also:
Constant Field Values
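
As an illustration of the comma-separated format described for A_ANNOTATIONS, the sketch below appends a token to an annotation string and splits the string back into tokens. The class and method names are hypothetical, and the tokens used in main are invented for the example. How the crawler itself maintains the value is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: AnnotationsSketch and its methods are hypothetical helpers,
// not part of the crawler's API.
class AnnotationsSketch {

    /** Appends a token to a comma-separated annotation string. */
    static String addAnnotation(String existing, String token) {
        if (existing == null || existing.isEmpty()) {
            return token;
        }
        return existing + "," + token;
    }

    /** Splits a comma-separated annotation string into its tokens. */
    static List<String> tokens(String annotations) {
        if (annotations == null || annotations.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(annotations.split(",")));
    }

    public static void main(String[] args) {
        // "tokenA" and "tokenB" are made-up tokens for illustration.
        String value = addAnnotation(addAnnotation(null, "tokenA"), "tokenB");
        System.out.println(value);         // prints tokenA,tokenB
        System.out.println(tokens(value)); // prints [tokenA, tokenB]
    }
}
```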

A_PREREQUISITE_URI

static final java.lang.String A_PREREQUISITE_URI
See Also:
Constant Field Values

A_DISTANCE_FROM_SEED

static final java.lang.String A_DISTANCE_FROM_SEED
See Also:
Constant Field Values

A_HTML_BASE

static final java.lang.String A_HTML_BASE
See Also:
Constant Field Values

A_RETRY_DELAY

static final java.lang.String A_RETRY_DELAY
See Also:
Constant Field Values

A_META_ROBOTS

static final java.lang.String A_META_ROBOTS
See Also:
Constant Field Values

A_MIRROR_PATH

static final java.lang.String A_MIRROR_PATH
Defined for org.archive.crawler.writer.MirrorWriterProcessor.

See Also:
Constant Field Values

A_CREDENTIAL_AVATARS_KEY

static final java.lang.String A_CREDENTIAL_AVATARS_KEY
Key to get credential avatars from A_LIST.

See Also:
Constant Field Values

A_SOURCE_TAG

static final java.lang.String A_SOURCE_TAG
a 'source' (usually a URI) that's inherited by discovered URIs

See Also:
Constant Field Values

A_HERITABLE_KEYS

static final java.lang.String A_HERITABLE_KEYS
Key to (optional) attribute specifying a list of keys that are passed to CandidateURIs that 'descend' (are discovered via) this URI.

See Also:
Constant Field Values
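
One way to picture the inheritance that A_HERITABLE_KEYS describes is a copy of the listed attributes from a parent's attribute map to a newly discovered child. The sketch below uses a plain Map in place of the crawler's attribute list. The class name, the inherit method, and the placeholder key string are assumptions for illustration, and the real key value is listed under Constant Field Values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a plain Map stands in for the crawler's attribute list, and
// HeritableKeysSketch.inherit is a hypothetical helper.
class HeritableKeysSketch {

    // Placeholder string, not the real constant value.
    static final String A_HERITABLE_KEYS = "heritable-keys-placeholder";

    /** Copies only the attributes named in the parent's heritable-keys list. */
    static Map<String, Object> inherit(Map<String, Object> parent) {
        Map<String, Object> child = new HashMap<>();
        Object keys = parent.get(A_HERITABLE_KEYS);
        if (!(keys instanceof List)) {
            return child; // the attribute is optional, so nothing is inherited
        }
        for (Object k : (List<?>) keys) {
            String key = String.valueOf(k);
            if (parent.containsKey(key)) {
                child.put(key, parent.get(key));
            }
        }
        return child;
    }

    public static void main(String[] args) {
        Map<String, Object> parent = new HashMap<>();
        parent.put(A_HERITABLE_KEYS, Arrays.asList("source"));
        parent.put("source", "http://example.com/");
        parent.put("unlisted", "stays with the parent");
        System.out.println(inherit(parent)); // prints {source=http://example.com/}
    }
}
```

In this sketch the list itself is passed on only if it names its own key; whether the crawler always propagates it is not stated on this page.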

A_FORCE_RETIRE

static final java.lang.String A_FORCE_RETIRE
flag indicating the containing queue should be retired

See Also:
Constant Field Values

A_HTTP_PROXY_HOST

static final java.lang.String A_HTTP_PROXY_HOST
local override of proxy host

See Also:
Constant Field Values

A_HTTP_PROXY_PORT

static final java.lang.String A_HTTP_PROXY_PORT
local override of proxy port

See Also:
Constant Field Values

A_HTTP_BIND_ADDRESS

static final java.lang.String A_HTTP_BIND_ADDRESS
local override of origin bind address

See Also:
Constant Field Values

TRUNC_SUFFIX

static final java.lang.String TRUNC_SUFFIX
Fetch truncation codes present in CrawlURI annotations. All truncation annotations end with TRUNC_SUFFIX (TODO: make the suffix guaranteed unique, or rework truncation so that a definitive flag is marked against the CrawlURI).

See Also:
Constant Field Values
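
Because every truncation annotation ends with TRUNC_SUFFIX, a truncated fetch can be detected by scanning the annotation tokens for that suffix. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the suffix string is a placeholder rather than the documented constant value, and the sample tokens are invented.

```java
// Sketch only: TruncationSketch is a hypothetical helper and the suffix below
// is a placeholder; see Constant Field Values for the real string.
class TruncationSketch {

    static final String TRUNC_SUFFIX = "Trunc"; // assumed value for illustration

    /** Returns true if any comma-separated annotation token ends with the suffix. */
    static boolean isTruncated(String annotations) {
        if (annotations == null) {
            return false;
        }
        for (String token : annotations.split(",")) {
            if (token.trim().endsWith(TRUNC_SUFFIX)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "other" and "sizeTrunc" are made-up tokens for illustration.
        System.out.println(isTruncated("other,sizeTrunc")); // prints true
        System.out.println(isTruncated("other"));           // prints false
    }
}
```

As the TODO above notes, the suffix is not guaranteed to be unique, so a suffix match like this could in principle flag an unrelated annotation.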

HEADER_TRUNC

static final java.lang.String HEADER_TRUNC
See Also:
Constant Field Values

TIMER_TRUNC

static final java.lang.String TIMER_TRUNC
See Also:
Constant Field Values

LENGTH_TRUNC

static final java.lang.String LENGTH_TRUNC
See Also:
Constant Field Values

A_FETCH_HISTORY

static final java.lang.String A_FETCH_HISTORY
fetch history array

See Also:
Constant Field Values

A_CONTENT_DIGEST

static final java.lang.String A_CONTENT_DIGEST
content digest

See Also:
Constant Field Values

A_LAST_MODIFIED_HEADER

static final java.lang.String A_LAST_MODIFIED_HEADER
header name (and AList key) for last-modified timestamp

See Also:
Constant Field Values

A_ETAG_HEADER

static final java.lang.String A_ETAG_HEADER
header name (and AList key) for ETag

See Also:
Constant Field Values
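
A_LAST_MODIFIED_HEADER and A_ETAG_HEADER each serve as both an HTTP header name and an attribute key, so the same string can be used to read a response header and to store its value. The sketch below illustrates that dual use with plain Maps. The class name, method name, and the two string values are assumptions for the example, not the documented constants.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: plain Maps stand in for the response headers and the attribute
// list; the two strings below are assumed values for illustration.
class ValidatorHeadersSketch {

    static final String A_ETAG_HEADER = "etag";                   // assumed
    static final String A_LAST_MODIFIED_HEADER = "last-modified"; // assumed

    /** Stores each validator header's value under the same string used as its name. */
    static Map<String, Object> recordValidators(Map<String, String> responseHeaders) {
        Map<String, Object> attributes = new HashMap<>();
        for (String key : new String[] {A_ETAG_HEADER, A_LAST_MODIFIED_HEADER}) {
            for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
                // HTTP header names are case-insensitive.
                if (header.getKey().equalsIgnoreCase(key)) {
                    attributes.put(key, header.getValue());
                }
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("ETag", "\"abc123\"");
        headers.put("Content-Type", "text/html");
        System.out.println(recordValidators(headers)); // prints {etag="abc123"}
    }
}
```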

A_STATUS

static final java.lang.String A_STATUS
key for status (when in history)

See Also:
Constant Field Values

A_REFERENCE_LENGTH

static final java.lang.String A_REFERENCE_LENGTH
reference length (content length or virtual length)

See Also:
Constant Field Values

A_WRITTEN_TO_WARC

static final java.lang.String A_WRITTEN_TO_WARC
name of warc file where uri had records written

See Also:
Constant Field Values


Copyright © 2003-2011 Internet Archive. All Rights Reserved.