org.archive.crawler.frontier
Interface AdaptiveRevisitAttributeConstants

All Superinterfaces:
CoreAttributeConstants
All Known Implementing Classes:
AcceptRevisitProcessor, AdaptiveRevisitFrontier, AdaptiveRevisitHostQueue, ChangeEvaluator, ContentBasedWaitEvaluator, HTTPMidFetchUnchangedFilter, ImageWaitEvaluator, RejectRevisitProcessor, TextWaitEvaluator, WaitEvaluator

public interface AdaptiveRevisitAttributeConstants
extends CoreAttributeConstants

Defines static constants for the Adaptive Revisiting module, including the data keys used in the CrawlURI AList.

Author:
Kristinn Sigurdsson
See Also:
CoreAttributeConstants

Field Summary
static java.lang.String A_CONTENT_STATE_KEY
          Key to use when getting the content state of a CrawlURI from its AList.
static java.lang.String A_DISCARD_REVISIT
          Mark a URI to be dropped from revisit handling.
static java.lang.String A_FETCH_OVERDUE
           
static java.lang.String A_LAST_CONTENT_DIGEST
          Designates a field in the CrawlURI's AList for the content digest of an earlier visit.
static java.lang.String A_LAST_DATESTAMP
           
static java.lang.String A_LAST_ETAG
           
static java.lang.String A_NUMBER_OF_VERSIONS
           
static java.lang.String A_NUMBER_OF_VISITS
           
static java.lang.String A_TIME_OF_NEXT_PROCESSING
           
static java.lang.String A_WAIT_INTERVAL
           
static java.lang.String A_WAIT_REEVALUATED
           
static int CONTENT_CHANGED
          URI content has changed between the two latest, successfully completed fetches.
static int CONTENT_UNCHANGED
          URI content has not changed between the two latest, successfully completed fetches.
static int CONTENT_UNKNOWN
          No knowledge of URI content.
 
Fields inherited from interface org.archive.crawler.datamodel.CoreAttributeConstants
A_ANNOTATIONS, A_CONTENT_DIGEST, A_CONTENT_TYPE, A_CREDENTIAL_AVATARS_KEY, A_DELAY_FACTOR, A_DISTANCE_FROM_SEED, A_DNS_FETCH_TIME, A_DNS_SERVER_IP_LABEL, A_ETAG_HEADER, A_FETCH_BEGAN_TIME, A_FETCH_COMPLETED_TIME, A_FETCH_HISTORY, A_FORCE_RETIRE, A_FTP_CONTROL_CONVERSATION, A_FTP_FETCH_STATUS, A_HERITABLE_KEYS, A_HTML_BASE, A_HTTP_BIND_ADDRESS, A_HTTP_PROXY_HOST, A_HTTP_PROXY_PORT, A_HTTP_TRANSACTION, A_LAST_MODIFIED_HEADER, A_LOCALIZED_ERRORS, A_META_ROBOTS, A_MINIMUM_DELAY, A_MIRROR_PATH, A_PREREQUISITE_URI, A_REFERENCE_LENGTH, A_RETRY_DELAY, A_RRECORD_SET_LABEL, A_RUNTIME_EXCEPTION, A_SOURCE_TAG, A_STATUS, A_WRITTEN_TO_WARC, HEADER_TRUNC, LENGTH_TRUNC, TIMER_TRUNC, TRUNC_SUFFIX
 

Field Detail

A_LAST_CONTENT_DIGEST

static final java.lang.String A_LAST_CONTENT_DIGEST
Designates a field in the CrawlURI's AList for the content digest of an earlier visit.

See Also:
Constant Field Values

A_TIME_OF_NEXT_PROCESSING

static final java.lang.String A_TIME_OF_NEXT_PROCESSING
See Also:
Constant Field Values

A_WAIT_INTERVAL

static final java.lang.String A_WAIT_INTERVAL
See Also:
Constant Field Values

A_NUMBER_OF_VISITS

static final java.lang.String A_NUMBER_OF_VISITS
See Also:
Constant Field Values

A_NUMBER_OF_VERSIONS

static final java.lang.String A_NUMBER_OF_VERSIONS
See Also:
Constant Field Values

A_FETCH_OVERDUE

static final java.lang.String A_FETCH_OVERDUE
See Also:
Constant Field Values

A_LAST_ETAG

static final java.lang.String A_LAST_ETAG
See Also:
Constant Field Values

A_LAST_DATESTAMP

static final java.lang.String A_LAST_DATESTAMP
See Also:
Constant Field Values

A_WAIT_REEVALUATED

static final java.lang.String A_WAIT_REEVALUATED
See Also:
Constant Field Values

A_DISCARD_REVISIT

static final java.lang.String A_DISCARD_REVISIT
Marks a URI to be dropped from revisit handling. Intended for custom processors that want to implement more selective revisiting. The actual effect depends on whether an alreadyIncluded structure is used. If one is used, dropping the URI from revisit handling means it won't be visited again. If one is not used, this merely drops one discovery of the URI, and the URI may be rediscovered and thus revisited that way.

See Also:
Constant Field Values
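
The two outcomes described for A_DISCARD_REVISIT can be illustrated with a small, self-contained sketch. It does not use Heritrix classes: a plain Map stands in for the CrawlURI AList, a Set stands in for the alreadyIncluded structure, and the class, its methods, and the key value are all hypothetical. Treating the mere presence of the key as the mark is also an assumption of this sketch, not a statement about the real frontier.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DiscardRevisitSketch {
    // Placeholder value; the real one is listed under Constant Field Values.
    static final String A_DISCARD_REVISIT = "sketch-discard-revisit";

    final Deque<String> queue = new ArrayDeque<>();
    // Stand-in for the alreadyIncluded structure; null means none is used.
    final Set<String> alreadyIncluded;

    DiscardRevisitSketch(boolean useAlreadyIncluded) {
        this.alreadyIncluded = useAlreadyIncluded ? new HashSet<>() : null;
    }

    /** Offers a discovered URI; returns true if it was queued. */
    boolean discover(String uri) {
        if (alreadyIncluded != null && !alreadyIncluded.add(uri)) {
            return false; // already known, so this discovery is ignored
        }
        queue.add(uri);
        return true;
    }

    /** Called after a visit; returns true if the URI was rescheduled. */
    boolean finished(String uri, Map<String, Object> attrs) {
        if (attrs.containsKey(A_DISCARD_REVISIT)) {
            return false; // dropped from revisit handling
        }
        queue.add(uri);
        return true;
    }

    public static void main(String[] args) {
        for (boolean useSet : new boolean[] {true, false}) {
            DiscardRevisitSketch frontier = new DiscardRevisitSketch(useSet);
            frontier.discover("http://example.com/");
            String uri = frontier.queue.poll();

            // A custom processor marks the URI while it is being processed.
            Map<String, Object> attrs = new HashMap<>();
            attrs.put(A_DISCARD_REVISIT, Boolean.TRUE);
            frontier.finished(uri, attrs);

            // The same URI is discovered again via some other page.
            boolean requeued = frontier.discover(uri);
            System.out.println("alreadyIncluded=" + useSet
                    + " requeued on rediscovery=" + requeued);
        }
    }
}
```

With the set in place the rediscovery is ignored, so the URI is never visited again; without it the rediscovery queues the URI once more.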

CONTENT_UNKNOWN

static final int CONTENT_UNKNOWN
No knowledge of URI content. The URI may not have been fetched yet, it may not have been possible to check whether the content differs, or an error may have occurred on the last fetch attempt.

See Also:
Constant Field Values

CONTENT_UNCHANGED

static final int CONTENT_UNCHANGED
URI content has not changed between the two latest, successfully completed fetches.

See Also:
Constant Field Values

CONTENT_CHANGED

static final int CONTENT_CHANGED
URI content has changed between the two latest, successfully completed fetches. By definition, content has changed if only one successful fetch has been made.

See Also:
Constant Field Values

A_CONTENT_STATE_KEY

static final java.lang.String A_CONTENT_STATE_KEY
Key to use when getting the content state of a CrawlURI from its AList.

See Also:
Constant Field Values
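
As a rough illustration of how the three content states and the state key relate, the sketch below derives a state by comparing the digest of the latest fetch against a stored earlier digest. It is self-contained and does not use Heritrix classes: a plain Map stands in for the CrawlURI AList, the evaluate method is hypothetical, and the constant values are placeholders rather than the real ones listed under Constant Field Values.

```java
import java.util.HashMap;
import java.util.Map;

public class ContentStateSketch {
    // Placeholder values, chosen only for this sketch.
    static final int CONTENT_UNKNOWN = -1;
    static final int CONTENT_UNCHANGED = 0;
    static final int CONTENT_CHANGED = 1;
    static final String A_CONTENT_STATE_KEY = "sketch-content-state";
    static final String A_LAST_CONTENT_DIGEST = "sketch-last-content-digest";

    /**
     * Records and returns the content state for one fetch attempt.
     *
     * @param attrs stand-in for the CrawlURI AList
     * @param currentDigest digest of the latest fetch, or null if it failed
     */
    static int evaluate(Map<String, Object> attrs, String currentDigest) {
        int state;
        if (currentDigest == null) {
            // Failed fetch: nothing to compare, earlier digest is kept.
            state = CONTENT_UNKNOWN;
        } else {
            Object last = attrs.get(A_LAST_CONTENT_DIGEST);
            // With no earlier digest (first successful fetch) equals() is
            // false, so the content counts as changed by definition.
            state = currentDigest.equals(last)
                    ? CONTENT_UNCHANGED
                    : CONTENT_CHANGED;
            attrs.put(A_LAST_CONTENT_DIGEST, currentDigest);
        }
        attrs.put(A_CONTENT_STATE_KEY, state);
        return state;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new HashMap<>();
        System.out.println(evaluate(attrs, "sha1:aaa")); // first fetch: CONTENT_CHANGED
        System.out.println(evaluate(attrs, "sha1:aaa")); // same digest: CONTENT_UNCHANGED
        System.out.println(evaluate(attrs, null));       // failed fetch: CONTENT_UNKNOWN
        System.out.println(evaluate(attrs, "sha1:bbb")); // new digest: CONTENT_CHANGED
    }
}
```

A later component could then read the state back with attrs.get(A_CONTENT_STATE_KEY) to decide, for example, how long to wait before the next visit.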


Copyright © 2003-2011 Internet Archive. All Rights Reserved.