org.archive.crawler.postprocessor
Class WaitEvaluator

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.postprocessor.WaitEvaluator
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, CoreAttributeConstants, AdaptiveRevisitAttributeConstants
Direct Known Subclasses:
ContentBasedWaitEvaluator

public class WaitEvaluator
extends Processor
implements AdaptiveRevisitAttributeConstants

A processor that determines when a URI should next be revisited. It does not account for DNS and robots.txt expiration; that should be handled separately by the Frontiers.

Author:
Kristinn Sigurdsson
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_CHANGED_FACTOR
          Factor by which the wait time is decreased (divided) when the content has changed.
static java.lang.String ATTR_DEFAULT_WAIT_INTERVAL
          Fixed wait time for 'unknown' change status.
static java.lang.String ATTR_INITIAL_WAIT_INTERVAL
          Default wait time after initial visit.
static java.lang.String ATTR_MAX_WAIT_INTERVAL
          Maximum wait between visits
static java.lang.String ATTR_MIN_WAIT_INTERVAL
          Minimum wait between visits
static java.lang.String ATTR_UNCHANGED_FACTOR
          Factor by which the wait time is increased (multiplied) when the content is unchanged.
static java.lang.String ATTR_USE_OVERDUE_TIME
          Indicates if the amount of time the URI was overdue should be added to the wait time before the new wait time is calculated.
protected static java.lang.Double DEFAULT_CHANGED_FACTOR
           
protected static java.lang.Long DEFAULT_DEFAULT_WAIT_INTERVAL
           
protected static java.lang.Long DEFAULT_INITIAL_WAIT_INTERVAL
           
protected static java.lang.Long DEFAULT_MAX_WAIT_INTERVAL
           
protected static java.lang.Long DEFAULT_MIN_WAIT_INTERVAL
           
protected static java.lang.Double DEFAULT_UNCHANGED_FACTOR
           
protected static java.lang.Boolean DEFAULT_USE_OVERDUE_TIME
           
(package private)  java.util.logging.Logger logger
           
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Fields inherited from interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
A_CONTENT_STATE_KEY, A_DISCARD_REVISIT, A_FETCH_OVERDUE, A_LAST_CONTENT_DIGEST, A_LAST_DATESTAMP, A_LAST_ETAG, A_NUMBER_OF_VERSIONS, A_NUMBER_OF_VISITS, A_TIME_OF_NEXT_PROCESSING, A_WAIT_INTERVAL, A_WAIT_REEVALUATED, CONTENT_CHANGED, CONTENT_UNCHANGED, CONTENT_UNKNOWN
 
Fields inherited from interface org.archive.crawler.datamodel.CoreAttributeConstants
A_ANNOTATIONS, A_CONTENT_DIGEST, A_CONTENT_TYPE, A_CREDENTIAL_AVATARS_KEY, A_DELAY_FACTOR, A_DISTANCE_FROM_SEED, A_DNS_FETCH_TIME, A_DNS_SERVER_IP_LABEL, A_ETAG_HEADER, A_FETCH_BEGAN_TIME, A_FETCH_COMPLETED_TIME, A_FETCH_HISTORY, A_FORCE_RETIRE, A_FTP_CONTROL_CONVERSATION, A_FTP_FETCH_STATUS, A_HERITABLE_KEYS, A_HTML_BASE, A_HTTP_BIND_ADDRESS, A_HTTP_PROXY_HOST, A_HTTP_PROXY_PORT, A_HTTP_TRANSACTION, A_LAST_MODIFIED_HEADER, A_LOCALIZED_ERRORS, A_META_ROBOTS, A_MINIMUM_DELAY, A_MIRROR_PATH, A_PREREQUISITE_URI, A_REFERENCE_LENGTH, A_RETRY_DELAY, A_RRECORD_SET_LABEL, A_RUNTIME_EXCEPTION, A_SOURCE_TAG, A_STATUS, A_WRITTEN_TO_WARC, HEADER_TRUNC, LENGTH_TRUNC, TIMER_TRUNC, TRUNC_SUFFIX
 
Constructor Summary
WaitEvaluator(java.lang.String name)
          Constructor
WaitEvaluator(java.lang.String name, java.lang.String description, java.lang.Long default_inital_wait_interval, java.lang.Long default_max_wait_interval, java.lang.Long default_min_wait_interval, java.lang.Double default_unchanged_factor, java.lang.Double default_changed_factor)
          Constructor
 
Method Summary
protected  void innerProcess(CrawlURI curi)
          Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

logger

java.util.logging.Logger logger

ATTR_INITIAL_WAIT_INTERVAL

public static final java.lang.String ATTR_INITIAL_WAIT_INTERVAL
Default wait time after initial visit.

See Also:
Constant Field Values

DEFAULT_INITIAL_WAIT_INTERVAL

protected static final java.lang.Long DEFAULT_INITIAL_WAIT_INTERVAL

ATTR_MAX_WAIT_INTERVAL

public static final java.lang.String ATTR_MAX_WAIT_INTERVAL
Maximum wait between visits

See Also:
Constant Field Values

DEFAULT_MAX_WAIT_INTERVAL

protected static final java.lang.Long DEFAULT_MAX_WAIT_INTERVAL

ATTR_MIN_WAIT_INTERVAL

public static final java.lang.String ATTR_MIN_WAIT_INTERVAL
Minimum wait between visits

See Also:
Constant Field Values

DEFAULT_MIN_WAIT_INTERVAL

protected static final java.lang.Long DEFAULT_MIN_WAIT_INTERVAL

ATTR_UNCHANGED_FACTOR

public static final java.lang.String ATTR_UNCHANGED_FACTOR
Factor by which the wait time is increased (multiplied) when the content is unchanged.

See Also:
Constant Field Values

DEFAULT_UNCHANGED_FACTOR

protected static final java.lang.Double DEFAULT_UNCHANGED_FACTOR

ATTR_CHANGED_FACTOR

public static final java.lang.String ATTR_CHANGED_FACTOR
Factor by which the wait time is decreased (divided) when the content has changed.

See Also:
Constant Field Values

DEFAULT_CHANGED_FACTOR

protected static final java.lang.Double DEFAULT_CHANGED_FACTOR

ATTR_DEFAULT_WAIT_INTERVAL

public static final java.lang.String ATTR_DEFAULT_WAIT_INTERVAL
Fixed wait time for 'unknown' change status. I.e. wait time for URIs whose content change detection is not available.

See Also:
Constant Field Values

DEFAULT_DEFAULT_WAIT_INTERVAL

protected static final java.lang.Long DEFAULT_DEFAULT_WAIT_INTERVAL

ATTR_USE_OVERDUE_TIME

public static final java.lang.String ATTR_USE_OVERDUE_TIME
Indicates if the amount of time the URI was overdue should be added to the wait time before the new wait time is calculated.

See Also:
Constant Field Values
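
The effect of the overdue setting can be illustrated with a minimal sketch. It is not taken from the WaitEvaluator source: the class OverdueSketch, the method baseWait, and the unitless sample values are hypothetical, and it assumes only what the description above states, namely that the overdue time is added to the current wait before the new wait is calculated.

```java
// Hypothetical illustration only; not the Heritrix implementation.
class OverdueSketch {

    // Returns the wait time used as the basis for the next calculation.
    // When useOverdueTime is true, the time the URI was overdue is added first.
    static long baseWait(long currentWait, long overdue, boolean useOverdueTime) {
        return useOverdueTime ? currentWait + overdue : currentWait;
    }

    public static void main(String[] args) {
        long currentWait = 100L;      // assumed, unitless sample value
        long overdue = 20L;           // assumed time the URI was overdue
        double unchangedFactor = 2.0; // assumed sample factor

        // Content unchanged: the base wait is multiplied by the factor.
        long withOverdue = (long) (baseWait(currentWait, overdue, true) * unchangedFactor);
        long withoutOverdue = (long) (baseWait(currentWait, overdue, false) * unchangedFactor);

        System.out.println(withOverdue);    // 240
        System.out.println(withoutOverdue); // 200
    }
}
```

Under these assumptions, enabling the setting makes the next interval reflect how long the URI actually waited rather than how long it was scheduled to wait.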

DEFAULT_USE_OVERDUE_TIME

protected static final java.lang.Boolean DEFAULT_USE_OVERDUE_TIME
Constructor Detail

WaitEvaluator

public WaitEvaluator(java.lang.String name)
Constructor

Parameters:
name - The name of the module

WaitEvaluator

public WaitEvaluator(java.lang.String name,
                     java.lang.String description,
                     java.lang.Long default_inital_wait_interval,
                     java.lang.Long default_max_wait_interval,
                     java.lang.Long default_min_wait_interval,
                     java.lang.Double default_unchanged_factor,
                     java.lang.Double default_changed_factor)
Constructor

Parameters:
name - The name of the module
description - Description of the module
default_inital_wait_interval - The default value for initial wait time
default_max_wait_interval - The maximum value for wait time
default_min_wait_interval - The minimum value for wait time
default_unchanged_factor - The factor for changing wait times of unchanged documents (will be multiplied by this value)
default_changed_factor - The factor for changing wait times of changed documents (will be divided by this value)
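
The parameter descriptions above suggest how a new wait time might be derived: multiply the current wait by the unchanged factor, or divide it by the changed factor. The following is a minimal sketch under that reading, not the actual WaitEvaluator code. The class WaitIntervalSketch, the enum ContentState, the method nextWait, and the unitless sample values are hypothetical, and clamping the result to the minimum and maximum wait is an assumption based on the field descriptions.

```java
// Hypothetical illustration only; not the Heritrix implementation.
class WaitIntervalSketch {

    enum ContentState { CHANGED, UNCHANGED }

    // Scales the current wait according to the content state, then keeps
    // the result within [minWait, maxWait] (the clamping is an assumption).
    static long nextWait(long currentWait, ContentState state,
                         double unchangedFactor, double changedFactor,
                         long minWait, long maxWait) {
        double scaled = (state == ContentState.UNCHANGED)
                ? currentWait * unchangedFactor   // unchanged: wait longer
                : currentWait / changedFactor;    // changed: wait less
        long next = (long) scaled;                // truncate toward zero
        return Math.max(minWait, Math.min(maxWait, next));
    }

    public static void main(String[] args) {
        // Assumed sample values: factors 2.0 and 1.5, bounds 10 and 1000.
        System.out.println(nextWait(100L, ContentState.UNCHANGED, 2.0, 1.5, 10L, 1000L)); // 200
        System.out.println(nextWait(100L, ContentState.CHANGED, 2.0, 1.5, 10L, 1000L));   // 66
    }
}
```

With factors greater than 1, repeated unchanged visits lengthen the interval toward the maximum, while repeated changes shorten it toward the minimum.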
Method Detail

innerProcess

protected void innerProcess(CrawlURI curi)
                     throws java.lang.InterruptedException
Description copied from class: Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.

Overrides:
innerProcess in class Processor
Parameters:
curi - The CrawlURI being processed.
Throws:
java.lang.InterruptedException


Copyright © 2003-2011 Internet Archive. All Rights Reserved.