org.archive.crawler.filter
Class HTTPMidFetchUnchangedFilter

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, CoreAttributeConstants, AdaptiveRevisitAttributeConstants

public class HTTPMidFetchUnchangedFilter
extends Filter
implements AdaptiveRevisitAttributeConstants

A mid-fetch filter for HTTP fetcher processors. It evaluates the HTTP response headers to predict whether the document has changed since it last passed through this filter. It does this by comparing the last-modified and etag values with the same values stored during the last processing of the URI.

If both values are present, both must predict no change; otherwise a change is predicted (the filter returns true).

If only one of the values is present, it alone is used to predict whether a change has occurred.

If neither value is present, the filter returns true (predicts a change).
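
The decision rules above can be sketched as a small standalone program. This is an illustration only, not the filter's actual source: the class name, the Prediction enum (a hypothetical stand-in for the HEADER_PREDICTS_* int constants, whose values are not shown here), and the compare helper are all assumptions. In particular, the sketch assumes a header counts as missing when either the current or the previously stored value is absent, and as unchanged only on exact string equality.

```java
public class HeaderChangePredictionSketch {

    /** Hypothetical stand-in for the HEADER_PREDICTS_* int constants. */
    public enum Prediction { MISSING, UNCHANGED, CHANGED }

    /**
     * Compares one header value with the value stored on the previous visit.
     * Assumption: MISSING when either value is absent, UNCHANGED on exact equality.
     */
    public static Prediction compare(String current, String previous) {
        if (current == null || previous == null) {
            return Prediction.MISSING;
        }
        return current.equals(previous) ? Prediction.UNCHANGED : Prediction.CHANGED;
    }

    /** Returns true when a change is predicted, following the documented rules. */
    public static boolean predictsChange(Prediction lastModified, Prediction etag) {
        boolean lmMissing = lastModified == Prediction.MISSING;
        boolean etagMissing = etag == Prediction.MISSING;
        if (lmMissing && etagMissing) {
            return true; // neither value present: predict change
        }
        if (lmMissing) {
            return etag == Prediction.CHANGED; // etag alone decides
        }
        if (etagMissing) {
            return lastModified == Prediction.CHANGED; // last-modified alone decides
        }
        // Both present: they must agree on "unchanged", otherwise predict change.
        return !(lastModified == Prediction.UNCHANGED && etag == Prediction.UNCHANGED);
    }

    public static void main(String[] args) {
        Prediction lm = compare("Tue, 01 Mar 2011 10:00:00 GMT",
                                "Tue, 01 Mar 2011 10:00:00 GMT");
        Prediction etag = compare("\"abc\"", "\"xyz\"");
        System.out.println(predictsChange(lm, etag));               // true: the headers disagree
        System.out.println(predictsChange(lm, Prediction.MISSING)); // false: last-modified alone, unchanged
    }
}
```

Using an enum rather than int constants keeps the sketch independent of the real constant values; the real filter may derive its per-header predictions differently.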

Author:
Kristinn Sigurdsson
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static int HEADER_PREDICTS_CHANGED
           
static int HEADER_PREDICTS_MISSING
           
static int HEADER_PREDICTS_UNCHANGED
           
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Fields inherited from interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
A_CONTENT_STATE_KEY, A_DISCARD_REVISIT, A_FETCH_OVERDUE, A_LAST_CONTENT_DIGEST, A_LAST_DATESTAMP, A_LAST_ETAG, A_NUMBER_OF_VERSIONS, A_NUMBER_OF_VISITS, A_TIME_OF_NEXT_PROCESSING, A_WAIT_INTERVAL, A_WAIT_REEVALUATED, CONTENT_CHANGED, CONTENT_UNCHANGED, CONTENT_UNKNOWN
 
Fields inherited from interface org.archive.crawler.datamodel.CoreAttributeConstants
A_ANNOTATIONS, A_CONTENT_DIGEST, A_CONTENT_TYPE, A_CREDENTIAL_AVATARS_KEY, A_DELAY_FACTOR, A_DISTANCE_FROM_SEED, A_DNS_FETCH_TIME, A_DNS_SERVER_IP_LABEL, A_ETAG_HEADER, A_FETCH_BEGAN_TIME, A_FETCH_COMPLETED_TIME, A_FETCH_HISTORY, A_FORCE_RETIRE, A_FTP_CONTROL_CONVERSATION, A_FTP_FETCH_STATUS, A_HERITABLE_KEYS, A_HTML_BASE, A_HTTP_BIND_ADDRESS, A_HTTP_PROXY_HOST, A_HTTP_PROXY_PORT, A_HTTP_TRANSACTION, A_LAST_MODIFIED_HEADER, A_LOCALIZED_ERRORS, A_META_ROBOTS, A_MINIMUM_DELAY, A_MIRROR_PATH, A_PREREQUISITE_URI, A_REFERENCE_LENGTH, A_RETRY_DELAY, A_RRECORD_SET_LABEL, A_RUNTIME_EXCEPTION, A_SOURCE_TAG, A_STATUS, A_WRITTEN_TO_WARC, HEADER_TRUNC, LENGTH_TRUNC, TIMER_TRUNC, TRUNC_SUFFIX
 
Constructor Summary
HTTPMidFetchUnchangedFilter(java.lang.String name)
          Constructor
HTTPMidFetchUnchangedFilter(java.lang.String name, java.lang.String description)
          Constructor
 
Method Summary
protected  boolean innerAccepts(java.lang.Object o)
          Classes subclassing this one should override this method to perform their custom determination of whether or not to accept the object given to it.
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, kickUpdate, returnTrueIfMatches, toString
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

HEADER_PREDICTS_MISSING

public static final int HEADER_PREDICTS_MISSING
See Also:
Constant Field Values

HEADER_PREDICTS_UNCHANGED

public static final int HEADER_PREDICTS_UNCHANGED
See Also:
Constant Field Values

HEADER_PREDICTS_CHANGED

public static final int HEADER_PREDICTS_CHANGED
See Also:
Constant Field Values
Constructor Detail

HTTPMidFetchUnchangedFilter

public HTTPMidFetchUnchangedFilter(java.lang.String name)
Constructor

Parameters:
name - Module name

HTTPMidFetchUnchangedFilter

public HTTPMidFetchUnchangedFilter(java.lang.String name,
                                   java.lang.String description)
Constructor

Parameters:
name - Module name
description - A description of the module's functions
Method Detail

innerAccepts

protected boolean innerAccepts(java.lang.Object o)
Description copied from class: Filter
Classes subclassing this one should override this method to perform their custom determination of whether or not to accept the object given to it.

Overrides:
innerAccepts in class Filter
Parameters:
o - The object
Returns:
True if it passes the filter.


Copyright © 2003-2011 Internet Archive. All Rights Reserved.