org.archive.crawler.processor.recrawl
Class FetchHistoryProcessor
java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.Processor
            org.archive.crawler.processor.recrawl.FetchHistoryProcessor
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean, CoreAttributeConstants
public class FetchHistoryProcessor
- extends Processor
- implements CoreAttributeConstants
Maintain a history of fetch information inside the CrawlURI's attributes.
- Version:
- $Date: 2006-09-25 20:19:54 +0000 (Mon, 25 Sep 2006) $, $Revision: 4654 $
- Author:
- gojomo
- See Also:
- Serialized Form
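To make the idea of a per-URI fetch history concrete, here is a minimal sketch of how a later step might consume it. It assumes that the history is an array of per-fetch records with the newest fetch at index 0 and the previous one at index 1. Plain `java.util.Map`s stand in for the CrawlURI attributes and for `st.ata.util.AList`. The class `HistoryDigestCheck` and the key `"content-digest"` are hypothetical and are not Heritrix API.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical consumer of a fetch history; not part of Heritrix.
// Assumes history[0] is the newest fetch and history[1] the one before it.
final class HistoryDigestCheck {
    // Illustrative key name; real history entries may use different keys.
    static final String DIGEST_KEY = "content-digest";

    /** True when the two most recent fetches both recorded the same digest. */
    static boolean unchangedSinceLastFetch(Map<String, Object>[] history) {
        if (history == null || history.length < 2
                || history[0] == null || history[1] == null) {
            return false; // not enough history to compare
        }
        Object latest = history[0].get(DIGEST_KEY);
        Object previous = history[1].get(DIGEST_KEY);
        return latest != null && Objects.equals(latest, previous);
    }
}
```

Keeping at least two entries is what makes a comparison like this possible at all. A history of length one would only ever describe the current fetch.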
Fields inherited from interface org.archive.crawler.datamodel.CoreAttributeConstants
A_ANNOTATIONS, A_CONTENT_DIGEST, A_CONTENT_TYPE, A_CREDENTIAL_AVATARS_KEY, A_DELAY_FACTOR, A_DISTANCE_FROM_SEED, A_DNS_FETCH_TIME, A_DNS_SERVER_IP_LABEL, A_ETAG_HEADER, A_FETCH_BEGAN_TIME, A_FETCH_COMPLETED_TIME, A_FETCH_HISTORY, A_FORCE_RETIRE, A_FTP_CONTROL_CONVERSATION, A_FTP_FETCH_STATUS, A_HERITABLE_KEYS, A_HTML_BASE, A_HTTP_BIND_ADDRESS, A_HTTP_PROXY_HOST, A_HTTP_PROXY_PORT, A_HTTP_TRANSACTION, A_LAST_MODIFIED_HEADER, A_LOCALIZED_ERRORS, A_META_ROBOTS, A_MINIMUM_DELAY, A_MIRROR_PATH, A_PREREQUISITE_URI, A_REFERENCE_LENGTH, A_RETRY_DELAY, A_RRECORD_SET_LABEL, A_RUNTIME_EXCEPTION, A_SOURCE_TAG, A_STATUS, A_WRITTEN_TO_WARC, HEADER_TRUNC, LENGTH_TRUNC, TIMER_TRUNC, TRUNC_SUFFIX
Method Summary
protected void initialTasks()
- Classes subclassing this one should override this method to perform processor-specific actions.
protected void innerProcess(CrawlURI curi)
- Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
protected void saveHeader(java.lang.String name, org.apache.commons.httpclient.HttpMethodBase method, st.ata.util.AList latestFetch)
- Save a header from the given HTTP operation into the AList.
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
ATTR_HISTORY_LENGTH
public static final java.lang.String ATTR_HISTORY_LENGTH
- Setting for the desired length of the fetch-history array.
- See Also:
- Constant Field Values
DEFAULT_HISTORY_LENGTH
public static final java.lang.Integer DEFAULT_HISTORY_LENGTH
- Default length of the fetch-history array.
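The two fields above suggest a configurable array length with a fallback default. The following is a minimal sketch of that logic. It assumes that an unset setting falls back to the default and that index 0 holds the newest entry, so shrinking should discard from the end. `HistoryLength`, `effectiveLength`, and `resize` are hypothetical names. The actual value of `DEFAULT_HISTORY_LENGTH` is not given on this page, so the caller passes the default in.

```java
import java.util.Arrays;

// Hypothetical helpers for a configurable history length; not Heritrix API.
final class HistoryLength {
    /** Uses the configured length if set, otherwise the supplied default. */
    static int effectiveLength(Integer configured, int defaultLength) {
        int length = (configured != null) ? configured : defaultLength;
        if (length < 1) {
            throw new IllegalArgumentException("history length must be >= 1: " + length);
        }
        return length;
    }

    /**
     * Returns a copy with exactly newLength slots. With the newest entry at
     * index 0, shrinking drops the oldest entries from the end and growing
     * appends empty (null) slots.
     */
    static <T> T[] resize(T[] history, int newLength) {
        return Arrays.copyOf(history, newLength);
    }
}
```

`Arrays.copyOf` already truncates or pads with `null`, which matches a newest-first layout without any extra shifting.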
FetchHistoryProcessor
public FetchHistoryProcessor(java.lang.String name)
- Usual constructor
- Parameters:
name
-
innerProcess
protected void innerProcess(CrawlURI curi)
throws java.lang.InterruptedException
- Description copied from class: Processor
- Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
- Overrides: innerProcess in class Processor
- Parameters:
curi
- The CrawlURI being processed.
- Throws:
java.lang.InterruptedException
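This page says only that the processor maintains the history inside the CrawlURI's attributes. One plausible shape for that update is a fixed-length array rotated on each fetch. Older entries shift one slot toward the end, the oldest drops off, and the new record goes in at index 0. The sketch below assumes that layout. A `Map` stands in for the CrawlURI's attribute `AList`, and each entry is a `Map` standing in for the `latestFetch` `AList`. `FetchHistoryRotation`, `recordFetch`, and the key `"fetch-history"` are hypothetical. The real class presumably keys the array by the `A_FETCH_HISTORY` constant.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch of a per-URI history update; not Heritrix API.
final class FetchHistoryRotation {
    // Illustrative key; the real class is assumed to use A_FETCH_HISTORY.
    static final String HISTORY_KEY = "fetch-history";

    @SuppressWarnings("unchecked")
    static void recordFetch(Map<String, Object> attributes,
                            Map<String, Object> latestFetch,
                            int historyLength) {
        if (historyLength < 1) {
            throw new IllegalArgumentException("history length must be >= 1");
        }
        Map<String, Object>[] history =
                (Map<String, Object>[]) attributes.get(HISTORY_KEY);
        if (history == null) {
            history = (Map<String, Object>[]) new Map[historyLength];
        } else if (history.length != historyLength) {
            // Configured length changed: keep the newest entries.
            history = Arrays.copyOf(history, historyLength);
        }
        // Shift every entry one slot toward the end; the oldest drops off.
        // System.arraycopy handles overlapping source and destination safely.
        System.arraycopy(history, 0, history, 1, history.length - 1);
        history[0] = latestFetch; // newest fetch always at index 0
        attributes.put(HISTORY_KEY, history);
    }
}
```

Rotating in place keeps the per-fetch cost proportional to the history length, which is expected to be small.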
saveHeader
protected void saveHeader(java.lang.String name,
org.apache.commons.httpclient.HttpMethodBase method,
st.ata.util.AList latestFetch)
- Save a header from the given HTTP operation into the AList.
- Parameters:
name
- header name to save into the history AList
method
- HTTP operation containing the headers
latestFetch
- AList that receives the header
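A minimal sketch of the save-a-header step follows. It copies the named response header into the fetch record only if the response carried it. In the real method, `method` is a Commons HttpClient `HttpMethodBase`, whose `getResponseHeader(String)` returns the matching `Header` or `null`. Here a plain `Map<String, String>` of response headers stands in for it, and a `Map` stands in for the `AList`. `HeaderSaver` is a hypothetical name. Which headers the processor saves is not stated on this page. The `A_ETAG_HEADER` and `A_LAST_MODIFIED_HEADER` constants suggest ETag and Last-Modified, but that is an assumption.

```java
import java.util.Map;

// Hypothetical stand-in for saveHeader(); not Heritrix API.
final class HeaderSaver {
    /**
     * Copies the named response header into latestFetch if present;
     * an absent header leaves latestFetch untouched. HTTP header names
     * are case-insensitive, so the lookup ignores case.
     */
    static void saveHeader(String name, Map<String, String> responseHeaders,
                           Map<String, Object> latestFetch) {
        for (Map.Entry<String, String> e : responseHeaders.entrySet()) {
            if (e.getKey().equalsIgnoreCase(name)) {
                latestFetch.put(name, e.getValue());
                return;
            }
        }
    }
}
```

Skipping absent headers, rather than storing an empty value, lets a later reader distinguish "server sent no ETag" from "server sent an empty ETag".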
initialTasks
protected void initialTasks()
- Description copied from class: Processor
- Classes subclassing this one should override this method to perform processor-specific actions.
This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.
- Overrides: initialTasks in class Processor
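The ordering guarantee above means one-time setup can safely live in `initialTasks()`. The following is a minimal sketch of that contract using a hypothetical mini-framework: `SketchProcessor`, `SketchCrawl`, and `LengthCachingProcessor` are invented for illustration and are not Heritrix classes. The example subclass fixes a history length during setup. Whether `FetchHistoryProcessor.initialTasks()` actually does this is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-framework illustrating the initialTasks() contract.
abstract class SketchProcessor {
    /** Hook run once after setup and before any URI is processed. */
    protected void initialTasks() {
    }

    /** Per-URI work supplied by subclasses. */
    protected abstract void innerProcess(String uri) throws InterruptedException;
}

final class SketchCrawl {
    /** Calls initialTasks() exactly once, then processes each URI in order. */
    static void run(SketchProcessor processor, List<String> uris)
            throws InterruptedException {
        processor.initialTasks();
        for (String uri : uris) {
            processor.innerProcess(uri);
        }
    }
}

// Example subclass that fixes its history length during initialTasks().
final class LengthCachingProcessor extends SketchProcessor {
    private final int configuredLength;
    int historyLength = 0; // 0 means "not yet initialized"
    final List<String> processed = new ArrayList<>();

    LengthCachingProcessor(int configuredLength) {
        this.configuredLength = configuredLength;
    }

    @Override
    protected void initialTasks() {
        historyLength = configuredLength;
    }

    @Override
    protected void innerProcess(String uri) {
        if (historyLength < 1) {
            throw new IllegalStateException("initialTasks() has not run");
        }
        processed.add(uri);
    }
}
```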
Copyright © 2003-2011 Internet Archive. All Rights Reserved.