org.archive.crawler.postprocessor
Class ContentBasedWaitEvaluator
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.postprocessor.WaitEvaluator
org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean, CoreAttributeConstants, AdaptiveRevisitAttributeConstants
- Direct Known Subclasses:
- ImageWaitEvaluator, TextWaitEvaluator
public class ContentBasedWaitEvaluator
- extends WaitEvaluator
A WaitEvaluator that compares the CrawlURI's content type to a configurable
regular expression. If it matches, then the wait evaluation is performed.
Otherwise the processor passes on the CrawlURI, doing nothing.
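The gating behaviour described above can be illustrated with a minimal, standalone sketch. It does not use the Heritrix classes: `ContentTypeGate` and `shouldEvaluate` are hypothetical names, the example expression `^text/.*$` is illustrative only, and the choice of a full match (rather than a partial find) and of skipping a missing content type are assumptions, not confirmed behaviour of this class.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Heritrix: shows the "match, then evaluate" gate.
final class ContentTypeGate {

    // Returns true when the content type matches the configured expression in full.
    // Assumption: a missing content type never matches, so the URI is passed on.
    static boolean shouldEvaluate(String contentType, String regex) {
        if (contentType == null) {
            return false;
        }
        return Pattern.matches(regex, contentType);
    }

    public static void main(String[] args) {
        String regex = "^text/.*$"; // illustrative value, not the class default
        // A matching type would go on to wait evaluation.
        System.out.println(shouldEvaluate("text/html", regex));
        // A non-matching type would be passed on untouched.
        System.out.println(shouldEvaluate("image/png", regex));
    }
}
```

In the real processor the decision would presumably be made inside innerProcess, with the wait evaluation delegated to WaitEvaluator when the expression matches.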
- Author:
- Kristinn Sigurdsson
- See Also:
- WaitEvaluator, Serialized Form
Fields inherited from class org.archive.crawler.postprocessor.WaitEvaluator |
ATTR_CHANGED_FACTOR, ATTR_DEFAULT_WAIT_INTERVAL, ATTR_INITIAL_WAIT_INTERVAL, ATTR_MAX_WAIT_INTERVAL, ATTR_MIN_WAIT_INTERVAL, ATTR_UNCHANGED_FACTOR, ATTR_USE_OVERDUE_TIME, DEFAULT_CHANGED_FACTOR, DEFAULT_DEFAULT_WAIT_INTERVAL, DEFAULT_INITIAL_WAIT_INTERVAL, DEFAULT_MAX_WAIT_INTERVAL, DEFAULT_MIN_WAIT_INTERVAL, DEFAULT_UNCHANGED_FACTOR, DEFAULT_USE_OVERDUE_TIME, logger |
Fields inherited from interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants |
A_CONTENT_STATE_KEY, A_DISCARD_REVISIT, A_FETCH_OVERDUE, A_LAST_CONTENT_DIGEST, A_LAST_DATESTAMP, A_LAST_ETAG, A_NUMBER_OF_VERSIONS, A_NUMBER_OF_VISITS, A_TIME_OF_NEXT_PROCESSING, A_WAIT_INTERVAL, A_WAIT_REEVALUATED, CONTENT_CHANGED, CONTENT_UNCHANGED, CONTENT_UNKNOWN |
Fields inherited from interface org.archive.crawler.datamodel.CoreAttributeConstants |
A_ANNOTATIONS, A_CONTENT_DIGEST, A_CONTENT_TYPE, A_CREDENTIAL_AVATARS_KEY, A_DELAY_FACTOR, A_DISTANCE_FROM_SEED, A_DNS_FETCH_TIME, A_DNS_SERVER_IP_LABEL, A_ETAG_HEADER, A_FETCH_BEGAN_TIME, A_FETCH_COMPLETED_TIME, A_FETCH_HISTORY, A_FORCE_RETIRE, A_FTP_CONTROL_CONVERSATION, A_FTP_FETCH_STATUS, A_HERITABLE_KEYS, A_HTML_BASE, A_HTTP_BIND_ADDRESS, A_HTTP_PROXY_HOST, A_HTTP_PROXY_PORT, A_HTTP_TRANSACTION, A_LAST_MODIFIED_HEADER, A_LOCALIZED_ERRORS, A_META_ROBOTS, A_MINIMUM_DELAY, A_MIRROR_PATH, A_PREREQUISITE_URI, A_REFERENCE_LENGTH, A_RETRY_DELAY, A_RRECORD_SET_LABEL, A_RUNTIME_EXCEPTION, A_SOURCE_TAG, A_STATUS, A_WRITTEN_TO_WARC, HEADER_TRUNC, LENGTH_TRUNC, TIMER_TRUNC, TRUNC_SUFFIX |
Constructor Summary |
ContentBasedWaitEvaluator(java.lang.String name)
Constructor |
ContentBasedWaitEvaluator(java.lang.String name,
java.lang.String description,
java.lang.String defaultRegExpr,
java.lang.Long default_inital_wait_interval,
java.lang.Long default_max_wait_interval,
java.lang.Long default_min_wait_interval,
java.lang.Double default_unchanged_factor,
java.lang.Double default_changed_factor)
Constructor |
Method Summary |
protected void |
innerProcess(CrawlURI curi)
Classes subclassing this one should override this method to perform
their custom actions on the CrawlURI. |
Methods inherited from class org.archive.crawler.framework.Processor |
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
Methods inherited from class org.archive.crawler.settings.ComplexType |
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
getName, hashCode |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
ATTR_CONTENT_REGEXPR
public static final java.lang.String ATTR_CONTENT_REGEXPR
- The regular expression that restricts which content types this evaluator applies to.
- See Also:
- Constant Field Values
DEFAULT_CONTENT_REGEXPR
protected static final java.lang.String DEFAULT_CONTENT_REGEXPR
- See Also:
- Constant Field Values
ContentBasedWaitEvaluator
public ContentBasedWaitEvaluator(java.lang.String name)
- Constructor
- Parameters:
name
- The name of the module
ContentBasedWaitEvaluator
public ContentBasedWaitEvaluator(java.lang.String name,
java.lang.String description,
java.lang.String defaultRegExpr,
java.lang.Long default_inital_wait_interval,
java.lang.Long default_max_wait_interval,
java.lang.Long default_min_wait_interval,
java.lang.Double default_unchanged_factor,
java.lang.Double default_changed_factor)
- Constructor
- Parameters:
name
- The name of the module
description
- Description of the module
default_inital_wait_interval
- The default value for initial wait time
default_max_wait_interval
- The maximum value for wait time
default_min_wait_interval
- The minimum value for wait time
default_unchanged_factor
- The factor for changing wait times of unchanged documents (will be multiplied by this value)
default_changed_factor
- The factor for changing wait times of changed documents (will be divided by this value)
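The two factors above can be read as a simple update rule for the wait interval. The following standalone sketch shows that arithmetic under stated assumptions: `WaitIntervalSketch` and `nextWait` are hypothetical names, and clamping the result to the minimum and maximum, as well as truncating to a whole number, are assumptions about how the bounds are applied rather than confirmed behaviour of WaitEvaluator.

```java
// Hypothetical sketch, not part of Heritrix: wait interval update from the two factors.
final class WaitIntervalSketch {

    // changed == true  -> the interval is divided by changedFactor (revisit sooner)
    // changed == false -> the interval is multiplied by unchangedFactor (revisit later)
    // Assumption: the result is truncated and then kept within [min, max].
    static long nextWait(long current, boolean changed,
                         double unchangedFactor, double changedFactor,
                         long min, long max) {
        double next = changed ? current / changedFactor : current * unchangedFactor;
        long truncated = (long) next;
        return Math.max(min, Math.min(max, truncated));
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not the class defaults.
        System.out.println(nextWait(1000, false, 1.5, 2.0, 100, 10000)); // unchanged document
        System.out.println(nextWait(1000, true, 1.5, 2.0, 100, 10000));  // changed document
    }
}
```

With factors greater than 1, documents that keep changing are revisited progressively sooner and stable documents progressively later, bounded by the configured minimum and maximum.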
innerProcess
protected void innerProcess(CrawlURI curi)
throws java.lang.InterruptedException
- Description copied from class: Processor
- Classes subclassing this one should override this method to perform
their custom actions on the CrawlURI.
- Overrides:
- innerProcess in class WaitEvaluator
- Parameters:
curi
- The CrawlURI being processed.
- Throws:
java.lang.InterruptedException
Copyright © 2003-2011 Internet Archive. All Rights Reserved.