The processor allows a variable runtime based on host (or other
override/refinement criteria). However, such overrides only make sense
with the 'Block URIs' end operation, because pause and terminate affect
the whole crawl as soon as the limit is reached for any URI.
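The difference between the three end operations can be sketched as follows. This is a minimal stand-alone illustration, not the Heritrix implementation: the class `RuntimeLimitSketch`, its enums, the `decide` method, the host names, and the limits are all hypothetical, and treating a runtime exactly equal to the limit as still allowed is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the decision described above; not Heritrix code. */
final class RuntimeLimitSketch {

    /** Illustrative stand-ins for OP_PAUSE, OP_TERMINATE and OP_BLOCK_URIS. */
    enum EndOperation { PAUSE, TERMINATE, BLOCK_URIS }

    /** What the sketch decides for the URI currently being processed. */
    enum Outcome { CONTINUE, PAUSE_CRAWL, TERMINATE_CRAWL, BLOCK_URI }

    /**
     * Compares the elapsed crawl time with the limit that applies to the
     * current URI and picks an outcome for the configured end operation.
     */
    static Outcome decide(long elapsedMillis, long limitMillis, EndOperation op) {
        if (elapsedMillis <= limitMillis) {
            return Outcome.CONTINUE; // assumption: the limit itself is still allowed
        }
        switch (op) {
            case PAUSE:
                return Outcome.PAUSE_CRAWL;     // global: affects every host
            case TERMINATE:
                return Outcome.TERMINATE_CRAWL; // global: affects every host
            default:
                return Outcome.BLOCK_URI;       // local: only this URI is blocked
        }
    }

    public static void main(String[] args) {
        // Hypothetical per-host overrides of the runtime limit, in milliseconds.
        Map<String, Long> limitByHost = new LinkedHashMap<>();
        limitByHost.put("slow.example.org", 60_000L);
        limitByHost.put("big.example.org", 3_600_000L);

        long elapsedMillis = 120_000L; // the crawl has run for two minutes

        // With BLOCK_URIS, each host is judged against its own limit.
        for (Map.Entry<String, Long> e : limitByHost.entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + decide(elapsedMillis, e.getValue(), EndOperation.BLOCK_URIS));
        }

        // With PAUSE, the first URI from the short-limit host stops the crawl
        // for all hosts, which is why overrides are of little use here.
        System.out.println("pause -> "
                + decide(elapsedMillis, limitByHost.get("slow.example.org"), EndOperation.PAUSE));
    }
}
```

In this sketch only the host whose own limit has passed has its URIs blocked, while the other host keeps crawling. Under pause or terminate, the shortest override in effect would end the crawl for every host.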
- Author:
- Kristinn Sigurðsson
- See Also:
- Serialized Form
Fields inherited from interface org.archive.crawler.datamodel.FetchStatusCodes
S_BLOCKED_BY_CUSTOM_PROCESSOR, S_BLOCKED_BY_QUOTA, S_BLOCKED_BY_RUNTIME_LIMIT, S_BLOCKED_BY_USER, S_CONNECT_FAILED, S_CONNECT_LOST, S_DEEMED_CHAFF, S_DEEMED_NOT_FOUND, S_DEFERRED, S_DELETED_BY_USER, S_DNS_SUCCESS, S_DOMAIN_PREREQUISITE_FAILURE, S_DOMAIN_UNRESOLVABLE, S_GETBYNAME_SUCCESS, S_OTHER_PREREQUISITE_FAILURE, S_OUT_OF_SCOPE, S_PREREQUISITE_UNSCHEDULABLE_FAILURE, S_PROCESSING_THREAD_KILLED, S_ROBOTS_PRECLUDED, S_ROBOTS_PREREQUISITE_FAILURE, S_RUNTIME_EXCEPTION, S_SERIOUS_ERROR, S_TIMEOUT, S_TOO_MANY_EMBED_HOPS, S_TOO_MANY_LINK_HOPS, S_TOO_MANY_RETRIES, S_UNATTEMPTED, S_UNFETCHABLE_URI, S_UNQUEUEABLE
Method Summary
protected long getRuntime(CrawlURI curi)
- Returns the amount of time to allow the crawl to run before this
processor interrupts.
protected void innerProcess(CrawlURI curi)
- Classes subclassing this one should override this method to perform
their custom actions on the CrawlURI.
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
logger
protected java.util.logging.Logger logger
ATTR_RUNTIME_SECONDS
public static final java.lang.String ATTR_RUNTIME_SECONDS
DEFAULT_RUNTIME_SECONDS
protected static final long DEFAULT_RUNTIME_SECONDS
- See Also:
- Constant Field Values
ATTR_END_OPERATION
public static final java.lang.String ATTR_END_OPERATION
OP_PAUSE
protected static final java.lang.String OP_PAUSE
OP_TERMINATE
protected static final java.lang.String OP_TERMINATE
OP_BLOCK_URIS
protected static final java.lang.String OP_BLOCK_URIS
DEFAULT_END_OPERATION
protected static final java.lang.String DEFAULT_END_OPERATION
AVAILABLE_END_OPERATIONS
protected static final java.lang.String[] AVAILABLE_END_OPERATIONS
RuntimeLimitEnforcer
public RuntimeLimitEnforcer(java.lang.String name)
innerProcess
protected void innerProcess(CrawlURI curi)
throws java.lang.InterruptedException
- Description copied from class:
Processor
- Classes subclassing this one should override this method to perform
their custom actions on the CrawlURI.
- Overrides:
innerProcess
in class Processor
- Parameters:
curi
- The CrawlURI being processed.
- Throws:
java.lang.InterruptedException
getRuntime
protected long getRuntime(CrawlURI curi)
- Returns the amount of time to allow the crawl to run before this
processor interrupts.
- Returns:
- the amount of time in milliseconds.
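The names ATTR_RUNTIME_SECONDS and DEFAULT_RUNTIME_SECONDS suggest that the limit is configured in seconds, while getRuntime reports milliseconds. A minimal sketch of that unit conversion follows, assuming the configured value is indeed in seconds. The class `RuntimeUnitsSketch`, its methods, and the example value of 3600 seconds are hypothetical and are not taken from the Heritrix source.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a seconds-to-milliseconds runtime limit; not Heritrix code. */
final class RuntimeUnitsSketch {

    /** Converts a limit configured in seconds to milliseconds. */
    static long toRuntimeMillis(long runtimeSeconds) {
        // TimeUnit saturates at Long.MAX_VALUE instead of overflowing.
        return TimeUnit.SECONDS.toMillis(runtimeSeconds);
    }

    /** True once more time has elapsed than the configured limit allows. */
    static boolean limitReached(long startMillis, long nowMillis, long runtimeSeconds) {
        return nowMillis - startMillis > toRuntimeMillis(runtimeSeconds);
    }

    public static void main(String[] args) {
        long runtimeSeconds = 3600L; // arbitrary illustrative limit of one hour
        long start = 0L;

        System.out.println(toRuntimeMillis(runtimeSeconds));
        System.out.println(limitReached(start, 3_599_000L, runtimeSeconds));
        System.out.println(limitReached(start, 3_601_000L, runtimeSeconds));
    }
}
```

Doing the conversion in one place keeps every comparison against clock values, which are normally expressed in milliseconds, in the same unit.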
Copyright © 2003-2011 Internet Archive. All Rights Reserved.