org.archive.crawler.framework
Class Scoper
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.framework.Scoper
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
- Direct Known Subclasses:
- LinksScoper, Preselector, SupplementaryLinksScoper
public abstract class Scoper
- extends Processor
Base class for Scopers.
Scopers test CandidateURIs against a scope.
Scopers allow logging of rejected CandidateURIs.
- Version:
- $Date: 2010-05-11 22:15:04 +0000 (Tue, 11 May 2010) $, $Revision: 6867 $
- Author:
- stack
- See Also:
- Serialized Form
Constructor Summary |
Scoper(java.lang.String name,
java.lang.String description)
Constructor. |
Method Summary |
protected void |
finalTasks()
Classes subclassing this one should override this method to perform
processor specific actions. |
protected void |
initialTasks()
Classes subclassing this one should override this method to perform
processor specific actions. |
protected boolean |
isInScope(CandidateURI caUri)
Tests the given CandidateURI against the crawl scope. |
protected boolean |
isOverrideLogger(java.lang.Object context)
|
protected void |
outOfScope(CandidateURI caUri)
Called when a CandidateURI is ruled out of scope. |
Methods inherited from class org.archive.crawler.framework.Processor |
checkForInterrupt, getController, getDecideRule, getDefaultNextProcessor, innerProcess, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
Methods inherited from class org.archive.crawler.settings.ComplexType |
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
getName, hashCode |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
ATTR_OVERRIDE_LOGGER_ENABLED
protected static final java.lang.String ATTR_OVERRIDE_LOGGER_ENABLED
- Protected so that it is available to subclasses.
- See Also:
- Constant Field Values
Scoper
public Scoper(java.lang.String name,
java.lang.String description)
- Constructor.
- Parameters:
name
- Name of this scoper.
description
- Description of this scoper.
initialTasks
protected void initialTasks()
- Description copied from class:
Processor
- Classes subclassing this one should override this method to perform
processor specific actions.
This method is guaranteed to be called after the crawl is set up, but
before any URI processing has occurred.
- Overrides:
initialTasks
in class Processor
finalTasks
protected void finalTasks()
- Description copied from class:
Processor
- Classes subclassing this one should override this method to perform
processor specific actions.
- Overrides:
finalTasks
in class Processor
isOverrideLogger
protected boolean isOverrideLogger(java.lang.Object context)
- Parameters:
context
- Context to use looking up attribute.
- Returns:
- True if the default logger (which logs to the console) is to be
overridden with a logger that writes all log output to a file
named for this class.
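The override described above can be pictured with plain java.util.logging. The following is only a minimal sketch under stated assumptions, not the Heritrix implementation: the class name OverrideLoggerSketch, the helper toFile, the logger name org.example.MyScoper, and the ".log" file suffix are all hypothetical. It shows one way to detach a logger from the console and route its output to a file named for a class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class OverrideLoggerSketch {
    /**
     * Hypothetical helper: routes the named logger to "<dir>/<name>.log"
     * and stops records from reaching the parent (console) handlers.
     */
    static Logger toFile(String name, Path dir) throws IOException {
        Logger logger = Logger.getLogger(name);
        // FileHandler creates the file as soon as it is constructed.
        FileHandler handler = new FileHandler(dir.resolve(name + ".log").toString());
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
        // Without this, records would also be printed by the root console handler.
        logger.setUseParentHandlers(false);
        return logger;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scoper-log");
        // "org.example.MyScoper" is an assumed name standing in for a Scoper subclass.
        Logger logger = toFile("org.example.MyScoper", dir);
        logger.info("Rejected: http://other.org/b");
        // Close handlers so the log file is flushed to disk.
        for (Handler h : logger.getHandlers()) {
            h.close();
        }
        System.out.println("Log written under " + dir);
    }
}
```

In a real Scoper subclass, whether to perform such a redirect would presumably be decided by the value isOverrideLogger returns for the current context.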
isInScope
protected boolean isInScope(CandidateURI caUri)
- Tests the given
CandidateURI
against the crawl scope.
- Parameters:
caUri
- The CandidateURI to be tested.
- Returns:
- true if CandidateURI was accepted by crawl scope, false
otherwise.
outOfScope
protected void outOfScope(CandidateURI caUri)
- Called when a CandidateURI is ruled out of scope.
Override if you don't want the log entries to appear as coming from this class.
- Parameters:
caUri
- CandidateURI that is out of scope.
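The interplay between isInScope and outOfScope can be illustrated with a small self-contained model. This is a sketch under assumptions, not Heritrix code: Candidate and SimpleScoper are hypothetical stand-ins for CandidateURI and Scoper, the scope is modeled as a plain Predicate, and the assumption that a failed scope test triggers outOfScope is inferred from the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ScoperSketch {
    /** Hypothetical stand-in for CandidateURI; holds only the URI string. */
    static final class Candidate {
        final String uri;
        Candidate(String uri) { this.uri = uri; }
    }

    /** Hypothetical stand-in for a Scoper: tests candidates and records rejects. */
    static class SimpleScoper {
        private final Predicate<Candidate> scope;
        final List<String> rejected = new ArrayList<>();

        SimpleScoper(Predicate<Candidate> scope) { this.scope = scope; }

        /** Returns true if the scope accepts the candidate, false otherwise. */
        boolean isInScope(Candidate c) {
            boolean accepted = scope.test(c);
            if (!accepted) {
                outOfScope(c); // assumed hook for rejected candidates
            }
            return accepted;
        }

        /** Subclasses may override this to record rejects somewhere else. */
        void outOfScope(Candidate c) {
            rejected.add(c.uri);
        }
    }

    public static void main(String[] args) {
        SimpleScoper scoper =
            new SimpleScoper(c -> c.uri.startsWith("http://example.com/"));
        System.out.println(scoper.isInScope(new Candidate("http://example.com/a")));
        System.out.println(scoper.isInScope(new Candidate("http://other.org/b")));
        System.out.println(scoper.rejected);
    }
}
```

Keeping the reject handling in a separate overridable method mirrors the note above: a subclass can change where rejected candidates are logged without touching the scope test itself.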
Copyright © 2003-2011 Internet Archive. All Rights Reserved.