org.archive.crawler.scope
Class ClassicScope
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Filter
org.archive.crawler.framework.CrawlScope
org.archive.crawler.scope.ClassicScope
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
- Direct Known Subclasses:
- BroadScope, RefinedScope, SeedCachingScope
public class ClassicScope
- extends CrawlScope
ClassicScope: superclass with shared Scope behavior for
most common scopes.
Roughly, its logic is captured in innerAccepts(). A URI is
included if:
    forceAccepts(uri)
    || (((isSeed(uri)
            || focusAccepts(uri))
          || additionalFocusAccepts(uri)
          || transitiveAccepts(uri))
        && !excludeAccepts(uri));
Subclasses should override focusAccepts, additionalFocusAccepts,
and transitiveAccepts.
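The decision rule and the override points can be sketched in isolation. The following is a minimal illustration, not the Heritrix implementation: ScopeSketch and PrefixScope are hypothetical stand-ins, URIs are plain strings rather than CandidateURI objects, and every hook defaulting to false is an assumption of the sketch. The name forceAccepts follows the expression above; the method summary lists it as xforceAccepts.

```java
import java.util.Set;

/** Hypothetical stand-in for a scope; it does not extend any Heritrix class. */
public class ScopeSketch {
    private final Set<String> seeds;

    public ScopeSketch(Set<String> seeds) {
        this.seeds = seeds;
    }

    // Hooks default to false in this sketch (an assumption), so only seeds
    // are in scope until a subclass overrides something.
    protected boolean forceAccepts(String uri) { return false; }
    protected boolean focusAccepts(String uri) { return false; }
    protected boolean additionalFocusAccepts(String uri) { return false; }
    protected boolean transitiveAccepts(String uri) { return false; }
    protected boolean excludeAccepts(String uri) { return false; }

    protected boolean isSeed(String uri) {
        return seeds.contains(uri);
    }

    /** Same boolean structure as the expression in the class description. */
    public final boolean accepts(String uri) {
        return forceAccepts(uri)
            || ((isSeed(uri)
                    || focusAccepts(uri)
                    || additionalFocusAccepts(uri)
                    || transitiveAccepts(uri))
                && !excludeAccepts(uri));
    }

    /** Hypothetical subclass: focus on a URI prefix, exclude ".zip" files. */
    static class PrefixScope extends ScopeSketch {
        private final String prefix;

        PrefixScope(Set<String> seeds, String prefix) {
            super(seeds);
            this.prefix = prefix;
        }

        @Override
        protected boolean focusAccepts(String uri) {
            return uri.startsWith(prefix);
        }

        @Override
        protected boolean excludeAccepts(String uri) {
            return uri.endsWith(".zip");
        }
    }

    public static void main(String[] args) {
        ScopeSketch scope =
            new PrefixScope(Set.of("http://example.org/"), "http://example.org/");
        System.out.println(scope.accepts("http://example.org/a.html")); // true
        System.out.println(scope.accepts("http://example.org/a.zip"));  // false
        System.out.println(scope.accepts("http://other.example/"));     // false
    }
}
```

Note that the exclude check applies to seeds as well: only a force-accept bypasses it.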
The excludeFilter may be specified by supplying
an exclude
subelement. If unspecified, an
accepts-none filter will be used, meaning that
no URI will match the filter and thus none will be excluded.
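The effect of the default exclude filter can be shown with a small sketch. It models filters as java.util.function.Predicate instances, which is an assumption made for illustration; ExcludeDefaultSketch and its members are hypothetical names and are not part of Heritrix.

```java
import java.util.function.Predicate;

public class ExcludeDefaultSketch {
    /** Accepts-none filter: it never matches, so it never excludes anything. */
    static final Predicate<String> ACCEPTS_NONE = uri -> false;

    /** Returns the configured exclude filter, or the accepts-none default. */
    static Predicate<String> excludeFilterOrDefault(Predicate<String> configured) {
        return configured != null ? configured : ACCEPTS_NONE;
    }

    /** A URI matched by the focus stays in scope unless the exclude filter matches it. */
    static boolean inScope(boolean focusMatched, String uri,
                           Predicate<String> configuredExclude) {
        return focusMatched && !excludeFilterOrDefault(configuredExclude).test(uri);
    }

    public static void main(String[] args) {
        Predicate<String> noCalendar = uri -> uri.contains("/calendar/");

        // No exclude filter configured: nothing is excluded.
        System.out.println(inScope(true, "http://example.org/calendar/2011", null));
        // Exclude filter configured: matching URIs drop out of scope.
        System.out.println(inScope(true, "http://example.org/calendar/2011", noCalendar));
    }
}
```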
- Author:
- gojomo
- See Also:
- Serialized Form
Method Summary |
protected boolean |
additionalFocusAccepts(java.lang.Object o)
Check if URI is accepted by the additional focus of this scope. |
protected boolean |
exceedsMaxHops(java.lang.Object o)
Check if there are too many hops. |
protected boolean |
excludeAccepts(java.lang.Object o)
Check if URI is excluded by any filters. |
protected boolean |
focusAccepts(java.lang.Object o)
Check if URI is accepted by the focus of this scope. |
protected boolean |
innerAccepts(java.lang.Object o)
Returns whether the given object (typically a CandidateURI) falls within
this scope. |
void |
kickUpdate()
Take note of a situation (such as settings edit) where involved
reconfiguration (such as reading from external files) may be necessary. |
protected boolean |
transitiveAccepts(java.lang.Object o)
Check if URI is accepted by the transitive filter of this scope. |
protected boolean |
xforceAccepts(java.lang.Object o)
Check if URI is accepted by the force-accepts filter of this scope. |
Methods inherited from class org.archive.crawler.framework.CrawlScope |
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString |
Methods inherited from class org.archive.crawler.settings.ComplexType |
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
getName, hashCode |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
ATTR_EXCLUDE_FILTER
public static final java.lang.String ATTR_EXCLUDE_FILTER
- See Also:
- Constant Field Values
ATTR_MAX_LINK_HOPS
public static final java.lang.String ATTR_MAX_LINK_HOPS
- See Also:
- Constant Field Values
ATTR_MAX_TRANS_HOPS
public static final java.lang.String ATTR_MAX_TRANS_HOPS
- See Also:
- Constant Field Values
ClassicScope
public ClassicScope(java.lang.String name)
- Parameters:
name
- ignored by superclass
ClassicScope
public ClassicScope()
- Default constructor.
innerAccepts
protected final boolean innerAccepts(java.lang.Object o)
- Returns whether the given object (typically a CandidateURI) falls within
this scope.
- Overrides:
innerAccepts
in class Filter
- Parameters:
o
- Object to test.
- Returns:
- Whether the given object (typically a CandidateURI) falls within
this scope.
additionalFocusAccepts
protected boolean additionalFocusAccepts(java.lang.Object o)
- Check if URI is accepted by the additional focus of this scope.
This method should be overridden in subclasses.
- Parameters:
o
- the URI to check.
- Returns:
- True if additional focus filter accepts passed object.
transitiveAccepts
protected boolean transitiveAccepts(java.lang.Object o)
- Parameters:
o
- the URI to check.
- Returns:
- True if transitive filter accepts passed object.
xforceAccepts
protected boolean xforceAccepts(java.lang.Object o)
- Parameters:
o
- the URI to check.
- Returns:
- True if force-accepts filter accepts passed object.
focusAccepts
protected boolean focusAccepts(java.lang.Object o)
- Check if URI is accepted by the focus of this scope.
This method should be overridden in subclasses.
- Parameters:
o
- the URI to check.
- Returns:
- True if focus filter accepts passed object.
excludeAccepts
protected boolean excludeAccepts(java.lang.Object o)
- Check if URI is excluded by any filters.
- Parameters:
o
- the URI to check.
- Returns:
- True if exclude filter accepts passed object.
exceedsMaxHops
protected boolean exceedsMaxHops(java.lang.Object o)
- Check if there are too many hops.
- Parameters:
o
- URI to check.
- Returns:
- true if too many hops.
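This page does not state how hops are counted. As a rough sketch only, the check could compare the number of link hops on a URI's discovery path with a configured maximum. Everything here is an assumption for illustration: the path is a hypothetical string with one character per hop, 'L' marks a link hop, and HopLimitSketch and its methods are not Heritrix API.

```java
public class HopLimitSketch {
    /** Hypothetical marker for an ordinary link hop in the path string. */
    static final char LINK_HOP = 'L';

    /** Counts the link hops in a one-character-per-hop path. */
    static int countLinkHops(String pathFromSeed) {
        int count = 0;
        for (int i = 0; i < pathFromSeed.length(); i++) {
            if (pathFromSeed.charAt(i) == LINK_HOP) {
                count++;
            }
        }
        return count;
    }

    /** True when the path contains more link hops than the limit allows. */
    static boolean exceedsMaxHops(String pathFromSeed, int maxLinkHops) {
        return countLinkHops(pathFromSeed) > maxLinkHops;
    }

    public static void main(String[] args) {
        System.out.println(exceedsMaxHops("LL", 2));  // false: exactly at the limit
        System.out.println(exceedsMaxHops("LLL", 2)); // true: one hop too many
    }
}
```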
kickUpdate
public void kickUpdate()
- Take note of a situation (such as settings edit) where involved
reconfiguration (such as reading from external files) may be necessary.
- Overrides:
kickUpdate
in class CrawlScope
Copyright © 2003-2011 Internet Archive. All Rights Reserved.