org.archive.crawler.scope
Class ClassicScope

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.framework.CrawlScope
                          extended by org.archive.crawler.scope.ClassicScope
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean
Direct Known Subclasses:
BroadScope, RefinedScope, SeedCachingScope

public class ClassicScope
extends CrawlScope

ClassicScope: superclass with shared Scope behavior for most common scopes. Roughly, its logic is captured in innerAccepts(). A URI is included if:

    forceAccepts(uri)
    || (((isSeed(uri) 
         || focusAccepts(uri)) 
         || additionalFocusAccepts(uri) 
         || transitiveAccepts(uri))
       && !excludeAccepts(uri));
Subclasses should override focusAccepts, additionalFocusAccepts, and transitiveAccepts. The excludeFilter may be specified by supplying an exclude subelement. If unspecified, an accepts-none filter is used, meaning that no URIs match the exclude filter and thus none are excluded.
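
The inclusion rule above can be read as a plain boolean function of the individual checks. The following is a minimal standalone sketch of that rule, not the Heritrix implementation: the class ScopeRuleSketch and its method included are hypothetical names, and each parameter stands in for the result of the correspondingly named check in the expression.

```java
// Illustrative sketch only: a standalone model of the documented inclusion
// rule. ScopeRuleSketch and included(...) are hypothetical names, not part
// of Heritrix.
final class ScopeRuleSketch {

    /** Evaluates the documented rule from the results of the individual checks. */
    static boolean included(boolean forced, boolean seed, boolean focus,
                            boolean additionalFocus, boolean transitive,
                            boolean excluded) {
        return forced
            || ((seed || focus || additionalFocus || transitive) && !excluded);
    }

    public static void main(String[] args) {
        // In focus; an accepts-none exclude filter matches nothing.
        System.out.println(included(false, false, true, false, false, false));
        // In focus, but matched by an exclude filter.
        System.out.println(included(false, false, true, false, false, true));
        // Force-accepted: as the expression is written, exclusion is not consulted.
        System.out.println(included(true, false, false, false, false, true));
        // Not a seed and accepted by no focus or transitive check.
        System.out.println(included(false, false, false, false, false, false));
    }
}
```

With the default accepts-none exclude filter, the excluded argument is always false, so the rule reduces to "forced, or a seed, or accepted by any focus or transitive check".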

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_EXCLUDE_FILTER
           
static java.lang.String ATTR_MAX_LINK_HOPS
           
static java.lang.String ATTR_MAX_TRANS_HOPS
           
 
Fields inherited from class org.archive.crawler.framework.CrawlScope
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
ClassicScope()
          Default constructor.
ClassicScope(java.lang.String name)
           
 
Method Summary
protected  boolean additionalFocusAccepts(java.lang.Object o)
          Check if URI is accepted by the additional focus of this scope.
protected  boolean exceedsMaxHops(java.lang.Object o)
          Check if there are too many hops.
protected  boolean excludeAccepts(java.lang.Object o)
          Check if URI is excluded by any filters.
protected  boolean focusAccepts(java.lang.Object o)
          Check if URI is accepted by the focus of this scope.
protected  boolean innerAccepts(java.lang.Object o)
          Returns whether the given object (typically a CandidateURI) falls within this scope.
 void kickUpdate()
          Take note of a situation (such as settings edit) where involved reconfiguration (such as reading from external files) may be necessary.
protected  boolean transitiveAccepts(java.lang.Object o)
           
protected  boolean xforceAccepts(java.lang.Object o)
           
 
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, returnTrueIfMatches
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_EXCLUDE_FILTER

public static final java.lang.String ATTR_EXCLUDE_FILTER
See Also:
Constant Field Values

ATTR_MAX_LINK_HOPS

public static final java.lang.String ATTR_MAX_LINK_HOPS
See Also:
Constant Field Values

ATTR_MAX_TRANS_HOPS

public static final java.lang.String ATTR_MAX_TRANS_HOPS
See Also:
Constant Field Values
Constructor Detail

ClassicScope

public ClassicScope(java.lang.String name)
Parameters:
name - ignored by superclass

ClassicScope

public ClassicScope()
Default constructor.

Method Detail

innerAccepts

protected final boolean innerAccepts(java.lang.Object o)
Returns whether the given object (typically a CandidateURI) falls within this scope.

Overrides:
innerAccepts in class Filter
Parameters:
o - Object to test.
Returns:
Whether the given object (typically a CandidateURI) falls within this scope.

additionalFocusAccepts

protected boolean additionalFocusAccepts(java.lang.Object o)
Check if URI is accepted by the additional focus of this scope. This method should be overridden in subclasses.

Parameters:
o - the URI to check.
Returns:
True if additional focus filter accepts passed object.

transitiveAccepts

protected boolean transitiveAccepts(java.lang.Object o)
Parameters:
o - the URI to check.
Returns:
True if transitive filter accepts passed object.

xforceAccepts

protected boolean xforceAccepts(java.lang.Object o)
Parameters:
o - the URI to check.
Returns:
True if force-accepts filter accepts passed object.

focusAccepts

protected boolean focusAccepts(java.lang.Object o)
Check if URI is accepted by the focus of this scope. This method should be overridden in subclasses.

Parameters:
o - the URI to check.
Returns:
True if focus filter accepts passed object.
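
As a rough illustration of the override pattern, the sketch below uses a hypothetical stand-in base class rather than the real ClassicScope, so that it compiles without Heritrix on the classpath. The names BaseScopeSketch and HostScopeSketch, the reduced accepts method, and the base class's reject-everything default are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for the base class; NOT org.archive.crawler.scope.ClassicScope.
class BaseScopeSketch {
    // Assumed default for this sketch: the base focus accepts nothing.
    protected boolean focusAccepts(Object o) { return false; }

    // Models an accepts-none exclude filter: nothing is excluded.
    protected boolean excludeAccepts(Object o) { return false; }

    // Reduced form of the inclusion rule, limited to focus and exclusion.
    public final boolean accepts(Object o) {
        return focusAccepts(o) && !excludeAccepts(o);
    }
}

// A subclass narrows the focus to a single host by overriding focusAccepts.
class HostScopeSketch extends BaseScopeSketch {
    private final String host;

    HostScopeSketch(String host) { this.host = host; }

    @Override
    protected boolean focusAccepts(Object o) {
        try {
            // getHost() may return null; equalsIgnoreCase(null) is false.
            return host.equalsIgnoreCase(new URI(String.valueOf(o)).getHost());
        } catch (URISyntaxException e) {
            return false; // unparseable input is treated as out of focus
        }
    }

    public static void main(String[] args) {
        BaseScopeSketch scope = new HostScopeSketch("example.org");
        System.out.println(scope.accepts("http://example.org/page"));
        System.out.println(scope.accepts("http://other.example/page"));
    }
}
```

Keeping the combining method final and exposing only the individual checks for overriding mirrors the page's own design, where innerAccepts is final and the focus methods are meant to be overridden.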

excludeAccepts

protected boolean excludeAccepts(java.lang.Object o)
Check if URI is excluded by any filters.

Parameters:
o - the URI to check.
Returns:
True if exclude filter accepts passed object.

exceedsMaxHops

protected boolean exceedsMaxHops(java.lang.Object o)
Check if there are too many hops.

Parameters:
o - URI to check.
Returns:
True if too many hops.

kickUpdate

public void kickUpdate()
Take note of a situation (such as settings edit) where involved reconfiguration (such as reading from external files) may be necessary.

Overrides:
kickUpdate in class CrawlScope


Copyright © 2003-2011 Internet Archive. All Rights Reserved.