org.archive.crawler.scope
Class PathScope

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.framework.CrawlScope
                          extended by org.archive.crawler.scope.ClassicScope
                              extended by org.archive.crawler.scope.SeedCachingScope
                                  extended by org.archive.crawler.scope.PathScope
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

Deprecated. As of release 1.10.0. Replaced by DecidingScope.

public class PathScope
extends SeedCachingScope

A core CrawlScope suitable for the most common crawl needs. Roughly, its logic is that a URI is included if:

  (( isSeed(uri) || focusFilter.accepts(uri) ) || transitiveFilter.accepts(uri) ) && ! excludeFilter.accepts(uri)

The focusFilter may be specified by either:
- adding a 'mode' attribute to the scope element. mode="broad" is equivalent to no focus; modes "path", "host", and "domain" imply that a SeedExtensionFilter will be used, with the scope element providing its configuration; or
- adding a focus subelement.
If unspecified, the focusFilter defaults to an accepts-all filter.

The transitiveFilter may be specified by supplying a transitive subelement. If unspecified, a TransclusionFilter will be used, with the scope element providing its configuration.

The excludeFilter may be specified by supplying an exclude subelement. If unspecified, an accepts-none filter will be used, meaning that no URI passes the exclude filter and therefore no URI is excluded.
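The inclusion rule and filter defaults above can be modelled in a few lines. The following is a minimal sketch, not Heritrix code: filters are assumed to be java.util.function.Predicate instances over URI strings, and ScopeLogicSketch, focusForMode, ACCEPT_ALL and ACCEPT_NONE are hypothetical names. The transitive predicate must be supplied by the caller, because the sketch does not reproduce TransclusionFilter, and the seed-extension predicate only stands in for a configured SeedExtensionFilter.

```java
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical model of the PathScope inclusion rule. Filters are plain
 * Predicates over URI strings; no name here is Heritrix API.
 */
final class ScopeLogicSketch {
    static final Predicate<String> ACCEPT_ALL = uri -> true;   // documented default focus
    static final Predicate<String> ACCEPT_NONE = uri -> false; // documented default exclude

    private final Set<String> seeds;
    private final Predicate<String> focus;
    private final Predicate<String> transitive;
    private final Predicate<String> exclude;

    ScopeLogicSketch(Set<String> seeds, Predicate<String> focus,
                     Predicate<String> transitive, Predicate<String> exclude) {
        this.seeds = Objects.requireNonNull(seeds);
        // A null focus or exclude falls back to the documented defaults.
        this.focus = focus != null ? focus : ACCEPT_ALL;
        // No TransclusionFilter model here, so the caller must supply one.
        this.transitive = Objects.requireNonNull(transitive, "transitive");
        this.exclude = exclude != null ? exclude : ACCEPT_NONE;
    }

    /**
     * Picks a focus from a 'mode' value: "broad" means no focus (accept all);
     * "path", "host" and "domain" use the supplied predicate, a stand-in for
     * a SeedExtensionFilter configured for that mode.
     */
    static Predicate<String> focusForMode(String mode, Predicate<String> seedExtension) {
        switch (mode) {
            case "broad":
                return ACCEPT_ALL;
            case "path":
            case "host":
            case "domain":
                return seedExtension;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    boolean isSeed(String uri) {
        return seeds.contains(uri);
    }

    /** ((isSeed || focus) || transitive) && !exclude */
    boolean accepts(String uri) {
        return ((isSeed(uri) || focus.test(uri)) || transitive.test(uri))
                && !exclude.test(uri);
    }
}
```

For example, a "path" scope focused on http://example.com/a/ with an exclude predicate for .exe files accepts the seed and pages under /a/, but rejects http://example.com/a/setup.exe, because the exclude test is applied last and overrides every acceptance.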

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
(package private)  Filter additionalFocusFilter
          Deprecated.  
static java.lang.String ATTR_ADDITIONAL_FOCUS_FILTER
          Deprecated.  
static java.lang.String ATTR_TRANSITIVE_FILTER
          Deprecated.  
(package private)  Filter transitiveFilter
          Deprecated.  
 
Fields inherited from class org.archive.crawler.scope.SeedCachingScope
seeds
 
Fields inherited from class org.archive.crawler.scope.ClassicScope
ATTR_EXCLUDE_FILTER, ATTR_MAX_LINK_HOPS, ATTR_MAX_TRANS_HOPS
 
Fields inherited from class org.archive.crawler.framework.CrawlScope
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
PathScope(java.lang.String name)
          Deprecated.  
 
Method Summary
protected  boolean additionalFocusAccepts(java.lang.Object o)
          Deprecated. Check if URI is accepted by the additional focus of this scope.
protected  boolean focusAccepts(java.lang.Object o)
          Deprecated. Check if URI is accepted by the focus of this scope.
protected  boolean transitiveAccepts(java.lang.Object o)
          Deprecated.  
 
Methods inherited from class org.archive.crawler.scope.SeedCachingScope
addSeed, fillSeedsCache, refreshSeeds, seedsIterator
 
Methods inherited from class org.archive.crawler.scope.ClassicScope
exceedsMaxHops, excludeAccepts, innerAccepts, kickUpdate, xforceAccepts
 
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, seedsIterator, toString
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, returnTrueIfMatches
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_TRANSITIVE_FILTER

public static final java.lang.String ATTR_TRANSITIVE_FILTER
Deprecated. 
See Also:
Constant Field Values

ATTR_ADDITIONAL_FOCUS_FILTER

public static final java.lang.String ATTR_ADDITIONAL_FOCUS_FILTER
Deprecated. 
See Also:
Constant Field Values

additionalFocusFilter

Filter additionalFocusFilter
Deprecated. 

transitiveFilter

Filter transitiveFilter
Deprecated. 

Constructor Detail

PathScope

public PathScope(java.lang.String name)
Deprecated. 

Method Detail

transitiveAccepts

protected boolean transitiveAccepts(java.lang.Object o)
Deprecated. 
Overrides:
transitiveAccepts in class ClassicScope
Parameters:
o - the URI to check.
Returns:
True if transitive filter accepts passed object.

focusAccepts

protected boolean focusAccepts(java.lang.Object o)
Deprecated. 
Description copied from class: ClassicScope
Check if URI is accepted by the focus of this scope. This method should be overridden in subclasses.

Overrides:
focusAccepts in class ClassicScope
Parameters:
o - the URI to check.
Returns:
True if focus filter accepts passed object.

additionalFocusAccepts

protected boolean additionalFocusAccepts(java.lang.Object o)
Deprecated. 
Description copied from class: ClassicScope
Check if URI is accepted by the additional focus of this scope. This method should be overridden in subclasses.

Overrides:
additionalFocusAccepts in class ClassicScope
Parameters:
o - the URI to check.
Returns:
True if additional focus filter accepts passed object.
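The three methods above follow a template-method pattern: ClassicScope declares protected hooks that subclasses such as PathScope override, here by delegating to their configured filters. The sketch below illustrates that pattern under stated assumptions: ClassicScopeSketch and PathScopeSketch are hypothetical classes, filters are modelled as Predicate<Object>, and the way innerAccepts combines the hooks (including where additionalFocusAccepts fits) is an assumption, since this page does not document it.

```java
import java.util.Objects;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for ClassicScope. The protected hooks reject
 * everything by default and are meant to be overridden in subclasses.
 */
abstract class ClassicScopeSketch {
    protected boolean isSeed(Object o) { return false; }
    protected boolean focusAccepts(Object o) { return false; }
    protected boolean additionalFocusAccepts(Object o) { return false; }
    protected boolean transitiveAccepts(Object o) { return false; }
    protected boolean excludeAccepts(Object o) { return false; }

    /** Assumed combination of the hooks; the real expression may differ. */
    final boolean innerAccepts(Object o) {
        return (isSeed(o) || focusAccepts(o) || additionalFocusAccepts(o)
                || transitiveAccepts(o)) && !excludeAccepts(o);
    }
}

/** Hypothetical PathScope-like subclass: each hook delegates to one filter. */
final class PathScopeSketch extends ClassicScopeSketch {
    private final Predicate<Object> focusFilter;
    private final Predicate<Object> additionalFocusFilter;
    private final Predicate<Object> transitiveFilter;

    PathScopeSketch(Predicate<Object> focusFilter,
                    Predicate<Object> additionalFocusFilter,
                    Predicate<Object> transitiveFilter) {
        this.focusFilter = Objects.requireNonNull(focusFilter);
        this.additionalFocusFilter = Objects.requireNonNull(additionalFocusFilter);
        this.transitiveFilter = Objects.requireNonNull(transitiveFilter);
    }

    @Override
    protected boolean focusAccepts(Object o) {
        return focusFilter.test(o);
    }

    @Override
    protected boolean additionalFocusAccepts(Object o) {
        return additionalFocusFilter.test(o);
    }

    @Override
    protected boolean transitiveAccepts(Object o) {
        return transitiveFilter.test(o);
    }
}
```

Keeping the combination logic final in the base class and exposing only the individual checks as overridable hooks means a subclass can change what each filter accepts without being able to change how the results are combined.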


Copyright © 2003-2011 Internet Archive. All Rights Reserved.