org.archive.crawler.scope
Class BroadScope

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.framework.CrawlScope
                          extended by org.archive.crawler.scope.ClassicScope
                              extended by org.archive.crawler.scope.BroadScope
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

public class BroadScope
extends ClassicScope

A CrawlScope instance defines which URIs are "in" a particular crawl. It is essentially a Filter which determines, from the totality of information available about a CandidateURI/CrawlURI instance, whether that URI should be scheduled for crawling.

Dynamic information inherent in the discovery of the URI -- such as the path by which it was discovered -- may be considered.

Dynamic information which requires the consultation of external and potentially volatile information -- such as current robots.txt requests and the history of attempts to crawl the same URI -- should NOT be considered. Those potentially high-latency decisions should be made at another step.
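
The distinction above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes and is not the BroadScope implementation: the names ScopeSketch, Candidate, linkHops, trailingTransHops and inScope are hypothetical, and the hop-path encoding (a string of one-letter codes, with 'L' for a link hop and any other letter for a transitive hop such as an embed) is an assumption made for illustration. The decision reads only information carried with the candidate, namely its discovery path, and consults nothing external.

```java
public class ScopeSketch {

    /** Hypothetical candidate: a URI plus the hop path by which it was discovered. */
    static final class Candidate {
        final String uri;
        final String hopPath; // assumed encoding, e.g. "LLE" = two link hops, then one embed hop

        Candidate(String uri, String hopPath) {
            this.uri = uri;
            this.hopPath = hopPath;
        }
    }

    /** Counts link hops ('L') anywhere in the discovery path. */
    static int linkHops(String hopPath) {
        int count = 0;
        for (int i = 0; i < hopPath.length(); i++) {
            if (hopPath.charAt(i) == 'L') {
                count++;
            }
        }
        return count;
    }

    /** Counts the non-link hops at the end of the path, after the last link hop. */
    static int trailingTransHops(String hopPath) {
        int count = 0;
        for (int i = hopPath.length() - 1; i >= 0 && hopPath.charAt(i) != 'L'; i--) {
            count++;
        }
        return count;
    }

    /**
     * Scope decision based only on the candidate's own discovery path.
     * No robots.txt lookup and no crawl-history lookup happens here.
     */
    static boolean inScope(Candidate c, int maxLinkHops, int maxTransHops) {
        return linkHops(c.hopPath) <= maxLinkHops
                && trailingTransHops(c.hopPath) <= maxTransHops;
    }

    public static void main(String[] args) {
        Candidate c = new Candidate("http://example.com/a", "LLE");
        System.out.println(inScope(c, 25, 5)); // prints "true"
        System.out.println(inScope(c, 1, 5));  // prints "false": two link hops exceed the limit of 1
    }
}
```

The two limits play a role loosely analogous to the inherited ATTR_MAX_LINK_HOPS and ATTR_MAX_TRANS_HOPS settings; how BroadScope actually applies them is not specified on this page.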

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
 
Fields inherited from class org.archive.crawler.scope.ClassicScope
ATTR_EXCLUDE_FILTER, ATTR_MAX_LINK_HOPS, ATTR_MAX_TRANS_HOPS
 
Fields inherited from class org.archive.crawler.framework.CrawlScope
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
BroadScope(java.lang.String name)
          Constructor.
 
Method Summary
protected  boolean focusAccepts(java.lang.Object o)
          Check if URI is accepted by the focus of this scope.
protected  boolean transitiveAccepts(java.lang.Object o)
           
 
Methods inherited from class org.archive.crawler.scope.ClassicScope
additionalFocusAccepts, exceedsMaxHops, excludeAccepts, innerAccepts, kickUpdate, xforceAccepts
 
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, returnTrueIfMatches
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

BroadScope

public BroadScope(java.lang.String name)
Constructor.

Parameters:
name - Name of this crawlscope.
Method Detail

transitiveAccepts

protected boolean transitiveAccepts(java.lang.Object o)
Overrides:
transitiveAccepts in class ClassicScope
Parameters:
o - the URI to check.
Returns:
True if transitive filter accepts passed object.

focusAccepts

protected boolean focusAccepts(java.lang.Object o)
Check if URI is accepted by the focus of this scope. This method should be overridden in subclasses.

Overrides:
focusAccepts in class ClassicScope
Parameters:
o - the URI to check.
Returns:
True if focus filter accepts passed object.


Copyright © 2003-2011 Internet Archive. All Rights Reserved.