org.archive.crawler.scope
Class SurtPrefixScope

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.framework.CrawlScope
                          extended by org.archive.crawler.scope.ClassicScope
                              extended by org.archive.crawler.scope.RefinedScope
                                  extended by org.archive.crawler.scope.SurtPrefixScope
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

Deprecated. As of release 1.10.0. Replaced by DecidingScope.

public class SurtPrefixScope
extends RefinedScope

A specialized CrawlScope suitable for the most common crawl needs. Roughly, as with other existing CrawlScope variants, SurtPrefixScope's logic is that a URI is included if:

  ( ( isSeed(uri) || focusFilter.accepts(uri) ) ||
     transitiveFilter.accepts(uri) ) && ! excludeFilter.accepts(uri)
 
Specifically, SurtPrefixScope uses a SurtFilter to test for focus-inclusion.
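
The inclusion rule above can be sketched as a plain boolean function. This is a minimal illustration only, not the Heritrix implementation: the class name ScopeRuleSketch and the method inScope are hypothetical, and the four parameters stand in for the results of isSeed(uri), focusFilter.accepts(uri), transitiveFilter.accepts(uri) and excludeFilter.accepts(uri).

```java
// Illustrative sketch; ScopeRuleSketch and inScope are hypothetical names,
// not part of the Heritrix API.
final class ScopeRuleSketch {

    /**
     * Combines the four filter results as described in the class comment:
     * a URI is included if it is a seed, or is accepted by the focus or
     * transitive filter, and is not accepted by the exclude filter.
     */
    static boolean inScope(boolean isSeed,
                           boolean focusAccepts,
                           boolean transitiveAccepts,
                           boolean excludeAccepts) {
        boolean included = (isSeed || focusAccepts) || transitiveAccepts;
        return included && !excludeAccepts;
    }

    public static void main(String[] args) {
        // A focus-accepted URI that also matches the exclude filter is dropped.
        System.out.println(inScope(false, true, false, true));
        // A seed that no filter excludes is kept.
        System.out.println(inScope(true, false, false, false));
    }
}
```

The exclude filter always wins in this reading: no combination of seed, focus or transitive acceptance brings an excluded URI back into scope.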

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_ALSO_CHECK_VIA
          Deprecated. Whether the 'via' of CrawlURIs should also be checked against the set of SURT prefixes.
static java.lang.String ATTR_SEEDS_AS_SURT_PREFIXES
          Deprecated.  
static java.lang.String ATTR_SURTS_DUMP_FILE
          Deprecated.  
static java.lang.String ATTR_SURTS_SOURCE_FILE
          Deprecated.  
static java.lang.Boolean DEFAULT_ALSO_CHECK_VIA
          Deprecated.  
(package private)  SurtPrefixSet surtPrefixes
          Deprecated.  
 
Fields inherited from class org.archive.crawler.scope.RefinedScope
additionalFocusFilter, ATTR_ADDITIONAL_FOCUS_FILTER, ATTR_TRANSITIVE_FILTER, transitiveFilter
 
Fields inherited from class org.archive.crawler.scope.ClassicScope
ATTR_EXCLUDE_FILTER, ATTR_MAX_LINK_HOPS, ATTR_MAX_TRANS_HOPS
 
Fields inherited from class org.archive.crawler.framework.CrawlScope
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
SurtPrefixScope(java.lang.String name)
          Deprecated.  
 
Method Summary
protected  boolean focusAccepts(java.lang.Object object)
          Deprecated. Check if a URI is part of this scope.
 void initialize(CrawlController controller)
          Deprecated. Initialize is called just before the crawler starts to run.
 void kickUpdate()
          Deprecated. Re-read prefixes after an update.
 
Methods inherited from class org.archive.crawler.scope.RefinedScope
additionalFocusAccepts, transitiveAccepts
 
Methods inherited from class org.archive.crawler.scope.ClassicScope
exceedsMaxHops, excludeAccepts, innerAccepts, xforceAccepts
 
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, returnTrueIfMatches
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_SURTS_SOURCE_FILE

public static final java.lang.String ATTR_SURTS_SOURCE_FILE
Deprecated. 
See Also:
Constant Field Values

ATTR_SEEDS_AS_SURT_PREFIXES

public static final java.lang.String ATTR_SEEDS_AS_SURT_PREFIXES
Deprecated. 
See Also:
Constant Field Values

ATTR_SURTS_DUMP_FILE

public static final java.lang.String ATTR_SURTS_DUMP_FILE
Deprecated. 
See Also:
Constant Field Values

ATTR_ALSO_CHECK_VIA

public static final java.lang.String ATTR_ALSO_CHECK_VIA
Deprecated. 
Whether the 'via' of CrawlURIs should also be checked against the set of SURT prefixes.

See Also:
Constant Field Values
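
As a rough illustration of what this setting implies, the sketch below tests a URI against a list of SURT prefixes and, when the flag is on, falls back to testing its 'via'. It assumes the strings are already in SURT form and uses a plain startsWith comparison; the class ViaCheckSketch and its methods are hypothetical and do not reproduce SurtPrefixSet or the actual focusAccepts logic.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; ViaCheckSketch and its methods are hypothetical names,
// not part of the Heritrix API.
final class ViaCheckSketch {

    /** Returns true if the SURT-form string starts with any of the prefixes. */
    static boolean matchesAnyPrefix(List<String> prefixes, String surt) {
        if (surt == null) {
            return false; // e.g. a seed has no via
        }
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Accepts if the URI itself matches a prefix, or, when alsoCheckVia is
     * set, if the URI it was discovered from (its via) matches a prefix.
     */
    static boolean focusAccepts(List<String> prefixes, String uriSurt,
                                String viaSurt, boolean alsoCheckVia) {
        if (matchesAnyPrefix(prefixes, uriSurt)) {
            return true;
        }
        return alsoCheckVia && matchesAnyPrefix(prefixes, viaSurt);
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("http://(org,archive,");
        String offsite = "http://(com,example,)/logo.png";
        String via = "http://(org,archive,www,)/index.html";
        System.out.println(focusAccepts(prefixes, offsite, via, false));
        System.out.println(focusAccepts(prefixes, offsite, via, true));
    }
}
```

With the flag enabled, an off-prefix URI linked from an in-prefix page is accepted in this sketch, which is why the setting widens the scope by one hop.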

DEFAULT_ALSO_CHECK_VIA

public static final java.lang.Boolean DEFAULT_ALSO_CHECK_VIA
Deprecated. 

surtPrefixes

SurtPrefixSet surtPrefixes
Deprecated. 
Constructor Detail

SurtPrefixScope

public SurtPrefixScope(java.lang.String name)
Deprecated. 
Method Detail

initialize

public void initialize(CrawlController controller)
Deprecated. 
Description copied from class: CrawlScope
Initialize is called just before the crawler starts to run. The settings system is up and initialized, so it can be used. This initialization happens after ComplexType.earlyInitialize(CrawlerSettings).

Overrides:
initialize in class CrawlScope
Parameters:
controller - Controller object.

focusAccepts

protected boolean focusAccepts(java.lang.Object object)
Deprecated. 
Check if a URI is part of this scope.

Overrides:
focusAccepts in class ClassicScope
Parameters:
object - An instance of UURI or of CandidateURI.
Returns:
True if focus filter accepts passed object.

kickUpdate

public void kickUpdate()
Deprecated. 
Re-read prefixes after an update.

Overrides:
kickUpdate in class ClassicScope
See Also:
CrawlScope.kickUpdate()


Copyright © 2003-2011 Internet Archive. All Rights Reserved.