org.archive.crawler.deciderules
Class SurtPrefixedDecideRule

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.deciderules.DecideRule
                      extended by org.archive.crawler.deciderules.ConfiguredDecideRule
                          extended by org.archive.crawler.deciderules.PredicatedDecideRule
                              extended by org.archive.crawler.deciderules.SurtPrefixedDecideRule
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, SeedListener
Direct Known Subclasses:
NotSurtPrefixedDecideRule, OnDomainsDecideRule, OnHostsDecideRule, ScopePlusOneDecideRule

public class SurtPrefixedDecideRule
extends PredicatedDecideRule
implements SeedListener

Rule that applies the configured decision to any URI that, when expressed in SURT form, begins with one of the prefixes in the configured set. The set can be filled with SURT prefixes implied by or listed in the seeds file, or read from another external file. The "also-check-via" option, which implements "one hop off" scoping, derives from a contribution by Shifra Raffel of the California Digital Library.
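The prefix test described above can be illustrated with a minimal sketch. It is not the Heritrix implementation: it assumes a simplified SURT form in which only the host labels are reversed and comma-terminated inside parentheses, and it ignores ports, userinfo, and URI canonicalization. The names SurtSketch, toSurt, and isPrefixed are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SurtSketch {
    /** Simplified SURT conversion (assumption: scheme://host/path only, no port or userinfo). */
    public static String toSurt(String uri) {
        int schemeEnd = uri.indexOf("://");
        if (schemeEnd < 0) {
            return uri; // no authority part; leave unchanged
        }
        int hostStart = schemeEnd + 3;
        int pathStart = uri.indexOf('/', hostStart);
        if (pathStart < 0) {
            pathStart = uri.length();
        }
        String host = uri.substring(hostStart, pathStart).toLowerCase(Locale.ROOT);
        String[] labels = host.split("\\.");
        StringBuilder sb = new StringBuilder(uri.substring(0, hostStart)).append('(');
        // Reverse the host labels so the most general label (the TLD) comes first.
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.append(')').append(uri.substring(pathStart)).toString();
    }

    /** True if the SURT form of the URI begins with any of the given prefixes. */
    public static boolean isPrefixed(Iterable<String> surtPrefixes, String uri) {
        String surt = toSurt(uri);
        for (String prefix : surtPrefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("http://(org,archive,");
        // prints http://(org,archive,www,)/index.html
        System.out.println(toSurt("http://www.archive.org/index.html"));
        // prints true: any host under archive.org shares the prefix
        System.out.println(isPrefixed(prefixes, "http://crawler.archive.org/docs"));
        // prints false
        System.out.println(isPrefixed(prefixes, "http://example.com/"));
    }
}
```

Because the host labels are reversed, a single string prefix such as http://(org,archive, covers a domain and all of its subdomains, which is what makes a plain starts-with comparison sufficient.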

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_ALSO_CHECK_VIA
          Whether the 'via' of a CrawlURI should also be checked to see whether it begins with one of the SURT prefixes in the set.
static java.lang.String ATTR_REBUILD_ON_RECONFIG
          Whether every config change should trigger a rebuilding of the prefix set.
static java.lang.String ATTR_SEEDS_AS_SURT_PREFIXES
           
static java.lang.String ATTR_SURTS_DUMP_FILE
           
static java.lang.String ATTR_SURTS_SOURCE_FILE
           
static java.lang.Boolean DEFAULT_ALSO_CHECK_VIA
           
static java.lang.Boolean DEFAULT_REBUILD_ON_RECONFIG
           
protected  SurtPrefixSet surtPrefixes
           
 
Fields inherited from class org.archive.crawler.deciderules.ConfiguredDecideRule
ALLOWED_TYPES, ATTR_DECISION
 
Fields inherited from class org.archive.crawler.deciderules.DecideRule
ACCEPT, PASS, REJECT
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
SurtPrefixedDecideRule(java.lang.String name)
          Usual constructor.
 
Method Summary
 void addedSeed(CandidateURI curi)
           
protected  void buildSurtPrefixSet()
          Construct the set of prefixes to use from the seed list (which may include both URIs and '+'-prefixed directives).
protected  void dumpSurtPrefixSet()
          Dump the prefixes currently in use to the configured dump file (if any).
protected  boolean evaluate(java.lang.Object object)
          Evaluate whether the given object's URI is covered by the SURT prefix set.
protected  java.io.File getSeedfile()
          Dig through everything to get the crawl-global seeds file.
 void kickUpdate()
          Re-read prefixes after an update.
protected  java.lang.String prefixFrom(java.lang.String uri)
           
protected  void readPrefixes()
           
 
Methods inherited from class org.archive.crawler.deciderules.PredicatedDecideRule
decisionFor
 
Methods inherited from class org.archive.crawler.deciderules.ConfiguredDecideRule
singlePossibleNonPassDecision
 
Methods inherited from class org.archive.crawler.deciderules.DecideRule
getController
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_SURTS_SOURCE_FILE

public static final java.lang.String ATTR_SURTS_SOURCE_FILE
See Also:
Constant Field Values

ATTR_SEEDS_AS_SURT_PREFIXES

public static final java.lang.String ATTR_SEEDS_AS_SURT_PREFIXES
See Also:
Constant Field Values

ATTR_SURTS_DUMP_FILE

public static final java.lang.String ATTR_SURTS_DUMP_FILE
See Also:
Constant Field Values

ATTR_REBUILD_ON_RECONFIG

public static final java.lang.String ATTR_REBUILD_ON_RECONFIG
Whether every config change should trigger a rebuilding of the prefix set.

See Also:
Constant Field Values
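
One way to picture the effect of this setting is the sketch below. It is an assumption-laden illustration, not the class's code: the holder class RebuildSketch, its loader callback, and the onConfigChange method are hypothetical names, and the eager initial load in the constructor is a simplification.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Supplier;

public class RebuildSketch {
    private final Supplier<Set<String>> loader;   // hypothetical: reads prefixes from their source
    private final boolean rebuildOnReconfig;      // stands in for the rebuild-on-reconfig setting
    private Set<String> prefixes;

    public RebuildSketch(Supplier<Set<String>> loader, boolean rebuildOnReconfig) {
        this.loader = loader;
        this.rebuildOnReconfig = rebuildOnReconfig;
        this.prefixes = loader.get(); // initial build
    }

    /** Called after a configuration change; rebuilds only when the flag is set. */
    public void onConfigChange() {
        if (rebuildOnReconfig) {
            prefixes = loader.get();
        }
    }

    public Set<String> prefixes() {
        return prefixes;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        Supplier<Set<String>> loader = () -> {
            loads[0]++;
            return Collections.singleton("http://(org,archive,");
        };
        RebuildSketch rebuilding = new RebuildSketch(loader, true);
        rebuilding.onConfigChange();
        System.out.println(loads[0]); // prints 2: initial build plus one rebuild
    }
}
```

Leaving the flag off avoids re-reading a potentially large prefix source on every settings edit, at the cost of the set not reflecting changes until it is rebuilt some other way.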

DEFAULT_REBUILD_ON_RECONFIG

public static final java.lang.Boolean DEFAULT_REBUILD_ON_RECONFIG

ATTR_ALSO_CHECK_VIA

public static final java.lang.String ATTR_ALSO_CHECK_VIA
Whether the 'via' of a CrawlURI should also be checked to see whether it begins with one of the SURT prefixes in the set.

See Also:
Constant Field Values
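
The "one hop off" behaviour this option enables can be sketched as follows. This is an illustration under stated assumptions rather than the class's actual logic: both strings are assumed to be in SURT form already, the via is the URI from which the candidate was discovered and may be null, and ViaCheckSketch, matches, and covered are hypothetical names.

```java
import java.util.Collections;
import java.util.Set;

public class ViaCheckSketch {
    /** True if the SURT-form string begins with any prefix in the set; null never matches. */
    static boolean matches(Set<String> prefixes, String surt) {
        if (surt == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Checks the URI itself first; if that fails and alsoCheckVia is set,
     * accepts the URI when the page it was found on is covered by the set.
     */
    public static boolean covered(Set<String> prefixes, String uriSurt,
                                  String viaSurt, boolean alsoCheckVia) {
        if (matches(prefixes, uriSurt)) {
            return true;
        }
        return alsoCheckVia && matches(prefixes, viaSurt);
    }

    public static void main(String[] args) {
        Set<String> prefixes = Collections.singleton("http://(org,archive,");
        String uri = "http://(com,example,)/page";           // outside the prefix set
        String via = "http://(org,archive,www,)/links.html"; // inside the prefix set
        System.out.println(covered(prefixes, uri, via, false)); // prints false
        System.out.println(covered(prefixes, uri, via, true));  // prints true
    }
}
```

With the option enabled, a URI outside the prefix set is still covered when it was linked from a covered page, so the match extends exactly one link beyond the set.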

DEFAULT_ALSO_CHECK_VIA

public static final java.lang.Boolean DEFAULT_ALSO_CHECK_VIA

surtPrefixes

protected SurtPrefixSet surtPrefixes

Constructor Detail

SurtPrefixedDecideRule

public SurtPrefixedDecideRule(java.lang.String name)
Usual constructor.

Parameters:
name -

Method Detail

evaluate

protected boolean evaluate(java.lang.Object object)
Evaluate whether the given object's URI is covered by the SURT prefix set.

Specified by:
evaluate in class PredicatedDecideRule
Parameters:
object - Item to evaluate.
Returns:
true if item, as SURT form URI, is prefixed by an item in the set

readPrefixes

protected void readPrefixes()

dumpSurtPrefixSet

protected void dumpSurtPrefixSet()
Dump the prefixes currently in use to the configured dump file (if any).
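
A dump of this kind might look like the sketch below, which writes one prefix per line and does nothing when no dump file is configured. The sorted, one-per-line UTF-8 layout is an assumption, not a documented file format, and DumpSketch and dump are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.TreeSet;

public class DumpSketch {
    /**
     * Writes the prefixes to dumpFile, sorted, one per line.
     * Returns false without writing when dumpFile is null (no dump configured).
     */
    public static boolean dump(Iterable<String> prefixes, Path dumpFile) throws IOException {
        if (dumpFile == null) {
            return false;
        }
        TreeSet<String> sorted = new TreeSet<>();
        for (String prefix : prefixes) {
            sorted.add(prefix);
        }
        Files.write(dumpFile, sorted, StandardCharsets.UTF_8);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("surts", ".dump");
        dump(Arrays.asList("http://(org,archive,", "http://(com,example,)/"), out);
        // Read the file back to show what was written.
        for (String line : Files.readAllLines(out, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
        Files.delete(out);
    }
}
```

A dump like this is mainly useful for checking which prefixes were actually derived from the seeds and source file.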


buildSurtPrefixSet

protected void buildSurtPrefixSet()
Construct the set of prefixes to use from the seed list (which may include both URIs and '+'-prefixed directives).
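
A seed list that mixes plain URIs with '+'-prefixed directives could be split as sketched below. This is a hedged illustration only: skipping blank lines and '#' comment lines, and treating the text after '+' as an explicit prefix entry, are assumptions about the file format, and SeedLineSketch and its members are hypothetical names. Converting each plain URI into its implied SURT prefix is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeedLineSketch {
    /** Plain seed URIs, from which prefixes would be implied. */
    public final List<String> seedUris = new ArrayList<>();
    /** Text of '+'-prefixed lines, with the leading '+' removed. */
    public final List<String> directives = new ArrayList<>();

    public static SeedLineSketch parse(Iterable<String> lines) {
        SeedLineSketch result = new SeedLineSketch();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // assumption: blank lines and '#' comments carry no entries
            }
            if (line.startsWith("+")) {
                result.directives.add(line.substring(1).trim());
            } else {
                result.seedUris.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SeedLineSketch parsed = parse(Arrays.asList(
                "# crawl seeds",
                "http://www.archive.org/",
                "",
                "+http://(org,archive,"));
        System.out.println(parsed.seedUris);   // prints [http://www.archive.org/]
        System.out.println(parsed.directives); // prints [http://(org,archive,]
    }
}
```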


kickUpdate

public void kickUpdate()
Re-read prefixes after an update.

Overrides:
kickUpdate in class DecideRule
See Also:
CrawlScope.kickUpdate()

getSeedfile

protected java.io.File getSeedfile()
Dig through everything to get the crawl-global seeds file. Add self as listener while at it.

Returns:
Seed list file

addedSeed

public void addedSeed(CandidateURI curi)
Specified by:
addedSeed in interface SeedListener

prefixFrom

protected java.lang.String prefixFrom(java.lang.String uri)


Copyright © 2003-2011 Internet Archive. All Rights Reserved.