org.archive.crawler.deciderules
Class SurtPrefixedDecideRule
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.deciderules.DecideRule
org.archive.crawler.deciderules.ConfiguredDecideRule
org.archive.crawler.deciderules.PredicatedDecideRule
org.archive.crawler.deciderules.SurtPrefixedDecideRule
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean, SeedListener
- Direct Known Subclasses:
- NotSurtPrefixedDecideRule, OnDomainsDecideRule, OnHostsDecideRule, ScopePlusOneDecideRule
public class SurtPrefixedDecideRule
- extends PredicatedDecideRule
- implements SeedListener
Rule that applies the configured decision to any URI that, when
expressed in SURT (Sort-friendly URI Reordering Transform) form,
begins with one of the prefixes in the configured set.
The set can be filled with SURT prefixes implied by or
listed in the seeds file, or listed in another external file.
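As a rough illustration of the prefix test described above, the following sketch converts a URI to a simplified SURT form and checks it against a prefix set. The names `SurtPrefixSketch`, `toSurt`, and `matches` are hypothetical, and the conversion is a simplification (it ignores ports, userinfo, and query strings); it is not Heritrix's own SURT or SurtPrefixSet code.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only: hypothetical names, simplified SURT conversion. */
final class SurtPrefixSketch {

    /** Convert a plain URI to a simplified SURT form: scheme://(tld,domain,host,)/path */
    static String toSurt(String uri) {
        URI u = URI.create(uri);
        // Reverse the host labels so the most general part (the TLD) comes first.
        List<String> labels = Arrays.asList(u.getHost().toLowerCase().split("\\."));
        Collections.reverse(labels);
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return u.getScheme() + "://(" + String.join(",", labels) + ",)" + path;
    }

    /** True if the SURT form of the URI begins with any prefix in the set. */
    static boolean matches(Set<String> prefixes, String uri) {
        String surt = toSurt(uri);
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> prefixes = new TreeSet<>();
        prefixes.add("http://(org,archive,"); // covers archive.org and its subdomains

        // prints http://(org,archive,www,)/movies/index.html
        System.out.println(toSurt("http://www.archive.org/movies/index.html"));
        // prints true
        System.out.println(matches(prefixes, "http://www.archive.org/movies/index.html"));
        // prints false
        System.out.println(matches(prefixes, "http://example.com/"));
    }
}
```

Because the host labels are reversed, a single string prefix such as `http://(org,archive,` covers a whole domain, which is what makes a plain starts-with test sufficient.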
The "also-check-via" option to implement "one hop off"
scoping derives from a contribution by Shifra Raffel
of the California Digital Library.
- Author:
- gojomo
- See Also:
- Serialized Form
Method Summary
void addedSeed(CandidateURI curi)
protected void buildSurtPrefixSet()
- Construct the set of prefixes to use, from the seed list (which may include both URIs and '+'-prefixed directives).
protected void dumpSurtPrefixSet()
- Dump the current prefixes in use to configured dump file (if any)
protected boolean evaluate(java.lang.Object object)
- Evaluate whether given object's URI is covered by the SURT prefix set
protected java.io.File getSeedfile()
- Dig through everything to get the crawl-global seeds file.
void kickUpdate()
- Re-read prefixes after an update.
protected java.lang.String prefixFrom(java.lang.String uri)
protected void readPrefixes()
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
ATTR_SURTS_SOURCE_FILE
public static final java.lang.String ATTR_SURTS_SOURCE_FILE
- See Also:
- Constant Field Values
ATTR_SEEDS_AS_SURT_PREFIXES
public static final java.lang.String ATTR_SEEDS_AS_SURT_PREFIXES
- See Also:
- Constant Field Values
ATTR_SURTS_DUMP_FILE
public static final java.lang.String ATTR_SURTS_DUMP_FILE
- See Also:
- Constant Field Values
ATTR_REBUILD_ON_RECONFIG
public static final java.lang.String ATTR_REBUILD_ON_RECONFIG
- Whether every config change should trigger a
rebuilding of the prefix set.
- See Also:
- Constant Field Values
DEFAULT_REBUILD_ON_RECONFIG
public static final java.lang.Boolean DEFAULT_REBUILD_ON_RECONFIG
ATTR_ALSO_CHECK_VIA
public static final java.lang.String ATTR_ALSO_CHECK_VIA
- Whether the 'via' of a CrawlURI (the URI from which it was
discovered) should also be checked against the set of SURT prefixes.
- See Also:
- Constant Field Values
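The "one hop off" effect of this option can be sketched as follows. This is a minimal illustration under stated assumptions: `ViaCheckSketch`, `hasPrefix`, and `evaluate` are hypothetical names, both inputs are assumed to be in SURT form already, and the ordering of the two checks is a guess rather than a description of the actual implementation.

```java
import java.util.Collections;
import java.util.Set;

/** Illustration only: hypothetical names; inputs are assumed to be SURT-form strings. */
final class ViaCheckSketch {

    /** True if the SURT string is non-null and begins with any prefix in the set. */
    static boolean hasPrefix(Set<String> prefixes, String surt) {
        if (surt == null) {
            return false;
        }
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Accept the URI if it is covered by the prefix set or, when alsoCheckVia
     * is enabled, if the URI it was discovered from (its via) is covered.
     */
    static boolean evaluate(Set<String> prefixes, String surtUri,
                            String surtVia, boolean alsoCheckVia) {
        if (alsoCheckVia && hasPrefix(prefixes, surtVia)) {
            return true;
        }
        return hasPrefix(prefixes, surtUri);
    }

    public static void main(String[] args) {
        Set<String> prefixes = Collections.singleton("http://(org,archive,");
        String offSite = "http://(com,example,)/page.html";
        String onSiteVia = "http://(org,archive,www,)/links.html";

        System.out.println(evaluate(prefixes, offSite, onSiteVia, false)); // prints false
        System.out.println(evaluate(prefixes, offSite, onSiteVia, true));  // prints true
    }
}
```

With the option enabled, an off-site URI linked directly from an in-scope page is accepted, but URIs discovered from that off-site page are not, since their via is itself out of scope.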
DEFAULT_ALSO_CHECK_VIA
public static final java.lang.Boolean DEFAULT_ALSO_CHECK_VIA
surtPrefixes
protected SurtPrefixSet surtPrefixes
SurtPrefixedDecideRule
public SurtPrefixedDecideRule(java.lang.String name)
- Usual constructor.
- Parameters:
name
- Name of this rule.
evaluate
protected boolean evaluate(java.lang.Object object)
- Evaluate whether the given object's URI is covered by the SURT prefix set.
- Specified by:
evaluate
in class PredicatedDecideRule
- Parameters:
object
- Item to evaluate.
- Returns:
- true if the item's URI, in SURT form, begins with a prefix in the set
readPrefixes
protected void readPrefixes()
dumpSurtPrefixSet
protected void dumpSurtPrefixSet()
- Dump the current prefixes in use to configured dump file (if any)
buildSurtPrefixSet
protected void buildSurtPrefixSet()
- Construct the set of prefixes to use, from the seed list
(which may include both URIs and '+'-prefixed directives).
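One way to picture building a prefix set from a seed list that mixes URIs and '+'-prefixed directives is sketched below. Everything here is an assumption made for illustration: the class and method names are hypothetical, a '+' line is taken to carry a ready-made SURT prefix, a plain URI is reduced to its SURT form truncated after the last '/', and blank lines and '#' comments are skipped. The real parsing rules may differ.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Illustration only: hypothetical names and simplified seed-line handling. */
final class SeedPrefixBuilderSketch {

    /** Simplified SURT conversion: scheme://(tld,domain,host,)/path */
    static String toSurt(String uri) {
        URI u = URI.create(uri);
        List<String> labels = Arrays.asList(u.getHost().toLowerCase().split("\\."));
        Collections.reverse(labels);
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return u.getScheme() + "://(" + String.join(",", labels) + ",)" + path;
    }

    /** Prefix implied by a seed URI: its SURT form, cut after the last '/'. */
    static String impliedPrefix(String uri) {
        String surt = toSurt(uri);
        return surt.substring(0, surt.lastIndexOf('/') + 1);
    }

    /** Build a sorted prefix set from seed lines (URIs and '+' directives). */
    static SortedSet<String> build(List<String> seedLines) {
        SortedSet<String> prefixes = new TreeSet<>();
        for (String raw : seedLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // assumed: blank lines and comments carry no prefix
            }
            if (line.startsWith("+")) {
                prefixes.add(line.substring(1).trim()); // assumed: explicit SURT prefix
            } else {
                prefixes.add(impliedPrefix(line));
            }
        }
        return prefixes;
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList(
                "# example seed list",
                "http://www.archive.org/movies/index.html",
                "+http://(org,example,");
        // prints [http://(org,archive,www,)/movies/, http://(org,example,]
        System.out.println(build(seeds));
    }
}
```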
kickUpdate
public void kickUpdate()
- Re-read prefixes after an update.
- Overrides:
kickUpdate
in class DecideRule
- See Also:
CrawlScope.kickUpdate()
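The re-read behaviour might be pictured as below. This sketch assumes that the re-read is gated by the rebuild-on-reconfig setting documented above, which this page does not state outright; `ReloadOnUpdateSketch`, `prefixSource`, and `covers` are hypothetical names, and a `Supplier` stands in for whatever file or seed list supplies the prefixes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Illustration only: hypothetical names; a Supplier stands in for the prefix source. */
final class ReloadOnUpdateSketch {
    private final Supplier<Set<String>> prefixSource;
    private final boolean rebuildOnReconfig;
    private Set<String> prefixes;

    ReloadOnUpdateSketch(Supplier<Set<String>> prefixSource, boolean rebuildOnReconfig) {
        this.prefixSource = prefixSource;
        this.rebuildOnReconfig = rebuildOnReconfig;
        readPrefixes();
    }

    /** Take a fresh snapshot of the configured prefixes. */
    private void readPrefixes() {
        prefixes = new HashSet<>(prefixSource.get());
    }

    /** Called after a settings update; re-reads only if configured to (assumed gating). */
    synchronized void kickUpdate() {
        if (rebuildOnReconfig) {
            readPrefixes();
        }
    }

    /** True if the SURT string begins with any prefix in the current snapshot. */
    synchronized boolean covers(String surt) {
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> configured = new HashSet<>();
        ReloadOnUpdateSketch rule = new ReloadOnUpdateSketch(() -> configured, true);

        configured.add("http://(org,archive,");
        System.out.println(rule.covers("http://(org,archive,www,)/")); // prints false
        rule.kickUpdate();
        System.out.println(rule.covers("http://(org,archive,www,)/")); // prints true
    }
}
```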
getSeedfile
protected java.io.File getSeedfile()
- Locate the crawl-global seeds file, registering this rule
as a seed listener along the way.
- Returns:
- Seed list file
addedSeed
public void addedSeed(CandidateURI curi)
- Specified by:
addedSeed
in interface SeedListener
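A listener of this kind could keep the prefix set current when seeds are added during a crawl. The sketch below is one plausible approach, not the documented behaviour: the interface `SeedAddedListenerSketch` is a hypothetical stand-in for SeedListener (taking a plain String rather than a CandidateURI), the SURT conversion is simplified, and the copy-and-swap update is an assumption.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for a seed listener; takes a plain URI string. */
interface SeedAddedListenerSketch {
    void addedSeed(String seedUri);
}

/** Illustration only: extends its prefix set whenever a seed is added. */
final class SeedAwareRuleSketch implements SeedAddedListenerSketch {
    private volatile SortedSet<String> prefixes = new TreeSet<>();

    /** Simplified SURT conversion: scheme://(tld,domain,host,)/path */
    static String toSurt(String uri) {
        URI u = URI.create(uri);
        List<String> labels = Arrays.asList(u.getHost().toLowerCase().split("\\."));
        Collections.reverse(labels);
        String rawPath = u.getRawPath();
        String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
        return u.getScheme() + "://(" + String.join(",", labels) + ",)" + path;
    }

    /** Prefix implied by a URI: its SURT form, cut after the last '/'. */
    static String prefixFrom(String uri) {
        String surt = toSurt(uri);
        return surt.substring(0, surt.lastIndexOf('/') + 1);
    }

    /** Copy the current set, add the new seed's prefix, then swap the copy in. */
    @Override
    public synchronized void addedSeed(String seedUri) {
        SortedSet<String> updated = new TreeSet<>(prefixes);
        updated.add(prefixFrom(seedUri));
        prefixes = updated;
    }

    /** True if the URI's SURT form begins with any known prefix. */
    boolean covers(String uri) {
        String surt = toSurt(uri);
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SeedAwareRuleSketch rule = new SeedAwareRuleSketch();
        System.out.println(rule.covers("http://www.archive.org/movies/a.html")); // prints false
        rule.addedSeed("http://www.archive.org/movies/");
        System.out.println(rule.covers("http://www.archive.org/movies/a.html")); // prints true
    }
}
```

Swapping in a fresh copy means a reader iterating over the old set never sees it modified mid-iteration.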
prefixFrom
protected java.lang.String prefixFrom(java.lang.String uri)
Copyright © 2003-2011 Internet Archive. All Rights Reserved.