org.archive.crawler.deciderules
Class PathologicalPathDecideRule
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.deciderules.DecideRule
org.archive.crawler.deciderules.ConfiguredDecideRule
org.archive.crawler.deciderules.PredicatedDecideRule
org.archive.crawler.deciderules.MatchesRegExpDecideRule
org.archive.crawler.deciderules.PathologicalPathDecideRule
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
public class PathologicalPathDecideRule
- extends MatchesRegExpDecideRule
Rule that REJECTs any URI containing an excessive number of identical,
consecutive path segments (e.g. http://example.com/a/a/a/boo.html contains
three consecutive '/a' segments).
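The check described above can be illustrated with a backreference regexp. The following is a minimal sketch only: the class name RepeatedSegmentSketch, the helper names, and the exact pattern are assumptions made for illustration and are not taken from this page. It also assumes that a repetitions value of n means "reject when a segment is followed by at least n further copies of itself".

```java
import java.util.regex.Pattern;

// Illustrative sketch, not the Heritrix source. All names here are hypothetical.
final class RepeatedSegmentSketch {

    // Builds a pattern that matches a whole URI when some "segment/" text,
    // directly preceded by a slash, is followed by at least `repetitions`
    // further copies of itself. Because the captured group is lazy (.*?/),
    // it may also span several segments, e.g. "a/b/" in /a/b/a/b/a/b/.
    static String buildRegexp(int repetitions) {
        return ".*?/(.*?/)\\1{" + repetitions + ",}.*";
    }

    // Returns true when the URI contains such a run of repeated segments.
    static boolean isPathological(String uri, int repetitions) {
        return Pattern.matches(buildRegexp(repetitions), uri);
    }

    public static void main(String[] args) {
        // "a/" occurs three times in a row: one capture plus two repeats.
        System.out.println(isPathological("http://example.com/a/a/a/boo.html", 2)); // true
        // The same segments appear, but not consecutively.
        System.out.println(isPathological("http://example.com/a/b/a/boo.html", 2)); // false
    }
}
```

A backreference keeps the check to a single regexp, which fits a rule that extends MatchesRegExpDecideRule; the pattern the class really builds should be taken from its source rather than from this sketch.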
- Author:
- gojomo
- See Also:
- Serialized Form
Method Summary
protected java.lang.String constructRegexp()
protected java.lang.String getRegexp(java.lang.Object o)
- Construct the regexp string to be matched against the URI.
void kickUpdate()
- Repetitions may have changed; refresh constructedRegexp
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
ATTR_REPETITIONS
public static final java.lang.String ATTR_REPETITIONS
- See Also:
- Constant Field Values
DEFAULT_REPETITIONS
static final java.lang.Integer DEFAULT_REPETITIONS
- Default maximum repetitions.
Package-private (default access) so that unit tests can access it.
constructedRegexp
protected java.lang.String constructedRegexp
PathologicalPathDecideRule
public PathologicalPathDecideRule(java.lang.String name)
- Constructs a new PathologicalPathDecideRule.
- Parameters:
name
- the name of the rule.
getRegexp
protected java.lang.String getRegexp(java.lang.Object o)
- Construct the regexp string to be matched against the URI.
- Overrides:
getRegexp
in class MatchesRegExpDecideRule
- Parameters:
o
- an object to extract a URI from.
- Returns:
- the regexp pattern.
constructRegexp
protected java.lang.String constructRegexp()
kickUpdate
public void kickUpdate()
- Repetitions may have changed; refresh constructedRegexp
- Overrides:
kickUpdate
in class DecideRule
- See Also:
DecideRule.kickUpdate()
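The relationship between the constructedRegexp field, constructRegexp() and kickUpdate() suggests a cache-and-refresh arrangement. The sketch below illustrates that idea under stated assumptions: the class CachedRegexpSketch, its repetitions field and its setter are hypothetical, and the pattern string is illustrative. It does not reproduce the Heritrix settings framework or claim to be the class's implementation.

```java
// Illustrative sketch of caching a constructed regexp and refreshing it on demand.
// All names other than constructedRegexp, constructRegexp, getRegexp and kickUpdate
// are hypothetical, and the bodies are assumptions rather than the real source.
final class CachedRegexpSketch {

    private int repetitions;           // hypothetical stand-in for the configured setting
    private String constructedRegexp;  // cached pattern string

    CachedRegexpSketch(int repetitions) {
        this.repetitions = repetitions;
        kickUpdate(); // build the initial pattern
    }

    // Changes the setting only; the cached pattern stays stale until kickUpdate().
    void setRepetitions(int repetitions) {
        this.repetitions = repetitions;
    }

    // Rebuilds the pattern string from the current setting (illustrative pattern).
    String constructRegexp() {
        return ".*?/(.*?/)\\1{" + repetitions + ",}.*";
    }

    // The setting may have changed, so refresh the cached pattern.
    void kickUpdate() {
        constructedRegexp = constructRegexp();
    }

    // Returns the cached pattern without rebuilding it.
    String getRegexp() {
        return constructedRegexp;
    }
}
```

Caching avoids rebuilding the string for every URI that is tested; the cost is that a changed setting takes effect only after an explicit refresh.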
Copyright © 2003-2011 Internet Archive. All Rights Reserved.