org.archive.crawler.deciderules
Class PathologicalPathDecideRule

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.deciderules.DecideRule
                      extended by org.archive.crawler.deciderules.ConfiguredDecideRule
                          extended by org.archive.crawler.deciderules.PredicatedDecideRule
                              extended by org.archive.crawler.deciderules.MatchesRegExpDecideRule
                                  extended by org.archive.crawler.deciderules.PathologicalPathDecideRule
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

public class PathologicalPathDecideRule
extends MatchesRegExpDecideRule

Rule that REJECTs any URI containing an excessive number of identical, consecutive path segments (e.g. http://example.com/a/a/a/boo.html contains three consecutive '/a' segments).
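The check described above can be illustrated with a back-referencing regular expression. The following is a minimal sketch, not the class's confirmed implementation: the class name RepeatedSegmentSketch, the helpers buildRegexp and isPathological, and the exact pattern are assumptions made for illustration. In this sketch a URI matches when a slash-terminated run of text is immediately followed by at least maxRepetitions further copies of itself.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names belong to the Heritrix API.
class RepeatedSegmentSketch {

    // Builds a pattern that matches a URI in which some "segment/" run is
    // followed by at least maxRepetitions identical copies of itself.
    // The pattern shape is an assumption chosen for this sketch.
    static String buildRegexp(int maxRepetitions) {
        if (maxRepetitions < 1) {
            throw new IllegalArgumentException("maxRepetitions must be positive");
        }
        // (.*?/) captures one slash-terminated run; \1{n,} demands n or more repeats.
        return ".*?/(.*?/)\\1{" + maxRepetitions + ",}.*";
    }

    // Returns true when the whole URI string matches the constructed pattern.
    static boolean isPathological(String uri, int maxRepetitions) {
        return Pattern.matches(buildRegexp(maxRepetitions), uri);
    }
}
```

With a threshold of 2, isPathological("http://example.com/a/a/a/boo.html", 2) matches because "a/" is followed by two more copies of "a/", while the same call on http://example.com/a/a/boo.html does not match.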

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_REPETITIONS
           
protected  java.lang.String constructedRegexp
           
(package private) static java.lang.Integer DEFAULT_REPETITIONS
          Default maximum repetitions.
 
Fields inherited from class org.archive.crawler.deciderules.MatchesRegExpDecideRule
ATTR_REGEXP
 
Fields inherited from class org.archive.crawler.deciderules.ConfiguredDecideRule
ALLOWED_TYPES, ATTR_DECISION
 
Fields inherited from class org.archive.crawler.deciderules.DecideRule
ACCEPT, PASS, REJECT
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
PathologicalPathDecideRule(java.lang.String name)
          Constructs a new PathologicalPathDecideRule.
 
Method Summary
protected  java.lang.String constructRegexp()
           
protected  java.lang.String getRegexp(java.lang.Object o)
          Construct the regexp string to be matched against the URI.
 void kickUpdate()
          Refreshes constructedRegexp, since the repetitions setting may have changed.
 
Methods inherited from class org.archive.crawler.deciderules.MatchesRegExpDecideRule
evaluate
 
Methods inherited from class org.archive.crawler.deciderules.PredicatedDecideRule
decisionFor
 
Methods inherited from class org.archive.crawler.deciderules.ConfiguredDecideRule
singlePossibleNonPassDecision
 
Methods inherited from class org.archive.crawler.deciderules.DecideRule
getController
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_REPETITIONS

public static final java.lang.String ATTR_REPETITIONS
See Also:
Constant Field Values

DEFAULT_REPETITIONS

static final java.lang.Integer DEFAULT_REPETITIONS
Default maximum repetitions. Declared with default (package-private) access so that the unit test can read it.


constructedRegexp

protected java.lang.String constructedRegexp
Constructor Detail

PathologicalPathDecideRule

public PathologicalPathDecideRule(java.lang.String name)
Constructs a new PathologicalPathDecideRule.

Parameters:
name - the name of the rule.
Method Detail

getRegexp

protected java.lang.String getRegexp(java.lang.Object o)
Construct the regexp string to be matched against the URI.

Overrides:
getRegexp in class MatchesRegExpDecideRule
Parameters:
o - an object to extract a URI from.
Returns:
the regexp pattern.

constructRegexp

protected java.lang.String constructRegexp()

kickUpdate

public void kickUpdate()
Refreshes constructedRegexp, since the repetitions setting may have changed.

Overrides:
kickUpdate in class DecideRule
See Also:
DecideRule.kickUpdate()
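The relationship between constructRegexp(), constructedRegexp and kickUpdate() suggests a cache-and-refresh pattern: the pattern string is built once, reused on each evaluation, and rebuilt only when an update is signalled. The sketch below is a standalone illustration under that assumption. The class CachedRegexpSketch, its setter, and the pattern it builds are hypothetical; only the member names constructRegexp, constructedRegexp, getRegexp and kickUpdate come from this page, and their bodies here are guesses rather than the documented behaviour.

```java
// Hypothetical standalone class; it does not extend any Heritrix type.
class CachedRegexpSketch {

    // Assumed configuration value standing in for the repetitions setting.
    private int maxRepetitions;

    // Cached pattern string, rebuilt only by kickUpdate().
    private String constructedRegexp;

    CachedRegexpSketch(int maxRepetitions) {
        this.maxRepetitions = maxRepetitions;
        this.constructedRegexp = constructRegexp();
    }

    // Changes the setting without touching the cached pattern.
    void setMaxRepetitions(int maxRepetitions) {
        this.maxRepetitions = maxRepetitions;
    }

    // Builds a fresh pattern string from the current setting (assumed pattern shape).
    String constructRegexp() {
        return ".*?/(.*?/)\\1{" + maxRepetitions + ",}.*";
    }

    // Signals that settings may have changed; refreshes the cached pattern.
    void kickUpdate() {
        constructedRegexp = constructRegexp();
    }

    // Returns the cached pattern without rebuilding it.
    String getRegexp() {
        return constructedRegexp;
    }
}
```

In this sketch a change to the setting is not visible through getRegexp() until kickUpdate() has been called, which keeps string construction off the per-URI evaluation path.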


Copyright © 2003-2011 Internet Archive. All Rights Reserved.