org.archive.crawler.filter
Class PathologicalPathFilter
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Filter
org.archive.crawler.filter.URIRegExpFilter
org.archive.crawler.filter.PathologicalPathFilter
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
public class PathologicalPathFilter
- extends URIRegExpFilter
Checks if a URI contains a repeated pattern.
This filter checks whether a pattern is repeated a specific number of times.
It is intended to avoid crawler traps in which the server keeps appending the
same pattern to the requested URI, for example: http://host/img/img/img/img....
The filter returns TRUE if the path is pathological, FALSE otherwise.
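The repeated-pattern check described above can be illustrated with a back-reference regular expression. The following is a minimal sketch, not the regexp this class actually builds: the class name RepeatedSegmentSketch, the helper names buildRegexp and isPathological, and the choice to treat a single path segment as the "pattern" are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class RepeatedSegmentSketch {

    /**
     * Hypothetical helper (not the Heritrix implementation): builds a regexp
     * that matches when one path segment occurs at least `repetitions` times
     * in a row, e.g. /img/img/img/ for repetitions = 3.
     */
    public static String buildRegexp(int repetitions) {
        if (repetitions < 2) {
            throw new IllegalArgumentException("repetitions must be >= 2");
        }
        // ([^/]+/) captures one segment with its trailing slash;
        // \1{n-1,} requires the same segment to follow n-1 or more times.
        return ".*?/([^/]+/)\\1{" + (repetitions - 1) + ",}.*";
    }

    /** Returns true if the URI contains the repeated segment, false otherwise. */
    public static boolean isPathological(String uri, int repetitions) {
        return Pattern.matches(buildRegexp(repetitions), uri);
    }

    public static void main(String[] args) {
        System.out.println(isPathological("http://host/img/img/img/img/a.gif", 3));
        System.out.println(isPathological("http://host/img/css/img/a.gif", 3));
    }
}
```

In this sketch only consecutive repeats count, so a URI that revisits a segment after an intervening one (img/css/img) is not flagged.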
- Author:
- John Erik Halse
- See Also:
- Serialized Form
Constructor Summary |
PathologicalPathFilter(java.lang.String name)
Deprecated. Constructs a new PathologicalPathFilter. |
Method Summary |
protected boolean |
getFilterOffPosition(CrawlURI curi)
Deprecated. If the filter is disabled, the value returned by this method is
what filters return as their disabled setting. |
protected java.lang.String |
getRegexp(java.lang.Object o)
Deprecated. Construct the regexp string to be matched against the URI. |
Methods inherited from class org.archive.crawler.settings.ComplexType |
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
getName, hashCode |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
ATTR_REPETITIONS
public static final java.lang.String ATTR_REPETITIONS
- Deprecated.
- See Also:
- Constant Field Values
DEFAULT_REPETITIONS
public static final java.lang.Integer DEFAULT_REPETITIONS
- Deprecated.
PathologicalPathFilter
public PathologicalPathFilter(java.lang.String name)
- Deprecated.
- Constructs a new PathologicalPathFilter.
- Parameters:
name
- the name of the filter.
getRegexp
protected java.lang.String getRegexp(java.lang.Object o)
- Deprecated.
- Construct the regexp string to be matched against the URI.
- Overrides:
getRegexp
in class URIRegExpFilter
- Parameters:
o
- an object to extract a URI from.
- Returns:
- the regexp pattern.
getFilterOffPosition
protected boolean getFilterOffPosition(CrawlURI curi)
- Deprecated.
- Description copied from class:
Filter
- If the filter is disabled, the value returned by this method is
what filters return as their disabled setting.
The default is to return 'true' so that processing continues, but some
filters -- the exclude filters, for example -- will want to return
false when disabled so that processing can continue.
- Overrides:
getFilterOffPosition
in class Filter
- Parameters:
curi
- CrawlURI to use as context. Passed curi can be null.
- Returns:
- This filter's 'off' position.
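The 'off' position idea can be sketched in isolation. This is an illustrative sketch only, not the Heritrix Filter API: SketchFilter, SketchExcludeFilter, setEnabled, and the String-based accepts signature are hypothetical stand-ins, and the real classes read their enabled state from crawl settings rather than from a field.

```java
public class FilterOffPositionSketch {

    /** Hypothetical stand-in for a filter base class. */
    public abstract static class SketchFilter {
        private boolean enabled = true;

        public void setEnabled(boolean enabled) {
            this.enabled = enabled;
        }

        /** Value accepts() returns while the filter is disabled. */
        protected boolean getFilterOffPosition() {
            return true;
        }

        /** The actual check, consulted only while the filter is enabled. */
        protected abstract boolean innerAccepts(String uri);

        public final boolean accepts(String uri) {
            return enabled ? innerAccepts(uri) : getFilterOffPosition();
        }
    }

    /** Hypothetical exclude-style filter: 'true' means "exclude this URI". */
    public static class SketchExcludeFilter extends SketchFilter {
        @Override
        protected boolean getFilterOffPosition() {
            return false; // disabled: exclude nothing
        }

        @Override
        protected boolean innerAccepts(String uri) {
            return uri.contains("/img/img/img/");
        }
    }

    public static void main(String[] args) {
        SketchExcludeFilter filter = new SketchExcludeFilter();
        String uri = "http://host/img/img/img/a.gif";
        System.out.println(filter.accepts(uri)); // enabled: runs the check
        filter.setEnabled(false);
        System.out.println(filter.accepts(uri)); // disabled: 'off' position
    }
}
```

Overriding the 'off' position lets a disabled exclude-style filter step out of the way instead of blocking every URI.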
Copyright © 2003-2011 Internet Archive. All Rights Reserved.