org.archive.crawler.url.canonicalize
Class BaseRule
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.url.canonicalize.BaseRule
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean, CanonicalizationRule
- Direct Known Subclasses:
- FixupQueryStr, LowercaseRule, RegexRule, StripExtraSlashes, StripSessionCFIDs, StripSessionIDs, StripUserinfoRule, StripWWWNRule, StripWWWRule
public abstract class BaseRule
- extends ModuleType
- implements CanonicalizationRule
Base class for all URL canonicalization rules that are configurable
via the Heritrix settings system.
This base class is abstract. Subclasses must implement the
CanonicalizationRule.canonicalize(String, Object)
method.
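The subclassing pattern described above can be sketched roughly as follows. This is a self-contained illustration, not Heritrix code: `Rule`, `SketchBaseRule`, `LowercaseSketch`, and the boolean `enabled` flag are hypothetical stand-ins, and the `String` return type of `canonicalize` is an assumption. The real BaseRule extends ModuleType and consults the settings system rather than a plain field.

```java
import java.util.Locale;

public class RuleSketch {
    /** Hypothetical stand-in for CanonicalizationRule (return type assumed). */
    public interface Rule {
        String canonicalize(String url, Object context);
        boolean isEnabled(Object context);
    }

    /** Hypothetical stand-in for BaseRule: stores name and description only. */
    public abstract static class SketchBaseRule implements Rule {
        private final String name;
        private final String description;
        // Assumption: a plain flag replaces the settings-system lookup.
        private boolean enabled = true;

        protected SketchBaseRule(String name, String description) {
            this.name = name;
            this.description = description;
        }

        public String getName() { return name; }
        public String getDescription() { return description; }
        public void setEnabled(boolean enabled) { this.enabled = enabled; }

        @Override
        public boolean isEnabled(Object context) { return enabled; }
    }

    /** A concrete rule only has to supply canonicalize(String, Object). */
    public static class LowercaseSketch extends SketchBaseRule {
        public LowercaseSketch() {
            super("lowercase-sketch", "Lowercases the whole URL.");
        }

        @Override
        public String canonicalize(String url, Object context) {
            return url.toLowerCase(Locale.ROOT);
        }
    }

    /** Applies the rule only when it reports itself enabled for the context. */
    public static String apply(Rule rule, String url, Object context) {
        return rule.isEnabled(context) ? rule.canonicalize(url, context) : url;
    }

    public static void main(String[] args) {
        LowercaseSketch rule = new LowercaseSketch();
        System.out.println(apply(rule, "HTTP://Example.COM/A", null));
        rule.setEnabled(false);
        System.out.println(apply(rule, "HTTP://Example.COM/A", null));
    }
}
```

In this sketch, a disabled rule leaves the URL untouched, which is one plausible way a caller might combine `isEnabled` with `canonicalize`.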
- Version:
- $Date: 2005-11-04 23:00:23 +0000 (Fri, 04 Nov 2005) $, $Revision: 3932 $
- Author:
- stack
- See Also:
- Serialized Form
Constructor Summary
BaseRule(java.lang.String name,
java.lang.String description)
Constructor.
Method Summary
protected java.lang.String
doStripRegexMatch(java.lang.String url,
java.util.regex.Matcher matcher)
Run a regex that strips elements of a string.
boolean
isEnabled(java.lang.Object context)
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
ATTR_ENABLED
public static final java.lang.String ATTR_ENABLED
- See Also:
- Constant Field Values
BaseRule
public BaseRule(java.lang.String name,
java.lang.String description)
- Constructor.
- Parameters:
name
- Name of this canonicalization rule.
description
- Description of what this rule does.
isEnabled
public boolean isEnabled(java.lang.Object context)
- Specified by:
isEnabled
in interface CanonicalizationRule
- Parameters:
context
- An object that will provide context for the settings
system. The UURI of the URL we're canonicalizing is an example of
an object that provides context.
- Returns:
- True if this rule is enabled and should be run.
doStripRegexMatch
protected java.lang.String doStripRegexMatch(java.lang.String url,
java.util.regex.Matcher matcher)
- Run a regex that strips elements of a string.
Assumes the regex is written to strip elements from the passed
string, and that on a match, concatenating group 1
and group 2 yields the desired result.
- Parameters:
url
- Url to search in.
matcher
- Matcher whose pattern yields a group 1 and a group 2 on a
match (non-null).
- Returns:
- The original
url
if there is no match; otherwise the concatenation of group 1
and group 2.
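The documented contract (return the url unchanged when there is no match, otherwise group 1 followed by group 2) can be illustrated with a small standalone sketch. This is not the Heritrix source: the class name `StripRegexSketch`, the `STRIP_WWW` pattern, the use of `Matcher.matches()` rather than `find()`, and the treatment of a non-participating group as an empty string are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripRegexSketch {
    // Hypothetical pattern: group 1 is the scheme, group 2 is everything
    // after a leading "www."; the "www." itself sits outside both groups.
    public static final Pattern STRIP_WWW =
        Pattern.compile("(?i)^(https?://)(?:www\\.)(.*)$");

    // Assumption: a group that did not participate counts as empty.
    private static String emptyIfNull(String s) {
        return s == null ? "" : s;
    }

    // Returns url unchanged on no match, else group 1 + group 2.
    public static String stripRegexMatch(String url, Matcher matcher) {
        if (matcher == null || !matcher.matches()) {
            return url;
        }
        return emptyIfNull(matcher.group(1)) + emptyIfNull(matcher.group(2));
    }

    public static void main(String[] args) {
        String withWww = "http://www.example.com/index.html";
        // The "www." falls between the two groups, so it is dropped.
        System.out.println(stripRegexMatch(withWww, STRIP_WWW.matcher(withWww)));

        String withoutWww = "http://example.com/index.html";
        // No match: the original url comes back untouched.
        System.out.println(stripRegexMatch(withoutWww, STRIP_WWW.matcher(withoutWww)));
    }
}
```

The design point is that whatever lies between the two capturing groups is what gets stripped, so each rule only needs to supply a pattern of that shape.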
Copyright © 2003-2011 Internet Archive. All Rights Reserved.