org.archive.crawler.url.canonicalize
Class BaseRule

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.url.canonicalize.BaseRule
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, CanonicalizationRule
Direct Known Subclasses:
FixupQueryStr, LowercaseRule, RegexRule, StripExtraSlashes, StripSessionCFIDs, StripSessionIDs, StripUserinfoRule, StripWWWNRule, StripWWWRule

public abstract class BaseRule
extends ModuleType
implements CanonicalizationRule

Base class of all URL canonicalization rules that are configurable via the Heritrix settings system. This class is abstract: subclasses must implement the CanonicalizationRule.canonicalize(String, Object) method.
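The subclassing contract above can be illustrated with a minimal, self-contained sketch. It does not use the real Heritrix types: SketchRule, SketchBaseRule, LowercaseSketchRule and BaseRuleSketch are hypothetical stand-ins for CanonicalizationRule, BaseRule and a concrete rule. They only mirror the shape described here: a base class that stores a name and description and leaves canonicalize(String, Object) abstract.

```java
import java.util.Locale;

// Hypothetical stand-in for CanonicalizationRule; NOT the real Heritrix type.
interface SketchRule {
    String canonicalize(String url, Object context);
    String getName();
}

// Hypothetical, simplified analogue of BaseRule: it holds a name and a
// description but leaves canonicalize(String, Object) abstract.
abstract class SketchBaseRule implements SketchRule {
    private final String name;
    private final String description;

    protected SketchBaseRule(String name, String description) {
        this.name = name;
        this.description = description;
    }

    public String getName() { return name; }
    public String getDescription() { return description; }
}

// Illustrative concrete rule: lowercases the entire URL string.
class LowercaseSketchRule extends SketchBaseRule {
    LowercaseSketchRule() {
        super("lowercase", "Lowercase the entire URL.");
    }

    @Override
    public String canonicalize(String url, Object context) {
        return url.toLowerCase(Locale.ROOT);
    }
}

public class BaseRuleSketch {
    public static void main(String[] args) {
        SketchRule rule = new LowercaseSketchRule();
        // Prints "lowercase: http://example.com/path"
        System.out.println(rule.getName() + ": "
            + rule.canonicalize("HTTP://Example.COM/Path", null));
    }
}
```

In the real class the name and description are passed to the ModuleType constructor and exposed through the settings system rather than kept in plain fields; the sketch only shows where the subclass's responsibility lies.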

Version:
$Date: 2005-11-04 23:00:23 +0000 (Fri, 04 Nov 2005) $, $Revision: 3932 $
Author:
stack
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_ENABLED
           
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
BaseRule(java.lang.String name, java.lang.String description)
          Constructor.
 
Method Summary
protected  java.lang.String doStripRegexMatch(java.lang.String url, java.util.regex.Matcher matcher)
          Run a regex that strips elements of a string.
 boolean isEnabled(java.lang.Object context)
           
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.archive.crawler.url.CanonicalizationRule
canonicalize, getName
 

Field Detail

ATTR_ENABLED

public static final java.lang.String ATTR_ENABLED
See Also:
Constant Field Values

Constructor Detail

BaseRule

public BaseRule(java.lang.String name,
                java.lang.String description)
Constructor.

Parameters:
name - Name of this canonicalization rule.
description - Description of what this rule does.

Method Detail

isEnabled

public boolean isEnabled(java.lang.Object context)
Specified by:
isEnabled in interface CanonicalizationRule
Parameters:
context - An object that provides context for the settings system. For example, the UURI of the URL being canonicalized can serve as the context object.
Returns:
True if this rule is enabled and to be run.
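A caller would typically consult isEnabled(context) before running each rule, so a rule can be switched off for some contexts (for instance, through a per-domain settings override). The sketch below assumes that usage. The Rule interface, applyRules, and the stripTrailingSlash rule are hypothetical, and the context is simplified to a host-name String instead of a UURI.

```java
import java.util.List;
import java.util.Set;

public class EnabledRulesSketch {
    // Hypothetical minimal rule type; isEnabled mirrors the documented
    // contract "true if this rule is enabled and to be run".
    interface Rule {
        boolean isEnabled(Object context);
        String canonicalize(String url, Object context);
    }

    // Apply each enabled rule in order, feeding one rule's output into
    // the next; disabled rules are skipped entirely.
    static String applyRules(String url, Object context, List<Rule> rules) {
        String result = url;
        for (Rule rule : rules) {
            if (rule.isEnabled(context)) {
                result = rule.canonicalize(result, context);
            }
        }
        return result;
    }

    // Hypothetical rule that is disabled for the listed hosts, standing in
    // for a settings override keyed on the context object.
    static Rule stripTrailingSlash(Set<String> disabledHosts) {
        return new Rule() {
            public boolean isEnabled(Object context) {
                return !disabledHosts.contains(context);
            }
            public String canonicalize(String url, Object context) {
                return url.endsWith("/") ? url.substring(0, url.length() - 1) : url;
            }
        };
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(stripTrailingSlash(Set.of("keep.example")));
        System.out.println(applyRules("http://a.example/x/", "a.example", rules));       // rule runs
        System.out.println(applyRules("http://keep.example/x/", "keep.example", rules)); // rule skipped
    }
}
```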

doStripRegexMatch

protected java.lang.String doStripRegexMatch(java.lang.String url,
                                             java.util.regex.Matcher matcher)
Runs a regex that strips elements from a string. Assumes the regex is written to strip elements of the passed string, and that, on a match, concatenating group 1 and group 2 yields the desired result.

Parameters:
url - Url to search in.
matcher - Matcher whose pattern yields a group 1 and a group 2 on a match (non-null).
Returns:
The original url if there is no match; otherwise the concatenation of group 1 and group 2.
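The documented contract of doStripRegexMatch (return group 1 plus group 2 on a match, otherwise the unchanged url) can be sketched as follows. This is an illustration, not the Heritrix source. Treating a null, non-participating group as the empty string is an assumption. The JSESSIONID pattern and the strip helper are hypothetical examples of a regex written in the expected two-group form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripRegexSketch {
    // Sketch of the documented contract: if the matcher matches the whole
    // url, return group 1 + group 2; otherwise return the url unchanged.
    // Mapping a null (non-participating) group to "" is an assumption.
    static String doStripRegexMatch(String url, Matcher matcher) {
        if (matcher != null && matcher.matches()) {
            return nullToEmpty(matcher.group(1)) + nullToEmpty(matcher.group(2));
        }
        return url;
    }

    private static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    // Hypothetical pattern: group 1 is everything before ";jsessionid=...",
    // group 2 is everything from the next '?' (if any) to the end.
    static final Pattern JSESSIONID =
        Pattern.compile("^(.*?);jsessionid=[^?]*(.*)$", Pattern.CASE_INSENSITIVE);

    static String strip(String url) {
        return doStripRegexMatch(url, JSESSIONID.matcher(url));
    }

    public static void main(String[] args) {
        // Prints "http://example.com/a?x=1": the session ID segment is dropped.
        System.out.println(strip("http://example.com/a;jsessionid=ABC123?x=1"));
        // Prints "http://example.com/b": no match, so the url is returned as-is.
        System.out.println(strip("http://example.com/b"));
    }
}
```

Because matches() requires the whole string to match, the pattern must account for every character of the url; anything the caller wants to keep has to fall inside group 1 or group 2.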


Copyright © 2003-2011 Internet Archive. All Rights Reserved.