org.archive.crawler.url.canonicalize
Class StripWWWNRule
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.url.canonicalize.BaseRule
org.archive.crawler.url.canonicalize.StripWWWNRule
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean, CanonicalizationRule
public class StripWWWNRule
- extends BaseRule
Strips any 'www[0-9]*' prefix found on the hostname of http/https URLs,
but only if the URL has some path or query component (content after the
third slash). Top-level 'slash page' URIs are left unstripped: we prefer
crawling redundant top pages to missing an entire site that is only
available from either the www-full or the www-less hostname, but not both.
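The behaviour described above can be sketched with a single regular expression. This is a minimal, standalone illustration and not the Heritrix implementation: the class name `StripWwwNSketch`, the method `stripWwwN`, and the pattern itself are assumptions made for this example, and it ignores the settings `context` that the real rule receives.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and pattern are illustrative, not taken from Heritrix.
final class StripWwwNSketch {
    // Group 1: the http/https scheme prefix.
    // Group 2: the rest of the host, the third slash, and at least one
    //          further character (the path/query component).
    private static final Pattern WWWN =
        Pattern.compile("(?i)^(https?://)www[0-9]*\\.([^/]+/.+)$");

    // Returns the URL without its www/wwwN host prefix when a path or query
    // follows the third slash; otherwise returns the URL unchanged.
    static String stripWwwN(String url) {
        Matcher m = WWWN.matcher(url);
        return m.matches() ? m.group(1) + m.group(2) : url;
    }

    public static void main(String[] args) {
        // Has a path component, so the prefix is stripped.
        System.out.println(stripWwwN("http://www3.example.com/index.html"));
        // prints "http://example.com/index.html"

        // Top 'slash page', so the URL is left as-is.
        System.out.println(stripWwwN("http://www.example.com/"));
        // prints "http://www.example.com/"
    }
}
```

Requiring at least one character after the third slash is what keeps top pages unstripped in this sketch, so both the www-full and the www-less top page of a site remain distinct crawl candidates.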
- Version:
- $Date: 2006-09-18 20:32:47 +0000 (Mon, 18 Sep 2006) $, $Revision: 4634 $
- Author:
- stack
- See Also:
- Serialized Form
Method Summary
java.lang.String canonicalize(java.lang.String url, java.lang.Object context)
Apply this canonicalization rule.
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
Methods inherited from class javax.management.Attribute
getName, hashCode
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
StripWWWNRule
public StripWWWNRule(java.lang.String name)
canonicalize
public java.lang.String canonicalize(java.lang.String url,
java.lang.Object context)
- Description copied from interface: CanonicalizationRule
- Apply this canonicalization rule.
- Parameters:
url - URL string we apply this rule to.
context - An object that will provide context for the settings
system. The UURI of the URL we're canonicalizing is an example of
an object that provides context.
- Returns:
- Result of applying this rule to the passed url.
Copyright © 2003-2011 Internet Archive. All Rights Reserved.