org.archive.crawler.url.canonicalize
Class StripWWWRule
java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.url.canonicalize.BaseRule
org.archive.crawler.url.canonicalize.StripWWWRule
- All Implemented Interfaces:
- java.io.Serializable, javax.management.DynamicMBean, CanonicalizationRule
public class StripWWWRule
- extends BaseRule
Strips any 'www' found on http/https URLs, but only if they have some
path/query component (content after the third slash). Top-level 'slash page'
URIs are left unstripped: we prefer crawling redundant top pages
to missing an entire site that is available from only one of
the www-prefixed or www-less hostnames, but not both.
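The behavior described above can be illustrated with a minimal, stand-alone sketch. It is not the Heritrix implementation: the class name `StripWwwSketch`, the method `stripWww`, and the regular expression are assumptions chosen to match the description (strip a `www.` host prefix only when something follows the third slash).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone helper for illustration; not part of Heritrix.
class StripWwwSketch {
    // Assumed pattern: an http/https scheme, a literal "www." prefix, then the
    // rest of the host, a slash, and at least one more character (path/query).
    private static final Pattern WWW_WITH_PATH =
        Pattern.compile("(?i)^(https?://)www\\.([^/]*/.+)$");

    static String stripWww(String url) {
        Matcher m = WWW_WITH_PATH.matcher(url);
        // On a match, keep the scheme and everything after "www.";
        // otherwise return the URL unchanged.
        return m.matches() ? m.group(1) + m.group(2) : url;
    }

    public static void main(String[] args) {
        // Has a path component, so "www." is stripped.
        System.out.println(stripWww("http://www.example.com/index.html"));
        // Top "slash page": left unstripped.
        System.out.println(stripWww("http://www.example.com/"));
    }
}
```

Under these assumptions, `http://www.example.com/index.html` becomes `http://example.com/index.html`, while `http://www.example.com/` is returned unchanged, so both variants of a site's top page remain distinct crawl candidates.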
- Version:
- $Date: 2006-09-25 20:27:35 +0000 (Mon, 25 Sep 2006) $, $Revision: 4655 $
- Author:
- stack
- See Also:
- Serialized Form
Method Summary
java.lang.String canonicalize(java.lang.String url, java.lang.Object context)
Apply this canonicalization rule.
Methods inherited from class org.archive.crawler.settings.ComplexType |
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
getName, hashCode |
Methods inherited from class java.lang.Object |
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
StripWWWRule
public StripWWWRule(java.lang.String name)
canonicalize
public java.lang.String canonicalize(java.lang.String url,
java.lang.Object context)
- Description copied from interface: CanonicalizationRule
- Apply this canonicalization rule.
- Parameters:
- url - URL string we apply this rule to.
- context - An object that will provide context for the settings
system. The UURI of the URL we're canonicalizing is an example of
an object that provides context.
- Returns:
- Result of applying this rule to the passed url.
Copyright © 2003-2011 Internet Archive. All Rights Reserved.