Uses of Interface
org.archive.crawler.url.CanonicalizationRule

Packages that use CanonicalizationRule
org.archive.crawler.url.canonicalize   
 

Uses of CanonicalizationRule in org.archive.crawler.url.canonicalize
 

Classes in org.archive.crawler.url.canonicalize that implement CanonicalizationRule
 class BaseRule
          Base class for all URL canonicalization rules that are configurable via the Heritrix settings system.
 class FixupQueryStr
          Strip any trailing question mark.
 class LowercaseRule
          Lowercases the URL.
 class RegexRule
          General conversion rule.
 class StripExtraSlashes
           
 class StripSessionCFIDs
          Strip ColdFusion session IDs.
 class StripSessionIDs
          Strip known session IDs.
 class StripUserinfoRule
          Strip any 'userinfo' found on http/https URLs.
 class StripWWWNRule
          Strip any 'www[0-9]*' found on http/https URLs, IF they have some path/query component (content after third slash).
 class StripWWWRule
          Strip any 'www' found on http/https URLs, IF they have some path/query component (content after third slash).
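
The one-line descriptions above can be illustrated with a small stand-alone sketch. It does not use or reproduce the Heritrix classes: the class name CanonicalizeSketch, the method names, the regular expressions, and the order in which the steps run are all assumptions made for illustration, and the real rules may behave differently. It only shows how lowercasing, userinfo stripping, 'www[0-9]*' stripping, and trailing-question-mark removal might be chained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; not the Heritrix implementation or its API.
class CanonicalizeSketch {

    // 'www' plus optional digits on http/https URLs, matched only when
    // some content follows the third slash. Assumes lowercased input.
    private static final Pattern WWWN =
        Pattern.compile("^(https?://)www[0-9]*\\.([^/]*/.+)$");

    // Userinfo (everything up to '@') in the authority of an http/https URL.
    private static final Pattern USERINFO =
        Pattern.compile("^(https?://)[^/@]*@(.*)$");

    static String lowercase(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    static String stripUserinfo(String url) {
        return keepGroups(USERINFO, url);
    }

    static String stripWwwN(String url) {
        return keepGroups(WWWN, url);
    }

    static String stripTrailingQuestionMark(String url) {
        return url.endsWith("?") ? url.substring(0, url.length() - 1) : url;
    }

    // Returns group 1 + group 2 when the whole URL matches, else the URL unchanged.
    private static String keepGroups(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        return m.matches() ? m.group(1) + m.group(2) : url;
    }

    // Applies each step in turn; the ordering here is an assumption.
    static String canonicalize(String url) {
        List<UnaryOperator<String>> steps = Arrays.asList(
            CanonicalizeSketch::lowercase,
            CanonicalizeSketch::stripUserinfo,
            CanonicalizeSketch::stripWwwN,
            CanonicalizeSketch::stripTrailingQuestionMark);
        String result = url;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("HTTP://user@WWW2.Example.com/Page?"));
    }
}
```

In this sketch a URL such as http://www.example.com/ is left unchanged by stripWwwN, because nothing follows the third slash, which mirrors the "IF they have some path/query component" condition in the descriptions above.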
 



Copyright © 2003-2011 Internet Archive. All Rights Reserved.