Packages that use CanonicalizationRule | |
---|---|
org.archive.crawler.url.canonicalize |
Uses of CanonicalizationRule in org.archive.crawler.url.canonicalize |
---|
Classes in org.archive.crawler.url.canonicalize that implement CanonicalizationRule | |
---|---|
class | BaseRule: Base class for all URL canonicalization rules that are configurable via the Heritrix settings system. |
class | FixupQueryStr: Strip any trailing question mark. |
class | LowercaseRule: Lowercases the URL. |
class | RegexRule: General conversion rule. |
class | StripExtraSlashes |
class | StripSessionCFIDs: Strip ColdFusion session IDs. |
class | StripSessionIDs: Strip known session IDs. |
class | StripUserinfoRule: Strip any 'userinfo' found on http/https URLs. |
class | StripWWWNRule: Strip any 'www[0-9]*' found on http/https URLs, IF they have some path/query component (content after third slash). |
class | StripWWWRule: Strip any 'www' found on http/https URLs, IF they have some path/query component (content after third slash). |
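
The descriptions above suggest that each rule is a small string-to-string rewrite applied in sequence. The following is a minimal, self-contained sketch of that idea, not the Heritrix implementation: the class name `CanonicalizeSketch`, the regular expressions, and the rule order are all assumptions made for illustration, and the real classes are configured through the Heritrix settings system rather than hard-coded.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Illustrative sketch only: these are NOT the Heritrix classes. The class name,
// the regexes, and the rule order are assumptions made for this example.
final class CanonicalizeSketch {

    // Matches "user:password@" right after the scheme on http/https URLs.
    // Group 1 keeps the scheme prefix.
    private static final Pattern USERINFO =
            Pattern.compile("^(https?://)[^/@]*@");

    // Matches a leading "www" plus optional digits, but only when at least one
    // character follows the third slash (assumed reading of "content after
    // third slash"). Group 2 keeps the rest of the host and that first character.
    private static final Pattern WWW_N =
            Pattern.compile("^(https?://)www[0-9]*\\.([^/]+/.)");

    // Each rule maps a URL string to a possibly rewritten URL string.
    private static final List<UnaryOperator<String>> RULES = List.of(
            // Compare LowercaseRule: lowercase the whole URL.
            url -> url.toLowerCase(Locale.ROOT),
            // Compare StripUserinfoRule: drop any userinfo component.
            url -> USERINFO.matcher(url).replaceFirst("$1"),
            // Compare StripWWWNRule: drop "www", "www2", and so on.
            url -> WWW_N.matcher(url).replaceFirst("$1$2"),
            // Compare FixupQueryStr: drop a trailing question mark.
            url -> url.endsWith("?") ? url.substring(0, url.length() - 1) : url
    );

    // Applies every rule in order, feeding each result into the next rule.
    static String canonicalize(String url) {
        String result = url;
        for (UnaryOperator<String> rule : RULES) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints "http://example.com/index.html"
        System.out.println(canonicalize("HTTP://User:Pw@WWW2.Example.COM/Index.html?"));
    }
}
```

In this sketch, lowercasing runs first so that the later patterns only need to match lowercase schemes and hosts. A URL with nothing after the host, such as `http://www.example.com`, is left unchanged by the `www` rule because there is no content after a third slash.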