org.archive.crawler.url
Class Canonicalizer

java.lang.Object
  extended by org.archive.crawler.url.Canonicalizer

public class Canonicalizer
extends java.lang.Object

URL canonicalizer.

Version:
$Date: 2006-09-26 20:38:48 +0000 (Tue, 26 Sep 2006) $, $Revision: 4667 $
Author:
stack

Method Summary
static java.lang.String canonicalize(UURI uuri, CrawlOrder order)
          Convenience method that takes a settings object and pulls from it whatever it needs to canonicalize the given uuri.
static java.lang.String canonicalize(UURI uuri, java.util.Iterator rules)
          Run the passed uuri through the list of rules.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

canonicalize

public static java.lang.String canonicalize(UURI uuri,
                                            CrawlOrder order)
Convenience method that takes a settings object and pulls from it whatever it needs to canonicalize the given uuri.

Parameters:
uuri - UURI to canonicalize.
order - A CrawlOrder instance.
Returns:
The canonicalized string form of uuri, or uuri unchanged if an error occurs.
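
The fallback described under Returns can be sketched as follows. This is only an illustration under stated assumptions: `FallbackSketch`, `Rule`, and `Settings` are hypothetical stand-ins, not Heritrix types, and the real method takes a UURI and a CrawlOrder rather than plain strings. The sketch also uses lambdas, so it needs Java 8 or later.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class FallbackSketch {
    /** Hypothetical stand-in for a canonicalization rule (not the Heritrix interface). */
    interface Rule {
        String canonicalize(String url);
    }

    /** Hypothetical stand-in for the settings object that supplies the rules. */
    interface Settings {
        Iterator<Rule> rules() throws Exception;
    }

    /** Returns the canonicalized URL, or the input unchanged if anything fails. */
    static String canonicalize(String url, Settings settings) {
        try {
            String result = url;
            for (Iterator<Rule> i = settings.rules(); i.hasNext();) {
                result = i.next().canonicalize(result);
            }
            return result;
        } catch (Exception e) {
            // Assumed behavior, mirroring "uuri unchanged if an error occurs".
            return url;
        }
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.<Rule>asList(u -> u.toLowerCase(Locale.ROOT));
        Settings ok = () -> rules.iterator();
        Settings broken = () -> { throw new Exception("settings unavailable"); };
        System.out.println(canonicalize("HTTP://EXAMPLE.COM/", ok));     // http://example.com/
        System.out.println(canonicalize("HTTP://EXAMPLE.COM/", broken)); // HTTP://EXAMPLE.COM/
    }
}
```

Returning the input on failure means a caller always gets a usable string back, at the cost of not learning that canonicalization was skipped.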

canonicalize

public static java.lang.String canonicalize(UURI uuri,
                                            java.util.Iterator rules)
Run the passed uuri through the list of rules.

Parameters:
uuri - UURI to canonicalize.
rules - Iterator of canonicalization rules to apply. Obtain one from the url-canonicalizer-rules element of an order file, or build a list externally. Each rule must implement the Rule interface.
Returns:
Canonicalized URL.
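
Running a URL through a list of rules might look like the minimal sketch below. It assumes a simplified, hypothetical `Rule` interface with a single string-to-string method and works on plain strings instead of a UURI; the two rules shown are invented for illustration and are not the rules Heritrix ships. Lambdas require Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class RuleChainSketch {
    /** Hypothetical stand-in for a canonicalization rule (not the Heritrix interface). */
    interface Rule {
        String canonicalize(String url);
    }

    /** Feeds the output of each rule into the next, in iteration order. */
    static String canonicalize(String url, Iterator<Rule> rules) {
        String result = url;
        while (rules.hasNext()) {
            result = rules.next().canonicalize(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        // Illustrative rule 1: lowercase the whole URL.
        rules.add(u -> u.toLowerCase(Locale.ROOT));
        // Illustrative rule 2: drop a leading "www." from http URLs.
        rules.add(u -> u.replaceFirst("^http://www\\.", "http://"));

        // Prints: http://example.com/index.html
        System.out.println(canonicalize("HTTP://WWW.Example.com/Index.html", rules.iterator()));
    }
}
```

In this sketch the order of the rules matters: with the two rules swapped, the case-sensitive pattern would not match the uppercase input and the "www." prefix would survive.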


Copyright © 2003-2011 Internet Archive. All Rights Reserved.