org.archive.crawler.util
Class RecoveryLogMapper

java.lang.Object
  extended by org.archive.crawler.util.RecoveryLogMapper

public class RecoveryLogMapper
extends java.lang.Object


Constructor Summary
RecoveryLogMapper(java.lang.String recoverLogFileName)
          Normal constructor. If a seed cannot be found while loading recoverLogFileName, throws SeedUrlNotFoundException.
RecoveryLogMapper(java.lang.String recoverLogFileName, java.lang.String seedNotFoundLogFileName)
          Constructor to use if you want to allow not-found seeds, logging them to seedNotFoundLogFileName.
 
Method Summary
 java.util.Iterator<java.lang.String> getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(java.lang.String seedUrlString)
           
static java.util.logging.Logger getLogger()
           
 java.util.Collection<java.lang.String> getSeedCollection()
           
 java.lang.String getSeedForUrl(java.lang.String urlString)
          Returns seed for urlString (null if seed not found).
 java.util.Map getSeedUrlToDiscoveredUrlsMap()
           
 java.util.Set getSuccessfullyCrawledUrls()
           
protected  void load(java.lang.String recoverLogFileName)
           
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RecoveryLogMapper

public RecoveryLogMapper(java.lang.String recoverLogFileName)
                  throws java.io.FileNotFoundException,
                         java.io.IOException,
                         SeedUrlNotFoundException
Normal constructor. If a seed cannot be found while loading recoverLogFileName, throws SeedUrlNotFoundException. Use RecoveryLogMapper(String, String) if you want to just log such cases and keep going. (Such cases should not happen if the recover log is written correctly, but we see them in practice.)

Parameters:
recoverLogFileName - name of the recover log file to load
Throws:
java.io.FileNotFoundException
java.io.IOException
SeedUrlNotFoundException

RecoveryLogMapper

public RecoveryLogMapper(java.lang.String recoverLogFileName,
                         java.lang.String seedNotFoundLogFileName)
                  throws java.io.FileNotFoundException,
                         java.io.IOException,
                         SeedUrlNotFoundException
Constructor to use if you want to allow not-found seeds, logging them to seedNotFoundLogFileName. In contrast, RecoveryLogMapper(String) will throw SeedUrlNotFoundException when a seed isn't found.

Parameters:
recoverLogFileName - name of the recover log file to load
seedNotFoundLogFileName - name of the file to which not-found seeds are logged
Throws:
java.io.FileNotFoundException
java.io.IOException
SeedUrlNotFoundException
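
The difference between the two constructors (fail on a not-found seed versus log it and keep going) can be illustrated with a small stand-in. This is only a sketch under stated assumptions, not the Heritrix implementation: the class SeedPolicySketch, the exception SeedNotFound, and the simplified {url, via} record shape are all hypothetical, and the real recover log format is not described on this page.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in; not the Heritrix RecoveryLogMapper. */
public class SeedPolicySketch {
    /** Unchecked stand-in for SeedUrlNotFoundException (kept unchecked for brevity). */
    public static class SeedNotFound extends RuntimeException {
        public SeedNotFound(String message) { super(message); }
    }

    /**
     * Maps each URL to its seed. Each record is {url, via}; a null via marks
     * a seed (an assumed, simplified record shape).
     * If notFoundSink is null, an unresolvable via raises SeedNotFound (strict).
     * Otherwise the URL is added to notFoundSink and loading continues (lenient).
     */
    public static Map<String, String> load(List<String[]> records,
                                           List<String> notFoundSink) {
        Map<String, String> urlToSeed = new HashMap<>();
        for (String[] rec : records) {
            String url = rec[0];
            String via = rec[1];
            if (via == null) {
                urlToSeed.put(url, url); // a seed is its own seed
                continue;
            }
            String seed = urlToSeed.get(via);
            if (seed != null) {
                urlToSeed.put(url, seed); // inherit the seed of the referring URL
            } else if (notFoundSink == null) {
                throw new SeedNotFound("No seed found for " + url + " (via " + via + ")");
            } else {
                notFoundSink.add(url); // lenient mode: record it and keep going
            }
        }
        return urlToSeed;
    }
}
```

Passing null for notFoundSink mirrors the one-argument constructor's behavior as described above; passing a list mirrors the two-argument constructor, with the list standing in for the seedNotFoundLogFileName file.
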
Method Detail

load

protected void load(java.lang.String recoverLogFileName)
             throws java.io.FileNotFoundException,
                    java.io.IOException,
                    SeedUrlNotFoundException
Throws:
java.io.FileNotFoundException
java.io.IOException
SeedUrlNotFoundException

getSeedForUrl

public java.lang.String getSeedForUrl(java.lang.String urlString)
Returns seed for urlString (null if seed not found).

Parameters:
urlString -
Returns:
The seed for urlString, or null if no seed is found.
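
Because getSeedForUrl returns null when no seed is found, callers need an explicit null check. The sketch below groups URLs by seed using any String-to-String lookup; in real use one would presumably pass a method reference such as mapper::getSeedForUrl. The class SeedLookupSketch and the NO_SEED label are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper; not part of org.archive.crawler.util. */
public class SeedLookupSketch {
    /** Label for URLs whose lookup returned null (a name chosen for this sketch). */
    public static final String NO_SEED = "(no seed)";

    /**
     * Groups urls by the seed that seedForUrl reports for each of them.
     * seedForUrl stands in for a lookup such as mapper::getSeedForUrl.
     */
    public static Map<String, List<String>> groupBySeed(
            List<String> urls, Function<String, String> seedForUrl) {
        Map<String, List<String>> bySeed = new LinkedHashMap<>();
        for (String url : urls) {
            String seed = seedForUrl.apply(url);
            // A null result means no seed was found for this URL.
            String key = (seed == null) ? NO_SEED : seed;
            bySeed.computeIfAbsent(key, k -> new ArrayList<>()).add(url);
        }
        return bySeed;
    }
}
```
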

getSeedUrlToDiscoveredUrlsMap

public java.util.Map getSeedUrlToDiscoveredUrlsMap()
Returns:
Returns the seedUrlToDiscoveredUrlsMap.

getSuccessfullyCrawledUrls

public java.util.Set getSuccessfullyCrawledUrls()
Returns:
Returns the successfullyCrawledUrls.

getLogger

public static java.util.logging.Logger getLogger()
Returns:
Returns the logger.

getIteratorOfURLsSuccessfullyCrawledFromSeedUrl

public java.util.Iterator<java.lang.String> getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(java.lang.String seedUrlString)
                                                                                     throws SeedUrlNotFoundException
Throws:
SeedUrlNotFoundException
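
This page does not say how the iterator is built or when the exception is raised. One plausible reading, given the getSeedUrlToDiscoveredUrlsMap and getSuccessfullyCrawledUrls accessors, is that the method walks the URLs discovered from the seed, keeps those that were crawled successfully, and rejects an unknown seed. The sketch below implements that reading only; the class CrawledFromSeedSketch, the exception UnknownSeed, and the parameter shapes are assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in; not the Heritrix implementation. */
public class CrawledFromSeedSketch {
    /** Unchecked stand-in for SeedUrlNotFoundException (kept unchecked for brevity). */
    public static class UnknownSeed extends RuntimeException {
        public UnknownSeed(String seed) { super("Unknown seed: " + seed); }
    }

    /**
     * Returns an iterator over the URLs discovered from seed that also appear
     * in crawled. Throws UnknownSeed if seed has no entry in seedToDiscovered.
     */
    public static Iterator<String> crawledFromSeed(
            String seed,
            Map<String, Set<String>> seedToDiscovered,
            Set<String> crawled) {
        Set<String> discovered = seedToDiscovered.get(seed);
        if (discovered == null) {
            throw new UnknownSeed(seed);
        }
        List<String> result = new ArrayList<>();
        for (String url : discovered) {
            if (crawled.contains(url)) {
                result.add(url); // keep only successfully crawled URLs
            }
        }
        return result.iterator();
    }
}
```
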

getSeedCollection

public java.util.Collection<java.lang.String> getSeedCollection()

main

public static void main(java.lang.String[] args)


Copyright © 2003-2011 Internet Archive. All Rights Reserved.