org.archive.crawler.util
Class RecoveryLogMapper
java.lang.Object
org.archive.crawler.util.RecoveryLogMapper
public class RecoveryLogMapper
- extends java.lang.Object
Constructor Summary
RecoveryLogMapper(java.lang.String recoverLogFileName)
Normal constructor: if it encounters not-found seeds while loading
recoverLogFileName, it throws SeedUrlNotFoundException.
RecoveryLogMapper(java.lang.String recoverLogFileName,
java.lang.String seedNotFoundLogFileName)
Constructor to use if you want to allow not-found seeds, logging
them to seedNotFoundLogFileName.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RecoveryLogMapper
public RecoveryLogMapper(java.lang.String recoverLogFileName)
throws java.io.FileNotFoundException,
java.io.IOException,
SeedUrlNotFoundException
- Normal constructor: if it encounters not-found seeds while loading
recoverLogFileName, it throws SeedUrlNotFoundException.
Use RecoveryLogMapper(String, String) if you want to just log
such cases and keep going. (Such seeds should not occur if the
recover log is written correctly, but we see them in practice.)
- Parameters:
recoverLogFileName - name of the recover log file to load
- Throws:
java.io.FileNotFoundException
java.io.IOException
SeedUrlNotFoundException
RecoveryLogMapper
public RecoveryLogMapper(java.lang.String recoverLogFileName,
java.lang.String seedNotFoundLogFileName)
throws java.io.FileNotFoundException,
java.io.IOException,
SeedUrlNotFoundException
- Constructor to use if you want to allow not-found seeds, logging
them to seedNotFoundLogFileName. In contrast, RecoveryLogMapper(String)
will throw SeedUrlNotFoundException when a seed isn't found.
- Parameters:
recoverLogFileName - name of the recover log file to load
seedNotFoundLogFileName - name of the file to which not-found seeds are logged
- Throws:
java.io.FileNotFoundException
java.io.IOException
SeedUrlNotFoundException
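The two constructors differ only in what happens when a seed is not found: one throws, the other logs and carries on. The following is a minimal, self-contained sketch of that strict/lenient split, not the Heritrix implementation. The names SeedCheckSketch and SeedNotFoundSketchException are hypothetical, and an in-memory List stands in for the seedNotFoundLogFileName file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for SeedUrlNotFoundException (not the Heritrix class). */
class SeedNotFoundSketchException extends Exception {
    SeedNotFoundSketchException(String message) {
        super(message);
    }
}

/**
 * Hypothetical sketch: a null log means "strict" (throw on a missing seed),
 * a non-null list means "lenient" (record the seed and keep going).
 */
class SeedCheckSketch {
    private final Set<String> knownSeeds;
    private final List<String> notFoundLog; // null in strict mode

    SeedCheckSketch(Set<String> knownSeeds, List<String> notFoundLog) {
        this.knownSeeds = new HashSet<>(knownSeeds);
        this.notFoundLog = notFoundLog;
    }

    /** Returns true if known; otherwise throws (strict) or logs and returns false (lenient). */
    boolean checkSeed(String seed) throws SeedNotFoundSketchException {
        if (knownSeeds.contains(seed)) {
            return true;
        }
        if (notFoundLog == null) {
            throw new SeedNotFoundSketchException("seed not found: " + seed);
        }
        notFoundLog.add(seed);
        return false;
    }

    public static void main(String[] args) throws Exception {
        Set<String> seeds = Set.of("http://example.com/");
        List<String> log = new ArrayList<>();
        SeedCheckSketch lenient = new SeedCheckSketch(seeds, log);
        lenient.checkSeed("http://missing.example/");
        System.out.println(log); // [http://missing.example/]

        SeedCheckSketch strict = new SeedCheckSketch(seeds, null);
        try {
            strict.checkSeed("http://missing.example/");
        } catch (SeedNotFoundSketchException e) {
            System.out.println(e.getMessage()); // seed not found: http://missing.example/
        }
    }
}
```

Keeping the choice in a single nullable field mirrors the fact that the only difference between the two constructors is the extra seedNotFoundLogFileName argument.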
load
protected void load(java.lang.String recoverLogFileName)
throws java.io.FileNotFoundException,
java.io.IOException,
SeedUrlNotFoundException
- Throws:
java.io.FileNotFoundException
java.io.IOException
SeedUrlNotFoundException
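load has no description on this page. As a rough illustration of what such a pass might do, the sketch below reads a simplified, assumed line format and fills the three structures that the accessors below suggest: a URL-to-seed index, a seed-to-discovered-URLs map, and a set of successfully crawled URLs. The record tags ("F+", "Fs"), the field layout, and the class name RecoverLogLoadSketch are assumptions, not Heritrix's documented recover-log format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch over an ASSUMED, simplified line format:
 *   "F+ <url>"        a seed (no via URL)
 *   "F+ <url> <via>"  <url> was discovered from <via>
 *   "Fs <url>"        <url> was crawled successfully
 */
class RecoverLogLoadSketch {
    final Map<String, String> seedForUrl = new HashMap<>();
    final Map<String, Set<String>> seedToDiscovered = new HashMap<>();
    final Set<String> successfullyCrawled = new HashSet<>();

    void load(BufferedReader in) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 2) {
                continue; // blank or malformed line
            }
            String url = f[1];
            if (f[0].equals("Fs")) {
                successfullyCrawled.add(url);
            } else if (f[0].equals("F+") && f.length == 2) {
                // A seed maps to itself and starts with no discovered URLs.
                seedForUrl.put(url, url);
                seedToDiscovered.computeIfAbsent(url, k -> new HashSet<>());
            } else if (f[0].equals("F+")) {
                // Discovered URL: inherit the seed of the URL it was found via,
                // so multi-hop chains collapse to a single seed lookup.
                String seed = seedForUrl.get(f[2]);
                if (seed == null) {
                    // Seed not found: a strict loader would throw, a lenient one would log.
                    continue;
                }
                if (seedForUrl.putIfAbsent(url, seed) == null) {
                    seedToDiscovered.get(seed).add(url);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String log = String.join("\n",
                "F+ http://example.com/",
                "F+ http://example.com/a http://example.com/",
                "F+ http://example.com/b http://example.com/a",
                "Fs http://example.com/",
                "Fs http://example.com/b");
        RecoverLogLoadSketch m = new RecoverLogLoadSketch();
        m.load(new BufferedReader(new StringReader(log)));
        System.out.println(m.seedForUrl.get("http://example.com/b")); // http://example.com/
    }
}
```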
getSeedForUrl
public java.lang.String getSeedForUrl(java.lang.String urlString)
- Returns seed for urlString (null if seed not found).
- Parameters:
urlString - URL whose seed is to be looked up
- Returns:
- the seed URL for urlString, or null if no seed was found.
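A URL may sit several discovery hops away from its seed. One way to answer this lookup, sketched here as an alternative to a precomputed index and not as what RecoveryLogMapper does, is to follow "via" links back until a seed is reached, returning null (as getSeedForUrl does) when no seed is found. The via map, the seeds set, and the class name ViaChainSketch are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: resolve a URL's seed by walking via links on demand. */
class ViaChainSketch {
    // url -> the URL it was discovered from; seeds have no entry here
    final Map<String, String> via = new HashMap<>();
    final Set<String> seeds = new HashSet<>();

    /** Returns the seed for urlString, or null if the chain never reaches a seed. */
    String getSeedForUrl(String urlString) {
        Set<String> visited = new HashSet<>(); // guards against via cycles
        String current = urlString;
        while (current != null && visited.add(current)) {
            if (seeds.contains(current)) {
                return current;
            }
            current = via.get(current);
        }
        return null; // unknown URL, broken chain, or a cycle
    }

    public static void main(String[] args) {
        ViaChainSketch s = new ViaChainSketch();
        s.seeds.add("http://example.com/");
        s.via.put("http://example.com/a", "http://example.com/");
        s.via.put("http://example.com/a/b", "http://example.com/a");
        System.out.println(s.getSeedForUrl("http://example.com/a/b")); // http://example.com/
        System.out.println(s.getSeedForUrl("http://other.example/")); // null
    }
}
```

Walking the chain costs time proportional to the hop count on every call; precomputing a URL-to-seed map while loading makes each lookup a single hash probe at the cost of memory.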
getSeedUrlToDiscoveredUrlsMap
public java.util.Map getSeedUrlToDiscoveredUrlsMap()
- Returns:
- Returns the seedUrlToDiscoveredUrlsMap.
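getSeedUrlToDiscoveredUrlsMap returns a raw java.util.Map, and its key and value types are not documented here. Assuming the keys are seed URLs and the values are sets of discovered URL strings, a caller could derive a URL-to-seed index from it as follows. InvertSeedMapSketch is a hypothetical helper, not part of this class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: derive url -> seed from seed -> discovered URLs. */
class InvertSeedMapSketch {
    /** Assumes values are sets of discovered URL strings. */
    static Map<String, String> invert(Map<String, Set<String>> seedToDiscovered) {
        Map<String, String> seedForUrl = new HashMap<>();
        // First pass: every seed maps to itself.
        for (String seed : seedToDiscovered.keySet()) {
            seedForUrl.put(seed, seed);
        }
        // Second pass: discovered URLs map to their seed; if a URL is listed
        // under several seeds, whichever entry is visited first wins.
        for (Map.Entry<String, Set<String>> e : seedToDiscovered.entrySet()) {
            for (String url : e.getValue()) {
                seedForUrl.putIfAbsent(url, e.getKey());
            }
        }
        return seedForUrl;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> m = new HashMap<>();
        m.put("http://example.com/", Set.of("http://example.com/a", "http://example.com/b"));
        Map<String, String> index = invert(m);
        System.out.println(index.get("http://example.com/b")); // http://example.com/
    }
}
```

Mapping seeds in a separate first pass ensures a seed that also appears in another seed's discovered set still maps to itself, regardless of HashMap iteration order.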
getSuccessfullyCrawledUrls
public java.util.Set getSuccessfullyCrawledUrls()
- Returns:
- Returns the successfullyCrawledUrls.
getLogger
public static java.util.logging.Logger getLogger()
- Returns:
- Returns the logger.
getIteratorOfURLsSuccessfullyCrawledFromSeedUrl
public java.util.Iterator<java.lang.String> getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(java.lang.String seedUrlString)
throws SeedUrlNotFoundException
- Throws:
SeedUrlNotFoundException
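This method is documented only by its signature. Judging from its name, it yields the URLs discovered from a seed that were also crawled successfully, and throws SeedUrlNotFoundException for an unknown seed. The sketch below implements that reading with in-memory collections. The class, field, and exception names are hypothetical, and the filtering is an assumption about the behavior rather than the actual implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for SeedUrlNotFoundException (not the Heritrix class). */
class SeedNotFoundSketchException extends Exception {
    SeedNotFoundSketchException(String message) {
        super(message);
    }
}

/** Hypothetical sketch: iterate a seed's discovered URLs that were crawled successfully. */
class CrawledFromSeedSketch {
    final Map<String, Set<String>> seedToDiscovered = new HashMap<>();
    final Set<String> successfullyCrawled = new HashSet<>();

    Iterator<String> iteratorOfUrlsSuccessfullyCrawledFromSeed(String seedUrl)
            throws SeedNotFoundSketchException {
        Set<String> discovered = seedToDiscovered.get(seedUrl);
        if (discovered == null) {
            throw new SeedNotFoundSketchException("unknown seed: " + seedUrl);
        }
        // Lazily skip URLs that were discovered but never crawled successfully.
        return discovered.stream().filter(successfullyCrawled::contains).iterator();
    }

    public static void main(String[] args) throws SeedNotFoundSketchException {
        CrawledFromSeedSketch m = new CrawledFromSeedSketch();
        m.seedToDiscovered.put("http://example.com/",
                new TreeSet<>(List.of("http://example.com/a", "http://example.com/b")));
        m.successfullyCrawled.add("http://example.com/b");
        Iterator<String> it = m.iteratorOfUrlsSuccessfullyCrawledFromSeed("http://example.com/");
        while (it.hasNext()) {
            System.out.println(it.next()); // prints only http://example.com/b
        }
    }
}
```

Returning a lazy Iterator rather than a copied collection avoids materializing a second set when a seed has many discovered URLs.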
getSeedCollection
public java.util.Collection<java.lang.String> getSeedCollection()
main
public static void main(java.lang.String[] args)
Copyright © 2003-2011 Internet Archive. All Rights Reserved.