org.archive.crawler.frontier
Class RecoveryJournal
java.lang.Object
org.archive.crawler.io.CrawlerJournal
org.archive.crawler.frontier.RecoveryJournal
- All Implemented Interfaces:
- FrontierJournal
public class RecoveryJournal
- extends CrawlerJournal
- implements FrontierJournal
Helper class that manages a simple journal of Frontier change events, which is
useful for recovering from crawl problems.
Replaying the journal into a new Frontier brings its state into line with that
of the original Frontier, at least with respect to the URIs in alreadyIncluded
and in the pending queues. This allows a pseudo-resume of a previous crawl as
far as URI visitation and coverage are concerned.
- Author:
- gojomo
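The replay idea described above can be illustrated with a small, self-contained sketch. It is a toy model and not the Heritrix implementation: the class name `JournalReplaySketch`, the one-event-per-line layout, and the tag strings `ADD ` and `SUCCESS ` are all assumptions made for illustration (the real tag values are the F_* constants listed under Constant Field Values).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of replaying a change-events journal; not Heritrix code. */
final class JournalReplaySketch {
    // Placeholder tags (assumptions); the real values are the F_* constants.
    static final String ADD = "ADD ";
    static final String SUCCESS = "SUCCESS ";

    /** URIs that have ever been scheduled, so they are not scheduled twice. */
    final Set<String> alreadyIncluded = new LinkedHashSet<>();
    /** URIs scheduled but not yet finished, in scheduling order. */
    final Set<String> pending = new LinkedHashSet<>();

    /** Applies one journal line; lines with unrecognized tags are ignored. */
    void apply(String line) {
        if (line.startsWith(ADD)) {
            String uri = line.substring(ADD.length());
            // Only queue a URI the first time it is seen.
            if (alreadyIncluded.add(uri)) {
                pending.add(uri);
            }
        } else if (line.startsWith(SUCCESS)) {
            String uri = line.substring(SUCCESS.length());
            alreadyIncluded.add(uri);
            pending.remove(uri);
        }
    }

    /** Replays every line, in order, into a fresh state object. */
    static JournalReplaySketch replay(List<String> lines) {
        JournalReplaySketch state = new JournalReplaySketch();
        for (String line : lines) {
            state.apply(line);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> journal = Arrays.asList(
                "ADD http://example.com/",
                "ADD http://example.com/a",
                "SUCCESS http://example.com/");
        JournalReplaySketch state = replay(journal);
        System.out.println("alreadyIncluded = " + state.alreadyIncluded);
        System.out.println("pending = " + state.pending);
    }
}
```

Because each line is applied in the order it was written, the rebuilt sets depend only on the journal contents, which is what makes a pseudo-resume possible in this model.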
Constructor Summary
RecoveryJournal(java.lang.String path, java.lang.String filename)
Create a new recovery journal at the given location.
Methods inherited from class org.archive.crawler.io.CrawlerJournal
checkpoint, close, considerTimestamp, getBufferedInput, getBufferedReader, getBufferedReader, initialize, noteLine, seriousError, writeLine, writeLine, writeLine, writeLine
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
F_ADD
public static final java.lang.String F_ADD
- See Also:
- Constant Field Values
F_EMIT
public static final java.lang.String F_EMIT
- See Also:
- Constant Field Values
F_DISREGARD
public static final java.lang.String F_DISREGARD
- See Also:
- Constant Field Values
F_RESCHEDULE
public static final java.lang.String F_RESCHEDULE
- See Also:
- Constant Field Values
F_SUCCESS
public static final java.lang.String F_SUCCESS
- See Also:
- Constant Field Values
F_FAILURE
public static final java.lang.String F_FAILURE
- See Also:
- Constant Field Values
RecoveryJournal
public RecoveryJournal(java.lang.String path,
java.lang.String filename)
throws java.io.IOException
- Create a new recovery journal at the given location.
- Parameters:
path
- Directory to make the recovery journal in.
filename
- Name to use for recovery journal file.
- Throws:
java.io.IOException
added
public void added(CandidateURI curi)
- Specified by:
added
in interface FrontierJournal
- Parameters:
curi
- CrawlURI that has been scheduled to be added to the
Frontier.
writeLongUriLine
public void writeLongUriLine(java.lang.String tag,
CandidateURI curi)
finishedSuccess
public void finishedSuccess(CandidateURI curi)
- Specified by:
finishedSuccess
in interface FrontierJournal
- Parameters:
curi
- CrawlURI that finished successfully.
emitted
public void emitted(CandidateURI curi)
- Description copied from interface:
FrontierJournal
- Note that a CrawlURI was emitted for processing.
If not followed by a finished or rescheduled notation in
the journal, the CrawlURI was still in-process when the journal ended.
- Specified by:
emitted
in interface FrontierJournal
- Parameters:
curi
- CrawlURI emitted.
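The rule stated above (an emitted URI with no later finished or rescheduled notation was still in-process when the journal ended) can be checked with a simple scan. The following is a minimal sketch under stated assumptions: the class `InProcessScan`, the line layout, and the tag strings are hypothetical stand-ins, not the actual F_* values or any Heritrix API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy scan for URIs left in-process at the end of a journal; not Heritrix code. */
final class InProcessScan {
    // Placeholder tags (assumptions); the real values are the F_* constants.
    static final String EMIT = "EMIT ";
    static final List<String> CLOSING_TAGS =
            Arrays.asList("SUCCESS ", "FAILURE ", "DISREGARD ", "RESCHEDULE ");

    /** Returns URIs that were emitted but never finished or rescheduled. */
    static Set<String> inProcessAtEnd(List<String> lines) {
        Set<String> open = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.startsWith(EMIT)) {
                open.add(line.substring(EMIT.length()));
                continue;
            }
            for (String tag : CLOSING_TAGS) {
                if (line.startsWith(tag)) {
                    // A finished or rescheduled notation closes the emission.
                    open.remove(line.substring(tag.length()));
                    break;
                }
            }
        }
        return open;
    }

    public static void main(String[] args) {
        List<String> journal = Arrays.asList(
                "EMIT http://example.com/a",
                "EMIT http://example.com/b",
                "SUCCESS http://example.com/a");
        System.out.println(inProcessAtEnd(journal));
    }
}
```

In this model a rescheduled URI that is emitted again simply re-enters the open set, so only the last unmatched emission matters.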
finishedDisregard
public void finishedDisregard(CandidateURI curi)
- Specified by:
finishedDisregard
in interface FrontierJournal
- Parameters:
curi
- CrawlURI finished disregarded (uncounted failure).
finishedFailure
public void finishedFailure(CandidateURI curi)
- Specified by:
finishedFailure
in interface FrontierJournal
- Parameters:
curi
- CrawlURI finished unsuccessfully.
rescheduled
public void rescheduled(CandidateURI curi)
- Specified by:
rescheduled
in interface FrontierJournal
- Parameters:
curi
- CrawlURI that was returned to the Frontier for
another try.
importRecoverLog
public static void importRecoverLog(java.io.File source,
CrawlController controller,
boolean retainFailures)
throws java.io.IOException
- Utility method for scanning a recovery journal and applying it to
a Frontier.
- Parameters:
source
- Recover log path.
frontier
- Frontier reference.
retainFailures
-
- Throws:
java.io.IOException
- See Also:
Frontier.importRecoverLog(String, boolean)
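The retainFailures parameter is left undescribed above. The sketch below shows one plausible reading, stated purely as an assumption: when the flag is true, URIs recorded as failed count as already included and are not retried; when false, they remain eligible for another attempt. The class `RetainFailuresSketch`, the tag strings, and the line layout are hypothetical and do not reflect the actual importRecoverLog code.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a retain-failures switch during journal import; not Heritrix code. */
final class RetainFailuresSketch {
    // Placeholder tags (assumptions); the real values are the F_* constants.
    static final String SUCCESS = "SUCCESS ";
    static final String FAILURE = "FAILURE ";

    /**
     * Collects URIs to treat as already included. Under the assumed semantics,
     * failures are included only when retainFailures is true.
     */
    static Set<String> includedAfterImport(List<String> lines, boolean retainFailures) {
        Set<String> included = new LinkedHashSet<>();
        for (String line : lines) {
            if (line.startsWith(SUCCESS)) {
                included.add(line.substring(SUCCESS.length()));
            } else if (retainFailures && line.startsWith(FAILURE)) {
                included.add(line.substring(FAILURE.length()));
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<String> journal = Arrays.asList(
                "SUCCESS http://example.com/ok",
                "FAILURE http://example.com/bad");
        System.out.println("retain=true:  " + includedAfterImport(journal, true));
        System.out.println("retain=false: " + includedAfterImport(journal, false));
    }
}
```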
Copyright © 2003-2011 Internet Archive. All Rights Reserved.