org.archive.crawler.framework
Class Checkpointer

java.lang.Object
  extended by org.archive.crawler.framework.Checkpointer
All Implemented Interfaces:
java.io.Serializable

public class Checkpointer
extends java.lang.Object
implements java.io.Serializable

Runs checkpointing and keeps a history of crawl checkpoints. Generally used only by CrawlController, but also provides static utility methods for classes that need to participate in a checkpoint.

Author:
gojomo, stack
See Also:
Serialized Form

Nested Class Summary
 class Checkpointer.CheckpointingThread
          Thread to run the checkpointing.
 
Field Summary
static java.text.DecimalFormat INDEX_FORMAT
           
 
Constructor Summary
Checkpointer(CrawlController cc, java.io.File checkpointDir)
          Create a new CheckpointContext with the given store directory.
Checkpointer(CrawlController cc, java.lang.String prefix)
          Create a new CheckpointContext with the given checkpoint label prefix.
 
Method Summary
 void checkpoint()
          Run a checkpoint of the crawler.
protected  void checkpointFailed()
           
protected  void checkpointFailed(java.lang.Exception e)
          Note that a checkpoint failed.
protected  void checkpointFailed(java.lang.String message)
           
(package private)  void cleanup()
           
protected  void clearCheckpointInProgressDirectory()
           
protected  java.io.File createCheckpointInProgressDirectory()
           
static java.lang.String formatCheckpointName(java.lang.String prefix, int index)
           
 java.io.File getCheckpointInProgressDirectory()
           
protected  CrawlController getController()
           
 int getNextCheckpoint()
           
 java.lang.String getNextCheckpointName()
           
 java.util.List getPredecessorCheckpoints()
           
protected  void initialize(CrawlController cc, java.lang.String prefix)
           
 boolean isAtBeginning()
           
protected  boolean isCheckpointErrors()
           
 boolean isCheckpointFailed()
           
 boolean isCheckpointing()
           
 void recover(CrawlController cc)
          Call when recovering from a checkpoint.
protected  void setCheckpointErrors(boolean checkpointErrors)
           
protected  void writeValidity()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INDEX_FORMAT

public static final java.text.DecimalFormat INDEX_FORMAT
Constructor Detail

Checkpointer

public Checkpointer(CrawlController cc,
                    java.io.File checkpointDir)
Create a new CheckpointContext with the given store directory.

Parameters:
cc - CrawlController instance that is hosting this Checkpointer.
checkpointDir - Where to store the checkpoint.

Checkpointer

public Checkpointer(CrawlController cc,
                    java.lang.String prefix)
Create a new CheckpointContext with the given checkpoint label prefix.

Parameters:
cc - CrawlController instance that is hosting this Checkpointer.
prefix - Prefix for checkpoint label.
Method Detail

initialize

protected void initialize(CrawlController cc,
                          java.lang.String prefix)

cleanup

void cleanup()

getNextCheckpoint

public int getNextCheckpoint()
Returns:
Index of the next checkpoint.

checkpoint

public void checkpoint()
Run a checkpoint of the crawler.


createCheckpointInProgressDirectory

protected java.io.File createCheckpointInProgressDirectory()

clearCheckpointInProgressDirectory

protected void clearCheckpointInProgressDirectory()

getController

protected CrawlController getController()

getNextCheckpointName

public java.lang.String getNextCheckpointName()
Returns:
Next checkpoint name (a zero-padded string).

formatCheckpointName

public static java.lang.String formatCheckpointName(java.lang.String prefix,
                                                    int index)
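This page does not document the method body or the INDEX_FORMAT pattern. The following is a minimal sketch of how such a name could be built. It assumes the name is the prefix followed by the index padded to five digits. The class name CheckpointNameSketch and the "00000" pattern are assumptions for illustration, not taken from the Heritrix source.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Stand-in for illustration only; this is not the Heritrix Checkpointer class.
class CheckpointNameSketch {
    // Assumed pattern: at least five digits, zero-padded on the left.
    // Locale.ROOT symbols keep the digits ASCII regardless of the default locale.
    static final DecimalFormat INDEX_FORMAT =
        new DecimalFormat("00000", DecimalFormatSymbols.getInstance(Locale.ROOT));

    // Assumed behaviour: prefix followed by the zero-padded index.
    static String formatCheckpointName(String prefix, int index) {
        // DecimalFormat is not thread-safe, so guard the shared instance.
        synchronized (INDEX_FORMAT) {
            return prefix + INDEX_FORMAT.format(index);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatCheckpointName("cp-", 7)); // prints "cp-00007"
    }
}
```

Zero-padding keeps checkpoint names in the same order whether they are sorted numerically or lexically, which is convenient when the names are also used as directory names.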

writeValidity

protected void writeValidity()

getCheckpointInProgressDirectory

public java.io.File getCheckpointInProgressDirectory()
Returns:
Checkpoint directory, named after the current checkpoint. Null if no checkpoint is in progress.

isCheckpointing

public boolean isCheckpointing()
Returns:
True if a checkpoint is in progress.

checkpointFailed

protected void checkpointFailed(java.lang.Exception e)
Note that a checkpoint failed.

Parameters:
e - Exception checkpoint failed on.

checkpointFailed

protected void checkpointFailed(java.lang.String message)

checkpointFailed

protected void checkpointFailed()

isCheckpointFailed

public boolean isCheckpointFailed()
Returns:
True if current/last checkpoint failed.
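
The two status methods isCheckpointing() and isCheckpointFailed() can be combined to wait for a checkpoint and learn its outcome. The following sketch assumes checkpoint() returns before the work finishes, which the nested CheckpointingThread suggests but this page does not state. The CheckpointStatus interface and the awaitCheckpoint helper are hypothetical stand-ins so that the example compiles without Heritrix on the classpath.

```java
// Hypothetical view of the two public status methods documented on this page.
interface CheckpointStatus {
    boolean isCheckpointing();
    boolean isCheckpointFailed();
}

class CheckpointWaitSketch {
    /**
     * Polls until no checkpoint is in progress or the timeout elapses.
     * Returns true only if the checkpoint finished and is not marked failed.
     */
    static boolean awaitCheckpoint(CheckpointStatus status,
                                   long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (status.isCheckpointing()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out while still checkpointing
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return !status.isCheckpointFailed();
    }
}
```

A caller holding a real Checkpointer could wrap it in a small adapter that implements the hypothetical interface and delegates to the two methods.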

isAtBeginning

public boolean isAtBeginning()
Returns:
Whether this context is in a new-crawl, never-checkpointed state.

recover

public void recover(CrawlController cc)
Call when recovering from a checkpoint. Call this after the instance has been revivified post-serialization to amend the counters and directories that affect where checkpoints are stored from here on.

Parameters:
cc - CrawlController instance.

getPredecessorCheckpoints

public java.util.List getPredecessorCheckpoints()
Returns:
The list of predecessor checkpoints.

isCheckpointErrors

protected boolean isCheckpointErrors()

setCheckpointErrors

protected void setCheckpointErrors(boolean checkpointErrors)


Copyright © 2003-2011 Internet Archive. All Rights Reserved.