org.archive.crawler.io
Class CrawlerJournal

java.lang.Object
  extended by org.archive.crawler.io.CrawlerJournal
Direct Known Subclasses:
RecoveryJournal

public class CrawlerJournal
extends java.lang.Object

Utility class for a crawler journal/log that is compressed and rotates by serial number at checkpoints.

Author:
gojomo

Field Summary
protected  it.unimi.dsi.mg4j.util.MutableString accumulatingBuffer
          Allocate a buffer for accumulating lines to write and reuse it.
static java.lang.String GZIP_SUFFIX
          suffix to recognize gzipped files
protected  java.io.File gzipFile
          File we're writing journal to.
protected  long lines
          line count
static java.lang.String LOG_ERROR
          prefix for error lines
static java.lang.String LOG_TIMESTAMP
          prefix for timestamp lines
protected  java.io.Writer out
          Stream on which we record frontier events.
protected  int timestamp_interval
          number of lines between timestamps
 
Constructor Summary
CrawlerJournal(java.io.File file)
          Create a new crawler journal at the given location
CrawlerJournal(java.lang.String path, java.lang.String filename)
          Create a new crawler journal at the given location
 
Method Summary
 void checkpoint(java.io.File checkpointDir)
          Handle a checkpoint by rotating the current log to a checkpoint-named file and starting a new log.
 void close()
          Flush and close the underlying IO objects.
protected  void considerTimestamp()
          Write a timestamp line if appropriate
static java.io.BufferedInputStream getBufferedInput(java.io.File source)
          Get a BufferedInputStream on the recovery file given.
static java.io.BufferedReader getBufferedReader(java.io.File source)
          Get a BufferedReader on the crawler journal given
static java.io.BufferedReader getBufferedReader(java.net.URL source)
          Get a BufferedReader on the crawler journal given.
protected  java.io.Writer initialize(java.io.File f)
           
protected  void noteLine()
          Count and note a line
 void seriousError(java.lang.String err)
          Note a serious error via a special log line
 void writeLine(it.unimi.dsi.mg4j.util.MutableString mstring)
          Write a line.
 void writeLine(java.lang.String string)
          Write a line
 void writeLine(java.lang.String s1, java.lang.String s2)
          Write a line of two strings
 void writeLine(java.lang.String s1, java.lang.String s2, java.lang.String s3)
          Write a line of three strings
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG_ERROR

public static final java.lang.String LOG_ERROR
prefix for error lines

See Also:
Constant Field Values

LOG_TIMESTAMP

public static final java.lang.String LOG_TIMESTAMP
prefix for timestamp lines

See Also:
Constant Field Values

out

protected java.io.Writer out
Stream on which we record frontier events.


lines

protected long lines
line count


timestamp_interval

protected int timestamp_interval
number of lines between timestamps


GZIP_SUFFIX

public static final java.lang.String GZIP_SUFFIX
suffix to recognize gzipped files

See Also:
Constant Field Values

gzipFile

protected java.io.File gzipFile
File we're writing journal to. Keep a reference in case we want to rotate it off.


accumulatingBuffer

protected it.unimi.dsi.mg4j.util.MutableString accumulatingBuffer
Allocate a buffer for accumulating lines to write and reuse it.

Constructor Detail

CrawlerJournal

public CrawlerJournal(java.lang.String path,
                      java.lang.String filename)
               throws java.io.IOException
Create a new crawler journal at the given location

Parameters:
path - Directory to make the journal in.
filename - Name to use for journal file.
Throws:
java.io.IOException

CrawlerJournal

public CrawlerJournal(java.io.File file)
               throws java.io.IOException
Create a new crawler journal at the given location

Parameters:
file - path at which to make journal
Throws:
java.io.IOException
Method Detail

getBufferedReader

public static java.io.BufferedReader getBufferedReader(java.io.File source)
                                                throws java.io.IOException
Get a BufferedReader on the crawler journal given

Parameters:
source - journal file to read
Returns:
journal buffered reader.
Throws:
java.io.IOException

getBufferedReader

public static java.io.BufferedReader getBufferedReader(java.net.URL source)
                                                throws java.io.IOException
Get a BufferedReader on the crawler journal given.

Parameters:
source - URL of the journal to read
Returns:
journal buffered reader.
Throws:
java.io.IOException

getBufferedInput

public static java.io.BufferedInputStream getBufferedInput(java.io.File source)
                                                    throws java.io.IOException
Get a BufferedInputStream on the recovery file given.

Parameters:
source - file to open
Returns:
journal buffered input stream.
Throws:
java.io.IOException
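
The reader helpers above can be illustrated with a minimal, standard-library-only sketch. It is not the CrawlerJournal implementation: the class name JournalReaderSketch, the method openJournal, the ".gz" value used for the suffix, and the UTF-8 charset are all assumptions made for illustration. The idea shown is only that a file whose name carries the gzip suffix gets wrapped in a decompressing stream before being handed to a BufferedReader.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; not part of org.archive.crawler.io.
final class JournalReaderSketch {
    // Assumed suffix value; the real GZIP_SUFFIX constant is not shown in this page.
    static final String ASSUMED_GZIP_SUFFIX = ".gz";

    // Open a journal file for line-oriented reading, decompressing if the
    // file name ends with the assumed gzip suffix.
    static BufferedReader openJournal(File source) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(source));
        boolean ok = false;
        try {
            if (source.getName().endsWith(ASSUMED_GZIP_SUFFIX)) {
                in = new GZIPInputStream(in);
            }
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            ok = true;
            return reader;
        } finally {
            if (!ok) {
                in.close(); // do not leak the file handle if the gzip header is bad
            }
        }
    }

    // Write one line to a temporary gzipped file and read it back.
    static String roundTrip(String line) {
        try {
            File tmp = File.createTempFile("journal-sketch", ASSUMED_GZIP_SUFFIX);
            tmp.deleteOnExit();
            try (Writer w = new OutputStreamWriter(
                    new GZIPOutputStream(new FileOutputStream(tmp)), StandardCharsets.UTF_8)) {
                w.write(line);
                w.write('\n');
            }
            try (BufferedReader r = openJournal(tmp)) {
                return r.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling JournalReaderSketch.roundTrip with any single-line string returns that same string, which shows the write and read sides agree on compression and encoding.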

initialize

protected java.io.Writer initialize(java.io.File f)
                             throws java.io.FileNotFoundException,
                                    java.io.IOException
Throws:
java.io.FileNotFoundException
java.io.IOException

writeLine

public void writeLine(java.lang.String string)
Write a line

Parameters:
string - String

writeLine

public void writeLine(java.lang.String s1,
                      java.lang.String s2)
Write a line of two strings

Parameters:
s1 - String
s2 - String

writeLine

public void writeLine(java.lang.String s1,
                      java.lang.String s2,
                      java.lang.String s3)
Write a line of three strings

Parameters:
s1 - String
s2 - String
s3 - String

writeLine

public void writeLine(it.unimi.dsi.mg4j.util.MutableString mstring)
Write a line.

Parameters:
mstring - MutableString to write
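
The writeLine overloads and the reusable accumulatingBuffer field suggest a pattern of assembling each line in one buffer before writing it. The following is a rough sketch of that pattern only, under several assumptions: java.lang.StringBuilder stands in for MutableString, the parts are concatenated with no separator, and the newline is written after the content. The class name BufferedLineSketch and its methods are hypothetical and do not reflect the actual CrawlerJournal source.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sketch; not part of org.archive.crawler.io.
final class BufferedLineSketch {
    private final Writer out;
    // One buffer, allocated once and reused for every line (StringBuilder is
    // a stand-in for MutableString here).
    private final StringBuilder accumulatingBuffer = new StringBuilder(1024);

    BufferedLineSketch(Writer out) {
        this.out = out;
    }

    // Join the parts into the shared buffer and emit them as a single line.
    // Synchronized because the buffer is shared between callers.
    synchronized void writeLine(String... parts) throws IOException {
        accumulatingBuffer.setLength(0); // reset without reallocating
        for (String part : parts) {
            accumulatingBuffer.append(part);
        }
        accumulatingBuffer.append('\n');
        out.append(accumulatingBuffer);
    }

    // Write a two-part and a three-part line and return what was written.
    static String demo() {
        StringWriter sink = new StringWriter();
        BufferedLineSketch journal = new BufferedLineSketch(sink);
        try {
            journal.writeLine("tag-a ", "http://example.com/");
            journal.writeLine("tag-b ", "http://example.com/", " extra");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

Reusing one buffer avoids allocating a new string per journal line, which can matter when a crawl records a very large number of events.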

noteLine

protected void noteLine()
                 throws java.io.IOException
Count and note a line

Throws:
java.io.IOException

considerTimestamp

protected void considerTimestamp()
                          throws java.io.IOException
Write a timestamp line if appropriate

Throws:
java.io.IOException
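
How noteLine, considerTimestamp, and the timestamp_interval field might fit together can be sketched as follows. This is an assumption-laden illustration, not the library's code: the class name TimestampSketch, the "T " prefix, and the use of java.time.Instant for the timestamp text are all hypothetical, and an interval of zero or less is assumed to disable timestamps.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.time.Instant;

// Hypothetical sketch; not part of org.archive.crawler.io.
final class TimestampSketch {
    // Assumed prefix; the real LOG_TIMESTAMP value is not shown in this page.
    static final String ASSUMED_TIMESTAMP_PREFIX = "T ";

    private final Writer out;
    private final int timestampInterval; // data lines between timestamps
    private long lines = 0;              // data lines written so far

    TimestampSketch(Writer out, int timestampInterval) {
        this.out = out;
        this.timestampInterval = timestampInterval;
    }

    void writeLine(String s) throws IOException {
        out.write(s);
        out.write('\n');
        noteLine();
    }

    // Count the line, then check whether a timestamp is due.
    private void noteLine() throws IOException {
        lines++;
        considerTimestamp();
    }

    // Emit a timestamp line after every timestampInterval data lines.
    private void considerTimestamp() throws IOException {
        if (timestampInterval > 0 && lines % timestampInterval == 0) {
            out.write(ASSUMED_TIMESTAMP_PREFIX + Instant.now() + "\n");
        }
    }

    // Write dataLines lines and count how many timestamp lines were interleaved.
    static int countTimestampLines(int interval, int dataLines) {
        StringWriter sink = new StringWriter();
        TimestampSketch journal = new TimestampSketch(sink, interval);
        try {
            for (int i = 0; i < dataLines; i++) {
                journal.writeLine("line " + i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int count = 0;
        for (String line : sink.toString().split("\n")) {
            if (line.startsWith(ASSUMED_TIMESTAMP_PREFIX)) {
                count++;
            }
        }
        return count;
    }
}
```

With an interval of 2 and five data lines, the sketch interleaves two timestamp lines (after the second and fourth data lines).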

close

public void close()
Flush and close the underlying IO objects.


seriousError

public void seriousError(java.lang.String err)
Note a serious error via a special log line

Parameters:
err - error message to record

checkpoint

public void checkpoint(java.io.File checkpointDir)
                throws java.io.IOException
Handle a checkpoint by rotating the current log to a checkpoint-named file and starting a new log.

Parameters:
checkpointDir - directory of the checkpoint
Throws:
java.io.IOException
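
The rotation described for checkpoint can be sketched with the standard library alone. Everything specific here is an assumption for illustration: the class name CheckpointRotationSketch, the choice to name the rotated file by appending the checkpoint directory's name to the journal file name, and keeping the rotated file beside the live journal. The documentation above only says that the current log is rotated to a checkpoint-named file and a new log is started.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; not part of org.archive.crawler.io.
final class CheckpointRotationSketch {
    private final File journalFile;
    private Writer out;

    CheckpointRotationSketch(File journalFile) throws IOException {
        this.journalFile = journalFile;
        this.out = open(journalFile);
    }

    private static Writer open(File f) throws IOException {
        return new OutputStreamWriter(
            new GZIPOutputStream(new FileOutputStream(f)), StandardCharsets.UTF_8);
    }

    void writeLine(String s) throws IOException {
        out.write(s);
        out.write('\n');
    }

    // Close the current log, move it to a checkpoint-named file, start a new log.
    // The "<journal name>.<checkpoint dir name>" naming is an assumption.
    File checkpoint(File checkpointDir) throws IOException {
        out.close(); // finishes the gzip stream so the rotated file is complete
        File rotated = new File(journalFile.getParentFile(),
            journalFile.getName() + "." + checkpointDir.getName());
        Files.move(journalFile.toPath(), rotated.toPath(),
            StandardCopyOption.REPLACE_EXISTING);
        out = open(journalFile); // fresh journal at the original path
        return rotated;
    }

    void close() throws IOException {
        out.close();
    }

    // Run one rotation in a temporary directory and report whether both files exist.
    static boolean demo() {
        try {
            File dir = Files.createTempDirectory("journal-sketch").toFile();
            File journal = new File(dir, "journal.gz");
            CheckpointRotationSketch j = new CheckpointRotationSketch(journal);
            j.writeLine("before checkpoint");
            File rotated = j.checkpoint(new File(dir, "cp00001"));
            j.writeLine("after checkpoint");
            j.close();
            return rotated.getName().equals("journal.gz.cp00001")
                && rotated.length() > 0
                && journal.exists();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Closing the writer before the move matters in this sketch: a gzip stream is only readable to the end once its trailer has been written.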


Copyright © 2003-2011 Internet Archive. All Rights Reserved.