org.archive.io
Class WriterPoolMember

java.lang.Object
  extended by org.archive.io.WriterPoolMember
All Implemented Interfaces:
ArchiveFileConstants
Direct Known Subclasses:
ARCWriter, WARCWriter

public abstract class WriterPoolMember
extends java.lang.Object
implements ArchiveFileConstants

Member of WriterPool. Implements file rotation, file naming with some guarantee of uniqueness, and tracking of the position in the file. Subclass to pick up this functionality for a particular Writer type.

Version:
$Date: 2010-06-19 19:33:12 +0000 (Sat, 19 Jun 2010) $ $Revision: 6900 $
Author:
stack

Field Summary
static java.lang.String DEFAULT_PREFIX
          Default file prefix.
static java.lang.String DEFAULT_SUFFIX
          Default for file suffix.
static java.lang.String HOSTNAME_ADMINPORT_VARIABLE
          Value to interpolate with actual hostname-port.
static java.lang.String HOSTNAME_VARIABLE
          Value to interpolate with actual hostname.
static java.lang.String UTF8
           
 
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DOT_COMPRESSED_FILE_EXTENSION, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
 
Constructor Summary
  WriterPoolMember(java.util.concurrent.atomic.AtomicInteger serialNo, java.util.List<java.io.File> dirs, java.lang.String prefix, boolean cmprs, long maxSize, java.lang.String extension)
          Constructor.
  WriterPoolMember(java.util.concurrent.atomic.AtomicInteger serialNo, java.util.List<java.io.File> dirs, java.lang.String prefix, java.lang.String suffix, boolean cmprs, long maxSize, java.lang.String extension)
          Constructor.
protected WriterPoolMember(java.util.concurrent.atomic.AtomicInteger serialNo, java.io.OutputStream out, java.io.File file, boolean cmprs, java.lang.String a14DigitDate)
          Constructor.
 
Method Summary
 void checkSize()
          Call this method just before/after any significant write.
protected  java.io.File checkWriteable(java.io.File d)
           
 void close()
           
protected  long copyFrom(java.io.InputStream is, long recordLength, boolean enforceLength)
          Copy bytes from the provided InputStream to the target file/stream being written.
protected  java.lang.String createFile()
          Create a new file.
protected  java.lang.String createFile(java.io.File file)
           
protected  void flush()
           
protected  java.lang.String getBaseFilename()
          Get the file name
protected  java.lang.String getCreateTimestamp()
           
 java.io.File getFile()
          Get this file.
protected  java.io.File getNextDirectory(java.util.List<java.io.File> dirs)
           
protected  java.io.OutputStream getOutputStream()
           
 long getPosition()
          Position in current physical file.
protected  TimestampSerialno getTimestampSerialNo()
           
protected  TimestampSerialno getTimestampSerialNo(java.lang.String timestamp)
          Synchronizes statically around getting the counter and timestamp, so that no thread can get in between the reading of the timestamp and the allocation of the serial number and throw the two out of alignment.
 boolean isCompressed()
           
protected  void postWriteRecordTasks()
          Post file write tasks.
protected  void preWriteRecordTasks()
          Pre write tasks.
protected  void readFullyFrom(java.io.InputStream is, long recordLength, byte[] b)
          Deprecated. Use copyFrom(InputStream,long,boolean) instead
protected  void readToLimitFrom(java.io.InputStream is, long limit, byte[] b)
          Deprecated. Use copyFrom(InputStream,long,boolean) instead
protected  int write(byte[] b)
           
protected  int write(byte[] b, int off, int len)
           
protected  int write(int b)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UTF8

public static final java.lang.String UTF8
See Also:
Constant Field Values

DEFAULT_PREFIX

public static final java.lang.String DEFAULT_PREFIX
Default file prefix. Stands for Internet Archive Heritrix.

See Also:
Constant Field Values

HOSTNAME_VARIABLE

public static final java.lang.String HOSTNAME_VARIABLE
Value to interpolate with actual hostname.

See Also:
Constant Field Values

HOSTNAME_ADMINPORT_VARIABLE

public static final java.lang.String HOSTNAME_ADMINPORT_VARIABLE
Value to interpolate with actual hostname-port.

See Also:
Constant Field Values

DEFAULT_SUFFIX

public static final java.lang.String DEFAULT_SUFFIX
Default for file suffix.

See Also:
Constant Field Values
Constructor Detail

WriterPoolMember

protected WriterPoolMember(java.util.concurrent.atomic.AtomicInteger serialNo,
                           java.io.OutputStream out,
                           java.io.File file,
                           boolean cmprs,
                           java.lang.String a14DigitDate)
                    throws java.io.IOException
Constructor. Takes a stream. Use with caution: there is no upper-bound check on size, so it will just keep writing.

Parameters:
serialNo - used to create unique filename sequences
out - Where to write.
file - File the out is connected to.
cmprs - Compress the content written.
a14DigitDate - If null, we'll write current time.
Throws:
java.io.IOException

WriterPoolMember

public WriterPoolMember(java.util.concurrent.atomic.AtomicInteger serialNo,
                        java.util.List<java.io.File> dirs,
                        java.lang.String prefix,
                        boolean cmprs,
                        long maxSize,
                        java.lang.String extension)
Constructor.

Parameters:
serialNo - used to create unique filename sequences
dirs - Where to drop files.
prefix - File prefix to use.
cmprs - Compress the records written.
maxSize - Maximum size for ARC files written.
extension - Extension to give file.

WriterPoolMember

public WriterPoolMember(java.util.concurrent.atomic.AtomicInteger serialNo,
                        java.util.List<java.io.File> dirs,
                        java.lang.String prefix,
                        java.lang.String suffix,
                        boolean cmprs,
                        long maxSize,
                        java.lang.String extension)
Constructor.

Parameters:
serialNo - used to create unique filename sequences
dirs - Where to drop files.
prefix - File prefix to use.
cmprs - Compress the records written.
maxSize - Maximum size for ARC files written.
suffix - File tail to use. If null, unused.
extension - Extension to give file.
Method Detail

checkSize

public void checkSize()
               throws java.io.IOException
Call this method just before/after any significant write. Call at the end of the writing of a record or just before we start writing a new record. Will close the current file and open a new file if the file size has passed maxSize.

Creates and opens a file if none is already open. One use of this method, then, is to call it after construction to add the metadata, and then call getPosition() to find the offset of the first record.

Throws:
java.io.IOException
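
The rotation rule described above can be sketched as follows. This is a minimal illustration, not the Heritrix implementation: the class name RotatingSketch and its methods writeRecord and fileCount are hypothetical, and in-memory buffers stand in for files so the example has no side effects on disk. The size is checked only at record boundaries, so a file may exceed maxSize by up to one record before it is rotated off.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a size-rotating writer; buffers play the role of files. */
class RotatingSketch {
    private final long maxSize;                 // rotate once the current buffer has passed this size
    private final List<ByteArrayOutputStream> files = new ArrayList<>();
    private ByteArrayOutputStream current;      // null until the first checkSize()

    RotatingSketch(long maxSize) {
        this.maxSize = maxSize;
    }

    /** Opens a buffer if none is open; starts a new one if the current one has passed maxSize. */
    void checkSize() {
        if (current == null || current.size() > maxSize) {
            current = new ByteArrayOutputStream();
            files.add(current);
        }
    }

    /** Writes one whole record, checking the size only at the record boundary. */
    void writeRecord(byte[] record) {
        checkSize();
        current.write(record, 0, record.length);
    }

    /** Position in the current buffer, 0 if nothing is open yet. */
    long getPosition() {
        return current == null ? 0 : current.size();
    }

    int fileCount() {
        return files.size();
    }

    public static void main(String[] args) {
        RotatingSketch w = new RotatingSketch(10);
        w.checkSize();                          // opens the first buffer, position is 0
        for (int i = 0; i < 3; i++) {
            w.writeRecord(new byte[8]);
        }
        // The third record lands in a second buffer because the first had grown to 16 bytes.
        System.out.println(w.fileCount() + " files, position " + w.getPosition());
        // prints "2 files, position 8"
    }
}
```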

createFile

protected java.lang.String createFile()
                               throws java.io.IOException
Create a new file. Rotates off the current Writer and creates a new in its place to take subsequent writes. Usually called from checkSize().

Returns:
Name of file created.
Throws:
java.io.IOException

createFile

protected java.lang.String createFile(java.io.File file)
                               throws java.io.IOException
Throws:
java.io.IOException

getNextDirectory

protected java.io.File getNextDirectory(java.util.List<java.io.File> dirs)
                                 throws java.io.IOException
Parameters:
dirs - List of File objects that point at directories.
Returns:
The next directory to write an ARC to. If there is more than one, it tries to round-robin through each in turn.
Throws:
java.io.IOException
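
Round-robin selection over a list of directories might look like the sketch below. It is an assumption-laden illustration only: the class name RoundRobinDirs and its field are hypothetical, and the sketch omits the writeability check that checkWriteable(File) suggests the real method performs.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical round-robin picker; does not check that a directory is writeable. */
class RoundRobinDirs {
    private int next = 0;   // index of the directory to hand out on the next call

    /** Returns each directory in turn, wrapping around at the end of the list. */
    synchronized File getNextDirectory(List<File> dirs) throws IOException {
        if (dirs == null || dirs.isEmpty()) {
            throw new IOException("No directories configured");
        }
        File chosen = dirs.get(next % dirs.size());
        next = (next + 1) % dirs.size();
        return chosen;
    }

    public static void main(String[] args) throws IOException {
        RoundRobinDirs picker = new RoundRobinDirs();
        List<File> dirs = Arrays.asList(new File("a"), new File("b"));
        for (int i = 0; i < 3; i++) {
            System.out.println(picker.getNextDirectory(dirs).getName());
        }
        // prints "a", "b", "a" on separate lines
    }
}
```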

checkWriteable

protected java.io.File checkWriteable(java.io.File d)

getTimestampSerialNo

protected TimestampSerialno getTimestampSerialNo()

getTimestampSerialNo

protected TimestampSerialno getTimestampSerialNo(java.lang.String timestamp)
Synchronizes statically around getting the counter and timestamp, so that no thread can get in between the reading of the timestamp and the allocation of the serial number and throw the two out of alignment.

Parameters:
timestamp - If non-null, use the passed timestamp (must be in 14-digit ARC format); if null, use the current time.
Returns:
Instance of data structure that has timestamp and serial no.
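
The idea of taking the timestamp and the serial number under one class-wide lock can be sketched as below. The names StampSerialSketch, Pair, and next are hypothetical stand-ins (Pair plays the role of TimestampSerialno), and the use of UTC for the current time is an assumption rather than something this page states.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: timestamp and serial number are allocated as one atomic step. */
class StampSerialSketch {
    // 14-digit date format, e.g. 20100619193312
    private static final DateTimeFormatter FMT14 = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    // One lock shared by all instances, so the pairing holds across writers.
    private static final Object LOCK = new Object();

    /** Hypothetical pair type standing in for TimestampSerialno. */
    static final class Pair {
        final String timestamp;
        final int serial;

        Pair(String timestamp, int serial) {
            this.timestamp = timestamp;
            this.serial = serial;
        }
    }

    /** Uses the passed timestamp if non-null, else the current time (UTC assumed). */
    static Pair next(AtomicInteger serialNo, String timestamp) {
        synchronized (LOCK) {
            String ts = (timestamp != null)
                    ? timestamp
                    : ZonedDateTime.now(ZoneOffset.UTC).format(FMT14);
            return new Pair(ts, serialNo.getAndIncrement());
        }
    }

    public static void main(String[] args) {
        AtomicInteger counter = new AtomicInteger();
        Pair p = next(counter, "20100619193312");
        System.out.println(p.timestamp + "-" + p.serial);
        // prints "20100619193312-0"
    }
}
```

Without the shared lock, two threads could each read a timestamp and then receive serial numbers in the opposite order, which is the misalignment the method description warns about.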

getBaseFilename

protected java.lang.String getBaseFilename()
Get the file name

Returns:
the filename, as if uncompressed

getFile

public java.io.File getFile()
Get this file. Used by junit test to test for creation and when WriterPool wants to invalidate a file.

Returns:
The current file.

preWriteRecordTasks

protected void preWriteRecordTasks()
                            throws java.io.IOException
Pre write tasks. Has side effects. Will open a new file if we're at the upper bound. If we're writing compressed files, it will wrap the output stream with a GZIP writer, with the side effect that a GZIP header is written out on the stream.

Throws:
java.io.IOException

postWriteRecordTasks

protected void postWriteRecordTasks()
                             throws java.io.IOException
Post file write tasks. If compressed, finishes up compression and flushes the stream so any subsequent checks get a good reading.

Throws:
java.io.IOException
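
The pre/post pairing for compressed output can be sketched with the JDK's GZIPOutputStream, whose constructor writes the GZIP header immediately and whose finish() completes the compressed data without closing the underlying stream. The class GzipPerRecordSketch below is hypothetical and always compresses; the real class makes this conditional on isCompressed(), and the sketch omits the new-file check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: each record becomes its own GZIP member on one underlying stream. */
class GzipPerRecordSketch {
    private final OutputStream raw;   // the underlying file stream (a buffer in this sketch)
    private OutputStream out;         // what write() currently targets

    GzipPerRecordSketch(OutputStream raw) {
        this.raw = raw;
        this.out = raw;
    }

    /** Wraps the raw stream; the GZIP header is written as a side effect. */
    void preWriteRecordTasks() throws IOException {
        out = new GZIPOutputStream(raw);
    }

    void write(byte[] b) throws IOException {
        out.write(b);
    }

    /** Finishes the current GZIP member and flushes, leaving the raw stream open. */
    void postWriteRecordTasks() throws IOException {
        if (out instanceof GZIPOutputStream) {
            GZIPOutputStream gz = (GZIPOutputStream) out;
            gz.finish();
            gz.flush();
            out = raw;
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        GzipPerRecordSketch w = new GzipPerRecordSketch(sink);
        for (String record : new String[] {"one", "two"}) {
            w.preWriteRecordTasks();
            w.write(record.getBytes(StandardCharsets.UTF_8));
            w.postWriteRecordTasks();
        }
        // Concatenated GZIP members decompress as one continuous stream.
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        // prints "onetwo"
    }
}
```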

getPosition

public long getPosition()
                 throws java.io.IOException
Position in current physical file. Used for accounting of bytes written.

Returns:
Position in underlying file. Call before or after writing records *only* to be safe.
Throws:
java.io.IOException
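
One common way to track a position like this is to count bytes as they pass through to the underlying stream. The sketch below assumes that approach; CountingStreamSketch is a hypothetical class, and this page does not say how the real class computes its position. A counter placed beneath a compressing stream only sees bytes that have actually been flushed through, which is consistent with the advice to read the position only before or after writing a record.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical byte-counting wrapper; counts what reaches the underlying stream. */
class CountingStreamSketch extends FilterOutputStream {
    private long position = 0;

    CountingStreamSketch(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        position++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Write in bulk to the wrapped stream so bytes are counted exactly once.
        out.write(b, off, len);
        position += len;
    }

    long getPosition() {
        return position;
    }

    public static void main(String[] args) throws IOException {
        CountingStreamSketch c = new CountingStreamSketch(new ByteArrayOutputStream());
        c.write(new byte[5]);   // FilterOutputStream routes this to write(b, 0, b.length)
        c.write(7);
        System.out.println(c.getPosition());
        // prints "6"
    }
}
```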

isCompressed

public boolean isCompressed()

write

protected int write(byte[] b)
             throws java.io.IOException
Returns:
number of bytes written, which is always b.length
Throws:
java.io.IOException

flush

protected void flush()
              throws java.io.IOException
Throws:
java.io.IOException

write

protected int write(byte[] b,
                    int off,
                    int len)
             throws java.io.IOException
Returns:
number of bytes written
Throws:
java.io.IOException

write

protected int write(int b)
             throws java.io.IOException
Returns:
number of bytes written
Throws:
java.io.IOException

readFullyFrom

protected void readFullyFrom(java.io.InputStream is,
                             long recordLength,
                             byte[] b)
                      throws java.io.IOException
Deprecated. Use copyFrom(InputStream,long,boolean) instead

Throws:
java.io.IOException

readToLimitFrom

protected void readToLimitFrom(java.io.InputStream is,
                               long limit,
                               byte[] b)
                        throws java.io.IOException
Deprecated. Use copyFrom(InputStream,long,boolean) instead

Throws:
java.io.IOException

copyFrom

protected long copyFrom(java.io.InputStream is,
                        long recordLength,
                        boolean enforceLength)
                 throws java.io.IOException
Copy bytes from the provided InputStream to the target file/stream being written.

Parameters:
is - InputStream to copy bytes from
recordLength - expected number of bytes to copy
enforceLength - whether to throw an exception if too many/too few bytes are available from stream
Returns:
number of bytes written (normally equal to recordLength)
Throws:
java.io.IOException
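
A length-bounded copy with optional enforcement might be sketched as follows. The class CopyFromSketch and the extra out parameter are hypothetical (the real method writes to the member's own stream), and the exact policy is an assumption: this sketch never writes more than recordLength bytes and, when enforceLength is set, throws if the stream holds fewer or more bytes than expected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a bounded stream copy with an optional length check. */
class CopyFromSketch {
    static long copyFrom(InputStream is, OutputStream out, long recordLength,
                         boolean enforceLength) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        while (total < recordLength) {
            // Never ask for more than the bytes still expected.
            int want = (int) Math.min(buf.length, recordLength - total);
            int n = is.read(buf, 0, want);
            if (n == -1) {
                break;                      // stream ended early
            }
            out.write(buf, 0, n);
            total += n;
        }
        if (enforceLength) {
            if (total < recordLength) {
                throw new IOException("Expected " + recordLength + " bytes, got " + total);
            }
            if (is.read() != -1) {
                throw new IOException("Stream holds more than " + recordLength + " bytes");
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copyFrom(new ByteArrayInputStream(data), out, 5, false);
        System.out.println(n + " " + out.toString("UTF-8"));
        // prints "5 hello"
    }
}
```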

close

public void close()
           throws java.io.IOException
Throws:
java.io.IOException

getOutputStream

protected java.io.OutputStream getOutputStream()

getCreateTimestamp

protected java.lang.String getCreateTimestamp()


Copyright © 2003-2011 Internet Archive. All Rights Reserved.