org.archive.io
Class ArchiveRecord

java.lang.Object
  extended by java.io.InputStream
      extended by org.archive.io.ArchiveRecord
All Implemented Interfaces:
java.io.Closeable
Direct Known Subclasses:
ARCRecord, WARCRecord

public abstract class ArchiveRecord
extends java.io.InputStream

Archive file Record.

Version:
$Date: 2010-04-09 18:59:44 +0000 (Fri, 09 Apr 2010) $ $Version$
Author:
stack

Field Summary
protected  java.security.MessageDigest digest
          Compute digest on what we read and add to metadata when done.
(package private)  boolean eor
          Set flag when we've reached the end-of-record.
(package private)  ArchiveRecordHeader header
           
(package private)  java.io.InputStream in
          Stream to read this record from.
(package private)  long position
          Position within the Record content, as read from in.
(package private)  boolean strict
           
 
Constructor Summary
ArchiveRecord(java.io.InputStream in)
          Constructor.
ArchiveRecord(java.io.InputStream in, ArchiveRecordHeader header)
          Constructor.
ArchiveRecord(java.io.InputStream in, ArchiveRecordHeader header, int bodyOffset, boolean digest, boolean strict)
          Constructor.
 
Method Summary
 int available()
          This available is not the stream's available.
 void close()
          Calling close on a record skips us past this record to the next record in the stream.
 void dump()
          Writes output on STDOUT.
 void dump(java.io.OutputStream os)
          Writes output on passed os.
protected  java.lang.String getDigest4Cdx(ArchiveRecordHeader h)
           
 java.lang.String getDigestStr()
           
 ArchiveRecordHeader getHeader()
           
protected  java.io.InputStream getIn()
           
protected  java.lang.String getIp4Cdx(ArchiveRecordHeader h)
           
protected  java.lang.String getMimetype4Cdx(ArchiveRecordHeader h)
           
protected  long getPosition()
           
protected  java.lang.String getStatusCode4Cdx(ArchiveRecordHeader h)
           
protected  void incrementPosition()
           
protected  void incrementPosition(long incr)
           
protected  boolean isEor()
           
 boolean isStrict()
           
 boolean markSupported()
           
protected  java.lang.String outputCdx(java.lang.String strippedFileName)
           
 int read()
           
 int read(byte[] b, int offset, int length)
           
protected  void setEor(boolean eor)
           
protected  void setHeader(ArchiveRecordHeader header)
           
 void setStrict(boolean strict)
           
(package private)  void skip()
          Skip over this record's content.
 long skip(long n)
           
 
Methods inherited from class java.io.InputStream
mark, read, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

header

ArchiveRecordHeader header

in

java.io.InputStream in
Stream to read this record from. The stream can only be read sequentially. It returns only this record's content, and returns -1 on any attempt to read beyond the end of the current record.

Streams can be markable or not. If they are, we can roll back when we have read too far. If not markable, the assumption is that the underlying stream is managing our not reading too much. (This pertains to skipping over the end of the ARCRecord; see skip().)
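
The markable versus non-markable distinction can be illustrated with plain JDK streams. The following is a minimal sketch, not code from this class: the class name MarkRollbackSketch and its readLine helper are hypothetical. It reads in chunks and rolls back with mark/reset when the stream supports it, and falls back to byte-at-a-time reads when it does not.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of org.archive.io.
class MarkRollbackSketch {
    /**
     * Reads up to and including the first '\n'. On a markable stream the
     * read is chunked and any overshoot is rolled back; on a non-markable
     * stream bytes are read one at a time so nothing extra is consumed.
     */
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        if (!in.markSupported()) {
            int c;
            while ((c = in.read()) != -1) {
                line.append((char) c);
                if (c == '\n') {
                    break;
                }
            }
            return line.toString();
        }
        byte[] chunk = new byte[16];
        while (true) {
            in.mark(chunk.length);
            int n = in.read(chunk, 0, chunk.length);
            if (n == -1) {
                return line.toString();
            }
            for (int i = 0; i < n; i++) {
                line.append((char) (chunk[i] & 0xff));
                if (chunk[i] == '\n') {
                    // Read too far: roll back, then consume only through the newline.
                    in.reset();
                    long toSkip = i + 1;
                    while (toSkip > 0) {
                        long skipped = in.skip(toSkip);
                        if (skipped <= 0) {
                            throw new IOException("could not reposition stream");
                        }
                        toSkip -= skipped;
                    }
                    return line.toString();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(
                "header line\nbody".getBytes(StandardCharsets.US_ASCII)));
        System.out.print(readLine(in));
        // The next byte read is the first byte after the newline.
        System.out.println((char) in.read());
    }
}
```

The chunk size of 16 is arbitrary; it only needs to match the read limit passed to mark so that reset remains valid.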


position

long position
Position within the Record content, as read from in. This position is relative to the start of this Record. It is not the same as the Archive file position.


eor

boolean eor
Set flag when we've reached the end-of-record.


digest

protected java.security.MessageDigest digest
Compute digest on what we read and add to metadata when done. Currently hardcoded as SHA-1. TODO: Remove when archive records digest, or else add a facility that allows the ARC reader to compare the calculated digest to that which is recorded in the ARC.

Protected instead of private so subclasses can update and complete the digest.
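
Digesting while reading can be sketched with java.security.MessageDigest alone. This is an illustration under assumptions, not this class's implementation: DigestingReadSketch and its methods are hypothetical names, and the hex encoding is chosen only for the example (this page does not say how getDigestStr() encodes its result).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of org.archive.io.
class DigestingReadSketch {
    /** Reads the stream to its end, feeding every byte read into a SHA-1 digest. */
    static byte[] sha1WhileReading(InputStream in)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            // Update with exactly the bytes this read returned.
            digest.update(buffer, 0, n);
        }
        // Completing the digest is only meaningful once all content is read.
        return digest.digest();
    }

    /** Hex encoding is an assumption of this sketch. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
                "abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(sha1WhileReading(in)));
    }
}
```

Because the digest is only complete after the last byte has been read, a caller that stops reading early gets a digest of a prefix, not of the whole content.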


strict

boolean strict


Constructor Detail

ArchiveRecord

public ArchiveRecord(java.io.InputStream in)
              throws java.io.IOException
Constructor.

Parameters:
in - Stream cued up to be at the start of the record this instance is to represent.
Throws:
java.io.IOException

ArchiveRecord

public ArchiveRecord(java.io.InputStream in,
                     ArchiveRecordHeader header)
              throws java.io.IOException
Constructor.

Parameters:
in - Stream cued up to be at the start of the record this instance is to represent.
header - Header data.
Throws:
java.io.IOException

ArchiveRecord

public ArchiveRecord(java.io.InputStream in,
                     ArchiveRecordHeader header,
                     int bodyOffset,
                     boolean digest,
                     boolean strict)
              throws java.io.IOException
Constructor.

Parameters:
in - Stream cued up to be at the start of the record this instance is to represent.
header - Header data.
bodyOffset - Offset into the body. Usually 0.
digest - True if we're to calculate digest for this record. Not digesting saves about 15% of CPU during an ARC parse.
strict - Be strict when parsing (parsing stops if the ARC is improperly formatted).
Throws:
java.io.IOException


Method Detail

markSupported

public boolean markSupported()
Overrides:
markSupported in class java.io.InputStream

getHeader

public ArchiveRecordHeader getHeader()
Returns:
Header data for this record.

setHeader

protected void setHeader(ArchiveRecordHeader header)

close

public void close()
           throws java.io.IOException
Calling close on a record skips us past this record to the next record in the stream. It does not actually close the stream. The underlying stream is probably being used by the next ARC record.

Specified by:
close in interface java.io.Closeable
Overrides:
close in class java.io.InputStream
Throws:
java.io.IOException
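
The close-as-skip behaviour can be sketched as follows. This is a hypothetical stand-in, not the real ArchiveRecord: SkippingRecordSketch, its constructor, and the explicit length argument are assumptions made for the example. Closing drains the rest of the record and leaves the shared stream open and positioned at the next record.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record; not the real ArchiveRecord.
class SkippingRecordSketch extends InputStream {
    private final InputStream in; // shared stream, owned by the caller
    private long remaining;       // bytes of this record not yet consumed

    SkippingRecordSketch(InputStream in, long length) {
        this.in = in;
        this.remaining = length;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // end of this record, even if the shared stream has more
        }
        int b = in.read();
        if (b == -1) {
            remaining = 0; // shared stream ended early
            return -1;
        }
        remaining--;
        return b;
    }

    /** Drains what is left of this record; deliberately leaves the shared stream open. */
    @Override
    public void close() throws IOException {
        while (read() != -1) {
            // discard unread content
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream shared = new ByteArrayInputStream(
                "AAAABBBB".getBytes(StandardCharsets.US_ASCII));
        SkippingRecordSketch first = new SkippingRecordSketch(shared, 4);
        first.read();  // consume only part of the first record
        first.close(); // skip the rest of it
        SkippingRecordSketch second = new SkippingRecordSketch(shared, 4);
        System.out.println((char) second.read());
    }
}
```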

read

public int read()
         throws java.io.IOException
Specified by:
read in class java.io.InputStream
Returns:
Next byte in this Record's content, or -1 if at end-of-record (EOR).
Throws:
java.io.IOException

read

public int read(byte[] b,
                int offset,
                int length)
         throws java.io.IOException
Overrides:
read in class java.io.InputStream
Throws:
java.io.IOException
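
A bulk read that never crosses the record boundary might look like the sketch below. The class BoundedRecordReadSketch and its length and position fields are hypothetical and only mirror the idea described on this page; the real class takes its length from the record header.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record; not the real ArchiveRecord.
class BoundedRecordReadSketch extends InputStream {
    private final InputStream in; // shared stream positioned at record start
    private final long length;    // stated length of the record content
    private long position;        // bytes of the record consumed so far

    BoundedRecordReadSketch(InputStream in, long length) {
        this.in = in;
        this.length = length;
    }

    @Override
    public int read() throws IOException {
        if (position >= length) {
            return -1;
        }
        int b = in.read();
        if (b != -1) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] b, int offset, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        long left = length - position;
        if (left <= 0) {
            return -1; // at end of record
        }
        // Clamp the request so the read cannot run into the next record.
        int toRead = (int) Math.min(len, left);
        int n = in.read(b, offset, toRead);
        if (n > 0) {
            position += n;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream shared = new ByteArrayInputStream(
                "helloWORLD".getBytes(StandardCharsets.US_ASCII));
        BoundedRecordReadSketch record = new BoundedRecordReadSketch(shared, 5);
        byte[] buf = new byte[8];
        int n = record.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII));
    }
}
```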

available

public int available()
This available is not the stream's available. It is based on the stated Archive record length minus what we've read to date.

Overrides:
available in class java.io.InputStream
Returns:
bytes remaining in record content.
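
The arithmetic behind this value is simple enough to sketch. The helper below is hypothetical (RemainingBytesSketch is not part of this package), and clamping to the int range and to zero are assumptions of the sketch, made because record lengths are long values while available() returns an int.

```java
// Hypothetical helper for illustration; not part of org.archive.io.
class RemainingBytesSketch {
    /**
     * Stated record length minus bytes read so far, clamped to the
     * non-negative int range.
     */
    static int remaining(long recordLength, long position) {
        long left = recordLength - position;
        if (left <= 0) {
            return 0;
        }
        return (int) Math.min(left, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(remaining(100L, 40L));
        System.out.println(remaining(5_000_000_000L, 0L));
    }
}
```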

skip

void skip()
    throws java.io.IOException
Skip over this record's content.

Throws:
java.io.IOException

skip

public long skip(long n)
          throws java.io.IOException
Overrides:
skip in class java.io.InputStream
Throws:
java.io.IOException

isStrict

public boolean isStrict()
Returns:
The strict flag.

setStrict

public void setStrict(boolean strict)
Parameters:
strict - The strict flag to set.

getIn

protected java.io.InputStream getIn()

getDigestStr

public java.lang.String getDigestStr()

incrementPosition

protected void incrementPosition()

incrementPosition

protected void incrementPosition(long incr)

getPosition

protected long getPosition()

isEor

protected boolean isEor()

setEor

protected void setEor(boolean eor)

getStatusCode4Cdx

protected java.lang.String getStatusCode4Cdx(ArchiveRecordHeader h)

getIp4Cdx

protected java.lang.String getIp4Cdx(ArchiveRecordHeader h)

getDigest4Cdx

protected java.lang.String getDigest4Cdx(ArchiveRecordHeader h)

getMimetype4Cdx

protected java.lang.String getMimetype4Cdx(ArchiveRecordHeader h)

outputCdx

protected java.lang.String outputCdx(java.lang.String strippedFileName)
                              throws java.io.IOException
Throws:
java.io.IOException

dump

public void dump()
          throws java.io.IOException
Writes output on STDOUT.

Throws:
java.io.IOException

dump

public void dump(java.io.OutputStream os)
          throws java.io.IOException
Writes output on passed os.

Throws:
java.io.IOException


Copyright © 2003-2011 Internet Archive. All Rights Reserved.