org.archive.io.arc
Class ARCRecord
java.lang.Object
java.io.InputStream
org.archive.io.ArchiveRecord
org.archive.io.arc.ARCRecord
- All Implemented Interfaces:
- java.io.Closeable, ARCConstants, ArchiveFileConstants
public class ARCRecord
- extends ArchiveRecord
- implements ARCConstants
An ARC file record.
Does not encompass the ARCRecord metadata line, only the record content.
- Author:
- stack
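For orientation, here is a sketch of the metadata line that ARCRecord leaves out. It is a single space-separated line. The sketch assumes the version 1 ARC layout (URL, IP address, 14-digit archive date, MIME type, record length). The class and field names are hypothetical and are not part of the org.archive.io API, where this information is carried by ArchiveRecordHeader / ARCRecordMetaData.

```java
import java.util.Objects;

/** Hypothetical holder for a version 1 ARC metadata line (not the Heritrix ARCRecordMetaData class). */
final class ArcMetaLineSketch {
    final String url;
    final String ip;
    final String date;      // assumed 14-digit timestamp, e.g. 20110101120000
    final String mimetype;
    final long length;      // length of the record content that follows the line

    private ArcMetaLineSketch(String url, String ip, String date, String mimetype, long length) {
        this.url = url;
        this.ip = ip;
        this.date = date;
        this.mimetype = mimetype;
        this.length = length;
    }

    /** Splits a v1 metadata line on single spaces; assumes exactly five fields. */
    static ArcMetaLineSketch parse(String line) {
        Objects.requireNonNull(line, "line");
        String[] fields = line.trim().split(" ");
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 fields, got " + fields.length + ": " + line);
        }
        return new ArcMetaLineSketch(fields[0], fields[1], fields[2], fields[3], Long.parseLong(fields[4]));
    }
}
```

In this model, ARCRecord's own stream would begin just after this line and run for `length` bytes.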
Fields inherited from interface org.archive.io.arc.ARCConstants
ARC_FILE_EXTENSION, ARC_GZIP_EXTRA_FIELD, ARC_MAGIC_NUMBER, CHECKSUM_FIELD_KEY, CHECKSUM_HEADER_FIELD_KEY, CODE_HEADER_FIELD_KEY, COMPRESSED_ARC_FILE_EXTENSION, DEFAULT_ENCODING, DEFAULT_GZIP_HEADER_LENGTH, DEFAULT_MAX_ARC_FILE_SIZE, DOT_ARC_FILE_EXTENSION, DOT_COMPRESSED_ARC_FILE_EXTENSION, DOT_COMPRESSED_FILE_EXTENSION, FILENAME_FIELD_KEY, FILENAME_HEADER_FIELD_KEY, GZIP_HEADER_BEGIN, HEADER_FIELD_SEPARATOR, IP_HEADER_FIELD_KEY, LINE_SEPARATOR, LOCATION_HEADER_FIELD_KEY, MAX_METADATA_LINE_LENGTH, MINIMUM_RECORD_LENGTH, OFFSET_FIELD_KEY, OFFSET_HEADER_FIELD_KEY, REQUIRED_VERSION_1_HEADER_FIELDS, STATUSCODE_FIELD_KEY, TOKENIZED_PREFIX
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
Methods inherited from class org.archive.io.ArchiveRecord
available, close, dump, dump, getDigestStr, getHeader, getIn, getMimetype4Cdx, getPosition, incrementPosition, incrementPosition, isEor, isStrict, markSupported, outputCdx, setEor, setHeader, setStrict, skip
Methods inherited from class java.io.InputStream
mark, read, reset
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ARCRecord
public ARCRecord(java.io.InputStream in,
ArchiveRecordHeader metaData)
throws java.io.IOException
- Constructor.
- Parameters:
in
- Stream cued up to the start of the record this instance
is to represent.
metaData
- Meta data.
- Throws:
java.io.IOException
ARCRecord
public ARCRecord(java.io.InputStream in,
ArchiveRecordHeader metaData,
int bodyOffset,
boolean digest,
boolean strict,
boolean parseHttpHeaders)
throws java.io.IOException
- Constructor.
- Parameters:
in
- Stream cued up to the start of the record this instance
is to represent.
metaData
- Meta data.
bodyOffset
- Offset into the body. Usually 0.
digest
- True if we are to calculate a digest for this record. Not
digesting saves about 15% of CPU during an ARC parse.
strict
- Be strict when parsing (parsing stops if the ARC is improperly
formatted).
parseHttpHeaders
- True if we are to parse HTTP headers. Costs
about 20% of CPU during an ARC parse.
- Throws:
java.io.IOException
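The `digest` flag trades CPU for a checksum that is computed while the record content is read. The sketch below shows that general technique with the JDK's `DigestInputStream`. The class name is hypothetical, and the use of SHA-1 and lowercase hex output are assumptions for illustration. Check `DEFAULT_DIGEST_METHOD` and `getDigestStr()` for what ARCRecord actually reports.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration of digesting content as a side effect of reading it. */
final class StreamingDigestSketch {
    /** Reads the stream to EOF, letting DigestInputStream update the digest, and returns hex. */
    static String digestWhileReading(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (DigestInputStream din = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            while (din.read(buf, 0, buf.length) != -1) {
                // A real consumer would process buf here; each read() already updated md.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Every byte passes through `MessageDigest.update`, which is why turning digesting off saves CPU on large ARC files.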
getHeaderString
public java.lang.String getHeaderString()
skipHttpHeader
public void skipHttpHeader()
throws java.io.IOException
- Skip over the HTTP header, if one is present.
Subsequent reads will return the body.
Calling this method partway through reading the header
will produce unpredictable results. Otherwise it is safe to call
at any time, though it only makes sense before any of the ARC record
content has been read.
After calling this method, you can call
getHttpHeaders()
to get the header that was read.
- Throws:
java.io.IOException
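The following is a minimal sketch of what skipping the HTTP header involves. It consumes lines up to and including the first empty line, keeps them, and leaves the stream positioned at the first body byte. The number of bytes consumed plays the role that getBodyOffset() describes. The class and member names are hypothetical. The sketch assumes ISO-8859-1 header bytes and accepts either CRLF or bare LF line endings. It is not the ARCRecord implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of skipHttpHeader()/getBodyOffset(); not Heritrix code. */
final class HttpHeaderSkipSketch {
    final List<String> headerLines = new ArrayList<>();
    int bodyOffset = -1; // -1 until the header has been consumed

    /** Reads header lines until a blank line; the stream is left at the first body byte. */
    void skipHttpHeader(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int consumed = 0;
        int c;
        while ((c = in.read()) != -1) {
            consumed++;
            if (c != '\n') {
                line.write(c);
                continue;
            }
            byte[] raw = line.toByteArray();
            int len = raw.length;
            if (len > 0 && raw[len - 1] == '\r') {
                len--; // drop the CR of a CRLF line ending
            }
            if (len == 0) {
                break; // blank line marks the end of the header
            }
            headerLines.add(new String(raw, 0, len, StandardCharsets.ISO_8859_1));
            line.reset();
        }
        // If EOF arrives before a blank line, everything read counts as header
        // (and a trailing partial line is dropped in this simplified sketch).
        bodyOffset = consumed;
    }
}
```

For `"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\nhello"`, this collects two header lines, sets `bodyOffset` to 44, and leaves `hello` unread.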
dumpHttpHeader
public void dumpHttpHeader()
throws java.io.IOException
- Throws:
java.io.IOException
getStatusCode
public int getStatusCode()
- Return the status code for this record.
This method returns -1 until the HTTP header has been read.
- Returns:
- Status code.
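The -1 convention can be illustrated with a small helper. It extracts the code from an HTTP status line and returns -1 when no line is available yet. The helper name is hypothetical. Returning -1 for a malformed line is an assumption of this sketch, not documented ARCRecord behavior.

```java
/** Hypothetical helper mirroring the "-1 until the header has been read" convention. */
final class StatusLineSketch {
    /** Returns the code from a line such as "HTTP/1.1 404 Not Found", or -1 if unavailable. */
    static int statusCode(String statusLine) {
        if (statusLine == null) {
            return -1; // header not read yet
        }
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            return -1; // not an HTTP status line (assumed handling)
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```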
getMetaData
public ARCRecordMetaData getMetaData()
- Returns:
- Meta data for this record.
getHttpHeaders
public org.apache.commons.httpclient.Header[] getHttpHeaders()
- Returns:
- HTTP headers (only available after the header has been read).
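getHttpHeaders() returns commons-httpclient `Header` objects. As a rough, dependency-free illustration of the name/value split involved, the sketch below uses `Map.Entry` in place of `Header`. The class name is hypothetical, and folded (multi-line) header values are not handled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for building a header array from raw header lines. */
final class HeaderLinesSketch {
    /** Splits "Name: value" lines at the first colon; lines without a name are skipped. */
    static List<Map.Entry<String, String>> parse(List<String> lines) {
        List<Map.Entry<String, String>> headers = new ArrayList<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // e.g. a status line such as "HTTP/1.1 200 OK"
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim(); // keeps later colons, as in URLs
            headers.add(new AbstractMap.SimpleImmutableEntry<>(name, value));
        }
        return headers;
    }
}
```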
read
public int read()
throws java.io.IOException
- Overrides:
read
in class ArchiveRecord
- Returns:
- Next byte of this ARCRecord's content, or -1 if at the end of
this record.
- Throws:
java.io.IOException
read
public int read(byte[] b,
int offset,
int length)
throws java.io.IOException
- Overrides:
read
in class ArchiveRecord
- Throws:
java.io.IOException
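Both read overloads stop at the end of the current record rather than at the end of the underlying ARC stream. The sketch below shows that boundary behavior with a `FilterInputStream` that refuses to read past a byte budget. It assumes the record length is known up front (for example, from the metadata line). The class name is hypothetical. This is not the ARCRecord implementation, which also tracks position, digests, and HTTP headers.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream that reports EOF at the end of one record. */
final class RecordBoundedInputStream extends FilterInputStream {
    private long remaining; // bytes left in the current record

    RecordBoundedInputStream(InputStream in, long recordLength) {
        super(in);
        this.remaining = recordLength;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // end of this record, even if the underlying stream continues
        }
        int c = super.read();
        if (c != -1) {
            remaining--;
        }
        return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = super.read(b, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the byte budget
    }

    @Override
    public void close() {
        // Deliberately leave the shared underlying stream open for the next record.
    }
}
```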
getBodyOffset
public int getBodyOffset()
- Returns:
- Offset at which the body begins (only known after the
header has been read), or -1 if there is none or the headers
have not been read yet. Usually the length of the HTTP headers (does not include
the ARC metadata line length).
getIp4Cdx
protected java.lang.String getIp4Cdx(ArchiveRecordHeader h)
- Overrides:
getIp4Cdx
in class ArchiveRecord
getStatusCode4Cdx
protected java.lang.String getStatusCode4Cdx(ArchiveRecordHeader h)
- Overrides:
getStatusCode4Cdx
in class ArchiveRecord
getDigest4Cdx
protected java.lang.String getDigest4Cdx(ArchiveRecordHeader h)
- Overrides:
getDigest4Cdx
in class ArchiveRecord
Copyright © 2003-2011 Internet Archive. All Rights Reserved.