org.archive.io.arc
Class ARCRecordMetaData

java.lang.Object
  extended by org.archive.io.arc.ARCRecordMetaData
All Implemented Interfaces:
ARCConstants, ArchiveFileConstants, ArchiveRecordHeader

public class ARCRecordMetaData
extends java.lang.Object
implements ArchiveRecordHeader, ARCConstants

An immutable class that holds an ARC record's metadata.

Author:
stack

Field Summary
protected  java.util.Map headerFields
          Map of record header fields.
 
Fields inherited from interface org.archive.io.arc.ARCConstants
ARC_FILE_EXTENSION, ARC_GZIP_EXTRA_FIELD, ARC_MAGIC_NUMBER, CHECKSUM_FIELD_KEY, CHECKSUM_HEADER_FIELD_KEY, CODE_HEADER_FIELD_KEY, COMPRESSED_ARC_FILE_EXTENSION, DEFAULT_ENCODING, DEFAULT_GZIP_HEADER_LENGTH, DEFAULT_MAX_ARC_FILE_SIZE, DOT_ARC_FILE_EXTENSION, DOT_COMPRESSED_ARC_FILE_EXTENSION, DOT_COMPRESSED_FILE_EXTENSION, FILENAME_FIELD_KEY, FILENAME_HEADER_FIELD_KEY, GZIP_HEADER_BEGIN, HEADER_FIELD_SEPARATOR, IP_HEADER_FIELD_KEY, LINE_SEPARATOR, LOCATION_HEADER_FIELD_KEY, MAX_METADATA_LINE_LENGTH, MINIMUM_RECORD_LENGTH, OFFSET_FIELD_KEY, OFFSET_HEADER_FIELD_KEY, REQUIRED_VERSION_1_HEADER_FIELDS, STATUSCODE_FIELD_KEY, TOKENIZED_PREFIX
 
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
 
Constructor Summary
protected ARCRecordMetaData()
          Disables the default constructor.
  ARCRecordMetaData(java.lang.String arc, java.util.Map headerFields)
          Constructor.
 
Method Summary
 java.lang.String getArc()
           
 java.io.File getArcFile()
           
 int getContentBegin()
          Offset at which the content begins.
 java.lang.String getDate()
          Get the time when the record was harvested.
 java.lang.String getDigest()
           
 java.util.Set getHeaderFieldKeys()
           
 java.util.Map getHeaderFields()
           
 java.lang.Object getHeaderValue(java.lang.String key)
           
 java.lang.String getIp()
           
 long getLength()
           
 java.lang.String getMimetype()
           
 long getOffset()
           
 java.lang.String getReaderIdentifier()
           
 java.lang.String getRecordIdentifier()
           
 java.lang.String getStatusCode()
           
 java.lang.String getUrl()
           
 java.lang.String getVersion()
           
(package private)  void setContentBegin(int offset)
           
 void setDigest(java.lang.String d)
           
 void setStatusCode(java.lang.String statusCode)
           
protected  void testRequiredField(java.util.Map fields, java.lang.String requiredField)
          Tests that a required field is present in the map.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

headerFields

protected java.util.Map headerFields
Map of record header fields. All fields are stored in a hash map, so the class can hold either version 1 or version 2 record metadata.

Keys are lowercase.
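To illustrate how a version 1 metadata line could end up in such a map, the sketch below splits a space-separated ARC version 1 URL-record line (URL, IP address, archive date, content type, length) into a map with lowercase keys. The class name MetalineSketch and the key names are hypothetical; they are not the values of the ARCConstants or ArchiveFileConstants field keys, and this is not how the library itself parses records.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: parses an ARC version 1 metadata line into a lowercase-keyed map. */
final class MetalineSketch {
    // Assumed field order for a version 1 URL record; key names are illustrative only.
    private static final String[] V1_KEYS = {"url", "ip", "date", "mimetype", "length"};

    static Map<String, String> parseVersion1(String metaline) throws IOException {
        String[] tokens = metaline.trim().split(" ");
        if (tokens.length != V1_KEYS.length) {
            throw new IOException("Expected " + V1_KEYS.length + " fields but got "
                + tokens.length + ": " + metaline);
        }
        Map<String, String> fields = new HashMap<>();
        for (int i = 0; i < V1_KEYS.length; i++) {
            // The key literals are already lowercase, matching the convention for headerFields.
            fields.put(V1_KEYS[i], tokens[i]);
        }
        return fields;
    }
}
```

A version 2 line carries more fields, which is one reason a map is a convenient container: the same structure can hold either layout.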

Constructor Detail

ARCRecordMetaData

protected ARCRecordMetaData()
Disables the default constructor.


ARCRecordMetaData

public ARCRecordMetaData(java.lang.String arc,
                         java.util.Map headerFields)
                  throws java.io.IOException
Constructor.

Parameters:
arc - The arc file this metadata came out of.
headerFields - Map of metadata fields.
Throws:
java.io.IOException
Method Detail

testRequiredField

protected void testRequiredField(java.util.Map fields,
                                 java.lang.String requiredField)
                          throws java.io.IOException
Tests that a required field is present in the map.

Parameters:
fields - Map of fields.
requiredField - Field to test for.
Throws:
java.io.IOException - If required field is not present.
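A check of this kind can be sketched in a few lines. This is a minimal illustration, assuming that a missing key and a key mapped to null both count as absent; RequiredFieldSketch and requireField are hypothetical names, not the library's implementation of testRequiredField.

```java
import java.io.IOException;
import java.util.Map;

/** Hypothetical stand-in for a required-field check. */
final class RequiredFieldSketch {
    static void requireField(Map<String, ?> fields, String requiredField) throws IOException {
        // A missing key, or a key mapped to null, is treated as absent.
        if (fields.get(requiredField) == null) {
            throw new IOException("Required field missing: " + requiredField);
        }
    }
}
```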

getDate

public java.lang.String getDate()
Get the time when the record was harvested.

Returns the date in the Heritrix 14-digit time format (UTC). See the ArchiveUtils class for converting it to a Java date.

Specified by:
getDate in interface ArchiveRecordHeader
Returns:
Header date in the Heritrix 14-digit format.
See Also:
ArchiveUtils.parse14DigitDate(String)
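Where only the JDK is available, the 14-digit value can be parsed directly. A minimal sketch, assuming the 14 digits are laid out as yyyyMMddHHmmss in UTC; FourteenDigitDateSketch is a hypothetical name, and this is not the ArchiveUtils implementation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical helper: parses a 14-digit yyyyMMddHHmmss UTC timestamp. */
final class FourteenDigitDateSketch {
    static Date parse(String date14) throws ParseException {
        if (date14 == null || date14.length() != 14) {
            throw new ParseException("Expected 14 digits: " + date14, 0);
        }
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        fmt.setLenient(false); // Reject out-of-range values such as month 13.
        return fmt.parse(date14);
    }
}
```

For example, parse("19700101000001") yields a Date whose getTime() is 1000, one second after the epoch.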

getLength

public long getLength()
Specified by:
getLength in interface ArchiveRecordHeader
Returns:
Length of the record.

getUrl

public java.lang.String getUrl()
Specified by:
getUrl in interface ArchiveRecordHeader
Returns:
Header URL.

getIp

public java.lang.String getIp()
Returns:
IP address.

getMimetype

public java.lang.String getMimetype()
Specified by:
getMimetype in interface ArchiveRecordHeader
Returns:
The MIME type recorded in the ARC metadata line, NOT the value of the HTTP Content-Type header.

getVersion

public java.lang.String getVersion()
Specified by:
getVersion in interface ArchiveRecordHeader
Returns:
ARC file version.

getOffset

public long getOffset()
Specified by:
getOffset in interface ArchiveRecordHeader
Returns:
Offset into the ARC file at which this record begins.

getHeaderValue

public java.lang.Object getHeaderValue(java.lang.String key)
Specified by:
getHeaderValue in interface ArchiveRecordHeader
Parameters:
key - Key to use looking up field value.
Returns:
Value for the passed key, or null if there is no such entry.
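Because the stored keys are lowercase, a caller holding a mixed-case field name may want to normalize it before looking it up. The sketch below assumes this normalization is the caller's responsibility; HeaderLookupSketch is hypothetical and does not describe what getHeaderValue does internally.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical caller-side helper for a map whose keys are stored lowercase. */
final class HeaderLookupSketch {
    static Object lookup(Map<String, ?> headerFields, String key) {
        // Lowercase the caller's key to match the stored keys; Locale.ROOT avoids
        // locale-specific case rules such as the Turkish dotless i.
        return headerFields.get(key.toLowerCase(Locale.ROOT));
    }
}
```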

getHeaderFieldKeys

public java.util.Set getHeaderFieldKeys()
Specified by:
getHeaderFieldKeys in interface ArchiveRecordHeader
Returns:
Header field name keys.

getHeaderFields

public java.util.Map getHeaderFields()
Specified by:
getHeaderFields in interface ArchiveRecordHeader
Returns:
Map of header fields.

getArc

public java.lang.String getArc()
Returns:
Identifier for the ARC.

getArcFile

public java.io.File getArcFile()
Returns:
Convenience method equivalent to return new File(this.arc). Be aware that this.arc is not always the full path to an ARC file; it may be a URL. Test the returned file for existence.
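Since the identifier may be a URL rather than a path, a caller could check which it is before relying on the returned File. A minimal sketch, assuming that any identifier with a multi-letter URI scheme (http:, https:, file:, and so on) is a URL, and that a single-letter scheme is a Windows drive letter; ArcLocationSketch and its methods are hypothetical.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: decides whether an ARC identifier names an existing local file. */
final class ArcLocationSketch {
    static boolean looksLikeUrl(String arc) {
        try {
            String scheme = new URI(arc).getScheme();
            // Multi-letter schemes are treated as URLs; single letters as drive letters (C:).
            return scheme != null && scheme.length() > 1;
        } catch (URISyntaxException e) {
            return false; // Not a valid URI (e.g. a Windows path with backslashes); treat as a path.
        }
    }

    static boolean isExistingLocalFile(String arc) {
        // Only plain paths are tested for existence; URLs are never reported as local files.
        return !looksLikeUrl(arc) && new File(arc).isFile();
    }
}
```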

getDigest

public java.lang.String getDigest()
Specified by:
getDigest in interface ArchiveRecordHeader
Returns:
The digest.

setDigest

public void setDigest(java.lang.String d)
Parameters:
d - The digest to set.

getStatusCode

public java.lang.String getStatusCode()
Returns:
The status code; may be null.

setStatusCode

public void setStatusCode(java.lang.String statusCode)
Parameters:
statusCode - The statusCode to set.

toString

public java.lang.String toString()
Specified by:
toString in interface ArchiveRecordHeader
Overrides:
toString in class java.lang.Object

getReaderIdentifier

public java.lang.String getReaderIdentifier()
Specified by:
getReaderIdentifier in interface ArchiveRecordHeader
Returns:
Identifier for the current archive file. Be aware that this may not be a file name or file path; it may just be a URL, depending on how the archive file was made.

getRecordIdentifier

public java.lang.String getRecordIdentifier()
Specified by:
getRecordIdentifier in interface ArchiveRecordHeader
Returns:
Identifier for the record: for an ARC, the URL plus the date; for a WARC, the assigned GUID.

getContentBegin

public int getContentBegin()
Description copied from interface: ArchiveRecordHeader
Offset at which the content begins. For ARCs, it marks where the HTTP headers end and the content begins. For WARCs, it marks the end of the named fields, before the payload starts.

Specified by:
getContentBegin in interface ArchiveRecordHeader
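For an ARC record holding an HTTP response, the content-begin offset corresponds to the first byte after the blank line that ends the HTTP headers. The sketch below locates that boundary in a record's bytes. It assumes CRLF line endings and is not how the library computes the offset; ContentBeginSketch is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper: finds where HTTP headers end inside a record body. */
final class ContentBeginSketch {
    /** Returns the offset just past the first CRLF CRLF, or -1 if there is none. */
    static int findContentBegin(byte[] record) {
        for (int i = 0; i + 3 < record.length; i++) {
            if (record[i] == '\r' && record[i + 1] == '\n'
                    && record[i + 2] == '\r' && record[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    static byte[] content(byte[] record) {
        int begin = findContentBegin(record);
        // If no header terminator is found, the whole record is treated as content.
        return begin < 0 ? record : Arrays.copyOfRange(record, begin, record.length);
    }
}
```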

setContentBegin

void setContentBegin(int offset)


Copyright © 2003-2011 Internet Archive. All Rights Reserved.