org.archive.io.warc
Interface WARCConstants

All Superinterfaces:
ArchiveFileConstants
All Known Implementing Classes:
WARCReader, WARCReaderFactory, WARCReaderFactory.CompressedWARCReader, WARCReaderFactory.UncompressedWARCReader, WARCRecord, WARCWriter, WARCWriterProcessor

public interface WARCConstants
extends ArchiveFileConstants

WARC constants used by WARC readers and writers.

Version:
$Revision: 6528 $ $Date: 2009-09-29 21:52:33 +0000 (Tue, 29 Sep 2009) $
Author:
stack

Field Summary
static java.lang.String COLON_SPACE
           
static java.lang.String COMPRESSED_WARC_FILE_EXTENSION
          Compressed WARC file extension.
static java.lang.String CONTENT_DESCRIPTION
           
static java.lang.String CONTENT_LENGTH
           
static java.lang.String CONTENT_TYPE
           
static java.lang.String CONTINUATION
           
static int CONTINUATION_INDEX
           
static java.lang.String CONVERSION
           
static int CONVERSION_INDEX
           
static java.lang.String DEFAULT_ENCODING
          Encoding to use getting bytes from strings.
static int DEFAULT_MAX_WARC_FILE_SIZE
          Default maximum WARC file size.
static java.lang.String DOT_COMPRESSED_FILE_EXTENSION
           
static java.lang.String DOT_COMPRESSED_WARC_FILE_EXTENSION
          Compressed dot WARC file extension.
static java.lang.String DOT_WARC_FILE_EXTENSION
          Dot WARC file extension.
static java.lang.String FTP_CONTROL_CONVERSATION_MIMETYPE
           
static java.lang.String[] HEADER_FIELD_KEYS
           
static char HEADER_FIELD_SEPARATOR
          Header field separator character.
static java.lang.String HEADER_KEY_BLOCK_DIGEST
           
static java.lang.String HEADER_KEY_CONCURRENT_TO
           
static java.lang.String HEADER_KEY_DATE
           
static java.lang.String HEADER_KEY_ETAG
           
static java.lang.String HEADER_KEY_FILENAME
           
static java.lang.String HEADER_KEY_ID
           
static java.lang.String HEADER_KEY_IP
           
static java.lang.String HEADER_KEY_LAST_MODIFIED
           
static java.lang.String HEADER_KEY_PAYLOAD_DIGEST
           
static java.lang.String HEADER_KEY_PROFILE
           
static java.lang.String HEADER_KEY_TRUNCATED
           
static java.lang.String HEADER_KEY_TYPE
           
static java.lang.String HEADER_KEY_URI
           
static java.lang.String HEADER_LINE_ENCODING
           
static java.lang.String HTTP_REQUEST_MIMETYPE
          To be safe, let's use the application type rather than message.
static java.lang.String HTTP_RESPONSE_MIMETYPE
           
static int MAX_LINE_LENGTH
           
static int MAX_WARC_HEADER_LINE_LENGTH
          Assumed maximum size of a header line.
static java.lang.String METADATA
           
static int METADATA_INDEX
           
static java.lang.String NAMED_FIELD_CHECKSUM_LABEL
           
static java.lang.String NAMED_FIELD_DESCRIPTION
           
static java.lang.String NAMED_FIELD_FILEDESC
           
static java.lang.String NAMED_FIELD_IP_LABEL
           
static java.lang.String NAMED_FIELD_RELATED_LABEL
           
static java.lang.String NAMED_FIELD_TRUNCATED
           
static java.lang.String NAMED_FIELD_TRUNCATED_VALUE_HEAD
           
static java.lang.String NAMED_FIELD_TRUNCATED_VALUE_LENGTH
           
static java.lang.String NAMED_FIELD_TRUNCATED_VALUE_TIME
           
static java.lang.String NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED
           
static java.lang.String NAMED_FIELD_WARCFILENAME
           
static java.lang.String PLACEHOLDER_RECORD_LENGTH_STRING
          Placeholder for the length in the header line.
static java.lang.String PROFILE_REVISIT_IDENTICAL_DIGEST
           
static java.lang.String PROFILE_REVISIT_NOT_MODIFIED
           
static java.lang.String REQUEST
           
static int REQUEST_INDEX
           
static java.lang.String RESOURCE
           
static int RESOURCE_INDEX
           
static java.lang.String RESPONSE
           
static int RESPONSE_INDEX
           
static java.lang.String REVISIT
           
static int REVISIT_INDEX
           
static java.lang.String TRUNCATED_VALUE_UNSPECIFIED
           
static java.lang.String TYPE
           
static java.lang.String[] TYPES
           
static java.util.List TYPES_LIST
           
static java.lang.String WARC_010_ID
           
static java.lang.String WARC_010_MAGIC
           
static java.lang.String WARC_FILE_EXTENSION
          WARC file extension.
static java.lang.String WARC_HEADER_ENCODING
           
static java.lang.String WARC_ID
          WARC-ID
static java.lang.String WARC_MAGIC
          WARC magic: WARC files and records begin with this sequence.
static java.lang.String WARC_VERSION
          Hard-coded version for WARC files made with this code.
static java.lang.String WARCINFO
          WARC Record Types.
static int WARCINFO_INDEX
           
static java.lang.Character[] WSP
          WSP: one of a space or horizontal tab character.
 
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
 

Field Detail

DEFAULT_MAX_WARC_FILE_SIZE

static final int DEFAULT_MAX_WARC_FILE_SIZE
Default maximum WARC file size: 1 gigabyte.

See Also:
Constant Field Values
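A writer typically compares the bytes already written against a cap like this one before deciding to start a new file. The sketch below is hypothetical: the class `WarcRollover`, the method `shouldRollOver`, and the policy of rolling over once the cap is reached are assumptions for illustration, and reading "1Gig" as 1024^3 bytes is a guess rather than the library's actual value.

```java
/**
 * Hypothetical roll-over check; not part of this library's API.
 * Assumes a writer starts a new file once the current one has
 * reached or passed the configured maximum size.
 */
final class WarcRollover {
    // Assumption: "1Gig" read as 1 GiB (1024^3 bytes); the real constant may differ.
    static final long ASSUMED_MAX_WARC_FILE_SIZE = 1024L * 1024L * 1024L;

    private WarcRollover() {}

    /** Returns true when bytesWritten has reached maxSize. */
    static boolean shouldRollOver(long bytesWritten, long maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        return bytesWritten >= maxSize;
    }
}
```

Passing the cap as a parameter keeps the check usable when a writer is configured with a size other than the default.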

WARC_MAGIC

static final java.lang.String WARC_MAGIC
WARC magic: WARC files and records begin with this sequence.

See Also:
Constant Field Values
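Because every file and record starts with the magic, a reader can sniff the first few bytes to decide whether it is looking at WARC data. A minimal sketch, assuming the magic is the `WARC/` prefix of an ISO 28500 version line such as `WARC/1.0`; the class name and the constant `ASSUMED_WARC_MAGIC` are hypothetical, so check the Constant Field Values page for the real value.

```java
import java.nio.charset.StandardCharsets;

final class WarcMagicCheck {
    // Assumption: the magic is the "WARC/" prefix of an ISO 28500 version line.
    static final String ASSUMED_WARC_MAGIC = "WARC/";

    private WarcMagicCheck() {}

    /** True if the given leading bytes of a file or record start with the magic. */
    static boolean startsWithMagic(byte[] head) {
        byte[] magic = ASSUMED_WARC_MAGIC.getBytes(StandardCharsets.US_ASCII);
        if (head.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For a compressed WARC file the check only makes sense on the decompressed bytes of each record.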

WARC_010_MAGIC

static final java.lang.String WARC_010_MAGIC
See Also:
Constant Field Values

WARC_VERSION

static final java.lang.String WARC_VERSION
Hard-coded version for WARC files made with this code. Conforms to ISO 28500:2009 as of May 2009.

See Also:
Constant Field Values

MAX_WARC_HEADER_LINE_LENGTH

static final int MAX_WARC_HEADER_LINE_LENGTH
Assumed maximum size of a header line. This is 100 KB, which seems massive, but it is the same as LINE_LENGTH from alexa/include/a_arcio.h:
 #define LINE_LENGTH     (100*1024)
 

See Also:
Constant Field Values
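A reader can use such a bound to refuse runaway header lines instead of buffering an unterminated line indefinitely. The sketch below is hypothetical (the class `BoundedLineReader` and its `readLine` method are not this library's API); it assumes LF or CRLF line terminators and UTF-8 header text, and reuses the 100 * 1024 figure quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class BoundedLineReader {
    // Same 100 KB figure as the a_arcio.h LINE_LENGTH quoted above.
    static final int MAX_HEADER_LINE_LENGTH = 100 * 1024;

    private BoundedLineReader() {}

    /**
     * Reads one LF- or CRLF-terminated line and returns it without the
     * terminator, or null if the stream is already at its end.
     * Throws if the line (counting a trailing CR) exceeds maxLength bytes.
     */
    static String readLine(InputStream in, int maxLength) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int c;
        while ((c = in.read()) != -1) {
            sawAny = true;
            if (c == '\n') {
                break;
            }
            if (buf.size() >= maxLength) {
                throw new IOException("Header line longer than " + maxLength + " bytes");
            }
            buf.write(c);
        }
        if (!sawAny) {
            return null;
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // drop the CR of a CRLF terminator
        }
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```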

MAX_LINE_LENGTH

static final int MAX_LINE_LENGTH
See Also:
Constant Field Values

WARC_FILE_EXTENSION

static final java.lang.String WARC_FILE_EXTENSION
WARC file extension.

See Also:
Constant Field Values

DOT_WARC_FILE_EXTENSION

static final java.lang.String DOT_WARC_FILE_EXTENSION
Dot WARC file extension.

See Also:
Constant Field Values

DOT_COMPRESSED_FILE_EXTENSION

static final java.lang.String DOT_COMPRESSED_FILE_EXTENSION
See Also:
Constant Field Values

COMPRESSED_WARC_FILE_EXTENSION

static final java.lang.String COMPRESSED_WARC_FILE_EXTENSION
Compressed WARC file extension.

See Also:
Constant Field Values

DOT_COMPRESSED_WARC_FILE_EXTENSION

static final java.lang.String DOT_COMPRESSED_WARC_FILE_EXTENSION
Compressed dot WARC file extension.

See Also:
Constant Field Values
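The plain and compressed dot extensions are what a tool would match against to pick WARC files out of a directory. A minimal sketch, assuming the dot extensions are `.warc` and `.warc.gz` (hypothetical values here; the real ones are on the Constant Field Values page) and that matching is case-sensitive:

```java
final class WarcFileNames {
    // Assumed values for DOT_WARC_FILE_EXTENSION and
    // DOT_COMPRESSED_WARC_FILE_EXTENSION; check Constant Field Values.
    static final String DOT_WARC = ".warc";
    static final String DOT_WARC_GZ = ".warc.gz";

    private WarcFileNames() {}

    /** True for an uncompressed or gzip-compressed WARC file name. */
    static boolean isWarc(String name) {
        return name.endsWith(DOT_WARC) || name.endsWith(DOT_WARC_GZ);
    }

    /** True only for the compressed form. */
    static boolean isCompressedWarc(String name) {
        return name.endsWith(DOT_WARC_GZ);
    }
}
```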

DEFAULT_ENCODING

static final java.lang.String DEFAULT_ENCODING
Encoding to use when getting bytes from strings. Specify an encoding explicitly rather than leaving it to chance (i.e., whatever the JVM's default encoding happens to be). Use an encoding that gets the stream as bytes, not chars. ARC uses ISO-8859-1; by specification, WARC uses UTF-8.

See Also:
Constant Field Values
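The bytes-versus-chars distinction matters because a length written into a header counts bytes, and the count depends on the charset. A minimal sketch using only `String.getBytes(Charset)` and `StandardCharsets`; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class HeaderBytes {
    private HeaderBytes() {}

    /** Byte length of s in the given charset; never uses the JVM default. */
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    /** Convenience for UTF-8, the encoding WARC uses by specification. */
    static int utf8Length(String s) {
        return byteLength(s, StandardCharsets.UTF_8);
    }
}
```

For example, "caf\u00e9" is 4 chars, 4 bytes in ISO-8859-1, and 5 bytes in UTF-8, so a length computed with the wrong charset would be off by one.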

HEADER_LINE_ENCODING

static final java.lang.String HEADER_LINE_ENCODING
See Also:
Constant Field Values

WARC_HEADER_ENCODING

static final java.lang.String WARC_HEADER_ENCODING
See Also:
Constant Field Values

HEADER_FIELD_KEYS

static final java.lang.String[] HEADER_FIELD_KEYS

WARCINFO

static final java.lang.String WARCINFO
WARC Record Types.

See Also:
Constant Field Values

RESPONSE

static final java.lang.String RESPONSE
See Also:
Constant Field Values

RESOURCE

static final java.lang.String RESOURCE
See Also:
Constant Field Values

REQUEST

static final java.lang.String REQUEST
See Also:
Constant Field Values

METADATA

static final java.lang.String METADATA
See Also:
Constant Field Values

REVISIT

static final java.lang.String REVISIT
See Also:
Constant Field Values

CONVERSION

static final java.lang.String CONVERSION
See Also:
Constant Field Values

CONTINUATION

static final java.lang.String CONTINUATION
See Also:
Constant Field Values

TYPE

static final java.lang.String TYPE
See Also:
Constant Field Values

TYPES

static final java.lang.String[] TYPES

WARCINFO_INDEX

static final int WARCINFO_INDEX
See Also:
Constant Field Values

RESPONSE_INDEX

static final int RESPONSE_INDEX
See Also:
Constant Field Values

RESOURCE_INDEX

static final int RESOURCE_INDEX
See Also:
Constant Field Values

REQUEST_INDEX

static final int REQUEST_INDEX
See Also:
Constant Field Values

METADATA_INDEX

static final int METADATA_INDEX
See Also:
Constant Field Values

REVISIT_INDEX

static final int REVISIT_INDEX
See Also:
Constant Field Values

CONVERSION_INDEX

static final int CONVERSION_INDEX
See Also:
Constant Field Values

CONTINUATION_INDEX

static final int CONTINUATION_INDEX
See Also:
Constant Field Values

TYPES_LIST

static final java.util.List TYPES_LIST
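The record-type strings, the TYPES array, TYPES_LIST, and the *_INDEX constants appear designed to work together: looking a WARC-Type value up in the list yields its index. A sketch under the assumption that each *_INDEX is the position of its type in TYPES, in the declaration order shown on this page, and that the strings are the ISO 28500 record-type names; the class `RecordTypes` is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class RecordTypes {
    // Assumed order: WARCINFO..CONTINUATION as declared on this page,
    // with names taken from ISO 28500.
    static final String[] TYPES = {
        "warcinfo", "response", "resource", "request",
        "metadata", "revisit", "conversion", "continuation"
    };
    // Unmodifiable view over TYPES for lookups.
    static final List<String> TYPES_LIST =
        Collections.unmodifiableList(Arrays.asList(TYPES));

    private RecordTypes() {}

    /** Index of a WARC-Type value, or -1 if it is not a known type. */
    static int indexOf(String warcType) {
        return TYPES_LIST.indexOf(warcType);
    }
}
```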

WARC_ID

static final java.lang.String WARC_ID
WARC-ID

See Also:
Constant Field Values

WARC_010_ID

static final java.lang.String WARC_010_ID
See Also:
Constant Field Values

HEADER_FIELD_SEPARATOR

static final char HEADER_FIELD_SEPARATOR
Header field separator character.

See Also:
Constant Field Values

WSP

static final java.lang.Character[] WSP
WSP: one of a space or horizontal tab character. TODO: WSP undefined. Fix.
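Taking the description literally (WSP is a space or a horizontal tab, nothing else), a header parser would test characters and trim values like this. The sketch is hypothetical (class `Wsp`, methods `isWsp` and `trimWsp`); unlike `String.trim()`, it leaves CR, LF, and other control characters in place.

```java
final class Wsp {
    private Wsp() {}

    /** True for a space or horizontal tab, the two WSP characters. */
    static boolean isWsp(char c) {
        return c == ' ' || c == '\t';
    }

    /** Strips leading and trailing WSP only, not CR, LF, or other whitespace. */
    static String trimWsp(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isWsp(s.charAt(start))) {
            start++;
        }
        while (end > start && isWsp(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }
}
```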


PLACEHOLDER_RECORD_LENGTH_STRING

static final java.lang.String PLACEHOLDER_RECORD_LENGTH_STRING
Placeholder for the length in the header line. The placeholder is the same size as the fixed field allocated for the length, 12 characters, which allows records of almost 1 TB.

See Also:
Constant Field Values
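The "almost 1 TB" follows from the field width if the length is written as decimal digits: 12 digits hold at most 10^12 - 1 = 999,999,999,999 bytes, just under 1 TB in decimal units. A sketch assuming a zero-padded decimal field (an assumption; the class `LengthPlaceholder` is hypothetical), which lets a writer overwrite the placeholder in place once the real length is known without shifting any bytes:

```java
import java.util.Locale;

final class LengthPlaceholder {
    // Width stated in the field description above.
    static final int WIDTH = 12;
    // Largest value that fits in 12 decimal digits.
    static final long MAX_LENGTH = 999_999_999_999L;

    private LengthPlaceholder() {}

    /** Zero-pads a record length to the fixed 12-character field. */
    static String format(long length) {
        if (length < 0 || length > MAX_LENGTH) {
            throw new IllegalArgumentException("length out of range: " + length);
        }
        // Locale.ROOT guarantees ASCII digits regardless of the default locale.
        return String.format(Locale.ROOT, "%012d", length);
    }
}
```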

NAMED_FIELD_IP_LABEL

static final java.lang.String NAMED_FIELD_IP_LABEL
See Also:
Constant Field Values

NAMED_FIELD_CHECKSUM_LABEL

static final java.lang.String NAMED_FIELD_CHECKSUM_LABEL
See Also:
Constant Field Values

NAMED_FIELD_RELATED_LABEL

static final java.lang.String NAMED_FIELD_RELATED_LABEL
See Also:
Constant Field Values

NAMED_FIELD_WARCFILENAME

static final java.lang.String NAMED_FIELD_WARCFILENAME
See Also:
Constant Field Values

NAMED_FIELD_DESCRIPTION

static final java.lang.String NAMED_FIELD_DESCRIPTION
See Also:
Constant Field Values

NAMED_FIELD_FILEDESC

static final java.lang.String NAMED_FIELD_FILEDESC
See Also:
Constant Field Values

NAMED_FIELD_TRUNCATED

static final java.lang.String NAMED_FIELD_TRUNCATED
See Also:
Constant Field Values

NAMED_FIELD_TRUNCATED_VALUE_TIME

static final java.lang.String NAMED_FIELD_TRUNCATED_VALUE_TIME
See Also:
Constant Field Values

NAMED_FIELD_TRUNCATED_VALUE_LENGTH

static final java.lang.String NAMED_FIELD_TRUNCATED_VALUE_LENGTH
See Also:
Constant Field Values

NAMED_FIELD_TRUNCATED_VALUE_HEAD

static final java.lang.String NAMED_FIELD_TRUNCATED_VALUE_HEAD
See Also:
Constant Field Values

NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED

static final java.lang.String NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED

HEADER_KEY_DATE

static final java.lang.String HEADER_KEY_DATE
See Also:
Constant Field Values

HEADER_KEY_TYPE

static final java.lang.String HEADER_KEY_TYPE
See Also:
Constant Field Values

HEADER_KEY_ID

static final java.lang.String HEADER_KEY_ID
See Also:
Constant Field Values

HEADER_KEY_URI

static final java.lang.String HEADER_KEY_URI
See Also:
Constant Field Values

HEADER_KEY_IP

static final java.lang.String HEADER_KEY_IP
See Also:
Constant Field Values

HEADER_KEY_BLOCK_DIGEST

static final java.lang.String HEADER_KEY_BLOCK_DIGEST
See Also:
Constant Field Values

HEADER_KEY_PAYLOAD_DIGEST

static final java.lang.String HEADER_KEY_PAYLOAD_DIGEST
See Also:
Constant Field Values

HEADER_KEY_CONCURRENT_TO

static final java.lang.String HEADER_KEY_CONCURRENT_TO
See Also:
Constant Field Values

HEADER_KEY_TRUNCATED

static final java.lang.String HEADER_KEY_TRUNCATED
See Also:
Constant Field Values

HEADER_KEY_PROFILE

static final java.lang.String HEADER_KEY_PROFILE
See Also:
Constant Field Values

HEADER_KEY_FILENAME

static final java.lang.String HEADER_KEY_FILENAME
See Also:
Constant Field Values

HEADER_KEY_ETAG

static final java.lang.String HEADER_KEY_ETAG
See Also:
Constant Field Values

HEADER_KEY_LAST_MODIFIED

static final java.lang.String HEADER_KEY_LAST_MODIFIED
See Also:
Constant Field Values

PROFILE_REVISIT_IDENTICAL_DIGEST

static final java.lang.String PROFILE_REVISIT_IDENTICAL_DIGEST
See Also:
Constant Field Values

PROFILE_REVISIT_NOT_MODIFIED

static final java.lang.String PROFILE_REVISIT_NOT_MODIFIED
See Also:
Constant Field Values

CONTENT_LENGTH

static final java.lang.String CONTENT_LENGTH
See Also:
Constant Field Values

CONTENT_TYPE

static final java.lang.String CONTENT_TYPE
See Also:
Constant Field Values

CONTENT_DESCRIPTION

static final java.lang.String CONTENT_DESCRIPTION
See Also:
Constant Field Values

COLON_SPACE

static final java.lang.String COLON_SPACE
See Also:
Constant Field Values
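The HEADER_KEY_* and CONTENT_* names combine with COLON_SPACE and CRLF to form a record header: a version line, one "Name: value" line per field, then a blank line. A minimal sketch; the values of CRLF (inherited from ArchiveFileConstants), COLON_SPACE, and the `WARC/1.0` version line are assumptions here, and the class `WarcHeaderSketch` is hypothetical rather than how WARCWriter builds headers.

```java
import java.util.Map;

final class WarcHeaderSketch {
    // Assumed values; the real CRLF and COLON_SPACE are on Constant Field Values.
    static final String CRLF = "\r\n";
    static final String COLON_SPACE = ": ";
    // Assumed ISO 28500 (WARC 1.0) version line.
    static final String VERSION_LINE = "WARC/1.0";

    private WarcHeaderSketch() {}

    /** Version line, one "Name: value" line per field in map order, then a blank line. */
    static String build(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(VERSION_LINE).append(CRLF);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(e.getKey()).append(COLON_SPACE).append(e.getValue()).append(CRLF);
        }
        return sb.append(CRLF).toString();
    }
}
```

Passing a `LinkedHashMap` keeps the fields in insertion order. ISO 28500 makes WARC-Record-ID, Content-Length, WARC-Date, and WARC-Type mandatory in every record.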

TRUNCATED_VALUE_UNSPECIFIED

static final java.lang.String TRUNCATED_VALUE_UNSPECIFIED
See Also:
Constant Field Values

HTTP_REQUEST_MIMETYPE

static final java.lang.String HTTP_REQUEST_MIMETYPE
To be safe, let's use the application type rather than message. Regarding 'message/http', the RFC says "...provided that it obeys the MIME restrictions for all 'message' types regarding line length and encodings." This usually means lines of at most 1000 octets (unless a 'Content-Transfer-Encoding: binary' MIME header is present).

See Also:
rfc2616 section 19.1, Constant Field Values
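The line-length concern is easy to hit with real traffic: a single long cookie or query string breaks it. A sketch of the check a writer would otherwise need before labelling a payload 'message/http', assuming the 1000-octet limit counts each line including its CRLF; the class `MessageLineCheck` is hypothetical. Using an 'application/http' type (which RFC 2616 section 19.1 defines alongside 'message/http', with an optional msgtype parameter) sidesteps the check entirely.

```java
final class MessageLineCheck {
    // MIME 'message' line limit cited above, terminator included.
    static final int MIME_MAX_LINE_OCTETS = 1000;

    private MessageLineCheck() {}

    /**
     * True if every LF-terminated line, counting its terminator, fits the
     * limit; a final unterminated line must fit on its own.
     */
    static boolean fitsMessageLineLimit(byte[] body) {
        int lineStart = 0;
        for (int i = 0; i < body.length; i++) {
            if (body[i] == '\n') {
                if (i + 1 - lineStart > MIME_MAX_LINE_OCTETS) {
                    return false;
                }
                lineStart = i + 1;
            }
        }
        return body.length - lineStart <= MIME_MAX_LINE_OCTETS;
    }
}
```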

HTTP_RESPONSE_MIMETYPE

static final java.lang.String HTTP_RESPONSE_MIMETYPE
See Also:
Constant Field Values

FTP_CONTROL_CONVERSATION_MIMETYPE

static final java.lang.String FTP_CONTROL_CONVERSATION_MIMETYPE
See Also:
Constant Field Values


Copyright © 2003-2011 Internet Archive. All Rights Reserved.