org.archive.io.warc
Class WARCReaderFactory
java.lang.Object
org.archive.io.ArchiveReaderFactory
org.archive.io.warc.WARCReaderFactory
- All Implemented Interfaces:
- ArchiveFileConstants, WARCConstants
public class WARCReaderFactory
- extends ArchiveReaderFactory
- implements WARCConstants
Factory for WARC Readers.
Determines whether to hand out a Reader for a compressed file or a Reader
for an uncompressed file.
- Version:
- $Date: 2006-08-23 17:59:04 -0700 (Wed, 23 Aug 2006) $ $Version$
- Author:
- stack
Fields inherited from interface org.archive.io.warc.WARCConstants
COLON_SPACE, COMPRESSED_WARC_FILE_EXTENSION, CONTENT_DESCRIPTION, CONTENT_LENGTH, CONTENT_TYPE, CONTINUATION, CONTINUATION_INDEX, CONVERSION, CONVERSION_INDEX, DEFAULT_ENCODING, DEFAULT_MAX_WARC_FILE_SIZE, DOT_COMPRESSED_FILE_EXTENSION, DOT_COMPRESSED_WARC_FILE_EXTENSION, DOT_WARC_FILE_EXTENSION, FTP_CONTROL_CONVERSATION_MIMETYPE, HEADER_FIELD_KEYS, HEADER_FIELD_SEPARATOR, HEADER_KEY_BLOCK_DIGEST, HEADER_KEY_CONCURRENT_TO, HEADER_KEY_DATE, HEADER_KEY_ETAG, HEADER_KEY_FILENAME, HEADER_KEY_ID, HEADER_KEY_IP, HEADER_KEY_LAST_MODIFIED, HEADER_KEY_PAYLOAD_DIGEST, HEADER_KEY_PROFILE, HEADER_KEY_TRUNCATED, HEADER_KEY_TYPE, HEADER_KEY_URI, HEADER_LINE_ENCODING, HTTP_REQUEST_MIMETYPE, HTTP_RESPONSE_MIMETYPE, MAX_LINE_LENGTH, MAX_WARC_HEADER_LINE_LENGTH, METADATA, METADATA_INDEX, NAMED_FIELD_CHECKSUM_LABEL, NAMED_FIELD_DESCRIPTION, NAMED_FIELD_FILEDESC, NAMED_FIELD_IP_LABEL, NAMED_FIELD_RELATED_LABEL, NAMED_FIELD_TRUNCATED, NAMED_FIELD_TRUNCATED_VALUE_HEAD, NAMED_FIELD_TRUNCATED_VALUE_LENGTH, NAMED_FIELD_TRUNCATED_VALUE_TIME, NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED, NAMED_FIELD_WARCFILENAME, PLACEHOLDER_RECORD_LENGTH_STRING, PROFILE_REVISIT_IDENTICAL_DIGEST, PROFILE_REVISIT_NOT_MODIFIED, REQUEST, REQUEST_INDEX, RESOURCE, RESOURCE_INDEX, RESPONSE, RESPONSE_INDEX, REVISIT, REVISIT_INDEX, TRUNCATED_VALUE_UNSPECIFIED, TYPE, TYPES, TYPES_LIST, WARC_010_ID, WARC_010_MAGIC, WARC_FILE_EXTENSION, WARC_HEADER_ENCODING, WARC_ID, WARC_MAGIC, WARC_VERSION, WARCINFO, WARCINFO_INDEX, WSP
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
get
public static WARCReader get(java.lang.String arcFileOrUrl)
throws java.net.MalformedURLException,
java.io.IOException
- Throws:
java.net.MalformedURLException
java.io.IOException
get
public static WARCReader get(java.io.File f)
throws java.io.IOException
- Throws:
java.io.IOException
get
public static WARCReader get(java.io.File f,
long offset)
throws java.io.IOException
- Parameters:
f
- The WARC file to read.
offset
- Offset at which the returned Reader starts reading.
- Returns:
- A WARCReader.
- Throws:
java.io.IOException
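As a rough illustration of what starting a read at an offset can look like, the sketch below positions a plain JDK stream at a byte offset before handing it back. It uses only java.io; the class OffsetOpenSketch and the method openAt are hypothetical names and are not part of org.archive.io. How WARCReaderFactory itself seeks is not stated on this page.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not part of org.archive.io.
final class OffsetOpenSketch {

    // Opens f and positions the stream so that the first byte read
    // is the byte at 'offset'.
    static InputStream openAt(File f, long offset) throws IOException {
        if (offset < 0 || offset > f.length()) {
            throw new IOException("Offset " + offset + " is outside " + f);
        }
        FileInputStream in = new FileInputStream(f);
        try {
            // Moving the channel position also moves the stream position.
            in.getChannel().position(offset);
        } catch (IOException e) {
            in.close();
            throw e;
        }
        return in;
    }
}
```

In a compressed archive written as one GZIP member per record, such an offset would normally need to fall on a member boundary for decompression to start cleanly; that is a general property of GZIP, not something this page specifies.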
getArchiveReader
protected ArchiveReader getArchiveReader(java.io.File f,
long offset)
throws java.io.IOException
- Overrides:
getArchiveReader
in class ArchiveReaderFactory
- Throws:
java.io.IOException
get
public static ArchiveReader get(java.lang.String s,
java.io.InputStream is,
boolean atFirstRecord)
throws java.io.IOException
- Throws:
java.io.IOException
getArchiveReader
protected ArchiveReader getArchiveReader(java.lang.String f,
java.io.InputStream is,
boolean atFirstRecord)
throws java.io.IOException
- Overrides:
getArchiveReader
in class ArchiveReaderFactory
- Throws:
java.io.IOException
get
public static WARCReader get(java.net.URL arcUrl,
long offset)
throws java.io.IOException
- Throws:
java.io.IOException
get
public static WARCReader get(java.net.URL arcUrl)
throws java.io.IOException
- Gets a WARCReader.
Pulls the WARC into a local copy under the directory that the system
property java.io.tmpdir points to. It then hands back a WARCReader that
points at this local copy. Closing this WARCReader instance
removes the local copy.
- Parameters:
arcUrl
- A URL that points at a WARC.
- Returns:
- A WARCReader.
- Throws:
java.io.IOException
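The copy-then-clean-up behaviour described for this method can be sketched with the JDK alone. The code below is a minimal sketch under stated assumptions, not the library's implementation: TempCopySketch, SelfDeletingStream, fetch and localCopy are hypothetical names, and the returned object is a plain stream rather than a reader from org.archive.io.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch for illustration only; not part of org.archive.io.
final class TempCopySketch {

    // A stream over a local copy; closing it also deletes the copy.
    static final class SelfDeletingStream extends FileInputStream {
        private final File localCopy;

        SelfDeletingStream(File localCopy) throws IOException {
            super(localCopy);
            this.localCopy = localCopy;
        }

        File localCopy() {
            return localCopy;
        }

        @Override
        public void close() throws IOException {
            try {
                super.close();
            } finally {
                // Safe to call more than once: a missing file is ignored.
                Files.deleteIfExists(localCopy.toPath());
            }
        }
    }

    // Downloads 'url' into the java.io.tmpdir directory and returns a
    // stream over that local copy.
    static SelfDeletingStream fetch(URL url) throws IOException {
        // With no directory argument, createTempFile uses java.io.tmpdir.
        File local = File.createTempFile("warc-copy", ".tmp");
        try (InputStream remote = url.openStream()) {
            Files.copy(remote, local.toPath(), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(local.toPath());
            throw e;
        }
        return new SelfDeletingStream(local);
    }
}
```

Tying deletion to close() means a caller that forgets to close leaves the copy behind in the temporary directory, so a try-with-resources block is the natural way to use an object like this.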
testCompressedWARCFile
public static boolean testCompressedWARCFile(java.io.File f)
throws java.io.IOException
- Checks whether the file is a compressed WARC.
- Parameters:
f
- File to test.
- Returns:
- True if this is a compressed WARC. (TODO: currently only tests whether the
file is GZIP'd, that is, whether it begins with the GZIP magic bytes.)
- Throws:
java.io.IOException
- If file does not exist or is not readable.
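The GZIP check mentioned in the Returns note can be sketched with the JDK alone. This is a minimal sketch of a magic-byte test, not the library's code: GzipMagicSketch and startsWithGzipMagic are hypothetical names. Like the behaviour the TODO describes, it says nothing about whether the decompressed content is actually a WARC.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch for illustration only; not part of org.archive.io.
final class GzipMagicSketch {

    // Returns true if the first two bytes of f are the GZIP magic number.
    // Throws an IOException subclass if f does not exist or cannot be opened.
    static boolean startsWithGzipMagic(File f) throws IOException {
        try (InputStream in = new FileInputStream(f)) {
            int b0 = in.read();
            int b1 = in.read();
            if (b0 < 0 || b1 < 0) {
                return false; // fewer than two bytes in the file
            }
            // GZIP_MAGIC is 0x8b1f; the low byte (0x1f) comes first on disk.
            return ((b1 << 8) | b0) == GZIPInputStream.GZIP_MAGIC;
        }
    }
}
```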
isWARCSuffix
public static boolean isWARCSuffix(java.lang.String f)
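This page gives no description for isWARCSuffix. One plausible reading, given the DOT_WARC_FILE_EXTENSION and DOT_COMPRESSED_WARC_FILE_EXTENSION constants inherited from WARCConstants, is a filename suffix test. The sketch below assumes those constants hold ".warc" and ".warc.gz" and that matching ignores case; both are assumptions, and WarcSuffixSketch and looksLikeWarcName are hypothetical names.

```java
import java.util.Locale;

// Hypothetical sketch for illustration only; not part of org.archive.io.
final class WarcSuffixSketch {

    // Assumed values; the real constants are defined in WARCConstants.
    static final String DOT_WARC = ".warc";
    static final String DOT_WARC_GZ = ".warc.gz";

    // Returns true if 'name' ends with one of the assumed WARC suffixes.
    static boolean looksLikeWarcName(String name) {
        if (name == null) {
            return false;
        }
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.endsWith(DOT_WARC) || lower.endsWith(DOT_WARC_GZ);
    }
}
```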
Copyright © 2003-2011 Internet Archive. All Rights Reserved.