org.archive.io.arc
Class ARCReaderFactory

java.lang.Object
  extended by org.archive.io.ArchiveReaderFactory
      extended by org.archive.io.arc.ARCReaderFactory
All Implemented Interfaces:
ARCConstants, ArchiveFileConstants

public class ARCReaderFactory
extends ArchiveReaderFactory
implements ARCConstants

Factory that returns an ARCReader. Can handle compressed and uncompressed ARCs.

Author:
stack

Nested Class Summary
 class ARCReaderFactory.CompressedARCReader
          Compressed arc file reader.
 class ARCReaderFactory.UncompressedARCReader
          Uncompressed arc file reader.
 
Field Summary
 
Fields inherited from interface org.archive.io.arc.ARCConstants
ARC_FILE_EXTENSION, ARC_GZIP_EXTRA_FIELD, ARC_MAGIC_NUMBER, CHECKSUM_FIELD_KEY, CHECKSUM_HEADER_FIELD_KEY, CODE_HEADER_FIELD_KEY, COMPRESSED_ARC_FILE_EXTENSION, DEFAULT_ENCODING, DEFAULT_GZIP_HEADER_LENGTH, DEFAULT_MAX_ARC_FILE_SIZE, DOT_ARC_FILE_EXTENSION, DOT_COMPRESSED_ARC_FILE_EXTENSION, DOT_COMPRESSED_FILE_EXTENSION, FILENAME_FIELD_KEY, FILENAME_HEADER_FIELD_KEY, GZIP_HEADER_BEGIN, HEADER_FIELD_SEPARATOR, IP_HEADER_FIELD_KEY, LINE_SEPARATOR, LOCATION_HEADER_FIELD_KEY, MAX_METADATA_LINE_LENGTH, MINIMUM_RECORD_LENGTH, OFFSET_FIELD_KEY, OFFSET_HEADER_FIELD_KEY, REQUIRED_VERSION_1_HEADER_FIELDS, STATUSCODE_FIELD_KEY, TOKENIZED_PREFIX
 
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
 
Constructor Summary
protected ARCReaderFactory()
          Shuts down any access to the default constructor.
 
Method Summary
static ARCReader get(java.io.File f)
           
static ARCReader get(java.io.File f, boolean skipSuffixTest, long offset)
           
static ARCReader get(java.io.File f, long offset)
           
static ARCReader get(java.lang.String arcFileOrUrl)
           
static ArchiveReader get(java.lang.String s, java.io.InputStream is, boolean atFirstRecord)
           
static ARCReader get(java.lang.String arcFileOrUrl, long offset)
           
static ARCReader get(java.net.URL arcUrl)
          Get an ARCReader.
static ARCReader get(java.net.URL arcUrl, long offset)
          Get an ARCReader aligned at offset.
protected  ArchiveReader getArchiveReader(java.io.File arcFile, boolean skipSuffixTest, long offset)
           
protected  ArchiveReader getArchiveReader(java.io.File f, long offset)
           
protected  ArchiveReader getArchiveReader(java.lang.String arc, java.io.InputStream is, boolean atFirstRecord)
           
static boolean isARCSuffix(java.lang.String arcName)
           
 boolean isCompressed(java.io.File arcFile)
           
static boolean testCompressedARCFile(java.io.File arcFile)
          Check file is compressed and in ARC GZIP format.
static boolean testCompressedARCFile(java.io.File arcFile, boolean skipSuffixCheck)
          Check file is compressed and in ARC GZIP format.
static boolean testCompressedARCStream(java.io.InputStream is)
          Tests whether the passed stream is a gzip stream by reading in its header.
 
Methods inherited from class org.archive.io.ArchiveReaderFactory
addUserAgent, asRepositionable, getArchiveReader, getArchiveReader, getArchiveReader, getArchiveReader, getArchiveReader, makeARCLocal
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ARCReaderFactory

protected ARCReaderFactory()
Shuts down any access to the default constructor.

Method Detail

get

public static ARCReader get(java.lang.String arcFileOrUrl)
                     throws java.net.MalformedURLException,
                            java.io.IOException
Throws:
java.net.MalformedURLException
java.io.IOException

get

public static ARCReader get(java.lang.String arcFileOrUrl,
                            long offset)
                     throws java.net.MalformedURLException,
                            java.io.IOException
Throws:
java.net.MalformedURLException
java.io.IOException

get

public static ARCReader get(java.io.File f)
                     throws java.io.IOException
Throws:
java.io.IOException

get

public static ARCReader get(java.io.File f,
                            long offset)
                     throws java.io.IOException
Throws:
java.io.IOException

getArchiveReader

protected ArchiveReader getArchiveReader(java.io.File f,
                                         long offset)
                                  throws java.io.IOException
Overrides:
getArchiveReader in class ArchiveReaderFactory
Throws:
java.io.IOException

get

public static ARCReader get(java.io.File f,
                            boolean skipSuffixTest,
                            long offset)
                     throws java.io.IOException
Parameters:
f - An ARC file to read.
skipSuffixTest - Set to true to skip the test that the ARC has a proper suffix. Use this method and pass true to open ARCs with the .open or some other suffix.
offset - Offset at which the returned ARCReader starts reading.
Returns:
An ARCReader.
Throws:
java.io.IOException

getArchiveReader

protected ArchiveReader getArchiveReader(java.io.File arcFile,
                                         boolean skipSuffixTest,
                                         long offset)
                                  throws java.io.IOException
Throws:
java.io.IOException

get

public static ArchiveReader get(java.lang.String s,
                                java.io.InputStream is,
                                boolean atFirstRecord)
                         throws java.io.IOException
Throws:
java.io.IOException

getArchiveReader

protected ArchiveReader getArchiveReader(java.lang.String arc,
                                         java.io.InputStream is,
                                         boolean atFirstRecord)
                                  throws java.io.IOException
Overrides:
getArchiveReader in class ArchiveReaderFactory
Throws:
java.io.IOException

get

public static ARCReader get(java.net.URL arcUrl,
                            long offset)
                     throws java.io.IOException
Get an ARCReader aligned at offset. This version of get does not bring the ARC local; instead it tries to stream across the net by making an HTTP/1.1 Range request to the remote HTTP server (RFC 2616, Section 14.35).

Parameters:
arcUrl - HTTP URL for an ARC (all ARCs are considered remote).
offset - Offset into ARC at which to start fetching.
Returns:
An ARCReader aligned at offset.
Throws:
java.io.IOException
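
The Range-request approach described for this method can be illustrated with a minimal sketch that uses only the JDK. It is not the ARCReaderFactory implementation: the class RangeFetchSketch and its methods rangeHeaderValue and openAt are hypothetical names chosen for this example, and the handling of response codes is an assumption about reasonable behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration; not part of org.archive.io.
class RangeFetchSketch {

    // Builds an open-ended byte range: from offset to the end of the resource.
    static String rangeHeaderValue(long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0: " + offset);
        }
        return "bytes=" + offset + "-";
    }

    // Opens an HTTP URL and asks the server to start the body at offset.
    // Assumes url uses the http or https scheme; other schemes fail the cast.
    static InputStream openAt(URL url, long offset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeaderValue(offset));
        int status = conn.getResponseCode();
        // 206 means the server honored the range. A plain 200 is only
        // acceptable when we asked for the resource from its first byte.
        boolean partial = status == HttpURLConnection.HTTP_PARTIAL;
        boolean wholeFromStart = status == HttpURLConnection.HTTP_OK && offset == 0;
        if (!partial && !wholeFromStart) {
            conn.disconnect();
            throw new IOException("Range request not honored, HTTP status " + status);
        }
        return conn.getInputStream();
    }
}
```

A server that ignores the Range header answers with status 200 and the whole file, which is why the sketch rejects a 200 response for any nonzero offset rather than silently reading from the wrong position.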

get

public static ARCReader get(java.net.URL arcUrl)
                     throws java.io.IOException
Get an ARCReader. Pulls the ARC local into wherever the system property java.io.tmpdir points, then hands back an ARCReader that points at this local copy. Closing this ARCReader instance removes the local copy.

Parameters:
arcUrl - A URL that points at an ARC.
Returns:
An ARCReader.
Throws:
java.io.IOException
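
The copy-to-tmpdir-then-delete-on-close behavior described for this method can be sketched with the JDK alone. This is an illustration of the pattern, not the library's code: LocalCopySketch, SelfDeletingStream, and fetchLocal are hypothetical names, and a plain InputStream stands in for the ARCReader.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of org.archive.io.
class LocalCopySketch {

    // Stream over a local temp file that removes the file when closed.
    static final class SelfDeletingStream extends FilterInputStream {
        final Path localCopy;

        SelfDeletingStream(Path localCopy) throws IOException {
            super(Files.newInputStream(localCopy));
            this.localCopy = localCopy;
        }

        @Override
        public void close() throws IOException {
            try {
                super.close();
            } finally {
                Files.deleteIfExists(localCopy);
            }
        }
    }

    // Copies the resource at url into the directory named by java.io.tmpdir
    // and returns a stream over that copy.
    static SelfDeletingStream fetchLocal(URL url) throws IOException {
        Path tmp = Files.createTempFile("arc-sketch-", ".tmp");
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // do not leave a partial copy behind
            throw e;
        }
        return new SelfDeletingStream(tmp);
    }
}
```

Tying the deletion to close() keeps cleanup in the caller's hands, so a try-with-resources block around the returned stream is enough to avoid leaving files behind in the temp directory.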

isCompressed

public boolean isCompressed(java.io.File arcFile)
                     throws java.io.IOException
Overrides:
isCompressed in class ArchiveReaderFactory
Parameters:
arcFile - File to test.
Returns:
True if arcFile is a compressed ARC.
Throws:
java.io.IOException

testCompressedARCFile

public static boolean testCompressedARCFile(java.io.File arcFile)
                                     throws java.io.IOException
Check file is compressed and in ARC GZIP format.

Parameters:
arcFile - File to test for being a GZIP-compressed Internet Archive ARC file.
Returns:
True if this is an Internet Archive GZIP'd ARC file (it begins with the Internet Archive GZIP header and has the COMPRESSED_ARC_FILE_EXTENSION suffix).
Throws:
java.io.IOException - If file does not exist or is not readable.

testCompressedARCFile

public static boolean testCompressedARCFile(java.io.File arcFile,
                                            boolean skipSuffixCheck)
                                     throws java.io.IOException
Check file is compressed and in ARC GZIP format.

Parameters:
arcFile - File to test for being a GZIP-compressed Internet Archive ARC file.
skipSuffixCheck - Set to true if we're not to test on the '.arc.gz' suffix.
Returns:
True if this is an Internet Archive GZIP'd ARC file (it begins with the Internet Archive GZIP header).
Throws:
java.io.IOException - If file does not exist or is not readable.
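
The interplay between the suffix check and the header check can be sketched as follows. This is a simplified stand-in, not the library's logic: CompressedFileCheckSketch and isGzippedArcCandidate are hypothetical names, the suffix comparison is assumed to be case-sensitive, and the sketch only inspects the two generic gzip magic bytes rather than the full Internet Archive GZIP header.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of org.archive.io.
class CompressedFileCheckSketch {

    static boolean isGzippedArcCandidate(File f, boolean skipSuffixCheck)
            throws IOException {
        // canRead() is false both for missing and for unreadable files.
        if (!f.canRead()) {
            throw new IOException(f + " does not exist or is not readable");
        }
        // Unless told to skip it, reject names without the '.arc.gz' suffix.
        if (!skipSuffixCheck && !f.getName().endsWith(".arc.gz")) {
            return false;
        }
        // Every gzip member starts with the magic bytes 0x1f 0x8b.
        try (InputStream in = new FileInputStream(f)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }
}
```

Passing true for skipSuffixCheck is what lets a file with a nonstandard name, such as one still carrying an .open suffix, be recognized by content alone.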

isARCSuffix

public static boolean isARCSuffix(java.lang.String arcName)

testCompressedARCStream

public static boolean testCompressedARCStream(java.io.InputStream is)
                                       throws java.io.IOException
Tests whether the passed stream is a gzip stream by reading in its header. Does not reposition the stream; that is left up to the caller.

Parameters:
is - An InputStream.
Returns:
True if compressed stream.
Throws:
java.io.IOException
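
Because the stream is not repositioned after the test, a caller that wants to keep reading from the start has to rewind it. A minimal sketch of that caller-side pattern, using mark and reset, is shown below. GzipSniffSketch, looksLikeGzip, and sniffAndRewind are hypothetical names, and looksLikeGzip is a simplified stand-in that checks only the two generic gzip magic bytes.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of org.archive.io.
class GzipSniffSketch {

    // Consumes two bytes and reports whether they are the gzip magic number.
    // Like the documented method, it leaves the stream where reading stopped.
    static boolean looksLikeGzip(InputStream is) throws IOException {
        int first = is.read();
        int second = is.read();
        return first == 0x1f && second == 0x8b;
    }

    // Caller-side wrapper: remember the position, sniff, then rewind.
    static boolean sniffAndRewind(InputStream is) throws IOException {
        if (!is.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        is.mark(2); // the sniff reads at most two bytes
        try {
            return looksLikeGzip(is);
        } finally {
            is.reset();
        }
    }
}
```

Streams that do not support mark, such as a raw FileInputStream, can be wrapped in a BufferedInputStream before being passed to sniffAndRewind.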


Copyright © 2003-2011 Internet Archive. All Rights Reserved.