org.archive.io.arc
Class ARCUtils

java.lang.Object
  extended by org.archive.io.arc.ARCUtils
All Implemented Interfaces:
ARCConstants, ArchiveFileConstants

public class ARCUtils
extends java.lang.Object
implements ARCConstants


Field Summary
 
Fields inherited from interface org.archive.io.arc.ARCConstants
ARC_FILE_EXTENSION, ARC_GZIP_EXTRA_FIELD, ARC_MAGIC_NUMBER, CHECKSUM_FIELD_KEY, CHECKSUM_HEADER_FIELD_KEY, CODE_HEADER_FIELD_KEY, COMPRESSED_ARC_FILE_EXTENSION, DEFAULT_ENCODING, DEFAULT_GZIP_HEADER_LENGTH, DEFAULT_MAX_ARC_FILE_SIZE, DOT_ARC_FILE_EXTENSION, DOT_COMPRESSED_ARC_FILE_EXTENSION, DOT_COMPRESSED_FILE_EXTENSION, FILENAME_FIELD_KEY, FILENAME_HEADER_FIELD_KEY, GZIP_HEADER_BEGIN, HEADER_FIELD_SEPARATOR, IP_HEADER_FIELD_KEY, LINE_SEPARATOR, LOCATION_HEADER_FIELD_KEY, MAX_METADATA_LINE_LENGTH, MINIMUM_RECORD_LENGTH, OFFSET_FIELD_KEY, OFFSET_HEADER_FIELD_KEY, REQUIRED_VERSION_1_HEADER_FIELDS, STATUSCODE_FIELD_KEY, TOKENIZED_PREFIX
 
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, COMPRESSED_FILE_EXTENSION, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
 
Constructor Summary
ARCUtils()
           
 
Method Summary
static boolean isCompressed(java.io.File arcFile)
           
static java.lang.String parseArcFilename(java.lang.String pathOrUri)
           
static boolean testCompressedARCFile(java.io.File arcFile)
          Checks whether the file is compressed and in ARC GZIP format.
static boolean testCompressedARCFile(java.io.File arcFile, boolean skipSuffixCheck)
          Checks whether the file is compressed and in ARC GZIP format.
static boolean testCompressedARCStream(java.io.InputStream is)
          Tests whether the passed stream is a gzip stream by reading the head of the stream.
static boolean testCompressedRepositionalStream(it.unimi.dsi.fastutil.io.RepositionableStream rs)
          Tests whether the passed stream is a gzip stream by reading the head of the stream.
static boolean testCompressedStream(java.io.InputStream is)
          Tests whether the passed stream is a gzip stream by reading the head of the stream.
static boolean testUncompressedARCFile(java.io.File arcFile)
          Checks whether the file is an uncompressed ARC file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ARCUtils

public ARCUtils()
Method Detail

parseArcFilename

public static java.lang.String parseArcFilename(java.lang.String pathOrUri)
                                         throws java.net.URISyntaxException
Parameters:
pathOrUri - Path or URI to extract arc filename from.
Returns:
Extracted arc file name.
Throws:
java.net.URISyntaxException
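
The extraction described above can be illustrated with a JDK-only sketch. This is not the ARCUtils implementation: the class ArcNameSketch, the method fileNameOf, and the rule used to decide whether the argument is a URI are all assumptions made for illustration.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

final class ArcNameSketch {
    private ArcNameSketch() {}

    // Hypothetical helper, not the library method: returns the last
    // path segment of either a plain file path or a URI.
    static String fileNameOf(String pathOrUri) throws URISyntaxException {
        String path = pathOrUri;
        // Assumption: a multi-character scheme followed by "://" marks a URI.
        // Requiring two or more scheme characters avoids treating a
        // Windows drive letter such as "C:" as a scheme.
        if (pathOrUri.matches("^[A-Za-z][A-Za-z0-9+.-]+://.*")) {
            path = new URI(pathOrUri).getPath();
        }
        // File.getName() yields the final name in the path sequence.
        return new File(path).getName();
    }
}
```

Under these assumptions, both a URI ending in /arcs/foo.arc.gz and a local path ending in /arcs/foo.arc.gz yield foo.arc.gz. A malformed URI surfaces as java.net.URISyntaxException, matching the declared exception.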

isCompressed

public static boolean isCompressed(java.io.File arcFile)
                            throws java.io.IOException
Parameters:
arcFile - File to test.
Returns:
True if arcFile is a compressed ARC file.
Throws:
java.io.IOException

testCompressedARCFile

public static boolean testCompressedARCFile(java.io.File arcFile)
                                     throws java.io.IOException
Checks whether the file is compressed and in ARC GZIP format.

Parameters:
arcFile - File to test for being a GZIP-compressed Internet Archive ARC file.
Returns:
True if this is an Internet Archive GZIP'd ARC file (it begins with the Internet Archive GZIP header and has the COMPRESSED_ARC_FILE_EXTENSION suffix).
Throws:
java.io.IOException - If the file does not exist or is not readable.

testCompressedARCFile

public static boolean testCompressedARCFile(java.io.File arcFile,
                                            boolean skipSuffixCheck)
                                     throws java.io.IOException
Checks whether the file is compressed and in ARC GZIP format.

Parameters:
arcFile - File to test for being a GZIP-compressed Internet Archive ARC file.
skipSuffixCheck - Set to true to skip the test for the '.arc.gz' suffix.
Returns:
True if this is an Internet Archive GZIP'd ARC file (it begins with the Internet Archive GZIP header).
Throws:
java.io.IOException - If the file does not exist or is not readable.
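
The interaction between the suffix test and the header test can be sketched with the JDK alone. This is a simplified illustration, not the ARCUtils implementation: the class CompressedArcFileSketch and the method looksLikeCompressedArc are hypothetical, the suffix comparison is assumed to be case-insensitive, and only the generic gzip magic bytes (0x1f 0x8b, RFC 1952) are checked rather than the Internet Archive specific GZIP header.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

final class CompressedArcFileSketch {
    private CompressedArcFileSketch() {}

    // The '.arc.gz' suffix is the one named in the skipSuffixCheck description.
    private static final String SUFFIX = ".arc.gz";

    // Hypothetical stand-in for the two-argument check.
    static boolean looksLikeCompressedArc(File f, boolean skipSuffixCheck)
            throws IOException {
        // canRead() is false both for missing and for unreadable files.
        if (!f.canRead()) {
            throw new IOException(f + " does not exist or is not readable");
        }
        // Assumption: the suffix is compared case-insensitively.
        if (!skipSuffixCheck
                && !f.getName().toLowerCase(Locale.ROOT).endsWith(SUFFIX)) {
            return false;
        }
        try (InputStream in = new FileInputStream(f)) {
            // Generic gzip magic only; the ARC-specific extra field is not inspected.
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }
}
```

In this sketch, passing true for skipSuffixCheck lets a gzip file with any name pass, which mirrors the difference between the two documented return conditions.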

testCompressedARCStream

public static boolean testCompressedARCStream(java.io.InputStream is)
                                       throws java.io.IOException
Tests whether the passed stream is a gzip stream by reading the head of the stream. Does not reposition the stream afterwards; that is left up to the caller.

Parameters:
is - An InputStream.
Returns:
True if compressed stream.
Throws:
java.io.IOException
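
Because repositioning is left to the caller, one common caller-side pattern is to mark the stream before the check and reset it afterwards. The sketch below uses only the JDK. The class CallerRepositionSketch and both methods are hypothetical, and consumeAndTest merely imitates a check that advances the stream; it is not the ARCUtils code.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

final class CallerRepositionSketch {
    private CallerRepositionSketch() {}

    // Hypothetical consuming check: reads up to two bytes, compares them
    // with the gzip magic (0x1f 0x8b), and does not rewind the stream.
    static boolean consumeAndTest(InputStream in) throws IOException {
        return in.read() == 0x1f && in.read() == 0x8b;
    }

    // Caller-side repositioning: mark before the check, reset after it.
    static boolean testThenRewind(BufferedInputStream in) throws IOException {
        in.mark(2); // the check reads at most two bytes
        try {
            return consumeAndTest(in);
        } finally {
            in.reset(); // return to the position held before the check
        }
    }
}
```

BufferedInputStream is used here because it always supports mark and reset; a caller holding a stream without mark support would need to wrap it first.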

testCompressedRepositionalStream

public static boolean testCompressedRepositionalStream(it.unimi.dsi.fastutil.io.RepositionableStream rs)
                                                throws java.io.IOException
Tests whether the passed stream is a gzip stream by reading the head of the stream. Repositions the stream when done.

Parameters:
rs - An InputStream that is also a RepositionableStream.
Returns:
True if compressed stream.
Throws:
java.io.IOException

testCompressedStream

public static boolean testCompressedStream(java.io.InputStream is)
                                    throws java.io.IOException
Tests whether the passed stream is a gzip stream by reading the head of the stream. Repositions the stream when done.

Parameters:
is - An InputStream.
Returns:
True if compressed stream.
Throws:
java.io.IOException
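
A check that restores the stream position itself can be sketched with java.io.PushbackInputStream, which works even when the underlying stream lacks mark support. This illustrates the contract only and is not the ARCUtils implementation: the class SelfRepositioningCheckSketch and the method isGzip are hypothetical, and only the generic gzip magic bytes (0x1f 0x8b) are compared.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

final class SelfRepositioningCheckSketch {
    private SelfRepositioningCheckSketch() {}

    // Hypothetical helper. The stream must have been created with a
    // pushback buffer of at least two bytes.
    static boolean isGzip(PushbackInputStream in) throws IOException {
        byte[] head = new byte[2];
        int n = 0;
        // Loop because a single read may return fewer bytes than requested.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before two bytes were available
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // push the bytes back so the caller sees them again
        }
        return n == 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
    }
}
```

After isGzip returns, the next read on the same PushbackInputStream yields the first byte of the stream again, whether or not the magic matched.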

testUncompressedARCFile

public static boolean testUncompressedARCFile(java.io.File arcFile)
                                       throws java.io.IOException
Checks whether the file is an uncompressed ARC file.

Parameters:
arcFile - File to test for being an uncompressed Internet Archive ARC file.
Returns:
True if this is an Internet Archive ARC file.
Throws:
java.io.IOException - If the file does not exist or is not readable.
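
As a rough illustration of what such a test can look at, the sketch below checks that a file begins with the filedesc:// URL that opens the version block of an ARC file. It is not the ARCUtils implementation, which may apply further checks such as the file suffix or parsing of the first record. The class UncompressedArcSketch and the method startsWithFiledesc are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class UncompressedArcSketch {
    private UncompressedArcSketch() {}

    // Assumption: an uncompressed ARC file starts with a version block
    // whose first line begins with "filedesc://".
    private static final byte[] PREFIX =
            "filedesc://".getBytes(StandardCharsets.US_ASCII);

    // Hypothetical helper, not the library method.
    static boolean startsWithFiledesc(File f) throws IOException {
        if (!f.canRead()) {
            throw new IOException(f + " does not exist or is not readable");
        }
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            byte[] head = new byte[PREFIX.length];
            try {
                in.readFully(head); // throws EOFException if the file is too short
            } catch (EOFException e) {
                return false;
            }
            return Arrays.equals(head, PREFIX);
        }
    }
}
```

A gzip-compressed ARC file fails this test because its first bytes are the gzip magic rather than the filedesc:// prefix.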


Copyright © 2003-2011 Internet Archive. All Rights Reserved.