org.archive.io
Class GzippedInputStream

java.lang.Object
  extended by java.io.InputStream
      extended by java.io.FilterInputStream
          extended by java.util.zip.InflaterInputStream
              extended by java.util.zip.GZIPInputStream
                  extended by org.archive.io.GzippedInputStream
All Implemented Interfaces:
it.unimi.dsi.fastutil.io.RepositionableStream, java.io.Closeable

public class GzippedInputStream
extends java.util.zip.GZIPInputStream
implements it.unimi.dsi.fastutil.io.RepositionableStream

Subclass of GZIPInputStream that can handle a stream made of multiple concatenated GZIP members/records. This class is needed because GZIPInputStream only finds the first GZIP member in the file even if the file is made up of multiple GZIP members.

Takes an InputStream that implements the RepositionableStream interface so that it can back up over bytes that the zlib Inflater class reads past the end of a gzip member.
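The over-read that has to be backed up can be seen with the JDK's own `java.util.zip.Inflater`: it is handed more input than one member contains, and `getRemaining()` reports how much it did not use. The sketch below is JDK-only and is not how GzippedInputStream is implemented. It assumes the whole file is in memory and that every member has the plain 10-byte header written by `GZIPOutputStream` (no optional fields). `GzipMemberOffsets` and `memberOffsets` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class GzipMemberOffsets {
    // Assumption: each member has the minimal 10-byte header (FLG == 0),
    // as java.util.zip.GZIPOutputStream writes it; optional fields are not parsed.
    private static final int HEADER_LEN = 10;
    private static final int TRAILER_LEN = 8; // CRC32 + ISIZE

    /** Returns the offset at which each gzip member in data starts. */
    static List<Long> memberOffsets(byte[] data) throws DataFormatException {
        List<Long> offsets = new ArrayList<>();
        Inflater inf = new Inflater(true); // raw deflate: the header is skipped by hand
        byte[] scratch = new byte[4096];   // decompressed output is discarded
        int pos = 0;
        try {
            while (pos < data.length) {
                if (pos + HEADER_LEN > data.length
                        || (data[pos] & 0xff) != 0x1f || (data[pos + 1] & 0xff) != 0x8b
                        || data[pos + 3] != 0) {
                    throw new DataFormatException("no simple gzip header at offset " + pos);
                }
                offsets.add((long) pos);
                inf.reset(); // fresh inflater state for the new member
                int start = pos + HEADER_LEN;
                // Hand over everything that is left: the inflater is given bytes
                // beyond the end of this member's deflate data.
                inf.setInput(data, start, data.length - start);
                while (!inf.finished()) {
                    if (inf.inflate(scratch) == 0 && !inf.finished() && inf.needsInput()) {
                        throw new DataFormatException("truncated member at offset " + pos);
                    }
                }
                // getRemaining() counts the bytes it was given but did not use;
                // backing up by that amount finds the trailer, then the next member.
                pos = data.length - inf.getRemaining() + TRAILER_LEN;
            }
        } finally {
            inf.end();
        }
        return offsets;
    }
}
```

The offsets it returns correspond to what position() is described as reporting between Iterator.hasNext() and Iterator.next().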

Use the iterator() method to get a gzip member iterator. Each call to Iterator.next() returns the next gzip member in the stream; cast the value it returns to InputStream.

Use gzipMemberSeek(long) to position the stream before reading a gzip member when accessing gzip members randomly. Pass it the offset at which the gzip member starts.

If you need to know the position at which a gzip member starts, call position() just after a call to Iterator.hasNext() and before you call Iterator.next().

Author:
stack

Field Summary
 
Fields inherited from class java.util.zip.GZIPInputStream
crc, eos, GZIP_MAGIC
 
Fields inherited from class java.util.zip.InflaterInputStream
buf, inf, len
 
Fields inherited from class java.io.FilterInputStream
in
 
Constructor Summary
GzippedInputStream(java.io.InputStream is)
           
GzippedInputStream(java.io.InputStream is, int size)
           
 
Method Summary
protected static java.io.InputStream checkStream(java.io.InputStream is)
           
protected  boolean compareBytes(int a, int b)
           
protected  GzipHeader getGzipHeader()
           
protected  java.util.zip.Inflater getInflater()
           
protected  java.io.InputStream getInputStream()
           
 long gotoEOR()
          Exhaust current GZIP member content.
 long gotoEOR(int ignore)
          Exhaust current GZIP member content.
static byte[] gzip(byte[] bytes)
          Gzip passed bytes.
 void gzipMemberSeek()
           
 void gzipMemberSeek(long position)
          Seek to a gzip member.
static boolean isCompressedRepositionableStream(it.unimi.dsi.fastutil.io.RepositionableStream rs)
          Tests whether the passed stream is a GZIP stream by reading in its header.
static boolean isCompressedStream(java.io.InputStream is)
          Tests whether the passed stream is a gzip stream by reading in its header.
 java.util.Iterator iterator()
          Returns a GZIP Member Iterator.
protected  boolean moveToNextGzipMember()
           
 long position()
           
 void position(long position)
          Seek to passed offset.
protected  void readHeader()
          Read in the gzip header.
protected  void resetInflater()
          Move to next gzip member in the file.
 
Methods inherited from class java.util.zip.GZIPInputStream
close, read
 
Methods inherited from class java.util.zip.InflaterInputStream
available, fill, mark, markSupported, read, reset, skip
 
Methods inherited from class java.io.FilterInputStream
read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GzippedInputStream

public GzippedInputStream(java.io.InputStream is)
                   throws java.io.IOException
Throws:
java.io.IOException

GzippedInputStream

public GzippedInputStream(java.io.InputStream is,
                          int size)
                   throws java.io.IOException
Parameters:
is - An InputStream that implements RepositionableStream and returns true from InputStream.markSupported() (the latter is needed so that an Iterator can be set up against the gzip stream).
size - Size of the blocks to use when reading.
Throws:
java.io.IOException
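For in-memory data, an InputStream meeting both constructor requirements can be sketched by subclassing `ByteArrayInputStream`, whose `markSupported()` already returns true and whose protected `pos` field holds the read offset. To avoid a fastutil dependency, the sketch declares a local `Repositionable` interface with the same two `position` methods this page lists for `it.unimi.dsi.fastutil.io.RepositionableStream`; with fastutil on the classpath you would implement that interface instead. Both type names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

/** Local stand-in with the same two methods as fastutil's RepositionableStream. */
interface Repositionable {
    void position(long newPosition) throws IOException;
    long position() throws IOException;
}

/** Hypothetical in-memory stream that is repositionable and supports mark/reset. */
final class RepositionableByteArrayInputStream extends ByteArrayInputStream implements Repositionable {
    RepositionableByteArrayInputStream(byte[] buf) {
        super(buf);
    }

    @Override
    public synchronized long position() {
        return pos; // offset of the next byte read() will return
    }

    @Override
    public synchronized void position(long newPosition) throws IOException {
        if (newPosition < 0 || newPosition > count) {
            throw new IOException("position out of range: " + newPosition);
        }
        pos = (int) newPosition;
    }
    // markSupported() is inherited from ByteArrayInputStream and returns true.
}
```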
Method Detail

checkStream

protected static java.io.InputStream checkStream(java.io.InputStream is)
                                          throws java.io.IOException
Throws:
java.io.IOException

gotoEOR

public long gotoEOR(int ignore)
             throws java.io.IOException
Exhaust current GZIP member content. Call this method when you think you are at the end of the GZIP member; it reads past and discards any leftover bytes.

Parameters:
ignore - Character to leave out when counting skipped characters (usually a trailing newline).
Returns:
Count of characters skipped over.
Throws:
java.io.IOException
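The counting that gotoEOR(int) describes can be sketched over a plain InputStream: read to end of stream and count every byte except those equal to `ignore`. This reading of the `ignore` parameter is an interpretation of the description above, not confirmed library behavior. `DrainCounter` and `drain` are hypothetical names; the real method works on the decompressed content of the current member.

```java
import java.io.IOException;
import java.io.InputStream;

final class DrainCounter {
    /**
     * Reads in to end of stream and returns how many bytes were skipped,
     * not counting bytes equal to ignore (e.g. '\n'). Passing -1 counts
     * every byte, since read() never returns -1 for an actual byte.
     */
    static long drain(InputStream in, int ignore) throws IOException {
        long skipped = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c != ignore) {
                skipped++;
            }
        }
        return skipped;
    }
}
```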

gotoEOR

public long gotoEOR()
             throws java.io.IOException
Exhaust current GZIP member content. Call this method when you think you are at the end of the GZIP member; it reads past and discards any leftover bytes.

Returns:
Count of characters skipped over.
Throws:
java.io.IOException

iterator

public java.util.Iterator iterator()
Returns a GZIP member Iterator. It has one limitation: only one Iterator can be obtained per instance of this class, so to iterate again you must create a new instance.

Returns:
Iterator over GZIP Members.

moveToNextGzipMember

protected boolean moveToNextGzipMember()
Returns:
True if we found another record in the stream.

compareBytes

protected boolean compareBytes(int a,
                               int b)

getInflater

protected java.util.zip.Inflater getInflater()

getInputStream

protected java.io.InputStream getInputStream()

getGzipHeader

protected GzipHeader getGzipHeader()

resetInflater

protected void resetInflater()
Move to next gzip member in the file.


readHeader

protected void readHeader()
                   throws java.io.IOException
Read in the gzip header.

Throws:
java.io.IOException

position

public void position(long position)
              throws java.io.IOException
Seek to the passed offset. After positioning the stream, this method resets the inflater. Public use of this method is assumed to be only for positioning the stream at the start of a gzip member.

Specified by:
position in interface it.unimi.dsi.fastutil.io.RepositionableStream
Parameters:
position - Absolute position of a gzip member start.
Throws:
java.io.IOException

position

public long position()
              throws java.io.IOException
Specified by:
position in interface it.unimi.dsi.fastutil.io.RepositionableStream
Throws:
java.io.IOException

gzipMemberSeek

public void gzipMemberSeek(long position)
                    throws java.io.IOException
Seek to a gzip member. Moves the stream to the new position, resets the inflater, and reads in the gzip header, ready for subsequent calls to read.

Parameters:
position - Absolute position of a gzip member start.
Throws:
java.io.IOException
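The three steps gzipMemberSeek(long) performs (move to the offset, reset the inflater, read the header) can be approximated with JDK classes alone for data held in a byte array. This is a sketch under stated assumptions, not the library's code: it assumes the plain 10-byte header with no optional fields, uses a fresh `Inflater` as the "reset", and checks the CRC32 in the trailer. `GzipMemberReader` and `readMemberAt` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class GzipMemberReader {
    private static final int HEADER_LEN = 10;

    /** Decompresses only the gzip member that starts at offset in data. */
    static byte[] readMemberAt(byte[] data, int offset) throws DataFormatException {
        // "Seek" and check the header. Assumption: plain 10-byte header (FLG == 0).
        if (offset < 0 || offset + HEADER_LEN > data.length
                || (data[offset] & 0xff) != 0x1f || (data[offset + 1] & 0xff) != 0x8b) {
            throw new DataFormatException("no gzip member starts at offset " + offset);
        }
        if (data[offset + 3] != 0) {
            throw new DataFormatException("optional header fields are not handled here");
        }
        Inflater inf = new Inflater(true); // a fresh raw inflater plays the role of "reset"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            int start = offset + HEADER_LEN;
            inf.setInput(data, start, data.length - start);
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished() && inf.needsInput()) {
                    throw new DataFormatException("truncated member at offset " + offset);
                }
                out.write(buf, 0, n);
            }
            // Unused input starts at the 8-byte trailer: CRC32, then ISIZE (little-endian).
            int trailer = data.length - inf.getRemaining();
            if (trailer + 8 > data.length) {
                throw new DataFormatException("missing trailer at offset " + trailer);
            }
            byte[] result = out.toByteArray();
            CRC32 crc = new CRC32();
            crc.update(result);
            if (crc.getValue() != readLE32(data, trailer)) {
                throw new DataFormatException("CRC mismatch in member at offset " + offset);
            }
            return result;
        } finally {
            inf.end();
        }
    }

    private static long readLE32(byte[] b, int i) {
        return (b[i] & 0xffL) | ((b[i + 1] & 0xffL) << 8)
                | ((b[i + 2] & 0xffL) << 16) | ((b[i + 3] & 0xffL) << 24);
    }
}
```

Reading stops at the end of the requested member even though the inflater is handed the rest of the file, which is why random access needs the member's start offset and nothing more.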

gzipMemberSeek

public void gzipMemberSeek()
                    throws java.io.IOException
Throws:
java.io.IOException

gzip

public static byte[] gzip(byte[] bytes)
                   throws java.io.IOException
Gzip the passed bytes. Use only when bytes is small, since both the input and the compressed result are held in memory.

Parameters:
bytes - What to gzip.
Returns:
A gzip member of bytes.
Throws:
java.io.IOException
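A helper with the same shape as gzip(byte[]) can be written with `java.util.zip.GZIPOutputStream` writing into a `ByteArrayOutputStream`. This is an illustrative sketch, not the library's source; `GzipBytes` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

final class GzipBytes {
    /** Compresses bytes into one complete gzip member held in memory. */
    static byte[] gzip(byte[] bytes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(bytes);
        } // close() finishes the member, writing the CRC32/ISIZE trailer
        return bos.toByteArray();
    }
}
```

Because each call produces a complete member, concatenating several results gives the kind of multi-member stream this class is designed to read.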

isCompressedRepositionableStream

public static boolean isCompressedRepositionableStream(it.unimi.dsi.fastutil.io.RepositionableStream rs)
                                                throws java.io.IOException
Tests whether the passed stream is a GZIP stream by reading in its header. Repositions the stream when done.

Parameters:
rs - An InputStream that implements RepositionableStream.
Returns:
True if compressed stream.
Throws:
java.io.IOException

isCompressedStream

public static boolean isCompressedStream(java.io.InputStream is)
                                  throws java.io.IOException
Tests whether the passed stream is a gzip stream by reading in its header. Does not reposition the stream when done.

Parameters:
is - An InputStream.
Returns:
True if compressed stream.
Throws:
java.io.IOException
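The difference between the two tests above (repositioning or not) can be sketched by checking the two gzip ID bytes against the JDK constant `GZIPInputStream.GZIP_MAGIC`. The library methods may read more of the header than this; the sketch checks only the magic bytes, and uses `mark`/`reset` as a stand-in for RepositionableStream repositioning. `GzipSniffer` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class GzipSniffer {
    /** Consumes two bytes and reports whether they are the gzip magic (0x1f, 0x8b). */
    static boolean looksGzipped(InputStream in) throws IOException {
        int b1 = in.read();
        int b2 = in.read();
        if (b1 == -1 || b2 == -1) {
            return false;
        }
        // GZIP_MAGIC is 0x8b1f: the two ID bytes read as a little-endian short.
        return ((b2 << 8) | b1) == GZIPInputStream.GZIP_MAGIC;
    }

    /** Same check, but restores the read position afterwards via mark/reset. */
    static boolean looksGzippedAndRewind(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        try {
            return looksGzipped(in);
        } finally {
            in.reset();
        }
    }
}
```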


Copyright © 2003-2011 Internet Archive. All Rights Reserved.