org.archive.io
Class GenericReplayCharSequence

java.lang.Object
  extended by org.archive.io.GenericReplayCharSequence
All Implemented Interfaces:
java.lang.CharSequence, ReplayCharSequence

public class GenericReplayCharSequence
extends java.lang.Object
implements ReplayCharSequence

Provides a (Replay)CharSequence view on recorded streams (a prefix buffer and overflow backing file) that can handle streams of multibyte characters. For better performance on ISO-8859-1 text, use Latin1ByteReplayCharSequence.

Call close on this class when done so that it can clean up resources.
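The close-when-done contract can be sketched with a try/finally block. This example does not use the Heritrix classes: `ClosableChars` and `InMemoryChars` are hypothetical stand-ins that only mirror the shape of ReplayCharSequence (a CharSequence plus a `close()` method), so that the pattern can be run on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CloseUsageSketch {
    /** Hypothetical stand-in: a CharSequence that also exposes close(). */
    interface ClosableChars extends CharSequence {
        void close();
    }

    /** Hypothetical in-memory implementation used only for this illustration. */
    static final class InMemoryChars implements ClosableChars {
        private CharSequence chars;
        private boolean closed;

        InMemoryChars(String content) { this.chars = content; }

        public int length() { return chars.length(); }
        public char charAt(int index) { return chars.charAt(index); }
        public CharSequence subSequence(int start, int end) { return chars.subSequence(start, end); }
        @Override public String toString() { return chars.toString(); }

        /** Drops the decoded content, as a real implementation might free buffers or files. */
        public void close() { chars = ""; closed = true; }
        boolean isClosed() { return closed; }
    }

    /** Runs a regex over the sequence and always closes it afterwards. */
    static int countMatches(ClosableChars seq, Pattern pattern) {
        try {
            Matcher matcher = pattern.matcher(seq);
            int count = 0;
            while (matcher.find()) {
                count++;
            }
            return count;
        } finally {
            seq.close(); // runs even if matching throws
        }
    }

    public static void main(String[] args) {
        InMemoryChars seq = new InMemoryChars("a1b22c333");
        System.out.println(countMatches(seq, Pattern.compile("\\d+")) + " matches, closed=" + seq.isClosed());
    }
}
```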

The implementation currently works by checking whether the content to read fits entirely in the in-memory buffer. If so, we decode it into a CharBuffer and keep this around for CharSequence operations. This CharBuffer is discarded on close.
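A minimal sketch of the in-memory path, using only standard java.nio classes. It is not the class's actual code: the `decode` helper and its `int` offsets are assumptions made for illustration, with parameter names borrowed from the in-memory constructor.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InMemoryDecodeSketch {
    /**
     * Decodes buffer[responseBodyStart, size) with the named charset.
     * The returned CharBuffer already implements CharSequence.
     */
    static CharBuffer decode(byte[] buffer, int size, int responseBodyStart, String encoding) {
        Charset charset = Charset.forName(encoding);
        ByteBuffer bytes = ByteBuffer.wrap(buffer, responseBodyStart, size - responseBodyStart);
        return charset.decode(bytes).asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        // "HDR\r\n\r\n" stands in for HTTP headers and is 7 bytes long.
        byte[] recorded = "HDR\r\n\r\nna\u00efve caf\u00e9".getBytes(StandardCharsets.UTF_8);
        CharSequence body = decode(recorded, recorded.length, 7, "UTF-8");
        // The body is 12 bytes in UTF-8 but 10 chars once decoded.
        System.out.println(body.length() + " chars: " + body);
    }
}
```

The byte count and the char count differ for multibyte text, which is why the whole buffer is decoded up front rather than indexed byte by byte.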

If the content length is greater than the in-memory buffer, we decode the buffer plus backing file into a new file named for the backing file with a suffix of the encoding we write the file as. We then run with a memory-mapped CharBuffer against this file to implement CharSequence. The reason for this implementation is that CharSequence must be able to return its length, which for a multibyte encoding is not known until the content has been decoded.
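The overflow path can be sketched as a transcode step followed by a memory map. This is an illustration under stated assumptions, not the class's code: the `transcode` and `map` helpers are hypothetical, and UTF-16BE is chosen here because it is a fixed two-bytes-per-char format, so the mapped file can be viewed directly as a CharBuffer whose length is the file size divided by two.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedDecodeSketch {
    /** Decodes the stream from the given encoding and rewrites it as UTF-16BE (no BOM). */
    static Path transcode(InputStream in, String encoding, Path target) throws IOException {
        try (Reader reader = new InputStreamReader(in, encoding);
             Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_16BE)) {
            char[] chunk = new char[4096];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                writer.write(chunk, 0, n);
            }
        }
        return target;
    }

    /** Maps the decoded file read-only and views it as chars (big-endian by default). */
    static CharBuffer map(Path decoded) throws IOException {
        try (FileChannel channel = FileChannel.open(decoded, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asCharBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temp file whose suffix names the encoding it is written in (illustrative naming).
        Path decoded = Files.createTempFile("backing", ".UTF-16BE");
        decoded.toFile().deleteOnExit();
        byte[] raw = "caf\u00e9 \u65e5\u672c".getBytes(StandardCharsets.UTF_8);
        transcode(new ByteArrayInputStream(raw), "UTF-8", decoded);
        CharSequence chars = map(decoded);
        System.out.println(raw.length + " bytes in, " + chars.length() + " chars mapped");
    }
}
```

A single mapping is limited to Integer.MAX_VALUE bytes, so this sketch only covers decoded files below that size.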

An obvious optimization would be to keep decodings around, whether the in-memory decoded buffer or the file of decodings written to disk, but the general usage pattern when processing URIs is that the decoding is used by one processor only. Also of note, files usually fit into the in-memory buffer.

We might also be able to maintain 3 windows that move across the file, decoding a window at a time and trying to keep one of the buffers just in front of the regex processing. The length returned would be only the length from the current position to the end of the current block, or else the length could be obtained by multiplying the backing file's length by the decoder's estimate of average character size. This would save us writing out the decoded file. We'd have to do the latter for files that are > Integer.MAX_VALUE.
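The length estimate mentioned above can be sketched with the standard `CharsetDecoder.averageCharsPerByte()` method. The `estimateChars` and `needsEstimate` helpers are hypothetical names for illustration; the result is an approximation, not an exact char count.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class LengthEstimateSketch {
    /** Estimates the decoded length from the byte length and the decoder's average ratio. */
    static long estimateChars(long byteLength, String encoding) {
        CharsetDecoder decoder = Charset.forName(encoding).newDecoder();
        return (long) Math.ceil(byteLength * (double) decoder.averageCharsPerByte());
    }

    /** True when the file is too large for an int-indexed CharSequence or a single mapping. */
    static boolean needsEstimate(long byteLength) {
        return byteLength > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long backingFileLength = 3_000_000_000L; // illustrative size above Integer.MAX_VALUE
        System.out.println("needs estimate: " + needsEstimate(backingFileLength));
        System.out.println("estimated chars: " + estimateChars(backingFileLength, "UTF-8"));
    }
}
```

Because the estimate can overshoot or undershoot the real count, a windowed implementation would still need to handle reads that run past the true end of the content.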

Version:
$Revision: 6090 $, $Date: 2008-12-09 23:36:27 +0000 (Tue, 09 Dec 2008) $
Author:
stack

Field Summary
protected static java.util.logging.Logger logger
           
 
Constructor Summary
GenericReplayCharSequence(byte[] buffer, long size, long responseBodyStart, java.lang.String encoding)
          Constructor for all in-memory operation.
GenericReplayCharSequence(ReplayInputStream contentReplayInputStream, java.lang.String backingFilename, java.lang.String characterEncoding)
          Constructor for overflow-to-disk-file operation.
 
Method Summary
 char charAt(int index)
           
 void close()
          Call this method when done so the implementation has a chance to clean up resources.
protected  void finalize()
           
 int length()
           
 java.lang.CharSequence subSequence(int start, int end)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

logger

protected static java.util.logging.Logger logger
Constructor Detail

GenericReplayCharSequence

public GenericReplayCharSequence(byte[] buffer,
                                 long size,
                                 long responseBodyStart,
                                 java.lang.String encoding)
                          throws java.io.IOException
Constructor for all in-memory operation.

Parameters:
buffer - In-memory buffer of recordings prefix. We read from here first and will only go to the backing file if size requested is greater than buffer.length.
size - Total size of stream to replay in bytes. Used to find EOS. This is total length of content including HTTP headers if present.
responseBodyStart - Where the response body starts in bytes. Used to skip over the HTTP headers if present.
backingFilename - Path to backing file with content in excess of what's in buffer.
encoding - Encoding to use reading the passed prefix buffer and backing file. For now, should be the Java canonical name for the encoding. (If null is passed, we will default to ByteReplayCharSequence).
Throws:
java.io.IOException

GenericReplayCharSequence

public GenericReplayCharSequence(ReplayInputStream contentReplayInputStream,
                                 java.lang.String backingFilename,
                                 java.lang.String characterEncoding)
                          throws java.io.IOException
Constructor for overflow-to-disk-file operation.

Parameters:
contentReplayInputStream - inputStream of content
backingFilename - hint for name of temp file
characterEncoding - Encoding to use reading the stream. For now, should be the Java canonical name for the encoding.
Throws:
java.io.IOException
Method Detail

close

public void close()
Description copied from interface: ReplayCharSequence
Call this method when done so the implementation has a chance to clean up resources.

Specified by:
close in interface ReplayCharSequence

finalize

protected void finalize()
                 throws java.lang.Throwable
Overrides:
finalize in class java.lang.Object
Throws:
java.lang.Throwable

length

public int length()
Specified by:
length in interface java.lang.CharSequence

charAt

public char charAt(int index)
Specified by:
charAt in interface java.lang.CharSequence

subSequence

public java.lang.CharSequence subSequence(int start,
                                          int end)
Specified by:
subSequence in interface java.lang.CharSequence

toString

public java.lang.String toString()
Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object


Copyright © 2003-2011 Internet Archive. All Rights Reserved.