java.lang.Object org.archive.io.Latin1ByteReplayCharSequence
class Latin1ByteReplayCharSequence
Provides a (Replay)CharSequence view on recorded stream bytes (a prefix buffer and overflow backing file). Assumes the byte stream is ISO-8859-1 text, taking advantage of the fact that each byte in the stream corresponds to a single Unicode character with the same numerical value as the byte.
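The byte-to-character correspondence described above can be illustrated with a minimal sketch. The class and method names here (`Latin1Demo`, `latin1ToChar`) are hypothetical and are not part of org.archive.io; the sketch only shows why no charset decoder is needed for ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Demo {
    /** Widens one ISO-8859-1 byte to the char with the same numeric value. */
    static char latin1ToChar(byte b) {
        // Java bytes are signed (-128..127); the mask undoes sign extension
        // so that 0xE9 becomes U+00E9 rather than U+FFE9.
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(latin1ToChar(b));
        }
        // The hand-rolled widening agrees with the JDK's own Latin-1 decoder.
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(sb.toString().equals(decoded));
    }
}
```

Omitting the `& 0xFF` mask is the usual mistake with this technique: bytes at or above 0x80 would then map to the wrong characters.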
Uses a wraparound rolling buffer of the last windowSize bytes read from disk in memory; as long as the 'random access' of a CharSequence user stays within this window, access should remain fairly efficient. (So design any regexps pointed at these CharSequences to work within that range!)
When rereading of a location is necessary, the whole window is recentered around the location requested. (TODO: More research into whether this is the best strategy.)
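The recentering strategy can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: `WindowedFileReader` and all its members are hypothetical names, and for brevity it uses a plain linear window that is reloaded in full rather than the wraparound rolling buffer the class description mentions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch: a fixed-size window over a file, recentered on a miss. */
public class WindowedFileReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] window;
    private final long fileLength;
    private long windowStart = 0; // file offset of window[0]
    private int windowFill = 0;   // valid bytes in window; 0 means nothing loaded yet
    private int reloads = 0;      // counts disk reads, for illustration only

    public WindowedFileReader(String path, int windowSize) throws IOException {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.file = new RandomAccessFile(path, "r");
        this.fileLength = file.length();
        this.window = new byte[windowSize];
    }

    /** Returns the Latin-1 character at the given file offset. */
    public char charAt(long pos) throws IOException {
        if (pos < 0 || pos >= fileLength) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        if (pos < windowStart || pos >= windowStart + windowFill) {
            recenter(pos); // miss: reload the window around pos
        }
        return (char) (window[(int) (pos - windowStart)] & 0xFF);
    }

    /** Reloads the window so that pos sits near its middle. */
    private void recenter(long pos) throws IOException {
        long start = Math.max(0, pos - window.length / 2);
        int toRead = (int) Math.min(window.length, fileLength - start);
        file.seek(start);
        file.readFully(window, 0, toRead);
        windowStart = start;
        windowFill = toRead;
        reloads++;
    }

    public int reloadCount() {
        return reloads;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With a window of 16 bytes, reading offset 100 loads offsets 92 to 107, so a later read of offset 105 is served from memory, while a read of offset 10 forces another reload. This is why access patterns that stay within the window remain cheap.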
An implementation of a ReplayCharSequence done with ByteBuffers -- one to wrap the passed prefix buffer and the second, a memory-mapped ByteBuffer view into the backing file -- was consistently slower: ~10%. My tests did the following. I made a buffer filled with regular content and used it as the prefix buffer. The buffer content was written MULTIPLER times to a backing file. I then did accesses with the following pattern: skip forward 32 bytes, then back 16 bytes, and then read forward from byte 16 to 32. Repeat. Though I varied the size of the backing file relative to the buffer, from 3 to 10 times, the difference of 10% or so seemed to persist. The result was the same when I tried to favor get() over get(index). I used a profiler, JMP, to study the times taken (St.Ack did the above comment).
TODO: Determine if memory-mapped files are a better way to do this; probably not -- they don't offer the level of control over total memory used that this approach does.
Field Summary | |
---|---|
protected int | length - Total length of character stream to replay minus the HTTP headers if present. |
protected static java.util.logging.Logger | logger |
Constructor Summary |
---|
Latin1ByteReplayCharSequence(byte[] buffer, long size, long responseBodyStart, java.lang.String backingFilename) - Constructor. |
Method Summary | |
---|---|
char | charAt(int index) - Get character at passed absolute position. |
void | close() - Cleanup resources. |
protected void | finalize() |
int | length() |
java.lang.CharSequence | subSequence(int start, int end) |
java.lang.String | substring(int offset, int len) - Deprecated. please use subSequence() and then toString() directly |
java.lang.String | toString() |
Methods inherited from class java.lang.Object |
---|
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected static java.util.logging.Logger logger
protected int length
Constructor Detail |
---|
public Latin1ByteReplayCharSequence(byte[] buffer, long size, long responseBodyStart, java.lang.String backingFilename) throws java.io.IOException
Parameters:
buffer - In-memory buffer of the recording's prefix. We read from here first and only go to the backing file if the size requested is greater than buffer.length.
size - Total size of the stream to replay, in bytes. Used to find EOS. This is the total length of content, including HTTP headers if present.
responseBodyStart - Where the response body starts, in bytes. Used to skip over the HTTP headers if present.
backingFilename - Path to the backing file with content in excess of what's in buffer.
Throws:
java.io.IOException
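How the constructor parameters relate to the `length` field and to index translation can be sketched as below. `ReplayLayout` and its methods are hypothetical names used only for illustration; the arithmetic is inferred from the parameter descriptions (`size` includes headers, `length` excludes them) and is an assumption about the class's internals, not a copy of them.

```java
/** Hypothetical helper; illustrates the parameter arithmetic only. */
public class ReplayLayout {
    private final int prefixLength;       // stands in for buffer.length
    private final long size;              // total bytes, HTTP headers included
    private final long responseBodyStart; // offset of the first body byte

    public ReplayLayout(int prefixLength, long size, long responseBodyStart) {
        this.prefixLength = prefixLength;
        this.size = size;
        this.responseBodyStart = responseBodyStart;
    }

    /** Characters visible to a CharSequence user: the body only (assumed). */
    public int length() {
        return (int) (size - responseBodyStart);
    }

    /** Translates a content-relative index into an absolute stream offset. */
    public long absolute(int index) {
        return index + responseBodyStart;
    }

    /** True if the absolute offset falls inside the in-memory prefix buffer. */
    public boolean inPrefix(int index) {
        return absolute(index) < prefixLength;
    }

    public static void main(String[] args) {
        // A 10000-byte recording with 200 bytes of headers and a 4096-byte prefix.
        ReplayLayout layout = new ReplayLayout(4096, 10000L, 200L);
        System.out.println(layout.length());       // 9800
        System.out.println(layout.absolute(0));    // 200
        System.out.println(layout.inPrefix(3895)); // true  (absolute offset 4095)
        System.out.println(layout.inPrefix(3896)); // false (absolute offset 4096)
    }
}
```

Under these assumptions, only indexes whose absolute offset reaches past the prefix buffer would require a read from the backing file.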
Method Detail |
---|
public int length()
Specified by:
length in interface java.lang.CharSequence
public char charAt(int index)
Get character at passed absolute position. See charAt(int), which has a relative index into the content, one that doesn't account for the HTTP header if present.
Specified by:
charAt in interface java.lang.CharSequence
Parameters:
index - Index into content adjusted to accommodate the initial offset to get us past the HTTP header if present (i.e. contentOffset).
Returns:
Character at index.
public java.lang.CharSequence subSequence(int start, int end)
Specified by:
subSequence in interface java.lang.CharSequence
public void close() throws java.io.IOException
Specified by:
close in interface ReplayCharSequence
Throws:
java.io.IOException - Failed close of random access file.
protected void finalize() throws java.lang.Throwable
Overrides:
finalize in class java.lang.Object
Throws:
java.lang.Throwable
public java.lang.String substring(int offset, int len)
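Since substring(int, int) is deprecated in favor of subSequence() followed by toString(), a migration might look like the sketch below. It works on any CharSequence; the helper name `SubstringMigration.substring` is hypothetical, and it assumes, going by the parameter name, that `len` is a length rather than an end index.

```java
public class SubstringMigration {
    /** Replacement for a deprecated substring(offset, len) on any CharSequence. */
    static String substring(CharSequence cs, int offset, int len) {
        // subSequence takes an exclusive end index, not a length,
        // so the length has to be converted first.
        return cs.subSequence(offset, offset + len).toString();
    }

    public static void main(String[] args) {
        CharSequence statusLine = "HTTP/1.1 200 OK";
        System.out.println(substring(statusLine, 9, 3)); // prints "200"
    }
}
```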
public java.lang.String toString()
Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class java.lang.Object