org.archive.util.ms
Class Doc

java.lang.Object
  extended by org.archive.util.ms.Doc

public class Doc
extends java.lang.Object

Reads the text of Microsoft Word .doc files.

Author:
pjack

Method Summary
static SeekReader getText(BlockFileSystem wordDoc, int cacheSize)
          Returns the text of the given .doc file.
static SeekReader getText(java.io.File doc)
          Returns the text of the given .doc file.
static SeekReader getText(SeekInputStream doc)
          Returns the text of the given .doc file.
static SeekReader getText(java.lang.String docFilename)
          Returns the text of the .doc file with the given file name.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getText

public static SeekReader getText(java.lang.String docFilename)
                          throws java.io.IOException
Returns the text of the .doc file with the given file name.

Parameters:
docFilename - the name of the file whose text to return
Returns:
a reader that will return the text in the file
Throws:
java.io.IOException - if an IO error occurs

getText

public static SeekReader getText(java.io.File doc)
                          throws java.io.IOException
Returns the text of the given .doc file.

Parameters:
doc - the .doc file whose text to return
Returns:
a reader that will return the text in the file
Throws:
java.io.IOException - if an IO error occurs

getText

public static SeekReader getText(SeekInputStream doc)
                          throws java.io.IOException
Returns the text of the given .doc file.

Parameters:
doc - the .doc file whose text to return
Returns:
a reader that will return the text in the file
Throws:
java.io.IOException - if an IO error occurs

getText

public static SeekReader getText(BlockFileSystem wordDoc,
                                 int cacheSize)
                          throws java.io.IOException
Returns the text of the given .doc file. The cacheSize parameter is the number of the file's piece table entries to cache. Most .doc files have only one piece table entry, but a "fast-saved" .doc file may have several, so a cacheSize of 20 should be ample for most files. Piece table entries are small, only 12 bytes each, so caching them costs little memory and avoids many file pointer repositionings that would otherwise be necessary.

Parameters:
wordDoc - the .doc file as a BlockFileSystem
cacheSize - the number of piece table entries to cache
Returns:
a reader that will return the text in the file
Throws:
java.io.IOException - if an IO error occurs
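
The effect of cacheSize described above can be illustrated with a small, self-contained sketch. It is not the implementation used by Doc: the names PieceCacheSketch, Piece, PieceCache and countLoads are hypothetical, and least-recently-used eviction is an assumption, since this page does not say how cached entries are replaced. Each cache miss stands in for one file pointer repositioning followed by a 12-byte read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PieceCacheSketch {

    /** Hypothetical stand-in for one 12-byte piece table entry. */
    static final class Piece {
        final int index;

        Piece(int index) {
            this.index = index;
        }
    }

    /** Bounded cache of piece table entries; LRU eviction is an assumption. */
    static final class PieceCache {
        private final Map<Integer, Piece> cache;

        /** Number of simulated seek-and-read operations (cache misses). */
        int loads = 0;

        PieceCache(final int cacheSize) {
            // accessOrder = true keeps the least recently used entry eldest.
            this.cache = new LinkedHashMap<Integer, Piece>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Piece> eldest) {
                    return size() > cacheSize;
                }
            };
        }

        Piece get(int index) {
            Piece piece = cache.get(index);
            if (piece == null) {
                // A real reader would reposition the file pointer and
                // read the 12-byte entry here.
                loads++;
                piece = new Piece(index);
                cache.put(index, piece);
            }
            return piece;
        }
    }

    /** Replays a sequence of piece lookups and returns the number of misses. */
    static int countLoads(int cacheSize, int[] accesses) {
        PieceCache cache = new PieceCache(cacheSize);
        for (int index : accesses) {
            cache.get(index);
        }
        return cache.loads;
    }

    public static void main(String[] args) {
        // A reader moving back and forth across three pieces.
        int[] accesses = {0, 1, 0, 2, 1, 0, 2};
        System.out.println("cacheSize=20: " + countLoads(20, accesses) + " loads");
        System.out.println("cacheSize=1:  " + countLoads(1, accesses) + " loads");
    }
}
```

With this access pattern, a cache of 20 loads each of the three entries once, while a cache of 1 reloads an entry on every access. For a file with a single piece table entry, any positive cacheSize behaves the same.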


Copyright © 2003-2011 Internet Archive. All Rights Reserved.