org.archive.util.iterator
Class RegexpLineIterator

java.lang.Object
  extended by org.archive.util.iterator.LookaheadIterator<Transformed>
      extended by org.archive.util.iterator.TransformingIteratorWrapper<java.lang.String,java.lang.String>
          extended by org.archive.util.iterator.RegexpLineIterator
All Implemented Interfaces:
java.util.Iterator<java.lang.String>

public class RegexpLineIterator
extends TransformingIteratorWrapper<java.lang.String,java.lang.String>

Utility class providing an Iterator interface over line-oriented text input. It is configured with a regexp matching lines to ignore (such as pure whitespace or comments), a regexp matching lines to treat as input, and an output template describing what to return from each input line (such as the whitespace-trimmed token from a line that may end with a comment). Varying these three settings lets it handle a number of formats. The public static constants provide ready-made pattern configurations for common cases.
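The configuration idea described above can be sketched with plain java.util.regex, without the library on the classpath. The class name LineFormatSketch, the method entries, and all four pattern strings below are assumptions made for illustration. They are not the values of this class's constants, which this page does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not part of org.archive.util.iterator. */
class LineFormatSketch {
    // Illustrative patterns only (assumed, not copied from the library).
    static final String IGNORE_BLANK_OR_COMMENT = "\\s*(#.*)?";
    static final String TOKEN_WITH_COMMENT = "^\\s*(\\S+)\\s*(#.*)?$";
    static final String PHRASE_WITH_COMMENT = "^\\s*([^#]+?)\\s*(#.*)?$";
    static final String FIRST_GROUP = "$1";

    /** Returns the extracted entry of every line that is not ignored and matches 'extract'. */
    static List<String> entries(List<String> lines, String ignore,
                                String extract, String template) {
        Pattern ignorePattern = Pattern.compile(ignore);
        Pattern extractPattern = Pattern.compile(extract);
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (ignorePattern.matcher(line).matches()) {
                continue; // blank or comment-only line
            }
            Matcher m = extractPattern.matcher(line);
            if (m.matches()) {
                // Expand group references such as $1 in the template.
                StringBuffer sb = new StringBuffer();
                m.appendReplacement(sb, template);
                out.add(sb.toString());
            }
            // Lines matching neither pattern are dropped in this sketch.
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# seeds", "", "  http://example.com/  # home", "two words # note");
        // Same input, two configurations: single tokens vs. trimmed phrases.
        System.out.println(entries(lines, IGNORE_BLANK_OR_COMMENT, TOKEN_WITH_COMMENT, FIRST_GROUP));
        System.out.println(entries(lines, IGNORE_BLANK_OR_COMMENT, PHRASE_WITH_COMMENT, FIRST_GROUP));
    }
}
```

With the token pattern, the line containing two words matches neither regexp and yields nothing. With the phrase pattern, the same line yields its trimmed text before the comment. Only the extraction regexp changes between the two runs.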

Author:
gojomo

Field Summary
static java.lang.String COMMENT_LINE
           
static java.lang.String ENTRY
           
protected  java.util.regex.Matcher extractLine
           
protected  java.util.regex.Matcher ignoreLine
           
static java.lang.String NONWHITESPACE_ENTRY_TRAILING_COMMENT
           
protected  java.lang.String outputTemplate
           
static java.lang.String TRIMMED_ENTRY_TRAILING_COMMENT
           
 
Fields inherited from class org.archive.util.iterator.TransformingIteratorWrapper
inner
 
Fields inherited from class org.archive.util.iterator.LookaheadIterator
next
 
Constructor Summary
RegexpLineIterator(java.util.Iterator<java.lang.String> inner, java.lang.String ignore, java.lang.String extract, java.lang.String replace)
           
 
Method Summary
protected  java.lang.String transform(java.lang.String line)
          Skips lines matching ignoreLine and extracts the desired portion of lines matching extractLine.
 
Methods inherited from class org.archive.util.iterator.TransformingIteratorWrapper
lookahead, noteExhausted
 
Methods inherited from class org.archive.util.iterator.LookaheadIterator
hasNext, next, remove
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMMENT_LINE

public static final java.lang.String COMMENT_LINE
See Also:
Constant Field Values

NONWHITESPACE_ENTRY_TRAILING_COMMENT

public static final java.lang.String NONWHITESPACE_ENTRY_TRAILING_COMMENT
See Also:
Constant Field Values

TRIMMED_ENTRY_TRAILING_COMMENT

public static final java.lang.String TRIMMED_ENTRY_TRAILING_COMMENT
See Also:
Constant Field Values

ENTRY

public static final java.lang.String ENTRY
See Also:
Constant Field Values

ignoreLine

protected java.util.regex.Matcher ignoreLine

extractLine

protected java.util.regex.Matcher extractLine

outputTemplate

protected java.lang.String outputTemplate
Constructor Detail

RegexpLineIterator

public RegexpLineIterator(java.util.Iterator<java.lang.String> inner,
                          java.lang.String ignore,
                          java.lang.String extract,
                          java.lang.String replace)
Method Detail

transform

protected java.lang.String transform(java.lang.String line)
Applies the configured patterns to one input line. Skips lines matching ignoreLine; extracts the desired portion of lines matching extractLine; reports, for information only, any line matching neither.

Specified by:
transform in class TransformingIteratorWrapper<java.lang.String,java.lang.String>
Parameters:
line - the input line to transform.
Returns:
the extracted portion of the line, or null if the line was skipped
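The three outcomes described for transform (ignored, extracted, matching neither) can be sketched as follows. This is a minimal stand-in, not the library source. The class TransformSketch is hypothetical; it borrows the documented field names ignoreLine, extractLine, and outputTemplate. Treating a null result as "skip this line" and logging unmatched lines at warning level are assumptions.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in; mirrors only the documented field names. */
class TransformSketch implements Iterator<String> {
    private static final Logger LOG =
            Logger.getLogger(TransformSketch.class.getName());

    private final Iterator<String> inner;
    private final Matcher ignoreLine;
    private final Matcher extractLine;
    private final String outputTemplate;
    private String next; // lookahead slot; null means nothing is loaded

    TransformSketch(Iterator<String> inner, String ignore,
                    String extract, String replace) {
        this.inner = inner;
        // Matchers are built once and reset per line, so an instance
        // must not be shared between threads.
        this.ignoreLine = Pattern.compile(ignore).matcher("");
        this.extractLine = Pattern.compile(extract).matcher("");
        this.outputTemplate = replace;
    }

    /** Returns the extracted text, or null when the line yields no item. */
    String transform(String line) {
        if (ignoreLine.reset(line).matches()) {
            return null; // ignorable line
        }
        if (extractLine.reset(line).matches()) {
            StringBuffer out = new StringBuffer();
            extractLine.appendReplacement(out, outputTemplate);
            return out.toString();
        }
        // Matched neither pattern: report it, then skip (assumed behavior).
        LOG.warning("line matched neither pattern: " + line);
        return null;
    }

    @Override
    public boolean hasNext() {
        // Keep pulling lines until one transforms to a non-null item.
        while (next == null && inner.hasNext()) {
            next = transform(inner.next());
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null;
        return result;
    }
}
```

In this sketch hasNext does the lookahead work, calling transform until a line produces an item. That is one way the inherited lookahead method could relate to transform, under the assumptions stated above.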


Copyright © 2003-2011 Internet Archive. All Rights Reserved.