Package org.archive.extractor

Interface Summary
CharSequenceProvider Interface indicating an object can efficiently provide a (perhaps cached or simulated) CharSequence version of itself.
ExtractErrorListener ExtractErrorListener receives exceptions raised inside a LinkExtractor that may need to be logged, allowing extraction to continue without throwing an exception through hasNext()/next()/nextLink().
LinkExtractor LinkExtractor is a general interface for classes which, when given an InputStream and Charset, can scan for Links and return them via an Iterator interface.
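
The interfaces above describe an iterator-style contract: an extractor is handed an InputStream and a Charset, yields links one at a time, and passes problems to an error listener instead of throwing them out of the iteration methods. The following is a minimal, self-contained sketch of that pattern under stated assumptions, not the package's actual API. LinkExtractorSketch, ErrorListener and SimpleHrefExtractor are hypothetical names, links are plain Strings rather than the package's own link type, and the href regex is a deliberate simplification.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {

    /** Hypothetical stand-in for ExtractErrorListener: receives errors instead of having them thrown. */
    public interface ErrorListener {
        void onError(Exception e);
    }

    /**
     * Hypothetical, simplified extractor (NOT the package's LinkExtractor API):
     * decodes an InputStream with the given Charset and iterates over href values.
     * Links are plain Strings here rather than the package's own link type.
     */
    public static class SimpleHrefExtractor implements Iterator<String> {
        // Simplifying assumption: only quoted href="..." or href='...' attributes are recognized.
        private static final Pattern HREF =
                Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

        private final Matcher matcher;
        private String pending; // next link found but not yet returned, or null when exhausted

        public SimpleHrefExtractor(InputStream in, Charset charset, ErrorListener listener) {
            String text = "";
            try {
                text = new String(in.readAllBytes(), charset);
            } catch (IOException e) {
                // Report the problem and continue with empty content instead of throwing.
                listener.onError(e);
            }
            matcher = HREF.matcher(text);
            advance();
        }

        // Look ahead one match so hasNext() never has to scan or throw.
        private void advance() {
            pending = matcher.find() ? matcher.group(1) : null;
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public String next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            String link = pending;
            advance();
            return link;
        }
    }

    public static void main(String[] args) {
        byte[] html = "<a href=\"/a\">x</a> <link href='style.css'>".getBytes(StandardCharsets.UTF_8);
        Iterator<String> links = new SimpleHrefExtractor(
                new ByteArrayInputStream(html), StandardCharsets.UTF_8,
                e -> System.err.println("extraction problem: " + e));
        while (links.hasNext()) {
            System.out.println(links.next()); // prints "/a", then "style.css"
        }
    }
}
```

The one-match lookahead keeps hasNext() cheap and side-effect free, and routing the IOException to the listener mirrors the stated goal of letting extraction continue without raising an exception through the iterator methods.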

Class Summary
CharSequenceLinkExtractor Abstract superclass providing utility methods for LinkExtractors which would prefer to work on a CharSequence rather than a stream.
RegexpCSSLinkExtractor This extractor parses URIs from CSS files.
RegexpHTMLLinkExtractor Basic link-extraction, from an HTML content-body, using regular expressions.
RegexpJSLinkExtractor Uses regular expressions to find likely URIs inside JavaScript.
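
The classes above combine two ideas: working on a CharSequence rather than a stream, and finding likely URIs with regular expressions. Since java.util.regex.Pattern.matcher accepts any CharSequence, the two fit together naturally. The sketch below illustrates this for CSS under stated assumptions: CssUriSketch is a hypothetical class, and its pattern (url(...) with optional quotes, or @import followed by a quoted string) is an illustrative guess, not the expression RegexpCSSLinkExtractor actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssUriSketch {
    // Assumed, simplified pattern (hypothetical, not the package's own):
    //   group 1: url(...) with optional single or double quotes
    //   group 2: @import "..." or @import '...'
    private static final Pattern CSS_URI = Pattern.compile(
            "url\\(\\s*[\"']?([^\"')\\s]+)[\"']?\\s*\\)"
                    + "|@import\\s+[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Scans any CharSequence (String, StringBuilder, CharBuffer, ...) for likely URIs. */
    public static List<String> extract(CharSequence css) {
        List<String> uris = new ArrayList<>();
        Matcher m = CSS_URI.matcher(css);
        while (m.find()) {
            // Exactly one of the two alternatives matched, so exactly one group is non-null.
            uris.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return uris;
    }

    public static void main(String[] args) {
        String css = "@import \"base.css\";\n"
                + "body { background: url('img/bg.png'); }\n"
                + "li { list-style-image: url(dot.gif) }";
        System.out.println(extract(css)); // prints [base.css, img/bg.png, dot.gif]
    }
}
```

Because extract takes a CharSequence, a caller holding a StringBuilder or CharBuffer can pass it directly without first converting it to a String. This is presumably part of the appeal of a CharSequence-based superclass. The pattern ignores CSS comments and escaped quotes, so it finds likely URIs rather than parsing CSS exactly.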

Copyright © 2003-2011 Internet Archive. All Rights Reserved.