Uses of Class
org.archive.crawler.extractor.Extractor

Packages that use Extractor
org.archive.crawler.extractor   
 

Uses of Extractor in org.archive.crawler.extractor
 

Subclasses of Extractor in org.archive.crawler.extractor
 class AggressiveExtractorHTML
          Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regexp and then with the speculative JavaScript link regexp.
 class ExtractorCSS
          This extractor parses URIs from CSS files.
 class ExtractorDOC
          This class allows the caller to extract href-style links from Word 97-format Word documents.
 class ExtractorHTML
          Basic link-extraction, from an HTML content-body, using regular expressions.
 class ExtractorImpliedURI
          An extractor for finding 'implied' URIs inside other URIs.
 class ExtractorJS
          Processes JavaScript files for strings that are likely to be crawlable URIs.
 class ExtractorPDF
          Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs.
 class ExtractorSWF
          Processes SWF (Flash/Shockwave) files for strings that are likely to be crawlable URIs.
 class ExtractorUniversal
          A last-ditch extractor that looks at the raw bytes and tries to extract anything that looks like a link.
 class ExtractorURI
          An extractor for finding URIs inside other URIs.
 class ExtractorXML
          A simple extractor that finds HTTP URIs inside XML/RSS files, in attribute values and in simple elements (those with only whitespace + HTTP URI + whitespace as contents).
 class JerichoExtractorHTML
          Improved link-extraction from an HTML content-body using the jericho-html parser.
 class TrapSuppressExtractor
          Pseudo-extractor that suppresses link-extraction of likely trap pages, by noticing when the content's digest is identical to that of its 'via'.
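
The digest comparison described for TrapSuppressExtractor can be illustrated with a minimal, standalone sketch. It does not use the Heritrix API: the class TrapSuppressSketch, its methods digest and shouldSuppress, and the choice of SHA-1 are all assumptions made for illustration only. The idea shown is simply that a page whose content digest equals the digest of the page that linked to it (its 'via') is treated as a likely trap, so its links would be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical, standalone sketch; none of these names are Heritrix API.
final class TrapSuppressSketch {

    // Digest of a fetched content body. SHA-1 is an assumption for this sketch.
    static byte[] digest(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // True when both digests are known and identical, i.e. the page repeats
    // its 'via' byte for byte and link extraction should be suppressed.
    static boolean shouldSuppress(byte[] contentDigest, byte[] viaDigest) {
        return contentDigest != null
                && viaDigest != null
                && MessageDigest.isEqual(contentDigest, viaDigest);
    }

    public static void main(String[] args) {
        byte[] via = digest("<a href=\"next/\">next</a>".getBytes(StandardCharsets.UTF_8));
        byte[] repeat = digest("<a href=\"next/\">next</a>".getBytes(StandardCharsets.UTF_8));
        byte[] fresh = digest("<a href=\"about.html\">about</a>".getBytes(StandardCharsets.UTF_8));

        System.out.println("repeat suppressed: " + shouldSuppress(repeat, via));
        System.out.println("fresh suppressed: " + shouldSuppress(fresh, via));
    }
}
```

A missing 'via' digest (for example, on a seed) is treated here as "do not suppress"; that is a design choice of the sketch, not a statement about how the actual class behaves.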
 



Copyright © 2003-2011 Internet Archive. All Rights Reserved.