Uses of Class org.archive.crawler.extractor.Extractor (Heritrix 1.15.5-201106092337)

Uses of Class
org.archive.crawler.extractor.Extractor

Packages that use Extractor
org.archive.crawler.extractor

Uses of Extractor in org.archive.crawler.extractor

Subclasses of Extractor in org.archive.crawler.extractor
`class`	`AggressiveExtractorHTML` Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regexp, and then with the JavaScript speculative link regexp.
`class`	`ExtractorCSS` This extractor parses URIs from CSS files.
`class`	`ExtractorDOC` This class allows the caller to extract href-style links from Word 97-format Word documents.
`class`	`ExtractorHTML` Basic link extraction from an HTML content body, using regular expressions.
`class`	`ExtractorImpliedURI` An extractor for finding 'implied' URIs inside other URIs.
`class`	`ExtractorJS` Processes JavaScript files for strings that are likely to be crawlable URIs.
`class`	`ExtractorPDF` Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs.
`class`	`ExtractorSWF` Processes SWF (Flash/Shockwave) files for strings that are likely to be crawlable URIs.
`class`	`ExtractorUniversal` A last-ditch extractor that looks at the raw bytes and tries to extract anything that looks like a link.
`class`	`ExtractorURI` An extractor for finding URIs inside other URIs.
`class`	`ExtractorXML` A simple extractor that finds HTTP URIs inside XML/RSS files, in attribute values and in simple elements (those with only whitespace + HTTP URI + whitespace as contents).
`class`	`JerichoExtractorHTML` Improved link extraction from an HTML content body, using the Jericho HTML parser.
`class`	`TrapSuppressExtractor` Pseudo-extractor that suppresses link extraction on likely trap pages, by noticing when the content's digest is identical to that of its 'via'.
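
The digest comparison described for `TrapSuppressExtractor` can be illustrated with a minimal, standalone sketch. It does not use the Heritrix API: the class name `TrapSuppressSketch`, the methods `digest` and `shouldSuppress`, and the choice of SHA-1 are all assumptions made for illustration, not the library's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only; not the Heritrix TrapSuppressExtractor source.
final class TrapSuppressSketch {

    // Computes a content digest. SHA-1 is an assumption made for this sketch.
    static byte[] digest(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    // Returns true when link extraction should be skipped: the page's digest
    // is identical to the digest of the page it was discovered from (its 'via').
    static boolean shouldSuppress(byte[] contentDigest, byte[] viaDigest) {
        return contentDigest != null
                && viaDigest != null
                && Arrays.equals(contentDigest, viaDigest);
    }

    public static void main(String[] args) {
        byte[] via = digest("<html>calendar</html>".getBytes(StandardCharsets.UTF_8));
        byte[] same = digest("<html>calendar</html>".getBytes(StandardCharsets.UTF_8));
        byte[] other = digest("<html>article</html>".getBytes(StandardCharsets.UTF_8));

        System.out.println(shouldSuppress(same, via));  // prints true
        System.out.println(shouldSuppress(other, via)); // prints false
    }
}
```

A page whose body is byte-for-byte identical to the page that linked to it is a common sign of a crawler trap (for example, an endlessly self-linking calendar), so skipping extraction there avoids queueing more of the same content. A missing digest on either side is treated here as "do not suppress", which is a design choice of this sketch.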

Copyright © 2003-2011 Internet Archive. All Rights Reserved.