org.archive.extractor
Class RegexpCSSLinkExtractor

java.lang.Object
  extended by org.archive.extractor.CharSequenceLinkExtractor
      extended by org.archive.extractor.RegexpCSSLinkExtractor
All Implemented Interfaces:
java.util.Iterator, LinkExtractor

public class RegexpCSSLinkExtractor
extends CharSequenceLinkExtractor

Extracts URIs from CSS files.

The format of a CSS URL value is 'url(' followed by optional white space, an optional single quote (') or double quote (") character, the URL itself, an optional single quote (') or double quote (") character, optional white space, and ')'. Parentheses, commas, white space characters, single quotes (') and double quotes (") appearing in a URL must be escaped with a backslash, for example '\(', '\)', '\,'. Partial URLs are interpreted relative to the source of the style sheet, not relative to the document. Source: www.w3.org

ROUGH DRAFT IN PROGRESS: incomplete, untested, major changes likely.
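
The url(...) grammar and the relative-resolution rule described above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not this class's implementation: the class name CssUrlSketch, the pattern URL_VALUE, and the example.com addresses are all hypothetical, and the real value of CSS_URI_EXTRACTOR is not shown on this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only; not the pattern stored in CSS_URI_EXTRACTOR. */
final class CssUrlSketch {
    // Hypothetical pattern: 'url(' + optional white space + optional quote
    // (group 1) + URL body (group 2) + the same quote + optional white space
    // + ')'. The body accepts a backslash escape, or any character that is
    // not a quote, parenthesis, backslash, or white space.
    static final Pattern URL_VALUE = Pattern.compile(
        "url\\(\\s*(['\"]?)((?:\\\\.|[^'\"()\\\\\\s])+)\\1\\s*\\)",
        Pattern.CASE_INSENSITIVE);

    /** Returns each url(...) value resolved against the style sheet's own URI. */
    static List<URI> extract(CharSequence css, URI styleSheet) {
        List<URI> links = new ArrayList<>();
        Matcher m = URL_VALUE.matcher(css);
        while (m.find()) {
            try {
                // Partial URLs are relative to the style sheet, not the document.
                links.add(styleSheet.resolve(m.group(2)));
            } catch (IllegalArgumentException e) {
                // Not a valid URI reference as matched (for example, it still
                // contains backslash escapes); this sketch simply skips it.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI sheet = URI.create("http://example.com/styles/main.css");
        String css = "body { background: url( \"img/bg.png\" ) }\n"
                   + "@import url('../base.css');";
        for (URI u : extract(css, sheet)) {
            System.out.println(u);
        }
    }
}
```

With the sample input in main, the first value resolves to http://example.com/styles/img/bg.png because the base is the style sheet's location. The sketch leaves backslash escapes in place; a real extractor would remove them before resolving.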

Author:
igor, gojomo

Field Summary
(package private) static java.lang.String CSS_BACKSLASH_ESCAPE
           
(package private) static java.lang.String CSS_URI_EXTRACTOR
          CSS URL extractor pattern.
protected  java.util.regex.Matcher uris
           
 
Fields inherited from class org.archive.extractor.CharSequenceLinkExtractor
base, extractErrorListener, next, source, sourceContent
 
Constructor Summary
RegexpCSSLinkExtractor()
           
 
Method Summary
protected  boolean findNextLink()
          Scan to the next link(s), if any, loading them into the next buffer.
protected static CharSequenceLinkExtractor newDefaultInstance()
           
 void reset()
          Discard all state.
 
Methods inherited from class org.archive.extractor.CharSequenceLinkExtractor
charSequenceFrom, createCharSequenceFrom, extract, hasNext, next, nextLink, remove, setup, setup, setup, setup
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CSS_BACKSLASH_ESCAPE

static final java.lang.String CSS_BACKSLASH_ESCAPE
See Also:
Constant Field Values
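
The class description states that parentheses, commas, quotes, and white space inside a URL are escaped with a backslash. A minimal sketch of undoing that escaping is shown here; the class name CssUnescapeSketch and the pattern ESCAPED are hypothetical, and the pattern is an assumption rather than the actual value of CSS_BACKSLASH_ESCAPE.

```java
import java.util.regex.Pattern;

/** Illustrative only; not the value of CSS_BACKSLASH_ESCAPE. */
final class CssUnescapeSketch {
    // Hypothetical pattern: a backslash followed by exactly one parenthesis,
    // comma, single quote, double quote, or white space character (group 1).
    static final Pattern ESCAPED = Pattern.compile("\\\\([(),'\"\\s])");

    /** Drops the backslash from each recognized escape, keeping the character. */
    static String unescape(CharSequence raw) {
        return ESCAPED.matcher(raw).replaceAll("$1");
    }

    public static void main(String[] args) {
        // The Java literal below is the CSS text a\(1\)\,b.png
        System.out.println(unescape("a\\(1\\)\\,b.png")); // prints a(1),b.png
    }
}
```

Backslashes that precede any other character are left untouched by this sketch.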

uris

protected java.util.regex.Matcher uris

CSS_URI_EXTRACTOR

static final java.lang.String CSS_URI_EXTRACTOR
CSS URL extractor pattern. This pattern extracts URIs from CSS files.

See Also:
Constant Field Values
Constructor Detail

RegexpCSSLinkExtractor

public RegexpCSSLinkExtractor()
Method Detail

findNextLink

protected boolean findNextLink()
Description copied from class: CharSequenceLinkExtractor
Scan to the next link(s), if any, loading them into the next buffer.

Specified by:
findNextLink in class CharSequenceLinkExtractor
Returns:
true if any links are found/available, false otherwise

reset

public void reset()
Description copied from class: CharSequenceLinkExtractor
Discard all state. Another setup() is required to use again.

Specified by:
reset in interface LinkExtractor
Overrides:
reset in class CharSequenceLinkExtractor
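
The interplay of findNextLink(), the next buffer, and reset() described above can be sketched as a stand-alone look-ahead iterator. This is an assumption-laden illustration: LookAheadExtractorSketch does not extend CharSequenceLinkExtractor, its one-argument setup(CharSequence) is hypothetical (the inherited setup overloads are not documented on this page), and throwing IllegalStateException after reset() is this sketch's own design choice.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Stand-alone illustration; it does not extend CharSequenceLinkExtractor. */
final class LookAheadExtractorSketch implements Iterator<String> {
    // Hypothetical, simplified url(...) pattern; group 1 is the URL body.
    private static final Pattern URL_VALUE =
        Pattern.compile("url\\(\\s*['\"]?([^'\"()\\s]+)['\"]?\\s*\\)");

    private final LinkedList<String> next = new LinkedList<>(); // look-ahead buffer
    private Matcher uris;                                        // null until setup()

    /** Hypothetical setup: binds the extractor to the content to scan. */
    public void setup(CharSequence content) {
        uris = URL_VALUE.matcher(content);
    }

    /** Loads the next match into the buffer; false once input is exhausted. */
    private boolean findNextLink() {
        if (uris == null) {
            throw new IllegalStateException("setup() must be called first");
        }
        if (uris.find()) {
            next.addLast(uris.group(1));
            return true;
        }
        return false;
    }

    @Override
    public boolean hasNext() {
        // Only scan further when the buffer has nothing left to hand out.
        return !next.isEmpty() || findNextLink();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next.removeFirst();
    }

    /** Discards all state; setup() must be called again before reuse. */
    public void reset() {
        next.clear();
        uris = null;
    }
}
```

A caller would invoke setup(...), loop with hasNext() and next(), then call reset() before handing the instance new content.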

newDefaultInstance

protected static CharSequenceLinkExtractor newDefaultInstance()


Copyright © 2003-2011 Internet Archive. All Rights Reserved.