org.archive.extractor
Class RegexpCSSLinkExtractor
java.lang.Object
org.archive.extractor.CharSequenceLinkExtractor
org.archive.extractor.RegexpCSSLinkExtractor
- All Implemented Interfaces:
- java.util.Iterator, LinkExtractor
public class RegexpCSSLinkExtractor
- extends CharSequenceLinkExtractor
This extractor parses URIs from CSS files.
The format of a CSS URL value is 'url(' followed by optional white space
followed by an optional single quote (') or double quote (") character
followed by the URL itself followed by an optional single quote (') or
double quote (") character followed by optional white space followed by ')'.
Parentheses, commas, white space characters, single quotes (') and double
quotes (") appearing in a URL must be escaped with a backslash:
'\(', '\)', '\,'. Partial URLs are interpreted relative to the source of
the style sheet, not relative to the document.
Source: www.w3.org
ROUGH DRAFT IN PROGRESS / incomplete... untested... major changes likely
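The url(...) format quoted above can be sketched as a regular expression. The following is a minimal illustration only: the class name CssUrlSketch, the pattern, and the extractUrls helper are hypothetical and are not taken from this class; the actual value of CSS_URI_EXTRACTOR is not shown on this page and may differ. The sketch also simplifies escape handling to "backslash followed by any single character" and ignores CSS hexadecimal escapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementation of RegexpCSSLinkExtractor.
class CssUrlSketch {
    // url( + optional whitespace + optional quote (group 1)
    // + URL body (group 2: backslash escapes or plain characters)
    // + the same optional quote + optional whitespace + )
    static final Pattern CSS_URL = Pattern.compile(
        "(?i)url\\(\\s*([\"']?)((?:\\\\.|[^\\\\\"'()\\s])*)\\1\\s*\\)");

    // Returns every non-empty URL found in the given CSS text,
    // with backslash escapes such as \( and \) removed.
    static List<String> extractUrls(CharSequence css) {
        List<String> found = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            String raw = m.group(2);
            if (!raw.isEmpty()) {
                // Simplified unescape: drop the backslash, keep the next char.
                found.add(raw.replaceAll("\\\\(.)", "$1"));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String css = "body { background: url( \"img/bg.png\" ) } "
                   + "a { background: url(a\\(1\\).gif) }";
        for (String url : extractUrls(css)) {
            System.out.println(url);
        }
    }
}
```

Resolving a partial URL against the style sheet's own location, as the quoted text requires, is left out of this sketch.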
- Author:
- igor gojomo
Methods inherited from class org.archive.extractor.CharSequenceLinkExtractor
charSequenceFrom, createCharSequenceFrom, extract, hasNext, next, nextLink, remove, setup, setup, setup, setup
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CSS_BACKSLASH_ESCAPE
static final java.lang.String CSS_BACKSLASH_ESCAPE
- See Also:
- Constant Field Values
uris
protected java.util.regex.Matcher uris
CSS_URI_EXTRACTOR
static final java.lang.String CSS_URI_EXTRACTOR
- CSS URL extractor pattern.
This pattern extracts URIs from CSS files.
- See Also:
- Constant Field Values
RegexpCSSLinkExtractor
public RegexpCSSLinkExtractor()
findNextLink
protected boolean findNextLink()
- Description copied from class:
CharSequenceLinkExtractor
- Scan to the next link(s), if any, and load them into the next buffer.
- Specified by:
findNextLink
in class CharSequenceLinkExtractor
- Returns:
- true if any links are found/available, false otherwise
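The "scan ahead and buffer" contract described here is a common way to back an Iterator with a regex Matcher. The sketch below assumes that hasNext() and next() delegate to a findNextLink()-style method; this page does not show how CharSequenceLinkExtractor actually wires them together, so the class LookaheadSketch and all of its members are hypothetical stand-ins.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a lookahead iterator; not the real extractor.
class LookaheadSketch implements Iterator<String> {
    private final Matcher matcher;
    private String next; // the "next buffer"; null while empty

    LookaheadSketch(Pattern pattern, CharSequence input) {
        this.matcher = pattern.matcher(input);
    }

    // Stand-in for findNextLink(): fill the buffer if it is empty,
    // then report whether something is available.
    protected boolean findNextLink() {
        if (next == null && matcher.find()) {
            next = matcher.group();
        }
        return next != null;
    }

    @Override
    public boolean hasNext() {
        return findNextLink();
    }

    @Override
    public String next() {
        if (!findNextLink()) {
            throw new NoSuchElementException();
        }
        String result = next;
        next = null; // empty the buffer so the next call scans onward
        return result;
    }
}
```

Because the buffer is filled at most once per item, calling hasNext() repeatedly does not skip matches.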
reset
public void reset()
- Description copied from class:
CharSequenceLinkExtractor
- Discard all state. setup() must be called again before the extractor can be reused.
- Specified by:
reset
in interface LinkExtractor
- Overrides:
reset
in class CharSequenceLinkExtractor
newDefaultInstance
protected static CharSequenceLinkExtractor newDefaultInstance()
Copyright © 2003-2011 Internet Archive. All Rights Reserved.