org.archive.extractor
Class RegexpJSLinkExtractor
java.lang.Object
org.archive.extractor.CharSequenceLinkExtractor
org.archive.extractor.RegexpJSLinkExtractor
- All Implemented Interfaces:
- java.util.Iterator, LinkExtractor
public class RegexpJSLinkExtractor
- extends CharSequenceLinkExtractor
Uses regular expressions to find likely URIs inside JavaScript.
ROUGH DRAFT IN PROGRESS: incomplete and untested.
- Author:
- gojomo
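The class description amounts to a two-stage heuristic. The minimal sketch below uses plain `java.util.regex` to show that idea: one pattern pulls quoted string literals out of the script, and a second decides whether a literal's contents look like a URI. Both patterns are hypothetical stand-ins for `JAVASCRIPT_STRING_EXTRACTOR` and `STRING_URI_DETECTOR`, whose actual definitions are not shown on this page. The class name `JsUriSketch` and the method `findLikelyUris` are also illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsUriSketch {
    // Hypothetical stand-in for JAVASCRIPT_STRING_EXTRACTOR: a '...' or "..."
    // literal; group 1 is the opening quote, group 2 the contents. Backslash
    // escapes are consumed as pairs, so \" does not end a "..." literal.
    static final Pattern STRING_LITERAL =
        Pattern.compile("([\"'])((?:\\\\.|(?!\\1)[^\\\\])*)\\1");

    // Hypothetical stand-in for STRING_URI_DETECTOR: no whitespace, quotes or
    // angle brackets, plus either a '/' or a short alphabetic ".ext" somewhere.
    static final Pattern URI_LIKE =
        Pattern.compile("[^\\s<>\"']*(?:/|\\.[A-Za-z]{2,4})[^\\s<>\"']*");

    /** Returns the contents of every string literal that looks like a URI. */
    static List<String> findLikelyUris(CharSequence js) {
        List<String> found = new ArrayList<>();
        Matcher strings = STRING_LITERAL.matcher(js);
        while (strings.find()) {
            String contents = strings.group(2);
            if (URI_LIKE.matcher(contents).matches()) {
                found.add(contents);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String js = "var img = 'images/logo.png'; var msg = \"Hello world\";"
                  + " window.location = \"http://example.com/next.html\";";
        System.out.println(findLikelyUris(js));
    }
}
```

Running `main` prints `[images/logo.png, http://example.com/next.html]`. The literal `"Hello world"` is rejected because it contains whitespace. A heuristic like this trades precision for recall: strings such as `"a.b"` will also pass, which is acceptable when the goal is only to find *likely* URIs.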
Methods inherited from class org.archive.extractor.CharSequenceLinkExtractor:
- charSequenceFrom, createCharSequenceFrom, extract, hasNext, next, nextLink, remove, setup, setup, setup, setup
Methods inherited from class java.lang.Object:
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AMP
static final java.lang.String AMP
- See Also:
- Constant Field Values
ESCAPED_AMP
static final java.lang.String ESCAPED_AMP
- See Also:
- Constant Field Values
WHITESPACE
static final java.lang.String WHITESPACE
- See Also:
- Constant Field Values
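This page does not reproduce the values of `AMP`, `ESCAPED_AMP` and `WHITESPACE`. Their names suggest a cleanup step for candidate URIs found in scripts embedded in HTML, where `&` is often written as `&amp;`. The sketch below assumes that `AMP` is `"&"`, `ESCAPED_AMP` is `"&amp;"` and `WHITESPACE` is the regex fragment `"\\s"`. These values and the helper `cleanCandidate` are assumptions made for illustration, not confirmed behavior of the class.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public class JsUriCleanupSketch {
    // ASSUMED values: the real ones are listed under Constant Field Values.
    static final String AMP = "&";
    static final String ESCAPED_AMP = "&amp;";
    static final String WHITESPACE = "\\s"; // assumed to be a regex fragment

    private static final Pattern HAS_WHITESPACE = Pattern.compile(WHITESPACE);

    /**
     * Hypothetical cleanup: undo HTML-escaped ampersands, then reject any
     * candidate that still contains whitespace, since a well-formed URI
     * cannot contain unescaped whitespace.
     */
    static Optional<String> cleanCandidate(String raw) {
        String s = raw.replace(ESCAPED_AMP, AMP);
        if (HAS_WHITESPACE.matcher(s).find()) {
            return Optional.empty();
        }
        return Optional.of(s);
    }

    public static void main(String[] args) {
        System.out.println(cleanCandidate("search?q=js&amp;page=2")); // Optional[search?q=js&page=2]
        System.out.println(cleanCandidate("not a uri"));              // Optional.empty
    }
}
```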
JAVASCRIPT_STRING_EXTRACTOR
static final java.util.regex.Pattern JAVASCRIPT_STRING_EXTRACTOR
STRING_URI_DETECTOR
static final java.util.regex.Pattern STRING_URI_DETECTOR
strings
java.util.regex.Matcher strings
matcherStack
java.util.LinkedList<java.util.regex.Matcher> matcherStack
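The fields `strings` (a `Matcher`) and `matcherStack` (a `LinkedList<Matcher>`) hint that the extractor may descend into string literals that themselves contain escaped literals, as in `document.write("<a href=\"page.html\">")`. The sketch below is written under that assumption and keeps a stack of matchers. When a literal is found, its contents are unescaped one level and a fresh matcher over them is pushed. When the top matcher is exhausted, it is popped and the enclosing one resumes. The pattern, the unescaping rule and the class name `NestedStringSketch` are hypothetical, because the actual traversal in `RegexpJSLinkExtractor` is not documented here.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedStringSketch {
    // Hypothetical literal pattern: backslash escapes are consumed as pairs,
    // so an escaped quote never closes the enclosing literal.
    static final Pattern STRING =
        Pattern.compile("([\"'])((?:\\\\.|(?!\\1)[^\\\\])*)\\1");

    /** Returns every string literal's contents, each outer literal before its inner ones. */
    static List<String> allStrings(CharSequence js) {
        List<String> out = new ArrayList<>();
        LinkedList<Matcher> matcherStack = new LinkedList<>();
        matcherStack.push(STRING.matcher(js));
        while (!matcherStack.isEmpty()) {
            Matcher top = matcherStack.peek();
            if (!top.find()) {
                matcherStack.pop(); // this level is exhausted; resume the enclosing one
                continue;
            }
            String contents = top.group(2);
            out.add(contents);
            // Unescape one level (\" -> ", \\ -> \) and scan the result for nested literals
            String unescaped = contents.replaceAll("\\\\(.)", "$1");
            matcherStack.push(STRING.matcher(unescaped));
        }
        return out;
    }

    public static void main(String[] args) {
        String js = "document.write(\"<a href=\\\"page.html\\\">x</a>\");";
        System.out.println(allStrings(js));
    }
}
```

Each pushed matcher scans text that is strictly shorter than its parent, because the quotes are dropped. The stack is therefore guaranteed to empty. Using an explicit stack rather than recursion also lets an iterator pause between links and resume later, which fits the `java.util.Iterator` interface this class implements.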
RegexpJSLinkExtractor
public RegexpJSLinkExtractor()
findNextLink
protected boolean findNextLink()
- Description copied from class: CharSequenceLinkExtractor
- Scan to the next link(s), if any, loading them into the next buffer.
- Specified by:
- findNextLink in class CharSequenceLinkExtractor
- Returns:
- true if any links are found or available, false otherwise
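This contract (scan ahead, buffer any links found, and report whether any are available) is what lets a `java.util.Iterator` sit on top of an incremental scanner. The code below is a simplified, hypothetical analogue of that arrangement. `BufferedLinkIterator`, its `buffer` field and `QuotedPathIterator` are illustrative names, not the actual `CharSequenceLinkExtractor` internals, and the toy pattern keeps only quoted strings that contain a `/`.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindNextLinkSketch {
    /** Hypothetical base class: subclasses scan lazily via findNextLink(). */
    static abstract class BufferedLinkIterator implements Iterator<String> {
        protected final LinkedList<String> buffer = new LinkedList<>();

        /** Scans ahead and appends any links found to buffer; true if any were added. */
        protected abstract boolean findNextLink();

        @Override
        public boolean hasNext() {
            // Scan only when nothing is buffered; false from findNextLink means input is exhausted
            return !buffer.isEmpty() || (findNextLink() && !buffer.isEmpty());
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return buffer.removeFirst();
        }
    }

    /** Toy extractor: any quoted string that contains a '/' counts as a link. */
    static class QuotedPathIterator extends BufferedLinkIterator {
        private static final Pattern QUOTED =
            Pattern.compile("[\"']([^\"'\\s]*/[^\"'\\s]*)[\"']");
        private final Matcher matcher;

        QuotedPathIterator(CharSequence js) {
            matcher = QUOTED.matcher(js);
        }

        @Override
        protected boolean findNextLink() {
            if (matcher.find()) {
                buffer.add(matcher.group(1));
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Iterator<String> it =
            new QuotedPathIterator("a = '/x.js'; b = \"img/y.png\"; c = 'plain';");
        while (it.hasNext()) {
            System.out.println(it.next()); // prints /x.js, then img/y.png
        }
    }
}
```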
reset
public void reset()
- Description copied from class: CharSequenceLinkExtractor
- Discard all state. Another setup() call is required before the extractor can be used again.
- Specified by:
- reset in interface LinkExtractor
- Overrides:
- reset in class CharSequenceLinkExtractor
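The reset() contract implies a setup, use, reset, setup cycle: after reset() the instance holds no input and must be given new content before it can be used again. The sketch below models that lifecycle with a hypothetical `ReusableExtractor`. Its method names echo `setup()` and `reset()`, but several details are assumptions rather than documented behavior of `RegexpJSLinkExtractor`: the single-argument `setup` signature, the toy pattern, and the choice to throw `IllegalStateException` when the extractor is used without setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetSketch {
    static class ReusableExtractor {
        private static final Pattern QUOTED = Pattern.compile("'([^']*)'");
        private Matcher strings; // null when no input is loaded

        /** Loads new content; required before first use and after every reset(). */
        void setup(CharSequence content) {
            strings = QUOTED.matcher(content);
        }

        /** Returns all remaining single-quoted strings from the current content. */
        List<String> extractAll() {
            if (strings == null) {
                throw new IllegalStateException("setup() must be called first");
            }
            List<String> out = new ArrayList<>();
            while (strings.find()) {
                out.add(strings.group(1));
            }
            return out;
        }

        /** Discards all state, mirroring the documented reset() contract. */
        void reset() {
            strings = null;
        }
    }

    public static void main(String[] args) {
        ReusableExtractor ex = new ReusableExtractor();
        ex.setup("a = 'one'; b = 'two';");
        System.out.println(ex.extractAll()); // [one, two]
        ex.reset();
        ex.setup("c = 'three';");
        System.out.println(ex.extractAll()); // [three]
    }
}
```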
newDefaultInstance
protected static CharSequenceLinkExtractor newDefaultInstance()
Copyright © 2003-2011 Internet Archive. All Rights Reserved.