java.lang.Object > javax.management.Attribute > org.archive.crawler.settings.Type > org.archive.crawler.settings.ComplexType > org.archive.crawler.settings.ModuleType > org.archive.crawler.framework.Processor > org.archive.crawler.fetcher.FetchFTP
public class FetchFTP extends Processor
Fetches documents and directory listings using FTP. This class also tries to extract FTP "links" from directory listings. For this class to archive a directory listing, the remote FTP server must support the NLST command, which most modern FTP servers do.
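Before it can fetch a document or a listing, an FTP fetcher has to turn an ftp:// URI into a server address and a remote path. The sketch below does that with java.net.URI, falling back to port 21 (the standard FTP control port) when the URI gives none. The class FtpTarget is hypothetical and not part of the Heritrix API; how FetchFTP itself parses URIs is not described on this page.

```java
import java.net.URI;

/**
 * Hypothetical holder for the parts of an ftp:// URI that a fetcher needs
 * before connecting. Not part of the Heritrix API.
 */
final class FtpTarget {
    /** Standard FTP control-connection port (RFC 959). */
    static final int DEFAULT_FTP_PORT = 21;

    final String host;
    final int port;
    /** Decoded path; empty when the URI names no path. */
    final String path;

    private FtpTarget(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    /** Parses an ftp:// URI, rejecting other schemes and URIs without a host. */
    static FtpTarget parse(String uriText) {
        URI uri = URI.create(uriText); // throws IllegalArgumentException on bad syntax
        if (!"ftp".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an FTP URI: " + uriText);
        }
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("no host in URI: " + uriText);
        }
        int port = uri.getPort() == -1 ? DEFAULT_FTP_PORT : uri.getPort();
        String path = uri.getPath() == null ? "" : uri.getPath();
        return new FtpTarget(uri.getHost(), port, path);
    }
}
```

For example, FtpTarget.parse("ftp://ftp.example.org/pub/file.txt") yields host ftp.example.org, port 21 and path /pub/file.txt.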
Nested Class Summary
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
ComplexType.MBeanAttributeInfoIterator |
Field Summary
Modifier and Type | Field and Description |
---|---|
static java.lang.String | ATTR_BANDWIDTH: The name for the fetch-bandwidth attribute. |
static java.lang.String | ATTR_MAX_LENGTH: The name for the max-length-bytes attribute. |
static java.lang.String | ATTR_PASSWORD: The name for the password attribute. |
static java.lang.String | ATTR_TIMEOUT: The name for the timeout-seconds attribute. |
static java.lang.String | ATTR_USERNAME: The name for the username attribute. |
Fields inherited from class org.archive.crawler.framework.Processor |
---|
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
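The ATTR_* fields name settings attributes, and the get* methods in the Method Summary read them "for this FetchFTP and the given curi", so the effective value can differ from one URI to another. The sketch below models that idea with a map of processor-wide defaults plus per-host overrides. Everything in it is an illustrative assumption: the class FtpSettingsModel, the string value of ATTR_TIMEOUT, and the host-keyed override scheme are hypothetical stand-ins for Heritrix's own settings framework, which this page does not describe.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of per-URI attribute lookup: a host-specific
 * override wins, otherwise the processor-wide default applies.
 * This is NOT Heritrix's settings API.
 */
final class FtpSettingsModel {
    /** Assumed attribute name, taken from the Field Summary description. */
    static final String ATTR_TIMEOUT = "timeout-seconds";

    private final Map<String, Object> defaults = new HashMap<>();
    private final Map<String, Map<String, Object>> overridesByHost = new HashMap<>();

    void setDefault(String attr, Object value) {
        defaults.put(attr, value);
    }

    void setForHost(String host, String attr, Object value) {
        overridesByHost.computeIfAbsent(host, h -> new HashMap<>()).put(attr, value);
    }

    /** Effective value of attr for a URI on the given host, checked against the expected type. */
    <T> T get(String host, String attr, Class<T> type) {
        Map<String, Object> overrides = overridesByHost.get(host);
        if (overrides != null && overrides.containsKey(attr)) {
            return type.cast(overrides.get(attr));
        }
        if (!defaults.containsKey(attr)) {
            throw new IllegalArgumentException("unknown attribute: " + attr);
        }
        return type.cast(defaults.get(attr));
    }
}
```

A URI on a host with no override sees the default; a URI on an overridden host sees its own value, which is the behavior the per-curi getters imply.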
Constructor Summary
Constructor and Description |
---|
FetchFTP(java.lang.String name): Constructs a new FetchFTP. |
Method Summary
Modifier and Type | Method and Description |
---|---|
boolean | getExtractFromDirs(CrawlURI curi): Returns the extract.from.dirs attribute for this FetchFTP and the given curi. |
boolean | getExtractParent(CrawlURI curi): Returns the extract.parent attribute for this FetchFTP and the given curi. |
int | getFetchBandwidth(CrawlURI curi): Returns the fetch-bandwidth attribute for this FetchFTP and the given curi. |
long | getMaxLength(CrawlURI curi): Returns the max-length-bytes attribute for this FetchFTP and the given curi. |
int | getTimeout(CrawlURI curi): Returns the timeout-seconds attribute for this FetchFTP and the given curi. |
void | innerProcess(CrawlURI curi): Processes the given URI. |
Methods inherited from class org.archive.crawler.framework.Processor |
---|
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement, listUsedFiles |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail
public static final java.lang.String ATTR_USERNAME
The name for the username attribute.
public static final java.lang.String ATTR_PASSWORD
The name for the password attribute.
public static final java.lang.String ATTR_MAX_LENGTH
The name for the max-length-bytes attribute.
public static final java.lang.String ATTR_BANDWIDTH
The name for the fetch-bandwidth attribute.
public static final java.lang.String ATTR_TIMEOUT
The name for the timeout-seconds attribute.
Constructor Detail
public FetchFTP(java.lang.String name)
Constructs a new FetchFTP.
Parameters: name - the name of this processor
Method Detail
public void innerProcess(CrawlURI curi) throws java.lang.InterruptedException
Processes the given URI. If the connection to the FTP server is successful, an attempt is made to change the working directory (CD, the FTP CWD command) to the path specified in the URI. If the CD command succeeds, the URI is assumed to represent a directory; if it fails, the URI is assumed to represent a file.
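The directory-versus-file decision described above can be sketched as follows. The CD step corresponds to the FTP CWD command, and per RFC 959 a 2xx reply means success (CWD normally answers 250, or 550 when the path is not a directory). The FtpControl interface is hypothetical, a stand-in for a real FTP client library; only the decision rule itself comes from the description.

```java
/** Hypothetical minimal view of an FTP control connection. */
interface FtpControl {
    /** Sends a command line such as "CWD /pub" and returns the numeric reply code. */
    int sendCommand(String commandLine);
}

/** Classifies a remote path using the rule: CWD succeeds means directory, otherwise file. */
final class FtpPathClassifier {
    enum Kind { DIRECTORY, FILE }

    static Kind classify(FtpControl control, String remotePath) {
        int reply = control.sendCommand("CWD " + remotePath);
        // RFC 959: replies in the 2xx range mean the command completed successfully.
        boolean changedDirectory = reply >= 200 && reply < 300;
        return changedDirectory ? Kind.DIRECTORY : Kind.FILE;
    }
}
```

A failed CWD is only a heuristic signal for "file": permission errors also yield a non-2xx reply, which is why the description says the URI is *assumed* to represent a file.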
For directories, the directory listing is fetched using the FTP LIST command and saved to the HttpRecorder. If the extract.from.dirs attribute is set to true, the files in the fetched listing are added to the curi as extracted FTP links. (It was easier to do that here than to write a separate FTPExtractor.)
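Turning listing entries into links can be sketched with java.net.URI alone. This assumes the listing holds one bare file name per line (the NLST form, CRLF-terminated); a LIST listing in ls -l style would need to be parsed into names first. The helper FtpListingLinks, and resolving each name against the directory URI, are illustrative assumptions rather than FetchFTP's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns an NLST-style listing into absolute ftp:// link URIs. */
final class FtpListingLinks {
    static List<URI> extract(URI directory, String listing) {
        // The base must end in "/" so names resolve inside the directory;
        // resolving "a" against ".../pub" would give ".../a", not ".../pub/a".
        String base = directory.toString();
        URI baseUri = URI.create(base.endsWith("/") ? base : base + "/");

        List<URI> links = new ArrayList<>();
        for (String name : listing.split("\r?\n")) {
            if (name.isEmpty() || name.equals(".") || name.equals("..")) {
                continue; // skip blank lines and self/parent entries
            }
            links.add(baseUri.resolve(relativeReference(name)));
        }
        return links;
    }

    private static URI relativeReference(String name) {
        try {
            // The multi-argument constructor percent-encodes characters such as
            // spaces, '?' and '#'. The "./" prefix stops a name like "a:b" from
            // being read as scheme "a"; resolve() normalizes the "./" away.
            return new URI(null, null, "./" + name, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("unusable file name: " + name, e);
        }
    }
}
```

For a directory ftp://ftp.example.org/pub and the entries readme.txt and "sub dir", this yields ftp://ftp.example.org/pub/readme.txt and ftp://ftp.example.org/pub/sub%20dir.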
For files, the file is fetched using the FTP RETR command and saved to the HttpRecorder.
All file transfers, including directory listings, use binary transfer mode. The local passive transfer mode is also always used, so that transfers work well with firewalls.
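Binary mode is the FTP command TYPE I. Passive mode plays well with firewalls because the client opens the data connection itself: it sends PASV, and the server replies with an address and port to connect to, so no inbound connection to the crawler is needed. Per RFC 959 the reply carries six numbers h1,h2,h3,h4,p1,p2, where the port is p1 * 256 + p2. The parser below is a hypothetical sketch of that step; because servers word the 227 reply text differently, it looks for the six numbers rather than an exact phrase.

```java
import java.net.InetSocketAddress;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the server's reply to PASV (RFC 959). */
final class PasvReply {
    // Six comma-separated numbers h1,h2,h3,h4,p1,p2; parentheses are not required.
    private static final Pattern SIX_NUMBERS = Pattern.compile(
            "(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3})");

    static InetSocketAddress parse(String reply) {
        if (!reply.startsWith("227")) {
            throw new IllegalArgumentException("not a PASV reply: " + reply);
        }
        Matcher m = SIX_NUMBERS.matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("no address in reply: " + reply);
        }
        int[] n = new int[6];
        for (int i = 0; i < 6; i++) {
            n[i] = Integer.parseInt(m.group(i + 1));
            if (n[i] > 255) {
                throw new IllegalArgumentException("value out of byte range: " + reply);
            }
        }
        String host = n[0] + "." + n[1] + "." + n[2] + "." + n[3];
        int port = n[4] * 256 + n[5]; // high byte first, then low byte
        // createUnresolved performs no DNS lookup; the caller connects later.
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

Some FTP clients ignore the returned host and reuse the control connection's address, since a server behind NAT may advertise a private address; this sketch simply returns what the server sent.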
Overrides: innerProcess in class Processor
Parameters: curi - the curi to process
Throws: java.lang.InterruptedException - if the thread is interrupted during processing
public boolean getExtractFromDirs(CrawlURI curi)
Returns the extract.from.dirs attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the curi's extract.from.dirs attribute
public boolean getExtractParent(CrawlURI curi)
Returns the extract.parent attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the curi's extract-parent attribute
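This page does not spell out what the extract.parent attribute triggers. Assuming it means that the parent directory of the fetched URI is also added as a link, the parent can be computed with java.net.URI alone. The helper FtpParent is hypothetical.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper: the parent directory of an ftp:// URI, if it has one. */
final class FtpParent {
    static Optional<URI> parentOf(URI uri) {
        String path = uri.getRawPath(); // raw, so percent-encoding is preserved
        if (uri.getRawAuthority() == null || path == null || path.isEmpty() || path.equals("/")) {
            return Optional.empty(); // no host, or already at the top
        }
        // "/pub/sub/" (a directory) and "/pub/sub" (a file) both have parent "/pub/".
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String parentPath = path.substring(0, path.lastIndexOf('/') + 1);
        return Optional.of(URI.create(uri.getScheme() + "://" + uri.getRawAuthority() + parentPath));
    }
}
```

For example, the parent of ftp://ftp.example.org/pub/sub/file.txt is ftp://ftp.example.org/pub/sub/, and the server root has no parent.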
public int getTimeout(CrawlURI curi)
Returns the timeout-seconds attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the curi's timeout-seconds attribute
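getTimeout returns a value in seconds, while java.net socket APIs take milliseconds. A sketch of the conversion, with a guard against int overflow, and of applying the result as a socket read timeout (Socket.setSoTimeout treats 0 as "wait indefinitely"). Whether FetchFTP applies this value to connects, reads, or both is not stated here; the class FtpTimeouts is hypothetical.

```java
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical helpers for turning a timeout-seconds value into socket settings. */
final class FtpTimeouts {
    /** Converts seconds to the int milliseconds that Socket APIs expect, saturating on overflow. */
    static int toMillis(int timeoutSeconds) {
        if (timeoutSeconds < 0) {
            throw new IllegalArgumentException("negative timeout: " + timeoutSeconds);
        }
        long millis = timeoutSeconds * 1000L; // long arithmetic avoids int overflow
        return (int) Math.min(millis, Integer.MAX_VALUE);
    }

    /** Applies the value as the read timeout; 0 seconds becomes an infinite timeout. */
    static void applyReadTimeout(Socket socket, int timeoutSeconds) throws SocketException {
        socket.setSoTimeout(toMillis(timeoutSeconds));
    }
}
```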
public long getMaxLength(CrawlURI curi)
Returns the max-length-bytes attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the curi's max-length-bytes attribute
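getMaxLength returns a byte cap for a URI. A sketch of enforcing such a cap while copying a RETR data stream, using only java.io: it copies at most maxLength bytes and then reports whether the source had more. Stopping at the cap and flagging truncation is an assumed policy, not necessarily what FetchFTP does when the limit is reached; the class CappedCopy is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical length-capped copy: reads at most maxLength bytes from the input. */
final class CappedCopy {
    /** Outcome of a copy: bytes written and whether unread data remained. */
    static final class Result {
        final long bytesCopied;
        final boolean truncated;

        Result(long bytesCopied, boolean truncated) {
            this.bytesCopied = bytesCopied;
            this.truncated = truncated;
        }
    }

    static Result copy(InputStream in, OutputStream out, long maxLength) throws IOException {
        if (maxLength < 0) {
            throw new IllegalArgumentException("negative limit: " + maxLength);
        }
        byte[] buf = new byte[8192];
        long total = 0;
        while (total < maxLength) {
            int want = (int) Math.min(buf.length, maxLength - total); // never overshoot the cap
            int n = in.read(buf, 0, want);
            if (n == -1) {
                return new Result(total, false); // stream ended within the limit
            }
            out.write(buf, 0, n);
            total += n;
        }
        // Limit reached: probe one more byte to learn whether the source had more data.
        boolean more = in.read() != -1;
        return new Result(total, more);
    }
}
```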
public int getFetchBandwidth(CrawlURI curi)
Returns the fetch-bandwidth attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the curi's fetch-bandwidth attribute
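getFetchBandwidth returns a per-URI bandwidth limit, but this page does not state its units. The sketch below assumes kilobytes (1024 bytes) per second and treats 0 as "unlimited"; both are assumptions. It computes how long a fetcher would pause so that the average rate since the transfer began stays at or below the limit. The class BandwidthThrottle is hypothetical.

```java
/** Hypothetical throttle arithmetic for an average-rate cap. */
final class BandwidthThrottle {
    /**
     * Milliseconds to sleep so that bytesSoFar spread over (elapsedMillis + sleep)
     * does not exceed the limit. Assumes limitKbPerSec is in KB/s (1 KB = 1024 bytes)
     * and that a limit of 0 or less means "no limit".
     */
    static long sleepMillis(long bytesSoFar, long elapsedMillis, int limitKbPerSec) {
        if (limitKbPerSec <= 0) {
            return 0; // no limit configured
        }
        long bytesPerSecond = limitKbPerSec * 1024L;
        // Earliest time (ms) at which bytesSoFar is allowed, rounded up; integer math avoids float error.
        long earliestMillis = (bytesSoFar * 1000L + bytesPerSecond - 1) / bytesPerSecond;
        return Math.max(0, earliestMillis - elapsedMillis);
    }
}
```

For example, at 10 KB/s, 102400 bytes may take no less than 10 seconds, so after 4 seconds the fetcher would pause for 6000 ms.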