java.lang.Object
extended by javax.management.Attribute
extended by org.archive.crawler.settings.Type
extended by org.archive.crawler.settings.ComplexType
extended by org.archive.crawler.settings.ModuleType
extended by org.archive.crawler.framework.Processor
extended by org.archive.crawler.writer.MirrorWriterProcessor
public class MirrorWriterProcessor
Processor module that writes the results of successful fetches to files on disk, storing the content of each URI in its own file. The files are arranged in a directory hierarchy based on the URI paths, so they mirror the file hierarchy that might exist on the servers.
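To make the mirroring idea concrete, here is a minimal sketch of mapping a URI to a file path under a base directory. It is not the code of this class: the names MirrorPathSketch and uriToMirrorPath are hypothetical, and the directoryFile parameter is only an assumption about how a URI ending in "/" might be given a file name. Character mapping, length limits, and the other options listed in the Field Summary are ignored.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; not part of Heritrix.
final class MirrorPathSketch {

    /**
     * Maps a URI to baseDir/host/segment/.../file.
     * directoryFile (an assumed parameter) names the file used when the
     * URI path is empty or ends with "/".
     */
    static Path uriToMirrorPath(Path baseDir, URI uri, String directoryFile) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URI has no host: " + uri);
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        Path result = baseDir.resolve(host);
        for (String segment : path.split("/")) {
            // Skip empty and dot segments so the result stays under baseDir.
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                continue;
            }
            result = result.resolve(segment);
        }
        if (path.endsWith("/")) {
            result = result.resolve(directoryFile);
        }
        return result;
    }

    public static void main(String[] args) {
        Path base = Paths.get("mirror");
        System.out.println(uriToMirrorPath(base, URI.create("http://example.com/docs/"), "index.html"));
        System.out.println(uriToMirrorPath(base, URI.create("http://example.com/a/b.html"), "index.html"));
    }
}
```

The sketch resolves one path segment at a time, which loosely parallels the per-segment nested classes listed in the Nested Class Summary.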
There are a number of issues involved:
There would normally be a single instance of this class per Heritrix instance. This class is thread-safe; any number of threads can be in its innerProcess method at once. However, conflicts can still arise in the file system. For example, if several threads try to create the same directory at the same time, only one can win. Therefore, there should be at most one access to a server at a given time.
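The directory-creation conflict described above can be illustrated with a small sketch. It assumes a modern JDK with java.nio.file, where Files.createDirectories succeeds when the directory already exists, so several threads can request the same directory without any of them failing. The class and method names (ConcurrentMkdirSketch, ensureDirectory, createFromManyThreads) are hypothetical, and nothing here is taken from the MirrorWriterProcessor source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not part of Heritrix.
final class ConcurrentMkdirSketch {

    /** Creates dir and any missing parents; an existing directory is not an error. */
    static Path ensureDirectory(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    /** Asks several threads to create the same directory; returns how many succeeded. */
    static int createFromManyThreads(Path dir, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Path>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                tasks.add(() -> ensureDirectory(dir));
            }
            int succeeded = 0;
            for (Future<Path> future : pool.invokeAll(tasks)) {
                future.get(); // rethrows any failure from the task
                succeeded++;
            }
            return succeeded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("mirror-sketch");
        Path target = root.resolve("example.com").resolve("a").resolve("b");
        int succeeded = createFromManyThreads(target, 8);
        System.out.println(succeeded + " tasks completed; directory exists: " + Files.isDirectory(target));
    }
}
```

This only addresses directory creation. Two threads writing the same file would still conflict, which is one reason to limit concurrent access to a single server.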
Nested Class Summary | |
---|---|
(package private) class | MirrorWriterProcessor.DirSegment: This class represents one directory segment (component) of a URI path. |
(package private) class | MirrorWriterProcessor.EndSegment: This class represents the last segment (component) of a URI path. |
(package private) class | MirrorWriterProcessor.LumpyString: This class represents a dynamically growable string consisting of substrings ("lumps") that are treated atomically. |
(package private) class | MirrorWriterProcessor.PathSegment: This class represents one segment (component) of a URI path. |
(package private) class | MirrorWriterProcessor.URIToFileReturn: This class is returned by uriToFile. |
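As an illustration of the "lumpy string" idea, the following sketch shows one way a growable string could keep its substrings atomic. It assumes that "treated atomically" means a lump is either kept whole or dropped whole when the string is shortened, for example so that an escape sequence such as %20 is never cut in half. The class LumpSketch and its methods are hypothetical and do not reflect the actual LumpyString API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Heritrix LumpyString class.
final class LumpSketch {
    private final StringBuilder text = new StringBuilder();
    // Exclusive end offset of each lump within text, in append order.
    private final List<Integer> lumpEnds = new ArrayList<>();

    /** Appends one lump; empty lumps are ignored. */
    void appendLump(String lump) {
        if (lump.isEmpty()) {
            return;
        }
        text.append(lump);
        lumpEnds.add(text.length());
    }

    /** Shortens the string to at most maxLen characters, cutting only at lump boundaries. */
    void trimToLength(int maxLen) {
        while (!lumpEnds.isEmpty() && lumpEnds.get(lumpEnds.size() - 1) > maxLen) {
            lumpEnds.remove(lumpEnds.size() - 1); // removes by index, dropping the last lump
        }
        text.setLength(lumpEnds.isEmpty() ? 0 : lumpEnds.get(lumpEnds.size() - 1));
    }

    @Override
    public String toString() {
        return text.toString();
    }

    public static void main(String[] args) {
        LumpSketch s = new LumpSketch();
        s.appendLump("a");
        s.appendLump("%20");
        s.appendLump("b");
        s.trimToLength(3); // "%20" would end at offset 4, so it is dropped whole
        System.out.println(s);
    }
}
```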
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
ComplexType.MBeanAttributeInfoIterator |
Field Summary | |
---|---|
static java.lang.String | ATTR_CASE_SENSITIVE: Key to use when asking settings for the case-sensitive option. |
static java.lang.String | ATTR_CHAR_MAP: Key to use when asking settings for the character map. |
static java.lang.String | ATTR_CONTENT_TYPE_MAP: Key to use when asking settings for the content type map. |
static java.lang.String | ATTR_DIRECTORY_FILE: Key to use when asking settings for the directory file. |
static java.lang.String | ATTR_DOT_BEGIN: Key to use when asking settings for the dot begin replacement. |
static java.lang.String | ATTR_DOT_END: Key to use when asking settings for the dot end replacement. |
static java.lang.String | ATTR_HOST_DIRECTORY: Key to use when asking settings for the host directory option. |
static java.lang.String | ATTR_HOST_MAP: Key to use when asking settings for the host map. |
static java.lang.String | ATTR_MAX_PATH_LEN: Key to use when asking settings for the maximum file system path length. |
static java.lang.String | ATTR_MAX_SEG_LEN: Key to use when asking settings for the maximum file system path segment length. |
static java.lang.String | ATTR_PATH: Key to use when asking settings for the base directory path value. |
static java.lang.String | ATTR_PORT_DIRECTORY: Key to use when asking settings for the port directory option. |
static java.lang.String | ATTR_SUFFIX_AT_END: Key to use when asking settings for the suffix at end option. |
static java.lang.String | ATTR_TOO_LONG_DIRECTORY: Key to use when asking settings for the too-long directory. |
static java.lang.String | ATTR_UNDERSCORE_SET: Key to use when asking settings for the underscore set. |
Fields inherited from class org.archive.crawler.framework.Processor |
---|
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
Constructor Summary | |
---|---|
MirrorWriterProcessor(java.lang.String name) | |
Method Summary | |
---|---|
protected void | innerProcess(CrawlURI curi): Classes subclassing this one should override this method to perform their custom actions on the CrawlURI. |
Methods inherited from class org.archive.crawler.framework.Processor |
---|
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement, listUsedFiles |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String ATTR_CASE_SENSITIVE
public static final java.lang.String ATTR_CHAR_MAP
public static final java.lang.String ATTR_CONTENT_TYPE_MAP
public static final java.lang.String ATTR_DOT_BEGIN
public static final java.lang.String ATTR_DOT_END
public static final java.lang.String ATTR_DIRECTORY_FILE
public static final java.lang.String ATTR_HOST_DIRECTORY
public static final java.lang.String ATTR_HOST_MAP
public static final java.lang.String ATTR_MAX_PATH_LEN
public static final java.lang.String ATTR_MAX_SEG_LEN
public static final java.lang.String ATTR_PATH
public static final java.lang.String ATTR_PORT_DIRECTORY
public static final java.lang.String ATTR_SUFFIX_AT_END
public static final java.lang.String ATTR_TOO_LONG_DIRECTORY
public static final java.lang.String ATTR_UNDERSCORE_SET
Constructor Detail |
---|
public MirrorWriterProcessor(java.lang.String name)
Parameters: name - Name of this processor.
Method Detail |
---|
protected void innerProcess(CrawlURI curi)
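Description copied from class Processor: Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
Overrides: innerProcess in class Processor
Parameters: curi - The CrawlURI being processed.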