java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.Processor
            org.archive.crawler.writer.Kw3WriterProcessor
public class Kw3WriterProcessor
Processor module that writes the results of successful fetches to files on disk. These files are MIME files of the type used by the Swedish National Library's Kulturarw3 web harvesting [http://www.kb.se/kw3/]. Each URI is written to its own file, whose path consists of:
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
ComplexType.MBeanAttributeInfoIterator |
Field Summary | |
---|---|
static java.lang.String | ATTR_CHMOD: Key to use when asking settings whether chmod should be executed. |
static java.lang.String | ATTR_CHMOD_VALUE: Key to use when asking settings for the new chmod value. |
static java.lang.String | ATTR_COLLECTION: Key for the collection attribute. |
static java.lang.String | ATTR_HARVESTER: Key for the harvester attribute. |
static java.lang.String | ATTR_MAX_BYTES_WRITTEN: Key for the maximum ARC bytes to write attribute. |
static java.lang.String | ATTR_MAX_SIZE_BYTES: Key to use when asking settings for max size value. |
static java.lang.String | ATTR_PATH: Key to use when asking settings for arc path value. |
static java.lang.String | DEFAULT_CHMOD_VALUE: Default value for permissions. |
static java.lang.String | DEFAULT_COLLECTION_VALUE: Default value for collection. |
static java.lang.String | DEFAULT_HARVESTER_VALUE: Default value for harvester. |
static int | DEFAULT_MAX_FILE_SIZE: Default max file size. |
Fields inherited from class org.archive.crawler.framework.Processor |
---|
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
Fields inherited from interface org.archive.crawler.writer.Kw3Constants |
---|
ARCHIVE_TIME_KEY, COLLECTION_KEY, CONTENT_LENGTH_KEY, CONTENT_MD5_KEY, CONTENT_TYPE_KEY, HARVESTER_KEY, HEADER_LENGTH_KEY, HEADER_MD5_KEY, IP_ADDRESS_KEY, STATUS_CODE_KEY, URL_KEY |
Constructor Summary | |
---|---|
Kw3WriterProcessor(java.lang.String name) | |
Method Summary | |
---|---|
protected void | initialTasks(): Classes subclassing this one should override this method to perform processor-specific actions. |
protected java.io.OutputStream | initOutputStream(CrawlURI curi) |
protected void | innerProcess(CrawlURI curi): Classes subclassing this one should override this method to perform their custom actions on the CrawlURI. |
protected void | writeArchiveInfoPart(java.lang.String boundary, CrawlURI curi, ReplayInputStream ris, java.io.OutputStream out) |
protected void | writeContentPart(java.lang.String boundary, CrawlURI curi, ReplayInputStream ris, java.io.OutputStream out) |
protected void | writeHeaderPart(java.lang.String boundary, ReplayInputStream ris, java.io.OutputStream out) |
protected void | writeMimeFile(CrawlURI curi) |
Methods inherited from class org.archive.crawler.framework.Processor |
---|
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement, listUsedFiles |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String ATTR_PATH
public static final java.lang.String ATTR_MAX_SIZE_BYTES
public static final int DEFAULT_MAX_FILE_SIZE
public static final java.lang.String ATTR_CHMOD
public static final java.lang.String ATTR_CHMOD_VALUE
public static final java.lang.String DEFAULT_CHMOD_VALUE
public static final java.lang.String ATTR_MAX_BYTES_WRITTEN
public static final java.lang.String ATTR_COLLECTION
public static final java.lang.String DEFAULT_COLLECTION_VALUE
public static final java.lang.String ATTR_HARVESTER
public static final java.lang.String DEFAULT_HARVESTER_VALUE
Constructor Detail |
---|
public Kw3WriterProcessor(java.lang.String name)
Parameters: name - Name of this processor.
Method Detail |
---|
protected void initialTasks()
Description copied from class: Processor
This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.
Overrides: initialTasks in class Processor
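The ordering described above (one-time setup first, then per-URI work) can be sketched without the crawler on the classpath. Everything in this sketch is a hypothetical stand-in: `SketchProcessor`, `RecordingWriter`, and `LifecycleSketch` are illustrative names, URIs are plain strings instead of `CrawlURI`, and the setup hook is triggered lazily inside `process`, whereas the documentation above only states that the framework calls `initialTasks()` after crawl setup and before any URI is processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the framework's Processor base class.
// The real org.archive.crawler.framework.Processor has a much larger API.
abstract class SketchProcessor {
    private boolean initialized = false;

    // Entry point called once per URI. In this sketch the one-time setup
    // hook is triggered lazily here, before the first URI is handled.
    final void process(String uri) {
        if (!initialized) {
            initialTasks();
            initialized = true;
        }
        innerProcess(uri);
    }

    // Hook for processor-specific setup; subclasses may override.
    protected void initialTasks() {}

    // Hook for the per-URI work; subclasses may override.
    protected void innerProcess(String uri) {}
}

// Hypothetical subclass that records the order in which the hooks run.
class RecordingWriter extends SketchProcessor {
    final List<String> events = new ArrayList<>();

    @Override
    protected void initialTasks() {
        events.add("init");
    }

    @Override
    protected void innerProcess(String uri) {
        events.add("write " + uri);
    }
}

class LifecycleSketch {
    // Feeds the given URIs through a fresh processor and returns the event log.
    static List<String> run(String... uris) {
        RecordingWriter writer = new RecordingWriter();
        for (String uri : uris) {
            writer.process(uri);
        }
        return writer.events;
    }

    public static void main(String[] args) {
        System.out.println(run("http://example.org/a", "http://example.org/b"));
    }
}
```

The point of the sketch is only the call order: the setup hook runs once, and every later call goes straight to the per-URI hook.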
protected void innerProcess(CrawlURI curi)
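Description copied from class: Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
Overrides: innerProcess in class Processor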
Parameters: curi - The CrawlURI being processed.
protected void writeMimeFile(CrawlURI curi) throws java.io.IOException
Throws: java.io.IOException
protected java.io.OutputStream initOutputStream(CrawlURI curi) throws java.io.IOException
Throws: java.io.IOException
protected void writeArchiveInfoPart(java.lang.String boundary, CrawlURI curi, ReplayInputStream ris, java.io.OutputStream out) throws java.io.IOException
Throws: java.io.IOException
protected void writeHeaderPart(java.lang.String boundary, ReplayInputStream ris, java.io.OutputStream out) throws java.io.IOException
Throws: java.io.IOException
protected void writeContentPart(java.lang.String boundary, CrawlURI curi, ReplayInputStream ris, java.io.OutputStream out) throws java.io.IOException
Throws: java.io.IOException
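The three write methods above each take a boundary string and an output stream, which suggests a multipart MIME layout with one part each for archive information, headers, and content. The following is a minimal sketch of generic multipart writing under that assumption, not the actual Kulturarw3 file format: the class `MimeSketch`, its method signatures, the `multipart/mixed` type, and the per-part content types are all illustrative choices, and plain strings and byte arrays stand in for `CrawlURI` and `ReplayInputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of boundary-delimited MIME parts.
class MimeSketch {

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Writes one body part: the boundary line, a Content-Type header,
    // an empty line, the payload, and a trailing CRLF.
    static void writePart(String boundary, String contentType,
                          byte[] payload, OutputStream out) throws IOException {
        out.write(ascii("--" + boundary + "\r\n"));
        out.write(ascii("Content-Type: " + contentType + "\r\n\r\n"));
        out.write(payload);
        out.write(ascii("\r\n"));
    }

    // Assembles a three-part message in memory: archive info, headers, content.
    // The content types used here are assumptions made for this sketch.
    static byte[] writeMimeFile(String boundary, String archiveInfo,
                                String headers, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            out.write(ascii("MIME-Version: 1.0\r\n"));
            out.write(ascii("Content-Type: multipart/mixed; boundary=\""
                    + boundary + "\"\r\n\r\n"));
            writePart(boundary, "text/plain", ascii(archiveInfo), out);
            writePart(boundary, "text/plain", ascii(headers), out);
            writePart(boundary, "application/octet-stream", content, out);
            // The closing delimiter carries two extra trailing hyphens.
            out.write(ascii("--" + boundary + "--\r\n"));
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = writeMimeFile("sketch-boundary", "archive info placeholder",
                "header placeholder", ascii("<html></html>"));
        System.out.print(new String(file, StandardCharsets.US_ASCII));
    }
}
```

Passing the same boundary to every part writer, as the documented signatures do, keeps the delimiter consistent across the file; a real writer would also need a boundary that cannot occur inside any payload.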