java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.Processor
public class Processor
Base class for URI processing classes.
Each URI is processed by a user-defined series of processors. This class provides the basic infrastructure for those processors but performs no processing of its own. New processors are created by subclassing this class and overriding innerProcess().
Subclasses should not trap InterruptedExceptions; they should let them propagate to the ToeThread executing the processor. Subclasses should also exit their main method (innerProcess()) immediately if the thread's interrupted flag is set.
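A minimal sketch of that contract follows. It assumes a hypothetical stand-in base class (SketchProcessor) and uses a plain String in place of CrawlURI so it compiles without Heritrix; the body of checkForInterrupt() shown here is an assumed implementation, not necessarily the real one. The hypothetical subclass polls the interrupted flag on each loop iteration and lets InterruptedException propagate instead of catching it.

```java
import java.util.List;

/** Hypothetical stand-in for Processor; a String replaces CrawlURI. */
abstract class SketchProcessor {
    /** Subclasses do their work here and must let InterruptedException escape. */
    protected abstract void innerProcess(String uri) throws InterruptedException;

    /** Assumed behaviour: turn a set interrupted flag into an InterruptedException. */
    protected void checkForInterrupt() throws InterruptedException {
        // Thread.interrupted() also clears the flag, the usual convention
        // when converting the flag into an exception.
        if (Thread.interrupted()) {
            throw new InterruptedException("interrupted");
        }
    }

    /** No try/catch here: the exception reaches the calling thread (the ToeThread). */
    public final void process(String uri) throws InterruptedException {
        innerProcess(uri);
    }
}

/** Hypothetical subclass that examines a list of outlinks one at a time. */
class OutlinkFilter extends SketchProcessor {
    private final List<String> outlinks;
    int httpLinks = 0;

    OutlinkFilter(List<String> outlinks) {
        this.outlinks = outlinks;
    }

    @Override
    protected void innerProcess(String uri) throws InterruptedException {
        for (String link : outlinks) {
            checkForInterrupt();          // exit promptly; never catch and ignore
            if (link.startsWith("http")) {
                httpLinks++;
            }
        }
    }
}
```

Catching the exception inside innerProcess() and carrying on would hide the interrupt from the ToeThread, which is what the contract above rules out.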
See Also:
ToeThread, Serialized Form
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
ComplexType.MBeanAttributeInfoIterator |
Field Summary | |
---|---|
static java.lang.String | ATTR_DECIDE_RULES: Key used when asking the settings for the decide-rules value. |
static java.lang.String | ATTR_ENABLED: Key used when asking the settings for the enabled value. |
protected java.lang.String | attrDecideRules: Local name for decide-rules. |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
Constructor Summary | |
---|---|
Processor(java.lang.String name, java.lang.String description) |
Method Summary | |
---|---|
protected void | checkForInterrupt() |
protected void | finalTasks(): Classes subclassing this one should override this method to perform processor-specific actions. |
CrawlController | getController(): Get the controller object. |
protected DecideRule | getDecideRule(java.lang.Object o) |
Processor | getDefaultNextProcessor(CrawlURI curi): Returns the next processor for the given CrawlURI in the processor chain. |
protected void | initialTasks(): Classes subclassing this one should override this method to perform processor-specific actions. |
protected void | innerProcess(CrawlURI curi): Classes subclassing this one should override this method to perform their custom actions on the CrawlURI. |
protected void | innerRejectProcess(CrawlURI curi) |
protected boolean | isContentToProcess(CrawlURI curi) |
boolean | isEnabled() |
protected boolean | isExpectedMimeType(java.lang.String contentType, java.lang.String expectedPrefix) |
protected boolean | isHttpTransactionContentToProcess(CrawlURI curi) |
void | kickUpdate() |
void | process(CrawlURI curi): Perform processing on the given CrawlURI. |
java.lang.String | report(): Compiles and returns a human-readable report about the status of the processor. |
protected boolean | rulesAccept(DecideRule rule, java.lang.Object o) |
protected boolean | rulesAccept(java.lang.Object o) |
void | setDefaultNextProcessor(Processor nextProcessor): Set the default next processor in the chain. |
Processor | spawn(int serialNum) |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement, listUsedFiles |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String ATTR_DECIDE_RULES
protected java.lang.String attrDecideRules
public static final java.lang.String ATTR_ENABLED
Constructor Detail |
---|
public Processor(java.lang.String name, java.lang.String description)
Parameters:
name - the name of this processor.
description - a description of this processor.
Method Detail |
---|
public final void process(CrawlURI curi) throws java.lang.InterruptedException
Perform processing on the given CrawlURI.
Parameters:
curi - the CrawlURI to process.
Throws:
java.lang.InterruptedException
protected void checkForInterrupt() throws java.lang.InterruptedException
Throws:
java.lang.InterruptedException
protected void innerRejectProcess(CrawlURI curi) throws java.lang.InterruptedException
Parameters:
curi - CrawlURI instance.
Throws:
java.lang.InterruptedException
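protected void innerProcess(CrawlURI curi) throws java.lang.InterruptedException
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
Parameters:
curi - the CrawlURI being processed.
Throws:
java.lang.InterruptedException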
protected void initialTasks()
Classes subclassing this one should override this method to perform processor-specific actions. This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.
protected void finalTasks()
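The two hooks suggest a prepare-once, release-at-end pattern. In this sketch, LifecycleProcessor, BufferingProcessor and LifecycleDriver are hypothetical, a String stands in for CrawlURI, and the driver's call order (initialTasks(), then each URI, then finalTasks()) is an assumption based on the guarantee stated for initialTasks().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class mirroring the three lifecycle hooks. */
class LifecycleProcessor {
    protected void initialTasks() {}
    protected void innerProcess(String uri) throws InterruptedException {}
    protected void finalTasks() {}
}

/** Hypothetical subclass that prepares state once and releases it at the end. */
class BufferingProcessor extends LifecycleProcessor {
    List<String> buffer;                          // created before any URI arrives
    final List<String> flushed = new ArrayList<>();

    @Override
    protected void initialTasks() {
        buffer = new ArrayList<>();
    }

    @Override
    protected void innerProcess(String uri) {
        buffer.add(uri);                          // safe: initialTasks() has already run
    }

    @Override
    protected void finalTasks() {
        flushed.addAll(buffer);                   // hand off accumulated work
        buffer = null;                            // release the per-crawl state
    }
}

/** Hypothetical driver showing the assumed call order. */
class LifecycleDriver {
    static void run(LifecycleProcessor p, List<String> uris) throws InterruptedException {
        p.initialTasks();
        for (String uri : uris) {
            p.innerProcess(uri);
        }
        p.finalTasks();
    }
}
```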
protected DecideRule getDecideRule(java.lang.Object o)
protected boolean rulesAccept(java.lang.Object o)
protected boolean rulesAccept(DecideRule rule, java.lang.Object o)
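This page does not spell out how process() uses the rule methods. Assuming it leaves a disabled processor's URIs untouched, hands URIs accepted by rulesAccept() to innerProcess() and passes rejected ones to innerRejectProcess(), the dispatch could be sketched as follows. RuleGatedProcessor, the Predicate standing in for DecideRule, and the String standing in for CrawlURI are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical dispatcher; a Predicate stands in for Heritrix's DecideRule. */
class RuleGatedProcessor {
    private final boolean enabled;
    private final Predicate<String> rule;
    final List<String> processed = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    RuleGatedProcessor(boolean enabled, Predicate<String> rule) {
        this.enabled = enabled;
        this.rule = rule;
    }

    /** Accepts only String URIs that satisfy the rule. */
    protected boolean rulesAccept(Object o) {
        return o instanceof String && rule.test((String) o);
    }

    public final void process(String uri) throws InterruptedException {
        if (!enabled) {
            return;                 // assumed: a disabled processor does nothing
        }
        if (rulesAccept(uri)) {
            innerProcess(uri);
        } else {
            innerRejectProcess(uri);
        }
    }

    protected void innerProcess(String uri) throws InterruptedException {
        processed.add(uri);
    }

    protected void innerRejectProcess(String uri) throws InterruptedException {
        rejected.add(uri);
    }
}
```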
public Processor getDefaultNextProcessor(CrawlURI curi)
Returns the next processor for the given CrawlURI in the processor chain.
Parameters:
curi - the CrawlURI that we want to find the next processor for.
public void setDefaultNextProcessor(Processor nextProcessor)
Set the default next processor in the chain.
Parameters:
nextProcessor - the default next processor in the chain.
public CrawlController getController()
Get the controller object.
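setDefaultNextProcessor() and getDefaultNextProcessor() together link individual processors into a chain. In this sketch, ChainedProcessor and runChain() are hypothetical and a String stands in for CrawlURI; the walk simply follows getDefaultNextProcessor() until it returns null.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chain link mirroring setDefaultNextProcessor/getDefaultNextProcessor. */
class ChainedProcessor {
    final String name;
    private ChainedProcessor defaultNext;   // null marks the end of the chain

    ChainedProcessor(String name) {
        this.name = name;
    }

    void setDefaultNextProcessor(ChainedProcessor nextProcessor) {
        this.defaultNext = nextProcessor;
    }

    /** The uri argument is unused here; taking it leaves room for per-URI routing. */
    ChainedProcessor getDefaultNextProcessor(String uri) {
        return defaultNext;
    }

    /** Walks the chain from 'first', recording which processors see the URI. */
    static List<String> runChain(ChainedProcessor first, String uri) {
        List<String> visited = new ArrayList<>();
        for (ChainedProcessor p = first; p != null; p = p.getDefaultNextProcessor(uri)) {
            visited.add(p.name);
        }
        return visited;
    }
}
```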
public Processor spawn(int serialNum)
public java.lang.String report()
Compiles and returns a human-readable report about the status of the processor. Examples of stats it might declare include:
* Number of CrawlURIs handled.
* Number of links extracted (for link extractors).
etc.
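A sketch of a processor that keeps the kinds of counters listed above and formats them in report(). ReportingExtractor, its counter names, the simplified innerProcess(String, int) signature and the report layout are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical link extractor that keeps the counters report() summarizes. */
class ReportingExtractor {
    // Atomic counters keep the sketch safe if one instance is used from several threads.
    private final AtomicLong urisHandled = new AtomicLong();
    private final AtomicLong linksExtracted = new AtomicLong();

    /** Records one handled URI and the number of links found in it. */
    void innerProcess(String uri, int linksFound) {
        urisHandled.incrementAndGet();
        linksExtracted.addAndGet(linksFound);
    }

    /** Returns a human-readable, multi-line status summary. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("Processor: ").append(getClass().getSimpleName()).append('\n');
        sb.append("  CrawlURIs handled: ").append(urisHandled.get()).append('\n');
        sb.append("  Links extracted:   ").append(linksExtracted.get()).append('\n');
        return sb.toString();
    }
}
```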
protected boolean isContentToProcess(CrawlURI curi)
Parameters:
curi - CrawlURI to examine.
protected boolean isHttpTransactionContentToProcess(CrawlURI curi)
Parameters:
curi - CrawlURI to examine.
Returns:
True if isContentToProcess(CrawlURI) returns true and the CrawlURI represents a successful HTTP transaction.
protected boolean isExpectedMimeType(java.lang.String contentType, java.lang.String expectedPrefix)
Parameters:
contentType - the content type found.
expectedPrefix - the string expected at the start of the content type, e.g. text/html.
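A sketch of the three content checks, using a hypothetical stand-in (FetchResult) for the parts of a CrawlURI they read. Treating a content size above zero as "content to process", a 2xx status as a "successful" HTTP transaction, and matching the MIME prefix case-insensitively are all assumptions, not confirmed behaviour of the real methods.

```java
import java.util.Locale;

/** Hypothetical stand-in for the parts of a CrawlURI these checks read. */
class FetchResult {
    final boolean isHttp;
    final int statusCode;
    final long contentSize;

    FetchResult(boolean isHttp, int statusCode, long contentSize) {
        this.isHttp = isHttp;
        this.statusCode = statusCode;
        this.contentSize = contentSize;
    }
}

class ContentChecks {
    /** Assumed criterion: some content was fetched. */
    static boolean isContentToProcess(FetchResult r) {
        return r.contentSize > 0;
    }

    /** isContentToProcess plus an HTTP fetch with a 2xx status (assumed meaning of "successful"). */
    static boolean isHttpTransactionContentToProcess(FetchResult r) {
        return isContentToProcess(r)
                && r.isHttp
                && r.statusCode >= 200 && r.statusCode < 300;
    }

    /** Prefix test; case-insensitivity and null handling are assumptions. */
    static boolean isExpectedMimeType(String contentType, String expectedPrefix) {
        if (contentType == null || expectedPrefix == null) {
            return false;
        }
        return contentType.toLowerCase(Locale.ROOT)
                .startsWith(expectedPrefix.toLowerCase(Locale.ROOT));
    }
}
```

A prefix test means a header such as "text/html; charset=UTF-8" still matches "text/html", because media-type parameters follow the type and subtype.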
public void kickUpdate()
public boolean isEnabled()