org.archive.crawler.processor.recrawl
Class PersistStoreProcessor

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.processor.recrawl.PersistProcessor
                          extended by org.archive.crawler.processor.recrawl.PersistOnlineProcessor
                              extended by org.archive.crawler.processor.recrawl.PersistStoreProcessor
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, CrawlStatusListener

public class PersistStoreProcessor
extends PersistOnlineProcessor
implements CrawlStatusListener

Stores CrawlURI attributes from the latest fetch in persistent storage for consultation by a later recrawl.

Version:
$Date: 2006-09-25 20:19:54 +0000 (Mon, 25 Sep 2006) $, $Revision: 4654 $
Author:
gojomo
See Also:
Serialized Form
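
As a rough illustration of the idea only (not the actual Heritrix implementation), the sketch below saves a few attributes of a fetched URI under a key so that a later crawl can look them up. The class HistoryStoreSketch, the in-memory Map standing in for persistent storage, and the attribute names are all hypothetical; the real processor operates on CrawlURI objects and the inherited store field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: an in-memory map plays the role of persistent storage.
class HistoryStoreSketch {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    // Save a copy of the attributes from the latest fetch, replacing any older entry.
    void storeFetch(String uriKey, Map<String, Object> fetchAttributes) {
        store.put(uriKey, new HashMap<>(fetchAttributes));
    }

    // Return what an earlier crawl recorded, or null if the URI was never stored.
    Map<String, Object> lookup(String uriKey) {
        return store.get(uriKey);
    }

    public static void main(String[] args) {
        HistoryStoreSketch history = new HistoryStoreSketch();
        Map<String, Object> attrs = new HashMap<>();
        attrs.put("content-digest", "sha1:EXAMPLE"); // made-up attribute name and value
        attrs.put("fetch-status", 200);              // made-up attribute name
        history.storeFetch("http://example.org/", attrs);
        System.out.println(history.lookup("http://example.org/"));
    }
}
```

A later crawl could compare the stored digest with a fresh one to decide whether the content changed; that comparison is outside what this class documents.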

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
 
Fields inherited from class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
historyDb, store
 
Fields inherited from class org.archive.crawler.processor.recrawl.PersistProcessor
URI_HISTORY_DBNAME
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
PersistStoreProcessor(java.lang.String name)
          Usual constructor
 
Method Summary
 void crawlCheckpoint(java.io.File checkpointDir)
          Called by CrawlController when checkpointing.
 void crawlEnded(java.lang.String sExitMessage)
          Called when a CrawlController has ended a crawl and is about to exit.
 void crawlEnding(java.lang.String sExitMessage)
          Called when a CrawlController is ending a crawl (for any reason).
 void crawlPaused(java.lang.String statusMessage)
          Called when a CrawlController is actually paused (all threads are idle).
 void crawlPausing(java.lang.String statusMessage)
          Called when a CrawlController is going to be paused.
 void crawlResuming(java.lang.String statusMessage)
          Called when a CrawlController is resuming a crawl that had been paused.
 void crawlStarted(java.lang.String message)
          Called on crawl start.
protected  void initialTasks()
          Classes subclassing this one should override this method to perform processor-specific actions.
protected  void innerProcess(CrawlURI curi)
          Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
 
Methods inherited from class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
finalTasks, initStore
 
Methods inherited from class org.archive.crawler.processor.recrawl.PersistProcessor
copyPersistSourceToHistoryMap, historyDatabaseConfig, main, persistKeyFor, populatePersistEnv, setupCopyEnvironment, setupCopyEnvironment, shouldLoad, shouldStore
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PersistStoreProcessor

public PersistStoreProcessor(java.lang.String name)
Usual constructor.

Parameters:
name - Name of this processor.
Method Detail

initialTasks

protected void initialTasks()
Description copied from class: Processor
Classes subclassing this one should override this method to perform processor-specific actions.

This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.

Overrides:
initialTasks in class PersistOnlineProcessor

innerProcess

protected void innerProcess(CrawlURI curi)
                     throws java.lang.InterruptedException
Description copied from class: Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.

Overrides:
innerProcess in class Processor
Parameters:
curi - The CrawlURI being processed.
Throws:
java.lang.InterruptedException
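
The relationship between the two hooks described above (one-time setup before any URI, then per-URI work) can be sketched as follows. This is a minimal illustration under stated assumptions: ProcessorSketch, RecordingProcessor, and runAll are hypothetical names, URIs are plain strings rather than CrawlURI objects, and the driver loop only stands in for whatever the crawl framework actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Processor base class; not the Heritrix API.
abstract class ProcessorSketch {
    // Hook: one-time setup, run before any URI is processed.
    protected void initialTasks() {}

    // Hook: per-URI work supplied by the subclass.
    protected abstract void innerProcess(String uri) throws InterruptedException;

    // Hypothetical driver: run setup once, then hand each URI to the per-URI hook.
    final void runAll(List<String> uris) throws InterruptedException {
        initialTasks();
        for (String uri : uris) {
            innerProcess(uri);
        }
    }
}

class RecordingProcessor extends ProcessorSketch {
    final List<String> events = new ArrayList<>();

    @Override
    protected void initialTasks() {
        events.add("init");
    }

    @Override
    protected void innerProcess(String uri) {
        events.add("process " + uri);
    }

    public static void main(String[] args) throws InterruptedException {
        RecordingProcessor p = new RecordingProcessor();
        p.runAll(Arrays.asList("http://example.org/a", "http://example.org/b"));
        System.out.println(p.events);
    }
}
```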

crawlCheckpoint

public void crawlCheckpoint(java.io.File checkpointDir)
                     throws java.lang.Exception
Description copied from interface: CrawlStatusListener
Called by CrawlController when checkpointing.

Specified by:
crawlCheckpoint in interface CrawlStatusListener
Parameters:
checkpointDir - Checkpoint dir. Write checkpoint state here.
Throws:
java.lang.Exception - A fatal exception. Any exceptions that are let out of this checkpoint are assumed fatal and terminate further checkpoint processing.
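
A minimal sketch of the checkpoint contract described above: write state into the supplied directory and let any exception propagate, since the caller treats it as fatal. The interface CheckpointListenerSketch, the class CountingCheckpointer, and the file name stored-count.txt are hypothetical; what PersistStoreProcessor itself writes at checkpoint time is not stated here.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for the checkpoint callback of CrawlStatusListener.
interface CheckpointListenerSketch {
    void crawlCheckpoint(File checkpointDir) throws Exception;
}

class CountingCheckpointer implements CheckpointListenerSketch {
    private long storedCount;

    // Count one stored record (illustrative state worth checkpointing).
    void recordStore() {
        storedCount++;
    }

    @Override
    public void crawlCheckpoint(File checkpointDir) throws Exception {
        // Write state inside the supplied directory. An IOException is not caught
        // here: it propagates to the caller, which treats it as fatal.
        File stateFile = new File(checkpointDir, "stored-count.txt");
        Files.write(stateFile.toPath(),
                Long.toString(storedCount).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        File dir = Files.createTempDirectory("checkpoint-sketch").toFile();
        CountingCheckpointer c = new CountingCheckpointer();
        c.recordStore();
        c.crawlCheckpoint(dir);
        System.out.println("Wrote checkpoint state to " + dir);
    }
}
```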

crawlEnded

public void crawlEnded(java.lang.String sExitMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController has ended a crawl and is about to exit.

Specified by:
crawlEnded in interface CrawlStatusListener
Parameters:
sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also:
CrawlJob

crawlEnding

public void crawlEnding(java.lang.String sExitMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is ending a crawl (for any reason).

Specified by:
crawlEnding in interface CrawlStatusListener
Parameters:
sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also:
CrawlJob

crawlPaused

public void crawlPaused(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is actually paused (all threads are idle).

Specified by:
crawlPaused in interface CrawlStatusListener
Parameters:
statusMessage - Should be CrawlJob.STATUS_PAUSED. Passed for convenience.

crawlPausing

public void crawlPausing(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is going to be paused.

Specified by:
crawlPausing in interface CrawlStatusListener
Parameters:
statusMessage - Should be STATUS_WAITING_FOR_PAUSE. Passed for convenience.

crawlResuming

public void crawlResuming(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is resuming a crawl that had been paused.

Specified by:
crawlResuming in interface CrawlStatusListener
Parameters:
statusMessage - Should be CrawlJob.STATUS_RUNNING. Passed for convenience.

crawlStarted

public void crawlStarted(java.lang.String message)
Description copied from interface: CrawlStatusListener
Called on crawl start.

Specified by:
crawlStarted in interface CrawlStatusListener
Parameters:
message - Start message.
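
To show how the status callbacks above might be received, here is a small sketch of a listener that records each notification. StatusListenerSketch is a hypothetical, trimmed-down stand-in for CrawlStatusListener (the checkpoint callback is omitted), the message strings are placeholders rather than real CrawlJob constant values, and the call order in main is only what the method descriptions suggest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for CrawlStatusListener; checkpoint callback omitted.
interface StatusListenerSketch {
    void crawlStarted(String message);
    void crawlPausing(String statusMessage);
    void crawlPaused(String statusMessage);
    void crawlResuming(String statusMessage);
    void crawlEnding(String sExitMessage);
    void crawlEnded(String sExitMessage);
}

// Records every notification so the sequence can be inspected afterwards.
class LoggingListener implements StatusListenerSketch {
    final List<String> log = new ArrayList<>();

    @Override public void crawlStarted(String message) { log.add("started: " + message); }
    @Override public void crawlPausing(String statusMessage) { log.add("pausing: " + statusMessage); }
    @Override public void crawlPaused(String statusMessage) { log.add("paused: " + statusMessage); }
    @Override public void crawlResuming(String statusMessage) { log.add("resuming: " + statusMessage); }
    @Override public void crawlEnding(String sExitMessage) { log.add("ending: " + sExitMessage); }
    @Override public void crawlEnded(String sExitMessage) { log.add("ended: " + sExitMessage); }

    public static void main(String[] args) {
        LoggingListener listener = new LoggingListener();
        // Placeholder messages; a real controller would pass CrawlJob status strings.
        listener.crawlStarted("start");
        listener.crawlPausing("waiting-for-pause");
        listener.crawlPaused("paused");
        listener.crawlResuming("running");
        listener.crawlEnding("finished");
        listener.crawlEnded("finished");
        for (String entry : listener.log) {
            System.out.println(entry);
        }
    }
}
```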


Copyright © 2003-2011 Internet Archive. All Rights Reserved.