java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.AbstractTracker
public abstract class AbstractTracker
A partial implementation of the StatisticsTracking interface.
It covers thread handling (launching, pausing, etc.), including keeping track of the total time actually spent crawling. Several methods for accessing the start time, end time, and related values are provided.
To handle the thread work, the class implements CrawlStatusListener and uses its events to pause, resume, and stop the logging of statistics. The run() method calls logActivity() at intervals specified in the crawl order.
Implementation of logActivity (the actual logging) as well as listening for CrawlURIDisposition events is not addressed.
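The interval-driven loop described above can be illustrated with a minimal, self-contained sketch. It is not the Heritrix implementation: the class name PeriodicLoggerSketch, its pause(), resume(), and stop() methods, and the Runnable standing in for the logging call are all hypothetical, and the interval is given in milliseconds for simplicity.

```java
/** Hypothetical stand-in for a tracker's logging loop; this is not the Heritrix implementation. */
final class PeriodicLoggerSketch implements Runnable {
    private final long intervalMillis;
    private final Runnable logAction;            // stands in for the actual logging call
    private volatile boolean shouldRun = true;   // cleared to stop the loop
    private volatile boolean paused = false;     // set while the crawl is paused

    PeriodicLoggerSketch(long intervalMillis, Runnable logAction) {
        this.intervalMillis = intervalMillis;
        this.logAction = logAction;
    }

    void pause()  { paused = true; }
    void resume() { paused = false; }
    void stop()   { shouldRun = false; }

    @Override
    public void run() {
        while (shouldRun) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and exit
                return;
            }
            // Re-check the flags: they may have changed while sleeping.
            if (shouldRun && !paused) {
                logAction.run();
            }
        }
    }
}
```

The flags are volatile because, in this sketch, they are written by whichever thread delivers the status events and read by the logging thread.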
See Also: StatisticsTracking, StatisticsTracker, Serialized Form

Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
ComplexType.MBeanAttributeInfoIterator |
Field Summary | |
---|---|
static java.lang.String | ATTR_STATS_INTERVAL - Attribute name for the logging interval (in seconds) setting. |
protected CrawlController | controller - A reference to the CrawlController of the crawl that we are to track statistics for. |
protected long | crawlerEndTime |
protected long | crawlerPauseStarted |
protected long | crawlerStartTime |
protected long | crawlerTotalPausedTime |
static java.lang.Integer | DEFAULT_STATISTICS_REPORT_INTERVAL - Default period between logging stat values. |
protected long | lastLogPointTime - Timestamp of when this logger last wrote something to the log. |
protected boolean | shouldrun |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
Fields inherited from interface org.archive.crawler.framework.StatisticsTracking |
---|
SEED_DISPOSITION_DISREGARD, SEED_DISPOSITION_FAILURE, SEED_DISPOSITION_NOT_PROCESSED, SEED_DISPOSITION_RETRY, SEED_DISPOSITION_SUCCESS |
Constructor Summary | |
---|---|
AbstractTracker(java.lang.String name, java.lang.String description) |
Method Summary | |
---|---|
long | crawlDuration() - Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began. |
void | crawlEnded(java.lang.String sExitMessage) - Called when a CrawlController has ended a crawl and is about to exit. |
void | crawlEnding(java.lang.String sExitMessage) - Called when a CrawlController is ending a crawl (for any reason). |
void | crawlPaused(java.lang.String statusMessage) - Called when a CrawlController is actually paused (all threads are idle). |
void | crawlPausing(java.lang.String statusMessage) - Called when a CrawlController is going to be paused. |
void | crawlResuming(java.lang.String statusMessage) - Called when a CrawlController is resuming a crawl that had been paused. |
void | crawlStarted(java.lang.String message) - Called on crawl start. |
protected void | dumpReports() - Dump reports, if any, on request or at crawl end. |
protected void | finalCleanup() - Clean up resources used, at crawl end. |
long | getCrawlEndTime() - If the crawl has ended, returns the time it ended (as given by System.currentTimeMillis() at that time). |
long | getCrawlerTotalElapsedTime() - Total amount of time spent actively crawling so far. |
long | getCrawlPauseStartedTime() - Get the time when the crawl was last paused/suspended (as given by System.currentTimeMillis() at that time). |
long | getCrawlStartTime() - Get the starting time of the crawl (as given by System.currentTimeMillis() when the crawl started). |
long | getCrawlTotalPauseTime() - Returns the number of milliseconds that the crawl spent paused or otherwise in a nonactive state. |
protected int | getLogWriteInterval() - The number of seconds to wait between writing snapshot data to the log file. |
void | initialize(CrawlController c) - Sets up the Logger (including logInterval) and registers with the CrawlController for CrawlStatus and CrawlURIDisposition events. |
protected void | logNote(java.lang.String note) |
void | noteStart() - Notify tracker that crawl has begun. |
protected void | progressStatisticsEvent(java.util.EventObject e) - A method for logging current crawler state. |
java.lang.String | progressStatisticsLegend() |
void | run() - Start thread. |
protected void | tallyCurrentPause() - For a current pause (if any), add paused time to total and reset. |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement, listUsedFiles |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Methods inherited from interface org.archive.crawler.framework.StatisticsTracking |
---|
activeThreadCount, averageDepth, congestionRatio, currentProcessedDocsPerSec, currentProcessedKBPerSec, deepestUri, getProgressStatistics, getProgressStatisticsLine, getSeedRecordsSortedByStatusCode, processedDocsPerSec, processedKBPerSec, successfullyFetchedCount, totalBytesCrawled, totalBytesWritten, totalCount |
Methods inherited from interface org.archive.crawler.event.CrawlStatusListener |
---|
crawlCheckpoint |
Field Detail |
---|
public static final java.lang.Integer DEFAULT_STATISTICS_REPORT_INTERVAL
public static final java.lang.String ATTR_STATS_INTERVAL
protected transient CrawlController controller
protected long crawlerStartTime
protected long crawlerEndTime
protected long crawlerPauseStarted
protected long crawlerTotalPausedTime
protected long lastLogPointTime
protected volatile boolean shouldrun
Constructor Detail |
---|
public AbstractTracker(java.lang.String name, java.lang.String description)
Parameters: name, description

Method Detail |
---|
public void initialize(CrawlController c) throws FatalConfigurationException
Specified by: initialize in interface StatisticsTracking
Parameters: c - A crawl controller instance.
Throws: FatalConfigurationException - Not thrown here. Declared for overrides that go to the settings system for configuration.
See Also: CrawlStatusListener, CrawlURIDispositionListener
public void run()
Specified by: run in interface java.lang.Runnable
public java.lang.String progressStatisticsLegend()
Specified by: progressStatisticsLegend in interface StatisticsTracking
public void noteStart()
Specified by: noteStart in interface StatisticsTracking
protected void progressStatisticsEvent(java.util.EventObject e)
Calls CrawlController.logProgressStatistics(java.lang.String) so that CrawlController can act on the progress statistics event.
Implementations of this method should carefully consider whether it needs to be synchronized, in whole or in part.
Parameters: e - Progress statistics event.
public long getCrawlStartTime()
Get the starting time of the crawl (as given by System.currentTimeMillis() when the crawl started).
public long getCrawlEndTime()
If the crawl has ended, returns the time it ended (as given by System.currentTimeMillis() at that time). Otherwise, returns System.currentTimeMillis() at the time of the call.
public long getCrawlTotalPauseTime()
public long getCrawlPauseStartedTime()
Get the time when the crawl was last paused/suspended (as given by System.currentTimeMillis() at that time). Will be 0 if the crawl is not currently paused.
public long getCrawlerTotalElapsedTime()
Description copied from interface: StatisticsTracking
Returns the total amount of time (in milliseconds) that has elapsed from the start of the crawl until the current time or, if the crawl has ended, until the end of the crawl, minus any time spent paused.
Specified by: getCrawlerTotalElapsedTime in interface StatisticsTracking
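The elapsed-time arithmetic described above can be sketched as follows. This is an illustrative model only, not the Heritrix source: the class CrawlClockSketch and its methods are hypothetical, the field names merely mirror the documented ones, timestamps are passed in explicitly instead of being read from System.currentTimeMillis(), and the use of -1 as a "not ended" marker is an assumption.

```java
/** Hypothetical model of pause-aware crawl timing; field names mirror the documented ones. */
final class CrawlClockSketch {
    private long crawlerStartTime;
    private long crawlerEndTime = -1;          // assumption: -1 means "not ended yet"
    private long crawlerPauseStarted = 0;      // 0 means "not currently paused", as documented
    private long crawlerTotalPausedTime = 0;

    void start(long now)  { crawlerStartTime = now; }
    void pause(long now)  { if (crawlerPauseStarted == 0) crawlerPauseStarted = now; }
    void resume(long now) { tallyCurrentPause(now); }
    void end(long now)    { tallyCurrentPause(now); crawlerEndTime = now; }

    /** For a current pause (if any), add the paused time to the total and reset the marker. */
    private void tallyCurrentPause(long now) {
        if (crawlerPauseStarted > 0) {
            crawlerTotalPausedTime += now - crawlerPauseStarted;
            crawlerPauseStarted = 0;
        }
    }

    long totalPaused() { return crawlerTotalPausedTime; }

    /** Time from start to 'now' (or to the end, if ended), minus all time spent paused. */
    long totalElapsed(long now) {
        long end = (crawlerEndTime >= 0) ? crawlerEndTime : now;
        long paused = crawlerTotalPausedTime;
        if (crawlerPauseStarted > 0) {
            paused += end - crawlerPauseStarted; // include a pause that is still in progress
        }
        return end - crawlerStartTime - paused;
    }
}
```

For example, with a start at 1000, a pause from 1500 to 2000, and an end at 3000, the sketch reports 3000 - 1000 - 500 = 1500 milliseconds of active crawling, regardless of when it is queried afterwards.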
protected int getLogWriteInterval()
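As a rough illustration of how a seconds-based logging interval with a default might be resolved and converted for a sleep or wait call, consider the sketch below. Everything in it is hypothetical: the class LogIntervalSketch, the settings map, and in particular the placeholder attribute name and default value, neither of which is given on this page.

```java
import java.util.Map;

/** Hypothetical helper showing seconds-to-milliseconds interval resolution (not Heritrix code). */
final class LogIntervalSketch {
    // Placeholder values: the real attribute name and default are not stated on this page.
    static final String ATTR_STATS_INTERVAL = "stats-interval-placeholder";
    static final Integer DEFAULT_STATISTICS_REPORT_INTERVAL = 30;

    /** Returns the configured interval in seconds, or the default if unset or not positive. */
    static int logWriteIntervalSeconds(Map<String, Integer> settings) {
        Integer configured = settings.get(ATTR_STATS_INTERVAL);
        if (configured == null || configured <= 0) {
            return DEFAULT_STATISTICS_REPORT_INTERVAL;
        }
        return configured;
    }

    /** Converts the interval to the milliseconds a sleep or wait call would need. */
    static long logWriteIntervalMillis(Map<String, Integer> settings) {
        return logWriteIntervalSeconds(settings) * 1000L;
    }
}
```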
public void crawlPausing(java.lang.String statusMessage)
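Description copied from interface: CrawlStatusListener
Called when a CrawlController is going to be paused.
Specified by: crawlPausing in interface CrawlStatusListener
Parameters: statusMessage - Should be STATUS_WAITING_FOR_PAUSE. Passed for convenience.
See Also: CrawlStatusListener.crawlPausing(java.lang.String)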
protected void logNote(java.lang.String note)
public void crawlPaused(java.lang.String statusMessage)
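Description copied from interface: CrawlStatusListener
Called when a CrawlController is actually paused (all threads are idle).
Specified by: crawlPaused in interface CrawlStatusListener
Parameters: statusMessage - Should be CrawlJob.STATUS_PAUSED. Passed for convenience.
public void crawlResuming(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is resuming a crawl that had been paused.
Specified by: crawlResuming in interface CrawlStatusListener
Parameters: statusMessage - Should be CrawlJob.STATUS_RUNNING. Passed for convenience.
protected void tallyCurrentPause()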
public void crawlEnding(java.lang.String sExitMessage)
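Description copied from interface: CrawlStatusListener
Called when a CrawlController is ending a crawl (for any reason).
Specified by: crawlEnding in interface CrawlStatusListener
Parameters: sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also: CrawlJob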
public void crawlEnded(java.lang.String sExitMessage)
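Description copied from interface: CrawlStatusListener
Called when a CrawlController has ended a crawl and is about to exit.
Specified by: crawlEnded in interface CrawlStatusListener
Parameters: sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also: CrawlStatusListener.crawlEnded(java.lang.String)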
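The way the status callbacks could drive a tracker's logging state is sketched below. This is a hedged illustration, not the Heritrix code: the interface StatusListenerSketch only mirrors the documented callback names, LoggingStateSketch is hypothetical, and the choice to stop logging in crawlEnded (and not in crawlEnding) is an assumption.

```java
/** Hypothetical listener interface; method names mirror the documented callbacks. */
interface StatusListenerSketch {
    void crawlStarted(String message);
    void crawlPausing(String statusMessage);
    void crawlPaused(String statusMessage);
    void crawlResuming(String statusMessage);
    void crawlEnding(String exitMessage);
    void crawlEnded(String exitMessage);
}

/** Tracks whether statistics logging should be active, based on the events received. */
final class LoggingStateSketch implements StatusListenerSketch {
    private boolean paused = false;
    private boolean shouldRun = true;
    private final StringBuilder trace = new StringBuilder();

    private void note(String event) {
        if (trace.length() > 0) trace.append(" > ");
        trace.append(event);
    }

    public void crawlStarted(String message)        { note("started"); }
    // A pause was requested, but worker threads may still be busy, so keep logging.
    public void crawlPausing(String statusMessage)  { note("pausing"); }
    public void crawlPaused(String statusMessage)   { paused = true; note("paused"); }
    public void crawlResuming(String statusMessage) { paused = false; note("resuming"); }
    public void crawlEnding(String exitMessage)     { note("ending"); }
    // Assumption: logging stops only once the crawl has actually ended.
    public void crawlEnded(String exitMessage)      { shouldRun = false; note("ended"); }

    boolean isLoggingActive() { return shouldRun && !paused; }
    String trace() { return trace.toString(); }
}
```

In this sketch the status strings are ignored; a real listener would receive the STATUS constants mentioned in the parameter descriptions.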
public void crawlStarted(java.lang.String message)
Description copied from interface: CrawlStatusListener
Called on crawl start.
Specified by: crawlStarted in interface CrawlStatusListener
Parameters: message - Start message.
protected void dumpReports()
protected void finalCleanup()
public long crawlDuration()
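Description copied from interface: StatisticsTracking
Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began.
Specified by: crawlDuration in interface StatisticsTracking
See Also: StatisticsTracking.crawlDuration()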