public interface StatisticsTracking
An interface for objects that want to collect statistics on running crawls. An implementation of this is referenced in the crawl order and loaded when the crawl begins.
It will be given a reference to the relevant CrawlController. The CrawlController will contain any additional configuration information needed.
Any class that implements this interface can be specified as a statistics tracker in a crawl order. The CrawlController will then create and initialize a copy of it and call its start() method.
This interface also specifies several methods to access data that the CrawlController or the URIFrontier may be interested in at run time but do not want to have to keep track of themselves. AbstractTracker implements these. If more than one StatisticsTracking class is defined in the crawl order, only the first one will be used to access this data.
It is recommended that an implementation register for CrawlStatus events and CrawlURIDisposition events to be able to properly monitor a crawl. Both are registered with the CrawlController.
See Also:
AbstractTracker, CrawlStatusListener, CrawlURIDispositionListener, CrawlController
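The lifecycle described above (the tracker is created, initialized with a controller reference, started, and then queried at run time) can be sketched roughly as follows. This is a minimal illustration only: `MiniStatisticsTracking` is a hypothetical, trimmed-down stand-in for this interface, the controller is passed as a plain `Object` instead of a real CrawlController, and `onUriSucceeded()` is an invented hook standing in for a CrawlURIDisposition event callback. The sketch also ignores paused time, which the real crawlDuration() excludes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the real interface; it mirrors only a few of
// the methods listed on this page so that the sketch compiles on its own.
interface MiniStatisticsTracking extends Runnable {
    void initialize(Object controller); // the real method takes a CrawlController
    void noteStart();
    long crawlDuration();
    long successfullyFetchedCount();
}

class CountingTracker implements MiniStatisticsTracking {
    private Object controller;              // kept for later configuration lookups
    private long startMillis = -1;          // -1 means timing has not started
    private final AtomicLong fetched = new AtomicLong();

    @Override
    public void initialize(Object controller) {
        this.controller = controller;
    }

    @Override
    public void noteStart() {
        startMillis = System.currentTimeMillis();
    }

    @Override
    public long crawlDuration() {
        // Simplification: pauses are not subtracted in this sketch.
        return startMillis < 0 ? 0 : System.currentTimeMillis() - startMillis;
    }

    @Override
    public long successfullyFetchedCount() {
        return fetched.get();
    }

    // Invented hook: a real tracker would update its counters from
    // listener callbacks registered with the controller.
    void onUriSucceeded() {
        fetched.incrementAndGet();
    }

    @Override
    public void run() {
        // A real tracker would take periodic snapshots or write log lines here.
    }
}
```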
Field Summary | |
---|---|
static java.lang.String | SEED_DISPOSITION_DISREGARD - Seed was disregarded |
static java.lang.String | SEED_DISPOSITION_FAILURE - Failed to crawl seed |
static java.lang.String | SEED_DISPOSITION_NOT_PROCESSED - Seed has not been processed |
static java.lang.String | SEED_DISPOSITION_RETRY - Failed to crawl seed, will retry |
static java.lang.String | SEED_DISPOSITION_SUCCESS - Seed successfully crawled |
Method Summary | |
---|---|
int | activeThreadCount() - Get the number of active (non-paused) threads. |
long | averageDepth() |
float | congestionRatio() |
long | crawlDuration() - Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began. |
double | currentProcessedDocsPerSec() - Returns an estimate of recent document download rates based on a queue of recently seen CrawlURIs (as of last snapshot). |
int | currentProcessedKBPerSec() - Calculates an estimate of the rate, in KB, at which documents are currently being processed by the crawler. |
long | deepestUri() |
long | getCrawlerTotalElapsedTime() - Total amount of time spent actively crawling so far. |
java.util.Map | getProgressStatistics() |
java.lang.String | getProgressStatisticsLine() |
java.util.Iterator | getSeedRecordsSortedByStatusCode() - Get a SeedRecord iterator for the job being monitored. |
void | initialize(CrawlController c) - Do initialization. |
void | noteStart() - Start the tracker's crawl timing. |
double | processedDocsPerSec() - Returns the number of documents that have been processed per second over the life of the crawl (as of last snapshot). |
long | processedKBPerSec() - Calculates the rate at which data, in KB, has been processed over the life of the crawl (as of last snapshot). |
java.lang.String | progressStatisticsLegend() |
long | successfullyFetchedCount() - Number of successfully processed URIs. |
long | totalBytesCrawled() - Returns the total number of uncompressed bytes crawled. |
long | totalBytesWritten() - Deprecated. Misnomer; use totalBytesCrawled() instead. |
long | totalCount() |
Methods inherited from interface java.lang.Runnable |
---|
run |
Field Detail |
---|
static final java.lang.String SEED_DISPOSITION_SUCCESS
static final java.lang.String SEED_DISPOSITION_FAILURE
static final java.lang.String SEED_DISPOSITION_RETRY
static final java.lang.String SEED_DISPOSITION_DISREGARD
static final java.lang.String SEED_DISPOSITION_NOT_PROCESSED
Method Detail |
---|
void initialize(CrawlController c) throws FatalConfigurationException
Parameters:
c - The CrawlController running the crawl that this class is to gather statistics on.
Throws:
FatalConfigurationException
long crawlDuration()
void noteStart()
long totalBytesWritten()
long totalBytesCrawled()
long getCrawlerTotalElapsedTime()
Returns the total amount of time (in milliseconds) that has elapsed from the start of the crawl until the current time or, if the crawl has ended, until the end of the crawl, minus any time spent paused.
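The elapsed-time rule described here (end point minus start point, minus paused time) can be sketched as below. This is an illustrative sketch, not the actual implementation: the `CrawlClock` class and its method names are hypothetical, and timestamps are passed in explicitly (in milliseconds) so the arithmetic is easy to follow.

```java
// Hypothetical helper that tracks start, end, and pause intervals.
class CrawlClock {
    private long startMillis = -1;       // -1 means "not started"
    private long endMillis = -1;         // -1 means "still running"
    private long pauseStartMillis = -1;  // -1 means "not currently paused"
    private long totalPausedMillis = 0;  // sum of all completed pauses

    void start(long now) {
        startMillis = now;
    }

    void pause(long now) {
        if (pauseStartMillis < 0) {
            pauseStartMillis = now;
        }
    }

    void resume(long now) {
        if (pauseStartMillis >= 0) {
            totalPausedMillis += now - pauseStartMillis;
            pauseStartMillis = -1;
        }
    }

    void end(long now) {
        resume(now); // close any pause still open when the crawl ends
        endMillis = now;
    }

    // Time from start until `now` (or until the end, if the crawl has
    // ended), minus all time spent paused.
    long elapsed(long now) {
        if (startMillis < 0) {
            return 0;
        }
        long until = endMillis >= 0 ? endMillis : now;
        long paused = totalPausedMillis;
        if (pauseStartMillis >= 0) {
            paused += until - pauseStartMillis; // pause still in progress
        }
        return until - startMillis - paused;
    }
}
```

For example, a crawl started at t=1000 that was paused from t=3000 to t=4000 has an elapsed time of 4000 ms when queried at t=6000.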
double currentProcessedDocsPerSec()
double processedDocsPerSec()
long processedKBPerSec()
int currentProcessedKBPerSec()
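The four rate methods above distinguish lifetime averages from "current" estimates. The sketch below shows one plausible way to compute each. It rests on assumptions not stated on this page: the current rate is derived from the difference between two consecutive snapshots, and one KB is taken to be 1024 bytes. The class and method names are hypothetical.

```java
// Hypothetical rate helpers; not part of the interface documented here.
class RateSketch {

    // Lifetime average: documents processed divided by seconds elapsed.
    static double lifetimeDocsPerSec(long docs, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return docs / (elapsedMillis / 1000.0);
    }

    // Recent estimate (assumption): change in document count between two
    // snapshots divided by the seconds between them.
    static double currentDocsPerSec(long prevDocs, long prevMillis,
                                    long curDocs, long curMillis) {
        long deltaMillis = curMillis - prevMillis;
        if (deltaMillis <= 0) {
            return 0.0;
        }
        return (curDocs - prevDocs) / (deltaMillis / 1000.0);
    }

    // Lifetime data rate in whole KB per whole second (integer division),
    // assuming 1 KB = 1024 bytes.
    static long lifetimeKBPerSec(long bytes, long elapsedMillis) {
        if (elapsedMillis < 1000) {
            return 0; // less than one full second has elapsed
        }
        return (bytes / 1024) / (elapsedMillis / 1000);
    }
}
```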
int activeThreadCount()
long successfullyFetchedCount()
If the crawl is not running (paused or stopped), this will return the value of the last snapshot.
See Also:
Frontier.succeededFetchCount()
long totalCount()
float congestionRatio()
long deepestUri()
long averageDepth()
java.util.Iterator getSeedRecordsSortedByStatusCode()
Sort order is:
No status code (not processed)
Status codes smaller than 0 (largest to smallest)
Status codes larger than 0 (largest to smallest)
Note: This iterator will iterate over a list of SeedRecords.
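The sort order above can be expressed as a two-level comparison: first by group (not processed, negative, positive), then in descending order within a group. The following sketch sorts bare integer status codes, not SeedRecord objects, and assumes that a status code of 0 means "no status code"; both the class name and that convention are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the documented ordering.
class SeedSortSketch {

    // Group rank: 0 = not processed (assumed code 0), 1 = negative, 2 = positive.
    static int group(int code) {
        if (code == 0) {
            return 0;
        }
        return code < 0 ? 1 : 2;
    }

    // Order by group first, then largest to smallest within a group.
    static int compare(int a, int b) {
        int groupA = group(a);
        int groupB = group(b);
        if (groupA != groupB) {
            return Integer.compare(groupA, groupB);
        }
        return Integer.compare(b, a); // descending
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<Integer> sorted(List<Integer> codes) {
        List<Integer> copy = new ArrayList<>(codes);
        copy.sort(SeedSortSketch::compare);
        return copy;
    }

    public static void main(String[] args) {
        // prints [0, 0, -1, -5, 404, 200]
        System.out.println(sorted(Arrays.asList(200, -1, 0, 404, -5, 0)));
    }
}
```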
java.lang.String progressStatisticsLegend()
java.lang.String getProgressStatisticsLine()
java.util.Map getProgressStatistics()