org.archive.crawler.datamodel
Class CrawlSubstats

java.lang.Object
  extended by org.archive.crawler.datamodel.CrawlSubstats
All Implemented Interfaces:
java.io.Serializable, FetchStatusCodes

public class CrawlSubstats
extends java.lang.Object
implements java.io.Serializable, FetchStatusCodes

Collector of statistics for a 'subset' of a crawl, such as a server (host:port), a host, or a frontier group (e.g. a queue).
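As a rough illustration of the "one collector per subset" idea, the sketch below keeps a separate counter object for each subset key, such as a host name or a host:port pair. The names SubsetCounters, SubsetRegistry, record and forKey are hypothetical and are not part of Heritrix; the real class tracks many more tallies than the two shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-subset collector; NOT the Heritrix class. */
class SubsetCounters {
    long urls;   // number of URLs recorded for this subset
    long bytes;  // total bytes recorded for this subset

    /** Records one URL of the given size against this subset. */
    void record(long size) {
        urls++;
        bytes += size;
    }
}

/** Hypothetical registry holding one SubsetCounters per subset key. */
class SubsetRegistry {
    private final Map<String, SubsetCounters> bySubset = new HashMap<>();

    /** Returns the counters for the key, creating them on first use. */
    SubsetCounters forKey(String key) {
        return bySubset.computeIfAbsent(key, k -> new SubsetCounters());
    }
}
```

The same key always maps to the same counter object, so statistics for a host, a server, or a queue accumulate independently of each other.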

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
static interface CrawlSubstats.HasCrawlSubstats
           
static class CrawlSubstats.Stage
           
 
Field Summary
(package private)  long dupByHashBytes
           
(package private)  long dupByHashUrls
           
(package private)  long fetchDisregards
           
(package private)  long fetchFailures
           
(package private)  long fetchNonResponses
           
(package private)  long fetchResponses
           
(package private)  long fetchSuccesses
           
(package private)  long notModifiedBytes
           
(package private)  long notModifiedUrls
           
(package private)  long novelBytes
           
(package private)  long novelUrls
           
(package private)  long robotsDenials
           
(package private)  long successBytes
           
(package private)  long totalBytes
           
(package private)  long totalScheduled
           
 
Fields inherited from interface org.archive.crawler.datamodel.FetchStatusCodes
S_BLOCKED_BY_CUSTOM_PROCESSOR, S_BLOCKED_BY_QUOTA, S_BLOCKED_BY_RUNTIME_LIMIT, S_BLOCKED_BY_USER, S_CONNECT_FAILED, S_CONNECT_LOST, S_DEEMED_CHAFF, S_DEEMED_NOT_FOUND, S_DEFERRED, S_DELETED_BY_USER, S_DNS_SUCCESS, S_DOMAIN_PREREQUISITE_FAILURE, S_DOMAIN_UNRESOLVABLE, S_GETBYNAME_SUCCESS, S_OTHER_PREREQUISITE_FAILURE, S_OUT_OF_SCOPE, S_PREREQUISITE_UNSCHEDULABLE_FAILURE, S_PROCESSING_THREAD_KILLED, S_ROBOTS_PRECLUDED, S_ROBOTS_PREREQUISITE_FAILURE, S_RUNTIME_EXCEPTION, S_SERIOUS_ERROR, S_TIMEOUT, S_TOO_MANY_EMBED_HOPS, S_TOO_MANY_LINK_HOPS, S_TOO_MANY_RETRIES, S_UNATTEMPTED, S_UNFETCHABLE_URI, S_UNQUEUEABLE
 
Constructor Summary
CrawlSubstats()
           
 
Method Summary
 long getDupByHashBytes()
           
 long getDupByHashUrls()
           
 long getFetchDisregards()
           
 long getFetchNonResponses()
           
 long getFetchResponses()
           
 long getFetchSuccesses()
           
 long getNotModifiedBytes()
           
 long getNotModifiedUrls()
           
 long getNovelBytes()
           
 long getNovelUrls()
           
 long getRecordedFinishes()
           
 long getRemaining()
           
 long getRobotsDenials()
           
 long getSuccessBytes()
           
 long getTotalBytes()
           
 long getTotalScheduled()
           
 void tally(CrawlURI curi, CrawlSubstats.Stage stage)
          Examines the CrawlURI and, based on its status and internal values, updates the tallies.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

totalScheduled

long totalScheduled

fetchSuccesses

long fetchSuccesses

fetchFailures

long fetchFailures

fetchDisregards

long fetchDisregards

fetchResponses

long fetchResponses

robotsDenials

long robotsDenials

successBytes

long successBytes

totalBytes

long totalBytes

fetchNonResponses

long fetchNonResponses

novelBytes

long novelBytes

novelUrls

long novelUrls

notModifiedBytes

long notModifiedBytes

notModifiedUrls

long notModifiedUrls

dupByHashBytes

long dupByHashBytes

dupByHashUrls

long dupByHashUrls
Constructor Detail

CrawlSubstats

public CrawlSubstats()
Method Detail

tally

public void tally(CrawlURI curi,
                  CrawlSubstats.Stage stage)
Examines the CrawlURI and, based on its status and internal values, updates the tallies.

Parameters:
curi - the CrawlURI to examine
stage - the CrawlSubstats.Stage at which the tally is taken
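
The following is a minimal sketch of how a stage-driven tally could work, not the actual Heritrix implementation. UriOutcome, SketchStage and SubstatsSketch are hypothetical stand-ins, and the stage names are assumptions, since this page does not list the values of CrawlSubstats.Stage or describe which counters each stage updates.

```java
/** Hypothetical stand-in for a crawled URI; the real CrawlURI carries far more state. */
class UriOutcome {
    final long contentSize;

    UriOutcome(long contentSize) {
        this.contentSize = contentSize;
    }
}

/** Assumed stage names, for illustration only. */
enum SketchStage { SCHEDULED, SUCCEEDED, FAILED, DISREGARDED }

/** Simplified tally holder; field names mirror a few of the fields listed on this page. */
class SubstatsSketch {
    long totalScheduled;
    long fetchSuccesses;
    long fetchFailures;
    long fetchDisregards;
    long successBytes;

    /** Updates the counters that correspond to the given stage (assumed mapping). */
    void tally(UriOutcome uri, SketchStage stage) {
        switch (stage) {
            case SCHEDULED:
                totalScheduled++;
                break;
            case SUCCEEDED:
                fetchSuccesses++;
                successBytes += uri.contentSize;
                break;
            case FAILED:
                fetchFailures++;
                break;
            case DISREGARDED:
                fetchDisregards++;
                break;
        }
    }
}
```

In this sketch a caller would invoke tally once when a URI is scheduled and once more when its processing finishes, so that each counter reflects the number of URIs that reached the corresponding stage.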

getFetchSuccesses

public long getFetchSuccesses()

getFetchResponses

public long getFetchResponses()

getSuccessBytes

public long getSuccessBytes()

getTotalBytes

public long getTotalBytes()

getFetchNonResponses

public long getFetchNonResponses()

getTotalScheduled

public long getTotalScheduled()

getFetchDisregards

public long getFetchDisregards()

getRobotsDenials

public long getRobotsDenials()

getRemaining

public long getRemaining()

getRecordedFinishes

public long getRecordedFinishes()

getNovelBytes

public long getNovelBytes()

getNovelUrls

public long getNovelUrls()

getNotModifiedBytes

public long getNotModifiedBytes()

getNotModifiedUrls

public long getNotModifiedUrls()

getDupByHashBytes

public long getDupByHashBytes()

getDupByHashUrls

public long getDupByHashUrls()


Copyright © 2003-2011 Internet Archive. All Rights Reserved.