org.archive.crawler.framework
Interface FrontierHostStatistics


public interface FrontierHostStatistics

An optional interface that Frontiers can implement to provide information about specific hosts.

Some URIFrontier implementations will want to provide statistics on the progress of particular hosts. This applies to Frontiers whose internal structure uses hosts to split up the workload and (for example) to implement politeness. Other Frontiers may also provide this information based on calculations.
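
Because the interface is optional, a caller has to check whether a given Frontier provides it before asking for host statistics. The following is a minimal sketch of that check. The two interfaces are trimmed stand-ins for the real types so the example compiles on its own, and HostAwareFrontier, HostStatsProbe and activeHostsOrMinusOne are hypothetical names introduced only for illustration.

```java
// Stand-ins for the real org.archive.crawler.framework types, trimmed so the
// sketch compiles on its own.
interface Frontier {}

interface FrontierHostStatistics {
    int activeHosts();
}

// Hypothetical frontier that chooses to expose host statistics.
class HostAwareFrontier implements Frontier, FrontierHostStatistics {
    @Override
    public int activeHosts() {
        return 3; // fixed value, for illustration only
    }
}

class HostStatsProbe {
    /**
     * Returns the number of active hosts, or -1 when the frontier does not
     * implement the optional statistics interface.
     */
    static int activeHostsOrMinusOne(Frontier frontier) {
        if (frontier instanceof FrontierHostStatistics) {
            return ((FrontierHostStatistics) frontier).activeHosts();
        }
        return -1;
    }
}
```

Returning -1 is only one way to signal missing statistics; a caller could equally skip the report.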

Author:
Kristinn Sigurdsson
See Also:
Frontier

Field Summary
static int HOST_DEFERRED
          Host has been deferred for some amount of time and will become ready once that time has elapsed.
static int HOST_INACTIVE
          Host has been encountered and all available URIs for it have been processed already.
static int HOST_INPROCESS
          Host has URIs currently being processed.
static int HOST_READY
          Host has URIs ready to be emitted.
static int HOST_UNKNOWN
          Host has not been encountered by the Frontier, or has been encountered but has been inactive so long that it has expired.
 
Method Summary
 int activeHosts()
          Total number of hosts that are currently active.
 int deferredHosts()
          Total number of deferred hosts.
 int hostStatus(java.lang.String host)
          Get the status of a host.
 int inactiveHosts()
          Total number of inactive hosts.
 int inProcessHosts()
          Total number of hosts with URIs in process.
 int readyHosts()
          Total number of hosts that have a URI ready for processing.
 

Field Detail

HOST_UNKNOWN

static final int HOST_UNKNOWN
Host has not been encountered by the Frontier, or has been encountered but has been inactive so long that it has expired.

See Also:
Constant Field Values

HOST_READY

static final int HOST_READY
Host has URIs ready to be emitted.

See Also:
Constant Field Values

HOST_INPROCESS

static final int HOST_INPROCESS
Host has URIs currently being processed.

See Also:
Constant Field Values

HOST_DEFERRED

static final int HOST_DEFERRED
Host has been deferred for some amount of time and will become ready once that time has elapsed. This is most likely due to politeness or waiting between retries, but other conditions may exist.

See Also:
Constant Field Values

HOST_INACTIVE

static final int HOST_INACTIVE
Host has been encountered and all available URIs for it have been processed already. More URIs may or may not become available later. Inactive hosts may eventually be 'forgotten'.

See Also:
Constant Field Values
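
As a sketch of how a caller might turn these status constants into readable text, the example below maps each one to a short label. The numeric values are placeholders assumed for illustration; the real values are listed under Constant Field Values and are not reproduced here. HostStatusLabels and describe are hypothetical names, and real code would refer to the constants through FrontierHostStatistics rather than redeclare them.

```java
class HostStatusLabels {
    // Placeholder values, assumed for illustration only.
    static final int HOST_UNKNOWN = 0;
    static final int HOST_READY = 1;
    static final int HOST_INPROCESS = 2;
    static final int HOST_DEFERRED = 3;
    static final int HOST_INACTIVE = 4;

    /** Maps a status code to a short human-readable label. */
    static String describe(int status) {
        switch (status) {
            case HOST_UNKNOWN:
                return "unknown (never seen, or expired)";
            case HOST_READY:
                return "ready (URIs waiting to be emitted)";
            case HOST_INPROCESS:
                return "in process (URIs being processed)";
            case HOST_DEFERRED:
                return "deferred (waiting for a delay to elapse)";
            case HOST_INACTIVE:
                return "inactive (all available URIs processed)";
            default:
                // Guard against codes this sketch does not know about.
                return "unrecognized status " + status;
        }
    }
}
```
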
Method Detail

activeHosts

int activeHosts()
Total number of hosts that are currently active.

Active hosts are considered to be those that are ready, deferred or in process.

Returns:
Total number of hosts that are currently active.

inactiveHosts

int inactiveHosts()
Total number of inactive hosts.

Inactive hosts are those that have been active but are now exhausted and hold no further URIs.

Returns:
Total number of inactive hosts.

deferredHosts

int deferredHosts()
Total number of deferred hosts.

Deferred hosts are currently active hosts that have been deferred from processing for the time being (because of politeness or waiting before retrying).

Returns:
Total number of deferred hosts.

inProcessHosts

int inProcessHosts()
Total number of hosts with URIs in process.

It is generally assumed that each host can have only one URI in process at a time. However, some frontiers may implement politeness differently, so that the same host is both ready and in process. activeHosts() will not count such a host twice.
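
One way to picture the no-double-counting rule is as a set union over host names. The sketch below assumes a frontier that happens to track ready, in-process and deferred hosts as three sets, which the interface does not require; ActiveHostCount and its parameters are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class ActiveHostCount {
    /**
     * Counts hosts that are ready, in process or deferred. A host that
     * appears in more than one set is counted once, because the union
     * keeps a single entry per host name.
     */
    static int activeHosts(Set<String> ready, Set<String> inProcess, Set<String> deferred) {
        Set<String> active = new HashSet<>(ready);
        active.addAll(inProcess);
        active.addAll(deferred);
        return active.size();
    }
}
```

With such a frontier, readyHosts() + inProcessHosts() + deferredHosts() can exceed activeHosts() whenever a host is both ready and in process.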

Returns:
Total number of hosts with URIs in process.

readyHosts

int readyHosts()
Total number of hosts that have a URI ready for processing.

Returns:
Total number of hosts that have a URI ready for processing.

hostStatus

int hostStatus(java.lang.String host)
Get the status of a host.

Hosts can be in one of the following states: HOST_UNKNOWN, HOST_READY, HOST_INPROCESS, HOST_DEFERRED or HOST_INACTIVE.

Some Frontiers may allow a host to have more than one URI in process at the same time. In those cases the host will be reported as Ready as long as it has more URIs ready for processing. Only once it has no more URIs available for processing will it be reported as In process.

Parameters:
host - The name of the host to look up the status for.
Returns:
The status of the specified host.
See Also:
HOST_DEFERRED, HOST_INACTIVE, HOST_INPROCESS, HOST_READY, HOST_UNKNOWN
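
The reporting rule above, where Ready takes precedence over In process, could be sketched as follows. This is an illustration under stated assumptions, not the behavior of any particular Frontier: the constant values are placeholders, the per-host counters are hypothetical inputs, and checking the deferred flag before the URI counts is an assumption the text does not state.

```java
class HostStatusResolver {
    // Placeholder values, assumed for illustration only.
    static final int HOST_UNKNOWN = 0;
    static final int HOST_READY = 1;
    static final int HOST_INPROCESS = 2;
    static final int HOST_DEFERRED = 3;
    static final int HOST_INACTIVE = 4;

    /**
     * Picks a single status for a host from hypothetical per-host counters.
     */
    static int resolve(boolean known, boolean deferred, int readyUris, int inProcessUris) {
        if (!known) {
            return HOST_UNKNOWN;
        }
        if (deferred) {
            return HOST_DEFERRED; // assumption: deferral is checked first
        }
        if (readyUris > 0) {
            // Ready wins even while other URIs of this host are in process.
            return HOST_READY;
        }
        if (inProcessUris > 0) {
            return HOST_INPROCESS;
        }
        return HOST_INACTIVE;
    }
}
```
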


Copyright © 2003-2011 Internet Archive. All Rights Reserved.