org.archive.crawler.datamodel
Class CrawlServer

java.lang.Object
  extended by org.archive.crawler.datamodel.CrawlServer
All Implemented Interfaces:
java.io.Serializable, CrawlSubstats.HasCrawlSubstats, FetchStatusCodes

public class CrawlServer
extends java.lang.Object
implements java.io.Serializable, CrawlSubstats.HasCrawlSubstats, FetchStatusCodes

Represents a single remote "server". A server is a service on a host. A host may run more than one service, each distinguished by its port number.

Author:
gojomo
See Also:
Serialized Form

Field Summary
protected  int consecutiveConnectionErrors
           
static long MIN_ROBOTS_RETRIES
          Only check whether the robots fetch may be superfluous after this many tries.
static long ROBOTS_NOT_FETCHED
           
(package private)  long robotsFetched
           
(package private)  java.util.zip.Checksum robotstxtChecksum
           
(package private)  CrawlSubstats substats
           
(package private)  boolean validRobots
           
 
Fields inherited from interface org.archive.crawler.datamodel.FetchStatusCodes
S_BLOCKED_BY_CUSTOM_PROCESSOR, S_BLOCKED_BY_QUOTA, S_BLOCKED_BY_RUNTIME_LIMIT, S_BLOCKED_BY_USER, S_CONNECT_FAILED, S_CONNECT_LOST, S_DEEMED_CHAFF, S_DEEMED_NOT_FOUND, S_DEFERRED, S_DELETED_BY_USER, S_DNS_SUCCESS, S_DOMAIN_PREREQUISITE_FAILURE, S_DOMAIN_UNRESOLVABLE, S_GETBYNAME_SUCCESS, S_OTHER_PREREQUISITE_FAILURE, S_OUT_OF_SCOPE, S_PREREQUISITE_UNSCHEDULABLE_FAILURE, S_PROCESSING_THREAD_KILLED, S_ROBOTS_PRECLUDED, S_ROBOTS_PREREQUISITE_FAILURE, S_RUNTIME_EXCEPTION, S_SERIOUS_ERROR, S_TIMEOUT, S_TOO_MANY_EMBED_HOPS, S_TOO_MANY_LINK_HOPS, S_TOO_MANY_RETRIES, S_UNATTEMPTED, S_UNFETCHABLE_URI, S_UNQUEUEABLE
 
Constructor Summary
CrawlServer(java.lang.String h)
          Creates a new CrawlServer object.
 
Method Summary
 void addCredentialAvatar(CredentialAvatar ca)
          Add an avatar.
 boolean equals(java.lang.Object obj)
           
 java.util.Set<CredentialAvatar> getCredentialAvatars()
           
 java.lang.String getName()
           
 int getPort()
          Get the port number for this server.
 RobotsExclusionPolicy getRobots()
          Get the robots exclusion policy for this server.
 long getRobotsFetchedTime()
           
static java.lang.String getServerKey(CandidateURI cauri)
          Get the key to use when looking up server instances.
 SettingsHandler getSettingsHandler()
          Get the settings handler.
 CrawlSubstats getSubstats()
           
 boolean hasCredentialAvatars()
           
 int hashCode()
           
 void incrementConsecutiveConnectionErrors()
           
 boolean isValidRobots()
          If true, valid robots.txt information has been retrieved.
 void resetConsecutiveConnectionErrors()
           
 void setRobots(RobotsExclusionPolicy policy)
          Set the robots exclusion policy for this server.
 void setSettingsHandler(SettingsHandler settingsHandler)
          Set the settings handler to be used by this server.
 java.lang.String toString()
           
 void updateRobots(CrawlURI curi)
          Update the robots exclusion policy.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ROBOTS_NOT_FETCHED

public static final long ROBOTS_NOT_FETCHED
See Also:
Constant Field Values

MIN_ROBOTS_RETRIES

public static final long MIN_ROBOTS_RETRIES
Only check whether the robots fetch may be superfluous after this many tries.

See Also:
Constant Field Values

robotsFetched

long robotsFetched

validRobots

boolean validRobots

robotstxtChecksum

java.util.zip.Checksum robotstxtChecksum

substats

CrawlSubstats substats

consecutiveConnectionErrors

protected int consecutiveConnectionErrors

Constructor Detail

CrawlServer

public CrawlServer(java.lang.String h)
Creates a new CrawlServer object.

Parameters:
h - the host string for the server.

Method Detail

getRobots

public RobotsExclusionPolicy getRobots()
Get the robots exclusion policy for this server.

Returns:
the robots exclusion policy for this server.

setRobots

public void setRobots(RobotsExclusionPolicy policy)
Set the robots exclusion policy for this server.

Parameters:
policy - the policy to set.

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object

updateRobots

public void updateRobots(CrawlURI curi)
Update the robots exclusion policy.

Parameters:
curi - the crawl URI containing the fetched robots.txt
Throws:
java.io.IOException

getRobotsFetchedTime

public long getRobotsFetchedTime()
Returns:
the time when robots.txt was fetched.

getName

public java.lang.String getName()
Returns:
the server string, which may include a port number.

getPort

public int getPort()
Get the port number for this server.

Returns:
the port number or -1 if not known (uses default for protocol)
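
The -1 convention can be illustrated with a minimal sketch. It assumes the server name has the form "host" or "host:port"; the class PortSketch and the helper portOf are hypothetical names for illustration and are not part of CrawlServer. IPv6 literals are not handled.

```java
class PortSketch {
    /**
     * Hypothetical helper: extracts the port from a "host[:port]" string.
     * Returns -1 when no port is present, mirroring the documented
     * "not known" value of getPort().
     */
    static int portOf(String serverName) {
        int colon = serverName.lastIndexOf(':');
        if (colon < 0) {
            return -1; // no explicit port: caller uses the protocol default
        }
        return Integer.parseInt(serverName.substring(colon + 1));
    }

    public static void main(String[] args) {
        System.out.println(portOf("example.org:8080"));
        System.out.println(portOf("example.org"));
    }
}
```

A caller that receives -1 would fall back to the default port of the protocol in use, for example 80 for HTTP.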

getSettingsHandler

public SettingsHandler getSettingsHandler()
Get the settings handler.

Returns:
the settings handler.

setSettingsHandler

public void setSettingsHandler(SettingsHandler settingsHandler)
Set the settings handler to be used by this server.

Parameters:
settingsHandler - the settings handler to be used by this server.

incrementConsecutiveConnectionErrors

public void incrementConsecutiveConnectionErrors()

resetConsecutiveConnectionErrors

public void resetConsecutiveConnectionErrors()

getCredentialAvatars

public java.util.Set<CredentialAvatar> getCredentialAvatars()
Returns:
the credential avatars for this server, or null if there are none.
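
Because the getter may return null, callers need a guard before iterating. The sketch below shows one way to do that. AvatarHolderSketch is a hypothetical stand-in for CrawlServer, String stands in for CredentialAvatar, and the lazy creation of the set is an assumption rather than something this reference states.

```java
import java.util.HashSet;
import java.util.Set;

class AvatarHolderSketch {
    // Assumption: the set stays null until the first avatar is added.
    private Set<String> avatars;

    Set<String> getCredentialAvatars() {
        return avatars; // may be null, as documented for the real getter
    }

    boolean hasCredentialAvatars() {
        return avatars != null && !avatars.isEmpty();
    }

    void addCredentialAvatar(String ca) {
        if (avatars == null) {
            avatars = new HashSet<>();
        }
        avatars.add(ca);
    }

    public static void main(String[] args) {
        AvatarHolderSketch server = new AvatarHolderSketch();
        server.addCredentialAvatar("basic-auth-avatar");
        // Check hasCredentialAvatars() first so a null set is never iterated.
        if (server.hasCredentialAvatars()) {
            for (String avatar : server.getCredentialAvatars()) {
                System.out.println(avatar);
            }
        }
    }
}
```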

hasCredentialAvatars

public boolean hasCredentialAvatars()
Returns:
True if there are avatars attached to this instance.

addCredentialAvatar

public void addCredentialAvatar(CredentialAvatar ca)
Add an avatar.

Parameters:
ca - Credential avatar to add to set of avatars.

isValidRobots

public boolean isValidRobots()
If true, valid robots.txt information has been retrieved. If false, either no attempt has been made to fetch robots.txt or the attempt failed.

Returns:
true if valid robots.txt information has been retrieved; false otherwise.

getServerKey

public static java.lang.String getServerKey(CandidateURI cauri)
                                     throws org.apache.commons.httpclient.URIException
Get the key to use when looking up server instances.

Parameters:
cauri - the CandidateURI to get the server key for.
Returns:
String to use as server key.
Throws:
org.apache.commons.httpclient.URIException
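
This reference does not state the format of the key. As an illustration only, the sketch below assumes a key of the form "host" or "host:port", in line with the class description of a server as a service on a host distinguished by port. It uses java.net.URI instead of CandidateURI, and ServerKeySketch and serverKeyFor are hypothetical names.

```java
import java.net.URI;

class ServerKeySketch {
    /**
     * Hypothetical helper: derives a lookup key from a URI string.
     * Assumed format: "host", or "host:port" when a port is given.
     */
    static String serverKeyFor(String uri) {
        URI parsed = URI.create(uri);
        String host = parsed.getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in URI: " + uri);
        }
        int port = parsed.getPort(); // -1 when the URI has no explicit port
        return port == -1 ? host : host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(serverKeyFor("http://example.org/robots.txt"));
        System.out.println(serverKeyFor("http://example.org:8080/index.html"));
    }
}
```

Under this assumption, two URIs on the same host and port map to the same key, so they would share one server instance and one robots exclusion policy.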

getSubstats

public CrawlSubstats getSubstats()
Specified by:
getSubstats in interface CrawlSubstats.HasCrawlSubstats


Copyright © 2003-2011 Internet Archive. All Rights Reserved.