org.archive.crawler.event
Interface CrawlURIDispositionListener

All Known Implementing Classes:
DomainSensitiveFrontier, SelfTestCrawlJobHandler, StatisticsTracker

public interface CrawlURIDispositionListener

An interface for objects that want to be notified of a CrawlURI's disposition, which is determined each time a CrawlURI has passed through the processors. Classes implementing this interface can register with the CrawlController to receive these events.

This interface is intended to facilitate gathering statistics on a running crawl.

WARNING: One of these methods is called for each CrawlURI that is processed. It is therefore imperative that the methods execute quickly!

Also note that the object implementing this interface must under no circumstances maintain a reference to the CrawlURI beyond the scope of the relevant method body!
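A minimal sketch of an implementation that respects both warnings above: each callback performs a single lock-free increment and never stores the CrawlURI it receives. To keep the example compilable on its own, CrawlURI and this interface are replaced by minimal local stand-ins (assumptions, not the Heritrix classes), and DispositionCounter is a hypothetical name. Registration with the CrawlController is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for Heritrix's CrawlURI (assumption: reduced to a URI string).
class CrawlURI {
    private final String uri;
    CrawlURI(String uri) { this.uri = uri; }
    String getURIString() { return uri; }
}

// Local copy of the interface so the sketch compiles without Heritrix.
interface CrawlURIDispositionListener {
    void crawledURISuccessful(CrawlURI curi);
    void crawledURINeedRetry(CrawlURI curi);
    void crawledURIDisregard(CrawlURI curi);
    void crawledURIFailure(CrawlURI curi);
}

// Hypothetical listener: constant-time, thread-safe counters.
// The curi argument is only received, never stored.
class DispositionCounter implements CrawlURIDispositionListener {
    private final AtomicLong successes = new AtomicLong();
    private final AtomicLong retries = new AtomicLong();
    private final AtomicLong disregarded = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();

    @Override public void crawledURISuccessful(CrawlURI curi) { successes.incrementAndGet(); }
    @Override public void crawledURINeedRetry(CrawlURI curi) { retries.incrementAndGet(); }
    @Override public void crawledURIDisregard(CrawlURI curi) { disregarded.incrementAndGet(); }
    @Override public void crawledURIFailure(CrawlURI curi) { failures.incrementAndGet(); }

    long getSuccessCount() { return successes.get(); }
    long getRetryCount() { return retries.get(); }
    long getDisregardCount() { return disregarded.get(); }
    long getFailureCount() { return failures.get(); }
}
```

AtomicLong is used in case the callbacks arrive from several crawler worker threads concurrently (an assumption about the threading model); a synchronized method would also be correct but adds contention on a path that runs once per URI.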

Author:
Kristinn Sigurdsson
See Also:
CrawlController
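The register-and-notify arrangement described above is an instance of the observer pattern. The sketch below illustrates how a controller might fan a single disposition out to every registered listener, invoking exactly one of the four callbacks per listener. DispositionDispatcher, Disposition, register and fire are hypothetical names, not the CrawlController API, and CrawlURI and the interface are again minimal local stand-ins.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for Heritrix's CrawlURI (assumption).
class CrawlURI {
    private final String uri;
    CrawlURI(String uri) { this.uri = uri; }
    @Override public String toString() { return uri; }
}

// Local copy of the interface so the sketch compiles without Heritrix.
interface CrawlURIDispositionListener {
    void crawledURISuccessful(CrawlURI curi);
    void crawledURINeedRetry(CrawlURI curi);
    void crawledURIDisregard(CrawlURI curi);
    void crawledURIFailure(CrawlURI curi);
}

// Hypothetical outcome of one pass through the processors.
enum Disposition { SUCCESS, RETRY, DISREGARD, FAILURE }

// Hypothetical dispatcher, not Heritrix's CrawlController code.
class DispositionDispatcher {
    // Copy-on-write: registration is rare, notification happens once per URI.
    private final List<CrawlURIDispositionListener> listeners = new CopyOnWriteArrayList<>();

    void register(CrawlURIDispositionListener listener) { listeners.add(listener); }

    // Calls exactly one callback on every registered listener.
    void fire(CrawlURI curi, Disposition disposition) {
        for (CrawlURIDispositionListener l : listeners) {
            switch (disposition) {
                case SUCCESS:   l.crawledURISuccessful(curi); break;
                case RETRY:     l.crawledURINeedRetry(curi);  break;
                case DISREGARD: l.crawledURIDisregard(curi);  break;
                case FAILURE:   l.crawledURIFailure(curi);    break;
            }
        }
    }
}
```

CopyOnWriteArrayList is a design choice for the sketch: iteration never blocks on a concurrent registration, which suits a notification path that runs for every processed URI. It also shows why slow listeners hurt: fire does not return until every listener has finished.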

Method Summary
 void crawledURIDisregard(CrawlURI curi)
          Notification of a crawled URI that is to be disregarded.
 void crawledURIFailure(CrawlURI curi)
          Notification of a failed crawl of a URI.
 void crawledURINeedRetry(CrawlURI curi)
          Notification of a failed crawl of a URI that will be retried (failure due to possible transient problems).
 void crawledURISuccessful(CrawlURI curi)
          Notification of a successfully crawled URI.
 

Method Detail

crawledURISuccessful

void crawledURISuccessful(CrawlURI curi)
Notification of a successfully crawled URI.

Parameters:
curi - The relevant CrawlURI

crawledURINeedRetry

void crawledURINeedRetry(CrawlURI curi)
Notification of a failed crawl of a URI that will be retried (failure due to possible transient problems).

Parameters:
curi - The relevant CrawlURI

crawledURIDisregard

void crawledURIDisregard(CrawlURI curi)
Notification of a crawled URI that is to be disregarded. Usually this means that the robots.txt file for the relevant site forbids crawling it, and we are therefore not going to keep it. Other reasons may apply. In all cases this means that the URI was successfully downloaded but will not be stored.

Parameters:
curi - The relevant CrawlURI

crawledURIFailure

void crawledURIFailure(CrawlURI curi)
Notification of a failed crawl of a URI. The failure is of a type that precludes retries, either by its very nature or because the URI has already been retried too many times.

Parameters:
curi - The relevant CrawlURI
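Because a URI reported through crawledURINeedRetry may later be reported again through any of the four methods, a statistics listener that wants final outcomes should treat retry notifications as provisional. A minimal sketch, assuming that a retried URI is reported again under the same URI string; CrawlURI and the interface are local stand-ins and OutcomeTracker is a hypothetical name. In keeping with the warning at the top of this page, it keys on the URI string and never retains the CrawlURI object.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for Heritrix's CrawlURI (assumption: reduced to a URI string).
class CrawlURI {
    private final String uri;
    CrawlURI(String uri) { this.uri = uri; }
    String getURIString() { return uri; }
}

// Local copy of the interface so the sketch compiles without Heritrix.
interface CrawlURIDispositionListener {
    void crawledURISuccessful(CrawlURI curi);
    void crawledURINeedRetry(CrawlURI curi);
    void crawledURIDisregard(CrawlURI curi);
    void crawledURIFailure(CrawlURI curi);
}

// Hypothetical tracker separating provisional (retry) from final outcomes.
class OutcomeTracker implements CrawlURIDispositionListener {
    // URI strings currently awaiting a retry; Strings only, never CrawlURI objects.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final AtomicLong retryEvents = new AtomicLong();
    private final AtomicLong finalFailures = new AtomicLong();

    @Override public void crawledURINeedRetry(CrawlURI curi) {
        retryEvents.incrementAndGet();
        pending.add(curi.getURIString()); // a second retry of the same URI is a no-op here
    }

    // Any of the other three callbacks is a final outcome, so the URI leaves the pending set.
    @Override public void crawledURISuccessful(CrawlURI curi) { pending.remove(curi.getURIString()); }
    @Override public void crawledURIDisregard(CrawlURI curi) { pending.remove(curi.getURIString()); }
    @Override public void crawledURIFailure(CrawlURI curi) {
        finalFailures.incrementAndGet();
        pending.remove(curi.getURIString());
    }

    int getPendingRetryCount() { return pending.size(); }
    long getRetryEventCount() { return retryEvents.get(); }
    long getFinalFailureCount() { return finalFailures.get(); }
}
```

The pending set only holds URIs that are between a retry and their final outcome, so its size stays bounded by the number of outstanding retries rather than growing with the whole crawl.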


Copyright © 2003-2011 Internet Archive. All Rights Reserved.