Interface Summary | |
---|---|
AdaptiveRevisitAttributeConstants | Defines static constants for the Adaptive Revisiting module defining data keys in the CrawlURI AList. |
FrontierJournal | Record of key Frontier happenings. |
Class Summary | |
---|---|
AbstractFrontier | Shared facilities for Frontier implementations. |
AdaptiveRevisitFrontier | A Frontier that will repeatedly visit all encountered URIs. |
AdaptiveRevisitHostQueue | A priority-based queue of CrawlURIs. |
AdaptiveRevisitQueueList | Maintains an ordered list of AdaptiveRevisitHostQueues used by a Frontier. |
AntiCalendarCostAssignmentPolicy | CostAssignmentPolicy that further penalizes URIs with calendar-suggestive strings in them, with an extra unit of cost. |
BdbFrontier | A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs. |
BdbMultipleWorkQueues | A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs. |
BdbWorkQueue | One independent queue of items with the same 'classKey' (e.g. host). |
BucketQueueAssignmentPolicy | Uses the target IPs as basis for queue-assignment, distributing them over a fixed number of sub-queues. |
CostAssignmentPolicy | Calculates an integer 'cost' value for the given CrawlURI. |
DomainSensitiveFrontier | Deprecated. As of release 1.10.0. |
HostnameQueueAssignmentPolicy | QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI. |
IPQueueAssignmentPolicy | Uses target IP as basis for queue-assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy. |
QueueAssignmentPolicy | Establishes a mapping from CrawlURIs to String keys (queue names). |
RecoveryJournal | Helper class for managing a simple Frontier change-events journal which is useful for recovering from crawl problems. |
RecyclingSerialBinding | A SerialBinding that recycles a single FastOutputStream per thread, avoiding reallocation of the internal buffer for either repeated serializations or because of mid-serialization expansions. |
SurtAuthorityQueueAssignmentPolicy | QueueAssignmentPolicy based on the SURT form of the hostname. |
TopmostAssignedSurtQueueAssignmentPolicy | Create a queueKey based on the SURT authority, reduced to the public-suffix-plus-one domain (topmost assignable domain). |
UnitCostAssignmentPolicy | A CostAssignmentPolicy that uses a constant value of 1 for all CrawlURIs. |
WagCostAssignmentPolicy | A CostAssignmentPolicy based on some wild guesses of kinds of URIs that should be deferred into the (potentially never-crawled) future. |
WorkQueue | A single queue of related URIs to visit, grouped by a classKey (typically "hostname:port" or similar). |
WorkQueueFrontier | A common Frontier base using several queues to hold pending URIs. |
ZeroCostAssignmentPolicy | CostAssignmentPolicy considering all URIs costless -- essentially disabling budgeting features. |
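
The queue-assignment entries above describe a mapping from a URI to a String key (a queue name), such as "hostname:port", optionally spread over a fixed number of sub-queues. The following is a minimal standalone sketch of that idea, not the package's actual code. The class `QueueKeySketch`, the methods `hostPortKey` and `bucketFor`, the default-port handling, and the hash-based bucketing are all assumptions made for illustration.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical illustration of queue-key derivation; not part of this package. */
final class QueueKeySketch {
    private QueueKeySketch() {}

    /**
     * Builds a "hostname:port" key for a URI string. Filling in the default
     * port when none is given is an assumption of this sketch.
     */
    static String hostPortKey(String uri) {
        URI u = URI.create(uri);
        String host = u.getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in URI: " + uri);
        }
        int port = u.getPort();
        if (port == -1) {
            // Assumed defaults: 443 for https, 80 for everything else.
            port = "https".equalsIgnoreCase(u.getScheme()) ? 443 : 80;
        }
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    /**
     * Maps a key (for example an IP address string) onto one of a fixed
     * number of sub-queues. The hashing scheme is an assumption.
     */
    static int bucketFor(String key, int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        System.out.println(hostPortKey("http://Example.com/a/b.html"));
        System.out.println(bucketFor("192.0.2.10", 8));
    }
}
```

URIs that yield the same key land in the same queue, which is what lets a frontier treat each queue as an independent unit (one host, one IP bucket, and so on).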
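
The cost-assignment classes listed in the Class Summary each compute an integer cost for a URI: a constant 1, a constant 0, or a base cost plus one extra unit for calendar-suggestive URIs. A rough standalone sketch of that pattern follows. The `CostSketch` class, the `CostPolicy` interface, the wrapper approach, and the choice of the substring "calendar" as the trigger are hypothetical and are not taken from the package's source.

```java
import java.util.Locale;

/** Hypothetical illustration of integer cost policies; not part of this package. */
final class CostSketch {
    private CostSketch() {}

    /** Assumed shape of a cost policy: one integer cost per URI string. */
    interface CostPolicy {
        int costOf(String uri);
    }

    /** Constant cost of 1 for every URI. */
    static final CostPolicy UNIT = uri -> 1;

    /** Every URI is costless, so a cost budget is never consumed. */
    static final CostPolicy ZERO = uri -> 0;

    /**
     * Wraps a base policy and adds one extra unit of cost when the URI
     * contains the substring "calendar" (the trigger string is an assumption).
     */
    static CostPolicy antiCalendar(CostPolicy base) {
        return uri -> {
            int cost = base.costOf(uri);
            if (uri.toLowerCase(Locale.ROOT).contains("calendar")) {
                cost++;
            }
            return cost;
        };
    }

    public static void main(String[] args) {
        CostPolicy policy = antiCalendar(UNIT);
        System.out.println(policy.costOf("http://example.com/index.html"));
        System.out.println(policy.costOf("http://example.com/Calendar?month=3"));
    }
}
```

Under these assumptions, a frontier that charges each queue for the URIs it serves would exhaust a budget faster on calendar-style URIs, while the zero-cost policy leaves any such budget untouched.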