Class Summary |
CandidateURI |
A URI, discovered or passed-in, that may be scheduled. |
Checkpoint |
Record of a specific checkpoint on disk. |
CrawlHost |
Represents a single remote "host". |
CrawlOrder |
Represents the 'root' of the settings hierarchy. |
CrawlServer |
Represents a single remote "server". |
CrawlSubstats |
Collector of statistics for a 'subset' of a crawl,
such as a server (host:port), host, or frontier group
(e.g. a queue). |
CrawlURI |
Represents a candidate URI and the associated state it
collects as it is crawled. |
CredentialStore |
Front door to the credential store. |
LocalizedError |
|
RobotsDirectives |
Represents the directives that apply to a user-agent (or set of
user-agents). |
RobotsExclusionPolicy |
RobotsExclusionPolicy represents the actual policy adopted with
respect to a specific remote server, usually constructed by
consulting the robots.txt file, if any, that the server provided. |
RobotsHonoringPolicy |
RobotsHonoringPolicy represents the strategy used by the crawler
for determining how robots.txt files will be honored. |
Robotstxt |
Utility class for parsing 'robots.txt' format directives and
representing them as a list of named user-agents and a map from
user-agents to RobotsDirectives. |
ServerCache |
Server and Host cache. |
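
The Robotstxt and RobotsDirectives entries above describe a data shape: a list of named user-agents plus a map from each user-agent to the directives that apply to it, where several user-agents may share one set of directives. The following is a minimal sketch of that shape, not the actual code or API of these classes. The names `RobotsSketch`, `Directives`, `Parsed`, `parse`, and `allows` are hypothetical, and only `User-agent` and `Disallow` lines are handled, with plain prefix matching.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RobotsSketch {

    /** Hypothetical stand-in for per-agent directives: Disallow path prefixes only. */
    static final class Directives {
        final List<String> disallows = new ArrayList<>();

        /** A path is allowed unless it starts with a non-empty Disallow prefix. */
        boolean allows(String path) {
            for (String prefix : disallows) {
                if (!prefix.isEmpty() && path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Hypothetical parse result: named agents in file order, plus agent -> directives. */
    static final class Parsed {
        final List<String> userAgents = new ArrayList<>();
        final Map<String, Directives> byAgent = new LinkedHashMap<>();
    }

    static Parsed parse(String robotsTxt) {
        Parsed result = new Parsed();
        Directives current = null;
        boolean lastWasAgent = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines form one group sharing one Directives.
                if (!lastWasAgent) {
                    current = new Directives();
                }
                String agent = value.toLowerCase(Locale.ROOT);
                if (!result.byAgent.containsKey(agent)) {
                    result.userAgents.add(agent);
                }
                result.byAgent.put(agent, current);
                lastWasAgent = true;
            } else {
                if (field.equals("disallow") && current != null) {
                    current.disallows.add(value);
                }
                lastWasAgent = false;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String txt = "User-agent: a\nUser-agent: b\nDisallow: /private/\n\n"
                + "User-agent: *\nDisallow: /tmp/\n";
        Parsed parsed = parse(txt);
        System.out.println(parsed.userAgents);
        System.out.println(parsed.byAgent.get("a").allows("/private/x"));
        System.out.println(parsed.byAgent.get("*").allows("/private/x"));
    }
}
```

In this sketch, agents `a` and `b` map to the same `Directives` instance, mirroring the "user-agent (or set of user-agents)" wording in the RobotsDirectives entry. How a crawler then chooses which agent's directives to apply is a separate policy decision and is not modeled here.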