Uses of Class
org.archive.crawler.deciderules.PredicatedDecideRule

Packages that use PredicatedDecideRule
org.archive.crawler.deciderules Provides classes for a simple decision rules framework. 
org.archive.crawler.deciderules.recrawl   
 

Uses of PredicatedDecideRule in org.archive.crawler.deciderules
 

Subclasses of PredicatedDecideRule in org.archive.crawler.deciderules
 class AddRedirectFromRootServerToScope
           
 class ClassKeyMatchesRegExpDecideRule
          Rule applies configured decision to any CrawlURI whose class key matches the supplied regexp.
 class ContentTypeMatchesRegExpDecideRule
          DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression.
 class ContentTypeNotMatchesRegExpDecideRule
          DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression.
 class ExceedsDocumentLengthTresholdDecideRule
           
 class ExternalGeoLocationDecideRule
          A rule that can be configured to take alternate implementations of the ExternalGeoLocationInterface.
 class ExternalImplDecideRule
          A rule that can be configured to take alternate implementations of the ExternalImplInterface.
 class FetchStatusDecideRule
          Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting.
 class FetchStatusMatchesRegExpDecideRule
           
 class FetchStatusNotMatchesRegExpDecideRule
           
 class HasViaDecideRule
          Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was discovered via another URI, as opposed to a seed or some kinds of mid-crawl adds).
 class HopsPathMatchesRegExpDecideRule
          Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp.
 class IsCrossTopmostAssignedSurtHopDecideRule
          Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars (AKA its 'topmost assigned SURT' or 'public suffix').
 class MatchesFilePatternDecideRule
          Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches.
 class MatchesListRegExpDecideRule
          Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps.
 class MatchesRegExpDecideRule
          Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp.
 class NotExceedsDocumentLengthTresholdDecideRule
           
 class NotMatchesFilePatternDecideRule
          Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp.
 class NotMatchesListRegExpDecideRule
          Rule applies configured decision to any URIs which do *not* match the supplied regexps.
 class NotMatchesRegExpDecideRule
          Rule applies configured decision to any URIs which do *not* match the supplied regexp.
 class NotOnDomainsDecideRule
          Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set.
 class NotOnHostsDecideRule
          Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set.
 class NotSurtPrefixedDecideRule
          Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set.
 class OnDomainsDecideRule
          Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set.
 class OnHostsDecideRule
          Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set.
 class PathologicalPathDecideRule
          Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (e.g. http://example.com/a/a/a/boo.html contains 3 consecutive '/a' segments).
 class QueueOverbudgetDecideRule
          Applies configured decision to every candidate URI that would overbudget its queue.
 class ScopePlusOneDecideRule
          Rule allows one level of discovery beyond configured scope.
 class SurtPrefixedDecideRule
          Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set.
 class TooManyHopsDecideRule
          Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold.
 class TooManyPathSegmentsDecideRule
          Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold.
 class TransclusionDecideRule
          Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI.getPathFromSeed()) ends with at least one, but not more than the given number of, non-navlink (non-'L') hops.
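
The subclasses listed above share one pattern: each evaluates a predicate against a URI and, when the predicate holds, applies its configured decision. The following is a minimal, self-contained sketch of that pattern, not the Heritrix implementation. The class PredicatedRuleSketch, the Decision enum (including the PASS fallback), and all method names are hypothetical stand-ins. The predicates follow only the summaries above: hop count as the length of the hops-path string, path segments as '/' characters after the first '//', and transclusion as a bounded run of trailing non-'L' hops.

```java
import java.util.function.Predicate;

public class PredicatedRuleSketch {
    // Hypothetical decision values; PASS as the "no opinion" fallback is an assumption.
    enum Decision { ACCEPT, REJECT, PASS }

    // Apply the configured decision only when the predicate holds.
    static Decision decide(Predicate<String> predicate, Decision configured, String input) {
        return predicate.test(input) ? configured : Decision.PASS;
    }

    // Count '/' characters, not including the first "//" (as in the
    // TooManyPathSegmentsDecideRule summary).
    static int countPathSegments(String uri) {
        int schemeSlashes = uri.indexOf("//");
        int from = (schemeSlashes >= 0) ? schemeSlashes + 2 : 0;
        int count = 0;
        for (int i = from; i < uri.length(); i++) {
            if (uri.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }

    // Number of hops at the end of the hops-path that are not navlinks ('L').
    static int trailingNonNavHops(String hopsPath) {
        int n = 0;
        for (int i = hopsPath.length() - 1; i >= 0 && hopsPath.charAt(i) != 'L'; i--) {
            n++;
        }
        return n;
    }

    // At least one, but not more than maxTransHops, trailing non-'L' hops.
    static boolean isTransclusion(String hopsPath, int maxTransHops) {
        int n = trailingNonNavHops(hopsPath);
        return n >= 1 && n <= maxTransHops;
    }

    public static void main(String[] args) {
        // A too-many-hops style rule: REJECT when the hops-path is longer than 3.
        Predicate<String> tooManyHops = hopsPath -> hopsPath.length() > 3;
        System.out.println(decide(tooManyHops, Decision.REJECT, "LLXE"));
        System.out.println(decide(tooManyHops, Decision.REJECT, "LL"));
        System.out.println(countPathSegments("http://example.com/a/a/a/boo.html"));
        System.out.println(isTransclusion("LLXE", 2));
    }
}
```

Keeping the predicate separate from the decision mirrors the pairing visible in the list, where a rule and its Not... counterpart differ only in the test they apply.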
 

Uses of PredicatedDecideRule in org.archive.crawler.deciderules.recrawl
 

Subclasses of PredicatedDecideRule in org.archive.crawler.deciderules.recrawl
 class IdenticalDigestDecideRule
          Rule applies configured decision to any CrawlURIs whose prior-history content-digest matches the latest fetch.
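
The digest comparison described for IdenticalDigestDecideRule can be illustrated with a short sketch. It assumes the fetch history is available as a list of content-digest strings ordered newest first. That representation, the class name IdenticalDigestSketch, and the method name are hypothetical and are not taken from the Heritrix API.

```java
import java.util.Arrays;
import java.util.List;

public class IdenticalDigestSketch {
    // digests is assumed to be ordered newest first: index 0 is the latest
    // fetch, index 1 is the fetch before it.
    static boolean hasIdenticalDigest(List<String> digests) {
        if (digests == null || digests.size() < 2) {
            return false; // no prior history to compare against
        }
        String latest = digests.get(0);
        String prior = digests.get(1);
        return latest != null && latest.equals(prior);
    }

    public static void main(String[] args) {
        // Placeholder digest strings, for illustration only.
        System.out.println(hasIdenticalDigest(Arrays.asList("sha1:AAA", "sha1:AAA")));
        System.out.println(hasIdenticalDigest(Arrays.asList("sha1:AAA", "sha1:BBB")));
    }
}
```

A URI with no earlier fetch never satisfies the predicate in this sketch, so a rule built on it would leave first-time fetches untouched.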
 



Copyright © 2003-2011 Internet Archive. All Rights Reserved.