Packages that use DecideRule | |
---|---|
org.archive.crawler.deciderules | Provides classes for a simple decision rules framework. |
org.archive.crawler.deciderules.recrawl | |
org.archive.crawler.fetcher | |
org.archive.crawler.framework | |
org.archive.crawler.postprocessor | |
org.archive.crawler.processor | |
Uses of DecideRule in org.archive.crawler.deciderules |
---|
Subclasses of DecideRule in org.archive.crawler.deciderules | |
---|---|
class |
AcceptDecideRule
Rule which responds ACCEPT to anything passed in. |
class |
AddRedirectFromRootServerToScope
|
class |
BeanShellDecideRule
Rule which runs a BeanShell script to make its decision. |
class |
ClassKeyMatchesRegExpDecideRule
Rule applies configured decision to any CrawlURI whose class key matches the supplied regexp. |
class |
ConfiguredDecideRule
Rule which can be configured to ACCEPT or REJECT at operator's option. |
class |
ContentTypeMatchesRegExpDecideRule
DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression. |
class |
ContentTypeNotMatchesRegExpDecideRule
DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression. |
class |
DecideRuleSequence
RuleSequence represents a series of Rules, which are applied in turn to give the final result. |
class |
ExceedsDocumentLengthTresholdDecideRule
|
class |
ExternalGeoLocationDecideRule
A rule that can be configured to take alternate implementations of the ExternalGeoLocationInterface. |
class |
ExternalImplDecideRule
A rule that can be configured to take alternate implementations of the ExternalImplInterface. |
class |
FetchStatusDecideRule
Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting. |
class |
FetchStatusMatchesRegExpDecideRule
|
class |
FetchStatusNotMatchesRegExpDecideRule
|
class |
FilterDecideRule
FilterDecideRule wraps a legacy Filter for use in DecideRule contexts. |
class |
HasViaDecideRule
Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or some kinds of mid-crawl adds). |
class |
HopsPathMatchesRegExpDecideRule
Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp. |
class |
IsCrossTopmostAssignedSurtHopDecideRule
Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars (AKA its 'topmost assigned SURT' or 'public suffix'.) |
class |
MatchesFilePatternDecideRule
Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches. |
class |
MatchesListRegExpDecideRule
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps. |
class |
MatchesRegExpDecideRule
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp. |
class |
NotExceedsDocumentLengthTresholdDecideRule
|
class |
NotMatchesFilePatternDecideRule
Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp. |
class |
NotMatchesListRegExpDecideRule
Rule applies configured decision to any URIs which do *not* match the supplied regexp. |
class |
NotMatchesRegExpDecideRule
Rule applies configured decision to any URIs which do *not* match the supplied regexp. |
class |
NotOnDomainsDecideRule
Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set. |
class |
NotOnHostsDecideRule
Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set. |
class |
NotSurtPrefixedDecideRule
Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set. |
class |
OnDomainsDecideRule
Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set. |
class |
OnHostsDecideRule
Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set. |
class |
PathologicalPathDecideRule
Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (e.g. http://example.com/a/a/a/boo.html has 3 '/a' segments). |
class |
PredicatedDecideRule
Rule which applies the configured decision only if a test evaluates to true. |
class |
PrerequisiteAcceptDecideRule
Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position). |
class |
QueueOverbudgetDecideRule
Applies configured decision to every candidate URI that would overbudget its queue. |
class |
RejectDecideRule
Rule which answers REJECT to everything evaluated. |
class |
ScopePlusOneDecideRule
Rule allows one level of discovery beyond configured scope. |
class |
SeedAcceptDecideRule
Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true). |
class |
SurtPrefixedDecideRule
Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set. |
class |
TooManyHopsDecideRule
Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold. |
class |
TooManyPathSegmentsDecideRule
Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold. |
class |
TransclusionDecideRule
Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI.getPathFromSeed()) ends with at least one, but not more than, the given number of non-navlink ('L') hops. |
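The way a sequence of rules is applied in turn can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes: `Decision`, `SimpleRule`, `RuleSequence` and the two rule classes below are hypothetical stand-ins. The `PASS` outcome and the "last non-PASS decision wins" policy are assumptions of this sketch, as is the exact threshold comparison in each rule. The path-segment count follows the description above (count of '/' characters, not including the first '//').

```java
import java.util.ArrayList;
import java.util.List;

final class DecideRuleSketch {
    /** Hypothetical outcomes; PASS ("no opinion") is an assumption of this sketch. */
    enum Decision { ACCEPT, REJECT, PASS }

    /** Hypothetical, simplified rule interface; not the Heritrix DecideRule API. */
    interface SimpleRule {
        Decision decide(String uri);
    }

    /** REJECTs URIs with more than maxSegments '/' characters after the first "//". */
    static final class TooManyPathSegments implements SimpleRule {
        private final int maxSegments;
        TooManyPathSegments(int maxSegments) { this.maxSegments = maxSegments; }

        static int countSegments(String uri) {
            int scheme = uri.indexOf("//");
            int from = (scheme >= 0) ? scheme + 2 : 0;
            int count = 0;
            for (int i = from; i < uri.length(); i++) {
                if (uri.charAt(i) == '/') count++;
            }
            return count;
        }

        public Decision decide(String uri) {
            return countSegments(uri) > maxSegments ? Decision.REJECT : Decision.PASS;
        }
    }

    /** REJECTs URIs in which one path segment repeats consecutively more than maxRepetitions times. */
    static final class PathologicalPath implements SimpleRule {
        private final int maxRepetitions;
        PathologicalPath(int maxRepetitions) { this.maxRepetitions = maxRepetitions; }

        static int longestRun(String uri) {
            int scheme = uri.indexOf("//");
            String rest = (scheme >= 0) ? uri.substring(scheme + 2) : uri;
            String[] parts = rest.split("/");
            int best = 0;
            int run = 0;
            String previous = null;
            // parts[0] is the host, so path segments start at index 1.
            for (int i = 1; i < parts.length; i++) {
                run = parts[i].equals(previous) ? run + 1 : 1;
                previous = parts[i];
                best = Math.max(best, run);
            }
            return best;
        }

        public Decision decide(String uri) {
            return longestRun(uri) > maxRepetitions ? Decision.REJECT : Decision.PASS;
        }
    }

    /** Applies each rule in turn; the last non-PASS decision is the final result. */
    static final class RuleSequence implements SimpleRule {
        private final List<SimpleRule> rules = new ArrayList<>();

        RuleSequence add(SimpleRule rule) {
            rules.add(rule);
            return this;
        }

        public Decision decide(String uri) {
            Decision result = Decision.PASS;
            for (SimpleRule rule : rules) {
                Decision d = rule.decide(uri);
                if (d != Decision.PASS) result = d; // later rules override earlier ones
            }
            return result;
        }
    }

    public static void main(String[] args) {
        SimpleRule scope = new RuleSequence()
                .add(uri -> Decision.ACCEPT)       // accept everything by default
                .add(new TooManyPathSegments(5))   // then reject very deep paths
                .add(new PathologicalPath(2));     // then reject repeated segments
        System.out.println(scope.decide("http://example.com/a/b.html"));       // ACCEPT
        System.out.println(scope.decide("http://example.com/a/a/a/boo.html")); // REJECT
    }
}
```

Ordering matters in this sketch: a broad accepting rule placed first can be narrowed by the rejecting rules that follow it.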
Methods in org.archive.crawler.deciderules that return DecideRule | |
---|---|
protected DecideRule |
DecidingFilter.getDecideRule(java.lang.Object o)
|
protected DecideRule |
DecidingScope.getDecideRule(java.lang.Object o)
|
Uses of DecideRule in org.archive.crawler.deciderules.recrawl |
---|
Subclasses of DecideRule in org.archive.crawler.deciderules.recrawl | |
---|---|
class |
IdenticalDigestDecideRule
Rule applies configured decision to any CrawlURIs whose prior-history content-digest matches the latest fetch. |
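The digest comparison behind such a rule can be sketched as follows. This is an illustration only, not the Heritrix implementation: the class `IdenticalDigestSketch`, its methods, the use of SHA-1, and the string decisions ("REJECT", "PASS") are all assumptions made for the example. How the crawler actually stores and retrieves prior-history digests is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class IdenticalDigestSketch {
    /** Computes a SHA-1 digest of the content (the algorithm choice is an assumption). */
    static byte[] sha1(String content) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Returns the configured decision when the prior digest matches the latest one,
     * and "PASS" (no opinion) otherwise, including when either digest is missing.
     */
    static String decide(byte[] priorDigest, byte[] latestDigest, String configuredDecision) {
        boolean identical = priorDigest != null
                && latestDigest != null
                && MessageDigest.isEqual(priorDigest, latestDigest);
        return identical ? configuredDecision : "PASS";
    }

    public static void main(String[] args) {
        byte[] prior = sha1("<html>unchanged page</html>");
        byte[] same = sha1("<html>unchanged page</html>");
        byte[] changed = sha1("<html>updated page</html>");
        System.out.println(decide(prior, same, "REJECT"));    // REJECT
        System.out.println(decide(prior, changed, "REJECT")); // PASS
    }
}
```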
Uses of DecideRule in org.archive.crawler.fetcher |
---|
Methods in org.archive.crawler.fetcher that return DecideRule | |
---|---|
protected DecideRule |
FetchHTTP.getMidfetchRule(java.lang.Object o)
|
Uses of DecideRule in org.archive.crawler.framework |
---|
Methods in org.archive.crawler.framework that return DecideRule | |
---|---|
protected DecideRule |
Processor.getDecideRule(java.lang.Object o)
|
Methods in org.archive.crawler.framework with parameters of type DecideRule | |
---|---|
protected boolean |
Processor.rulesAccept(DecideRule rule, java.lang.Object o)
|
Uses of DecideRule in org.archive.crawler.postprocessor |
---|
Methods in org.archive.crawler.postprocessor that return DecideRule | |
---|---|
protected DecideRule |
SupplementaryLinksScoper.getLinkRules(java.lang.Object o)
|
protected DecideRule |
LinksScoper.getRejectLogRules(java.lang.Object o)
|
Uses of DecideRule in org.archive.crawler.processor |
---|
Methods in org.archive.crawler.processor that return DecideRule | |
---|---|
protected DecideRule |
CrawlMapper.getMapOutlinkDecideRule(java.lang.Object o)
|