Interface Summary | |
---|---|
ExternalGeoLookupInterface | Interface used by ExternalGeoLocationDecideRule. |
ExternalImplInterface | Interface used by ExternalImplDecideRule. |
Class Summary | |
---|---|
AcceptDecideRule | Rule which responds ACCEPT to anything passed in. |
AddRedirectFromRootServerToScope | |
BeanShellDecideRule | Rule which runs a BeanShell script to make its decision. |
ClassKeyMatchesRegExpDecideRule | Rule applies configured decision to any CrawlURI class key -- i.e. |
ConfiguredDecideRule | Rule which can be configured to ACCEPT or REJECT at operator's option. |
ContentTypeMatchesRegExpDecideRule | DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression. |
ContentTypeNotMatchesRegExpDecideRule | DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression. |
DecideRule | Interface for rules which, given an object to evaluate, respond with a decision: DecideRule.ACCEPT, DecideRule.REJECT, or DecideRule.PASS. |
DecideRuleSequence | RuleSequence represents a series of Rules, which are applied in turn to give the final result. |
DecidingFilter | DecidingFilter: a classic Filter which makes its accept/reject decision based on whatever DecideRules have been set up inside it. |
DecidingScope | DecidingScope: a Scope which makes its accept/reject decision based on whatever DecideRules have been set up inside it. |
ExceedsDocumentLengthTresholdDecideRule | |
ExternalGeoLocationDecideRule | A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface. |
ExternalImplDecideRule | A rule that can be configured to take alternate implementations of the ExternalImplInterface. |
FetchStatusDecideRule | Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting. |
FetchStatusMatchesRegExpDecideRule | |
FetchStatusNotMatchesRegExpDecideRule | |
FilterDecideRule | FilterDecideRule wraps a legacy Filter for use in DecideRule contexts. |
HasViaDecideRule | Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or one of some kinds of mid-crawl adds). |
HopsPathMatchesRegExpDecideRule | Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp. |
IsCrossTopmostAssignedSurtHopDecideRule | Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars (AKA its 'topmost assigned SURT' or 'public suffix'.) |
MatchesFilePatternDecideRule | Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches. |
MatchesListRegExpDecideRule | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps. |
MatchesRegExpDecideRule | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp. |
NotExceedsDocumentLengthTresholdDecideRule | |
NotMatchesFilePatternDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp. |
NotMatchesListRegExpDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied regexp. |
NotMatchesRegExpDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied regexp. |
NotOnDomainsDecideRule | Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set. |
NotOnHostsDecideRule | Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set. |
NotSurtPrefixedDecideRule | Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set. |
OnDomainsDecideRule | Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set. |
OnHostsDecideRule | Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set. |
PathologicalPathDecideRule | Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (e.g. http://example.com/a/a/a/boo.html has 3 '/a' segments). |
PredicatedDecideRule | Rule which applies the configured decision only if a test evaluates to true. |
PrerequisiteAcceptDecideRule | Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position). |
QueueOverbudgetDecideRule | Applies configured decision to every candidate URI that would overbudget its queue. |
RejectDecideRule | Rule which answers REJECT to everything evaluated. |
ScopePlusOneDecideRule | Rule allows one level of discovery beyond configured scope (e.g. |
SeedAcceptDecideRule | Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true). |
SurtPrefixedDecideRule | Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set. |
TooManyHopsDecideRule | Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold. |
TooManyPathSegmentsDecideRule | Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold. |
TransclusionDecideRule | Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath'; see CandidateURI.getPathFromSeed()) ends with at least one, but not more than the given number of, non-navlink (non-'L') hops. |
Provides classes for a simple decision rules framework.
Each 'step' in a decision rule set which can affect an object's ultimate fate is called a DecideRule.
Each DecideRule renders a decision (possibly neutral) on the passed object's fate.
Possible decisions are ACCEPT, REJECT, and PASS; PASS is the neutral outcome, meaning the rule expresses no preference.
Each DecideRule is applied in turn; the last one to express a non-PASS preference wins.
For example, if the rules are:
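To allow this style of decision processing to be plugged into the existing Filter and Scope slots, the package provides DecidingFilter and DecidingScope, each of which makes its accept/reject decision based on the DecideRules set up inside it.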
See NewScopingModel for background.
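The "last non-PASS preference wins" evaluation described above can be illustrated with a minimal, self-contained sketch. It does not use the real classes in this package: the `Rule` interface, the `evaluate` and `exampleRules` helpers, and the use of a bare hops-path string as the evaluated object are all assumptions made for illustration. The three rules are simplified stand-ins modelled on the summaries of AcceptDecideRule, TooManyHopsDecideRule, and PrerequisiteAcceptDecideRule in the class table.

```java
import java.util.Arrays;
import java.util.List;

public class DecideRuleSketch {
    /** The three outcomes named in the package description. */
    public enum Decision { ACCEPT, REJECT, PASS }

    /** Hypothetical stand-in for a rule; not the real DecideRule signature. */
    public interface Rule {
        Decision decide(String hopsPath);
    }

    /** Applies every rule in order; the last non-PASS decision wins. */
    public static Decision evaluate(List<Rule> rules, String hopsPath) {
        Decision result = Decision.PASS;
        for (Rule rule : rules) {
            Decision d = rule.decide(hopsPath);
            if (d != Decision.PASS) {
                result = d; // a later rule overrides an earlier one
            }
        }
        return result;
    }

    /** Simplified stand-ins for three rules from the class summary. */
    public static List<Rule> exampleRules(int maxHops) {
        return Arrays.asList(
            // accept everything, establishing a default
            hops -> Decision.ACCEPT,
            // reject paths longer than maxHops, otherwise no opinion
            hops -> hops.length() > maxHops ? Decision.REJECT : Decision.PASS,
            // accept prerequisites ('P' in the last position), otherwise no opinion
            hops -> hops.endsWith("P") ? Decision.ACCEPT : Decision.PASS
        );
    }

    public static void main(String[] args) {
        List<Rule> rules = exampleRules(3);
        for (String hops : new String[] {"LL", "LLLL", "LLLP"}) {
            System.out.println(hops + " -> " + evaluate(rules, hops));
        }
    }
}
```

Under these assumptions, "LL" is accepted by the default rule, "LLLL" is rejected for exceeding three hops, and "LLLP" is first rejected for its length and then accepted again because the later prerequisite rule overrides the earlier rejection.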