org.archive.crawler.postprocessor
Class SupplementaryLinksScoper

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.framework.Scoper
                          extended by org.archive.crawler.postprocessor.SupplementaryLinksScoper
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

public class SupplementaryLinksScoper
extends Scoper

Runs the CandidateURI links carried in the passed CrawlURI through a filter and 'handles' any rejections. Use it for supplementary processing of links after LinkScoper has scope-processed them and ruled them 'in-scope'. One example of 'supplementary processing' is checking, in a multimachine crawl setting, that a link is intended for this host to crawl. Configure filters to rule on links. The default handler writes rejected URLs to disk; subclass to handle rejected URLs differently.
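
The multimachine check mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: links are plain strings rather than CandidateURI objects, hosts are assigned to crawlers by hashing the host name, and the class HostPartitionSketch with its methods ownerOf and accepts is hypothetical, not part of Heritrix.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical stand-in for a link rule; not a Heritrix class. */
final class HostPartitionSketch {
    private final int crawlerCount;
    private final int thisCrawler;

    HostPartitionSketch(int crawlerCount, int thisCrawler) {
        if (crawlerCount <= 0 || thisCrawler < 0 || thisCrawler >= crawlerCount) {
            throw new IllegalArgumentException("invalid crawler index or count");
        }
        this.crawlerCount = crawlerCount;
        this.thisCrawler = thisCrawler;
    }

    /** Maps a host name to the index of the crawler assumed to own it. */
    int ownerOf(String host) {
        // Lower-case first so that host names differing only in case agree.
        int hash = host.toLowerCase(Locale.ROOT).hashCode();
        // floorMod keeps the result in [0, crawlerCount) for negative hashes.
        return Math.floorMod(hash, crawlerCount);
    }

    /** Returns true if the link's host is assigned to this crawler. */
    boolean accepts(String link) {
        String host = URI.create(link).getHost();
        // Links without a host (for example mailto:) are rejected here.
        return host != null && ownerOf(host) == thisCrawler;
    }
}
```

With this assignment, each host-bearing link is accepted by exactly one of the crawlerCount instances, so a rejected link is one that some other machine is expected to crawl.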

Author:
stack
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_LINKS_DECIDE_RULES
           
 
Fields inherited from class org.archive.crawler.framework.Scoper
ATTR_OVERRIDE_LOGGER_ENABLED
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
SupplementaryLinksScoper(java.lang.String name)
           
 
Method Summary
protected  DecideRule getLinkRules(java.lang.Object o)
           
protected  void innerProcess(CrawlURI curi)
          Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
protected  boolean isInScope(CandidateURI caUri)
          Test whether the given CandidateURI is accepted by the crawl scope.
protected  void outOfScope(CandidateURI caUri)
          Called when a CandidateURI is ruled out of scope.
 
Methods inherited from class org.archive.crawler.framework.Scoper
finalTasks, initialTasks, isOverrideLogger
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_LINKS_DECIDE_RULES

public static final java.lang.String ATTR_LINKS_DECIDE_RULES
See Also:
Constant Field Values
Constructor Detail

SupplementaryLinksScoper

public SupplementaryLinksScoper(java.lang.String name)
Parameters:
name - Name of this filter.
Method Detail

innerProcess

protected void innerProcess(CrawlURI curi)
Description copied from class: Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.

Overrides:
innerProcess in class Processor
Parameters:
curi - The CrawlURI being processed.
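
Going by the class description, processing amounts to passing each carried link through the configured rules, keeping the accepted ones and handing the rest to a rejection handler. A minimal sketch of that loop, assuming plain strings in place of CandidateURI and a java.util.function.Predicate in place of the configured rules; LinkFilterSketch and its members are hypothetical names, not Heritrix API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-in for the processor; not a Heritrix class. */
final class LinkFilterSketch {
    private final Predicate<String> linkRules;
    private final List<String> rejected = new ArrayList<>();

    LinkFilterSketch(Predicate<String> linkRules) {
        this.linkRules = linkRules;
    }

    /** Returns the accepted links in encounter order; rejections go to the handler. */
    Set<String> process(Collection<String> outlinks) {
        Set<String> kept = new LinkedHashSet<>();
        for (String link : outlinks) {
            if (linkRules.test(link)) {
                kept.add(link);
            } else {
                handleRejected(link);
            }
        }
        return kept;
    }

    /** Stand-in rejection handler: remembers the link in memory. */
    private void handleRejected(String link) {
        rejected.add(link);
    }

    /** Links rejected so far, in the order they were seen. */
    List<String> rejected() {
        return rejected;
    }
}
```

The returned set may be empty, which in this sketch simply means that no carried link survived the supplementary rules.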

isInScope

protected boolean isInScope(CandidateURI caUri)
Description copied from class: Scoper
Test whether the given CandidateURI is accepted by the crawl scope.

Overrides:
isInScope in class Scoper
Parameters:
caUri - The CandidateURI to be tested.
Returns:
true if CandidateURI was accepted by crawl scope, false otherwise.

getLinkRules

protected DecideRule getLinkRules(java.lang.Object o)

outOfScope

protected void outOfScope(CandidateURI caUri)
Called when a CandidateURI is ruled out of scope.

Overrides:
outOfScope in class Scoper
Parameters:
caUri - CandidateURI that is out of scope.
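
Since this method is the hook a subclass overrides to treat rejected URLs differently, the pattern can be illustrated with a self-contained sketch. The classes below are hypothetical stand-ins rather than Heritrix types: links are plain strings, the rules are a java.util.function.Predicate, and the default handler appends to an in-memory log where the real default is described as writing to disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical base class; not the Heritrix Scoper. */
class ScoperSketch {
    private final Predicate<String> rules;
    final StringBuilder rejectLog = new StringBuilder();

    ScoperSketch(Predicate<String> rules) {
        this.rules = rules;
    }

    /** Applies the rules; a rejected link is passed to outOfScope. */
    final boolean isInScope(String link) {
        if (rules.test(link)) {
            return true;
        }
        outOfScope(link);
        return false;
    }

    /** Default handler: records the link (stand-in for a write to disk). */
    protected void outOfScope(String link) {
        rejectLog.append(link).append('\n');
    }
}

/** Subclass that tallies rejections by URI scheme instead of logging them. */
class TallyingScoperSketch extends ScoperSketch {
    final Map<String, Integer> rejectedByScheme = new HashMap<>();

    TallyingScoperSketch(Predicate<String> rules) {
        super(rules);
    }

    @Override
    protected void outOfScope(String link) {
        int colon = link.indexOf(':');
        String scheme = colon < 0 ? "" : link.substring(0, colon);
        rejectedByScheme.merge(scheme, 1, Integer::sum);
    }
}
```

Because the override does not call the superclass handler, the default log stays empty in this sketch; a subclass that wants both behaviours would call super.outOfScope(link) first.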


Copyright © 2003-2011 Internet Archive. All Rights Reserved.