org.archive.crawler.framework
Class Scoper

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.framework.Scoper
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean
Direct Known Subclasses:
LinksScoper, Preselector, SupplementaryLinksScoper

public abstract class Scoper
extends Processor

Base class for Scopers. A Scoper tests CandidateURIs against a scope and can log the CandidateURIs it rejects.

Version:
$Date: 2010-05-11 22:15:04 +0000 (Tue, 11 May 2010) $, $Revision: 6867 $
Author:
stack
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
protected static java.lang.String ATTR_OVERRIDE_LOGGER_ENABLED
          Protected so that it is available to subclasses.
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
Scoper(java.lang.String name, java.lang.String description)
          Constructor.
 
Method Summary
protected  void finalTasks()
          Classes subclassing this one should override this method to perform processor-specific actions.
protected  void initialTasks()
          Classes subclassing this one should override this method to perform processor-specific actions.
protected  boolean isInScope(CandidateURI caUri)
          Tests the given CandidateURI against the crawl scope.
protected  boolean isOverrideLogger(java.lang.Object context)
           
protected  void outOfScope(CandidateURI caUri)
          Called when a CandidateURI is ruled out of scope.
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, getController, getDecideRule, getDefaultNextProcessor, innerProcess, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_OVERRIDE_LOGGER_ENABLED

protected static final java.lang.String ATTR_OVERRIDE_LOGGER_ENABLED
Protected so that it is available to subclasses.

See Also:
Constant Field Values
Constructor Detail

Scoper

public Scoper(java.lang.String name,
              java.lang.String description)
Constructor.

Parameters:
name -
description -
Method Detail

initialTasks

protected void initialTasks()
Description copied from class: Processor
Classes subclassing this one should override this method to perform processor-specific actions.

This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.

Overrides:
initialTasks in class Processor

finalTasks

protected void finalTasks()
Description copied from class: Processor
Classes subclassing this one should override this method to perform processor-specific actions.

Overrides:
finalTasks in class Processor
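
The initialTasks and finalTasks hooks described above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes: ProcessorStub, CountingProcessor, runCrawl and the String-based innerProcess are hypothetical stand-ins, and the driver only assumes the ordering documented here (initialTasks before any URI processing) plus a final call to finalTasks once processing is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Processor; not the Heritrix class.
abstract class ProcessorStub {
    // Hook: override to set up processor-specific state.
    protected void initialTasks() {
    }

    // Hook: override to release processor-specific state.
    protected void finalTasks() {
    }

    // Per-URI work; a plain String stands in for a crawl URI here.
    protected abstract void innerProcess(String uri);

    // Hypothetical driver: set-up hook, then every URI, then tear-down hook.
    final void runCrawl(List<String> uris) {
        initialTasks();
        for (String uri : uris) {
            innerProcess(uri);
        }
        finalTasks();
    }
}

// Subclass that records the order in which the hooks fire.
class CountingProcessor extends ProcessorStub {
    final List<String> events = new ArrayList<>();

    @Override
    protected void initialTasks() {
        super.initialTasks();
        events.add("initialTasks");
    }

    @Override
    protected void innerProcess(String uri) {
        events.add("process:" + uri);
    }

    @Override
    protected void finalTasks() {
        events.add("finalTasks");
        super.finalTasks();
    }

    public static void main(String[] args) {
        CountingProcessor processor = new CountingProcessor();
        processor.runCrawl(Arrays.asList("a", "b"));
        System.out.println(processor.events);
    }
}
```

Calling the superclass hook from an override, as the sketch does, keeps any set-up performed by a parent class intact.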

isOverrideLogger

protected boolean isOverrideLogger(java.lang.Object context)
Parameters:
context - Context to use looking up attribute.
Returns:
True if the default logger (which logs to the console) is to be overridden with a logger that writes all log output to a file named for this class.
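
As a rough illustration of what overriding the default logger can look like with java.util.logging, the sketch below redirects a logger to a file named after the logger. It is not the Scoper implementation: OverrideLoggerSketch, redirectToFile and the temporary log directory are assumptions made for the example, and only standard JDK logging calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

class OverrideLoggerSketch {
    // Hypothetical helper: send everything the logger emits to
    // <logDir>/<loggerName>.log instead of the console.
    static Handler redirectToFile(Logger logger, Path logDir) throws IOException {
        Path logFile = logDir.resolve(logger.getName() + ".log");
        FileHandler handler = new FileHandler(logFile.toString());
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
        // Stop records from also reaching the parent (console) handlers.
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.INFO);
        return handler;
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Files.createTempDirectory("scoper-logs");
        Logger logger = Logger.getLogger("ScoperSketch");
        Handler handler = redirectToFile(logger, logDir);
        logger.info("http://example.org/rejected");
        handler.close(); // flush and release the log file
        byte[] bytes = Files.readAllBytes(logDir.resolve("ScoperSketch.log"));
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Disabling the parent handlers is what keeps the entries out of the console; without that call the records would be written to both places.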

isInScope

protected boolean isInScope(CandidateURI caUri)
Tests the given CandidateURI against the crawl scope.

Parameters:
caUri - The CandidateURI to test.
Returns:
true if the CandidateURI was accepted by the crawl scope, false otherwise.

outOfScope

protected void outOfScope(CandidateURI caUri)
Called when a CandidateURI is ruled out of scope. Override this method if you do not want the log entries to appear as coming from this class.

Parameters:
caUri - CandidateURI that is out of scope.
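
How isInScope and outOfScope might work together can be sketched without the Heritrix classes. Everything below is a stand-in: CandidateUriStub, ScoperStub, RecordingScoper and keepInScope are hypothetical names, the scope is modelled as a plain Predicate, and the sketch assumes that a rejected candidate is handed to outOfScope before isInScope returns false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified stand-in for CandidateURI.
final class CandidateUriStub {
    private final String uri;

    CandidateUriStub(String uri) {
        this.uri = uri;
    }

    @Override
    public String toString() {
        return uri;
    }
}

// Hypothetical stand-in for Scoper; the scope is just a predicate here.
abstract class ScoperStub {
    private final Predicate<CandidateUriStub> scope;

    ScoperStub(Predicate<CandidateUriStub> scope) {
        this.scope = scope;
    }

    // True if the scope accepts the candidate; rejections go to outOfScope.
    protected boolean isInScope(CandidateUriStub caUri) {
        boolean accepted = scope.test(caUri);
        if (!accepted) {
            outOfScope(caUri);
        }
        return accepted;
    }

    // Default rejection handling: report the candidate.
    protected void outOfScope(CandidateUriStub caUri) {
        System.out.println("Out of scope: " + caUri);
    }
}

// Subclass that collects rejected candidates instead of printing them.
class RecordingScoper extends ScoperStub {
    final List<CandidateUriStub> rejected = new ArrayList<>();

    RecordingScoper(Predicate<CandidateUriStub> scope) {
        super(scope);
    }

    @Override
    protected void outOfScope(CandidateUriStub caUri) {
        rejected.add(caUri);
    }

    // Keep only the candidates that the scope accepts.
    List<CandidateUriStub> keepInScope(List<CandidateUriStub> candidates) {
        List<CandidateUriStub> kept = new ArrayList<>();
        for (CandidateUriStub candidate : candidates) {
            if (isInScope(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        RecordingScoper scoper = new RecordingScoper(
                c -> c.toString().startsWith("http://example.org/"));
        List<CandidateUriStub> kept = scoper.keepInScope(Arrays.asList(
                new CandidateUriStub("http://example.org/a"),
                new CandidateUriStub("http://other.net/b")));
        System.out.println("kept=" + kept + " rejected=" + scoper.rejected);
    }
}
```

Overriding outOfScope, as RecordingScoper does, changes only how rejections are reported; the accept or reject decision stays in isInScope.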


Copyright © 2003-2011 Internet Archive. All Rights Reserved.