org.archive.crawler.processor
Class BeanShellProcessor

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.processor.BeanShellProcessor
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, FetchStatusCodes

public class BeanShellProcessor
extends Processor
implements FetchStatusCodes

A processor that runs a BeanShell script on each CrawlURI. Script source may be provided via a file local to the crawler. The script should define a one-argument method, 'run(curi)'; each processed CrawlURI is passed to this method. Other variables available to the script include 'self' (this BeanShellProcessor instance) and 'controller' (the crawl's CrawlController instance).
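
As an illustration of the script contract described above, the following sketch writes a small script file to disk. It is a minimal example under stated assumptions, not a script shipped with the crawler: the class name ScriptFileSketch, the file name example-script.bsh, and the use of 'self.sharedMap' as a place to record seen URIs are all hypothetical. Only 'run(curi)', 'self', and the public 'sharedMap' field come from this page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: holds an example BeanShell script and writes it to a file.
class ScriptFileSketch {
    // BeanShell source defining the one-argument method 'run(curi)'.
    // Recording each URI in sharedMap is an illustrative choice, not documented behavior.
    static final String SCRIPT =
        "run(curi) {\n" +
        "    // 'self' is the BeanShellProcessor instance supplied to the script\n" +
        "    self.sharedMap.put(curi.toString(), \"seen\");\n" +
        "}\n";

    // Writes the script into the given directory and returns the file's path.
    static Path writeScript(Path dir) throws IOException {
        Path file = dir.resolve("example-script.bsh"); // hypothetical file name
        Files.write(file, SCRIPT.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeScript(Files.createTempDirectory("bsh-sketch"));
        System.out.println("Wrote script to " + file);
        System.out.print(SCRIPT);
    }
}
```

The resulting file path is what one would expect to supply through the script-file setting (ATTR_SCRIPT_FILE); how that setting is edited depends on the crawl configuration and is not shown here.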

Version:
$Date: 2011-02-19 01:28:50 +0000 (Sat, 19 Feb 2011) $, $Revision: 7090 $
Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_ISOLATE_THREADS
          whether each thread should have its own script runner (true), or they should share a single script runner with synchronized access
static java.lang.String ATTR_SCRIPT_FILE
          setting for script file
protected  bsh.Interpreter sharedInterpreter
           
 java.util.Map<java.lang.Object,java.lang.Object> sharedMap
           
protected  java.lang.ThreadLocal<bsh.Interpreter> threadInterpreter
           
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Fields inherited from interface org.archive.crawler.datamodel.FetchStatusCodes
S_BLOCKED_BY_CUSTOM_PROCESSOR, S_BLOCKED_BY_QUOTA, S_BLOCKED_BY_RUNTIME_LIMIT, S_BLOCKED_BY_USER, S_CONNECT_FAILED, S_CONNECT_LOST, S_DEEMED_CHAFF, S_DEEMED_NOT_FOUND, S_DEFERRED, S_DELETED_BY_USER, S_DNS_SUCCESS, S_DOMAIN_PREREQUISITE_FAILURE, S_DOMAIN_UNRESOLVABLE, S_GETBYNAME_SUCCESS, S_OTHER_PREREQUISITE_FAILURE, S_OUT_OF_SCOPE, S_PREREQUISITE_UNSCHEDULABLE_FAILURE, S_PROCESSING_THREAD_KILLED, S_ROBOTS_PRECLUDED, S_ROBOTS_PREREQUISITE_FAILURE, S_RUNTIME_EXCEPTION, S_SERIOUS_ERROR, S_TIMEOUT, S_TOO_MANY_EMBED_HOPS, S_TOO_MANY_LINK_HOPS, S_TOO_MANY_RETRIES, S_UNATTEMPTED, S_UNFETCHABLE_URI, S_UNQUEUEABLE
 
Constructor Summary
BeanShellProcessor(java.lang.String name)
          Constructor.
 
Method Summary
protected  bsh.Interpreter getInterpreter()
          Get the proper Interpreter instance -- either shared or local to this thread.
protected  void initialTasks()
          Classes subclassing this one should override this method to perform processor-specific actions.
protected  void innerProcess(CrawlURI curi)
          Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
 void kickUpdate()
          Set up (or reset) Interpreter variables, as appropriate based on the thread-isolation setting.
protected  bsh.Interpreter newInterpreter()
          Create a new Interpreter instance, preloaded with any supplied source code or source file and the variables 'self' (this BeanShellProcessor) and 'controller' (the CrawlController).
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_SCRIPT_FILE

public static final java.lang.String ATTR_SCRIPT_FILE
setting for script file

See Also:
Constant Field Values

ATTR_ISOLATE_THREADS

public static final java.lang.String ATTR_ISOLATE_THREADS
whether each thread should have its own script runner (true), or they should share a single script runner with synchronized access

See Also:
Constant Field Values

threadInterpreter

protected java.lang.ThreadLocal<bsh.Interpreter> threadInterpreter

sharedInterpreter

protected bsh.Interpreter sharedInterpreter

sharedMap

public java.util.Map<java.lang.Object,java.lang.Object> sharedMap
Constructor Detail

BeanShellProcessor

public BeanShellProcessor(java.lang.String name)
Constructor.

Parameters:
name - Name of this processor.
Method Detail

innerProcess

protected void innerProcess(CrawlURI curi)
Description copied from class: Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.

Overrides:
innerProcess in class Processor
Parameters:
curi - The CrawlURI being processed.

getInterpreter

protected bsh.Interpreter getInterpreter()
Get the proper Interpreter instance -- either shared or local to this thread.

Returns:
Interpreter to use
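
The choice between a shared and a thread-local instance can be sketched as follows. This is a simplified illustration, not the class's actual source: ScriptRunner is a hypothetical stand-in for bsh.Interpreter, and the class and method names here are invented for the example. It assumes that exactly one of the two holders is populated, depending on the thread-isolation setting.

```java
// Hypothetical stand-in for bsh.Interpreter; it carries no behavior of its own.
class ScriptRunner {
}

class InterpreterSelectionSketch {
    private final ScriptRunner shared;                 // set only when threads are not isolated
    private final ThreadLocal<ScriptRunner> perThread; // set only when threads are isolated

    InterpreterSelectionSketch(boolean isolateThreads) {
        if (isolateThreads) {
            shared = null;
            // Each thread lazily creates its own runner on first access.
            perThread = ThreadLocal.withInitial(ScriptRunner::new);
        } else {
            shared = new ScriptRunner();
            perThread = null;
        }
    }

    // Returns the shared runner if there is one, otherwise this thread's runner.
    ScriptRunner getRunner() {
        return shared != null ? shared : perThread.get();
    }

    // Asks for a runner from two different threads and reports whether both got the same object.
    static boolean sameRunnerAcrossThreads(boolean isolateThreads) {
        InterpreterSelectionSketch sketch = new InterpreterSelectionSketch(isolateThreads);
        ScriptRunner[] seen = new ScriptRunner[2];
        Thread a = new Thread(() -> seen[0] = sketch.getRunner());
        Thread b = new Thread(() -> seen[1] = sketch.getRunner());
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] == seen[1];
    }

    public static void main(String[] args) {
        System.out.println("shared mode, same runner: " + sameRunnerAcrossThreads(false));  // true
        System.out.println("isolated mode, same runner: " + sameRunnerAcrossThreads(true)); // false
    }
}
```

In shared mode, callers would additionally need to synchronize on the runner while a script executes, as the ATTR_ISOLATE_THREADS description implies; that step is omitted from the sketch.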

newInterpreter

protected bsh.Interpreter newInterpreter()
Create a new Interpreter instance, preloaded with any supplied source code or source file and the variables 'self' (this BeanShellProcessor) and 'controller' (the CrawlController).

Returns:
the new Interpreter instance

initialTasks

protected void initialTasks()
Description copied from class: Processor
Classes subclassing this one should override this method to perform processor-specific actions.

This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.

Overrides:
initialTasks in class Processor

kickUpdate

public void kickUpdate()
Set up (or reset) Interpreter variables, as appropriate based on the thread-isolation setting.

Overrides:
kickUpdate in class Processor


Copyright © 2003-2011 Internet Archive. All Rights Reserved.