org.archive.crawler.settings
Class SettingsHandler

java.lang.Object
  extended by org.archive.crawler.settings.SettingsHandler
Direct Known Subclasses:
XMLSettingsHandler

public abstract class SettingsHandler
extends java.lang.Object

An instance of this class holds a hierarchy of settings. More than one instance may exist in memory at once, so that a new CrawlJob can be configured while another job is running. Subclass this class to adapt it to a particular persistent storage.

Author:
John Erik Halse

Field Summary
(package private) static java.lang.String BOOLEAN
           
(package private) static java.lang.String DOUBLE
           
(package private) static java.lang.String DOUBLE_LIST
           
(package private) static java.lang.String FLOAT
           
(package private) static java.lang.String FLOAT_LIST
           
(package private) static java.lang.String INTEGER
          Datatypes supported by the settings framework
(package private) static java.lang.String INTEGER_LIST
           
(package private) static java.lang.String LONG
           
(package private) static java.lang.String LONG_LIST
           
(package private) static java.lang.String MAP
           
(package private) static java.lang.String OBJECT
           
(package private) static java.lang.String STRING
           
(package private) static java.lang.String STRING_LIST
           
(package private) static java.lang.String TEXT
           
(package private) static java.lang.ThreadLocal<SettingsHandler> threadContextSettingsHandler
           
(package private) static java.lang.String TIMESTAMP
           
 
Constructor Summary
SettingsHandler()
          Create a new SettingsHandler object.
 
Method Summary
 void cleanup()
           
 void clearPerHostSettingsCache()
          Clear any per-host settings cached in memory; allows editing of per-host settings files on disk, perhaps in bulk/automated fashion, to take effect in a running crawl.
 void deleteSettingsObject(CrawlerSettings settings)
          Delete a settings object from persistent storage.
(package private)  boolean fireValueErrorHandlers(Constraint.FailedCheck error)
          Fire events on all registered ValueErrorHandler.
protected static java.lang.String getClassName(java.lang.String typeName)
           
 ComplexType getComplexTypeByAbsoluteName(CrawlerSettings settings, java.lang.String absoluteName)
          Get a complex type by its absolute name.
abstract  java.util.Collection getDomainOverrides(java.lang.String rootDomain)
          Will return a Collection of strings with domains that contain 'per' domain overrides (or their subdomains contain them).
abstract  java.util.List getListOfAllFiles()
          Creates and returns a List of all files comprising the current settings framework.
 ModuleType getModule(java.lang.String name)
          Get a module by name.
 CrawlerSettings getOrCreateSettingsObject(java.lang.String scope)
          Get or create CrawlerSettings object for a host or domain.
 CrawlerSettings getOrCreateSettingsObject(java.lang.String scope, java.lang.String refinement)
           
 CrawlOrder getOrder()
          Get the CrawlOrder.
protected  java.lang.String getParentScope(java.lang.String scope)
          Strip off the leftmost part of a domain name.
abstract  java.io.File getPathRelativeToWorkingDirectory(java.lang.String path)
          Transforms a relative path so that it is relative to a location that is regarded as a working dir for these settings.
 CrawlerSettings getSettings(java.lang.String host)
          Get CrawlerSettings object in effect for a host or domain.
 CrawlerSettings getSettings(java.lang.String host, UURI uuri)
          Get CrawlerSettings object in effect for a host or domain.
protected  CrawlerSettings getSettingsForHost(java.lang.String host)
           
 CrawlerSettings getSettingsObject(java.lang.String scope)
          Get CrawlerSettings object for a host or domain.
 CrawlerSettings getSettingsObject(java.lang.String scope, java.lang.String refinement)
          Get CrawlerSettings object for a host/domain and a particular refinement.
static SettingsHandler getThreadContextSettingsHandler()
           
protected static java.lang.String getTypeName(java.lang.String className)
           
 void initialize()
          Initialize the SettingsHandler.
static ModuleType instantiateModuleTypeFromClassName(java.lang.String name, java.lang.String className)
          Instantiate a new ModuleType given its name and className.
protected abstract  CrawlerSettings readSettingsObject(CrawlerSettings settings)
          Read the CrawlerSettings object from persistent storage.
 void registerValueErrorHandler(ValueErrorHandler errorHandler)
          Register an instance of ValueErrorHandler.
 void setErrorReportingLevel(java.util.logging.Level level)
          Set the level for which notification of failed constraints will be fired.
static void setThreadContextSettingsHandler(SettingsHandler settingsHandler)
           
protected static java.lang.Object StringToType(java.lang.String stringValue, java.lang.String typeName)
          Convert a String object to an object of typeName.
 void unregisterValueErrorHandler(ValueErrorHandler errorHandler)
          Unregister an instance of ValueErrorHandler.
abstract  void writeSettingsObject(CrawlerSettings settings)
          Write the CrawlerSettings object to persistent storage.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INTEGER

static final java.lang.String INTEGER
Datatypes supported by the settings framework

See Also:
Constant Field Values

LONG

static final java.lang.String LONG
See Also:
Constant Field Values

FLOAT

static final java.lang.String FLOAT
See Also:
Constant Field Values

DOUBLE

static final java.lang.String DOUBLE
See Also:
Constant Field Values

BOOLEAN

static final java.lang.String BOOLEAN
See Also:
Constant Field Values

STRING

static final java.lang.String STRING
See Also:
Constant Field Values

TEXT

static final java.lang.String TEXT
See Also:
Constant Field Values

OBJECT

static final java.lang.String OBJECT
See Also:
Constant Field Values

TIMESTAMP

static final java.lang.String TIMESTAMP
See Also:
Constant Field Values

MAP

static final java.lang.String MAP
See Also:
Constant Field Values

INTEGER_LIST

static final java.lang.String INTEGER_LIST
See Also:
Constant Field Values

LONG_LIST

static final java.lang.String LONG_LIST
See Also:
Constant Field Values

FLOAT_LIST

static final java.lang.String FLOAT_LIST
See Also:
Constant Field Values

DOUBLE_LIST

static final java.lang.String DOUBLE_LIST
See Also:
Constant Field Values

STRING_LIST

static final java.lang.String STRING_LIST
See Also:
Constant Field Values

threadContextSettingsHandler

static java.lang.ThreadLocal<SettingsHandler> threadContextSettingsHandler

Constructor Detail

SettingsHandler

public SettingsHandler()
                throws javax.management.InvalidAttributeValueException
Create a new SettingsHandler object.

Throws:
javax.management.InvalidAttributeValueException

Method Detail

initialize

public void initialize()
Initialize the SettingsHandler. This method reads the default settings from the persistent storage.


cleanup

public void cleanup()

getParentScope

protected java.lang.String getParentScope(java.lang.String scope)
Strip off the leftmost part of a domain name.

Parameters:
scope - the domain name.
Returns:
scope with everything before the first dot ripped off.
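As an illustration of the stripping rule described above, here is a minimal stand-alone sketch. The class name ParentScopeSketch and the method name parentScope are hypothetical, and returning null for a scope without a dot is an assumption of this sketch, not something this page states.

```java
// Hypothetical stand-alone sketch; not the Heritrix implementation.
final class ParentScopeSketch {

    /**
     * Returns the scope with everything up to and including the first dot removed.
     * Returning null when the scope contains no dot is an assumption of this sketch.
     */
    static String parentScope(String scope) {
        int firstDot = scope.indexOf('.');
        return firstDot == -1 ? null : scope.substring(firstDot + 1);
    }

    public static void main(String[] args) {
        System.out.println(parentScope("crawler.archive.org")); // archive.org
        System.out.println(parentScope("archive.org"));         // org
        System.out.println(parentScope("org"));                 // null
    }
}
```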

getModule

public ModuleType getModule(java.lang.String name)
Get a module by name. All modules in the order should have unique names, which makes it possible to look up any module of the order by its name.

Parameters:
name - the module's name.
Returns:
the module the name references.

getComplexTypeByAbsoluteName

public ComplexType getComplexTypeByAbsoluteName(CrawlerSettings settings,
                                                java.lang.String absoluteName)
                                         throws javax.management.AttributeNotFoundException
Get a complex type by its absolute name. The absolute name is the complex type's name together with the path leading to it.

Parameters:
settings - the settings object to query.
absoluteName - the absolute name of the complex type to get.
Returns:
the complex type referenced by the absolute name or null if the complex type could not be found in this settings object.
Throws:
javax.management.AttributeNotFoundException - is thrown if no ComplexType by this name exists.

getTypeName

protected static java.lang.String getTypeName(java.lang.String className)

getClassName

protected static java.lang.String getClassName(java.lang.String typeName)

StringToType

protected static java.lang.Object StringToType(java.lang.String stringValue,
                                               java.lang.String typeName)
Convert a String object to an object of typeName.

Parameters:
stringValue - string to convert.
typeName - type to convert to. typeName should be one of the supported types represented by constants in this class.
Returns:
the new value object.
Throws:
java.lang.ClassCastException - is thrown if string could not be converted.
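The conversion contract above can be sketched roughly as follows. This is a stand-alone illustration covering only a few scalar types. The class name StringToTypeSketch and the string values of the type-name constants are assumptions, since the actual constant values are not shown on this page.

```java
// Hypothetical stand-alone sketch; not the Heritrix implementation.
final class StringToTypeSketch {

    // Assumed type-name values; the real constant values are not listed in this document.
    static final String INTEGER = "integer";
    static final String LONG = "long";
    static final String DOUBLE = "double";
    static final String BOOLEAN = "boolean";
    static final String STRING = "string";

    /** Converts stringValue to an object of the type named by typeName. */
    static Object stringToType(String stringValue, String typeName) {
        try {
            switch (typeName) {
                case INTEGER: return Integer.valueOf(stringValue);
                case LONG:    return Long.valueOf(stringValue);
                case DOUBLE:  return Double.valueOf(stringValue);
                case BOOLEAN: return Boolean.valueOf(stringValue);
                case STRING:  return stringValue;
                default:
                    throw new ClassCastException("Unsupported type: " + typeName);
            }
        } catch (NumberFormatException e) {
            // Mirror the documented contract: a failed conversion surfaces as ClassCastException.
            throw new ClassCastException(
                    "Cannot convert '" + stringValue + "' to " + typeName);
        }
    }

    public static void main(String[] args) {
        System.out.println(stringToType("42", INTEGER));   // 42
        System.out.println(stringToType("true", BOOLEAN)); // true
    }
}
```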

getSettings

public CrawlerSettings getSettings(java.lang.String host)
Get CrawlerSettings object in effect for a host or domain. If there are no specific settings for the host/domain, it will recursively go up the hierarchy to find the settings object that should be used for this host/domain.

Parameters:
host - the host or domain to get the settings for.
Returns:
settings object in effect for the host/domain.
See Also:
getSettingsObject(String), getOrCreateSettingsObject(String)
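The difference between the effective lookup described here and an exact lookup can be sketched with a plain map. Everything in this example is hypothetical: the class SettingsLookupSketch, the use of String labels in place of CrawlerSettings objects, and the fallback to a global settings object once the hierarchy is exhausted. The sketch also uses a loop where the description says the lookup is recursive.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Heritrix implementation.
final class SettingsLookupSketch {

    // Scope name -> label standing in for a CrawlerSettings object.
    private final Map<String, String> settingsByScope = new HashMap<>();
    // Assumed fallback used when no scope in the hierarchy has settings.
    private final String globalSettings;

    SettingsLookupSketch(String globalSettings) {
        this.globalSettings = globalSettings;
    }

    void put(String scope, String settings) {
        settingsByScope.put(scope, settings);
    }

    /** Exact lookup: null when the scope has no settings of its own. */
    String getSettingsObject(String scope) {
        return settingsByScope.get(scope);
    }

    /** Effective lookup: walks up the domain hierarchy until settings are found. */
    String getSettings(String host) {
        String scope = host;
        while (scope != null) {
            String found = settingsByScope.get(scope);
            if (found != null) {
                return found;
            }
            int dot = scope.indexOf('.');
            scope = dot == -1 ? null : scope.substring(dot + 1);
        }
        return globalSettings;
    }

    public static void main(String[] args) {
        SettingsLookupSketch handler = new SettingsLookupSketch("global defaults");
        handler.put("archive.org", "archive.org overrides");
        System.out.println(handler.getSettings("crawler.archive.org"));       // archive.org overrides
        System.out.println(handler.getSettings("example.com"));               // global defaults
        System.out.println(handler.getSettingsObject("crawler.archive.org")); // null
    }
}
```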

getSettings

public CrawlerSettings getSettings(java.lang.String host,
                                   UURI uuri)
Get CrawlerSettings object in effect for a host or domain. If there are no specific settings for the host/domain, it will recursively go up the hierarchy to find the settings object that should be used for this host/domain.

This method also takes a URI that refinements are checked against.

Parameters:
host - the host or domain to get the settings for.
uuri - UURI for context.
Returns:
settings object in effect for the host/domain.
See Also:
getSettingsObject(String), getOrCreateSettingsObject(String)

getSettingsForHost

protected CrawlerSettings getSettingsForHost(java.lang.String host)

getSettingsObject

public CrawlerSettings getSettingsObject(java.lang.String scope)
Get CrawlerSettings object for a host or domain. The difference between this method and getSettings(String host) is that this method will return null if there are no settings for the particular host or domain.

Parameters:
scope - the host or domain to get the settings for.
Returns:
settings object for the host/domain or null if no settings exist for the host/domain.
See Also:
getSettings(String), getOrCreateSettingsObject(String)

getSettingsObject

public CrawlerSettings getSettingsObject(java.lang.String scope,
                                         java.lang.String refinement)
Get CrawlerSettings object for a host/domain and a particular refinement.

Parameters:
scope - the host or domain to get the settings for.
refinement - the refinement reference to get.
Returns:
CrawlerSettings object for a host/domain and a particular refinement or null if no settings exist for the host/domain.

getOrCreateSettingsObject

public CrawlerSettings getOrCreateSettingsObject(java.lang.String scope)
Get or create CrawlerSettings object for a host or domain. This method is similar to getSettingsObject(String) except that if there are no settings for this particular host or domain, a new settings object will be returned.

Parameters:
scope - the host or domain to get or create the settings for.
Returns:
settings object for the host/domain.
See Also:
getSettings(String), getSettingsObject(String)
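A minimal sketch of the get-or-create contract, contrasted with the null-returning exact lookup. The class GetOrCreateSketch and its nested Settings type are hypothetical stand-ins. Keeping the newly created object in the handler, so that later calls return the same instance, is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not the Heritrix implementation.
final class GetOrCreateSketch {

    /** Stand-in for CrawlerSettings; only remembers its scope. */
    static final class Settings {
        final String scope;

        Settings(String scope) {
            this.scope = scope;
        }
    }

    private final Map<String, Settings> byScope = new HashMap<>();

    /** Returns the settings for the scope, or null if none exist. */
    Settings getSettingsObject(String scope) {
        return byScope.get(scope);
    }

    /** Returns the settings for the scope, creating and storing them if absent (assumed). */
    Settings getOrCreateSettingsObject(String scope) {
        return byScope.computeIfAbsent(scope, Settings::new);
    }

    public static void main(String[] args) {
        GetOrCreateSketch handler = new GetOrCreateSketch();
        System.out.println(handler.getSettingsObject("archive.org") == null); // true
        Settings created = handler.getOrCreateSettingsObject("archive.org");
        System.out.println(created.scope);                                    // archive.org
        System.out.println(handler.getSettingsObject("archive.org") == created); // true
    }
}
```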

getOrCreateSettingsObject

public CrawlerSettings getOrCreateSettingsObject(java.lang.String scope,
                                                 java.lang.String refinement)

writeSettingsObject

public abstract void writeSettingsObject(CrawlerSettings settings)
Write the CrawlerSettings object to persistent storage.

Parameters:
settings - the settings object to write.

readSettingsObject

protected abstract CrawlerSettings readSettingsObject(CrawlerSettings settings)
Read the CrawlerSettings object from persistent storage.

Parameters:
settings - the settings object to be updated with data from the persistent storage.
Returns:
the updated settings object or null if there was no data for this in the persistent storage.

deleteSettingsObject

public void deleteSettingsObject(CrawlerSettings settings)
Delete a settings object from persistent storage.

Parameters:
settings - the settings object to delete.

getOrder

public CrawlOrder getOrder()
Get the CrawlOrder.

Returns:
the CrawlOrder

instantiateModuleTypeFromClassName

public static ModuleType instantiateModuleTypeFromClassName(java.lang.String name,
                                                            java.lang.String className)
                                                     throws java.lang.reflect.InvocationTargetException
Instantiate a new ModuleType given its name and className.

Parameters:
name - the name for the new ComplexType.
className - the class name of the new ComplexType.
Returns:
an instance of the class identified by className.
Throws:
java.lang.reflect.InvocationTargetException

getPathRelativeToWorkingDirectory

public abstract java.io.File getPathRelativeToWorkingDirectory(java.lang.String path)
Transforms a relative path so that it is relative to a location that is regarded as a working dir for these settings. If an absolute path is given, it will be returned unchanged.

Parameters:
path - A relative path to a file (or directory)
Returns:
The same path modified so that it is relative to the file level location that is considered the working directory for these settings.
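Because this method is abstract, each subclass decides what counts as the working directory. The following stand-alone sketch shows one plausible way to satisfy the contract with java.io.File. The class WorkingDirPathSketch and its workingDir field are hypothetical.

```java
import java.io.File;

// Hypothetical stand-alone sketch; not the Heritrix implementation.
final class WorkingDirPathSketch {

    // Assumed location regarded as the working directory for these settings.
    private final File workingDir;

    WorkingDirPathSketch(File workingDir) {
        this.workingDir = workingDir;
    }

    /** Absolute paths are returned unchanged; relative paths are resolved against workingDir. */
    File getPathRelativeToWorkingDirectory(String path) {
        File file = new File(path);
        return file.isAbsolute() ? file : new File(workingDir, path);
    }

    public static void main(String[] args) {
        WorkingDirPathSketch handler = new WorkingDirPathSketch(new File("jobs", "job1"));
        // Resolved below the working directory.
        System.out.println(handler.getPathRelativeToWorkingDirectory("seeds.txt"));
        // An absolute path comes back unchanged.
        String absolute = new File("seeds.txt").getAbsolutePath();
        System.out.println(handler.getPathRelativeToWorkingDirectory(absolute));
    }
}
```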

getDomainOverrides

public abstract java.util.Collection getDomainOverrides(java.lang.String rootDomain)
Will return a Collection of strings with domains that contain 'per' domain overrides (or their subdomains contain them). The domains considered are limited to those that are subdomains of the supplied domain. If null or empty string is supplied the TLDs will be considered.

Parameters:
rootDomain - The domain to get domain overrides for. Examples: 'org', 'archive.org', 'crawler.archive.org' etc.
Returns:
A collection of domains that contain overrides. If rootDomain does not exist, an empty collection will be returned.

unregisterValueErrorHandler

public void unregisterValueErrorHandler(ValueErrorHandler errorHandler)
Unregister an instance of ValueErrorHandler.

Parameters:
errorHandler - the ValueErrorHandler to unregister.
See Also:
ValueErrorHandler, setErrorReportingLevel(Level), registerValueErrorHandler(ValueErrorHandler)

registerValueErrorHandler

public void registerValueErrorHandler(ValueErrorHandler errorHandler)
Register an instance of ValueErrorHandler.

If a ValueErrorHandler is registered, only constraints with level Level.SEVERE will throw an InvalidAttributeValueException. The ValueErrorHandler will receive a notification for all failed checks with a level equal to or greater than the error reporting level.

Parameters:
errorHandler - the ValueErrorHandler to register.
See Also:
ValueErrorHandler, setErrorReportingLevel(Level), unregisterValueErrorHandler(ValueErrorHandler)
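The interplay between registered handlers and the error reporting level can be sketched as below. This is a stand-alone illustration: the class ValueErrorDispatchSketch, its nested Handler interface, the fire method, and the default reporting level of Level.INFO are all assumptions, and the exception-throwing side of the contract is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;

// Hypothetical stand-alone sketch; not the Heritrix implementation.
final class ValueErrorDispatchSketch {

    /** Stand-in for ValueErrorHandler. */
    interface Handler {
        void handle(Level level, String message);
    }

    private final List<Handler> handlers = new ArrayList<>();
    private Level reportingLevel = Level.INFO; // assumed default

    void register(Handler handler) {
        handlers.add(handler);
    }

    void unregister(Handler handler) {
        handlers.remove(handler);
    }

    void setErrorReportingLevel(Level level) {
        reportingLevel = level;
    }

    /**
     * Notifies every registered handler when the failed check's level is at or above
     * the reporting level. Returns true if any handler is registered.
     */
    boolean fire(Level level, String message) {
        if (level.intValue() >= reportingLevel.intValue()) {
            for (Handler handler : handlers) {
                handler.handle(level, message);
            }
        }
        return !handlers.isEmpty();
    }

    public static void main(String[] args) {
        ValueErrorDispatchSketch dispatcher = new ValueErrorDispatchSketch();
        dispatcher.setErrorReportingLevel(Level.WARNING);
        dispatcher.register((level, message) -> System.out.println(level + ": " + message));
        dispatcher.fire(Level.INFO, "below the reporting level, not delivered");
        dispatcher.fire(Level.SEVERE, "delivered to the handler");
    }
}
```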

fireValueErrorHandlers

boolean fireValueErrorHandlers(Constraint.FailedCheck error)
Fire events on all registered ValueErrorHandler.

Parameters:
error - the failed constraint's return value.
Returns:
true if there were any registered ValueErrorHandlers to notify.

setErrorReportingLevel

public void setErrorReportingLevel(java.util.logging.Level level)
Set the level for which notification of failed constraints will be fired.

Parameters:
level - the error reporting level.

getListOfAllFiles

public abstract java.util.List getListOfAllFiles()
Creates and returns a List of all files comprising the current settings framework.

The List contains the absolute String path of each file.

The list should contain any configurable files, including such files as the seed file and any other files used by the various settings modules.

Implementations of the SettingsHandler that do not use files for permanent storage should return an empty list.

Returns:
List of framework files.

clearPerHostSettingsCache

public void clearPerHostSettingsCache()
Clear any per-host settings cached in memory; allows editing of per-host settings files on disk, perhaps in bulk/automated fashion, to take effect in a running crawl.


setThreadContextSettingsHandler

public static void setThreadContextSettingsHandler(SettingsHandler settingsHandler)

getThreadContextSettingsHandler

public static SettingsHandler getThreadContextSettingsHandler()


Copyright © 2003-2011 Internet Archive. All Rights Reserved.