org.archive.crawler.settings
Class XMLSettingsHandler

java.lang.Object
  extended by org.archive.crawler.settings.SettingsHandler
      extended by org.archive.crawler.settings.XMLSettingsHandler

public class XMLSettingsHandler
extends SettingsHandler

A SettingsHandler which uses XML files as persistent storage.

Author:
John Erik Halse

Field Summary
protected static java.lang.String XML_ATTRIBUTE_CLASS
           
protected static java.lang.String XML_ATTRIBUTE_FROM
           
protected static java.lang.String XML_ATTRIBUTE_NAME
           
protected static java.lang.String XML_ATTRIBUTE_TO
           
protected static java.lang.String XML_ELEMENT_AUDIENCE
           
protected static java.lang.String XML_ELEMENT_CONTENTMATCHES
           
protected static java.lang.String XML_ELEMENT_CONTROLLER
           
protected static java.lang.String XML_ELEMENT_DATE
           
protected static java.lang.String XML_ELEMENT_DESCRIPTION
           
protected static java.lang.String XML_ELEMENT_LIMITS
           
protected static java.lang.String XML_ELEMENT_META
           
protected static java.lang.String XML_ELEMENT_NAME
           
protected static java.lang.String XML_ELEMENT_NEW_OBJECT
           
protected static java.lang.String XML_ELEMENT_OBJECT
           
protected static java.lang.String XML_ELEMENT_OPERATOR
           
protected static java.lang.String XML_ELEMENT_ORGANIZATION
           
protected static java.lang.String XML_ELEMENT_PORTNUMBER
           
protected static java.lang.String XML_ELEMENT_REFERENCE
           
protected static java.lang.String XML_ELEMENT_REFINEMENT
           
protected static java.lang.String XML_ELEMENT_REFINEMENTLIST
           
protected static java.lang.String XML_ELEMENT_TIMESPAN
           
protected static java.lang.String XML_ELEMENT_URIMATCHES
           
protected static java.lang.String XML_ROOT_HOST_SETTINGS
           
protected static java.lang.String XML_ROOT_ORDER
           
protected static java.lang.String XML_ROOT_REFINEMENT
           
protected static java.lang.String XML_SCHEMA
           
 
Fields inherited from class org.archive.crawler.settings.SettingsHandler
BOOLEAN, DOUBLE, DOUBLE_LIST, FLOAT, FLOAT_LIST, INTEGER, INTEGER_LIST, LONG, LONG_LIST, MAP, OBJECT, STRING, STRING_LIST, TEXT, threadContextSettingsHandler, TIMESTAMP
 
Constructor Summary
XMLSettingsHandler(java.io.File orderFile)
          Create a new XMLSettingsHandler object.
 
Method Summary
 void copySettings(java.io.File newOrderFileName, java.lang.String newSettingsDirectory)
          Creates a replica of the settings file structure in another directory (fully recursive, includes all per host settings).
 void deleteSettingsObject(CrawlerSettings settings)
          Delete a settings object from persistent storage.
 java.util.Collection getDomainOverrides(java.lang.String rootDomain)
          Returns a Collection of strings naming the domains that contain per-domain overrides (or whose subdomains contain them).
 java.util.List<java.lang.String> getListOfAllFiles()
          Creates and returns a List of all files comprising the current settings framework.
 java.io.File getOrderFile()
          Get the File object pointing to the order file.
 java.io.File getPathRelativeToWorkingDirectory(java.lang.String path)
          Transforms a relative path so that it is relative to the location of the order file.
 void initialize()
          Initialize the SettingsHandler.
 void initialize(java.io.File source)
          Initialize the SettingsHandler from a source.
protected  CrawlerSettings readSettingsObject(CrawlerSettings settings)
          Read the CrawlerSettings object from persistent storage.
protected  CrawlerSettings readSettingsObject(CrawlerSettings settings, java.io.File f)
          Read the CrawlerSettings object from a specific file.
protected  java.io.File settingsToFilename(CrawlerSettings settings)
          Resolves the filename for a settings object into a file path.
static java.lang.String toResourcePath(java.io.File f)
          Convert a File to a path that might be resolved from classpath/JAR resource sources.
 void writeSettingsObject(CrawlerSettings settings)
          Write the CrawlerSettings object to persistent storage.
 void writeSettingsObject(CrawlerSettings settings, java.io.File filename)
          Write a CrawlerSettings object to a specified file.
 
Methods inherited from class org.archive.crawler.settings.SettingsHandler
cleanup, clearPerHostSettingsCache, fireValueErrorHandlers, getClassName, getComplexTypeByAbsoluteName, getModule, getOrCreateSettingsObject, getOrCreateSettingsObject, getOrder, getParentScope, getSettings, getSettings, getSettingsForHost, getSettingsObject, getSettingsObject, getThreadContextSettingsHandler, getTypeName, instantiateModuleTypeFromClassName, registerValueErrorHandler, setErrorReportingLevel, setThreadContextSettingsHandler, StringToType, unregisterValueErrorHandler
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

XML_SCHEMA

protected static final java.lang.String XML_SCHEMA
See Also:
Constant Field Values

XML_ROOT_ORDER

protected static final java.lang.String XML_ROOT_ORDER
See Also:
Constant Field Values

XML_ROOT_HOST_SETTINGS

protected static final java.lang.String XML_ROOT_HOST_SETTINGS
See Also:
Constant Field Values

XML_ROOT_REFINEMENT

protected static final java.lang.String XML_ROOT_REFINEMENT
See Also:
Constant Field Values

XML_ELEMENT_CONTROLLER

protected static final java.lang.String XML_ELEMENT_CONTROLLER
See Also:
Constant Field Values

XML_ELEMENT_META

protected static final java.lang.String XML_ELEMENT_META
See Also:
Constant Field Values

XML_ELEMENT_NAME

protected static final java.lang.String XML_ELEMENT_NAME
See Also:
Constant Field Values

XML_ELEMENT_DESCRIPTION

protected static final java.lang.String XML_ELEMENT_DESCRIPTION
See Also:
Constant Field Values

XML_ELEMENT_OPERATOR

protected static final java.lang.String XML_ELEMENT_OPERATOR
See Also:
Constant Field Values

XML_ELEMENT_ORGANIZATION

protected static final java.lang.String XML_ELEMENT_ORGANIZATION
See Also:
Constant Field Values

XML_ELEMENT_AUDIENCE

protected static final java.lang.String XML_ELEMENT_AUDIENCE
See Also:
Constant Field Values

XML_ELEMENT_DATE

protected static final java.lang.String XML_ELEMENT_DATE
See Also:
Constant Field Values

XML_ELEMENT_REFINEMENTLIST

protected static final java.lang.String XML_ELEMENT_REFINEMENTLIST
See Also:
Constant Field Values

XML_ELEMENT_REFINEMENT

protected static final java.lang.String XML_ELEMENT_REFINEMENT
See Also:
Constant Field Values

XML_ELEMENT_REFERENCE

protected static final java.lang.String XML_ELEMENT_REFERENCE
See Also:
Constant Field Values

XML_ELEMENT_LIMITS

protected static final java.lang.String XML_ELEMENT_LIMITS
See Also:
Constant Field Values

XML_ELEMENT_TIMESPAN

protected static final java.lang.String XML_ELEMENT_TIMESPAN
See Also:
Constant Field Values

XML_ELEMENT_PORTNUMBER

protected static final java.lang.String XML_ELEMENT_PORTNUMBER
See Also:
Constant Field Values

XML_ELEMENT_URIMATCHES

protected static final java.lang.String XML_ELEMENT_URIMATCHES
See Also:
Constant Field Values

XML_ELEMENT_CONTENTMATCHES

protected static final java.lang.String XML_ELEMENT_CONTENTMATCHES
See Also:
Constant Field Values

XML_ELEMENT_OBJECT

protected static final java.lang.String XML_ELEMENT_OBJECT
See Also:
Constant Field Values

XML_ELEMENT_NEW_OBJECT

protected static final java.lang.String XML_ELEMENT_NEW_OBJECT
See Also:
Constant Field Values

XML_ATTRIBUTE_NAME

protected static final java.lang.String XML_ATTRIBUTE_NAME
See Also:
Constant Field Values

XML_ATTRIBUTE_CLASS

protected static final java.lang.String XML_ATTRIBUTE_CLASS
See Also:
Constant Field Values

XML_ATTRIBUTE_FROM

protected static final java.lang.String XML_ATTRIBUTE_FROM
See Also:
Constant Field Values

XML_ATTRIBUTE_TO

protected static final java.lang.String XML_ATTRIBUTE_TO
See Also:
Constant Field Values
Constructor Detail

XMLSettingsHandler

public XMLSettingsHandler(java.io.File orderFile)
                   throws javax.management.InvalidAttributeValueException
Create a new XMLSettingsHandler object.

Parameters:
orderFile - where the order file is located.
Throws:
javax.management.InvalidAttributeValueException
Method Detail

initialize

public void initialize()
Initialize the SettingsHandler. This method builds the settings data structure and initializes it with settings from the order file given to the constructor.

Overrides:
initialize in class SettingsHandler

initialize

public void initialize(java.io.File source)
Initialize the SettingsHandler from a source. This method builds the settings data structure and initializes it with settings from the order file given as a parameter. The intended use is to create a new order file based on a default (template) order file.

Parameters:
source - the order file to initialize from.

settingsToFilename

protected final java.io.File settingsToFilename(CrawlerSettings settings)
Resolves the filename for a settings object into a file path. It will also create the directory structure leading to this file if it doesn't exist.

Parameters:
settings - the settings object to get file path for.
Returns:
the file path for this settings object.
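
The directory-creating behaviour described above can be illustrated with a minimal, standalone sketch. It does not reproduce how XMLSettingsHandler maps a CrawlerSettings scope to a file name; the class SettingsFileResolver, the method resolve, and the relativeName parameter are hypothetical names introduced only for illustration.

```java
import java.io.File;

final class SettingsFileResolver {
    /**
     * Returns the file for relativeName under settingsDir and makes sure
     * the directories leading to it exist. The file itself is not created.
     * (Hypothetical helper; the real method takes a CrawlerSettings object.)
     */
    static File resolve(File settingsDir, String relativeName) {
        File target = new File(settingsDir, relativeName);
        File parent = target.getParentFile();
        // mkdirs() returns false if the directory already exists,
        // so check isDirectory() first.
        if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
            throw new IllegalStateException("Could not create " + parent);
        }
        return target;
    }
}
```

A caller would typically open the returned File for writing right away, since the parent directories are known to exist at that point.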

writeSettingsObject

public final void writeSettingsObject(CrawlerSettings settings)
Description copied from class: SettingsHandler
Write the CrawlerSettings object to persistent storage.

Specified by:
writeSettingsObject in class SettingsHandler
Parameters:
settings - the settings object to write.

writeSettingsObject

public final void writeSettingsObject(CrawlerSettings settings,
                                      java.io.File filename)
Write a CrawlerSettings object to a specified file. This method is similar to writeSettingsObject(CrawlerSettings) except that it uses the submitted File object instead of trying to resolve where the file should be written.

Parameters:
settings - the settings object to be serialized.
filename - the file to which the settings object should be written.

readSettingsObject

protected final CrawlerSettings readSettingsObject(CrawlerSettings settings,
                                                   java.io.File f)
Read the CrawlerSettings object from a specific file.

Parameters:
settings - the settings object to be updated with data from the persistent storage.
f - the file to read from.
Returns:
the updated settings object or null if there was no data for this in the persistent storage.

toResourcePath

public static java.lang.String toResourcePath(java.io.File f)
Convert a File to a path that might be resolved from classpath/JAR resource sources. Such paths use Linux-like path separators.

Parameters:
f - File
Returns:
path, shorn of any Windows-specific drive identifiers
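
As a rough illustration of the conversion described above, the following sketch normalizes separators and strips a leading drive identifier. It is an assumption about the general approach, not the actual implementation; the class ResourcePathSketch is hypothetical, and it takes a String rather than a File so that the behaviour is the same on every platform.

```java
final class ResourcePathSketch {
    /** Converts a file-system path string to a resource-style path. */
    static String toResourcePath(String path) {
        // Use forward slashes regardless of the platform separator.
        String normalized = path.replace('\\', '/');
        // Drop a leading Windows drive identifier such as "C:".
        if (normalized.length() >= 2
                && Character.isLetter(normalized.charAt(0))
                && normalized.charAt(1) == ':') {
            normalized = normalized.substring(2);
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Prints "/crawls/profiles/default/order.xml".
        System.out.println(toResourcePath("C:\\crawls\\profiles\\default\\order.xml"));
    }
}
```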

readSettingsObject

protected final CrawlerSettings readSettingsObject(CrawlerSettings settings)
Description copied from class: SettingsHandler
Read the CrawlerSettings object from persistent storage.

Specified by:
readSettingsObject in class SettingsHandler
Parameters:
settings - the settings object to be updated with data from the persistent storage.
Returns:
the updated settings object or null if there was no data for this in the persistent storage.

getOrderFile

public java.io.File getOrderFile()
Get the File object pointing to the order file.

Returns:
File object for the order file.

copySettings

public void copySettings(java.io.File newOrderFileName,
                         java.lang.String newSettingsDirectory)
                  throws java.io.IOException
Creates a replica of the settings file structure in another directory (fully recursive, includes all per host settings). The SettingsHandler will then refer to the new files. Note that this method should only be called after the SettingsHandler has been initialized.

Parameters:
newOrderFileName - where the new order file should be saved.
newSettingsDirectory - the top level directory of the per host/domain settings files.
Throws:
java.io.IOException
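
The recursive replication step can be sketched with the standard java.nio.file API. This covers only the copying of a directory tree; it does not write a new order file or repoint a SettingsHandler at the copies, and the class SettingsTreeCopier and method copyTree are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class SettingsTreeCopier {
    /** Copies every directory and file below sourceDir into targetDir. */
    static void copyTree(Path sourceDir, Path targetDir) throws IOException {
        // Collect the entries first so the walk is finished before anything is written.
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            entries = walk.collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Path destination = targetDir.resolve(sourceDir.relativize(entry));
            if (Files.isDirectory(entry)) {
                // Files.walk visits a directory before its contents,
                // so parents always exist by the time a file is copied.
                Files.createDirectories(destination);
            } else {
                Files.copy(entry, destination, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```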

getPathRelativeToWorkingDirectory

public java.io.File getPathRelativeToWorkingDirectory(java.lang.String path)
Transforms a relative path so that it is relative to the location of the order file. If an absolute path is given, it will be returned unchanged.

The location of its order file is always considered the 'working' directory for any given settings.

Specified by:
getPathRelativeToWorkingDirectory in class SettingsHandler
Parameters:
path - A relative path to a file (or directory)
Returns:
The same path modified so that it is relative to the file level location of the order file for the settings handler.
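
The resolution rule described above (relative paths are anchored at the order file's directory, absolute paths pass through) can be sketched as follows. The class WorkingDirectoryPaths and its resolve method are hypothetical; the sketch takes the order file as an explicit argument instead of reading it from a handler.

```java
import java.io.File;

final class WorkingDirectoryPaths {
    /**
     * Resolves path against the directory that holds orderFile.
     * An absolute path is returned unchanged.
     */
    static File resolve(File orderFile, String path) {
        File candidate = new File(path);
        if (candidate.isAbsolute()) {
            return candidate;
        }
        // getAbsoluteFile() makes the parent available even when
        // orderFile was given as a bare file name.
        File workingDir = orderFile.getAbsoluteFile().getParentFile();
        return new File(workingDir, path);
    }
}
```

For example, with an order file at jobs/order.xml, a setting of seeds.txt would resolve to jobs/seeds.txt under the current directory.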

getDomainOverrides

public java.util.Collection getDomainOverrides(java.lang.String rootDomain)
Description copied from class: SettingsHandler
Returns a Collection of strings naming the domains that contain per-domain overrides (or whose subdomains contain them). The domains considered are limited to those that are subdomains of the supplied domain. If null or an empty string is supplied, the top-level domains (TLDs) are considered.

Specified by:
getDomainOverrides in class SettingsHandler
Parameters:
rootDomain - The domain to get domain overrides for. Examples: 'org', 'archive.org', 'crawler.archive.org' etc.
Returns:
A Collection of domains that contain overrides. If rootDomain does not exist, an empty Collection is returned.
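
One way to read the contract above is that the method reports the direct subdomains of rootDomain that either carry overrides themselves or have descendants that do. The sketch below implements that reading over an in-memory collection of domain names. Both the interpretation and the names DomainOverridesSketch, domainOverrides, and domainsWithSettings are assumptions made for illustration; the real method consults persistent storage.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class DomainOverridesSketch {
    /**
     * Returns the direct subdomains of rootDomain that contain overrides,
     * either themselves or through a deeper subdomain. A null or empty
     * rootDomain selects the top-level domains.
     */
    static Collection<String> domainOverrides(Collection<String> domainsWithSettings,
                                              String rootDomain) {
        boolean topLevel = rootDomain == null || rootDomain.isEmpty();
        String suffix = topLevel ? "" : "." + rootDomain;
        Set<String> result = new TreeSet<>();
        for (String domain : domainsWithSettings) {
            if (!topLevel && !domain.endsWith(suffix)) {
                continue; // not a subdomain of rootDomain
            }
            // Labels to the left of the root domain, e.g. "crawler.archive".
            String prefix = domain.substring(0, domain.length() - suffix.length());
            if (prefix.isEmpty()) {
                continue;
            }
            // The right-most of those labels names the direct child.
            String child = prefix.substring(prefix.lastIndexOf('.') + 1);
            result.add(child + suffix);
        }
        return result;
    }
}
```

Under this reading, overrides stored for crawler.archive.org would cause a query for 'org' to report archive.org.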

deleteSettingsObject

public void deleteSettingsObject(CrawlerSettings settings)
Delete a settings object from persistent storage. Deletes the file represented by the submitted settings object. All empty directories that are parents of the file's path are also deleted.

Overrides:
deleteSettingsObject in class SettingsHandler
Parameters:
settings - the settings object to delete.
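
The delete-then-prune behaviour can be sketched with plain java.io.File calls. This is an illustrative approximation only: the class SettingsFileDeleter is hypothetical, and the stopDir parameter is an assumed guard that keeps the walk from climbing above the settings directory, which the documentation above does not describe.

```java
import java.io.File;

final class SettingsFileDeleter {
    /**
     * Deletes file, then removes each parent directory that is left empty.
     * The walk stops at stopDir, which is never deleted (hypothetical guard).
     */
    static void deleteAndPrune(File file, File stopDir) {
        if (!file.delete()) {
            return; // nothing was deleted, so leave the directories alone
        }
        File dir = file.getParentFile();
        while (dir != null && !dir.equals(stopDir)) {
            String[] children = dir.list();
            if (children == null || children.length > 0) {
                break; // not a readable directory, or still in use
            }
            if (!dir.delete()) {
                break;
            }
            dir = dir.getParentFile();
        }
    }
}
```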

getListOfAllFiles

public java.util.List<java.lang.String> getListOfAllFiles()
Description copied from class: SettingsHandler
Creates and returns a List of all files comprising the current settings framework.

The List contains the absolute String path of each file.

The list should contain any configurable files, including files such as the seed file and any other files used by the various settings modules.

Implementations of the SettingsHandler that do not use files for permanent storage should return an empty list.

Specified by:
getListOfAllFiles in class SettingsHandler
Returns:
List of framework files.


Copyright © 2003-2011 Internet Archive. All Rights Reserved.