Interface Summary | |
---|---|
ValueErrorHandler | If a ValueErrorHandler is registered with a SettingsHandler, only constraints with level Level.SEVERE will throw an InvalidAttributeValueException. |
Class Summary | |
---|---|
ComplexType | Superclass of all configurable modules. |
Constraint | Superclass for constraints that can be set on attribute definitions. |
CrawlerSettings | Class representing a settings file. |
CrawlSettingsSAXHandler | A SAX element handler that updates a CrawlerSettings object. |
CrawlSettingsSAXSource | Class that takes a CrawlerSettings object and creates SAX events from it. |
DataContainer | This class holds the data for a ComplexType for a settings object. |
DoubleList | List of Double values. |
FloatList | List of Float values. |
IntegerList | List of Integer values. |
LegalValueListConstraint | A constraint that checks that an attribute value matches one of the items in the list of legal values. |
LegalValueTypeConstraint | A constraint that checks that an attribute value is of the right type. |
ListType<T> | Super type for all lists. |
LongList | List of Long values. |
MapType | This class represents a container of settings. |
ModuleAttributeInfo | |
ModuleType | Superclass of all modules that should be configurable. |
RegularExpressionConstraint | A constraint that checks that a value matches a regular expression. |
SettingsCache | This class keeps a map of host names to settings objects. |
SettingsFrameworkTestCase | Sets up a couple of settings objects to test different functions of the settings framework. |
SettingsHandler | An instance of this class holds a hierarchy of settings. |
SimpleType | A type that holds a Java type. |
SoftSettingsHash | |
SoftSettingsHash.SettingsEntry | The entries in this hash extend SoftReference, using the host string as the key. |
StringList | List of String values. |
TextField | Class to hold values for text fields. |
Type | Superclass of all element types. |
XMLSettingsHandler | A SettingsHandler which uses XML files as persistent storage. |
Provides classes for the settings framework.
The settings framework is designed to be a flexible way to configure a crawl, with special treatment for subparts of the web, without adding too much performance overhead.
At its core, the settings framework is a way to keep persistent, context-sensitive configuration settings for any class in the crawler.
All classes in the crawler that have configurable settings subclass ComplexType or one of its descendants. ComplexType implements the JMX DynamicMBean interface. This gives you a way to ask the object which attributes it supports, as well as standard methods for getting and setting these attributes.
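As a rough illustration of what implementing `DynamicMBean` involves, the sketch below exposes two settings of a hypothetical module through the standard `javax.management` interface. The class name and the attribute names `max-retries` and `user-agent` are invented for the example, and this is not how Heritrix's ComplexType is actually implemented; it only shows the introspection (`getMBeanInfo`) and the generic get/set methods mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.InvalidAttributeValueException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ReflectionException;

/** Hypothetical configurable module exposing its settings through DynamicMBean. */
public class DynamicModuleSketch implements DynamicMBean {
    // Attribute name -> current value; the default value's class acts as the declared type.
    private final Map<String, Object> values = new LinkedHashMap<>();

    public DynamicModuleSketch() {
        values.put("max-retries", 30);           // hypothetical attribute
        values.put("user-agent", "example-bot"); // hypothetical attribute
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if (!values.containsKey(name)) {
            throw new AttributeNotFoundException(name);
        }
        return values.get(name);
    }

    @Override
    public void setAttribute(Attribute attribute)
            throws AttributeNotFoundException, InvalidAttributeValueException {
        Object current = getAttribute(attribute.getName());
        Object value = attribute.getValue();
        // Reject values whose type differs from the attribute's current type.
        if (value == null || !current.getClass().isInstance(value)) {
            throw new InvalidAttributeValueException(
                attribute.getName() + " expects " + current.getClass().getSimpleName());
        }
        values.put(attribute.getName(), value);
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String name : names) {
            if (values.containsKey(name)) {
                result.add(new Attribute(name, values.get(name)));
            }
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        AttributeList applied = new AttributeList();
        for (Attribute attribute : attributes.asList()) {
            try {
                setAttribute(attribute);
                applied.add(attribute);
            } catch (AttributeNotFoundException | InvalidAttributeValueException e) {
                // Attributes that fail are left out of the returned list.
            }
        }
        return applied;
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature)
            throws ReflectionException {
        // This sketch exposes attributes only, no operations.
        throw new ReflectionException(new NoSuchMethodException(action));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = values.entrySet().stream()
            .map(e -> new MBeanAttributeInfo(e.getKey(), e.getValue().getClass().getName(),
                "setting " + e.getKey(), true, true, false))
            .toArray(MBeanAttributeInfo[]::new);
        return new MBeanInfo(getClass().getName(), "Configurable module sketch",
            attrs, null, null, null);
    }

    public static void main(String[] args) throws Exception {
        DynamicModuleSketch module = new DynamicModuleSketch();
        // Introspection: ask the object which attributes it supports.
        for (MBeanAttributeInfo info : module.getMBeanInfo().getAttributes()) {
            System.out.println(info.getName() + " : " + info.getType());
        }
        module.setAttribute(new Attribute("max-retries", 5));
        System.out.println(module.getAttribute("max-retries"));
    }
}
```

A generic client such as a JMX console can discover and edit these settings without compile-time knowledge of the module's class, which is the point of using a dynamic rather than a standard MBean.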
The entry point into the settings framework is the SettingsHandler. This class is responsible for loading settings from and saving them to persistent storage, and for interconnecting the different parts of the framework.
Figure 1. Schematic view of the Settings Framework
The settings framework is built from a hierarchy of CrawlerSettings objects. At the top there is a settings object representing the global settings. It consists of all the settings that a crawl job needs for running. Beneath this global object there is one "per" settings object for each host or domain that has settings which should override the order (the global settings) for that particular host or domain.
When the settings framework is asked for an attribute for a specific host, it first checks whether the attribute is set for that particular host. If it is, that value is returned. If not, it goes up one level, recursively, until it eventually reaches the order object and returns the global value. If no value is set there either (normally one is), a hard-coded default value is returned.
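The lookup order described above can be sketched as a walk up a chain of parent links. In this minimal sketch, `Settings` is a hypothetical stand-in for CrawlerSettings, the attribute names are invented, and the hard-coded defaults sit in a plain map; where Heritrix actually keeps its defaults is not modeled here.

```java
import java.util.HashMap;
import java.util.Map;

public class HierarchyLookupSketch {

    /** Hypothetical stand-in for a CrawlerSettings object: local values plus a parent link. */
    static final class Settings {
        final String scope;    // e.g. "global", "archive.org", "www.archive.org"
        final Settings parent; // null for the global (order) settings
        final Map<String, Object> local = new HashMap<>();

        Settings(String scope, Settings parent) {
            this.scope = scope;
            this.parent = parent;
        }
    }

    /** Hard-coded defaults, used only if no level (not even the global one) sets a value. */
    static final Map<String, Object> DEFAULTS = Map.of("delay-ms", 3000, "max-retries", 30);

    /** Returns the nearest value walking up from start, else the hard-coded default. */
    static Object resolve(Settings start, String name) {
        for (Settings s = start; s != null; s = s.parent) {
            if (s.local.containsKey(name)) {
                return s.local.get(name);
            }
        }
        return DEFAULTS.get(name);
    }

    public static void main(String[] args) {
        Settings global = new Settings("global", null);
        Settings domain = new Settings("archive.org", global);
        Settings host = new Settings("www.archive.org", domain);

        global.local.put("delay-ms", 1000);
        domain.local.put("delay-ms", 5000); // per-domain override

        System.out.println(resolve(host, "delay-ms"));    // 5000, found at the domain level
        System.out.println(resolve(host, "max-retries")); // 30, falls through to the default
    }
}
```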
All per-domain/host settings objects contain only those settings which are to be overridden for that particular domain or host. The convention is to name the top-level object "global settings" and the objects beneath it "per settings" or "overrides" (although the refinements described next also override settings).
To further complicate the picture, there are also settings objects called refinements. A refinement belongs to a global or per settings object and overrides the settings in its owner if certain criteria are met. These criteria could be that the URI in question matches a regular expression, or that the settings are consulted at a specific time of day within a given time span.
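A refinement can be sketched as a set of criteria plus the values it overrides. The sketch below assumes a refinement applies only when all of its criteria match, and models the two kinds of criteria mentioned above: a URI regular expression and a time-of-day span that may wrap past midnight. `Criterion`, `Refinement`, and `lookup` are hypothetical names, not the framework's API.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class RefinementSketch {

    /** Hypothetical criterion evaluated against the URI and the current time of day. */
    interface Criterion {
        boolean matches(String uri, LocalTime now);
    }

    /** Matches when the whole URI matches the regular expression. */
    static Criterion uriMatches(String regex) {
        Pattern p = Pattern.compile(regex);
        return (uri, now) -> p.matcher(uri).matches();
    }

    /** Matches when now is in [from, to); if from is not before to, the span wraps midnight. */
    static Criterion timeSpan(LocalTime from, LocalTime to) {
        return (uri, now) -> from.isBefore(to)
            ? !now.isBefore(from) && now.isBefore(to)
            : !now.isBefore(from) || now.isBefore(to);
    }

    /** Overrides its owner's values only when every criterion matches (an assumption). */
    static final class Refinement {
        final List<Criterion> criteria = new ArrayList<>();
        final Map<String, Object> overrides = new HashMap<>();

        boolean applies(String uri, LocalTime now) {
            return criteria.stream().allMatch(c -> c.matches(uri, now));
        }
    }

    /** First applicable refinement that sets the attribute wins; otherwise the owner's value. */
    static Object lookup(Map<String, Object> owner, List<Refinement> refinements,
                         String name, String uri, LocalTime now) {
        for (Refinement r : refinements) {
            if (r.applies(uri, now) && r.overrides.containsKey(name)) {
                return r.overrides.get(name);
            }
        }
        return owner.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> owner = Map.of("delay-ms", 1000);
        Refinement night = new Refinement();
        night.criteria.add(timeSpan(LocalTime.of(22, 0), LocalTime.of(6, 0)));
        night.criteria.add(uriMatches("https?://[^/]*example\\.com/.*"));
        night.overrides.put("delay-ms", 200);
        List<Refinement> refinements = List.of(night);

        String uri = "http://www.example.com/a";
        System.out.println(lookup(owner, refinements, "delay-ms", uri, LocalTime.of(23, 30))); // 200
        System.out.println(lookup(owner, refinements, "delay-ms", uri, LocalTime.of(12, 0)));  // 1000
    }
}
```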
Every configurable module subclasses ComplexType or one of its descendants. ComplexType is responsible for keeping the definition of the module's configurable attributes. The actual values are stored in an instance of DataContainer. The DataContainer is never accessed directly from user code; instead, the user accesses the attributes through methods in ComplexType. Attributes are accessed in different ways depending on whether the access comes from the user interface or from inside a running crawl.
When an attribute is accessed from the user interface (either reading or writing), you want to make sure that you are editing the attribute in the right context. When trying to override an attribute, you don't want the settings framework to traverse up the hierarchy to the effective value of the attribute; instead, you want to know that the attribute is not set on this level. To achieve this, there are the methods ComplexType.getLocalAttribute(CrawlerSettings settings, String name) and ComplexType.setAttribute(CrawlerSettings settings, Attribute attribute), which take a settings object as a parameter. These methods work only on the supplied settings object. In addition, the methods ComplexType.getAttribute(String) and ComplexType.setAttribute(Attribute attribute) are there for conformance to the Java JMX specification. The latter two always work on the global settings object.
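To make the distinction between local and global access concrete, here is a minimal sketch under assumed names. `Level` stands in for CrawlerSettings and the per-level maps play the role of DataContainer; the method names mirror the ones above, but the signatures and bodies are illustrative only. A local read returns `null` when the attribute is not set at that level instead of walking up the hierarchy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;

public class LocalAccessSketch {

    /** Hypothetical stand-in for CrawlerSettings: one level in the hierarchy. */
    static final class Level {
        final String scope;
        Level(String scope) { this.scope = scope; }
    }

    /** Hypothetical module: definitions live here, values live in one map per level. */
    static final class Module {
        final Level global;
        private final Map<String, Object> definitions = new LinkedHashMap<>();      // name -> default
        private final Map<Level, Map<String, Object>> containers = new HashMap<>(); // cf. DataContainer

        Module(Level global) { this.global = global; }

        void define(String name, Object defaultValue) { definitions.put(name, defaultValue); }

        /** Reads only the given level; null means "not set here". No walk up the hierarchy. */
        Object getLocalAttribute(Level level, String name) {
            Map<String, Object> c = containers.get(level);
            return c == null ? null : c.get(name);
        }

        /** Writes only the given level, e.g. when a user edits a per-host override. */
        void setAttribute(Level level, Attribute attribute) {
            if (!definitions.containsKey(attribute.getName())) {
                throw new IllegalArgumentException("undefined attribute: " + attribute.getName());
            }
            containers.computeIfAbsent(level, l -> new HashMap<>())
                      .put(attribute.getName(), attribute.getValue());
        }

        /** JMX-style read: always the global level, falling back to the definition's default. */
        Object getAttribute(String name) {
            Object v = getLocalAttribute(global, name);
            return v != null ? v : definitions.get(name);
        }

        /** JMX-style write: always the global level. */
        void setAttribute(Attribute attribute) { setAttribute(global, attribute); }
    }

    public static void main(String[] args) {
        Level global = new Level("global");
        Level host = new Level("www.example.com");
        Module m = new Module(global);
        m.define("delay-ms", 3000);

        m.setAttribute(new Attribute("delay-ms", 1000));           // global level
        System.out.println(m.getLocalAttribute(host, "delay-ms")); // null: not overridden here
        m.setAttribute(host, new Attribute("delay-ms", 200));      // per-host override
        System.out.println(m.getLocalAttribute(host, "delay-ms")); // 200
        System.out.println(m.getAttribute("delay-ms"));            // 1000: global untouched
    }
}
```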
Getting an attribute within a crawl is different in that you always want to get a value, even if it is not set in its own context. That means the settings framework should work its way up the settings hierarchy to find the value in effect for the context. The method ComplexType.getAttribute(String name, CrawlURI uri) should be used to make sure that the right context is used. Figure 2 shows how the settings framework finds the effective value given a context.
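Before the hierarchy can be walked, the context has to be derived from the URI. The sketch below assumes that per settings are keyed by host and domain names, and that the context is found by stripping the leftmost label of the host one step at a time (`www.archive.org`, then `archive.org`, then `org`) before falling back to the global level. The class and its fields are hypothetical, and it takes the URI as a plain string rather than a CrawlURI.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UriContextSketch {

    /** Per-host/domain values keyed by scope, e.g. "archive.org"; "" is the global level. */
    final Map<String, Map<String, Object>> overrides = new HashMap<>();
    /** Hard-coded defaults used when no level sets the attribute. */
    final Map<String, Object> defaults = new HashMap<>();

    /** Scopes from most to least specific, ending with "" for the global settings. */
    static List<String> scopesFor(String uri) {
        List<String> scopes = new ArrayList<>();
        String host = URI.create(uri).getHost();
        while (host != null && !host.isEmpty()) {
            scopes.add(host);
            int dot = host.indexOf('.');
            host = dot < 0 ? null : host.substring(dot + 1); // drop the leftmost label
        }
        scopes.add("");
        return scopes;
    }

    /** Effective value for the URI's context: first scope that sets it, else the default. */
    Object getAttribute(String name, String uri) {
        for (String scope : scopesFor(uri)) {
            Map<String, Object> level = overrides.get(scope);
            if (level != null && level.containsKey(name)) {
                return level.get(name);
            }
        }
        return defaults.get(name);
    }

    public static void main(String[] args) {
        UriContextSketch s = new UriContextSketch();
        s.defaults.put("delay-ms", 3000);
        s.overrides.computeIfAbsent("", k -> new HashMap<>()).put("delay-ms", 1000);
        s.overrides.computeIfAbsent("archive.org", k -> new HashMap<>()).put("delay-ms", 5000);

        System.out.println(scopesFor("http://www.archive.org/index.html")); // [www.archive.org, archive.org, org, ]
        System.out.println(s.getAttribute("delay-ms", "http://www.archive.org/index.html")); // 5000
        System.out.println(s.getAttribute("delay-ms", "http://example.com/"));               // 1000
    }
}
```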
Figure 2. Flow of getting an attribute
Each attribute has a type. All allowed types subclass the Type class. There are three main types: SimpleType, ListType, and ComplexType; the actual type used will be a subclass of one of these main types.
SimpleType is mainly for representing the Java™ wrapper classes for the primitive types. In addition, it handles the Date type and a special Heritrix TextField type. Overrides of a SimpleType must be of the same type as the initial default value for the SimpleType.
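The same-type rule for overrides can be sketched as a check against the class of the default value. This is an assumed, simplified model: `SimpleAttribute` is hypothetical, scopes are plain strings, and the check requires exactly the same class, so a `Long` cannot override an `Integer` default.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class SimpleTypeSketch {

    /** Hypothetical simple attribute whose default value fixes the type of every override. */
    static final class SimpleAttribute {
        final String name;
        final Object defaultValue;
        private final Map<String, Object> overrides = new HashMap<>(); // scope -> value

        SimpleAttribute(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }

        /** Stores an override only if its class is exactly the class of the default value. */
        void override(String scope, Object value) {
            if (value == null || value.getClass() != defaultValue.getClass()) {
                throw new IllegalArgumentException(name + " expects "
                    + defaultValue.getClass().getSimpleName() + " but got "
                    + (value == null ? "null" : value.getClass().getSimpleName()));
            }
            overrides.put(scope, value);
        }

        Object valueFor(String scope) {
            return overrides.getOrDefault(scope, defaultValue);
        }
    }

    public static void main(String[] args) {
        SimpleAttribute retries = new SimpleAttribute("max-retries", 30); // Integer default
        retries.override("example.com", 5);                                // Integer: accepted
        try {
            retries.override("example.org", 5L);                           // Long: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // max-retries expects Integer but got Long
        }
        SimpleAttribute start = new SimpleAttribute("start-date", new Date(0));
        start.override("example.com", new Date());                         // Date: accepted
        System.out.println(retries.valueFor("example.com") + " " + retries.valueFor("example.org")); // 5 30
    }
}
```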
ListType is further subclassed into versions for some of the wrapped Java™ primitive types (DoubleList, FloatList, IntegerList, LongList) as well as for strings (StringList). A list holds values in the same order as they were added. If an attribute of type ListType is overridden, the complete list of values is replaced at the override level.
ComplexType is a map of name/value pairs. The values can be any Type, including new MapTypes. ComplexType is abstract, and you should use one of its subclasses, MapType or ModuleType. MapType allows new name/value pairs to be added at runtime, while ModuleType allows only the name/value pairs that it defines at construction time. When overriding a MapType, the options are either to override the value of an already existing attribute or to add a new one; it is not possible in an override to remove an existing attribute. ModuleType doesn't allow additions in overrides, but the values of its predefined attributes may be overridden. Since a ModuleType is defined at construction time, it is possible to place more restrictions on each attribute than in a MapType. Another consequence of definition at construction time is that you would normally subclass ModuleType, while MapType is usable as it is. It is possible to restrict a MapType to allow only attributes of a certain type. There is also a restriction that MapTypes cannot contain nested MapTypes.
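The differences between the two ComplexType subclasses can be sketched with two small classes. `OpenMap` loosely mirrors MapType (entries can be added, also in overrides, but never removed; values may be restricted to one type; nested maps are rejected) and `FixedModule` loosely mirrors ModuleType (attributes fixed at construction; overrides may only change values). Both are hypothetical simplifications; scope names such as `example.org` are invented, and `""` denotes the global level in `OpenMap`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapVsModuleSketch {

    /** Value for a scope: the scope's own entry if present, else the global entry. */
    static Object lookup(Map<String, Object> global, Map<String, Map<String, Object>> overrides,
                         String scope, String key) {
        Map<String, Object> level = overrides.getOrDefault(scope, Collections.emptyMap());
        return level.containsKey(key) ? level.get(key) : global.get(key);
    }

    /** Hypothetical MapType analogue: open key set, optional value type, no nested maps. */
    static final class OpenMap {
        private final Class<?> allowedType; // Object.class allows any value
        private final Map<String, Object> global = new LinkedHashMap<>();
        private final Map<String, Map<String, Object>> overrides = new HashMap<>();

        OpenMap(Class<?> allowedType) { this.allowedType = allowedType; }

        /** Adds a new entry or replaces an existing one at the given scope. */
        void put(String scope, String key, Object value) {
            if (value instanceof OpenMap) {
                throw new IllegalArgumentException("a map may not contain a nested map");
            }
            if (!allowedType.isInstance(value)) {
                throw new IllegalArgumentException(key + " must be a " + allowedType.getSimpleName());
            }
            Map<String, Object> level = scope.isEmpty()
                ? global
                : overrides.computeIfAbsent(scope, s -> new LinkedHashMap<>());
            level.put(key, value);
        }

        Object get(String scope, String key) { return lookup(global, overrides, scope, key); }
        // Deliberately no remove(): an override can add or replace entries, never delete one.
    }

    /** Hypothetical ModuleType analogue: attributes are fixed when the module is built. */
    static final class FixedModule {
        private final Map<String, Object> global;
        private final Map<String, Map<String, Object>> overrides = new HashMap<>();

        FixedModule(Map<String, Object> definitions) { global = new LinkedHashMap<>(definitions); }

        /** Overrides may change the value of a predefined attribute but not add a new one. */
        void override(String scope, String key, Object value) {
            if (!global.containsKey(key)) {
                throw new IllegalArgumentException("undefined attribute: " + key);
            }
            overrides.computeIfAbsent(scope, s -> new HashMap<>()).put(key, value);
        }

        Object get(String scope, String key) { return lookup(global, overrides, scope, key); }
    }

    public static void main(String[] args) {
        OpenMap headers = new OpenMap(String.class);                  // values restricted to String
        headers.put("", "User-Agent", "example-bot");
        headers.put("example.org", "From", "ops@example.org");        // an override adds an entry
        System.out.println(headers.get("example.org", "User-Agent")); // example-bot (inherited)
        System.out.println(headers.get("example.com", "From"));       // null

        // A module may hold a map, but a map may not hold another map.
        FixedModule fetcher = new FixedModule(Map.of("timeout-s", 30, "headers", headers));
        fetcher.override("example.org", "timeout-s", 60);
        System.out.println(fetcher.get("example.org", "timeout-s")); // 60
        try {
            fetcher.override("example.org", "proxy", "none");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // undefined attribute: proxy
        }
    }
}
```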