org.archive.crawler.datamodel
Class CrawlOrder

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.datamodel.CrawlOrder
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean

public class CrawlOrder
extends ModuleType
implements java.io.Serializable

Represents the 'root' of the settings hierarchy. It holds the settings that do not belong to any specific module but relate to the crawl as a whole; the CrawlController uses many of them, directly or indirectly.

See Also:
ModuleType, Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String ATTR_BDB_CACHE_PERCENT
          Percentage of heap to allocate to the bdb cache.
static java.lang.String ATTR_CHECKPOINT_COPY_BDBJE_LOGS
          When checkpointing, copy the bdb logs.
static java.lang.String ATTR_CHECKPOINTS_PATH
           
static java.lang.String ATTR_DISK_PATH
           
static java.lang.String ATTR_EXTRACT_PROCESSORS
           
static java.lang.String ATTR_FETCH_PROCESSORS
           
static java.lang.String ATTR_FROM
           
static java.lang.String ATTR_HTTP_HEADERS
           
static java.lang.String ATTR_INDEPENDENT_EXTRACTORS
           
static java.lang.String ATTR_LOGGERS
           
static java.lang.String ATTR_LOGS_PATH
           
static java.lang.String ATTR_MAX_BYTES_DOWNLOAD
           
static java.lang.String ATTR_MAX_DOCUMENT_DOWNLOAD
           
static java.lang.String ATTR_MAX_TIME_SEC
           
static java.lang.String ATTR_MAX_TOE_THREADS
           
static java.lang.String ATTR_NAME
           
static java.lang.String ATTR_POST_PROCESSORS
           
static java.lang.String ATTR_PRE_FETCH_PROCESSORS
           
static java.lang.String ATTR_RECORDER_IN_BUFFER
           
static java.lang.String ATTR_RECORDER_OUT_BUFFER
           
static java.lang.String ATTR_RECOVER_PATH
           
static java.lang.String ATTR_RECOVER_RETAIN_FAILURES
           
static java.lang.String ATTR_RECOVER_SCOPE_ENQUEUES
           
static java.lang.String ATTR_RECOVER_SCOPE_INCLUDES
           
static java.lang.String ATTR_RULES
           
static java.lang.String ATTR_SCRATCH_PATH
           
static java.lang.String ATTR_SETTINGS_DIRECTORY
           
static java.lang.String ATTR_STATE_PATH
           
static java.lang.String ATTR_USER_AGENT
           
static java.lang.String ATTR_WRITE_PROCESSORS
           
static java.lang.Boolean DEFAULT_CHECKPOINT_COPY_BDBJE_LOGS
           
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
CrawlOrder()
          Construct a CrawlOrder.
 
Method Summary
 void checkUserAgentAndFrom()
          Checks whether the User Agent and From fields are set 'correctly' in the specified Crawl Order.
 java.io.File getCheckpointsDirectory()
           
 CrawlController getController()
           
 java.lang.String getCrawlOrderName()
          Get the name of the order file.
 java.lang.String getFrom(CrawlURI curi)
           
 MapType getLoggers()
          Returns the Map of the StatisticsTracking modules that are included in the configuration that the current instance of this class is representing.
 int getMaxToes()
          Returns the set number of maximum toe threads.
 RobotsHonoringPolicy getRobotsHonoringPolicy()
          Gets the RobotsHonoringPolicy object from the order file.
 java.io.File getSettingsDir(java.lang.String key)
          Returns the full path to the directory named by key in settings.
 java.lang.String getUserAgent(CrawlURI curi)
           
 void setController(CrawlController controller)
           
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

ATTR_NAME

public static final java.lang.String ATTR_NAME
See Also:
Constant Field Values

ATTR_SETTINGS_DIRECTORY

public static final java.lang.String ATTR_SETTINGS_DIRECTORY
See Also:
Constant Field Values

ATTR_DISK_PATH

public static final java.lang.String ATTR_DISK_PATH
See Also:
Constant Field Values

ATTR_LOGS_PATH

public static final java.lang.String ATTR_LOGS_PATH
See Also:
Constant Field Values

ATTR_CHECKPOINTS_PATH

public static final java.lang.String ATTR_CHECKPOINTS_PATH
See Also:
Constant Field Values

ATTR_STATE_PATH

public static final java.lang.String ATTR_STATE_PATH
See Also:
Constant Field Values

ATTR_SCRATCH_PATH

public static final java.lang.String ATTR_SCRATCH_PATH
See Also:
Constant Field Values

ATTR_RECOVER_PATH

public static final java.lang.String ATTR_RECOVER_PATH
See Also:
Constant Field Values

ATTR_RECOVER_RETAIN_FAILURES

public static final java.lang.String ATTR_RECOVER_RETAIN_FAILURES
See Also:
Constant Field Values

ATTR_RECOVER_SCOPE_INCLUDES

public static final java.lang.String ATTR_RECOVER_SCOPE_INCLUDES
See Also:
Constant Field Values

ATTR_RECOVER_SCOPE_ENQUEUES

public static final java.lang.String ATTR_RECOVER_SCOPE_ENQUEUES
See Also:
Constant Field Values

ATTR_MAX_BYTES_DOWNLOAD

public static final java.lang.String ATTR_MAX_BYTES_DOWNLOAD
See Also:
Constant Field Values

ATTR_MAX_DOCUMENT_DOWNLOAD

public static final java.lang.String ATTR_MAX_DOCUMENT_DOWNLOAD
See Also:
Constant Field Values

ATTR_MAX_TIME_SEC

public static final java.lang.String ATTR_MAX_TIME_SEC
See Also:
Constant Field Values

ATTR_MAX_TOE_THREADS

public static final java.lang.String ATTR_MAX_TOE_THREADS
See Also:
Constant Field Values

ATTR_HTTP_HEADERS

public static final java.lang.String ATTR_HTTP_HEADERS
See Also:
Constant Field Values

ATTR_USER_AGENT

public static final java.lang.String ATTR_USER_AGENT
See Also:
Constant Field Values

ATTR_FROM

public static final java.lang.String ATTR_FROM
See Also:
Constant Field Values

ATTR_PRE_FETCH_PROCESSORS

public static final java.lang.String ATTR_PRE_FETCH_PROCESSORS
See Also:
Constant Field Values

ATTR_FETCH_PROCESSORS

public static final java.lang.String ATTR_FETCH_PROCESSORS
See Also:
Constant Field Values

ATTR_EXTRACT_PROCESSORS

public static final java.lang.String ATTR_EXTRACT_PROCESSORS
See Also:
Constant Field Values

ATTR_WRITE_PROCESSORS

public static final java.lang.String ATTR_WRITE_PROCESSORS
See Also:
Constant Field Values

ATTR_POST_PROCESSORS

public static final java.lang.String ATTR_POST_PROCESSORS
See Also:
Constant Field Values

ATTR_LOGGERS

public static final java.lang.String ATTR_LOGGERS
See Also:
Constant Field Values

ATTR_RULES

public static final java.lang.String ATTR_RULES
See Also:
Constant Field Values

ATTR_RECORDER_OUT_BUFFER

public static final java.lang.String ATTR_RECORDER_OUT_BUFFER
See Also:
Constant Field Values

ATTR_RECORDER_IN_BUFFER

public static final java.lang.String ATTR_RECORDER_IN_BUFFER
See Also:
Constant Field Values

ATTR_INDEPENDENT_EXTRACTORS

public static final java.lang.String ATTR_INDEPENDENT_EXTRACTORS
See Also:
Constant Field Values

ATTR_BDB_CACHE_PERCENT

public static final java.lang.String ATTR_BDB_CACHE_PERCENT
Percentage of heap to allocate to the bdb cache.

See Also:
Constant Field Values
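
As a rough illustration of what a heap percentage means in bytes, the sketch below converts a percentage into a byte budget. The class BdbCacheSizing and the method cacheBytes are hypothetical names introduced here; this page does not say how the crawler or bdbje applies the setting, so treat the example as arithmetic only.

```java
// Hypothetical helper for illustration; not part of Heritrix or bdbje.
class BdbCacheSizing {

    /**
     * Returns the number of bytes that {@code percent} percent of
     * {@code heapBytes} represents, rounded down.
     */
    static long cacheBytes(long heapBytes, int percent) {
        if (heapBytes < 0) {
            throw new IllegalArgumentException("heapBytes must not be negative: " + heapBytes);
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100: " + percent);
        }
        // Split the heap size into hundreds and a remainder so that the
        // multiplication cannot overflow a long, even for a very large heap.
        return heapBytes / 100 * percent + heapBytes % 100 * percent / 100;
    }

    public static void main(String[] args) {
        // maxMemory() reports the maximum heap the JVM will attempt to use.
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("60% of " + heap + " heap bytes = " + cacheBytes(heap, 60));
    }
}
```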

ATTR_CHECKPOINT_COPY_BDBJE_LOGS

public static final java.lang.String ATTR_CHECKPOINT_COPY_BDBJE_LOGS
When checkpointing, copy the bdb logs. Default is true. If false, logs are not copied on checkpoint and bdbje is told never to delete log files; instead it renames the files it would otherwise delete with a '.del' extension. The assumption is that when this setting is false, an external process manages the removal of bdbje log files and that, when the time comes to recover from a checkpoint, the files that make up the checkpoint are assembled manually.

See Also:
Constant Field Values

DEFAULT_CHECKPOINT_COPY_BDBJE_LOGS

public static final java.lang.Boolean DEFAULT_CHECKPOINT_COPY_BDBJE_LOGS

Constructor Detail

CrawlOrder

public CrawlOrder()
Construct a CrawlOrder.

Method Detail

getUserAgent

public java.lang.String getUserAgent(CrawlURI curi)
Parameters:
curi -
Returns:
user-agent header value to use

getFrom

public java.lang.String getFrom(CrawlURI curi)
Parameters:
curi -
Returns:
from header value to use

getMaxToes

public int getMaxToes()
Returns the set number of maximum toe threads.

Returns:
Number of maximum toe threads

getRobotsHonoringPolicy

public RobotsHonoringPolicy getRobotsHonoringPolicy()
Gets the RobotsHonoringPolicy object from the order file.

Returns:
the new RobotsHonoringPolicy

getCrawlOrderName

public java.lang.String getCrawlOrderName()
Get the name of the order file.

Returns:
the name of the order file.

getController

public CrawlController getController()
Returns:
The crawl controller.

setController

public void setController(CrawlController controller)
Parameters:
controller -

getLoggers

public MapType getLoggers()
Returns the Map of the StatisticsTracking modules that are included in the configuration that the current instance of this class is representing.

Returns:
Map of the StatisticsTracking modules

checkUserAgentAndFrom

public void checkUserAgentAndFrom()
                           throws FatalConfigurationException
Checks whether the User Agent and From fields are set 'correctly' in the specified Crawl Order.

Throws:
FatalConfigurationException
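
This page does not spell out what 'correctly' means. The sketch below shows one way such a check could be written, assuming that a user agent should carry a contact URL in parentheses and that From should look like an e-mail address. Both patterns, the class UserAgentFromCheck, and the stand-in ConfigException are assumptions made for illustration, not the rules or types CrawlOrder actually uses.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone check; patterns are assumed, not taken from CrawlOrder.
class UserAgentFromCheck {

    // Assumed rule: some product token, then a parenthesised part containing "+http(s)://host.tld".
    static final Pattern USER_AGENT =
            Pattern.compile("\\S+.*\\(.*\\+https?://\\S+\\.\\S+.*\\).*");

    // Assumed rule: something@something.something with no whitespace.
    static final Pattern FROM = Pattern.compile("\\S+@\\S+\\.\\S+");

    /** Stand-in for FatalConfigurationException, which is not available here. */
    static class ConfigException extends Exception {
        ConfigException(String message) {
            super(message);
        }
    }

    static boolean isAcceptableUserAgent(String userAgent) {
        return userAgent != null && USER_AGENT.matcher(userAgent).matches();
    }

    static boolean isAcceptableFrom(String from) {
        return from != null && FROM.matcher(from).matches();
    }

    /** Throws if either header value fails its assumed pattern. */
    static void check(String userAgent, String from) throws ConfigException {
        if (!isAcceptableUserAgent(userAgent)) {
            throw new ConfigException("Unacceptable user-agent: " + userAgent);
        }
        if (!isAcceptableFrom(from)) {
            throw new ConfigException("Unacceptable from: " + from);
        }
    }

    public static void main(String[] args) throws ConfigException {
        check("Mozilla/5.0 (compatible; mycrawler/1.0 +http://example.org/bot)",
                "ops@example.org");
        System.out.println("Both header values passed the assumed checks.");
    }
}
```

Failing fast with an exception mirrors the documented signature, which declares FatalConfigurationException rather than returning a status.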

getCheckpointsDirectory

public java.io.File getCheckpointsDirectory()
Returns:
Checkpoint directory.

getSettingsDir

public java.io.File getSettingsDir(java.lang.String key)
                            throws javax.management.AttributeNotFoundException
Returns the full path to the directory named by key in settings. If the directory does not exist, it and all intermediate directories are created.

Parameters:
key - Key to use going to settings.
Returns:
Full path to directory named by key.
Throws:
javax.management.AttributeNotFoundException
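
The lookup-then-create behaviour described above can be sketched without the crawler's settings framework. In the example below, the class SettingsDirSketch, the plain Map standing in for the settings, the "logs-path" key, and the rule that relative paths resolve against a base directory are all assumptions for illustration; only the create-if-missing behaviour and the AttributeNotFoundException come from the description.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import javax.management.AttributeNotFoundException;

// Hypothetical stand-in for the settings lookup; not the CrawlOrder implementation.
class SettingsDirSketch {
    private final File baseDir;
    private final Map<String, String> settings;

    SettingsDirSketch(File baseDir, Map<String, String> settings) {
        this.baseDir = baseDir;
        this.settings = settings;
    }

    /** Resolves the directory stored under key, creating it and any missing parents. */
    File getSettingsDir(String key) throws AttributeNotFoundException {
        String path = settings.get(key);
        if (path == null) {
            throw new AttributeNotFoundException("No setting named " + key);
        }
        File dir = new File(path);
        if (!dir.isAbsolute()) {
            // Assumption: relative paths are resolved against a base directory.
            dir = new File(baseDir, path);
        }
        // mkdirs() creates the directory together with all missing parent directories.
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IllegalStateException("Could not create " + dir);
        }
        return dir.getAbsoluteFile();
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("crawl-order").toFile();
        Map<String, String> settings = new HashMap<>();
        settings.put("logs-path", "jobs/demo/logs"); // hypothetical key and value
        File logs = new SettingsDirSketch(base, settings).getSettingsDir("logs-path");
        System.out.println(logs + " exists: " + logs.isDirectory());
    }
}
```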


Copyright © 2003-2011 Internet Archive. All Rights Reserved.