org.archive.crawler.prefetch
Class QuotaEnforcer

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Processor
                      extended by org.archive.crawler.prefetch.QuotaEnforcer
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, FetchStatusCodes

public class QuotaEnforcer
extends Processor
implements FetchStatusCodes

A simple quota enforcer. If the host, server, or frontier group associated with the current CrawlURI is already over any of its quotas, this processor blocks further processing of that URI with the status S_BLOCKED_BY_QUOTA.
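The behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes: `Uri`, `enforce`, and the `{actual, quota}` pairs are hypothetical stand-ins, the status values are placeholders for the constants in FetchStatusCodes, and treating a negative quota as "no limit" and "at or above the quota" as "over" are assumptions rather than documented behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QuotaFlowSketch {
    // Placeholder values; the real constants are defined in FetchStatusCodes.
    static final int S_BLOCKED_BY_QUOTA = -5003;
    static final int S_UNATTEMPTED = 0;

    /** Hypothetical stand-in for a CrawlURI: it only carries a fetch status. */
    static class Uri {
        int fetchStatus = S_UNATTEMPTED;
    }

    /**
     * Walks the scopes in insertion order (for example server, host, group).
     * Each value is a pair {actual, quota}. Returns true and marks the URI
     * as soon as one scope has reached its quota.
     */
    static boolean enforce(Uri uri, Map<String, long[]> scopes) {
        for (Map.Entry<String, long[]> e : scopes.entrySet()) {
            long actual = e.getValue()[0];
            long quota = e.getValue()[1];
            // Assumption: a negative quota means the limit is disabled.
            if (quota >= 0 && actual >= quota) {
                uri.fetchStatus = S_BLOCKED_BY_QUOTA;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, long[]> scopes = new LinkedHashMap<>();
        scopes.put("server", new long[] {10, -1});   // no limit configured
        scopes.put("host", new long[] {500, 500});   // quota reached
        scopes.put("group", new long[] {3, 1000});   // well under quota
        Uri uri = new Uri();
        System.out.println(enforce(uri, scopes) + " " + uri.fetchStatus);
    }
}
```

In this sketch the host scope trips the check, so the URI is marked and the group scope is never examined.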

Version:
$Date: 2007-04-06 00:40:50 +0000 (Fri, 06 Apr 2007) $, $Revision: 5040 $
Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
protected static java.lang.String ATTR_FORCE_RETIRE
          whether to force-retire when over-quota detected
protected static java.lang.String ATTR_GROUP_MAX_ALL_KB
          group max all fetch bytes (including error responses)
protected static java.lang.String ATTR_GROUP_MAX_FETCH_RESPONSES
          group max fetch responses (including error codes)
protected static java.lang.String ATTR_GROUP_MAX_FETCH_SUCCESSES
          group max successful fetches
protected static java.lang.String ATTR_GROUP_MAX_SUCCESS_KB
          group max successful fetch bytes
protected static java.lang.String ATTR_HOST_MAX_ALL_KB
          host max all fetch bytes (including error responses)
protected static java.lang.String ATTR_HOST_MAX_FETCH_RESPONSES
          host max fetch responses (including error codes)
protected static java.lang.String ATTR_HOST_MAX_FETCH_SUCCESSES
          host max successful fetches
protected static java.lang.String ATTR_HOST_MAX_SUCCESS_KB
          host max successful fetch bytes
protected static java.lang.String ATTR_SERVER_MAX_ALL_KB
          server max all fetch bytes (including error responses)
protected static java.lang.String ATTR_SERVER_MAX_FETCH_RESPONSES
          server max fetch responses (including error codes)
protected static java.lang.String ATTR_SERVER_MAX_FETCH_SUCCESSES
          server max successful fetches
protected static java.lang.String ATTR_SERVER_MAX_SUCCESS_KB
          server max successful fetch bytes
protected static java.lang.Boolean DEFAULT_FORCE_RETIRE
           
protected static java.lang.Long DEFAULT_GROUP_MAX_ALL_KB
           
protected static java.lang.Long DEFAULT_GROUP_MAX_FETCH_RESPONSES
           
protected static java.lang.Long DEFAULT_GROUP_MAX_FETCH_SUCCESSES
           
protected static java.lang.Long DEFAULT_GROUP_MAX_SUCCESS_KB
           
protected static java.lang.Long DEFAULT_HOST_MAX_ALL_KB
           
protected static java.lang.Long DEFAULT_HOST_MAX_FETCH_RESPONSES
           
protected static java.lang.Long DEFAULT_HOST_MAX_FETCH_SUCCESSES
           
protected static java.lang.Long DEFAULT_HOST_MAX_SUCCESS_KB
           
protected static java.lang.Long DEFAULT_SERVER_MAX_ALL_KB
           
protected static java.lang.Long DEFAULT_SERVER_MAX_FETCH_RESPONSES
           
protected static java.lang.Long DEFAULT_SERVER_MAX_FETCH_SUCCESSES
           
protected static java.lang.Long DEFAULT_SERVER_MAX_SUCCESS_KB
           
protected static int GROUP
           
protected static int HOST
           
protected static java.lang.String[][] keys
           
protected static int NAME
           
protected static int RESPONSE_KB
           
protected static int RESPONSES
           
protected static int SERVER
           
protected static int SUCCESS_KB
           
protected static int SUCCESSES
           
 
Fields inherited from class org.archive.crawler.framework.Processor
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Fields inherited from interface org.archive.crawler.datamodel.FetchStatusCodes
S_BLOCKED_BY_CUSTOM_PROCESSOR, S_BLOCKED_BY_QUOTA, S_BLOCKED_BY_RUNTIME_LIMIT, S_BLOCKED_BY_USER, S_CONNECT_FAILED, S_CONNECT_LOST, S_DEEMED_CHAFF, S_DEEMED_NOT_FOUND, S_DEFERRED, S_DELETED_BY_USER, S_DNS_SUCCESS, S_DOMAIN_PREREQUISITE_FAILURE, S_DOMAIN_UNRESOLVABLE, S_GETBYNAME_SUCCESS, S_OTHER_PREREQUISITE_FAILURE, S_OUT_OF_SCOPE, S_PREREQUISITE_UNSCHEDULABLE_FAILURE, S_PROCESSING_THREAD_KILLED, S_ROBOTS_PRECLUDED, S_ROBOTS_PREREQUISITE_FAILURE, S_RUNTIME_EXCEPTION, S_SERIOUS_ERROR, S_TIMEOUT, S_TOO_MANY_EMBED_HOPS, S_TOO_MANY_LINK_HOPS, S_TOO_MANY_RETRIES, S_UNATTEMPTED, S_UNFETCHABLE_URI, S_UNQUEUEABLE
 
Constructor Summary
QuotaEnforcer(java.lang.String name)
          Constructor.
 
Method Summary
protected  boolean applyQuota(CrawlURI curi, java.lang.String quotaKey, long actual)
          Apply the quota specified by the given key against the actual value provided.
protected  boolean checkQuotas(CrawlURI curi, CrawlSubstats.HasCrawlSubstats hasStats, int CAT)
          Check all quotas for the given substats and category (server, host, or group).
protected  void innerProcess(CrawlURI curi)
          Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
 
Methods inherited from class org.archive.crawler.framework.Processor
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

SERVER

protected static final int SERVER
See Also:
Constant Field Values

HOST

protected static final int HOST
See Also:
Constant Field Values

GROUP

protected static final int GROUP
See Also:
Constant Field Values

NAME

protected static final int NAME
See Also:
Constant Field Values

SUCCESSES

protected static final int SUCCESSES
See Also:
Constant Field Values

SUCCESS_KB

protected static final int SUCCESS_KB
See Also:
Constant Field Values

RESPONSES

protected static final int RESPONSES
See Also:
Constant Field Values

RESPONSE_KB

protected static final int RESPONSE_KB
See Also:
Constant Field Values

keys

protected static final java.lang.String[][] keys

ATTR_SERVER_MAX_FETCH_SUCCESSES

protected static final java.lang.String ATTR_SERVER_MAX_FETCH_SUCCESSES
server max successful fetches


DEFAULT_SERVER_MAX_FETCH_SUCCESSES

protected static final java.lang.Long DEFAULT_SERVER_MAX_FETCH_SUCCESSES

ATTR_SERVER_MAX_SUCCESS_KB

protected static final java.lang.String ATTR_SERVER_MAX_SUCCESS_KB
server max successful fetch bytes


DEFAULT_SERVER_MAX_SUCCESS_KB

protected static final java.lang.Long DEFAULT_SERVER_MAX_SUCCESS_KB

ATTR_SERVER_MAX_FETCH_RESPONSES

protected static final java.lang.String ATTR_SERVER_MAX_FETCH_RESPONSES
server max fetch responses (including error codes)


DEFAULT_SERVER_MAX_FETCH_RESPONSES

protected static final java.lang.Long DEFAULT_SERVER_MAX_FETCH_RESPONSES

ATTR_SERVER_MAX_ALL_KB

protected static final java.lang.String ATTR_SERVER_MAX_ALL_KB
server max all fetch bytes (including error responses)


DEFAULT_SERVER_MAX_ALL_KB

protected static final java.lang.Long DEFAULT_SERVER_MAX_ALL_KB

ATTR_HOST_MAX_FETCH_SUCCESSES

protected static final java.lang.String ATTR_HOST_MAX_FETCH_SUCCESSES
host max successful fetches


DEFAULT_HOST_MAX_FETCH_SUCCESSES

protected static final java.lang.Long DEFAULT_HOST_MAX_FETCH_SUCCESSES

ATTR_HOST_MAX_SUCCESS_KB

protected static final java.lang.String ATTR_HOST_MAX_SUCCESS_KB
host max successful fetch bytes


DEFAULT_HOST_MAX_SUCCESS_KB

protected static final java.lang.Long DEFAULT_HOST_MAX_SUCCESS_KB

ATTR_HOST_MAX_FETCH_RESPONSES

protected static final java.lang.String ATTR_HOST_MAX_FETCH_RESPONSES
host max fetch responses (including error codes)


DEFAULT_HOST_MAX_FETCH_RESPONSES

protected static final java.lang.Long DEFAULT_HOST_MAX_FETCH_RESPONSES

ATTR_HOST_MAX_ALL_KB

protected static final java.lang.String ATTR_HOST_MAX_ALL_KB
host max all fetch bytes (including error responses)


DEFAULT_HOST_MAX_ALL_KB

protected static final java.lang.Long DEFAULT_HOST_MAX_ALL_KB

ATTR_GROUP_MAX_FETCH_SUCCESSES

protected static final java.lang.String ATTR_GROUP_MAX_FETCH_SUCCESSES
group max successful fetches


DEFAULT_GROUP_MAX_FETCH_SUCCESSES

protected static final java.lang.Long DEFAULT_GROUP_MAX_FETCH_SUCCESSES

ATTR_GROUP_MAX_SUCCESS_KB

protected static final java.lang.String ATTR_GROUP_MAX_SUCCESS_KB
group max successful fetch bytes


DEFAULT_GROUP_MAX_SUCCESS_KB

protected static final java.lang.Long DEFAULT_GROUP_MAX_SUCCESS_KB

ATTR_GROUP_MAX_FETCH_RESPONSES

protected static final java.lang.String ATTR_GROUP_MAX_FETCH_RESPONSES
group max fetch responses (including error codes)


DEFAULT_GROUP_MAX_FETCH_RESPONSES

protected static final java.lang.Long DEFAULT_GROUP_MAX_FETCH_RESPONSES

ATTR_GROUP_MAX_ALL_KB

protected static final java.lang.String ATTR_GROUP_MAX_ALL_KB
group max all fetch bytes (including error responses)


DEFAULT_GROUP_MAX_ALL_KB

protected static final java.lang.Long DEFAULT_GROUP_MAX_ALL_KB

ATTR_FORCE_RETIRE

protected static final java.lang.String ATTR_FORCE_RETIRE
whether to force-retire when over-quota detected

See Also:
Constant Field Values

DEFAULT_FORCE_RETIRE

protected static final java.lang.Boolean DEFAULT_FORCE_RETIRE

Constructor Detail

QuotaEnforcer

public QuotaEnforcer(java.lang.String name)
Constructor.

Parameters:
name - Name of this processor.

Method Detail

innerProcess

protected void innerProcess(CrawlURI curi)
Description copied from class: Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.

Overrides:
innerProcess in class Processor
Parameters:
curi - The CrawlURI being processed.

checkQuotas

protected boolean checkQuotas(CrawlURI curi,
                              CrawlSubstats.HasCrawlSubstats hasStats,
                              int CAT)
Check all quotas for the given substats and category (server, host, or group).

Parameters:
curi - CrawlURI to mark up with results
hasStats - holds CrawlSubstats with actual values to test
CAT - category index (SERVER, HOST, or GROUP) into the quota settings keys
Returns:
true if quota precludes fetching of CrawlURI
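One plausible reading of how the category index and the keys table fit together is sketched below. This is an illustration, not the Heritrix source: the index values, the key strings, the `Substats` class, the `settings` map, and the byte-to-KB conversion are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class CheckQuotasSketch {
    // Row (category) indexes into KEYS; the numeric values are assumed.
    static final int SERVER = 0, HOST = 1, GROUP = 2;
    // Column indexes within one row; the numeric values are assumed.
    static final int NAME = 0, SUCCESSES = 1, SUCCESS_KB = 2,
                     RESPONSES = 3, RESPONSE_KB = 4;

    // Hypothetical settings keys, one row per category.
    static final String[][] KEYS = {
        {"server", "server-max-fetch-successes", "server-max-success-kb",
                   "server-max-fetch-responses", "server-max-all-kb"},
        {"host", "host-max-fetch-successes", "host-max-success-kb",
                 "host-max-fetch-responses", "host-max-all-kb"},
        {"group", "group-max-fetch-successes", "group-max-success-kb",
                  "group-max-fetch-responses", "group-max-all-kb"},
    };

    /** Hypothetical stand-in for CrawlSubstats: running totals for one scope. */
    static class Substats {
        final long successes, successBytes, responses, totalBytes;

        Substats(long successes, long successBytes, long responses, long totalBytes) {
            this.successes = successes;
            this.successBytes = successBytes;
            this.responses = responses;
            this.totalBytes = totalBytes;
        }
    }

    /** Returns true if any of the four quotas of category cat is reached. */
    static boolean checkQuotas(Map<String, Long> settings, Substats s, int cat) {
        String[] row = KEYS[cat];
        return over(settings, row[SUCCESSES], s.successes)
            || over(settings, row[SUCCESS_KB], s.successBytes / 1024)
            || over(settings, row[RESPONSES], s.responses)
            || over(settings, row[RESPONSE_KB], s.totalBytes / 1024);
    }

    /** Assumption: an unset or negative quota means no limit. */
    static boolean over(Map<String, Long> settings, String key, long actual) {
        long quota = settings.getOrDefault(key, -1L);
        return quota >= 0 && actual >= quota;
    }

    public static void main(String[] args) {
        Map<String, Long> settings = new HashMap<>();
        settings.put("host-max-success-kb", 10L);
        Substats stats = new Substats(1, 20 * 1024, 1, 20 * 1024);
        System.out.println(checkQuotas(settings, stats, HOST));
        System.out.println(checkQuotas(settings, stats, SERVER));
    }
}
```

Keeping the keys in a two-dimensional table lets one method serve all three categories: the caller passes SERVER, HOST, or GROUP, and the same four comparisons run against that row.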

applyQuota

protected boolean applyQuota(CrawlURI curi,
                             java.lang.String quotaKey,
                             long actual)
Apply the quota specified by the given key against the actual value provided. If the quota and actual values rule out processing the given CrawlURI, mark up the CrawlURI appropriately.

Parameters:
curi - CrawlURI whose processing is subject to a potential quota limitation
quotaKey - settings key to get applicable quota
actual - current value to compare to quota
Returns:
true if the CrawlURI is blocked by a quota, false otherwise
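The comparison and markup step might look roughly like the following sketch. It is written against hypothetical stand-ins (`Uri`, the `quotas` map, the `"Q:"` annotation prefix, the `forceRetire` flag) rather than the real CrawlURI and settings framework, the status value is a placeholder, and the rule that a negative quota disables the check is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ApplyQuotaSketch {
    // Placeholder value; the real constant is defined in FetchStatusCodes.
    static final int S_BLOCKED_BY_QUOTA = -5003;

    /** Hypothetical stand-in for the parts of CrawlURI that get marked up. */
    static class Uri {
        int fetchStatus = 0;
        boolean forceRetire = false;
        final List<String> annotations = new ArrayList<>();
    }

    // Hypothetical settings store: quota key -> configured limit.
    final Map<String, Long> quotas = new HashMap<>();
    // Stands in for the force-retire setting; the value here is arbitrary.
    boolean forceRetire = true;

    /**
     * Compares actual against the quota stored under quotaKey. When the
     * quota is reached, marks the URI and returns true; otherwise leaves
     * the URI untouched and returns false.
     */
    boolean applyQuota(Uri curi, String quotaKey, long actual) {
        long quota = quotas.getOrDefault(quotaKey, -1L);
        if (quota < 0 || actual < quota) {
            return false; // no limit configured, or still under the limit
        }
        curi.fetchStatus = S_BLOCKED_BY_QUOTA;
        curi.annotations.add("Q:" + quotaKey); // record which quota tripped
        curi.forceRetire = forceRetire;
        return true;
    }

    public static void main(String[] args) {
        ApplyQuotaSketch enforcer = new ApplyQuotaSketch();
        enforcer.quotas.put("host-max-fetch-successes", 100L);
        Uri uri = new Uri();
        System.out.println(enforcer.applyQuota(uri, "host-max-fetch-successes", 99));
        System.out.println(enforcer.applyQuota(uri, "host-max-fetch-successes", 100));
        System.out.println(uri.annotations);
    }
}
```

Recording the key that tripped the check is a design choice of this sketch: it makes it possible to tell afterwards which of the quotas caused a URI to be blocked.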


Copyright © 2003-2011 Internet Archive. All Rights Reserved.