org.archive.crawler.scope
Class SeedCachingScope

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.framework.Filter
                      extended by org.archive.crawler.framework.CrawlScope
                          extended by org.archive.crawler.scope.ClassicScope
                              extended by org.archive.crawler.scope.SeedCachingScope
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean
Direct Known Subclasses:
DomainScope, HostScope, PathScope

public class SeedCachingScope
extends ClassicScope

A CrawlScope that caches its seed list for the convenience of scope-tests that are based on the seeds.

Author:
gojomo
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
ComplexType.MBeanAttributeInfoIterator
 
Field Summary
(package private)  java.util.List<UURI> seeds
           
 
Fields inherited from class org.archive.crawler.scope.ClassicScope
ATTR_EXCLUDE_FILTER, ATTR_MAX_LINK_HOPS, ATTR_MAX_TRANS_HOPS
 
Fields inherited from class org.archive.crawler.framework.CrawlScope
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners
 
Fields inherited from class org.archive.crawler.framework.Filter
ATTR_ENABLED
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
SeedCachingScope(java.lang.String name)
           
 
Method Summary
 boolean addSeed(CrawlURI curi)
           
protected  void fillSeedsCache()
          Ensures the seeds cache is created and filled.
 void refreshSeeds()
          Refresh seeds.
 java.util.Iterator<UURI> seedsIterator()
          Gets an iterator over all configured seeds.
 
Methods inherited from class org.archive.crawler.scope.ClassicScope
additionalFocusAccepts, exceedsMaxHops, excludeAccepts, focusAccepts, innerAccepts, kickUpdate, transitiveAccepts, xforceAccepts
 
Methods inherited from class org.archive.crawler.framework.CrawlScope
addSeed, addSeedListener, checkClose, getSeedfile, initialize, isSameHost, isSeed, listUsedFiles, seedsIterator, toString
 
Methods inherited from class org.archive.crawler.framework.Filter
accepts, getFilterOffPosition, returnTrueIfMatches
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

seeds

java.util.List<UURI> seeds
Constructor Detail

SeedCachingScope

public SeedCachingScope(java.lang.String name)
Method Detail

addSeed

public boolean addSeed(CrawlURI curi)

refreshSeeds

public void refreshSeeds()
Description copied from class: CrawlScope
Refresh seeds.

Overrides:
refreshSeeds in class CrawlScope

seedsIterator

public java.util.Iterator<UURI> seedsIterator()
Description copied from class: CrawlScope
Gets an iterator over all configured seeds. Subclasses that cache seeds in memory can override this method with a more efficient implementation.

Overrides:
seedsIterator in class CrawlScope
Returns:
An iterator over the seeds, possibly backed by a disk file

fillSeedsCache

protected void fillSeedsCache()
Ensures the seeds cache is created and filled.
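
The caching pattern described for seedsIterator, refreshSeeds and fillSeedsCache can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Heritrix implementation: the classes SlowSeedSource and CachedSeeds are hypothetical stand-ins, plain String replaces UURI, and the assumption that refreshSeeds simply drops the cache so that the next access refills it is ours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a seed source that is expensive to scan,
// such as a seeds file on disk.
class SlowSeedSource {
    private final List<String> lines;
    int reads = 0; // number of times the source has been scanned

    SlowSeedSource(List<String> lines) {
        this.lines = new ArrayList<>(lines);
    }

    Iterator<String> seedsIterator() {
        reads++;
        return new ArrayList<>(lines).iterator();
    }

    void add(String seed) {
        lines.add(seed);
    }
}

// Hypothetical cache wrapper; method names mirror the summary above.
class CachedSeeds {
    private final SlowSeedSource source;
    private List<String> seeds; // null until the cache is filled

    CachedSeeds(SlowSeedSource source) {
        this.source = source;
    }

    // Ensure the cache exists; scans the source only when it is empty.
    protected synchronized void fillSeedsCache() {
        if (seeds != null) {
            return;
        }
        List<String> loaded = new ArrayList<>();
        Iterator<String> it = source.seedsIterator();
        while (it.hasNext()) {
            loaded.add(it.next());
        }
        seeds = loaded;
    }

    // Iterate over a snapshot of the cache rather than the slow source.
    public synchronized Iterator<String> seedsIterator() {
        fillSeedsCache();
        return new ArrayList<>(seeds).iterator();
    }

    // Keep the cache and the underlying source in step.
    public synchronized boolean addSeed(String seed) {
        fillSeedsCache();
        source.add(seed);
        return seeds.add(seed);
    }

    // Assumption: refreshing discards the cache; the next access refills it.
    public synchronized void refreshSeeds() {
        seeds = null;
    }
}

class SeedCacheSketch {
    public static void main(String[] args) {
        SlowSeedSource src = new SlowSeedSource(
                Arrays.asList("http://a.example/", "http://b.example/"));
        CachedSeeds scope = new CachedSeeds(src);
        scope.seedsIterator();
        scope.seedsIterator();
        System.out.println("source scans: " + src.reads);
    }
}
```

Returning a snapshot from seedsIterator keeps callers safe from concurrent modification if a seed is added while they iterate, at the cost of one list copy per call.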



Copyright © 2003-2011 Internet Archive. All Rights Reserved.