A

A_ANNOTATIONS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
shorthand string tokens indicating notable occurrences, separated by commas
A_CONTENT_DIGEST - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
content digest
A_CONTENT_STATE_KEY - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
Key to use when getting the state of a CrawlURI from the CrawlURI AList.
A_CONTENT_TYPE - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Extracted MIME type of fetched content; should be set immediately by fetching module if possible (rather than waiting for a later analyzer)
A_CREDENTIAL_AVATARS_KEY - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Key to get credential avatars from A_LIST.
A_DELAY_FACTOR - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Multiplier of last fetch duration to wait before fetching another item of the same class (e.g. host)
A_DISCARD_REVISIT - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
Mark a URI to be dropped from revisit handling.
A_DISTANCE_FROM_SEED - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_DNS_FETCH_TIME - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_DNS_SERVER_IP_LABEL - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_ETAG_HEADER - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
header name (and AList key) for ETag
A_FETCH_BEGAN_TIME - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_FETCH_COMPLETED_TIME - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_FETCH_HISTORY - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
fetch history array
A_FETCH_OVERDUE - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_FORCE_RETIRE - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
flag indicating the containing queue should be retired
A_FTP_CONTROL_CONVERSATION - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_FTP_FETCH_STATUS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_HERITABLE_KEYS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Key to (optional) attribute specifying a list of keys that are passed to CandidateURIs that 'descend' (are discovered via) this URI.
A_HTML_BASE - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_HTTP_BIND_ADDRESS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
local override of origin bind address
A_HTTP_PROXY_HOST - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
local override of proxy host
A_HTTP_PROXY_PORT - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
local override of proxy port
A_HTTP_TRANSACTION - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_LAST_CONTENT_DIGEST - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
Designates a field in the CrawlURI's AList for the content digest of an earlier visit.
A_LAST_DATESTAMP - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_LAST_ETAG - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_LAST_MODIFIED_HEADER - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
header name (and AList key) for last-modified timestamp
A_LOCALIZED_ERRORS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_META_ROBOTS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_MINIMUM_DELAY - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Minimum delay before fetching another item of the same class (e.g. host).
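The A_DELAY_FACTOR and A_MINIMUM_DELAY descriptions above suggest a simple politeness calculation. The following is only a rough sketch of that arithmetic, not Heritrix's actual frontier code: the class name, the method name, and the choice to combine the two values with a maximum are all assumptions made for illustration.

```java
// Hypothetical sketch; none of these names come from the Heritrix API.
final class PolitenessDelaySketch {

    /**
     * Returns how long to wait (in milliseconds) before fetching another
     * item of the same class (e.g. host). The last fetch duration is scaled
     * by the delay factor, then raised to the minimum delay if the scaled
     * value falls below it. Combining them with max() is an assumption.
     */
    static long waitMillis(long lastFetchDurationMs, double delayFactor, long minimumDelayMs) {
        long scaled = Math.round(lastFetchDurationMs * delayFactor);
        return Math.max(scaled, minimumDelayMs);
    }
}
```

Under these assumptions, a 400 ms fetch with a factor of 5.0 and a 500 ms minimum gives a 2000 ms wait, while a 50 ms fetch with the same settings is held to the 500 ms minimum.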
A_MIRROR_PATH - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Define for org.archive.crawler.writer.MirrorWriterProcessor.
A_NUMBER_OF_VERSIONS - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_NUMBER_OF_VISITS - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_PREREQUISITE_URI - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_REFERENCE_LENGTH - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
reference length (content length or virtual length)
A_RETRY_DELAY - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_RRECORD_SET_LABEL - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_RUNTIME_EXCEPTION - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
A_SOURCE_TAG - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
a 'source' (usu.
A_STATUS - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
key for status (when in history)
A_TIME_OF_NEXT_PROCESSING - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_VIA_DIGEST - Static variable in class org.archive.crawler.extractor.TrapSuppressExtractor
AList attribute key for carrying-forward content-digest from 'via'
A_WAIT_INTERVAL - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_WAIT_REEVALUATED - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
 
A_WRITTEN_TO_WARC - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
name of warc file where uri had records written
aboutToLog() - Method in class org.archive.crawler.datamodel.CrawlURI
Notify CrawlURI it is about to be logged; opportunity for self-annotation
ABS_HTTP_URI_PATTERN - Static variable in class org.archive.crawler.extractor.ExtractorURI
 
ABSOLUTE_OFFSET_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive Record absolute offset into Archive file.
AbstractFrontier - Class in org.archive.crawler.frontier
Shared facilities for Frontier implementations.
AbstractFrontier(String, String) - Constructor for class org.archive.crawler.frontier.AbstractFrontier
 
AbstractLongFPSet - Class in org.archive.util
Shell of functionality for a Set of primitive long fingerprints, held in an array of possibly-empty slots.
AbstractLongFPSet() - Constructor for class org.archive.util.AbstractLongFPSet
To support serialization TODO: verify needed?
AbstractLongFPSet(int, float) - Constructor for class org.archive.util.AbstractLongFPSet
Create a new AbstractLongFPSet with a given capacity and load factor
AbstractTracker - Class in org.archive.crawler.framework
A partial implementation of the StatisticsTracking interface.
AbstractTracker(String, String) - Constructor for class org.archive.crawler.framework.AbstractTracker
 
ACCEPT - Static variable in class org.archive.crawler.deciderules.DecideRule
 
accept(File, String) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter
 
ACCEPTABLE_ASCII_DOMAIN - Static variable in class org.archive.net.UURIFactory
Characters we'll accept in the domain label part of a URI authority: ASCII letters-digits-hyphen (LDH) plus underscore, with single intervening '.' characters.
ACCEPTABLE_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
ACCEPTABLE_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Acceptable characters in forced queue names.
AcceptDecideRule - Class in org.archive.crawler.deciderules
Rule which responds ACCEPT to anything passed in.
AcceptDecideRule(String) - Constructor for class org.archive.crawler.deciderules.AcceptDecideRule
 
AcceptRevisitProcessor - Class in org.archive.crawler.postprocessor
Set a URI to be revisited by the ARFrontier.
AcceptRevisitProcessor(String) - Constructor for class org.archive.crawler.postprocessor.AcceptRevisitProcessor
 
accepts(Object) - Method in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
accepts(Object) - Method in class org.archive.crawler.framework.Filter
 
accepts(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
accumulate(CrawlURI) - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
accumulate(T) - Method in interface org.archive.util.Accumulator
 
accumulatingBuffer - Variable in class org.archive.crawler.io.CrawlerJournal
Allocate a buffer for accumulating lines to write and reuse it.
Accumulator<T> - Interface in org.archive.util
Parameterized interface for a stats-aggregating role.
acquireContinuePermission() - Method in class org.archive.crawler.framework.CrawlController
Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode.
ACTION - Static variable in class org.archive.crawler.admin.ui.JobConfigureUtils
 
actions - Variable in class org.archive.crawler.extractor.CustomSWFTags
 
activeHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of hosts that are currently active.
activeThreadCount() - Method in class org.archive.crawler.admin.StatisticsTracker
 
activeThreadCount() - Method in interface org.archive.crawler.framework.StatisticsTracking
Get the number of active (non-paused) threads.
AdaptiveRevisitAttributeConstants - Interface in org.archive.crawler.frontier
Defines static constants for the Adaptive Revisiting module defining data keys in the CrawlURI AList.
AdaptiveRevisitFrontier - Class in org.archive.crawler.frontier
A Frontier that will repeatedly visit all encountered URIs.
AdaptiveRevisitFrontier(String) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
AdaptiveRevisitFrontier(String, String) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
AdaptiveRevisitHostQueue - Class in org.archive.crawler.frontier
A priority based queue of CrawlURIs.
AdaptiveRevisitHostQueue(String, Environment, StoredClassCatalog, int) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Constructor
AdaptiveRevisitQueueList - Class in org.archive.crawler.frontier
Maintains an ordered list of AdaptiveRevisitHostQueues used by a Frontier.
AdaptiveRevisitQueueList(Environment, StoredClassCatalog) - Constructor for class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
add(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, if not already present.
add(SinkHandlerLogRecord) - Method in interface org.archive.crawler.framework.AlertManager
 
add(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Add a CrawlURI to this host queue.
add(int, Double) - Method in class org.archive.crawler.settings.DoubleList
Add a new Double at the specified index to this list.
add(int, double) - Method in class org.archive.crawler.settings.DoubleList
Add a new double at the specified index to this list.
add(Double) - Method in class org.archive.crawler.settings.DoubleList
Add a new Double at the end of this list.
add(double) - Method in class org.archive.crawler.settings.DoubleList
Add a new double at the end of this list.
add(int, Float) - Method in class org.archive.crawler.settings.FloatList
Add a new Float at the specified index to this list.
add(int, float) - Method in class org.archive.crawler.settings.FloatList
Add a new float at the specified index to this list.
add(Float) - Method in class org.archive.crawler.settings.FloatList
Add a new Float at the end of this list.
add(float) - Method in class org.archive.crawler.settings.FloatList
Add a new float at the end of this list.
add(int, Integer) - Method in class org.archive.crawler.settings.IntegerList
Add a new Integer at the specified index to this list.
add(int, int) - Method in class org.archive.crawler.settings.IntegerList
Add a new int at the specified index to this list.
add(Integer) - Method in class org.archive.crawler.settings.IntegerList
Add a new Integer at the end of this list.
add(int) - Method in class org.archive.crawler.settings.IntegerList
Add a new int at the end of this list.
add(Object) - Method in class org.archive.crawler.settings.ListType
Appends the specified element to the end of this list.
add(int, Object) - Method in class org.archive.crawler.settings.ListType
Inserts the specified element at the specified position in this list.
add(int, Long) - Method in class org.archive.crawler.settings.LongList
Add a new Long at the specified index to this list.
add(int, long) - Method in class org.archive.crawler.settings.LongList
Add a new long at the specified index to this list.
add(Long) - Method in class org.archive.crawler.settings.LongList
Add a new Long at the end of this list.
add(long) - Method in class org.archive.crawler.settings.LongList
Add a new long at the end of this list.
add(int, String) - Method in class org.archive.crawler.settings.StringList
Add a new String at the specified index to this list.
add(String) - Method in class org.archive.crawler.settings.StringList
Add a new String at the end of this list.
add(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
add(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
add(long) - Method in class org.archive.util.AbstractLongFPSet
Add the given value to this set
add(CharSequence) - Method in interface org.archive.util.BloomFilter
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Adds a character sequence to the filter.
add(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
add(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Add a fingerprint to the set.
add(Histotable<K>) - Method in class org.archive.util.Histotable
 
add(Iterator) - Method in class org.archive.util.iterator.CompositeIterator
Add an iterator to the internal chain.
add(String) - Method in class org.archive.util.PrefixSet
Maintains additional invariant: if one entry is a prefix of another, keep only the prefix.
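The invariant described for PrefixSet.add(String) can be illustrated with a small standalone class. This is a minimal sketch assuming a TreeSet-backed set; PrefixSetSketch and its methods are hypothetical names and do not reproduce the actual org.archive.util.PrefixSet source.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical illustration of a prefix-free set of strings.
final class PrefixSetSketch {
    private final TreeSet<String> entries = new TreeSet<>();

    /** Adds s unless an existing entry is already a prefix of it. */
    boolean add(String s) {
        // In a prefix-free sorted set, the only entry that can be a
        // prefix of s is the greatest entry that is <= s.
        String floor = entries.floor(s);
        if (floor != null && s.startsWith(floor)) {
            return false; // s is already covered by a shorter (or equal) entry
        }
        // Entries that start with s sort contiguously from s onward;
        // remove them because s now covers them.
        Iterator<String> it = entries.tailSet(s, true).iterator();
        while (it.hasNext()) {
            if (it.next().startsWith(s)) {
                it.remove();
            } else {
                break;
            }
        }
        return entries.add(s);
    }

    int size() {
        return entries.size();
    }
}
```

In this sketch, adding "http://example.com/a/b" and then "http://example.com/a" leaves only the shorter string in the set, and a later attempt to add "http://example.com/a/c" is rejected because it is already covered.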
add(int, E) - Method in class org.archive.util.SubList
 
ADD_CRAWL_JOB_BASEDON_OPER - Static variable in class org.archive.crawler.Heritrix
 
ADD_CRAWL_JOB_OPER - Static variable in class org.archive.crawler.Heritrix
 
addAlistPersistentMember(Object) - Static method in class org.archive.crawler.datamodel.CrawlURI
Add the key of alist items you want to persist across processings.
addAll(DoubleList) - Method in class org.archive.crawler.settings.DoubleList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Double[]) - Method in class org.archive.crawler.settings.DoubleList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(double[]) - Method in class org.archive.crawler.settings.DoubleList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(FloatList) - Method in class org.archive.crawler.settings.FloatList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Float[]) - Method in class org.archive.crawler.settings.FloatList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(float[]) - Method in class org.archive.crawler.settings.FloatList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(IntegerList) - Method in class org.archive.crawler.settings.IntegerList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Integer[]) - Method in class org.archive.crawler.settings.IntegerList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(int[]) - Method in class org.archive.crawler.settings.IntegerList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(ListType<T>) - Method in class org.archive.crawler.settings.ListType
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Collection<? extends Object>) - Method in class org.archive.crawler.settings.ListType
 
addAll(int, Collection<? extends Object>) - Method in class org.archive.crawler.settings.ListType
 
addAll(LongList) - Method in class org.archive.crawler.settings.LongList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(Long[]) - Method in class org.archive.crawler.settings.LongList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(long[]) - Method in class org.archive.crawler.settings.LongList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAll(StringList) - Method in class org.archive.crawler.settings.StringList
Appends all of the elements in the specified list to the end of this list, in the order that they are returned by the specified list's iterator.
addAll(String[]) - Method in class org.archive.crawler.settings.StringList
Appends all of the elements in the specified array to the end of this list, in the same order that they are in the array.
addAllow(String) - Method in class org.archive.crawler.datamodel.RobotsDirectives
 
addAnnotation(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Add an annotation: an abbreviated indication of something special about this URI that need not be present in every crawl.log line, but should be noted for future reference.
addBdbjeAttributes(List<OpenMBeanAttributeInfo>, List<MBeanAttributeInfo>, List<String>) - Method in class org.archive.crawler.admin.CrawlJob
 
addBdbjeOperations(List<OpenMBeanOperationInfo>, List<MBeanOperationInfo>, List<String>) - Method in class org.archive.crawler.admin.CrawlJob
 
addCap(byte[]) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Add a dummy 'cap' entry at the given insertion key.
addComplexType(ComplexType) - Method in class org.archive.crawler.settings.CrawlerSettings
 
addConstraint(Constraint) - Method in class org.archive.crawler.settings.MapType
 
addConstraint(Constraint) - Method in class org.archive.crawler.settings.Type
Add a constraint to this type.
addCrawlJob(String, String, String, String) - Method in class org.archive.crawler.Heritrix
This method is called when we have an order file to hand that we want to base a job on.
addCrawlJob(URL, HttpURLConnection, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJob(File, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJob(CrawlJob) - Method in class org.archive.crawler.Heritrix
 
addCrawlJobBasedOn(String, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJobBasedOn(File, String, String, String) - Method in class org.archive.crawler.Heritrix
 
addCrawlJobBasedonJar(File, String, String, String) - Method in class org.archive.crawler.Heritrix
Undo jar file and use as basis for a new job.
addCrawlOrderAttributes(ComplexType, List<OpenMBeanAttributeInfo>) - Method in class org.archive.crawler.admin.CrawlJob
 
addCrawlStatusListener(CrawlStatusListener) - Method in class org.archive.crawler.framework.CrawlController
Register for CrawlStatus events.
addCrawlURIDispositionListener(CrawlURIDispositionListener) - Method in class org.archive.crawler.framework.CrawlController
Register for CrawlURIDisposition events.
addCredentialAvatar(CredentialAvatar) - Method in class org.archive.crawler.datamodel.CrawlServer
Add an avatar.
addCredentialAvatar(CredentialAvatar) - Method in class org.archive.crawler.datamodel.CrawlURI
Add an avatar.
addCriteria(Criteria) - Method in class org.archive.crawler.settings.refinements.Refinement
Add a new criterion to this refinement.
addDisallow(String) - Method in class org.archive.crawler.datamodel.RobotsDirectives
 
added(CandidateURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
added(CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
addedSeed(CandidateURI) - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
addedSeed(CandidateURI) - Method in interface org.archive.crawler.scope.SeedListener
 
addElement(CrawlerSettings, Type) - Method in class org.archive.crawler.settings.ComplexType
 
addElement(CrawlerSettings, Type) - Method in class org.archive.crawler.settings.MapType
Add a new element to this map.
addElement(CrawlerSettings, Type) - Method in class org.archive.crawler.settings.ModuleType
 
addElementToDefinition(Type) - Method in class org.archive.crawler.settings.ComplexType
Add a new attribute to the definition of this ComplexType.
addElementType(Type, int) - Method in class org.archive.crawler.settings.DataContainer
Add a new element to the data container.
addElementType(Type) - Method in class org.archive.crawler.settings.DataContainer
Appends the specified element to the end of this data container.
addFilter(CrawlerSettings, Filter) - Method in class org.archive.crawler.filter.OrFilter
Deprecated.  
addForce(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, all the way through to underlying destination, even if already present.
addForce(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addForce(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addGuiPort(ObjectName) - Static method in class org.archive.crawler.Heritrix
 
addHeaderLink(CrawlURI, Header) - Method in class org.archive.crawler.extractor.ExtractorHTTP
 
addIfNotBlank(ANVLRecord, String, String) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
addImpliedHttpIfNecessary(String) - Static method in class org.archive.util.ArchiveUtils
Given a string that may be a plain host or host/path (without URI scheme), add an implied http:// if necessary.
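The behavior described for addImpliedHttpIfNecessary(String) can be approximated with a simple heuristic. The sketch below is an illustration only: ImpliedHttpSketch and addImpliedHttp are hypothetical names, and the colon-versus-dot test is an assumed heuristic rather than the confirmed ArchiveUtils logic.

```java
// Hypothetical sketch; not the ArchiveUtils implementation.
final class ImpliedHttpSketch {

    /**
     * Prepends "http://" when the input looks like a bare host or host/path.
     * Heuristic (an assumption): there is no scheme if the string has no
     * colon at all, or if the first dot comes before the first colon
     * (which suggests host:port rather than scheme:rest).
     */
    static String addImpliedHttp(String s) {
        int colon = s.indexOf(':');
        int dot = s.indexOf('.');
        if (colon < 0 || (dot >= 0 && dot < colon)) {
            return "http://" + s;
        }
        return s;
    }
}
```

With this heuristic, "example.com/path" and "example.com:8080/x" gain the http:// prefix, while "https://example.com" is returned unchanged. A dotless host with a port, such as "localhost:8080", is also left unchanged, which is a known limitation of the sketch.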
addInProcessing(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Adds a CrawlURI to the list of CrawlURIs that belong to this HQ and are being processed at the moment.
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if URI is accepted by the additional focus of this scope.
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.DomainScope
Deprecated.  
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.HostScope
Deprecated.  
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.PathScope
Deprecated.  
additionalFocusAccepts(Object) - Method in class org.archive.crawler.scope.RefinedScope
 
additionalFocusFilter - Variable in class org.archive.crawler.scope.DomainScope
Deprecated.  
additionalFocusFilter - Variable in class org.archive.crawler.scope.HostScope
Deprecated.  
additionalFocusFilter - Variable in class org.archive.crawler.scope.PathScope
Deprecated.  
additionalFocusFilter - Variable in class org.archive.crawler.scope.RefinedScope
 
addJob(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
Submit a job to the handler.
addLabel(String) - Method in class org.archive.util.anvl.ANVLRecord
 
addLabelValue(String, String) - Method in class org.archive.util.anvl.ANVLRecord
 
addLinkFromString(CrawlURI, CharSequence, CharSequence, char) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
addLocalizedError(String, Throwable, String) - Method in class org.archive.crawler.datamodel.CrawlURI
Make note of a non-fatal error, local to a particular Processor, which should be logged somewhere, but allows processing to continue.
addNewFp(long) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
addNewFp(long) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Add an FP (which may be an old or new FP) to the new complete list.
addNewFp(long) - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
addNow(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Immediately add uri.
addNow(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addNow(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addOrderToManifest() - Method in class org.archive.crawler.framework.CrawlController
Add order file contents to manifest.
addOutLink(Link) - Method in class org.archive.crawler.datamodel.CrawlURI
Add a discovered Link, unless it would exceed the max number to accept.
addProcessorMap(String, MapType) - Method in class org.archive.crawler.framework.ProcessorChainList
Add a new chain of processors to the chain list.
addProfile(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
Add a new profile
addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
AddRedirectFromRootServerToScope - Class in org.archive.crawler.deciderules
 
AddRedirectFromRootServerToScope(String) - Constructor for class org.archive.crawler.deciderules.AddRedirectFromRootServerToScope
 
addRefinement(Refinement) - Method in class org.archive.crawler.settings.CrawlerSettings
Add a refinement to this settings object.
addResponseContent(HttpMethod, CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
This method populates curi with response status and content type.
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter64bit
 
addSeed(CandidateURI) - Method in class org.archive.crawler.framework.CrawlScope
Add a new seed to scope.
addSeed(CrawlURI) - Method in class org.archive.crawler.scope.SeedCachingScope
 
addSeedListener(SeedListener) - Method in class org.archive.crawler.framework.CrawlScope
 
addStats(Map<String, Map<String, Long>>) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
addToManifest(String, char, boolean) - Method in class org.archive.crawler.framework.CrawlController
Add a file to the manifest of files used/generated by the current crawl.
addToPath(MirrorWriterProcessor.URIToFileReturn) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.DirSegment
 
addToPath(MirrorWriterProcessor.URIToFileReturn) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.EndSegment
 
addToPath(MirrorWriterProcessor.URIToFileReturn) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
Adds this segment to a file path.
addTopLevelModule(ModuleType) - Method in class org.archive.crawler.settings.CrawlerSettings
 
addUserAgent(HttpURLConnection) - Method in class org.archive.io.ArchiveReaderFactory
 
addVitals(ObjectName) - Static method in class org.archive.crawler.Heritrix
Add vital stats to passed in ObjectName.
addWebapp(String, String, boolean) - Method in class org.archive.crawler.SimpleHttpServer
Add a webapp.
ADMIN - Static variable in class org.archive.crawler.Heritrix
Web UI server, realm, context name.
agentsToDirectives - Variable in class org.archive.crawler.datamodel.Robotstxt
 
AggressiveExtractorHTML - Class in org.archive.crawler.extractor
Extended version of ExtractorHTML with more aggressive javascript link extraction where javascript code is parsed first with general HTML tags regexp, and then by javascript speculative link regexp.
AggressiveExtractorHTML(String) - Constructor for class org.archive.crawler.extractor.AggressiveExtractorHTML
 
ALERT_OPER - Static variable in class org.archive.crawler.Heritrix
 
ALERTCOUNT_ATTR - Static variable in class org.archive.crawler.Heritrix
 
AlertManager - Interface in org.archive.crawler.framework
Manager for application alerts.
ALL - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
ALL - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
ALL_DEFAULT_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
ALL_DEFAULT_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
ALL_NONEMPTY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
ALL_QUEUES - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
allFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
ALLOWALL - Static variable in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
ALLOWED_TYPES - Static variable in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
ALLOWED_TYPES - Static variable in class org.archive.crawler.deciderules.FilterDecideRule
 
allows - Variable in class org.archive.crawler.datamodel.RobotsDirectives
 
allows(String) - Method in class org.archive.crawler.datamodel.RobotsDirectives
 
allowsAll() - Method in class org.archive.crawler.datamodel.Robotstxt
Does this policy effectively allow everything? (No disallows or timing (crawl-delay) directives?)
allQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All known queues.
AllSelfTestCases - Class in org.archive.crawler.selftest
All registered heritrix selftests.
AllSelfTestCases() - Constructor for class org.archive.crawler.selftest.AllSelfTestCases
 
alreadyIncluded - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
those UURIs which are already in-process (or processed), and thus should not be rescheduled
alreadySeen - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
altPrefix - Variable in class org.archive.crawler.selftest.AltTestSuite
a method prefix other than 'test' that is also recognized as marking tests
AltTestSuite - Class in org.archive.crawler.selftest
Variant TestSuite that can build tests including methods with an alternate prefix (other than 'test').
AltTestSuite(Class, String) - Constructor for class org.archive.crawler.selftest.AltTestSuite
Constructs a TestSuite from the given class.
AltTestSuite() - Constructor for class org.archive.crawler.selftest.AltTestSuite
 
AMP - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
AMP - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
ANNOTATION_UNWRITTEN - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
CrawlURI annotation indicating no record was written
annotationContains(String) - Method in class org.archive.crawler.datamodel.CrawlURI
 
AntiCalendarCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy that further penalizes URIs with calendar-suggestive strings in them, with an extra unit of cost.
AntiCalendarCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
ANVLRecord - Class in org.archive.util.anvl
An ordered List with 'data' Element values.
ANVLRecord() - Constructor for class org.archive.util.anvl.ANVLRecord
 
ANVLRecord(Collection<? extends Element>) - Constructor for class org.archive.util.anvl.ANVLRecord
 
ANVLRecord(int) - Constructor for class org.archive.util.anvl.ANVLRecord
 
ANVLRecords - Class in org.archive.util.anvl
List of ANVLRecords.
ANVLRecords() - Constructor for class org.archive.util.anvl.ANVLRecords
 
ANVLRecords(int) - Constructor for class org.archive.util.anvl.ANVLRecords
 
ANVLRecords(Collection<ANVLRecord>) - Constructor for class org.archive.util.anvl.ANVLRecords
 
APOSTROPH - Static variable in class org.archive.net.UURIFactory
 
append(String) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Appends one lump to the end of this string.
append(File, String) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Appends one more segment to this path.
append(String) - Method in class org.archive.util.PaddingStringBuffer
append a string directly to the buffer
append(int) - Method in class org.archive.util.PaddingStringBuffer
append an int to the buffer.
append(long) - Method in class org.archive.util.PaddingStringBuffer
append a long to the buffer.
append(StringBuffer, CharSequence, int, int) - Static method in class org.archive.util.PreJ15Utils
Deprecated. Version of 1.5's StringBuffer.append(CharSequence s, int start, int finish)
appendField(StringBuilder, Object) - Static method in class org.archive.io.arc.ARC2WCDX
 
appendQueueReports(PrintWriter, Iterator<?>, int, int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Append queue report to general Frontier report.
appendTimeField(StringBuilder, Object) - Static method in class org.archive.io.arc.ARC2WCDX
 
APPLET - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
APPLET - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
applyQuota(CrawlURI, String, long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Apply the quota specified by the given key against the actual value provided.
applySpecialHandling(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform any special handling of the CrawlURI, such as promoting its URI to seed-status, or preferencing it because it is an embed.
Arc2Warc - Class in org.archive.io
Convert ARCs to (sort of) WARCs.
Arc2Warc() - Constructor for class org.archive.io.Arc2Warc
 
ARC2WCDX - Class in org.archive.io.arc
Create a 'Wide' CDX from an ARC.
ARC2WCDX() - Constructor for class org.archive.io.arc.ARC2WCDX
 
ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
ARC file extension.
ARC_GZIP_EXTRA_FIELD - Static variable in interface org.archive.io.arc.ARCConstants
The FLG.FEXTRA field that is added to ARC files.
ARC_MAGIC_NUMBER - Static variable in interface org.archive.io.arc.ARCConstants
ARC file *MAGIC NUMBER*.
ARCConstants - Interface in org.archive.io.arc
Constants used by ARC files and in ARC file processing.
ARCHIVE_PACKAGE - Static variable in class org.archive.crawler.Heritrix
The org.archive package
ARCHIVE_TIME_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
ArchiveFileConstants - Interface in org.archive.io
Constants used by Archive files and in Archive file processing.
ArchiveReader - Class in org.archive.io
Reader for an Archive file of Archive ArchiveRecords.
ArchiveReader() - Constructor for class org.archive.io.ArchiveReader
 
ArchiveReader.ArchiveRecordIterator - Class in org.archive.io
Inner ArchiveRecord Iterator class.
ArchiveReader.ArchiveRecordIterator() - Constructor for class org.archive.io.ArchiveReader.ArchiveRecordIterator
 
ArchiveReader.RandomAccessBufferedInputStream - Class in org.archive.io
Add buffering to RandomAccessInputStream.
ArchiveReader.RandomAccessBufferedInputStream(RandomAccessInputStream) - Constructor for class org.archive.io.ArchiveReader.RandomAccessBufferedInputStream
 
ArchiveReader.RandomAccessBufferedInputStream(RandomAccessInputStream, int) - Constructor for class org.archive.io.ArchiveReader.RandomAccessBufferedInputStream
 
ArchiveReaderFactory - Class in org.archive.io
Factory that returns an Archive file Reader.
ArchiveReaderFactory() - Constructor for class org.archive.io.ArchiveReaderFactory
Shut down any public access to default constructor.
ArchiveRecord - Class in org.archive.io
Archive file Record.
ArchiveRecord(InputStream) - Constructor for class org.archive.io.ArchiveRecord
Constructor.
ArchiveRecord(InputStream, ArchiveRecordHeader) - Constructor for class org.archive.io.ArchiveRecord
Constructor.
ArchiveRecord(InputStream, ArchiveRecordHeader, int, boolean, boolean) - Constructor for class org.archive.io.ArchiveRecord
Constructor.
ArchiveRecordHeader - Interface in org.archive.io
Archive Record Header.
ArchiveUtils - Class in org.archive.util
Miscellaneous useful methods.
ArchiveUtils() - Constructor for class org.archive.util.ArchiveUtils
 
ARCLocation - Interface in org.archive.io.arc
Data structure to hold ARC record location.
ARCReader - Class in org.archive.io.arc
Get an iterator on an ARC file or get a record by absolute position.
ARCReader() - Constructor for class org.archive.io.arc.ARCReader
 
ARCReaderFactory - Class in org.archive.io.arc
Factory that returns an ARCReader.
ARCReaderFactory() - Constructor for class org.archive.io.arc.ARCReaderFactory
Shut down any access to default constructor.
ARCReaderFactory.CompressedARCReader - Class in org.archive.io.arc
Compressed arc file reader.
ARCReaderFactory.CompressedARCReader(File) - Constructor for class org.archive.io.arc.ARCReaderFactory.CompressedARCReader
Constructor.
ARCReaderFactory.CompressedARCReader(File, long) - Constructor for class org.archive.io.arc.ARCReaderFactory.CompressedARCReader
Constructor.
ARCReaderFactory.CompressedARCReader(String, InputStream, boolean) - Constructor for class org.archive.io.arc.ARCReaderFactory.CompressedARCReader
Constructor.
ARCReaderFactory.UncompressedARCReader - Class in org.archive.io.arc
Uncompressed arc file reader.
ARCReaderFactory.UncompressedARCReader(File) - Constructor for class org.archive.io.arc.ARCReaderFactory.UncompressedARCReader
Constructor.
ARCReaderFactory.UncompressedARCReader(File, long) - Constructor for class org.archive.io.arc.ARCReaderFactory.UncompressedARCReader
Constructor.
ARCReaderFactory.UncompressedARCReader(String, InputStream) - Constructor for class org.archive.io.arc.ARCReaderFactory.UncompressedARCReader
Constructor.
ARCRecord - Class in org.archive.io.arc
An ARC file record.
ARCRecord(InputStream, ArchiveRecordHeader) - Constructor for class org.archive.io.arc.ARCRecord
Constructor.
ARCRecord(InputStream, ArchiveRecordHeader, int, boolean, boolean, boolean) - Constructor for class org.archive.io.arc.ARCRecord
Constructor.
ARCRecordMetaData - Class in org.archive.io.arc
An immutable class to hold an ARC record's metadata.
ARCRecordMetaData() - Constructor for class org.archive.io.arc.ARCRecordMetaData
Shut down the default constructor.
ARCRecordMetaData(String, Map) - Constructor for class org.archive.io.arc.ARCRecordMetaData
Constructor.
ARCUtils - Class in org.archive.io.arc
 
ARCUtils() - Constructor for class org.archive.io.arc.ARCUtils
 
ARCWriter - Class in org.archive.io.arc
Write ARC files.
ARCWriter(AtomicInteger, PrintStream, File, boolean, String, List) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriter(AtomicInteger, List<File>, String, boolean, long) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriter(AtomicInteger, List<File>, String, String, boolean, long, List) - Constructor for class org.archive.io.arc.ARCWriter
Constructor.
ARCWriterPool - Class in org.archive.io.arc
A pool of ARCWriters.
ARCWriterPool(WriterPoolSettings, int, int) - Constructor for class org.archive.io.arc.ARCWriterPool
Constructor
ARCWriterPool(AtomicInteger, WriterPoolSettings, int, int) - Constructor for class org.archive.io.arc.ARCWriterPool
Constructor
ARCWriterProcessor - Class in org.archive.crawler.writer
Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.
ARCWriterProcessor(String) - Constructor for class org.archive.crawler.writer.ARCWriterProcessor
 
ArrayLongFPCache - Class in org.archive.util.fingerprint
Simple long fingerprint cache using a backing array; any long maps to one of 'smear' slots.
ArrayLongFPCache() - Constructor for class org.archive.util.fingerprint.ArrayLongFPCache
 
ArraySeekInputStream - Class in org.archive.io
A repositionable stream backed by an array.
ArraySeekInputStream(byte[]) - Constructor for class org.archive.io.ArraySeekInputStream
Constructor.
asCrawlUri(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
asCrawlUri(CandidateURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
asMap() - Method in class org.archive.util.anvl.ANVLRecord
 
asRepositionable(InputStream) - Method in class org.archive.io.ArchiveReaderFactory
 
assertExists(File) - Static method in class org.archive.crawler.selftest.SelfTestCase
Test non-null and exists.
assertInitialized() - Method in class org.archive.crawler.selftest.SelfTestCase
 
assertNonEmpty(String) - Static method in class org.archive.crawler.selftest.SelfTestCase
Test non null and not empty.
asStringBuffer() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Returns the string as a StringBuffer.
atFinish() - Method in class org.archive.crawler.framework.CrawlController
Evaluate if the crawl should stop because it is finished, without actually stopping the crawl.
ATT_CACHE_PERCENT - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_CACHE_SIZE - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_ENV_HOME - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_IS_READ_ONLY - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_IS_SERIALIZABLE - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_IS_TRANSACTIONAL - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_LOCK_TIMEOUT - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_OPEN - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_SET_READ_ONLY - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_SET_SERIALIZABLE - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_SET_TRANSACTIONAL - Static variable in class org.archive.util.JEMBeanHelper
 
ATT_TXN_TIMEOUT - Static variable in class org.archive.util.JEMBeanHelper
 
attach(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Attach this credential's avatar to the passed curi.
attach(CrawlURI, String) - Method in class org.archive.crawler.datamodel.credential.Credential
Attach this credential's avatar to the passed curi.
ATTR_ACCEPT_HEADERS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.DomainScope
Deprecated.  
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.HostScope
Deprecated.  
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.PathScope
Deprecated.  
ATTR_ADDITIONAL_FOCUS_FILTER - Static variable in class org.archive.crawler.scope.RefinedScope
 
ATTR_ALLOW_BY_REGEXP - Static variable in class org.archive.crawler.prefetch.Preselector
indicator allowing all matching URIs
ATTR_ALSO_CHECK_VIA - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Whether the 'via' of CrawlURIs should also be checked to see if it is prefixed by the set of SURT prefixes
ATTR_ALSO_CHECK_VIA - Static variable in class org.archive.crawler.scope.SurtPrefixScope
Deprecated. Whether the 'via' of CrawlURIs should also be checked to see if it is prefixed by the set of SURT prefixes
ATTR_AVAILABLE_MODES - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
ATTR_BALANCE_REPLENISH_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
amount to replenish budget on each activation (duty cycle)
ATTR_BANDWIDTH - Static variable in class org.archive.crawler.fetcher.FetchFTP
The name for the fetch-bandwidth attribute.
ATTR_BDB_CACHE_PERCENT - Static variable in class org.archive.crawler.datamodel.CrawlOrder
Percentage of heap to allocate to bdb cache
ATTR_BDB_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_BLOCK_ALL - Static variable in class org.archive.crawler.prefetch.Preselector
indicator allowing all URIs (of a given host, typically) to be blocked at this step
ATTR_BLOCK_BY_REGEXP - Static variable in class org.archive.crawler.prefetch.Preselector
indicator allowing all matching URIs to be blocked at this step
ATTR_BUILD_PATTERN - Static variable in class org.archive.crawler.extractor.ExtractorImpliedURI
replacement pattern used to build 'implied' URI
ATTR_CALCULATE_ROBOTS_ONLY - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
 
ATTR_CASE_SENSITIVE - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for case sensitive option.
ATTR_CHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Factor decrease on wait when changed
ATTR_CHAR_MAP - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for character map.
ATTR_CHECK_OUTLINKS - Static variable in class org.archive.crawler.processor.CrawlMapper
whether to map CrawlURI's outlinks (if CandidateURIs)
ATTR_CHECK_URI - Static variable in class org.archive.crawler.processor.CrawlMapper
whether to map CrawlURI itself (if status nonpositive)
ATTR_CHECKPOINT_COPY_BDBJE_LOGS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
When checkpointing, copy the bdb logs.
ATTR_CHECKPOINTS_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_CHMOD - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key to use asking settings if chmod should be executed.
ATTR_CHMOD_VALUE - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key to use asking settings for the new chmod value.
ATTR_COLLECTION - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key for the collection attribute.
ATTR_COMPRESS - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to use asking settings for file compression value.
ATTR_CONTENT_LENGTH_TRESHOLD - Static variable in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
ATTR_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
The regular expression that we limit this evaluator to.
ATTR_CONTENT_TYPE_MAP - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for content type map.
ATTR_COST_POLICY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
cost assignment policy to use (by class name)
ATTR_COUNTER_MODE - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
ATTR_COUNTRY_CODE - Static variable in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
ATTR_CRAWLER_COUNT - Static variable in class org.archive.crawler.processor.HashCrawlMapper
count of crawlers
ATTR_CREDENTIALS - Static variable in class org.archive.crawler.datamodel.CredentialStore
Name of the contained credentials map type.
ATTR_CUSTOM_ROBOTS - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_DECIDE_RULES - Static variable in class org.archive.crawler.deciderules.DecidingFilter
 
ATTR_DECIDE_RULES - Static variable in class org.archive.crawler.deciderules.DecidingScope
 
ATTR_DECIDE_RULES - Static variable in class org.archive.crawler.framework.Processor
Key to use asking settings for decide-rules value.
ATTR_DECISION - Static variable in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
ATTR_DEFAULT_ENCODING - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_DEFAULT_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Fixed wait time for 'unknown' change status.
ATTR_DELAY_FACTOR - Static variable in class org.archive.crawler.frontier.AbstractFrontier
how many multiples of last fetch elapsed time to wait before recontacting same server
ATTR_DELAY_FACTOR - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
How many multiples of last fetch elapsed time to wait before recontacting same server
ATTR_DIGEST_ALGORITHM - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_DIGEST_CONTENT - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_DIRECTORY_FILE - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for directory file.
ATTR_DISK_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_DIVERSION_DIR - Static variable in class org.archive.crawler.processor.CrawlMapper
where to log diversions
ATTR_DOT_BEGIN - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for dot begin replacement.
ATTR_DOT_END - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for dot end replacement.
ATTR_DUMP_PENDING_AT_CLOSE - Static variable in class org.archive.crawler.frontier.BdbFrontier
whether to dump pending URIs at close
ATTR_ENABLED - Static variable in class org.archive.crawler.framework.Filter
 
ATTR_ENABLED - Static variable in class org.archive.crawler.framework.Processor
Key to use asking settings for enabled value.
ATTR_ENABLED - Static variable in class org.archive.crawler.url.canonicalize.BaseRule
 
ATTR_END_OPERATION - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
ATTR_ERROR_PENALTY_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
amount of penalty to apply on error
ATTR_EXCLUDE_FILTER - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_EXTRACT_JAVASCRIPT - Static variable in class org.archive.crawler.extractor.ExtractorHTML
whether to try finding links in JavaScript; default true
ATTR_EXTRACT_ONLY_FORM_GETS - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ATTR_EXTRACT_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_FALSE_DECISION - Static variable in class org.archive.crawler.deciderules.FilterDecideRule
 
ATTR_FETCH_BANDWIDTH_MAX - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_FETCH_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_FILTERS - Static variable in class org.archive.crawler.deciderules.FilterDecideRule
Filters setting
ATTR_FILTERS - Static variable in class org.archive.crawler.filter.OrFilter
Deprecated.  
ATTR_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
queue assignment to force onto CrawlURIs; intended to be overridden
ATTR_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Queue assignment to force on CrawlURIs.
ATTR_FORCE_RETIRE - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
whether to force-retire when over-quota detected
ATTR_FROM - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_GROUP_MAX_ALL_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
group max all fetch bytes (including error responses)
ATTR_GROUP_MAX_FETCH_RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
group max fetch responses (including error codes)
ATTR_GROUP_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
group max successful fetches
ATTR_GROUP_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
group max successful fetch bytes
ATTR_HARVESTER - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key for the harvester attribute.
ATTR_HISTORY_LENGTH - Static variable in class org.archive.crawler.processor.recrawl.FetchHistoryProcessor
setting for desired history array length
ATTR_HOLD_QUEUES - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
whether to hold queues INACTIVE until needed for throughput
ATTR_HOST_DIRECTORY - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for host directory option.
ATTR_HOST_MAP - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for host map.
ATTR_HOST_MAX_ALL_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
host max all fetch bytes (including error responses)
ATTR_HOST_MAX_FETCH_RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
host max fetch responses (including error codes)
ATTR_HOST_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
host max successful fetches
ATTR_HOST_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
host max successful fetch bytes
ATTR_HOST_VALENCE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Maximum simultaneous requests in process to a host (queue)
ATTR_HTTP_BIND_ADDRESS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_HTTP_HEADERS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_HTTP_PROXY_HOST - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_HTTP_PROXY_PORT - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_IGNORE_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_IGNORE_FORM_ACTION_URLS - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ATTR_IGNORE_UNEXPECTED_HTML - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ATTR_IMPLEMENTATION - Static variable in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
ATTR_IMPLEMENTATION - Static variable in class org.archive.crawler.deciderules.ExternalImplDecideRule
 
ATTR_INCLUDED - Static variable in class org.archive.crawler.frontier.BdbFrontier
URI-already-included to use (by class name)
ATTR_INDEPENDENT_EXTRACTORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Default wait time after initial visit.
ATTR_IP_VALIDITY_DURATION - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
seconds to keep IP information for
ATTR_ISOLATE_THREADS - Static variable in class org.archive.crawler.deciderules.BeanShellDecideRule
whether each thread should have its own script runner (true), or they should share a single script runner with synchronized access
ATTR_ISOLATE_THREADS - Static variable in class org.archive.crawler.processor.BeanShellProcessor
whether each thread should have its own script runner (true), or they should share a single script runner with synchronized access
ATTR_LINKS_DECIDE_RULES - Static variable in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
ATTR_LIST_LOGIC - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
ATTR_LIST_LOGIC - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
ATTR_LOAD_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_LOCAL_NAME - Static variable in class org.archive.crawler.processor.CrawlMapper
name of local crawler (URIs mapped to here are not diverted)
ATTR_LOG_FILENAME - Static variable in class org.archive.crawler.processor.recrawl.PersistLogProcessor
setting for log filename
ATTR_LOGGERS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_LOGS_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAP_OUTLINK_DECIDE_RULES - Static variable in class org.archive.crawler.processor.CrawlMapper
decide rules to determine if an outlink is subject to mapping
ATTR_MAP_SOURCE - Static variable in class org.archive.crawler.processor.LexicalCrawlMapper
where to load map from
ATTR_MASQUERADE - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.OrFilter
Deprecated.  
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.SurtPrefixFilter
Deprecated.  
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
ATTR_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
ATTR_MAX_BYTES_DOWNLOAD - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_BYTES_WRITTEN - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key for the maximum bytes to write attribute.
ATTR_MAX_BYTES_WRITTEN - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key for the maximum ARC bytes to write attribute.
ATTR_MAX_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
never wait more than this long, regardless of multiple
ATTR_MAX_DELAY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Never wait more than this long, regardless of multiple
ATTR_MAX_DOCS - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
ATTR_MAX_DOCUMENT_DOWNLOAD - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_HOST_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
maximum per-host bandwidth usage
ATTR_MAX_LENGTH - Static variable in class org.archive.crawler.fetcher.FetchFTP
The name for the max-length-bytes attribute.
ATTR_MAX_LENGTH_BYTES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_MAX_LINK_HOPS - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_MAX_OVERALL_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
maximum overall bandwidth usage
ATTR_MAX_PATH_DEPTH - Static variable in class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
 
ATTR_MAX_PATH_DEPTH - Static variable in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
ATTR_MAX_PATH_LEN - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for maximum file system path length.
ATTR_MAX_RETRIES - Static variable in class org.archive.crawler.frontier.AbstractFrontier
maximum times to emit a CrawlURI without final disposition
ATTR_MAX_RETRIES - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Maximum times to emit a CrawlURI without final disposition
ATTR_MAX_SEG_LEN - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for maximum file system path segment length.
ATTR_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
Maximum file size; longer files will be ignored.
ATTR_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to use asking settings for file max size value.
ATTR_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key to use asking settings for max size value.
ATTR_MAX_TIME_SEC - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_TOE_THREADS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_MAX_TRANS_HOPS - Static variable in class org.archive.crawler.scope.ClassicScope
 
ATTR_MAX_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Maximum wait between visits
ATTR_MIDFETCH_DECIDE_RULES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
Rules to apply mid-fetch, just after receipt of the response headers and before we start to download the body.
ATTR_MIN_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
always wait this long after one completion before recontacting same server, regardless of multiple
ATTR_MIN_DELAY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Always wait this long after one completion before recontacting same server, regardless of multiple
ATTR_MIN_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Minimum wait between visits
ATTR_MONITOR_MOUNTS - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
List of mounts to monitor; should match "Mounted on" column of 'df' output
ATTR_NAME - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_NAME - Static variable in class org.archive.crawler.datamodel.CredentialStore
 
ATTR_NAME - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_NAME - Static variable in class org.archive.crawler.framework.CrawlScope
 
ATTR_NAME - Static variable in interface org.archive.crawler.framework.Frontier
All URI Frontiers should have the same 'name' attribute.
ATTR_OVERRIDE_LOGGER_ENABLED - Static variable in class org.archive.crawler.framework.Scoper
Protected so available to subclasses.
ATTR_PASSWORD - Static variable in class org.archive.crawler.fetcher.FetchFTP
The name for the password attribute.
ATTR_PATH - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to use asking settings for arc path value.
ATTR_PATH - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Key to use asking settings for arc path value.
ATTR_PATH - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for base directory path value.
ATTR_PAUSE_AT_FINISH - Static variable in class org.archive.crawler.frontier.AbstractFrontier
whether to pause, rather than finish, when crawl appears done
ATTR_PAUSE_AT_START - Static variable in class org.archive.crawler.frontier.AbstractFrontier
whether to pause at crawl start
ATTR_PAUSE_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Space available level below which a crawl-pause should be triggered.
ATTR_POOL_MAX_ACTIVE - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to get maximum pool size.
ATTR_POOL_MAX_WAIT - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to get maximum wait on pool object before we give up and throw IOException.
ATTR_PORT_DIRECTORY - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for port directory option.
ATTR_POST_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_PRE_FETCH_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_PREFERENCE_DEPTH_HOPS - Static variable in class org.archive.crawler.postprocessor.LinksScoper
 
ATTR_PREFERENCE_EMBED_HOPS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
number of hops of embeds (ERX) to bump to front of host queue
ATTR_PREFERENCE_EMBED_HOPS - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Number of hops of embeds (ERX) to bump to front of host queue
ATTR_PREFIX - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to use asking settings for file prefix value.
ATTR_PRELOAD_SOURCE - Static variable in class org.archive.crawler.processor.recrawl.PersistLoadProcessor
file (log) or directory (state/env) from which to preload history
ATTR_QUEUE_ASSIGNMENT_POLICY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
ATTR_QUEUE_ASSIGNMENT_POLICY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
The Class to use for QueueAssignmentPolicy
ATTR_QUEUE_IGNORE_WWW - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Should the queue assignment ignore www in hostnames, effectively stripping them away.
ATTR_QUEUE_TOTAL_BUDGET - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
total expenditure to allow a queue before 'retiring' it
ATTR_REBUILD_ON_RECONFIG - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Whether every config change should trigger a rebuilding of the prefix set.
ATTR_RECHECK_SCOPE - Static variable in class org.archive.crawler.prefetch.Preselector
whether to reapply crawl scope at this step
ATTR_RECHECK_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Amount of content received between each recheck of free space
ATTR_RECORDER_IN_BUFFER - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECORDER_OUT_BUFFER - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVER_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVER_RETAIN_FAILURES - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVER_SCOPE_ENQUEUES - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVER_SCOPE_INCLUDES - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RECOVERY_ENABLED - Static variable in class org.archive.crawler.frontier.AbstractFrontier
Recover log on or off attribute.
ATTR_REDUCE_PATTERN - Static variable in class org.archive.crawler.processor.HashCrawlMapper
regex pattern for reducing classKey
ATTR_REGEXP - Static variable in class org.archive.crawler.deciderules.FetchStatusMatchesRegExpDecideRule
 
ATTR_REGEXP - Static variable in class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
 
ATTR_REGEXP - Static variable in class org.archive.crawler.deciderules.MatchesRegExpDecideRule
 
ATTR_REGEXP - Static variable in class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
ATTR_REGEXP_LIST - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
ATTR_REGEXP_LIST - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
ATTR_REJECTLOG_DECIDE_RULES - Static variable in class org.archive.crawler.postprocessor.LinksScoper
 
ATTR_REMOVE_TRIGGER_URIS - Static variable in class org.archive.crawler.extractor.ExtractorImpliedURI
whether to remove URIs that trigger addition of 'implied' URI; default false
ATTR_REPETITIONS - Static variable in class org.archive.crawler.deciderules.PathologicalPathDecideRule
 
ATTR_REPETITIONS - Static variable in class org.archive.crawler.filter.PathologicalPathFilter
Deprecated.  
ATTR_REREAD_SEEDS_ON_CONFIG - Static variable in class org.archive.crawler.framework.CrawlScope
Whether every config change should trigger a rereading of the original seeds spec/file.
ATTR_RESPECT_CRAWL_DELAY_UP_TO_SECS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
Whether to respect a 'Crawl-Delay' (in seconds) given in a site's robots.txt
ATTR_RETRY_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
for retryable problems, seconds to wait before a retry
ATTR_RETRY_DELAY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
For retryable problems, seconds to wait before a retry
ATTR_ROBOTS_VALIDITY_DURATION - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
seconds to cache robots info
ATTR_ROTATION_DIGITS - Static variable in class org.archive.crawler.processor.CrawlMapper
rotate logs when change occurs within this # of digits of timestamp
ATTR_RULES - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_RULES - Static variable in class org.archive.crawler.deciderules.DecideRuleSequence
 
ATTR_RUNTIME_SECONDS - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
ATTR_SAVE_COOKIES - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SCOPE - Static variable in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
 
ATTR_SCRATCH_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_SCRIPT_FILE - Static variable in class org.archive.crawler.deciderules.BeanShellDecideRule
setting for script file
ATTR_SCRIPT_FILE - Static variable in class org.archive.crawler.processor.BeanShellProcessor
setting for script file
ATTR_SEEDS - Static variable in class org.archive.crawler.framework.CrawlScope
 
ATTR_SEEDS_AS_SURT_PREFIXES - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
ATTR_SEEDS_AS_SURT_PREFIXES - Static variable in class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
ATTR_SEND_CONNECTION_CLOSE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SEND_IF_MODIFIED_SINCE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SEND_IF_NONE_MATCH - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SEND_RANGE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SEND_REFERER - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SERVER_MAX_ALL_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
server max all fetch bytes (including error responses)
ATTR_SERVER_MAX_FETCH_RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
server max fetch responses (including error codes)
ATTR_SERVER_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
server max successful fetches
ATTR_SERVER_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
server max successful fetch bytes
ATTR_SETTINGS_DIRECTORY - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_SKIP_IDENTICAL_DIGESTS - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key for whether to skip writing records of content-digest repeats
ATTR_SNOOZE_DEACTIVATE_MS - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
When a snooze target for a queue is longer than this amount, and there are already ready queues, deactivate rather than snooze the current queue -- so other more responsive sites get a chance in active rotation.
ATTR_SOTIMEOUT_MS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_SOURCE_TAG_SEEDS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
whether to tag seeds with their own URI as a heritable 'source' string
ATTR_STATE_PATH - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_STATS_INTERVAL - Static variable in class org.archive.crawler.framework.AbstractTracker
Attribute name for logging interval in seconds setting
ATTR_STRIP_REG_EXPR - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
A regular expression detailing elements to strip before making digest
ATTR_SUFFIX - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Key to use asking settings for file suffix value.
ATTR_SUFFIX_AT_END - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for suffix at end option.
ATTR_SURTS_DUMP_FILE - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
ATTR_SURTS_DUMP_FILE - Static variable in class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
ATTR_SURTS_SOURCE_FILE - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
ATTR_SURTS_SOURCE_FILE - Static variable in class org.archive.crawler.filter.SurtPrefixFilter
Deprecated.  
ATTR_SURTS_SOURCE_FILE - Static variable in class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
ATTR_TARGET_READY_QUEUES_BACKLOG - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
target size of ready queues backlog
ATTR_TIMEOUT - Static variable in class org.archive.crawler.fetcher.FetchFTP
The name for the timeout-seconds attribute.
ATTR_TIMEOUT_SECONDS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ATTR_TOO_LONG_DIRECTORY - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for too-long directory.
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.DomainScope
Deprecated.  
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.HostScope
Deprecated.  
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.PathScope
Deprecated.  
ATTR_TRANSITIVE_FILTER - Static variable in class org.archive.crawler.scope.RefinedScope
 
ATTR_TREAT_FRAMES_AS_EMBED_LINKS - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
ATTR_TRIGGER_REGEXP - Static variable in class org.archive.crawler.extractor.ExtractorImpliedURI
regex which when matched triggers addition of 'implied' URI
ATTR_TRUE_DECISION - Static variable in class org.archive.crawler.deciderules.FilterDecideRule
 
ATTR_TRUST - Static variable in class org.archive.crawler.fetcher.FetchHTTP
SSL trust level setting attribute name.
ATTR_TYPE - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_UNCHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Factor increase on wait when unchanged
ATTR_UNDERSCORE_SET - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor
Key to use asking settings for underscore set.
ATTR_USE_AS_MIDFETCH - Static variable in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
ATTR_USE_DEFAULT - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
ATTR_USE_OVERDUE_TIME - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
Indicates if the amount of time the URI was overdue should be added to the wait time before the new wait time is calculated.
ATTR_USE_PRESET - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
ATTR_USE_PUBLICSUFFIX_REDUCE - Static variable in class org.archive.crawler.processor.HashCrawlMapper
use publicsuffixes pattern for reducing classKey?
ATTR_USE_URI_UNIQ_FILTER - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Should the Frontier use a separate 'already included' data structure or rely on the queues'?
ATTR_USER_AGENT - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_USER_AGENTS - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ATTR_USERNAME - Static variable in class org.archive.crawler.fetcher.FetchFTP
The name for the username attribute.
ATTR_WRITE_METADATA - Static variable in class org.archive.crawler.writer.WARCWriterProcessor
Key for whether to write 'metadata' type records where possible
ATTR_WRITE_PROCESSORS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
ATTR_WRITE_REQUESTS - Static variable in class org.archive.crawler.writer.WARCWriterProcessor
Key for whether to write 'request' type records where possible
ATTR_WRITE_REVISIT_FOR_IDENTICAL_DIGESTS - Static variable in class org.archive.crawler.writer.WARCWriterProcessor
Key for whether to write 'revisit' type records when consecutive fetches have identical digests
ATTR_WRITE_REVISIT_FOR_NOT_MODIFIED - Static variable in class org.archive.crawler.writer.WARCWriterProcessor
Key for whether to write 'revisit' type records for server "304 not modified" responses
attrDecideRules - Variable in class org.archive.crawler.framework.Processor
local name for decide-rules
ATTRIBUTE_ARRAY - Static variable in class org.archive.crawler.admin.CrawlJob
 
ATTRIBUTE_LIST - Static variable in class org.archive.crawler.admin.CrawlJob
 
ATTRIBUTE_LIST - Static variable in class org.archive.crawler.Heritrix
 
AUDIO - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
AUDIO - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
AUDIO_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
AUDIO_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
auxiliaryDirectoryStack - Variable in class org.archive.io.ObjectPlusFilesInputStream
 
auxiliaryDirectoryStack - Variable in class org.archive.io.ObjectPlusFilesOutputStream
 
avail - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The number of buffer bytes available starting from RecyclingFastBufferedOutputStream.pos.
available() - Method in class org.archive.io.ArchiveReader.RandomAccessBufferedInputStream
 
available() - Method in class org.archive.io.ArchiveRecord
This available is not the stream's available.
available() - Method in class org.archive.io.OriginSeekInputStream
 
available() - Method in class org.archive.io.RandomAccessInputStream
 
AVAILABLE_COST_POLICIES - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
all policies available to be chosen
AVAILABLE_END_OPERATIONS - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
AVAILABLE_EXTRACTOR - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
averageDepth - Variable in class org.archive.crawler.admin.StatisticsTracker
 
averageDepth() - Method in class org.archive.crawler.admin.StatisticsTracker
Average depth of the last URI in all eligible queues.
averageDepth() - Method in interface org.archive.crawler.framework.Frontier
 
averageDepth() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
averageDepth() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
averageDepth() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 

B

BackgroundImageExtractionSelfTestCase - Class in org.archive.crawler.selftest
Test the crawler can find background images in pages.
BackgroundImageExtractionSelfTestCase() - Constructor for class org.archive.crawler.selftest.BackgroundImageExtractionSelfTestCase
 
BACKSLASH - Static variable in class org.archive.net.UURIFactory
 
BACKSLASH_PATTERN - Static variable in class org.archive.net.UURIFactory
 
bandwidthKbytesPerSec - Variable in class org.archive.crawler.admin.StatisticsSummary
 
BASE - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
base - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
BASE - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
Base32 - Class in org.archive.util
Base32 - encodes and decodes RFC 3548 Base32 (see http://www.faqs.org/rfcs/rfc3548.html). Imported public-domain code of Bitzi.
Base32() - Constructor for class org.archive.util.Base32
 
baseCharacterCheck(char, String) - Method in class org.archive.io.warc.WARCWriter
 
baseCheck(String) - Method in class org.archive.util.anvl.SubElement
 
baseCheck(String) - Method in class org.archive.util.anvl.Value
 
BaseRule - Class in org.archive.crawler.url.canonicalize
Base of all rules applied canonicalizing a URL that are configurable via the Heritrix settings system.
BaseRule(String, String) - Constructor for class org.archive.crawler.url.canonicalize.BaseRule
Constructor.
batchFlush() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
batchSchedule(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
BdbFrontier - Class in org.archive.crawler.frontier
A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs.
BdbFrontier(String) - Constructor for class org.archive.crawler.frontier.BdbFrontier
Constructor.
BdbFrontier(String, String) - Constructor for class org.archive.crawler.frontier.BdbFrontier
Create the BdbFrontier
BdbMultipleWorkQueues - Class in org.archive.crawler.frontier
A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs.
BdbMultipleWorkQueues(Environment, StoredClassCatalog, boolean) - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueues
Create the multi queue in the given environment.
BdbMultipleWorkQueues.BdbFrontierMarker - Class in org.archive.crawler.frontier
Marker for remembering a position within the BdbMultipleWorkQueues.
BdbMultipleWorkQueues.BdbFrontierMarker(DatabaseEntry, String) - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
Create a marker pointed at the given start location.
BdbUriUniqFilter - Class in org.archive.crawler.util
A BDB implementation of an AlreadySeen list.
BdbUriUniqFilter() - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Shutdown default constructor.
BdbUriUniqFilter(Environment) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilter(File) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilter(File, int) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbWorkQueue - Class in org.archive.crawler.frontier
One independent queue of items with the same 'classKey' (eg host).
BdbWorkQueue(String, BdbFrontier) - Constructor for class org.archive.crawler.frontier.BdbWorkQueue
Create a virtual queue inside the given BdbMultipleWorkQueues
BeanShellDecideRule - Class in org.archive.crawler.deciderules
Rule which runs a BeanShell script to make its decision.
BeanShellDecideRule(String) - Constructor for class org.archive.crawler.deciderules.BeanShellDecideRule
 
BeanShellProcessor - Class in org.archive.crawler.processor
A processor which runs a BeanShell script on the CrawlURI.
BeanShellProcessor(String) - Constructor for class org.archive.crawler.processor.BeanShellProcessor
Constructor.
BEGIN_TRANSFORMED_AUTHORITY - Static variable in class org.archive.util.SURT
 
beginCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Start the process of stopping the crawl.
beginFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
beginFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Begin merging pending candidates with complete list.
beginFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
BenchmarkBlooms - Class in org.archive.util
Simple benchmarking of different BloomFilter implementations.
BenchmarkBlooms() - Constructor for class org.archive.util.BenchmarkBlooms
 
BenchmarkUriUniqFilters - Class in org.archive.crawler.util
BenchmarkUriUniqFilters
BenchmarkUriUniqFilters() - Constructor for class org.archive.crawler.util.BenchmarkUriUniqFilters
 
betterPrintStack(RuntimeException) - Static method in class org.archive.util.DevUtils
Deprecated. This method was never used.
bigChar(InputStream) - Static method in class org.archive.io.Endian
Reads the next big-endian unsigned 16-bit integer from the given stream.
bigInt(InputStream) - Static method in class org.archive.io.Endian
Reads the next big-endian signed 32-bit integer from the given stream.
bindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter64bit
 
bitIndexesFor(CharSequence) - Method in class org.archive.util.BloomFilter64bit
 
bits - Variable in class org.archive.util.BloomFilter64bit
The underlying bit vector
BLOCK_SIZE - Static variable in interface org.archive.util.ms.BlockFileSystem
The size of a block in bytes.
BlockFileSystem - Interface in org.archive.util.ms
Describes the internal file system contained in .doc files.
BlockInputStream - Class in org.archive.util.ms
InputStream for a file contained in a BlockFileSystem.
BlockInputStream(BlockFileSystem, int) - Constructor for class org.archive.util.ms.BlockInputStream
Constructor.
bloom - Variable in class org.archive.crawler.util.BloomUriUniqFilter
 
BloomFilter - Interface in org.archive.util
Common interface for different Bloom filter implementations
BloomFilter64bit - Class in org.archive.util
A Bloom filter.
BloomFilter64bit(long, int) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter64bit(long, int, boolean) - Constructor for class org.archive.util.BloomFilter64bit
 
BloomFilter64bit(long, int, Random, boolean) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilterTestBase - Class in org.archive.util
BloomFilter tests.
BloomFilterTestBase() - Constructor for class org.archive.util.BloomFilterTestBase
 
BloomUriUniqFilter - Class in org.archive.crawler.util
An MG4J BloomFilter-based implementation of an AlreadySeen list.
BloomUriUniqFilter() - Constructor for class org.archive.crawler.util.BloomUriUniqFilter
Default constructor
BloomUriUniqFilter(int, int) - Constructor for class org.archive.crawler.util.BloomUriUniqFilter
Constructor.
BOOLEAN - Static variable in class org.archive.crawler.settings.SettingsHandler
 
borrowFile() - Method in class org.archive.io.WriterPool
Check out a WriterPoolMember.
BroadScope - Class in org.archive.crawler.scope
A CrawlScope instance defines which URIs are "in" a particular crawl.
BroadScope(String) - Constructor for class org.archive.crawler.scope.BroadScope
Constructor.
bucketCount - Variable in class org.archive.crawler.processor.HashCrawlMapper
 
BucketQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses the target IPs as basis for queue-assignment, distributing them over a fixed number of sub-queues.
BucketQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
buffer - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The internal buffer.
buffer - Variable in class org.archive.util.PaddingStringBuffer
 
BufferedSeekInputStream - Class in org.archive.io
Buffers data from some other SeekInputStream.
BufferedSeekInputStream(SeekInputStream, int) - Constructor for class org.archive.io.BufferedSeekInputStream
Constructor.
bufStreamBuf - Variable in class org.archive.io.RecordingOutputStream
Reusable buffer for FastBufferedOutputStream
buildDisplayingHeader(int, long) - Static method in class org.archive.crawler.util.LogReader
 
buildMBeanInfo() - Method in class org.archive.crawler.admin.CrawlJob
Build up the MBean info for Heritrix main.
buildMBeanInfo() - Method in class org.archive.crawler.Heritrix
Build up the MBean info for Heritrix main.
buildRegex(String, StringBuilder, SortedSet<String>) - Static method in class org.archive.net.PublicSuffixes
 
buildSurtPrefixSet() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Construct the set of prefixes to use, from the seed list (which may include both URIs and '+'-prefixed directives).
busyThreads - Variable in class org.archive.crawler.admin.StatisticsTracker
 
byteArrayEquals(byte[], byte[]) - Static method in class org.archive.util.ArchiveUtils
check that two byte arrays are equal.
byteArrayIntoLong(byte[]) - Static method in class org.archive.util.ArchiveUtils
 
byteArrayIntoLong(byte[], int) - Static method in class org.archive.util.ArchiveUtils
Byte array into long.

C

cache - Variable in class org.archive.crawler.processor.CrawlMapper
 
cache - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
CachedBdbMap<K,V> - Class in org.archive.util
Deprecated. use ObjectIdentityBdbCache instead
CachedBdbMap(String) - Constructor for class org.archive.util.CachedBdbMap
Deprecated. Constructor.
CachedBdbMap(File, String, Class<K>, Class<V>) - Constructor for class org.archive.util.CachedBdbMap
Deprecated. A constructor for creating a new CachedBdbMap.
CachedBdbMap.DbEnvironmentEntry - Class in org.archive.util
Deprecated. Simple structure to keep needed information about a DB Environment.
CachedBdbMap.DbEnvironmentEntry() - Constructor for class org.archive.util.CachedBdbMap.DbEnvironmentEntry
Deprecated.  
CachedBdbMap.LowMemoryCanary - Class in org.archive.util
Deprecated.  
CachedBdbMap.LowMemoryCanary() - Constructor for class org.archive.util.CachedBdbMap.LowMemoryCanary
Deprecated.  
cacheLength() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
cacheMetadata() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
calculateInsertKey(CrawlURI) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the insertKey that places a CrawlURI in the desired spot.
calculateOriginKey(String) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the 'origin' key for a virtual queue of items with the given classKey.
calculateSnoozeTime(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Calculates how long a host queue needs to be snoozed following the crawling of a URI.
CALENDARISH - Static variable in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
canary - Variable in class org.archive.util.CachedBdbMap
Deprecated.  
canary - Variable in class org.archive.util.ObjectIdentityBdbCache
 
CandidateURI - Class in org.archive.crawler.datamodel
A URI, discovered or passed-in, that may be scheduled.
CandidateURI() - Constructor for class org.archive.crawler.datamodel.CandidateURI
Constructor.
CandidateURI(UURI) - Constructor for class org.archive.crawler.datamodel.CandidateURI
 
CandidateURI(UURI, String, UURI, CharSequence) - Constructor for class org.archive.crawler.datamodel.CandidateURI
 
CanonicalizationRule - Interface in org.archive.crawler.url
A rule to apply canonicalizing a url.
canonicalize(UURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Canonicalize passed uuri.
canonicalize(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Canonicalize passed CandidateURI.
canonicalize(UURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Canonicalize passed uuri.
canonicalize(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Canonicalize passed CandidateURI.
canonicalize(String, Object) - Method in interface org.archive.crawler.url.CanonicalizationRule
Apply this canonicalization rule.
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.FixupQueryStr
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.LowercaseRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.RegexRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripExtraSlashes
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripSessionCFIDs
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripSessionIDs
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripUserinfoRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripWWWNRule
 
canonicalize(String, Object) - Method in class org.archive.crawler.url.canonicalize.StripWWWRule
 
canonicalize(UURI, CrawlOrder) - Static method in class org.archive.crawler.url.Canonicalizer
Convenience method that is passed a settings object instance and pulls from it what it needs to canonicalize.
canonicalize(UURI, Iterator) - Static method in class org.archive.crawler.url.Canonicalizer
Run the passed uuri through the list of rules.
Canonicalizer - Class in org.archive.crawler.url
URL canonicalizer.
capacityPowerOfTwo - Variable in class org.archive.util.AbstractLongFPSet
the capacity of this set, specified as the exponent of a power of 2
catalog - Variable in class org.archive.crawler.extractor.PDFParser
 
caUri - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
CDX - Static variable in interface org.archive.io.ArchiveFileConstants
 
CDX_FILE - Static variable in interface org.archive.io.ArchiveFileConstants
 
CDX_LINE_BUFFER_SIZE - Static variable in interface org.archive.io.ArchiveFileConstants
Size used to preallocate stringbuffer used outputting a cdx line.
cdxOutput(boolean) - Method in class org.archive.io.ArchiveReader
 
ChangeEvaluator - Class in org.archive.crawler.extractor
This processor compares the CrawlURI's current content digest with the one from a previous crawl.
ChangeEvaluator(String) - Constructor for class org.archive.crawler.extractor.ChangeEvaluator
Constructor
characters(char[], int, int) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
charAt(int) - Method in class org.archive.crawler.settings.TextField
 
charAt(int) - Method in class org.archive.io.CharSubSequence
 
charAt(int) - Method in class org.archive.io.GenericReplayCharSequence
 
charAt(int) - Method in class org.archive.io.Latin1ByteReplayCharSequence
Get character at passed absolute position.
charAt(int) - Method in class org.archive.io.SeekReaderCharSequence
 
charAt(int) - Method in class org.archive.net.UURI
 
charAt(int) - Method in class org.archive.util.InterruptibleCharSequence
 
charSequenceFrom(InputStream, Charset) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
CharSequenceLinkExtractor - Class in org.archive.extractor
Abstract superclass providing utility methods for LinkExtractors which would prefer to work on a CharSequence rather than a stream.
CharSequenceLinkExtractor() - Constructor for class org.archive.extractor.CharSequenceLinkExtractor
 
CharSequenceProvider - Interface in org.archive.extractor
Interface indicating an object can efficiently provide a (perhaps cached or simulated) CharSequence version of itself.
CharSubSequence - Class in org.archive.io
Provides a subsequence view onto a CharSequence.
CharSubSequence(CharSequence, int, int) - Constructor for class org.archive.io.CharSubSequence
 
check(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.Constraint
Run the check.
checkAdds(BloomFilter, long) - Method in class org.archive.util.BloomFilterTestBase
Check that the given filter behaves properly as a large number of constructed unique strings are added: responding positively to contains, and negatively to redundant adds.
checkAttribute(ModuleAttributeInfo, ComplexType, CrawlerSettings, HttpServletRequest, boolean) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Process passed attribute.
checkBytesWritten() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
checkCharacter(char, String, int) - Method in class org.archive.util.anvl.Label
 
checkCharacter(char, String, int) - Method in class org.archive.util.anvl.SubElement
 
checkCharacter(char, String, int) - Method in class org.archive.util.anvl.Value
 
checkClientTrusted(X509Certificate[], String) - Method in class org.archive.httpclient.ConfigurableX509TrustManager
 
checkClose(Iterator) - Method in class org.archive.crawler.framework.CrawlScope
Convenience method to close SeedFileIterator, if appropriate.
checkContains(BloomFilter, long) - Method in class org.archive.util.BloomFilterTestBase
Check if the given filter contains any of the given constructed strings.
checkControlCharacter(char, String, int) - Method in class org.archive.util.anvl.SubElement
 
checkCrawlJob(CrawlJob, HttpServletResponse, String, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Check passed job is not null and not readonly.
checkCRLF(char, String, int) - Method in class org.archive.util.anvl.SubElement
 
checkDirectory(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
checkDistribution(BloomFilter) - Method in class org.archive.util.BloomFilterTestBase
Check that the given bloom filter, assumed to have already had a significant number of items added, has bits set in the lower and upper 10% of its bit field.
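The distribution check described above can be sketched with a plain java.util.BitSet. This is only an illustration of the idea, not the test base's actual code: the class name DistributionCheckSketch, the method hasBitsAtBothEnds, and the explicit size parameter are assumptions made for the example.

```java
import java.util.BitSet;

// Hypothetical sketch; not part of org.archive.util.
class DistributionCheckSketch {
    /**
     * Returns true if at least one bit is set in the lowest 10% of
     * positions and at least one in the highest 10% of positions.
     * 'size' is the logical length of the bit field (an assumption here).
     */
    static boolean hasBitsAtBothEnds(BitSet bits, int size) {
        int tenth = size / 10;
        if (tenth == 0) {
            return false; // field too small to have a meaningful 10% band
        }
        int firstSet = bits.nextSetBit(0);            // -1 if no bit is set
        int lastSet = bits.previousSetBit(size - 1);  // -1 if no bit is set below size
        return firstSet >= 0 && firstSet < tenth && lastSet >= size - tenth;
    }
}
```

A filter with bits set only in the middle of the field would fail this check, which is the kind of skew the described test appears intended to catch.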
checkFinish() - Method in class org.archive.crawler.framework.CrawlController
Evaluate if the crawl should stop because it is finished.
checkForEmptyPlaceHolder(String) - Method in class org.archive.crawler.Heritrix
If the passed str has a placeholder for the empty string, return the empty string; else return the original.
checkForInterrupt() - Method in class org.archive.crawler.framework.Processor
 
checkForNull(String) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
checkHeaderLineMimetypeParameter(String) - Method in class org.archive.io.warc.WARCWriter
 
checkHeaderValue(String) - Method in class org.archive.io.warc.WARCWriter
 
checkHttpSchemeSpecificPartSlashPrefix(URI, String, String) - Method in class org.archive.net.UURIFactory
If http(s) scheme, check scheme specific part begins '//'.
checkLimits() - Method in class org.archive.io.RecordingOutputStream
Check any enforced limits.
checkMidfetchAbort(CrawlURI, HttpRecorderMethod, HttpConnection) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
checkParamsCount(String, Object[], int) - Static method in class org.archive.util.JmxUtils
 
checkpoint() - Method in class org.archive.crawler.admin.CrawlJob
 
Checkpoint - Class in org.archive.crawler.datamodel
Record of a specific checkpoint on disk.
Checkpoint() - Constructor for class org.archive.crawler.datamodel.Checkpoint
Publicly inaccessible default constructor.
Checkpoint(File) - Constructor for class org.archive.crawler.datamodel.Checkpoint
Create a Checkpoint instance based on the given preexisting checkpoint directory.
checkpoint() - Method in class org.archive.crawler.framework.Checkpointer
Run a checkpoint of the crawler.
checkpoint() - Method in class org.archive.crawler.framework.CrawlController
Run checkpointing.
checkpoint(File) - Method in interface org.archive.crawler.frontier.FrontierJournal
Checkpoint.
checkpoint(File) - Method in class org.archive.crawler.io.CrawlerJournal
Handle a checkpoint by rotating the current log to a checkpoint-named file and starting a new log.
CHECKPOINT_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
checkpointBdb(File) - Method in class org.archive.crawler.framework.CrawlController
Checkpoint bdb.
checkpointBigMaps(File) - Method in class org.archive.crawler.framework.CrawlController
 
Checkpointer - Class in org.archive.crawler.framework
Runs checkpointing.
Checkpointer(CrawlController, File) - Constructor for class org.archive.crawler.framework.Checkpointer
Create a new CheckpointContext with the given store directory
Checkpointer(CrawlController, String) - Constructor for class org.archive.crawler.framework.Checkpointer
Create a new CheckpointContext with the given store directory
Checkpointer.CheckpointingThread - Class in org.archive.crawler.framework
Thread to run the checkpointing.
Checkpointer.CheckpointingThread(String) - Constructor for class org.archive.crawler.framework.Checkpointer.CheckpointingThread
 
checkpointFailed(Exception) - Method in class org.archive.crawler.framework.Checkpointer
Note that a checkpoint failed
checkpointFailed(String) - Method in class org.archive.crawler.framework.Checkpointer
 
checkpointFailed() - Method in class org.archive.crawler.framework.Checkpointer
 
CHECKPOINTING - Static variable in class org.archive.crawler.framework.CrawlController
 
checkpointJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Cause the current job to write a checkpoint to disk.
checkpointRecover() - Method in class org.archive.crawler.framework.WriterPoolProcessor
Called out of WriterPoolProcessor.initialTasks() when recovering a checkpoint.
CheckpointUtils - Class in org.archive.crawler.util
Utilities useful for checkpointing.
CheckpointUtils() - Constructor for class org.archive.crawler.util.CheckpointUtils
 
checkQuotas(CrawlURI, CrawlSubstats.HasCrawlSubstats, int) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Check all quotas for the given substats and category (server, host, or group).
checkServerTrusted(X509Certificate[], String) - Method in class org.archive.httpclient.ConfigurableX509TrustManager
 
checkSize() - Method in class org.archive.io.WriterPoolMember
Call this method just before/after any significant write.
checkStream(InputStream) - Static method in class org.archive.io.GzippedInputStream
 
CHECKSUM_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for checksum field.
CHECKSUM_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Checksum field.
checkType(Class) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
checkType(Object) - Method in class org.archive.crawler.settings.DoubleList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.FloatList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.IntegerList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.ListType
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.LongList
Check if element is of right type for this list.
checkType(Object) - Method in class org.archive.crawler.settings.StringList
Check if element is of right type for this list.
checkUserAgentAndFrom() - Method in class org.archive.crawler.datamodel.CrawlOrder
Checks if the User Agent and From field are set 'correctly' in the specified Crawl Order.
checkValue(CrawlerSettings, String, Object) - Method in class org.archive.crawler.settings.ComplexType
Check an attribute to see if it fulfills all the constraints set on the definition of this attribute.
checkValue(CrawlerSettings, String, Type, Object) - Method in class org.archive.crawler.settings.ComplexType
 
checkValue(CrawlerSettings, String, Type, Object) - Method in class org.archive.crawler.settings.MapType
 
checkWriteable(File) - Method in class org.archive.io.WriterPoolMember
 
CIRCUMFLEX - Static variable in class org.archive.net.UURIFactory
 
CIRCUMFLEX_PATTERN - Static variable in class org.archive.net.UURIFactory
 
classCatalog - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
For BDB serialization of objects
classCatalog - Variable in class org.archive.util.bdbje.EnhancedEnvironment
 
classCatalog - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
Deprecated.  
classCatalogDB - Variable in class org.archive.util.bdbje.EnhancedEnvironment
 
CLASSEXT - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
CLASSEXT - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
CLASSIC - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ClassicScope - Class in org.archive.crawler.scope
ClassicScope: superclass with shared Scope behavior for most common scopes.
ClassicScope(String) - Constructor for class org.archive.crawler.scope.ClassicScope
 
ClassicScope() - Constructor for class org.archive.crawler.scope.ClassicScope
Default constructor.
classKey - Variable in class org.archive.crawler.frontier.WorkQueue
The classKey
ClassKeyMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURI class key -- i.e.
ClassKeyMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ClassKeyMatchesRegExpDecideRule
Usual constructor.
classnameBasedUID(Class<?>, int) - Static method in class org.archive.util.ArchiveUtils
Generate a long UID based on the given class and version number.
cleanup() - Method in class org.archive.crawler.datamodel.ServerCache
Called when shutting down the cache so we can do clean up.
cleanup() - Method in class org.archive.crawler.framework.Checkpointer
 
cleanup() - Method in class org.archive.crawler.framework.ToePool
 
cleanup() - Method in class org.archive.crawler.settings.SettingsHandler
 
cleanup() - Method in class org.archive.util.HttpRecorder
Cleanup backing files.
cleanupCurrentRecord() - Method in class org.archive.io.ArchiveReader
Clean out the current record if there is one.
cleanupHttp() - Method in class org.archive.crawler.fetcher.FetchHTTP
Perform any final cleanup related to the HttpClient instance.
cleanUpOldFiles(String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
cleanUpOldFiles(File, String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
clear() - Method in class org.archive.crawler.settings.ListType
Removes all elements from this list.
clear() - Method in class org.archive.crawler.settings.SettingsCache
Clear all cached settings.
clear() - Method in class org.archive.crawler.settings.SoftSettingsHash
Removes all settings objects from this hash.
clear() - Method in class org.archive.util.CachedBdbMap
Deprecated. Note that a call to this method CLOSEs the underlying bdbje.
clearAList() - Method in class org.archive.crawler.datamodel.CandidateURI
 
clearAt(long) - Method in class org.archive.util.AbstractLongFPSet
 
clearAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
clearCheckpointInProgressDirectory() - Method in class org.archive.crawler.framework.Checkpointer
 
clearErrors() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Reset handler.
clearHeld() - Method in class org.archive.crawler.frontier.WorkQueue
Clear isHeld to false
clearOutlinks() - Method in class org.archive.crawler.datamodel.CrawlURI
 
clearPerHostSettingsCache() - Method in class org.archive.crawler.settings.SettingsHandler
Clear any per-host settings cached in memory; allows editing of per-host settings files on disk, perhaps in bulk/automated fashion, to take effect in running crawl.
ClientFTP - Class in org.archive.net
Client for FTP operations.
ClientFTP() - Constructor for class org.archive.net.ClientFTP
Constructs a new ClientFTP.
clone() - Method in class org.archive.util.anvl.ANVLRecord
 
close() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Close down any allocated resources.
close() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Cleanup all open Berkeley Database objects.
close() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Closes all HQs and the Environment.
close() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
clean up
close() - Method in interface org.archive.crawler.frontier.FrontierJournal
Flush and close any held objects.
close() - Method in class org.archive.crawler.io.CrawlerJournal
Flush and close the underlying IO objects.
close() - Method in class org.archive.crawler.scope.SeedFileIterator
 
close() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
close() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
close() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
close() - Method in class org.archive.io.ArchiveReader
 
close() - Method in class org.archive.io.ArchiveRecord
Calling close on a record skips us past this record to the next record in the stream.
close() - Method in class org.archive.io.BufferedSeekInputStream
Close the stream, including the wrapped input stream.
close() - Method in class org.archive.io.GenericReplayCharSequence
 
close() - Method in class org.archive.io.Latin1ByteReplayCharSequence
Cleanup resources.
close() - Method in class org.archive.io.ObjectPlusFilesInputStream
In addition to default, do any registered cleanup tasks.
close() - Method in class org.archive.io.RandomAccessInputStream
 
close() - Method in class org.archive.io.RandomAccessOutputStream
 
close() - Method in class org.archive.io.RecordingInputStream
 
close() - Method in class org.archive.io.RecordingOutputStream
 
close() - Method in class org.archive.io.RecyclingFastBufferedOutputStream
 
close() - Method in interface org.archive.io.ReplayCharSequence
Call this method when done so implementation has chance to clean up resources.
close() - Method in class org.archive.io.ReplayInputStream
 
close() - Method in class org.archive.io.SinkHandler
 
close() - Method in class org.archive.io.WriterPool
Close all WriterPoolMembers in pool.
close() - Method in class org.archive.io.WriterPoolMember
 
close() - Method in class org.archive.queue.StoredQueue
 
close() - Method in class org.archive.util.bdbje.EnhancedEnvironment
 
close() - Method in class org.archive.util.CachedBdbMap
Deprecated.  
close() - Method in class org.archive.util.HttpRecorder
Close all streams.
close() - Method in class org.archive.util.ms.PieceReader
 
close() - Method in class org.archive.util.ObjectIdentityBdbCache
 
close() - Method in interface org.archive.util.ObjectIdentityCache
close/release any associated resources
close() - Method in class org.archive.util.ObjectIdentityMemCache
 
closeDataConnection() - Method in class org.archive.net.ClientFTP
 
closeDiskStream() - Method in class org.archive.io.RecordingOutputStream
 
closeIdleConnections(long) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
closeLogFiles() - Method in class org.archive.crawler.framework.CrawlController
Close all log files and remove handlers from loggers.
closeQueue() - Method in class org.archive.crawler.frontier.BdbFrontier
 
closeQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
closeRecorder() - Method in class org.archive.io.RecordingInputStream
 
closeRecorder() - Method in class org.archive.io.RecordingOutputStream
 
closeRecorders() - Method in class org.archive.util.HttpRecorder
Close both input and output recorders.
coalesceHostAuthorityStrings() - Method in class org.archive.net.UURI
The two String fields cachedHost and cachedAuthorityMinusUserInfo are usually identical; if so, coalesce into a single instance.
coalesceUriStrings() - Method in class org.archive.net.UURI
The two String fields cachedString and cachedEscapedURI are usually identical; if so, coalesce into a single instance.
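The coalescing idea can be sketched as follows: when two fields hold equal strings, point both at one instance so the duplicate can be garbage collected. The field names come from the description above; the class CoalesceSketch and its constructor are hypothetical, and UURI's real implementation may differ.

```java
// Hypothetical sketch; not the UURI implementation.
class CoalesceSketch {
    String cachedString;
    String cachedEscapedURI;

    CoalesceSketch(String cachedString, String cachedEscapedURI) {
        this.cachedString = cachedString;
        this.cachedEscapedURI = cachedEscapedURI;
    }

    /** If both fields are equal in content, share a single String instance. */
    void coalesce() {
        if (cachedString != null && cachedString.equals(cachedEscapedURI)) {
            cachedEscapedURI = cachedString;
        }
    }
}
```

When many URIs are held in memory, dropping one of two equal strings per URI is a plausible motivation for this kind of method.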
CODE_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Result Code field.
COLLECTION_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
COLON - Static variable in class org.archive.net.UURIFactory
 
COLON - Static variable in class org.archive.util.anvl.Label
 
COLON_SPACE - Static variable in interface org.archive.io.warc.WARCConstants
 
commandLine - Static variable in class org.archive.crawler.Heritrix
Set to true if application is started from command line.
CommandLineParser - Class in org.archive.crawler
Print Heritrix command-line usage message.
CommandLineParser(String[], PrintWriter, String) - Constructor for class org.archive.crawler.CommandLineParser
Constructor.
CommandLineParser.HeritrixHelpFormatter - Class in org.archive.crawler
Override so can customize usage output.
CommandLineParser.HeritrixHelpFormatter() - Constructor for class org.archive.crawler.CommandLineParser.HeritrixHelpFormatter
 
COMMENT_LINE - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
COMMERCIAL_AT - Static variable in class org.archive.net.UURIFactory
 
COMPACT_REPORT - Static variable in class org.archive.crawler.framework.ToePool
 
compactReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
compare(StringIntPair, StringIntPair) - Method in class org.archive.crawler.util.StringIntPairComparator
 
compareBytes(int, int) - Method in class org.archive.io.GzippedInputStream
 
compareTo(Object) - Method in class org.archive.crawler.frontier.WorkQueue
 
compareTo(Constraint) - Method in class org.archive.crawler.settings.Constraint
Compare this constraint's level to another constraint.
compareTo(Object) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
compareTo(Object) - Method in class org.archive.net.UURI
 
COMPLETED_JOBS_OPER - Static variable in class org.archive.crawler.Heritrix
 
completePause() - Method in class org.archive.crawler.framework.CrawlController
 
completeStop() - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
completeStop() - Method in class org.archive.crawler.framework.CrawlController
Called when the last toethread exits.
ComplexType - Class in org.archive.crawler.settings
Superclass of all configurable modules.
ComplexType(String, String) - Constructor for class org.archive.crawler.settings.ComplexType
Creates a new instance of ComplexType.
ComplexType.Context - Class in org.archive.crawler.settings
 
ComplexType.Context() - Constructor for class org.archive.crawler.settings.ComplexType.Context
 
ComplexType.Context(CrawlerSettings, UURI) - Constructor for class org.archive.crawler.settings.ComplexType.Context
 
ComplexType.MBeanAttributeInfoIterator - Class in org.archive.crawler.settings
Iterator over all MBeanAttributeInfo for this ComplexType
ComplexType.MBeanAttributeInfoIterator(Object) - Constructor for class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
CompositeFileInputStream - Class in org.archive.io
 
CompositeFileInputStream(List<File>) - Constructor for class org.archive.io.CompositeFileInputStream
 
CompositeFileReader - Class in org.archive.io
 
CompositeFileReader(List<File>) - Constructor for class org.archive.io.CompositeFileReader
 
CompositeIterator - Class in org.archive.util.iterator
An iterator that's built up out of any number of other iterators.
CompositeIterator() - Constructor for class org.archive.util.iterator.CompositeIterator
Create an empty CompositeIterator.
CompositeIterator(Iterator, Iterator) - Constructor for class org.archive.util.iterator.CompositeIterator
Convenience method for concatenating together two iterators.
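An iterator built up out of other iterators can be sketched with a queue of pending iterators. This is a minimal illustration under assumed names (ChainedIteratorSketch, add); it does not reproduce the CompositeIterator API beyond what the entries above state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; not org.archive.util.iterator.CompositeIterator.
class ChainedIteratorSketch<T> implements Iterator<T> {
    private final Deque<Iterator<? extends T>> pending = new ArrayDeque<>();

    /** Append another iterator to be drained after those already added. */
    void add(Iterator<? extends T> it) {
        pending.addLast(it);
    }

    @Override
    public boolean hasNext() {
        // Discard exhausted iterators at the front of the queue.
        while (!pending.isEmpty() && !pending.peekFirst().hasNext()) {
            pending.removeFirst();
        }
        return !pending.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.peekFirst().next();
    }
}
```

Empty iterators in the chain are skipped transparently, so callers see one continuous sequence.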
COMPRESSED_ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Compressed arc file extension.
COMPRESSED_FILE_EXTENSION - Static variable in interface org.archive.io.ArchiveFileConstants
Compressed file extension.
COMPRESSED_WARC_FILE_EXTENSION - Static variable in interface org.archive.io.warc.WARCConstants
Compressed WARC file extension.
ConfigurableX509TrustManager - Class in org.archive.httpclient
A configurable trust manager built on X509TrustManager.
ConfigurableX509TrustManager() - Constructor for class org.archive.httpclient.ConfigurableX509TrustManager
 
ConfigurableX509TrustManager(String) - Constructor for class org.archive.httpclient.ConfigurableX509TrustManager
Constructor.
ConfigurationException - Exception in org.archive.crawler.framework.exceptions
ConfigurationExceptions should be thrown when a configuration file is missing data, or contains uninterpretable data, at runtime.
ConfigurationException() - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
default constructor
ConfigurationException(String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create a ConfigurationException
ConfigurationException(String, Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
 
ConfigurationException(Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create a ConfigurationException
ConfigurationException(String, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create ConfigurationException
ConfigurationException(String, Throwable, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create ConfigurationException
ConfigurationException(Throwable, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.ConfigurationException
Create ConfigurationException
ConfiguredDecideRule - Class in org.archive.crawler.deciderules
Rule which can be configured to ACCEPT or REJECT at operator's option.
ConfiguredDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ConfiguredDecideRule
 
configureHttp() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
configureMethod(CrawlURI, HttpMethod) - Method in class org.archive.crawler.fetcher.FetchHTTP
Configure the HttpMethod setting options and headers.
configureTrustStore() - Static method in class org.archive.crawler.Heritrix
Configure our trust store.
congestionRatio - Variable in class org.archive.crawler.admin.StatisticsTracker
 
congestionRatio() - Method in class org.archive.crawler.admin.StatisticsTracker
Ratio of number of threads that would theoretically allow maximum crawl progress (if each was as productive as current threads), to current number of threads.
congestionRatio() - Method in interface org.archive.crawler.framework.Frontier
 
congestionRatio() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
congestionRatio() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
congestionRatio() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
connect() - Method in class org.archive.net.DownloadURLConnection
Do script copy to local file.
consecutiveConnectionErrors - Variable in class org.archive.crawler.datamodel.CrawlServer
 
considerIfLikelyUri(CrawlURI, CharSequence, CharSequence, char) - Method in class org.archive.crawler.extractor.ExtractorHTML
Consider whether a given string is URI-like.
considerIncluded(UURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should consider the given UURI as if already scheduled.
considerIncluded(UURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
considerIncluded(UURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
considerQueryStringValues(CrawlURI, CharSequence, CharSequence, char) - Method in class org.archive.crawler.extractor.ExtractorHTML
Consider a query-string-like collection of key=value[&key=value] pairs for URI-like strings in the values.
considerStringAsUri(String) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
 
considerStrings(CrawlURI, CharSequence, CrawlController, boolean) - Static method in class org.archive.crawler.extractor.ExtractorJS
 
considerTimestamp() - Method in class org.archive.crawler.io.CrawlerJournal
Write a timestamp line if appropriate
Constraint - Class in org.archive.crawler.settings
Superclass for constraints that can be set on attribute definitions.
Constraint(Level, String) - Constructor for class org.archive.crawler.settings.Constraint
Constructs a new Constraint.
Constraint.FailedCheck - Class in org.archive.crawler.settings
Objects of this class represent failed constraint checks.
Constraint.FailedCheck(CrawlerSettings, ComplexType, Type, Object, String) - Constructor for class org.archive.crawler.settings.Constraint.FailedCheck
Construct a new FailedCheck object.
Constraint.FailedCheck(CrawlerSettings, ComplexType, Type, Object) - Constructor for class org.archive.crawler.settings.Constraint.FailedCheck
Construct a new FailedCheck object using the constraint's default message.
constructedRegexp - Variable in class org.archive.crawler.deciderules.PathologicalPathDecideRule
 
constructRegexp() - Method in class org.archive.crawler.deciderules.PathologicalPathDecideRule
 
containerInitialization() - Static method in class org.archive.crawler.Heritrix
Run setup tasks for this 'container'.
contains(Object) - Method in class org.archive.crawler.settings.ListType
 
contains(long) - Method in class org.archive.util.AbstractLongFPSet
Does this set contain the given value?
contains(CharSequence) - Method in interface org.archive.util.BloomFilter
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Checks whether the given character sequence is in this filter.
contains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
contains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Does this set contain a given fingerprint.
contains(int) - Method in class org.archive.util.ms.Piece
 
containsAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
containsHost(String) - Method in class org.archive.crawler.datamodel.ServerCache
 
containsKey(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
containsKey(Object) - Method in class org.archive.util.CachedBdbMap
Deprecated.  
containsPrefixOf(String) - Method in class org.archive.util.PrefixSet
Test whether the given String is prefixed by one of this set's entries.
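One straightforward way to test whether a string is prefixed by any entry of a set is to probe the set with each leading substring. The sketch below takes that approach; the class PrefixLookupSketch and its static method signature are assumptions, and PrefixSet itself may use a different (likely sorted-set based) strategy.

```java
import java.util.Set;

// Hypothetical sketch; not org.archive.util.PrefixSet.
class PrefixLookupSketch {
    /** Returns true if some entry of 'prefixes' is a prefix of 's'. */
    static boolean containsPrefixOf(Set<String> prefixes, String s) {
        // Try every leading substring of s, from the empty string to s itself.
        for (int end = 0; end <= s.length(); end++) {
            if (prefixes.contains(s.substring(0, end))) {
                return true;
            }
        }
        return false;
    }
}
```

This costs one set lookup per character of the input, which is simple to reason about but not necessarily the most efficient choice for very long strings.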
containsServer(String) - Method in class org.archive.crawler.datamodel.ServerCache
 
containsValue(Object) - Method in class org.archive.util.CachedBdbMap
Deprecated.  
CONTENT_BYTES - Static variable in class org.archive.io.warc.WARCWriter
 
CONTENT_CHANGED - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
URI content has changed between the two latest, successfully completed fetches.
CONTENT_DESCRIPTION - Static variable in interface org.archive.io.warc.WARCConstants
 
CONTENT_LENGTH - Static variable in interface org.archive.io.warc.WARCConstants
 
CONTENT_LENGTH_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
CONTENT_MD5_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
CONTENT_TYPE - Static variable in interface org.archive.io.warc.WARCConstants
 
CONTENT_TYPE_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
CONTENT_UNCHANGED - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
URI content has not changed between the two latest, successfully completed fetches.
CONTENT_UNKNOWN - Static variable in interface org.archive.crawler.frontier.AdaptiveRevisitAttributeConstants
No knowledge of URI content.
ContentBasedWaitEvaluator - Class in org.archive.crawler.postprocessor
A WaitEvaluator that compares the CrawlURI's content type to a configurable regular expression.
ContentBasedWaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
Constructor
ContentBasedWaitEvaluator(String, String, String, Long, Long, Long, Double, Double) - Constructor for class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
Constructor
contentSinceCheck - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
ContentTypeMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression.
ContentTypeMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ContentTypeMatchesRegExpDecideRule
 
ContentTypeNotMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression.
ContentTypeNotMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ContentTypeNotMatchesRegExpDecideRule
 
ContentTypeRegExpFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. To be replaced by an equivalent DecideRule.
ContentTypeRegExpFilter(String) - Constructor for class org.archive.crawler.filter.ContentTypeRegExpFilter
Deprecated.  
ContentTypeRegExpFilter(String, String) - Constructor for class org.archive.crawler.filter.ContentTypeRegExpFilter
Deprecated.  
contextDestroyed(ServletContextEvent) - Method in class org.archive.crawler.WebappLifecycle
 
contextInitialized(ServletContextEvent) - Method in class org.archive.crawler.WebappLifecycle
 
CONTINUATION - Static variable in interface org.archive.io.warc.WARCConstants
 
CONTINUATION_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
controlConversation - Variable in class org.archive.net.ClientFTP
 
controller - Variable in class org.archive.crawler.extractor.CrawlUriSWFAction
 
controller - Variable in class org.archive.crawler.framework.AbstractTracker
A reference to the CrawlController of the crawl that we are to track statistics for.
controller - Variable in class org.archive.crawler.framework.ToePool
 
controller - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
CONVERSION - Static variable in interface org.archive.io.warc.WARCConstants
 
CONVERSION_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
convertAllPrefixesToDomains() - Method in class org.archive.util.SurtPrefixSet
Changes all prefixes so that they only enforce a general domain (allowing subdomains). For prefixes that don't include a ')', no change is necessary.
convertAllPrefixesToHosts() - Method in class org.archive.util.SurtPrefixSet
Changes all prefixes so that they enforce an exact host.
convertImpact(int) - Static method in class org.archive.util.JmxUtils
 
convertPrefixToDomain(String) - Static method in class org.archive.util.SurtPrefixSet
 
convertPrefixToHost(String) - Static method in class org.archive.util.SurtPrefixSet
 
convertToFatalConfigurationException(Exception) - Method in class org.archive.crawler.framework.CrawlController
 
convertToOpenMBeanAttribute(MBeanAttributeInfo) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanAttribute(MBeanAttributeInfo, String) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanOperation(MBeanOperationInfo) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanOperation(MBeanOperationInfo, String, OpenType) - Static method in class org.archive.util.JmxUtils
 
convertToOpenMBeanOperationInfo(MBeanParameterInfo) - Static method in class org.archive.util.JmxUtils
 
cookieDb - Variable in class org.archive.crawler.fetcher.FetchHTTP
Database backing cookie map, if using BDB
COOKIEDB_NAME - Static variable in class org.archive.crawler.fetcher.FetchHTTP
Name of cookie BDB Database
cookies - Variable in class org.archive.crawler.fetcher.FetchHTTP.PostRestore
 
CookieUtils - Class in org.archive.crawler.admin.ui
Utility methods for accessing cookies.
CookieUtils() - Constructor for class org.archive.crawler.admin.ui.CookieUtils
 
copyAttribute(String, DataContainer) - Method in class org.archive.crawler.settings.DataContainer
 
copyAttributeInfo(String, DataContainer) - Method in class org.archive.crawler.settings.DataContainer
 
copyContentBodyTo(File) - Method in class org.archive.io.RecordingInputStream
 
copyFile(File, File) - Static method in class org.archive.util.FileUtils
Copy the src file to the destination.
copyFile(File, File, boolean) - Static method in class org.archive.util.FileUtils
Copy the src file to the destination.
copyFile(File, File, long) - Static method in class org.archive.util.FileUtils
Copy up to extent bytes of the source file to the destination
copyFile(File, File, long, boolean) - Static method in class org.archive.util.FileUtils
Copy up to extent bytes of the source file to the destination
copyFiles(File, Set, File) - Static method in class org.archive.util.FileUtils
 
copyFiles(File, File) - Static method in class org.archive.util.FileUtils
Recursively copy all files from one directory to another.
copyFiles(File, FilenameFilter, File, boolean, boolean) - Static method in class org.archive.util.FileUtils
Recursively copy all files from one directory to another.
copyFiles(File, FilenameFilter, File, boolean, boolean, List<IOException>) - Static method in class org.archive.util.FileUtils
Recursively copy all files from one directory to another.
copyFrom(InputStream, long, boolean) - Method in class org.archive.io.WriterPoolMember
Copy bytes from the provided InputStream to the target file/stream being written.
copyPersistSourceToHistoryMap(File, String, StoredSortedMap<String, AList>) - Static method in class org.archive.crawler.processor.recrawl.PersistProcessor
Populates a given StoredSortedMap (history map) from an old environment db or a persist log.
copySettings(File) - Method in class org.archive.crawler.framework.CrawlController
Copy off the settings.
copySettings(File, String) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Creates a replica of the settings file structure in another directory (fully recursive, includes all per host settings).
copyToSystemProperty(String, String) - Static method in class org.archive.crawler.Heritrix
Copy the given key-value into System properties, as long as there is no existing value.
CoreAttributeConstants - Interface in org.archive.crawler.datamodel
CrawlURI attribute keys used by the core crawler classes.
CostAssignmentPolicy - Class in org.archive.crawler.frontier
Calculate an integer 'cost' value for the given CrawlURI.
CostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.CostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.CostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.WagCostAssignmentPolicy
Add constant penalties for certain features of URI (and its 'via') that make it more delayable/skippable.
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 
count() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
 
count - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
count - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
count - Variable in class org.archive.util.AbstractLongFPSet
The current number of elements in the set
count() - Method in class org.archive.util.AbstractLongFPSet
Return the number of entries in this set.
count - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in interface org.archive.util.fingerprint.LongFPSet
Get the number of elements in the set.
count - Variable in class org.archive.util.ObjectIdentityBdbCache
 
COUNT_DOMAIN - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
COUNT_HOST - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
COUNT_OVERRIDE - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
countCrawlURIs() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Count all entries in both primaryUriDB and processingUriDB.
Cp1252 - Class in org.archive.util.ms
A fast implementation of code page 1252.
CP1252_INDICATOR - Static variable in class org.archive.util.ms.PieceTable
The bit that indicates if a piece uses Cp1252 or unicode.
CP1252_MASK - Static variable in class org.archive.util.ms.PieceTable
The mask to use to clear the Cp1252 flag bit.
CRAWL_LOG_STYLE - Static variable in class org.archive.crawler.admin.CrawlJob
 
CRAWL_TIME_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
crawlCheckpoint(File) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlCheckpoint(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlCheckpoint(File) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawlCheckpoint(File) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called by CrawlController when checkpointing.
crawlCheckpoint(File) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlCheckpoint(File) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlCheckpoint(File) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlCheckpoint(File) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlCheckpoint(File) - Method in class org.archive.crawler.frontier.BdbFrontier
 
crawlCheckpoint(File) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlCheckpoint(File) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
CrawlController - Class in org.archive.crawler.framework
CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl.
CrawlController() - Constructor for class org.archive.crawler.framework.CrawlController
Default constructor
crawlDelay - Variable in class org.archive.crawler.datamodel.RobotsDirectives
 
crawlDuration() - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlDuration() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began.
crawledBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
Tally of sizes: novel, verified (same hash), vouched (not-modified).
CrawledBytesHistotable - Class in org.archive.crawler.util
 
CrawledBytesHistotable() - Constructor for class org.archive.crawler.util.CrawledBytesHistotable
 
crawledBytesSummary() - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURIDisregard(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a crawled URI that is to be disregarded.
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURIFailure(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a failed crawling of a URI.
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURINeedRetry(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a failed crawl of a URI that will be retried (failure due to possible transient problems).
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawledURISuccessful(CrawlURI) - Method in interface org.archive.crawler.event.CrawlURIDispositionListener
Notification of a successfully crawled URI
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
CRAWLEND_REPORT_OPER - Static variable in class org.archive.crawler.Heritrix
 
crawlEnded(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlEnded(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlEnded(String) - Method in class org.archive.crawler.admin.StatisticsTracker
 
crawlEnded(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController has ended a crawl and is about to exit.
crawlEnded(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlEnded(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlEnded(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.BdbFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlEnded(String) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
crawlEnded(String) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
crawlEnding(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlEnding(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlEnding(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is ending a crawl (for any reason)
crawlEnding(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlEnding(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlEnding(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlEnding(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlEnding(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlEnding(String) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlEnding(String) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
CRAWLER_PACKAGE - Static variable in class org.archive.crawler.Heritrix
The crawler package.
crawlerEndTime - Variable in class org.archive.crawler.framework.AbstractTracker
 
CrawlerJournal - Class in org.archive.crawler.io
Utility class for a crawler journal/log that is compressed and rotates by serial number at checkpoints.
CrawlerJournal(String, String) - Constructor for class org.archive.crawler.io.CrawlerJournal
Create a new crawler journal at the given location
CrawlerJournal(File) - Constructor for class org.archive.crawler.io.CrawlerJournal
Create a new crawler journal at the given location
crawlerPauseStarted - Variable in class org.archive.crawler.framework.AbstractTracker
 
CrawlerSettings - Class in org.archive.crawler.settings
Class representing a settings file.
CrawlerSettings(SettingsHandler, String) - Constructor for class org.archive.crawler.settings.CrawlerSettings
Constructs a new CrawlerSettings object.
CrawlerSettings(SettingsHandler, String, String) - Constructor for class org.archive.crawler.settings.CrawlerSettings
Constructs a new CrawlerSettings object which is a refinement of another settings object.
crawlerStartTime - Variable in class org.archive.crawler.framework.AbstractTracker
 
crawlerTotalPausedTime - Variable in class org.archive.crawler.framework.AbstractTracker
 
CrawlHost - Class in org.archive.crawler.datamodel
Represents a single remote "host".
CrawlHost(String) - Constructor for class org.archive.crawler.datamodel.CrawlHost
Create a new CrawlHost object.
CrawlHost(String, String) - Constructor for class org.archive.crawler.datamodel.CrawlHost
Create a new CrawlHost object.
CrawlJob - Class in org.archive.crawler.admin
A CrawlJob encapsulates a 'crawl order' with any and all information and methods needed by a CrawlJobHandler to accept and execute them.
CrawlJob() - Constructor for class org.archive.crawler.admin.CrawlJob
A shutdown Constructor.
CrawlJob(String, String, XMLSettingsHandler, CrawlJobErrorHandler, int, File) - Constructor for class org.archive.crawler.admin.CrawlJob
A constructor for jobs.
CrawlJob(String, XMLSettingsHandler, CrawlJobErrorHandler) - Constructor for class org.archive.crawler.admin.CrawlJob
A constructor for profiles.
CrawlJob(String, String, XMLSettingsHandler, CrawlJobErrorHandler, int, File, String, boolean, boolean) - Constructor for class org.archive.crawler.admin.CrawlJob
 
CrawlJob(File, CrawlJobErrorHandler) - Constructor for class org.archive.crawler.admin.CrawlJob
A constructor for reloading jobs from disk.
CrawlJob.MBeanCrawlController - Class in org.archive.crawler.admin
Subclass of CrawlController that unregisters beans when stopped.
CrawlJob.MBeanCrawlController() - Constructor for class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
CRAWLJOB_JMXMBEAN_TYPE - Static variable in class org.archive.crawler.admin.CrawlJob
 
CrawlJobErrorHandler - Class in org.archive.crawler.admin
An implementation of the ValueErrorHandler for the UI.
CrawlJobErrorHandler() - Constructor for class org.archive.crawler.admin.CrawlJobErrorHandler
 
CrawlJobErrorHandler(Level) - Constructor for class org.archive.crawler.admin.CrawlJobErrorHandler
 
CrawlJobHandler - Class in org.archive.crawler.admin
This class manages CrawlJobs.
CrawlJobHandler(File) - Constructor for class org.archive.crawler.admin.CrawlJobHandler
Constructor.
CrawlJobHandler(File, boolean, boolean) - Constructor for class org.archive.crawler.admin.CrawlJobHandler
Constructor allowing for optional loading of profiles and jobs.
CrawlMapper - Class in org.archive.crawler.processor
A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
CrawlMapper(String, String) - Constructor for class org.archive.crawler.processor.CrawlMapper
Constructor.
CrawlOrder - Class in org.archive.crawler.datamodel
Represents the 'root' of the settings hierarchy.
CrawlOrder() - Constructor for class org.archive.crawler.datamodel.CrawlOrder
Construct a CrawlOrder.
crawlPaused(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlPaused(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlPaused(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is actually paused (all threads are idle).
crawlPaused(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlPaused(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlPaused(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlPaused(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlPaused(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlPaused(String) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlPaused(String) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
crawlPausing(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlPausing(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlPausing(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is going to be paused.
crawlPausing(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlPausing(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlPausing(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlPausing(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlPausing(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlPausing(String) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlPausing(String) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
crawlResuming(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlResuming(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlResuming(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called when a CrawlController is resuming a crawl that had been paused.
crawlResuming(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlResuming(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlResuming(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlResuming(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlResuming(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlResuming(String) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlResuming(String) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
CrawlScope - Class in org.archive.crawler.framework
A CrawlScope instance defines which URIs are "in" a particular crawl.
CrawlScope(String) - Constructor for class org.archive.crawler.framework.CrawlScope
Constructs a new CrawlScope.
CrawlScope() - Constructor for class org.archive.crawler.framework.CrawlScope
Default constructor.
CrawlServer - Class in org.archive.crawler.datamodel
Represents a single remote "server".
CrawlServer(String) - Constructor for class org.archive.crawler.datamodel.CrawlServer
Creates a new CrawlServer object.
CrawlSettingsSAXHandler - Class in org.archive.crawler.settings
A SAX element handler that updates a CrawlerSettings object.
CrawlSettingsSAXHandler(CrawlerSettings) - Constructor for class org.archive.crawler.settings.CrawlSettingsSAXHandler
Creates a new CrawlSettingsSAXHandler.
CrawlSettingsSAXSource - Class in org.archive.crawler.settings
Class that takes a CrawlerSettings object and creates SAX events from it.
CrawlSettingsSAXSource(CrawlerSettings) - Constructor for class org.archive.crawler.settings.CrawlSettingsSAXSource
Constructs a new CrawlSettingsSAXSource.
crawlStarted(String) - Method in class org.archive.crawler.admin.CrawlJob
 
crawlStarted(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
crawlStarted(String) - Method in interface org.archive.crawler.event.CrawlStatusListener
Called on crawl start.
crawlStarted(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
crawlStarted(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
crawlStarted(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
crawlStarted(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlStarted(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
crawlStarted(String) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
crawlStarted(String) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
crawlStarted(String) - Method in class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
CrawlStateUpdater - Class in org.archive.crawler.postprocessor
A step, late in the processing of a CrawlURI, for updating the per-host information that may have been affected by the fetch.
CrawlStateUpdater(String) - Constructor for class org.archive.crawler.postprocessor.CrawlStateUpdater
 
CrawlStatusListener - Interface in org.archive.crawler.event
Listen for CrawlStatus events.
CrawlSubstats - Class in org.archive.crawler.datamodel
Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (eg queue).
CrawlSubstats() - Constructor for class org.archive.crawler.datamodel.CrawlSubstats
 
CrawlSubstats.HasCrawlSubstats - Interface in org.archive.crawler.datamodel
 
CrawlSubstats.Stage - Enum in org.archive.crawler.datamodel
 
CrawlURI - Class in org.archive.crawler.datamodel
Represents a candidate URI and the associated state it collects as it is crawled.
CrawlURI(UURI) - Constructor for class org.archive.crawler.datamodel.CrawlURI
Create a new instance of CrawlURI from a UURI.
CrawlURI(CandidateURI, long) - Constructor for class org.archive.crawler.datamodel.CrawlURI
Create a new instance of CrawlURI from a CandidateURI
crawlURIBinding - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
A binding for the CrawlURIARWrapper object
CrawlURIDispositionListener - Interface in org.archive.crawler.event
An interface for objects that want to be notified of a CrawlURI disposition (happens each time a curi has been through the processors).
CrawlUriSWFAction - Class in org.archive.crawler.extractor
SWF action that handles discovered URIs.
CrawlUriSWFAction(CrawlURI, CrawlController) - Constructor for class org.archive.crawler.extractor.CrawlUriSWFAction
 
create(CrawlerSettings, String, Class) - Method in class org.archive.crawler.datamodel.CredentialStore
Create and add to the list a credential of the passed type giving the credential the passed name.
createAlreadyIncluded() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Create a UriUniqFilter that will serve as record of already seen URIs.
createAlreadyIncluded() - Method in class org.archive.crawler.frontier.BdbFrontier
Create a UriUniqFilter that will serve as record of already seen URIs.
createAlreadyIncluded() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Create a UriUniqFilter that will serve as record of already seen URIs.
createAndAddLink(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link with the given string and context
createAndAddLinkRelativeToBase(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link with the given string and context, relative to a previously set base HREF if available (or relative to the current CrawlURI if no other base has been set)
createAndAddLinkRelativeToVia(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link with the given string and context, relative to this CrawlURI's via UURI if available.
createArchiveRecord(InputStream, long) - Method in class org.archive.io.arc.ARCReader
Create new arc record.
createArchiveRecord(InputStream, long) - Method in class org.archive.io.ArchiveReader
Return an Archive Record homed on offset into is.
createArchiveRecord(InputStream, long) - Method in class org.archive.io.warc.WARCReader
Create new WARC record.
createBloom(long, int, Random) - Method in class org.archive.util.BloomFilterTestBase
 
createCandidateURI(UURI, Link) - Method in class org.archive.crawler.datamodel.CandidateURI
Utility method for creation of CandidateURIs found extracting links from this CrawlURI.
createCandidateURI(UURI, Link, int, boolean) - Method in class org.archive.crawler.datamodel.CandidateURI
Utility method for creation of CandidateURIs found extracting links from this CrawlURI.
createCDXIndexFile(String) - Static method in class org.archive.io.arc.ARCReader
Generate a CDX index file for an ARC file.
createCDXIndexFile(String) - Static method in class org.archive.io.warc.WARCReader
Generate a CDX index file for a WARC file.
createCharSequenceFrom(InputStream, Charset) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
createCheckpointInProgressDirectory() - Method in class org.archive.crawler.framework.Checkpointer
 
createCompositeType(Map, String, String) - Static method in class org.archive.util.JmxUtils
 
createCrawlController() - Method in class org.archive.crawler.admin.CrawlJob
 
createCrawlJob(CrawlJobHandler, File, String) - Static method in class org.archive.crawler.Heritrix
 
createCrawlJobBasedOn(File, String, String, String) - Method in class org.archive.crawler.Heritrix
 
createdEnvironment - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
createDiskMap(Database, StoredClassCatalog, Class, Class) - Method in class org.archive.util.CachedBdbMap
Deprecated.  
createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbCache
 
createFile() - Method in class org.archive.io.arc.ARCWriter
 
createFile(File) - Method in class org.archive.io.warc.WARCWriter
 
createFile() - Method in class org.archive.io.WriterPoolMember
Create a new file.
createFile(File) - Method in class org.archive.io.WriterPoolMember
 
createFileLogger(File, String, Logger) - Static method in class org.archive.crawler.util.LogUtils
Creates a file logger that uses the heritrix.properties file logger configuration.
createFp(CharSequence) - Static method in class org.archive.crawler.util.FPMergeUriUniqFilter
Create a fingerprint from the given key
createHQ(String, int) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Creates a new AdaptiveRevisitHostQueue.
createKey(CharSequence) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
Create fingerprint.
createLink(String, CharSequence, char) - Method in class org.archive.crawler.datamodel.CrawlURI
Convenience method for creating a Link discovered at this URI with the given string and context
createMetaline(String, String, String, String, String) - Method in class org.archive.io.arc.ARCWriter
 
createNewJob(File, String, String, String, int) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
createOpenMBeanAttributeInfo(OpenType, MBeanAttributeInfo, String) - Static method in class org.archive.util.JmxUtils
 
createRecordHeader(String, String, String, String, URI, ANVLRecord, long) - Method in class org.archive.io.warc.WARCWriter
 
createSeedCandidateURI(UURI) - Static method in class org.archive.crawler.datamodel.CandidateURI
 
createSettingsHandler(File, String, String, String, File, CrawlJobErrorHandler, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new settings handler based on an existing job.
createSocket(String, int, InetAddress, int) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(Socket, String, int, boolean) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
createUriSet() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
createUriSet() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
createWcdx(String) - Static method in class org.archive.io.arc.ARC2WCDX
 
createWcdx(ARCReader) - Static method in class org.archive.io.arc.ARC2WCDX
 
Credential - Class in org.archive.crawler.datamodel.credential
Credential type.
Credential(String, String) - Constructor for class org.archive.crawler.datamodel.credential.Credential
Constructor.
CredentialAvatar - Class in org.archive.crawler.datamodel.credential
A credential representation.
CredentialAvatar(Class, String) - Constructor for class org.archive.crawler.datamodel.credential.CredentialAvatar
Constructor.
CredentialAvatar(Class, String, String) - Constructor for class org.archive.crawler.datamodel.credential.CredentialAvatar
Constructor.
CredentialStore - Class in org.archive.crawler.datamodel
Front door to the credential store.
CredentialStore(String) - Constructor for class org.archive.crawler.datamodel.CredentialStore
Constructor.
Criteria - Interface in org.archive.crawler.settings.refinements
Superclass for the refinement criteria.
criteriaIterator() - Method in class org.archive.crawler.settings.refinements.Refinement
Get a ListIterator over the criteria set for this refinement.
CRLF - Static variable in interface org.archive.io.ArchiveFileConstants
 
CRLF - Static variable in class org.archive.util.anvl.ANVLRecord
An ANVL 'newline'.
CRLF_BYTES - Static variable in class org.archive.io.warc.WARCWriter
NEWLINE as bytes.
CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.crawler.extractor.ExtractorCSS
 
CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.extractor.RegexpCSSLinkExtractor
 
CSS_URI_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorCSS
CSS URL extractor pattern.
CSS_URI_EXTRACTOR - Static variable in class org.archive.extractor.RegexpCSSLinkExtractor
CSS URL extractor pattern.
curi - Variable in class org.archive.crawler.extractor.CrawlUriSWFAction
 
curi - Variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
The URI, for logging and error reporting.
current - Variable in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
CURRENT_DOC_RATE_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
CURRENT_KB_RATE_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
CURRENT_LOG_SUFFIX - Static variable in class org.archive.crawler.framework.CrawlController
suffix to use on active logs
currentDocsPerSecond - Variable in class org.archive.crawler.admin.StatisticsTracker
 
currentFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
currentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
CURRENTJOB_ATTR - Static variable in class org.archive.crawler.Heritrix
 
currentKBPerSec - Variable in class org.archive.crawler.admin.StatisticsTracker
 
currentKey - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
Strong reference needed to avoid disappearance of key between nextEntry() and any use of the entry
currentProcessedDocsPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
currentProcessedDocsPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns an estimate of recent document download rates based on a queue of recently seen CrawlURIs (as of last snapshot).
currentProcessedKBPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
currentProcessedKBPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Calculates an estimate of the rate, in kb, at which documents are currently being processed by the crawler.
currentRecord(ArchiveRecord) - Method in class org.archive.io.ArchiveReader
 
CUSTOM - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
CUSTOM - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
CUSTOM - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
CustomSWFTags - Class in org.archive.crawler.extractor
Overwrites action tags that may hold URIs so that they use the CrawlUriSWFAction action.
CustomSWFTags(SWFActions) - Constructor for class org.archive.crawler.extractor.CustomSWFTags
 
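Many entries in this section are the crawl lifecycle callbacks (crawlStarted, crawlPausing, crawlPaused, crawlResuming, crawlCheckpoint, crawlEnding, crawlEnded) that CrawlStatusListener declares and that classes such as CrawlJob, FetchHTTP and AbstractFrontier implement. The sketch below only illustrates how such a listener might look. StatusListenerSketch, RecordingListener and ListenerDemo are hypothetical names, and the interface is a local stand-in rather than the Heritrix type. The method names and parameter types are taken from the index entries above; the void return types, the parameter names, the meaning of the String argument and the order of the calls are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.archive.crawler.event.CrawlStatusListener.
// Method names and parameter types follow the index; return types and
// parameter meanings are assumptions made for this sketch.
interface StatusListenerSketch {
    void crawlStarted(String message);
    void crawlPausing(String statusMessage);
    void crawlPaused(String statusMessage);
    void crawlResuming(String statusMessage);
    void crawlCheckpoint(File checkpointDir);
    void crawlEnding(String exitMessage);
    void crawlEnded(String exitMessage);
}

// Records each callback so the sequence of lifecycle events can be inspected.
class RecordingListener implements StatusListenerSketch {
    final List<String> events = new ArrayList<>();

    @Override public void crawlStarted(String message) { events.add("started"); }
    @Override public void crawlPausing(String statusMessage) { events.add("pausing"); }
    @Override public void crawlPaused(String statusMessage) { events.add("paused"); }
    @Override public void crawlResuming(String statusMessage) { events.add("resuming"); }
    @Override public void crawlCheckpoint(File checkpointDir) {
        events.add("checkpoint:" + checkpointDir.getName());
    }
    @Override public void crawlEnding(String exitMessage) { events.add("ending"); }
    @Override public void crawlEnded(String exitMessage) { events.add("ended"); }
}

class ListenerDemo {
    // Drives the callbacks in one plausible order. In a real crawl the
    // controller, not the listener, decides when each callback fires.
    static List<String> run() {
        RecordingListener listener = new RecordingListener();
        listener.crawlStarted("demo");
        listener.crawlPausing("pause requested");
        listener.crawlPaused("paused");
        listener.crawlCheckpoint(new File("cp-00001"));
        listener.crawlResuming("resuming");
        listener.crawlEnding("finished");
        listener.crawlEnded("finished");
        return listener.events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

To write a real listener, implement the actual CrawlStatusListener interface and check its Javadoc page for the exact signatures and declared exceptions.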

D

d - Variable in class org.archive.util.BloomFilter64bit
The number of hash functions used by this filter.
databaseConfig() - Static method in class org.archive.queue.StoredQueue
A suitable DatabaseConfig for the Database backing a StoredQueue.
DataContainer - Class in org.archive.crawler.settings
This class holds the data for a ComplexType for a settings object.
DataContainer(CrawlerSettings, ComplexType) - Constructor for class org.archive.crawler.settings.DataContainer
Create a data container for a module.
dataSocket - Variable in class org.archive.net.ClientFTP
 
DATE_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File Creation Date field.
db - Variable in class org.archive.util.CachedBdbMap
Deprecated. The BDB JE database used for this instance.
db - Variable in class org.archive.util.ObjectIdentityBdbCache
The BDB JE database used for this instance.
dbDir - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
Deprecated.  
DEBUG - Static variable in class org.archive.util.BloomFilter64bit
 
DecideRule - Class in org.archive.crawler.deciderules
Interface for rules which, given an object to evaluate, respond with a decision: DecideRule.ACCEPT, DecideRule.REJECT, or DecideRule.PASS.
DecideRule(String) - Constructor for class org.archive.crawler.deciderules.DecideRule
Constructor.
DecideRuleSequence - Class in org.archive.crawler.deciderules
RuleSequence represents a series of Rules, which are applied in turn to give the final result.
DecideRuleSequence(String) - Constructor for class org.archive.crawler.deciderules.DecideRuleSequence
 
DecideRuleSequence(String, String) - Constructor for class org.archive.crawler.deciderules.DecideRuleSequence
 
decideToMapOutlink(CandidateURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
DecidingFilter - Class in org.archive.crawler.deciderules
DecidingFilter: a classic Filter which makes its accept/reject decision based on whatever DecideRules have been set up inside it.
DecidingFilter(String, String) - Constructor for class org.archive.crawler.deciderules.DecidingFilter
 
DecidingFilter(String) - Constructor for class org.archive.crawler.deciderules.DecidingFilter
 
DecidingScope - Class in org.archive.crawler.deciderules
DecidingScope: a Scope which makes its accept/reject decision based on whatever DecideRules have been set up inside it.
DecidingScope(String) - Constructor for class org.archive.crawler.deciderules.DecidingScope
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.AcceptDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.BeanShellDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.DecideRule
Make decision on passed object.
decisionFor(Object) - Method in class org.archive.crawler.deciderules.DecideRuleSequence
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.FilterDecideRule
Make decision on passed object.
decisionFor(Object) - Method in class org.archive.crawler.deciderules.PredicatedDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.PrerequisiteAcceptDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.RejectDecideRule
 
decisionFor(Object) - Method in class org.archive.crawler.deciderules.SeedAcceptDecideRule
 
decode(char[], String) - Static method in class org.archive.net.LaxURI
 
decode(String, String) - Static method in class org.archive.net.LaxURI
 
decode(String) - Static method in class org.archive.util.Base32
Decodes the given Base32 String to a raw byte array.
decode(int) - Static method in class org.archive.util.ms.Cp1252
Returns the Unicode character for the given Cp1252 byte.
decodeUrlLoose(byte[]) - Static method in class org.archive.net.LaxURLCodec
Decodes an array of URL safe 7-bit characters into an array of original bytes.
decrementQueuedCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Note that a number of queued URIs have been deleted.
deepestUri - Variable in class org.archive.crawler.admin.StatisticsTracker
 
deepestUri() - Method in class org.archive.crawler.admin.StatisticsTracker
Ordinal position of the 'deepest' URI eligible for crawling.
deepestUri() - Method in interface org.archive.crawler.framework.Frontier
 
deepestUri() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
deepestUri() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deepestUri() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Default setting for trust level.
DEFAULT - Static variable in class org.archive.net.LaxURLCodec
 
DEFAULT_ALSO_CHECK_VIA - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
DEFAULT_ALSO_CHECK_VIA - Static variable in class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
DEFAULT_ATTR_RECOVERY_ENABLED - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_BALANCE_REPLENISH_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_BUFFER_SIZE - Static variable in class org.archive.io.RecyclingFastBufferedOutputStream
The default size of the internal buffer in bytes (16Ki).
DEFAULT_CALCULATE_ROBOTS_ONLY - Static variable in class org.archive.crawler.prefetch.PreconditionEnforcer
whether to calculate robots exclusion without applying
DEFAULT_CAPACITY - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_CHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_CHECK_OUTLINKS - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_CHECK_URI - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_CHECKPOINT_COPY_BDBJE_LOGS - Static variable in class org.archive.crawler.datamodel.CrawlOrder
 
DEFAULT_CHMOD_VALUE - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Default value for permissions.
DEFAULT_COLLECTION_VALUE - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Default value for collection.
DEFAULT_COMPRESS - Static variable in class org.archive.crawler.framework.WriterPoolProcessor
Default as to whether we do compression of files.
DEFAULT_CONTENT_LENGTH_TRESHOLD - Static variable in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
DEFAULT_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
 
DEFAULT_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.ImageWaitEvaluator
 
DEFAULT_CONTENT_REGEXPR - Static variable in class org.archive.crawler.postprocessor.TextWaitEvaluator
 
DEFAULT_COST_POLICY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_COUNTRY_CODE - Static variable in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
DEFAULT_CRAWLER_COUNT - Static variable in class org.archive.crawler.processor.HashCrawlMapper
 
DEFAULT_DEFAULT_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_DELAY_FACTOR - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_DIGEST_ALGORITHM - Static variable in class org.archive.crawler.fetcher.FetchHTTP
Default algorithm to use for message digesting.
DEFAULT_DIGEST_CONTENT - Static variable in class org.archive.crawler.fetcher.FetchHTTP
Default for whether to perform on-the-fly digest hashing of content bodies.
DEFAULT_DIGEST_METHOD - Static variable in interface org.archive.io.ArchiveFileConstants
 
DEFAULT_DIVERSION_DIR - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_ENCODING - Static variable in class org.archive.crawler.Heritrix
Default encoding.
DEFAULT_ENCODING - Static variable in interface org.archive.io.arc.ARCConstants
Encoding to use getting bytes from strings.
DEFAULT_ENCODING - Static variable in interface org.archive.io.warc.WARCConstants
Encoding to use getting bytes from strings.
DEFAULT_END_OPERATION - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
DEFAULT_ERROR_PENALTY_AMOUNT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_FORCE_QUEUE - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
DEFAULT_FORCE_RETIRE - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GROUP_MAX_ALL_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GROUP_MAX_FETCH_RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GROUP_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GROUP_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_GZIP_HEADER_LENGTH - Static variable in interface org.archive.io.arc.ARCConstants
Length of minimal 'default' GZIP header.
DEFAULT_HARVESTER_VALUE - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Default value for harvester.
DEFAULT_HERITRIX_OUT - Static variable in class org.archive.crawler.Heritrix
Heritrix stderr/stdout log file.
DEFAULT_HISTORY_LENGTH - Static variable in class org.archive.crawler.processor.recrawl.FetchHistoryProcessor
default history array length
DEFAULT_HOLD_QUEUES - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_HOST_MAX_ALL_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_HOST_MAX_FETCH_RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_HOST_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_HOST_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.ImageWaitEvaluator
 
DEFAULT_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.TextWaitEvaluator
 
DEFAULT_INITIAL_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_LIST_LOGIC - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
DEFAULT_LIST_LOGIC - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
DEFAULT_LOCAL_NAME - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_LOG_FILENAME - Static variable in class org.archive.crawler.processor.recrawl.PersistLogProcessor
default log filename
DEFAULT_MAP_SOURCE - Static variable in class org.archive.crawler.processor.LexicalCrawlMapper
 
DEFAULT_MATCH_RETURN_VALUE - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
DEFAULT_MAX_ACTIVE - Static variable in class org.archive.io.WriterPool
Default maximum active number of files in the pool.
DEFAULT_MAX_ARC_FILE_SIZE - Static variable in interface org.archive.io.arc.ARCConstants
Default maximum ARC file size.
DEFAULT_MAX_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_FILE_SIZE - Static variable in class org.archive.crawler.writer.Kw3WriterProcessor
Default max file size.
DEFAULT_MAX_HOPS - Static variable in class org.archive.crawler.deciderules.TooManyHopsDecideRule
Default access so available to test code.
DEFAULT_MAX_HOST_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_OVERALL_BANDWIDTH_USAGE - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_PATH_DEPTH - Static variable in class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
Default maximum value.
DEFAULT_MAX_PENDING - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
DEFAULT_MAX_RETRIES - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MAX_SIZE_BYTES - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
 
DEFAULT_MAX_SPECULATIVE_HOPS - Static variable in class org.archive.crawler.deciderules.TransclusionDecideRule
Default maximum speculative ('X') hops.
DEFAULT_MAX_TRANS_HOPS - Static variable in class org.archive.crawler.deciderules.TransclusionDecideRule
Default maximum transitive hops (of any type). Default access so it can be accessed by unit tests.
DEFAULT_MAX_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_MAX_WARC_FILE_SIZE - Static variable in interface org.archive.io.warc.WARCConstants
Default maximum WARC file size.
DEFAULT_MAXIMUM_WAIT - Static variable in class org.archive.io.WriterPool
Maximum time to wait on a free file.
DEFAULT_MIN_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_MIN_WAIT_INTERVAL - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_MODE - Static variable in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
DEFAULT_MONITOR_MOUNTS - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
DEFAULT_PAUSE_AT_FINISH - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_PAUSE_AT_START - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_PAUSE_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
DEFAULT_PORT - Static variable in class org.archive.crawler.SimpleHttpServer
Default web port.
DEFAULT_PREFERENCE_EMBED_HOPS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_PREFIX - Static variable in class org.archive.io.WriterPoolMember
Default file prefix.
DEFAULT_PROFILE - Static variable in class org.archive.crawler.admin.CrawlJobHandler
Default profile name.
DEFAULT_PROFILE_NAME - Static variable in class org.archive.crawler.admin.CrawlJobHandler
Name of system property whose specification overrides default profile used.
DEFAULT_QUEUE_ASSIGNMENT_POLICY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
DEFAULT_QUEUE_IGNORE_WWW - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
DEFAULT_QUEUE_TOTAL_BUDGET - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_REBUILD_ON_RECONFIG - Static variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
DEFAULT_RECHECK_THRESHOLD - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
DEFAULT_REDUCE_PATTERN - Static variable in class org.archive.crawler.processor.HashCrawlMapper
 
DEFAULT_REPETITIONS - Static variable in class org.archive.crawler.deciderules.PathologicalPathDecideRule
Default maximum repetitions.
DEFAULT_REPETITIONS - Static variable in class org.archive.crawler.filter.PathologicalPathFilter
Deprecated.  
DEFAULT_REREAD_SEEDS_ON_CONFIG - Static variable in class org.archive.crawler.framework.CrawlScope
 
DEFAULT_RESPECT_CRAWL_DELAY_UP_TO_SECS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_RETRY_DELAY - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_ROTATION_DIGITS - Static variable in class org.archive.crawler.processor.CrawlMapper
 
DEFAULT_RUNTIME_SECONDS - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
DEFAULT_SERVER_MAX_ALL_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_SERVER_MAX_FETCH_RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_SERVER_MAX_FETCH_SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_SERVER_MAX_SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
DEFAULT_SMEAR - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_SNOOZE_DEACTIVATE_MS - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_SOURCE_TAG_SEEDS - Static variable in class org.archive.crawler.frontier.AbstractFrontier
 
DEFAULT_STATISTICS_REPORT_INTERVAL - Static variable in class org.archive.crawler.framework.AbstractTracker
Default period between logging stat values
DEFAULT_STRIP_REG_EXPR - Static variable in class org.archive.crawler.extractor.HTTPContentDigest
 
DEFAULT_SUFFIX - Static variable in class org.archive.io.WriterPoolMember
Default for file suffix.
DEFAULT_TARGET_READY_QUEUES_BACKLOG - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
DEFAULT_TARGET_STATUS - Static variable in class org.archive.crawler.deciderules.FetchStatusDecideRule
Default access so available to test code.
DEFAULT_TOE_PRIORITY - Static variable in class org.archive.crawler.framework.ToePool
run worker thread slightly lower than usual
DEFAULT_UNCHANGED_FACTOR - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_USE_AS_MIDFETCH - Static variable in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
DEFAULT_USE_OVERDUE_TIME - Static variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
DEFAULT_USE_PUBLICSUFFIX_REDUCE - Static variable in class org.archive.crawler.processor.HashCrawlMapper
 
DEFAULT_USE_URI_UNIQ_FILTER - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
DefaultBlockFileSystem - Class in org.archive.util.ms
Default implementation of the Block File System.
DefaultBlockFileSystem(SeekInputStream, int) - Constructor for class org.archive.util.ms.DefaultBlockFileSystem
Constructor.
DefaultEntry - Class in org.archive.util.ms
 
DefaultEntry(DefaultBlockFileSystem, SeekInputStream, int) - Constructor for class org.archive.util.ms.DefaultEntry
 
deferredHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of deferred hosts.
definition - Variable in class org.archive.crawler.settings.ComplexType
 
definitionMap - Variable in class org.archive.crawler.settings.ComplexType
 
delete(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete the given CrawlURI from persistent store.
DELETE_CRAWL_JOB_OPER - Static variable in class org.archive.crawler.Heritrix
 
deleted(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that a CrawlURI has been deleted outside of the normal next()/finished() lifecycle.
deleted(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deleted(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Force logging, etc.
deleteDir(File) - Static method in class org.archive.util.FileUtils
Deletes all files and subdirectories under dir.
deleteInProcessing(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Removes a URI from the list of URIs that belong to this HQ and are currently being processed.
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the given item from the queue.
deleteJob(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
The specified job will be removed from the pending queue or aborted if currently running.
deleteMatchedItems(Predicate) - Method in class org.archive.queue.MemQueue
 
deleteMatchedItems(Predicate) - Method in interface org.archive.queue.Queue
All objects in the queue where matcher.match(object) returns true will be deleted from the queue.
deleteMatching(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteMatchingFromQueue(String, String, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete all CrawlURIs matching the given expression.
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteProfile(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
deleteSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsCache
Delete a settings object from the cache.
deleteSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsHandler
Delete a settings object from persistent storage.
deleteSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Delete a settings object from persistent storage.
deleteSoonerOrLater(File) - Static method in class org.archive.util.FileUtils
Delete the file now -- but in the event of failure, keep trying in the future.
deleteURIs(String) - Method in interface org.archive.crawler.framework.Frontier
Delete any URI that matches the given regular expression from the list of discovered and pending URIs.
deleteURIs(String, String) - Method in interface org.archive.crawler.framework.Frontier
Delete any URI that matches the given regular expression from the list of discovered and pending URIs, if it is in a queue with a name matching the second regular expression.
deleteURIs(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deleteURIs(String, String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
deleteURIs(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Delete all scheduled URIs matching the given regex.
deleteURIs(String, String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Delete all scheduled URIs matching the given regex, in queues with names matching the second given regex.
deleteURIsFromPending(String) - Method in class org.archive.crawler.admin.CrawlJob
Delete any URI from the frontier of the current (paused) job that matches the specified regular expression.
deleteURIsFromPending(String, String) - Method in class org.archive.crawler.admin.CrawlJob
Delete any URI from the frontier of the current (paused) job that matches the specified regular expression.
deleteURIsFromPending(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Delete any URI from the frontier of the current (paused) job that matches the specified regular expression.
deleteURIsFromPending(String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Delete any URI from the frontier of the current (paused) job that matches the specified regular expression.
DENYALL - Static variable in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
Deque - Interface in org.archive.queue
Deprecated. As of 1.10.0. Unused.
dequeue(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the peekItem from the queue and adjusts the count.
dequeue() - Method in class org.archive.queue.MemQueue
 
dequeue() - Method in interface org.archive.queue.Queue
remove an entry from the start of the queue
deregisterJndi(ObjectName) - Static method in class org.archive.crawler.Heritrix
 
DESC_DIGEST_ALGORITHM - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
DESC_DIGEST_CONTENT - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
deserializeAlreadySeen(Class<? extends UriUniqFilter>, File) - Method in class org.archive.crawler.frontier.BdbFrontier
 
deserializeFromByteArray(byte[]) - Static method in class org.archive.util.IoUtils
Utility method to deserialize Object from byte[].
deserializeFromFile(File) - Static method in class org.archive.util.IoUtils
Utility method to deserialize an Object from given File.
destroy() - Method in class org.archive.crawler.admin.ui.RootFilter
 
destroy() - Method in class org.archive.crawler.Heritrix
Do inverse of construction.
DESTROY_OPER - Static variable in class org.archive.crawler.Heritrix
 
detach(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Detach this credential from passed curi.
detachAll(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Detach all credentials of this type from passed curi.
DevUtils - Class in org.archive.util
Write a message and stack trace to the 'org.archive.util.DevUtils' logger.
DevUtils() - Constructor for class org.archive.util.DevUtils
 
digest - Variable in class org.archive.io.ArchiveRecord
Compute digest on what we read and add to metadata when done.
DIGEST_ALGORITHMS - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
disallows - Variable in class org.archive.crawler.datamodel.RobotsDirectives
 
disallows(CrawlURI, String) - Method in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
discardNewJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Discard the handler's 'new job'.
disconnect() - Method in class org.archive.net.ClientFTP
 
DISCOVERED_COUNT_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
discoveredUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
discoveredUriCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Number of discovered URIs.
discoveredUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of discovered URIs.
discoveredUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
discoveredUriCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
DiskFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude FPMergeUriUniqFilter using a disk data file of raw longs as the overall FP record.
DiskFPMergeUriUniqFilter(File) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
DiskFPMergeUriUniqFilter.DataFileLongIterator - Class in org.archive.crawler.util
 
DiskFPMergeUriUniqFilter.DataFileLongIterator(DataInputStream) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Construct a long iterator reading from the given stream.
diskMap - Variable in class org.archive.util.CachedBdbMap
Deprecated. The Collection view of the BDB JE database used for this instance.
diskMap - Variable in class org.archive.util.ObjectIdentityBdbCache
The Collection view of the BDB JE database used for this instance.
diskMapSize - Variable in class org.archive.util.CachedBdbMap
Deprecated. The number of objects in the diskMap StoredMap.
disregardDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
disregardedFetchAttempts() - Method in class org.archive.crawler.admin.StatisticsTracker
Get the total number of failed fetch attempts (connection failures -> give up, etc)
disregardedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that were scheduled at one point but have been disregarded.
disregardedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
disregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
disregardedUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
diversionLogs - Variable in class org.archive.crawler.processor.CrawlMapper
Mapping of target crawlers to logs (PrintWriters)
divertLog(CandidateURI, String) - Method in class org.archive.crawler.processor.CrawlMapper
Note the given CandidateURI in the appropriate diversion log.
DNSJavaUtil - Class in org.archive.util
Utility methods based on DNSJava.
dnsStatusCodeDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
 
doAbort(CrawlURI, HttpMethod, String) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
Doc - Class in org.archive.util.ms
Reads .doc files.
DOC_RATE_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
doCmdLineArgs(String[]) - Static method in class org.archive.crawler.Heritrix
 
docsPerSecond - Variable in class org.archive.crawler.admin.StatisticsTracker
 
document - Variable in class org.archive.crawler.extractor.PDFParser
 
documentReader - Variable in class org.archive.crawler.extractor.PDFParser
 
doFilter(ServletRequest, ServletResponse, FilterChain) - Method in class org.archive.crawler.admin.ui.RootFilter
 
doFlush() - Method in class org.archive.crawler.admin.CrawlJobHandler
If it's a HostQueuesFrontier, needs to be flushed for the queued.
doJournalAdded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalEmitted(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalRescheduled(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
DOMAIN - Static variable in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
 
DomainScope - Class in org.archive.crawler.scope
Deprecated. As of release 1.10.0. Replaced by DecidingScope.
DomainScope(String) - Constructor for class org.archive.crawler.scope.DomainScope
Deprecated.  
DomainSensitiveFrontier - Class in org.archive.crawler.frontier
Deprecated. As of release 1.10.0. Replaced by BdbFrontier and QuotaEnforcer.
DomainSensitiveFrontier(String) - Constructor for class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
doOneCrawl(String) - Method in class org.archive.crawler.Heritrix
Launch the crawler without a web UI and run the passed crawl only.
doOneCrawl(String, CrawlStatusListener) - Method in class org.archive.crawler.Heritrix
Launch the crawler without a web UI and run the passed crawl only.
doStripRegexMatch(String, Matcher) - Method in class org.archive.crawler.url.canonicalize.BaseRule
Run a regex that strips elements of a string.
DOT - Static variable in class org.archive.crawler.scope.DomainScope
Deprecated.  
DOT - Static variable in class org.archive.net.UURIFactory
 
DOT - Static variable in class org.archive.util.SURT
 
DOT_ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Dot ARC file extension.
DOT_COMPRESSED_ARC_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
Compressed dot arc file extension.
DOT_COMPRESSED_FILE_EXTENSION - Static variable in interface org.archive.io.arc.ARCConstants
 
DOT_COMPRESSED_FILE_EXTENSION - Static variable in interface org.archive.io.ArchiveFileConstants
Dot plus compressed file extension.
DOT_COMPRESSED_FILE_EXTENSION - Static variable in interface org.archive.io.warc.WARCConstants
 
DOT_COMPRESSED_WARC_FILE_EXTENSION - Static variable in interface org.archive.io.warc.WARCConstants
Compressed dot WARC file extension.
DOT_WARC_FILE_EXTENSION - Static variable in interface org.archive.io.warc.WARCConstants
Dot WARC file extension.
DOUBLE - Static variable in class org.archive.crawler.settings.SettingsHandler
 
DOUBLE_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
DoubleList - Class in org.archive.crawler.settings
List of Double values
DoubleList(String, String) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList.
DoubleList(String, String, DoubleList) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList and initializes it with the values from another DoubleList.
DoubleList(String, String, Double[]) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList and initializes it with the values from an array of Doubles.
DoubleList(String, String, double[]) - Constructor for class org.archive.crawler.settings.DoubleList
Creates a new DoubleList and initializes it with the values from a double array.
doubleToString(double, int) - Static method in class org.archive.util.ArchiveUtils
Converts a double to a string.
DOWNLOAD_COUNT_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
downloadDisregards - Variable in class org.archive.crawler.admin.StatisticsTracker
 
downloadedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
downloadFailures - Variable in class org.archive.crawler.admin.StatisticsTracker
 
DownloadURLConnection - Class in org.archive.net
A URL connection that pre-downloads the URL reference before passing back a Stream reference.
DownloadURLConnection(URL) - Constructor for class org.archive.net.DownloadURLConnection
 
drainBuffer - Variable in class org.archive.io.RecordingInputStream
Reusable buffer to avoid reallocation on each readFullyUntil
dump(boolean) - Method in class org.archive.io.arc.ARCReader
 
DUMP - Static variable in interface org.archive.io.ArchiveFileConstants
 
dump(boolean) - Method in class org.archive.io.ArchiveReader
Dump this file on STDOUT
dump() - Method in class org.archive.io.ArchiveRecord
Writes output on STDOUT.
dump(OutputStream) - Method in class org.archive.io.ArchiveRecord
Writes output on passed os.
dump(boolean) - Method in class org.archive.io.warc.WARCReader
 
DUMP_URIS_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
dumpAllPendingToLog() - Method in class org.archive.crawler.frontier.BdbFrontier
Dump all still-enqueued URIs to the crawl.log -- without actually dequeuing.
dumpHttpHeader() - Method in class org.archive.io.arc.ARCRecord
 
dumpReports() - Method in class org.archive.crawler.admin.StatisticsTracker
Run the reports.
dumpReports() - Method in class org.archive.crawler.framework.AbstractTracker
Dump reports, if any, on request or at crawl end.
dumpSurtPrefixSet() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Dump the current prefixes in use to configured dump file (if any)
dumpUris(String, String, int, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
dupByHashBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
dupByHashUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
dupByHashUrls - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
DUPLICATE - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
duplicateCount - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
duplicatesAtLastSample - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
durationTime - Variable in class org.archive.crawler.admin.StatisticsSummary
 

E

EACH_ATTRIBUTE_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
EACH_ATTRIBUTE_EXTRACTOR - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
earlyInitialize(CrawlerSettings) - Method in class org.archive.crawler.settings.ComplexType
This method can be overridden in subclasses to do local initialisation.
element - Variable in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
Element - Class in org.archive.util.anvl
ANVL 'data element'.
Element(Label) - Constructor for class org.archive.util.anvl.Element
 
Element(Label, Value) - Constructor for class org.archive.util.anvl.Element
 
elementContext(CharSequence, CharSequence) - Static method in class org.archive.crawler.extractor.Link
Create a suitable XPath-like context from an element name and optional attribute name.
EMBED_HOP - Static variable in class org.archive.crawler.extractor.Link
embedded links necessary to render the page, like IMG/@SRC
EMBED_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for embeds without other context
emitted(CandidateURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
Note that a CrawlURI was emitted for processing.
emitted(CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
EMPTY - Static variable in class org.archive.util.AbstractLongFPSet
A constant used to indicate that a slot in the set storage is empty.
EMPTY_ANVL_RECORD - Static variable in class org.archive.util.anvl.ANVLRecord
 
EMPTY_STRING - Static variable in class org.archive.net.UURIFactory
 
encode(BitSet, String, String) - Method in class org.archive.net.LaxURLCodec
Encodes a string into its URL safe form using the specified string charset.
encode(byte[]) - Static method in class org.archive.util.Base32
Encodes byte array to Base32 String.
encodingMaxBytesPerChar(String) - Static method in class org.archive.util.IoUtils
Return the maximum number of bytes per character in the named encoding, or 0 if encoding is invalid or unsupported.
encounteredReferences - Variable in class org.archive.crawler.extractor.PDFParser
 
end - Variable in class org.archive.io.CharSubSequence
 
END_TRANSFORMED_AUTHORITY - Static variable in class org.archive.util.SURT
 
endDocument() - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
EndedException - Exception in org.archive.crawler.framework.exceptions
Indicates a crawl has ended, either due to operator termination, frontier exhaustion, or any other reason.
EndedException(String) - Constructor for exception org.archive.crawler.framework.exceptions.EndedException
Constructs a new EndedException.
endElement(String, String, String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
End of an element.
Endian - Class in org.archive.io
Reads integers stored in big or little endian streams.
endsWith(char) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Tests if this string ends with a character.
EnhancedEnvironment - Class in org.archive.util.bdbje
Version of BDB_JE Environment with additional convenience features, such as a shared, cached StoredClassCatalog.
EnhancedEnvironment(File, EnvironmentConfig) - Constructor for class org.archive.util.bdbje.EnhancedEnvironment
Constructor
enqueue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Add the given CrawlURI, noting its addition in running count.
enqueue(T) - Method in class org.archive.queue.MemQueue
 
enqueue(T) - Method in interface org.archive.queue.Queue
Add an entry to the end of queue
ensureNewJobWritten(CrawlJob, String, String) - Static method in class org.archive.crawler.admin.CrawlJobHandler
Ensure order file with new name/desc is written.
ensureWriteableDirectory(String) - Static method in class org.archive.util.IoUtils
Ensure writeable directory.
ensureWriteableDirectory(List<File>) - Static method in class org.archive.util.IoUtils
Ensure writeable directories.
ensureWriteableDirectory(File) - Static method in class org.archive.util.IoUtils
Ensure writeable directory.
entry - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
ENTRY - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
Entry - Interface in org.archive.util.ms
 
Entry.EntryType - Enum in org.archive.util.ms
 
entrySet() - Method in class org.archive.util.CachedBdbMap
Deprecated.  
entryString(Object) - Static method in class org.archive.util.Histotable
Utility method to convert a key->Long into the string "count key".
environment - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
Deprecated.  
eor - Variable in class org.archive.io.ArchiveRecord
Set flag when we've reached the end-of-record.
eq(Object, Object) - Static method in class org.archive.crawler.settings.SoftSettingsHash
Check for equality of non-null reference x and possibly-null y.
equals(Object) - Method in class org.archive.crawler.datamodel.CrawlHost
 
equals(Object) - Method in class org.archive.crawler.datamodel.CrawlServer
 
equals(Object) - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory are the same.
equals(Object) - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
equals(Object) - Method in class org.archive.crawler.settings.refinements.Refinement
 
equals(Object) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
equals(Object) - Method in class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
 
equals(Object) - Method in class org.archive.crawler.settings.TextField
 
equals(Object) - Method in class org.archive.crawler.settings.Type
The implementation of equals considers two Types equal if name and value are equal.
equals(long) - Method in class org.archive.io.SinkHandlerLogRecord
 
equals(SinkHandlerLogRecord) - Method in class org.archive.io.SinkHandlerLogRecord
 
equals(Object) - Method in class org.archive.net.UURI
Tests whether this UURI is equal to another object.
errors - Variable in class org.archive.crawler.admin.CrawlJobErrorHandler
All encountered errors
escape(String) - Static method in class org.archive.util.JavaLiterals
 
ESCAPED_AMP - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
ESCAPED_AMP - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
ESCAPED_APOSTROPH - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_BACKSLASH - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_CIRCUMFLEX - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_LCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_LSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_PIPE - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_QUOT - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_RCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_RSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_SPACE - Static variable in class org.archive.net.UURIFactory
 
ESCAPED_SQUOT - Static variable in class org.archive.net.UURIFactory
 
escapeForHTML(String) - Static method in class org.archive.util.TextUtils
Minimally escapes a string so that it can be placed inside XML/HTML attribute.
escapeForHTMLJavascript(String) - Static method in class org.archive.util.TextUtils
Escapes a string so that it can be passed as an argument to JavaScript in a JSP page.
escapeForMarkupAttribute(String) - Static method in class org.archive.util.TextUtils
Escapes a string so that it can be placed inside XML/HTML attribute.
escapeWhitespace(String) - Method in class org.archive.net.UURIFactory
Escape any whitespace found.
evaluate(Object) - Method in class org.archive.crawler.deciderules.AddRedirectFromRootServerToScope
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegExpDecideRule
Evaluate passed object.
evaluate(Object) - Method in class org.archive.crawler.deciderules.ContentTypeMatchesRegExpDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.ContentTypeNotMatchesRegExpDecideRule
Evaluate whether given object's string version does not match configured regexp (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.ExternalImplDecideRule
 
evaluate(Object) - Method in interface org.archive.crawler.deciderules.ExternalImplInterface
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.FetchStatusDecideRule
Evaluate whether given object's fetch status equals the configured 'target-status' setting.
evaluate(Object) - Method in class org.archive.crawler.deciderules.FetchStatusMatchesRegExpDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.FetchStatusNotMatchesRegExpDecideRule
Evaluate whether given object's FetchStatus does not match configured regexp (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.HasViaDecideRule
Evaluate whether given object has a 'via' URI.
evaluate(Object) - Method in class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
Evaluate whether given object (if CandidateURI) has hops-path matching configured regexp
evaluate(Object) - Method in class org.archive.crawler.deciderules.IsCrossTopmostAssignedSurtHopDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
Evaluate whether given object's string version matches configured regexps
evaluate(Object) - Method in class org.archive.crawler.deciderules.MatchesRegExpDecideRule
Evaluate whether given object's string version matches configured regexp
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotMatchesFilePatternDecideRule
Evaluate whether given object's string version does not match configured regexp (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotMatchesListRegExpDecideRule
Evaluate whether given object's string version does not match configured regexps (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotMatchesRegExpDecideRule
Evaluate whether given object's string version does not match configured regexp (by reversing the superclass's answer).
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotOnDomainsDecideRule
Evaluate whether given object's URI is NOT in the set of domains -- simply reverse superclass's determination
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotOnHostsDecideRule
Evaluate whether given object's URI is NOT in the set of hosts -- simply reverse superclass's determination
evaluate(Object) - Method in class org.archive.crawler.deciderules.NotSurtPrefixedDecideRule
Evaluate whether given object's URI is NOT in the SURT prefix set -- simply reverse superclass's determination
evaluate(Object) - Method in class org.archive.crawler.deciderules.PredicatedDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.QueueOverbudgetDecideRule
 
evaluate(Object) - Method in class org.archive.crawler.deciderules.recrawl.IdenticalDigestDecideRule
Evaluate whether given CrawlURI's content-digest exactly matches that of preceding fetch.
evaluate(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Evaluate whether given object comes from a URI which is in scope
evaluate(Object) - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Evaluate whether given object's URI is covered by the SURT prefix set
evaluate(Object) - Method in class org.archive.crawler.deciderules.TooManyHopsDecideRule
Evaluate whether given object is over the threshold number of hops.
evaluate(Object) - Method in class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
Evaluate whether given object is over the threshold number of path-segments.
evaluate(Object) - Method in class org.archive.crawler.deciderules.TransclusionDecideRule
Evaluate whether given object is within the threshold number of transitive hops.
evaluate(Object) - Method in class org.archive.util.Inverter
 
ExceedsDocumentLengthTresholdDecideRule - Class in org.archive.crawler.deciderules
 
ExceedsDocumentLengthTresholdDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ExceedsDocumentLengthTresholdDecideRule
Usual constructor.
exceedsMaxHops(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if there are too many hops
exception - Variable in class org.archive.crawler.datamodel.LocalizedError
 
exceptionNext() - Method in class org.archive.io.ArchiveReader.ArchiveRecordIterator
A variant of next() that throws exceptions and handles recoverable exceptions by moving on to the next record.
exceptionToString(String, Throwable) - Static method in class org.archive.util.TextUtils
 
excludeAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if URI is excluded by any filters.
exec(String[]) - Static method in class org.archive.util.ProcessUtils
Runs process.
execute(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
execute(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
EXISTS_CASE_INSENSITIVE_MATCH - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
existsMaybeCaseSensitive return code for a file that exists, using a case-insensitive comparison.
EXISTS_EXACT_MATCH - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
existsMaybeCaseSensitive return code for a file that exists.
EXISTS_NOT - Static variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
existsMaybeCaseSensitive return code for a file that does not exist.
existsMaybeCaseSensitive(File, String, File) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
Checks if a file (including directories) exists.
EXPANDED_URI_SAFE - Static variable in class org.archive.net.LaxURLCodec
A more expansive set of ASCII URI characters to consider as 'safe' to leave unencoded, based on actual browser behavior.
expected_n - Variable in class org.archive.crawler.util.BloomUriUniqFilter
 
EXPECTED_SIZE_KEY - Static variable in class org.archive.crawler.util.BloomUriUniqFilter
 
expectedInserts - Variable in class org.archive.util.BloomFilter64bit
The expected number of inserts; determines calculated size
expectedModCount - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
expend(int) - Method in class org.archive.crawler.frontier.WorkQueue
Decrease the internal running budget by the given amount.
exportTo(Writer) - Method in class org.archive.util.SurtPrefixSet
 
ExternalGeoLocationDecideRule - Class in org.archive.crawler.deciderules
A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface.
ExternalGeoLocationDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
 
ExternalGeoLookupInterface - Interface in org.archive.crawler.deciderules
Interface used by ExternalGeoLocationDecideRule.
ExternalImplDecideRule - Class in org.archive.crawler.deciderules
A rule that can be configured to take alternate implementations of the ExternalImplInterface.
ExternalImplDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ExternalImplDecideRule
 
ExternalImplInterface - Interface in org.archive.crawler.deciderules
Interface used by ExternalImplDecideRule.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.Extractor
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorCSS
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorDOC
Processes a Word document and extracts any hyperlinks from it.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
extract(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorImpliedURI
Perform usual extraction on a CrawlURI
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorJS
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorPDF
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorSWF
 
extract(String) - Method in class org.archive.crawler.extractor.ExtractorTool
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorUniversal
 
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorURI
Perform usual extraction on a CrawlURI
extract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorXML
 
extract(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.crawler.extractor.TrapSuppressExtractor
 
extract(CharSequence, UURI, UURI, List<Link>, ExtractErrorListener) - Static method in class org.archive.extractor.CharSequenceLinkExtractor
Convenience method to do default extraction.
EXTRACT_VALUE_ATTRIBUTES - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
extractAddress(ObjectName) - Static method in class org.archive.util.JmxUtils
 
extractErrorListener - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
ExtractErrorListener - Interface in org.archive.extractor
ExtractErrorListener receives exceptions that may need to be logged from inside a LinkExtractor, allowing the extraction to continue without raising an exception through hasNext()/next()/nextLink().
extractImplied(CharSequence, String, String) - Static method in class org.archive.crawler.extractor.ExtractorImpliedURI
Utility method for extracting 'implied' URI given a source uri, trigger pattern, and build pattern.
extractInlineCss - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
extractInlineJs - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
extractLine - Variable in class org.archive.util.iterator.RegexpLineIterator
 
extractLink(CrawlURI, Link) - Method in class org.archive.crawler.extractor.ExtractorURI
Consider a single Link for internal URIs
Extractor - Class in org.archive.crawler.extractor
Convenience shared superclass for Extractor Processors.
Extractor(String, String) - Constructor for class org.archive.crawler.extractor.Extractor
Passthrough constructor.
EXTRACTOR_URI_EXCEPTIONS - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
ExtractorCSS - Class in org.archive.crawler.extractor
This extractor parses URIs from CSS files.
ExtractorCSS(String) - Constructor for class org.archive.crawler.extractor.ExtractorCSS
 
ExtractorDOC - Class in org.archive.crawler.extractor
This class allows the caller to extract href-style links from Word 97-format Word documents.
ExtractorDOC(String) - Constructor for class org.archive.crawler.extractor.ExtractorDOC
 
ExtractorHTML - Class in org.archive.crawler.extractor
Basic link-extraction, from an HTML content-body, using regular expressions.
ExtractorHTML(String) - Constructor for class org.archive.crawler.extractor.ExtractorHTML
 
ExtractorHTML(String, String) - Constructor for class org.archive.crawler.extractor.ExtractorHTML
 
ExtractorHTTP - Class in org.archive.crawler.extractor
Extracts URIs from HTTP response headers.
ExtractorHTTP(String) - Constructor for class org.archive.crawler.extractor.ExtractorHTTP
 
ExtractorImpliedURI - Class in org.archive.crawler.extractor
An extractor for finding 'implied' URIs inside other URIs.
ExtractorImpliedURI(String) - Constructor for class org.archive.crawler.extractor.ExtractorImpliedURI
Constructor
ExtractorJS - Class in org.archive.crawler.extractor
Processes JavaScript files for strings that are likely to be crawlable URIs.
ExtractorJS(String) - Constructor for class org.archive.crawler.extractor.ExtractorJS
 
ExtractorPDF - Class in org.archive.crawler.extractor
Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs
ExtractorPDF(String) - Constructor for class org.archive.crawler.extractor.ExtractorPDF
 
ExtractorSWF - Class in org.archive.crawler.extractor
Processes SWF (Flash/Shockwave) files for strings that are likely to be crawlable URIs.
ExtractorSWF(String) - Constructor for class org.archive.crawler.extractor.ExtractorSWF
 
ExtractorSWF.ExtractorSWFActions - Class in org.archive.crawler.extractor
SWFActions that parse URI-like strings.
ExtractorSWF.ExtractorSWFActions(CrawlURI, CrawlController) - Constructor for class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
 
ExtractorSWF.ExtractorSWFReader - Class in org.archive.crawler.extractor
 
ExtractorSWF.ExtractorSWFReader(SWFTags, InputStream) - Constructor for class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFReader
 
ExtractorSWF.ExtractorSWFReader(SWFTags, InStream) - Constructor for class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFReader
 
ExtractorSWF.ExtractorSWFTags - Class in org.archive.crawler.extractor
SWFTagTypes customized to use ExtractorSWFActions, which parse URI-like strings.
ExtractorSWF.ExtractorSWFTags(SWFActions) - Constructor for class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
ExtractorSWF.ExtractorTagParser - Class in org.archive.crawler.extractor
TagParser customized to ignore SWFTags that will never contain extractable URIs.
ExtractorSWF.ExtractorTagParser(SWFTagTypes) - Constructor for class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
ExtractorTool - Class in org.archive.crawler.extractor
Run named extractors against passed ARC file.
ExtractorTool() - Constructor for class org.archive.crawler.extractor.ExtractorTool
 
ExtractorTool(String[], String) - Constructor for class org.archive.crawler.extractor.ExtractorTool
 
ExtractorUniversal - Class in org.archive.crawler.extractor
A last-ditch extractor that looks at the raw bytes and tries to extract anything that looks like a link.
ExtractorUniversal(String) - Constructor for class org.archive.crawler.extractor.ExtractorUniversal
Constructor
ExtractorURI - Class in org.archive.crawler.extractor
An extractor for finding URIs inside other URIs.
ExtractorURI(String) - Constructor for class org.archive.crawler.extractor.ExtractorURI
Constructor
ExtractorXML - Class in org.archive.crawler.extractor
A simple extractor which finds HTTP URIs inside XML/RSS files, inside attribute values and simple elements (those with only whitespace + HTTP URI + whitespace as contents)
ExtractorXML(String) - Constructor for class org.archive.crawler.extractor.ExtractorXML
 
extractQueryStringLinks(UURI) - Static method in class org.archive.crawler.extractor.ExtractorURI
Look for URIs inside the supplied UURI.
extractURIs() - Method in class org.archive.crawler.extractor.PDFParser
Extract URIs from all objects found in a PDF document's catalog.
extractURIs(PdfObject) - Method in class org.archive.crawler.extractor.PDFParser
Parse a PdfDictionary, looking for URIs recursively and adding them to foundURIs
extraInfo() - Static method in class org.archive.util.DevUtils
 

F

F_ADD - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_DISREGARD - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_EMIT - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_FAILURE - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_RESCHEDULE - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
F_SUCCESS - Static variable in class org.archive.crawler.frontier.RecoveryJournal
 
failedFetchAttempts() - Method in class org.archive.crawler.admin.StatisticsTracker
Get the total number of failed fetch attempts (connection failures -> give up, etc)
failedFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that failed to process.
failedFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
failedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Number of URIs that failed to process.
failedFetchCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
failureDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
The CrawlURI has encountered a problem, and will not be retried.
fastOutputStreamHolder - Variable in class org.archive.crawler.frontier.RecyclingSerialBinding
Thread-local cache of reusable FastOutputStream
FatalConfigurationException - Exception in org.archive.crawler.framework.exceptions
 
FatalConfigurationException(String) - Constructor for exception org.archive.crawler.framework.exceptions.FatalConfigurationException
 
FatalConfigurationException() - Constructor for exception org.archive.crawler.framework.exceptions.FatalConfigurationException
 
FatalConfigurationException(String, String, String) - Constructor for exception org.archive.crawler.framework.exceptions.FatalConfigurationException
 
fetchDisregards - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
FetchDNS - Class in org.archive.crawler.fetcher
Processor to resolve 'dns:' URIs.
FetchDNS(String) - Constructor for class org.archive.crawler.fetcher.FetchDNS
Create a new instance of FetchDNS.
fetchFailures - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
FetchFTP - Class in org.archive.crawler.fetcher
Fetches documents and directory listings using FTP.
FetchFTP(String) - Constructor for class org.archive.crawler.fetcher.FetchFTP
Constructs a new FetchFTP.
FetchHistoryProcessor - Class in org.archive.crawler.processor.recrawl
Maintain a history of fetch information inside the CrawlURI's attributes.
FetchHistoryProcessor(String) - Constructor for class org.archive.crawler.processor.recrawl.FetchHistoryProcessor
Usual constructor
FetchHTTP - Class in org.archive.crawler.fetcher
HTTP fetcher that uses Apache Jakarta Commons HttpClient library.
FetchHTTP(String) - Constructor for class org.archive.crawler.fetcher.FetchHTTP
Constructor.
FetchHTTP.PostRestore - Class in org.archive.crawler.fetcher
 
FetchHTTP.PostRestore(Cookie[]) - Constructor for class org.archive.crawler.fetcher.FetchHTTP.PostRestore
 
fetchNonResponses - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
fetchResponses - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
FetchStatusCodes - Interface in org.archive.crawler.datamodel
Constant flag codes to be used, in lieu of per-protocol codes (like HTTP's 200, 404, etc.), when network/internal/out-of-band conditions occur.
fetchStatusCodesToString(int) - Static method in class org.archive.crawler.datamodel.CrawlURI
Takes a status code and converts it into a human readable string.
FetchStatusDecideRule - Class in org.archive.crawler.deciderules
Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting.
FetchStatusDecideRule(String) - Constructor for class org.archive.crawler.deciderules.FetchStatusDecideRule
Usual constructor.
FetchStatusMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
 
FetchStatusMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.FetchStatusMatchesRegExpDecideRule
Usual constructor.
FetchStatusNotMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
 
FetchStatusNotMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.FetchStatusNotMatchesRegExpDecideRule
Usual constructor.
fetchSuccesses - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
file - Variable in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
fileExists(File) - Method in class org.archive.crawler.selftest.SelfTestCase
Confirm passed file exists on disk under the test directory.
FILENAME_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for filename field.
FILENAME_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header filename field.
filenames - Variable in class org.archive.io.CompositeFileInputStream
 
FilePatternFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by MatchesFilePatternDecideRule.
FilePatternFilter(String) - Constructor for class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
filesExist(List) - Method in class org.archive.crawler.selftest.SelfTestCase
Confirm passed files exist on disk under the test directory.
filesFoundInArc() - Method in class org.archive.crawler.selftest.SelfTestCase
Find all files that belong to this test that are mentioned in the arc.
FileUtils - Class in org.archive.util
Utility methods for manipulating files and directories.
fillSeedsCache() - Method in class org.archive.crawler.scope.SeedCachingScope
Ensure seeds cache is created/filled
Filter - Class in org.archive.crawler.framework
Base class for filter classes.
Filter(String, String) - Constructor for class org.archive.crawler.framework.Filter
Creates a new 'null' filter.
Filter(String) - Constructor for class org.archive.crawler.framework.Filter
Creates a new 'null' filter.
FilterDecideRule - Class in org.archive.crawler.deciderules
FilterDecideRule wraps a legacy Filter for use in DecideRule contexts.
FilterDecideRule(String) - Constructor for class org.archive.crawler.deciderules.FilterDecideRule
Constructor.
FILTERS - Static variable in class org.archive.crawler.admin.ui.JobConfigureUtils
 
filters - Variable in class org.archive.crawler.deciderules.FilterDecideRule
Filter(s) to apply.
filtersAccept(CrawlURI) - Method in class org.archive.crawler.deciderules.FilterDecideRule
Do all specified filters (if any) accept this CrawlURI?
filtersAccept(MapType, CrawlURI) - Method in class org.archive.crawler.deciderules.FilterDecideRule
Do all specified filters (if any) accept this CrawlURI?
finalCleanup() - Method in class org.archive.crawler.admin.StatisticsTracker
 
finalCleanup() - Method in class org.archive.crawler.framework.AbstractTracker
Cleanup resources used, at crawl end.
finalize() - Method in class org.archive.crawler.SimpleHttpServer
 
finalize() - Method in class org.archive.io.GenericReplayCharSequence
 
finalize() - Method in class org.archive.io.Latin1ByteReplayCharSequence
 
finalize() - Method in class org.archive.util.CachedBdbMap
Deprecated.  
finalize() - Method in class org.archive.util.CachedBdbMap.LowMemoryCanary
Deprecated. When collected/finalized -- as should be expected in low-memory conditions -- trigger an expunge and a new 'canary' insertion.
finalize() - Method in class org.archive.util.ObjectIdentityBdbCache
 
finalize() - Method in class org.archive.util.ObjectIdentityBdbCache.LowMemoryCanary
When collected/finalized -- as should be expected in low-memory conditions -- trigger an expunge and a new 'canary' insertion.
finalTasks() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
finalTasks() - Method in interface org.archive.crawler.framework.Frontier
Perform any final tasks *before* notification crawl has reached 'FINISHED' status.
finalTasks() - Method in class org.archive.crawler.framework.Processor
Classes subclassing this one should override this method to perform processor specific actions.
finalTasks() - Method in class org.archive.crawler.framework.Scoper
 
finalTasks() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
finalTasks() - Method in class org.archive.crawler.frontier.BdbFrontier
 
finalTasks() - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
finalTasks() - Method in class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
 
findFirstLineBeginning(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineBeginningFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineContaining(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContaining(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContainingFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
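A minimal sketch of the "first line beginning with" lookup described by the LogReader entries above. It is not the LogReader source: the class name FirstLineSketch, the 1-based numbering, and the -1 "not found" sentinel are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of org.archive.crawler.util.LogReader.
final class FirstLineSketch {

    /**
     * Returns the 1-based number of the first line that starts with prefix,
     * or -1 if no line does (the sentinel is an assumption of this sketch).
     */
    static int firstLineBeginning(Reader in, String prefix) {
        try (BufferedReader reader = new BufferedReader(in)) {
            int lineNumber = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.startsWith(prefix)) {
                    return lineNumber;
                }
            }
            return -1;
        } catch (IOException e) {
            // Rethrow unchecked so callers of the sketch stay simple.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String log = "2008 alpha\n2009 beta\n2009 gamma\n";
        System.out.println(firstLineBeginning(new StringReader(log), "2009"));
    }
}
```

A regular-expression variant would swap the startsWith test for a java.util.regex.Pattern match on each line.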
findNextLink() - Method in class org.archive.extractor.CharSequenceLinkExtractor
Scan to the next link(s), if any, loading it into the next buffer.
findNextLink() - Method in class org.archive.extractor.RegexpCSSLinkExtractor
 
findNextLink() - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
findNextLink() - Method in class org.archive.extractor.RegexpJSLinkExtractor
 
FINISHED - Static variable in class org.archive.crawler.framework.CrawlController
 
finished(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Report a URI being processed as having finished processing.
finished(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
finished(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Note that the previously emitted CrawlURI has completed its processing (for now).
finishedDisregard(CandidateURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedDisregard(CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedFailure(CandidateURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedFailure(CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedSuccess(CandidateURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
finishedSuccess(CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
finishedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
finishedUriCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Number of URIs that have finished processing.
finishedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that have finished processing.
finishedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
(non-Javadoc)
finishedUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
finishFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
finishFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate).
finishFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
finishLast(HttpConnection) - Static method in class org.archive.httpclient.SingleHttpConnectionManager
 
fireCrawledURIDisregardEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController.
fireCrawledURIFailureEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController.
fireCrawledURINeedRetryEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController.
fireCrawledURISuccessfulEvent(CrawlURI) - Method in class org.archive.crawler.framework.CrawlController
Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController.
fireValueErrorHandlers(Constraint.FailedCheck) - Method in class org.archive.crawler.settings.SettingsHandler
Fire events on all registered ValueErrorHandler.
fixSpaceInURL(List<String>, int) - Method in class org.archive.io.arc.ARCReader
Fix space in URLs.
fixup(String) - Method in class org.archive.crawler.admin.StatisticsTracker
 
FixupQueryStr - Class in org.archive.crawler.url.canonicalize
Strip any trailing question mark.
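As an illustration of the rule "strip any trailing question mark", a sketch in plain Java follows. The class and method names are hypothetical, and the real FixupQueryStr rule may handle further cases that this index does not describe.

```java
// Hypothetical helper; not the FixupQueryStr implementation.
final class TrailingQuestionMarkSketch {

    /** Removes a single trailing '?' (an empty query string) if present. */
    static String stripTrailingQuestionMark(String url) {
        if (url != null && url.endsWith("?")) {
            return url.substring(0, url.length() - 1);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingQuestionMark("http://example.com/index.html?"));
    }
}
```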
FixupQueryStr(String) - Constructor for class org.archive.crawler.url.canonicalize.FixupQueryStr
 
flattenVia() - Method in class org.archive.crawler.datamodel.CandidateURI
Method returns string version of this URI's referral URI.
flg - Variable in class org.archive.io.GzipHeader
The GZIP header FLG byte.
FLOAT - Static variable in class org.archive.crawler.settings.SettingsHandler
 
FLOAT_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
FloatList - Class in org.archive.crawler.settings
List of Float values
FloatList(String, String) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList.
FloatList(String, String, FloatList) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList and initializes it with the values from another FloatList.
FloatList(String, String, Float[]) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList and initializes it with the values from an array of Floats.
FloatList(String, String, float[]) - Constructor for class org.archive.crawler.settings.FloatList
Creates a new FloatList and initializes it with the values from a float array.
flush() - Method in class org.archive.crawler.admin.CrawlJob
If it's a HostQueuesFrontier, needs to be flushed for the queued.
flush() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
flush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Perform a merge of all 'pending' items to the overall fingerprint list.
flush() - Method in class org.archive.io.RecordingOutputStream
 
flush() - Method in class org.archive.io.SinkHandler
 
flush() - Method in class org.archive.io.WriterPoolMember
 
FLUSH_DELAY_FACTOR - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
flushProcessingURIs() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Flush any CrawlURIs in the processingUriDB into the primaryUriDB.
focusAccepts(Object) - Method in class org.archive.crawler.scope.BroadScope
Check if URI is accepted by the focus of this scope.
focusAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Check if URI is accepted by the focus of this scope.
focusAccepts(Object) - Method in class org.archive.crawler.scope.DomainScope
Deprecated. Check if a URI is part of this scope.
focusAccepts(Object) - Method in class org.archive.crawler.scope.HostScope
Deprecated.  
focusAccepts(Object) - Method in class org.archive.crawler.scope.PathScope
Deprecated.  
focusAccepts(Object) - Method in class org.archive.crawler.scope.SurtPrefixScope
Deprecated. Check if a URI is part of this scope.
FOLD_PREFIX - Static variable in class org.archive.util.anvl.ANVLRecord
 
forAllHostsDo(Closure) - Method in class org.archive.crawler.datamodel.ServerCache
 
forAllPendingDo(Closure) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Utility method to perform action for all pending CrawlURI instances.
forceFetch() - Method in class org.archive.crawler.datamodel.CandidateURI
If this method returns true, this URI should be fetched even though it already has been crawled.
forceScarceMemory() - Static method in class org.archive.util.TestUtils
Temporarily exhaust memory, forcing weak/soft references to be broken.
forceWakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Wake all queues as if we were at the end of time
forget(String, CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Forget item was seen
forget(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Forget the given CrawlURI.
forget(String, CandidateURI) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
forget(String, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
forget(String, CandidateURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
format(LogRecord) - Method in class org.archive.crawler.io.LocalErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.RuntimeErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.StatisticsLogFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
format(Matcher, String, StringBuffer) - Method in class org.archive.crawler.url.canonicalize.RegexRule
 
format(LogRecord) - Method in class org.archive.util.OneLineSimpleLogger
 
formatBytesForDisplay(long) - Static method in class org.archive.util.ArchiveUtils
Takes a byte size and formats it for display with 'friendly' units.
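One way to format a byte size with "friendly" units is sketched below. The 1024 base, the unit labels, and the one-decimal output are assumptions of this sketch; the index does not state what ArchiveUtils.formatBytesForDisplay actually emits.

```java
import java.util.Locale;

// Hypothetical helper; the output format is assumed, not taken from ArchiveUtils.
final class ByteFormatSketch {

    private static final String[] UNITS = {"KB", "MB", "GB", "TB", "PB", "EB"};

    /** Formats a byte count using binary (1024-based) units. */
    static String formatBytes(long bytes) {
        if (bytes < 1024) {
            return bytes + " B";
        }
        double value = bytes;
        int unit = -1;
        // Divide down until the value fits the current unit.
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    public static void main(String[] args) {
        System.out.println(formatBytes(1536));
    }
}
```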
formatCheckpointName(String, int) - Static method in class org.archive.crawler.framework.Checkpointer
 
formatMillisecondsToConventional(long) - Static method in class org.archive.util.ArchiveUtils
Convert milliseconds value to a human-readable duration
formatMillisecondsToConventional(long, boolean) - Static method in class org.archive.util.ArchiveUtils
Convert milliseconds value to a human-readable duration
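A sketch of converting milliseconds to a human-readable duration follows. The compact "1d2h3m4s5ms" style and the class name are assumptions; the index does not specify the exact format ArchiveUtils produces.

```java
// Hypothetical helper; the output style is assumed for illustration.
final class DurationFormatSketch {

    private static final long[] SIZES = {86_400_000L, 3_600_000L, 60_000L, 1_000L, 1L};
    private static final String[] UNITS = {"d", "h", "m", "s", "ms"};

    /** Renders a non-negative millisecond count, omitting zero-valued units. */
    static String formatDuration(long millis) {
        if (millis <= 0) {
            return "0ms"; // negative input is treated as zero in this sketch
        }
        StringBuilder sb = new StringBuilder();
        long remaining = millis;
        for (int i = 0; i < SIZES.length; i++) {
            long count = remaining / SIZES[i];
            if (count > 0) {
                sb.append(count).append(UNITS[i]);
                remaining %= SIZES[i];
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatDuration(93_784_005L));
    }
}
```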
foundURIs - Variable in class org.archive.crawler.extractor.PDFParser
 
fp - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
FPMergeUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on merging FP arrays (in memory or from disk).
FPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter
 
FPMergeUriUniqFilter.PendingItem - Class in org.archive.crawler.util
Represents a long fingerprint and (possibly) its corresponding CandidateURI, awaiting the next merge in a 'pending' state.
FPMergeUriUniqFilter.PendingItem(long, CandidateURI) - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
FPUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter storing 64-bit UURI fingerprints, using an internal LongFPSet instance.
FPUriUniqFilter(LongFPSet) - Constructor for class org.archive.crawler.util.FPUriUniqFilter
Create FPUriUniqFilter wrapping given long set
FRAME - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
FramesSelfTestCase - Class in org.archive.crawler.selftest
Test crawler can parse pages w/ frames in them.
FramesSelfTestCase() - Constructor for class org.archive.crawler.selftest.FramesSelfTestCase
 
freeMatcher(Matcher) - Method in class org.archive.util.PatternMatcherRecycler
Return the given Matcher to the reuse stack, if stack is not already at its maximum size.
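The bounded reuse stack that freeMatcher describes can be sketched as below. This is not the PatternMatcherRecycler source: the class name, the getMatcher counterpart, and the maximum of 10 pooled Matchers are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pool; the size limit and method names are assumed.
final class MatcherPoolSketch {

    private static final int MAX_POOLED = 10; // assumed limit

    private final Pattern pattern;
    private final Deque<Matcher> pool = new ArrayDeque<>();

    MatcherPoolSketch(Pattern pattern) {
        this.pattern = pattern;
    }

    /** Reuses a pooled Matcher if one is available, else creates a new one. */
    synchronized Matcher getMatcher(CharSequence input) {
        Matcher m = pool.pollFirst();
        return (m == null) ? pattern.matcher(input) : m.reset(input);
    }

    /** Returns the Matcher to the stack unless the stack is already full. */
    synchronized void freeMatcher(Matcher m) {
        if (pool.size() < MAX_POOLED) {
            m.reset(""); // drop the reference to the previous input
            pool.addFirst(m);
        }
    }

    synchronized int pooledCount() {
        return pool.size();
    }

    public static void main(String[] args) {
        MatcherPoolSketch recycler = new MatcherPoolSketch(Pattern.compile("\\d+"));
        Matcher m = recycler.getMatcher("abc 123");
        System.out.println(m.find());
        recycler.freeMatcher(m);
    }
}
```

Resetting the Matcher before pooling keeps the pool from pinning large input sequences in memory.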
freeReserveMemory() - Method in class org.archive.crawler.framework.CrawlController
 
from(CandidateURI, long) - Static method in class org.archive.crawler.datamodel.CrawlURI
Make a CrawlURI from the passed CandidateURI.
from(Object) - Static method in class org.archive.net.UURI
Convenience method for finding the UURI inside an Object likely to have (or be/imply) one.
fromString(String) - Static method in class org.archive.crawler.datamodel.CandidateURI
Given a string containing a URI, then optional whitespace delimited hops-path and via info, create a CandidateURI instance.
fromURI(String) - Static method in class org.archive.util.SURT
Utility method for creating the SURT form of the URI in the given String.
fromURI(String, boolean) - Static method in class org.archive.util.SURT
Utility method for creating the SURT form of the URI in the given String.
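For orientation, a simplified sketch of the SURT transformation follows: the host labels are reversed, comma-terminated, and wrapped in parentheses, so that http://www.archive.org/index.html becomes http://(org,archive,www,)/index.html. The sketch assumes a plain scheme://host/path URI with no port, userinfo, or IP-address host, and it is not the SURT.fromURI implementation.

```java
import java.util.Locale;

// Hypothetical, simplified converter; handles only scheme://host[/path].
final class SurtSketch {

    static String toSurt(String uri) {
        int schemeEnd = uri.indexOf("://");
        if (schemeEnd < 0) {
            return uri; // not in the form this sketch understands
        }
        String scheme = uri.substring(0, schemeEnd + 3);
        String rest = uri.substring(schemeEnd + 3);
        int pathStart = rest.indexOf('/');
        String host = (pathStart < 0) ? rest : rest.substring(0, pathStart);
        String path = (pathStart < 0) ? "" : rest.substring(pathStart);

        // Reverse the dot-separated host labels, ending each with a comma.
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder(scheme).append('(');
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.append(')').append(path).toString();
    }

    public static void main(String[] args) {
        System.out.println(toSurt("http://www.archive.org/index.html"));
    }
}
```

Because the most significant host label comes first, sorting SURT strings groups URIs by domain, which is what makes prefix matching on them useful.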
Frontier - Interface in org.archive.crawler.framework
An interface for URI Frontiers.
Frontier.FrontierGroup - Interface in org.archive.crawler.framework
Generic interface representing the internal groupings of a Frontier's URIs -- usually queues.
FRONTIER_REPORT_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
FRONTIER_SHORT_REPORT_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
FrontierHostStatistics - Interface in org.archive.crawler.framework
An optional interface the Frontiers can implement to provide information about specific hosts.
FrontierJournal - Interface in org.archive.crawler.frontier
Record of key Frontier happenings.
FrontierMarker - Interface in org.archive.crawler.framework
A marker is a pointer to a place somewhere inside a frontier's list of pending URIs.
FrontierScheduler - Class in org.archive.crawler.postprocessor
'Schedule' with the Frontier CandidateURIs being carried by the passed CrawlURI.
FrontierScheduler(String) - Constructor for class org.archive.crawler.postprocessor.FrontierScheduler
 
FTP_CONTROL_CONVERSATION_MIMETYPE - Static variable in interface org.archive.io.warc.WARCConstants
 

G

generateRecordId(Map<String, String>) - Method in class org.archive.io.warc.WARCWriter
 
generateRecordId(String, String) - Method in class org.archive.io.warc.WARCWriter
 
GenerationFileHandler - Class in org.archive.io
FileHandler with support for rotating the current file to an archival name with a specified integer suffix, and provision of a new replacement FileHandler with the current filename.
GenerationFileHandler(String, boolean, boolean) - Constructor for class org.archive.io.GenerationFileHandler
Constructor.
GenerationFileHandler(LinkedList<String>, boolean) - Constructor for class org.archive.io.GenerationFileHandler
 
Generator - Interface in org.archive.uid
A record-id generator.
GeneratorFactory - Class in org.archive.uid
Factory that generates uids.
GenericReplayCharSequence - Class in org.archive.io
Provides a (Replay)CharSequence view on recorded streams (a prefix buffer and overflow backing file) that can handle streams of multibyte characters.
GenericReplayCharSequence(byte[], long, long, String) - Constructor for class org.archive.io.GenericReplayCharSequence
Constructor for all in-memory operation.
GenericReplayCharSequence(ReplayInputStream, String, String) - Constructor for class org.archive.io.GenericReplayCharSequence
Constructor for overflow-to-disk-file operation.
get(Object) - Method in class org.archive.crawler.datamodel.CredentialStore
 
get(Object, String) - Method in class org.archive.crawler.datamodel.CredentialStore
 
get(String) - Method in interface org.archive.crawler.framework.AlertManager
 
get(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Get the next nearest item after the given key.
get(Object) - Method in class org.archive.crawler.settings.DataContainer
 
get(String) - Method in class org.archive.crawler.settings.DataContainer
 
get(int) - Method in class org.archive.crawler.settings.ListType
Returns the object stored at the index specified
get(String) - Method in class org.archive.crawler.settings.SoftSettingsHash
Returns the value to which the specified key is mapped in this weak hash map, or null if the map contains no mapping for this key.
get(String) - Static method in class org.archive.crawler.util.LogReader
Returns the entire file.
get(InputStreamReader) - Static method in class org.archive.crawler.util.LogReader
Reads entire contents of reader, returns as string.
get(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(InputStreamReader, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(long) - Method in class org.archive.io.arc.ARCReaderFactory.CompressedARCReader
Get record at passed offset.
get(String) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(String, long) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(File) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(File, long) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(File, boolean, long) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(String, InputStream, boolean) - Static method in class org.archive.io.arc.ARCReaderFactory
 
get(URL, long) - Static method in class org.archive.io.arc.ARCReaderFactory
Get an ARCReader aligned at offset.
get(URL) - Static method in class org.archive.io.arc.ARCReaderFactory
Get an ARCReader.
get(long) - Method in class org.archive.io.ArchiveReader
Get record at passed offset.
get() - Method in class org.archive.io.ArchiveReader
 
get(String) - Static method in class org.archive.io.ArchiveReaderFactory
Get an Archive file Reader on passed path or url.
get(File) - Static method in class org.archive.io.ArchiveReaderFactory
 
get(File, long) - Static method in class org.archive.io.ArchiveReaderFactory
 
get(String, InputStream, boolean) - Static method in class org.archive.io.ArchiveReaderFactory
Wrap a Reader around passed Stream.
get(URL, long) - Static method in class org.archive.io.ArchiveReaderFactory
Get an Archive Reader aligned at offset.
get(URL) - Static method in class org.archive.io.ArchiveReaderFactory
Get an ARCReader.
get(long) - Method in class org.archive.io.SinkHandler
 
get(long) - Method in class org.archive.io.warc.WARCReaderFactory.CompressedWARCReader
Get record at passed offset.
get(String) - Static method in class org.archive.io.warc.WARCReaderFactory
 
get(File) - Static method in class org.archive.io.warc.WARCReaderFactory
 
get(File, long) - Static method in class org.archive.io.warc.WARCReaderFactory
 
get(String, InputStream, boolean) - Static method in class org.archive.io.warc.WARCReaderFactory
 
get(URL, long) - Static method in class org.archive.io.warc.WARCReaderFactory
 
get(URL) - Static method in class org.archive.io.warc.WARCReaderFactory
Get a WARCReader.
get(Object) - Method in class org.archive.util.CachedBdbMap
Deprecated.  
get(String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
get(K) - Method in interface org.archive.util.ObjectIdentityCache
get the object under the given key/name
get(String) - Method in class org.archive.util.ObjectIdentityMemCache
 
get(int) - Method in class org.archive.util.SubList
 
get() - Method in class org.archive.util.Supplier
 
get12DigitDate() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmm.
get12DigitDate(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmm.
get12DigitDate(Date) - Static method in class org.archive.util.ArchiveUtils
 
get14DigitDate() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss.
get14DigitDate(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss.
get14DigitDate(Date) - Static method in class org.archive.util.ArchiveUtils
 
get17DigitDate() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS.
get17DigitDate(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS.
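The 12-, 14-, and 17-digit stamps differ only in precision (minutes, seconds, milliseconds). The sketch below reproduces the three patterns with java.time rather than calling ArchiveUtils. Rendering in UTC is an assumption of this sketch, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; uses the JDK only and assumes UTC.
final class DateStampSketch {

    private static final DateTimeFormatter D12 =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter D14 =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter D17 =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    static String stamp12(long epochMillis) {
        return D12.format(Instant.ofEpochMilli(epochMillis));
    }

    static String stamp14(long epochMillis) {
        return D14.format(Instant.ofEpochMilli(epochMillis));
    }

    static String stamp17(long epochMillis) {
        return D17.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(stamp12(now));
        System.out.println(stamp14(now));
        System.out.println(stamp17(now));
    }
}
```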
get17DigitDate(Date) - Static method in class org.archive.util.ArchiveUtils
 
getAbsoluteName() - Method in class org.archive.crawler.settings.ComplexType
Get the absolute name of this ComplexType.
getAcceptedIssuers() - Method in class org.archive.httpclient.ConfigurableX509TrustManager
 
getActiveToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getActiveToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getAlert(String) - Method in class org.archive.crawler.Heritrix
 
getAlerts() - Method in class org.archive.crawler.Heritrix
 
getAlertsCount() - Method in class org.archive.crawler.Heritrix
 
getAList() - Method in class org.archive.crawler.datamodel.CandidateURI
Assumption is that only one thread at a time will ever be accessing a particular CandidateURI.
getAll() - Method in interface org.archive.crawler.framework.AlertManager
 
getAll() - Method in class org.archive.io.SinkHandler
 
getAllLocalHostNames() - Static method in class org.archive.util.InetAddressUtil
 
getAllUnread() - Method in class org.archive.io.SinkHandler
 
getAndCheckJob(CrawlJob, HttpServletRequest, HttpServletResponse) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Check passed crawljob CrawlJob setting.
getAnnotations() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the annotations set for this uri.
getArc() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getArcFile() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getArcFiles() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getArchiveReader(File, long) - Method in class org.archive.io.arc.ARCReaderFactory
 
getArchiveReader(File, boolean, long) - Method in class org.archive.io.arc.ARCReaderFactory
 
getArchiveReader(String, InputStream, boolean) - Method in class org.archive.io.arc.ARCReaderFactory
 
getArchiveReader(String) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(String, long) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(File) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(File, long) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(String, InputStream, boolean) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(URL, long) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(URL) - Method in class org.archive.io.ArchiveReaderFactory
 
getArchiveReader(File, long) - Method in class org.archive.io.warc.WARCReaderFactory
 
getArchiveReader(String, InputStream, boolean) - Method in class org.archive.io.warc.WARCReaderFactory
 
getAt(long) - Method in class org.archive.util.AbstractLongFPSet
Get the stored value at the given slot.
getAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getAttribute(String) - Method in class org.archive.crawler.admin.CrawlJob
 
getAttribute(String) - Method in class org.archive.crawler.Heritrix
 
getAttribute(String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute from the crawl order.
getAttribute(String, CrawlURI) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlURI.
getAttribute(Object, String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlerSettings object.
getAttribute(String) - Method in class org.archive.util.JEApplicationMBean
 
getAttribute(Environment, String) - Method in class org.archive.util.JEMBeanHelper
Get an attribute value for the given environment.
getAttributeEither(CrawlURI, String) - Method in class org.archive.crawler.fetcher.FetchHTTP
Get a value either from inside the CrawlURI instance, or from settings (module attributes).
getAttributeInfo(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Get the effective Attribute info for an element of this type from a settings object.
getAttributeInfo(String) - Method in class org.archive.crawler.settings.ComplexType
Get the Attribute info for an element of this type from the global settings.
getAttributeInfo(String) - Method in class org.archive.crawler.settings.DataContainer
 
getAttributeInfoIterator(Object) - Method in class org.archive.crawler.settings.ComplexType
Get an Iterator over all the MBeanAttributeInfo in this ComplexType.
getAttributeList(Environment) - Method in class org.archive.util.JEMBeanHelper
Get MBean attribute metadata for this environment.
getAttributes(String[]) - Method in class org.archive.crawler.admin.CrawlJob
 
getAttributes(String[]) - Method in class org.archive.crawler.Heritrix
 
getAttributes(String[]) - Method in class org.archive.crawler.settings.ComplexType
 
getAttributes(String[]) - Method in class org.archive.util.JEApplicationMBean
 
getAttributeUnchecked(String) - Method in class org.archive.crawler.framework.WriterPoolProcessor
Version of getAttribute that catches and logs exceptions, returning null if the attribute cannot be fetched.
getAudience() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the audience/customer/recipient of the crawl job product from this CrawlerSettings object.
getAudience() - Method in class org.archive.crawler.settings.refinements.Refinement
 
getAuthorityMinusUserinfo() - Method in class org.archive.net.UURI
Return the authority minus userinfo (if any).
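As a sketch of what "authority minus userinfo" means, the helper below drops everything up to and including the '@' separator from an authority string such as user:secret@example.com:8080. It is a standalone illustration with hypothetical names, not the UURI method.

```java
// Hypothetical helper; operates on a bare authority string.
final class AuthoritySketch {

    /** Returns host[:port], dropping any leading "userinfo@" part. */
    static String authorityMinusUserinfo(String authority) {
        if (authority == null) {
            return null;
        }
        int at = authority.lastIndexOf('@');
        return (at < 0) ? authority : authority.substring(at + 1);
    }

    public static void main(String[] args) {
        System.out.println(authorityMinusUserinfo("user:secret@example.com:8080"));
    }
}
```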
getAuthScheme(HttpMethod, CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
getAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesInputStream
Return the top auxiliary directory, from which saved files are restored.
getAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesOutputStream
Return the current auxiliary directory for storing files associated with serialized objects.
getAverageDepth() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Returns the average depth of all the HQs in this list
getBandwidthKbytesPerSec() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getBaseFilename() - Method in class org.archive.io.WriterPoolMember
Get the file name
getBaseURI() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the (HTML) Base URI used for derelativizing internal URIs.
getBATBlockNumber(int) - Method in class org.archive.util.ms.HeaderBlock
 
getBATCount() - Method in class org.archive.util.ms.HeaderBlock
 
getBdbEnvironment() - Method in class org.archive.crawler.framework.CrawlController
 
getBdbLogFileName(long) - Method in class org.archive.crawler.framework.CrawlController
 
getBdbSubDirectory(File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getBigMap(String, Class<? super V>) - Method in class org.archive.crawler.framework.CrawlController
Call this method to get instance of the crawler BigMap implementation.
getBit(long) - Method in interface org.archive.util.BloomFilter
 
getBit(long) - Method in class org.archive.util.BloomFilter64bit
Returns from the local bitvector the value of the bit with the specified index.
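Reading a bit from a long-backed bit vector usually comes down to one shift and one mask. The sketch below shows that technique under the assumption of a single long[] array; the actual BloomFilter64bit storage layout is not described in this index.

```java
// Hypothetical bit vector; the single-array layout is an assumption.
final class BitVectorSketch {

    private final long[] bits;

    BitVectorSketch(long sizeInBits) {
        // One long holds 64 bits; round up.
        bits = new long[(int) ((sizeInBits + 63) >>> 6)];
    }

    void setBit(long index) {
        bits[(int) (index >>> 6)] |= 1L << (index & 63);
    }

    /** Returns the value of the bit at the given index. */
    boolean getBit(long index) {
        return (bits[(int) (index >>> 6)] & (1L << (index & 63))) != 0;
    }

    public static void main(String[] args) {
        BitVectorSketch v = new BitVectorSketch(128);
        v.setBit(70);
        System.out.println(v.getBit(70));
    }
}
```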
getBodyOffset() - Method in class org.archive.io.arc.ARCRecord
 
getBooleanProperty(String) - Static method in class org.archive.util.PropertyUtils
 
getBufferedInput(File) - Static method in class org.archive.crawler.io.CrawlerJournal
Get a BufferedInputStream on the recovery file given.
getBufferedReader(File) - Static method in class org.archive.crawler.io.CrawlerJournal
Get a BufferedReader on the crawler journal given
getBufferedReader(URL) - Static method in class org.archive.crawler.io.CrawlerJournal
Get a BufferedReader on the crawler journal given.
getByRealm(Set, String, CrawlURI) - Static method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
Convenience method that does look up on passed set using realm for key.
getByRegExpr(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExpr(InputStreamReader, String, int, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExpr(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExpr(InputStreamReader, String, String, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExprFromSeries(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegExprFromSeries(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getBytesPerFileType(String) - Method in class org.archive.crawler.admin.StatisticsTracker
Returns the accumulated number of bytes from files of a given file type.
getBytesPerHost(String) - Method in class org.archive.crawler.admin.StatisticsSummary
Returns the accumulated number of bytes downloaded from a given host.
getBytesPerHost(String) - Method in class org.archive.crawler.admin.StatisticsTracker
Returns the accumulated number of bytes downloaded from a given host.
getBytesPerMimeType(String) - Method in class org.archive.crawler.admin.StatisticsSummary
Returns the accumulated number of bytes from files of a given MIME type.
getBytesPerTld(String) - Method in class org.archive.crawler.admin.StatisticsSummary
Returns the total number of bytes downloaded for a given TLD.
getCacheMisses() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getCandidateSurt(Object) - Static method in class org.archive.util.SurtPrefixSet
Calculate the SURT form URI to use as a candidate against prefixes from the given Object (CandidateURI or UURI)
getCandidateURIString() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getCause() - Method in exception org.archive.io.RecoverableIOException
 
getCBM(String, Class<? super V>) - Method in class org.archive.crawler.framework.CrawlController
Deprecated.  
getCharacterEncoding() - Method in class org.archive.util.HttpRecorder
 
getCharPosLimit() - Method in class org.archive.util.ms.Piece
 
getCharPosStart() - Method in class org.archive.util.ms.Piece
 
getCharSequence() - Method in interface org.archive.extractor.CharSequenceProvider
 
getCheckpointCopyBdbjeLogs() - Method in class org.archive.crawler.framework.CrawlController
 
getCheckpointInProgressDirectory() - Method in class org.archive.crawler.framework.Checkpointer
 
getCheckpointRecover() - Method in class org.archive.crawler.framework.CrawlController
Get recover checkpoint.
getCheckpointRecover(CrawlOrder) - Static method in class org.archive.crawler.framework.CrawlController
 
getCheckpointsDirectory() - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getCheckpointsDisk() - Method in class org.archive.crawler.framework.CrawlController
 
getCheckpointStateFile() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getChild() - Method in class org.archive.util.ms.DefaultEntry
 
getChild() - Method in interface org.archive.util.ms.Entry
 
getClassCatalog() - Method in class org.archive.crawler.framework.CrawlController
Deprecated. use EnhancedEnvironment's getClassCatalog() instead
getClassCatalog() - Method in class org.archive.util.bdbje.EnhancedEnvironment
Return a StoredClassCatalog backed by a Database in this environment, either pre-existing or created (and cached) if necessary.
getClassCheckpointFile(File, String, Class) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFile(File, Class) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class, String) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassKey() - Method in class org.archive.crawler.datamodel.CandidateURI
Get the token (usually the hostname + port) which indicates what "class" this CrawlURI should be grouped with, for the purposes of ensuring only one item of the class is processed at once, all items of the class are held for a politeness period, etc.
getClassKey(CandidateURI) - Method in interface org.archive.crawler.framework.Frontier
 
getClassKey(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getClassKey(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Get the String key (name) of the queue to which the CrawlURI should be assigned.
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getClassKey(CrawlController, CandidateURI) - Method in class org.archive.crawler.frontier.TopmostAssignedSurtQueueAssignmentPolicy
 
getClassKey() - Method in class org.archive.crawler.frontier.WorkQueue
 
getClassName(String) - Static method in class org.archive.crawler.settings.SettingsHandler
 
getClasspathPath(File) - Static method in class org.archive.util.IoUtils
 
getClassSimpleName(Class) - Method in class org.archive.crawler.datamodel.CrawlURI
 
getCommand(URL, File) - Method in class org.archive.net.DownloadURLConnection
 
getCommand(URL, File) - Method in class org.archive.net.rsync.RsyncURLConnection
 
getCommandLine() - Method in class org.archive.crawler.CommandLineParser
 
getCommandLineArguments() - Method in class org.archive.crawler.CommandLineParser
 
getCommandLineOptions() - Method in class org.archive.crawler.CommandLineParser
 
getCompletedJobs() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
getComplexType() - Method in class org.archive.crawler.settings.DataContainer
Get the ComplexType for which this DataContainer keeps data.
getComplexTypeByAbsoluteName(CrawlerSettings, String) - Method in class org.archive.crawler.settings.SettingsHandler
Get a complex type by its absolute name.
getCompoundName(String) - Static method in class org.archive.util.JndiUtils
 
getCompoundName(ObjectName) - Static method in class org.archive.util.JndiUtils
Return name to use as jndi name.
getConfdir() - Static method in class org.archive.crawler.Heritrix
Get the configuration directory.
getConfdir(boolean) - Static method in class org.archive.crawler.Heritrix
Get the configuration directory.
getConfiguredImplementation(Object) - Method in class org.archive.crawler.deciderules.ExternalGeoLocationDecideRule
Get implementation, if one specified.
getConfiguredImplementation(Object) - Method in class org.archive.crawler.deciderules.ExternalImplDecideRule
Get implementation, if one specified.
getCongestionRatio() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Returns the congestion ratio.
getConnection() - Method in class org.archive.httpclient.HttpRecorderMethod
 
getConnection(HostConfiguration) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
getConnection(HostConfiguration, long) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Deprecated. Use #getConnectionWithTimeout(HostConfiguration, long)
getConnectionWithTimeout(HostConfiguration, long) - Method in class org.archive.httpclient.SingleHttpConnectionManager
 
getConnectionWithTimeout(HostConfiguration, long) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
getConstraints() - Method in class org.archive.crawler.settings.MapType
 
getConstraints() - Method in class org.archive.crawler.settings.Type
Returns a list of constraints for the value of this type.
getContentBegin() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getContentBegin() - Method in interface org.archive.io.ArchiveRecordHeader
Offset at which the content begins.
getContentBegin() - Method in class org.archive.io.RecordingInputStream
 
getContentBegin() - Method in class org.archive.io.RecordingOutputStream
Return stored content-begin-mark (which is also end-of-headers).
getContentDigest() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the retained content-digest value, if any.
getContentDigestSchemeString() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getContentDigestString() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getContentHandler() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getContentLength() - Method in class org.archive.crawler.datamodel.CrawlURI
For completed HTTP transactions, the length of the content-body.
getContentLengthTreshold(Object) - Method in class org.archive.crawler.deciderules.ExceedsDocumentLengthTresholdDecideRule
 
getContentLengthTreshold(Object) - Method in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
getContentReplayInputStream() - Method in class org.archive.io.RecordingInputStream
 
getContentReplayInputStream() - Method in class org.archive.io.RecordingOutputStream
Return a replay stream, cued up to the beginning of content.
getContentReplayPrefixString(int) - Method in class org.archive.util.HttpRecorder
Return a short prefix of the presumed-textual content as a String.
getContentSize() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the size in bytes of this URI's recorded content, inclusive of things like protocol headers.
getContentSize() - Method in class org.archive.io.ReplayInputStream
Total size of content.
getContentType() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the content type of this URI.
getContentType() - Method in class org.archive.crawler.settings.MapType
Get the content type allowed for this map.
getContext() - Method in class org.archive.crawler.extractor.Link
 
getContextUURI(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
 
getControlConversation() - Method in class org.archive.net.ClientFTP
 
getController() - Method in class org.archive.crawler.admin.CrawlJob
 
getController() - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getController() - Method in class org.archive.crawler.deciderules.DecideRule
Get the controller object.
getController() - Method in class org.archive.crawler.framework.Checkpointer.CheckpointingThread
 
getController() - Method in class org.archive.crawler.framework.Checkpointer
 
getController() - Method in class org.archive.crawler.framework.Processor
Get the controller object.
getController() - Method in class org.archive.crawler.framework.ToePool
 
getController() - Method in class org.archive.crawler.framework.ToeThread
Get the CrawlController associated with this thread.
getCookieValue(Cookie[], String, String) - Static method in class org.archive.crawler.admin.ui.CookieUtils
 
getCount() - Method in interface org.archive.crawler.framework.AlertManager
 
getCount() - Method in class org.archive.crawler.frontier.WorkQueue
 
getCount() - Method in class org.archive.io.SinkHandler
 
getCountryCode() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the country code of this host.
getCrawlDelay() - Method in class org.archive.crawler.datamodel.RobotsDirectives
 
getCrawlDelay(String) - Method in class org.archive.crawler.datamodel.RobotsExclusionPolicy
Get the crawl-delay that applies to the given user-agent, or -1 (indicating no crawl-delay known) if there is no internal RobotsTxt instance.
getCrawlendReport(String, String) - Method in class org.archive.crawler.Heritrix
Return named crawl end report for job with passed uid.
getCrawlEndTime() - Method in class org.archive.crawler.framework.AbstractTracker
If the crawl has ended, returns the time it ended (as given by System.currentTimeMillis() at that time).
getCrawlerTotalElapsedTime() - Method in class org.archive.crawler.framework.AbstractTracker
 
getCrawlerTotalElapsedTime() - Method in interface org.archive.crawler.framework.StatisticsTracking
Total amount of time spent actively crawling so far.
getCrawlJob() - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
getCrawlJob() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getCrawlJobDir() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getCrawlOrderAttribute(String) - Method in class org.archive.crawler.admin.CrawlJob
 
getCrawlOrderAttribute(String, ComplexType) - Method in class org.archive.crawler.admin.CrawlJob
 
getCrawlOrderName() - Method in class org.archive.crawler.datamodel.CrawlOrder
Get the name of the order file.
getCrawlPauseStartedTime() - Method in class org.archive.crawler.framework.AbstractTracker
Get the time when the crawl was last paused/suspended (as given by System.currentTimeMillis() at that time).
getCrawlStartTime() - Method in class org.archive.crawler.framework.AbstractTracker
Get the starting time of the crawl (as given by System.currentTimeMillis() when the crawl started).
getCrawlStatus() - Method in class org.archive.crawler.admin.CrawlJob
 
getCrawlTotalPauseTime() - Method in class org.archive.crawler.framework.AbstractTracker
Returns the number of milliseconds that the crawl spent paused or otherwise in a nonactive state.
getCrawlURI(ARCRecord, HttpRecorder) - Method in class org.archive.crawler.extractor.ExtractorTool
 
getCrawlURI(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the CrawlURI associated with the specified URI (string) or null if no such CrawlURI is queued in this HQ.
getCrawlURIString() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getCreateTimestamp() - Method in class org.archive.io.WriterPoolMember
 
getCreationTime() - Method in class org.archive.io.SinkHandlerLogRecord
 
getCredential(SettingsHandler, CrawlURI) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getCredentialDomain(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
getCredentialStore(SettingsHandler) - Static method in class org.archive.crawler.datamodel.CredentialStore
Get a credential store reference.
getCredentialTypes() - Static method in class org.archive.crawler.datamodel.CredentialStore
 
getCurrentJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
getCurrentProcessorName() - Method in class org.archive.crawler.framework.ToeThread
 
getCurrentRecord() - Method in class org.archive.io.ArchiveReader
 
getCustomRobots(CrawlerSettings) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
Get the supplied custom robots.txt.
getData(ComplexType) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getData(String) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getDatabaseConfig() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getDatabaseName() - Method in class org.archive.util.CachedBdbMap
Deprecated.  
getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbCache
 
getDataContainerRecursive(ComplexType.Context) - Method in class org.archive.crawler.settings.ComplexType
Get the active data container for this ComplexType for a specific settings object.
getDataContainerRecursive(ComplexType.Context, String) - Method in class org.archive.crawler.settings.ComplexType
Get the active data container for this ComplexType for a specific settings object.
getDate() - Method in class org.archive.io.arc.ARCRecordMetaData
Get the time when the record was harvested.
getDate() - Method in interface org.archive.io.ArchiveRecordHeader
Get the time when the record was created.
getDate(String) - Static method in class org.archive.util.ArchiveUtils
Parses an ARC-style date.
getDecideRule(Object) - Method in class org.archive.crawler.deciderules.DecidingFilter
 
getDecideRule(Object) - Method in class org.archive.crawler.deciderules.DecidingScope
 
getDecideRule(Object) - Method in class org.archive.crawler.framework.Processor
 
getDeepestQueueSize() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Returns the size of the largest (deepest) queue.
getDefaultMaxFileSize() - Method in class org.archive.crawler.framework.WriterPoolProcessor
Default maximum file size.
getDefaultMaxFileSize() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getDefaultMaxFileSize() - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
getDefaultMessage() - Method in class org.archive.crawler.settings.Constraint
Get the default message to return if a check fails.
getDefaultNextProcessor(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Returns the next processor for the given CrawlURI in the processor chain.
getDefaultPath() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getDefaultPath() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getDefaultPath() - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
getDefaultProfile() - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns the default profile.
getDefaultValue() - Method in class org.archive.crawler.settings.ComplexType
 
getDefaultValue() - Method in class org.archive.crawler.settings.ListType
 
getDefaultValue() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
getDefaultValue() - Method in class org.archive.crawler.settings.SimpleType
 
getDefaultValue() - Method in class org.archive.crawler.settings.Type
The default value for this type.
getDeferrals() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the deferral count.
getDefinition(String) - Method in class org.archive.crawler.settings.ComplexType
Get the content type definition for an attribute.
getDefinition() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the definition for the checked attribute.
getDefinition(String) - Method in class org.archive.crawler.settings.MapType
Get the content type definition for attributes of this map.
getDeleteFileOnCloseReader(File) - Method in class org.archive.io.arc.ARCReader
 
getDeleteFileOnCloseReader(File) - Method in class org.archive.io.ArchiveReader
 
getDeleteFileOnCloseReader(File) - Method in class org.archive.io.warc.WARCReader
 
getDescription() - Method in class org.archive.crawler.settings.ComplexType
Get the description of this type. The description should be suitable for showing in a user interface.
getDescription() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the description of this CrawlerSettings object.
getDescription() - Method in class org.archive.crawler.settings.ListType
 
getDescription() - Method in interface org.archive.crawler.settings.refinements.Criteria
Returns a description of the Criteria's current settings.
getDescription() - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
 
getDescription() - Method in class org.archive.crawler.settings.refinements.Refinement
Return the description of this refinement.
getDescription() - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
 
getDescription() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
getDescription() - Method in class org.archive.crawler.settings.SimpleType
 
getDescription() - Method in class org.archive.crawler.settings.Type
Get the description of this type. The description should be suitable for showing in a user interface.
getDestination() - Method in class org.archive.crawler.extractor.Link
 
getDigest() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getDigest() - Method in interface org.archive.io.ArchiveRecordHeader
 
getDigest4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.arc.ARCRecord
 
getDigest4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.ArchiveRecord
 
getDigestStr() - Method in class org.archive.io.ArchiveRecord
 
getDigestValue() - Method in class org.archive.io.RecordingInputStream
Return the digest value for any recorded, digested data.
getDigestValue() - Method in class org.archive.io.RecordingOutputStream
Return the digest value for any recorded, digested data.
getDirectivesFor(String) - Method in class org.archive.crawler.datamodel.Robotstxt
 
getDirectory() - Method in class org.archive.crawler.admin.CrawlJob
Returns the path of the job's base directory.
getDirectory() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getDisk() - Method in class org.archive.crawler.framework.CrawlController
Get the 'working' directory of the current crawl.
getDisplayName() - Method in class org.archive.crawler.admin.CrawlJob
Return the combination of given name and UID most commonly used in administrative interface.
getDisplayName() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getDisposition() - Method in class org.archive.crawler.admin.SeedRecord
 
getDiversionLog(String) - Method in class org.archive.crawler.processor.CrawlMapper
Get the diversion log for a given target crawler node.
getDnsMimeDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getDNSRecord(long, Record[]) - Method in class org.archive.crawler.fetcher.FetchDNS
 
getDnsStatusCodeDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
Return a HashMap representing the distribution of DNS status codes for successfully fetched curis, where each key -> val entry maps (string)code -> (integer)count.
getDocument(File) - Static method in class org.archive.util.XmlUtils
Parse a DOM Document from the given XML file.
getDomainOverrides(String) - Method in class org.archive.crawler.settings.SettingsHandler
Returns a Collection of strings naming the domains that contain per-domain overrides (or whose subdomains contain them).
getDomainOverrides(String) - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
getDotFileExtension() - Method in class org.archive.io.arc.ARCReader
 
getDotFileExtension() - Method in class org.archive.io.ArchiveReader
 
getDotFileExtension() - Method in class org.archive.io.warc.WARCReader
 
getDTDHandler() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getDupByHashBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getDupByHashUrls() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getDurationTime() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getEarliestNextURIEmitTime() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the earliest time a URI for this host could be emitted.
getElement() - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
getElementFromDefinition(String) - Method in class org.archive.crawler.settings.ComplexType
Get an element definition from this complex type.
getEmbedHopCount() - Method in class org.archive.crawler.datamodel.CrawlURI
Deprecated.  
getEntityResolver() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getEntriesStart() - Method in class org.archive.util.ms.HeaderBlock
 
getEntry(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
Returns the entry with the given number.
getEnvironmentHome() - Method in class org.archive.util.JEMBeanHelper
Return the target environment directory.
getEnvironmentIfOpen() - Method in class org.archive.util.JEMBeanHelper
Return an Environment only if the environment has already been opened in this process.
getEnvironmentOpenConfig() - Method in class org.archive.util.JEMBeanHelper
If the helper was instantiated with canConfigure==true, it shows environment configuration attributes.
getError(String) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get error for a specific attribute.
getError(String, Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get error for a specific attribute.
getErrorHandler() - Method in class org.archive.crawler.admin.CrawlJob
 
getErrorHandler() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getErrorMessage() - Method in class org.archive.crawler.admin.CrawlJob
Get the error message associated with this job.
getErrors() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get a List of all the encountered errors.
getErrors(Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Get a List of all the encountered errors.
getEscapedURI() - Method in class org.archive.net.UURI
 
getExpectedInserts() - Method in interface org.archive.util.BloomFilter
Report the number of expected inserts used at instantiation time to calculate the bitfield size.
getExpectedInserts() - Method in class org.archive.util.BloomFilter64bit
 
getExtendedBATCount() - Method in class org.archive.util.ms.HeaderBlock
 
getExtendedBATStart() - Method in class org.archive.util.ms.HeaderBlock
 
getExtractFromDirs(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchFTP
Returns the extract.from.dirs attribute for this FetchFTP and the given curi.
getExtractParent(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchFTP
Returns the extract.parent attribute for this FetchFTP and the given curi.
getFactory() - Static method in class org.archive.uid.GeneratorFactory
 
getFeature(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getFetchAttempts() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the number of attempts at getting the document referenced by this URI.
getFetchBandwidth(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchFTP
Returns the fetch-bandwidth attribute for this FetchFTP and the given curi.
getFetchDisregards() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFetchDuration() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getFetchNonResponses() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFetchResponses() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFetchStatus() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the overall/fetch status of this CrawlURI for its current trip through the processing loop.
getFetchSuccesses() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getFextra() - Method in class org.archive.io.GzipHeader
 
getFile() - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
 
getFile() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Gets this path as a File.
getFile() - Method in class org.archive.io.WriterPoolMember
Get this file.
getFile() - Method in class org.archive.net.DownloadURLConnection
 
getFileDistribution() - Method in class org.archive.crawler.admin.StatisticsTracker
Returns a HashMap that contains information about distributions of encountered MIME types.
getFileExtension() - Method in class org.archive.io.arc.ARCReader
 
getFileExtension() - Method in class org.archive.io.ArchiveReader
 
getFileExtension() - Method in class org.archive.io.warc.WARCReader
 
getFileName() - Method in class org.archive.io.ArchiveReader
 
getFilenameSeries() - Method in class org.archive.io.GenerationFileHandler
 
getFilePos() - Method in class org.archive.util.ms.Piece
 
getFilesWithPrefix(File, String) - Static method in class org.archive.util.FileUtils
Get a list of all files in the directory whose names begin with the passed prefix.
getFileType() - Method in class org.archive.util.ms.HeaderBlock
 
getFilterOffPosition(CrawlURI) - Method in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
getFilterOffPosition(CrawlURI) - Method in class org.archive.crawler.filter.PathologicalPathFilter
Deprecated.  
getFilterOffPosition(CrawlURI) - Method in class org.archive.crawler.framework.Filter
If the filter is disabled, the value returned by this method is what filters return as their disabled setting.
getFirstARecord(Record[]) - Method in class org.archive.crawler.fetcher.FetchDNS
 
getFirstChain() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the first processor chain.
getFirstKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFirstProcessor() - Method in class org.archive.crawler.framework.ProcessorChain
Get the first processor in the chain.
getFirstProcessorChain() - Method in class org.archive.crawler.framework.CrawlController
Get the first processor chain.
getFirstrecordBody(File) - Method in class org.archive.crawler.framework.WriterPoolProcessor
Write the arc metadata body content.
getFirstrecordBody(File) - Method in class org.archive.crawler.writer.WARCWriterProcessor
Return relevant values as header-like fields (here ANVLRecord, but spec-defined "application/warc-fields" type when written).
getFirstrecordStylesheet() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getFirstrecordStylesheet() - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
getFirstrecordStylesheet() - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
getFirstWord(String) - Static method in class org.archive.util.TextUtils
 
getFlg() - Method in class org.archive.io.GzipHeader
 
getFormItems(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getFrom(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getFrom(FrontierMarker, int) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFrom() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Get the beginning of the time frame to check against.
getFromSeries(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log spread across a numbered series of files.
getFrontier() - Method in class org.archive.crawler.framework.CrawlController
 
getFrontierJournal() - Method in interface org.archive.crawler.framework.Frontier
 
getFrontierJournal() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getFrontierJournal() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getFrontierOneLine() - Method in class org.archive.crawler.admin.CrawlJob
 
getFrontierReport(String) - Method in class org.archive.crawler.admin.CrawlJob
 
getGlobalSettings() - Method in class org.archive.crawler.settings.SettingsCache
 
getGlobalSettings() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getGroup(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Get the 'frontier group' (usually queue) for the given CrawlURI.
getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getGzipHeader() - Method in class org.archive.io.GzippedInputStream
 
getHashCount() - Method in interface org.archive.util.BloomFilter
Report the number of internal independent hash functions (and thus the number of bits set/checked for each item presented).
getHashCount() - Method in class org.archive.util.BloomFilter64bit
 
getHeader() - Method in class org.archive.io.ArchiveRecord
 
getHeaderFieldKeys() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getHeaderFieldKeys() - Method in interface org.archive.io.ArchiveRecordHeader
 
getHeaderFields() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getHeaderFields() - Method in interface org.archive.io.ArchiveRecordHeader
 
getHeaderSize() - Method in class org.archive.io.ReplayInputStream
Total size of header.
getHeaderString() - Method in class org.archive.io.arc.ARCRecord
 
getHeaderValue(String) - Method in class org.archive.io.arc.ARCRecordMetaData
 
getHeaderValue(String) - Method in interface org.archive.io.ArchiveRecordHeader
 
getHeritrixHome() - Static method in class org.archive.crawler.Heritrix
Exploit -Dheritrix.home if available to us.
getHeritrixOut() - Static method in class org.archive.crawler.Heritrix
 
getHolder() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the 'holder' for the convenience of an external facility.
getHolderCost() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the 'holderCost' for the convenience of an external facility (Frontier).
getHolderKey() - Method in class org.archive.crawler.datamodel.CrawlURI
Return the 'holderKey' for convenience of an external facility (Frontier).
getHopType() - Method in class org.archive.crawler.extractor.Link
 
getHost() - Method in class org.archive.net.UURI
 
getHostAddress(ServerCache, String) - Static method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Get host address, first from the Heritrix cache of addresses and then, failing that, from the dnsjava cache.
getHostAddress(CrawlURI) - Method in class org.archive.crawler.framework.WriterPoolProcessor
Return IP address of given URI suitable for recording (as in a classic ARC 5-field header line).
getHostAddress(String) - Static method in class org.archive.util.DNSJavaUtil
Return an InetAddress for passed host.
getHostBasename() - Method in class org.archive.net.UURI
Strips www variants from the host.
getHostFor(String) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlHost associated with name.
getHostFor(CandidateURI) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlHost associated with curi.
getHostingHeritrix() - Method in class org.archive.crawler.admin.CrawlJob
 
getHostLastFinished(String) - Method in class org.archive.crawler.admin.StatisticsTracker
Returns the time (in milliseconds) when a URI belonging to a given host was last finished processing.
getHostName() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the host name.
getHostName() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the HQ's name
getHosts() - Method in class org.archive.crawler.SimpleHttpServer
Returns the hosts that the server is listening on.
getHostsDnsDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getHostsPerTld(String) - Method in class org.archive.crawler.admin.StatisticsSummary
Get the number of hosts with a particular TLD.
getHQ(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Get the AdaptiveRevisitHostQueue for the given CrawlURI, creating it if necessary.
getHQ(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Get an AdaptiveRevisitHostQueue for the specified host.
getHtdocs() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getHttp() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
getHttpHeaders() - Method in class org.archive.io.arc.ARCRecord
 
getHttpMethod(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getHttpRecorder() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the http recorder associated with this uri.
getHttpRecorder() - Method in class org.archive.crawler.framework.ToeThread
Used to get the current thread's HttpRecorder instance.
getHttpRecorder() - Method in class org.archive.httpclient.HttpRecorderMethod
 
getHttpRecorder() - Static method in class org.archive.util.HttpRecorder
Get the current thread's HttpRecorder.
getHttpRecorder() - Method in interface org.archive.util.HttpRecorderMarker
 
getHttpServer() - Static method in class org.archive.crawler.Heritrix
 
getIgnoredSeeds() - Method in class org.archive.crawler.admin.CrawlJob
Utility method to get the stored list of ignored seed items (if any), from the last time the seeds were imported to the frontier.
getIn() - Method in class org.archive.io.ArchiveReader
 
getIn() - Method in class org.archive.io.ArchiveRecord
 
getIndex() - Method in class org.archive.util.ms.DefaultEntry
 
getIndex() - Method in interface org.archive.util.ms.Entry
 
getInflater() - Method in class org.archive.io.GzippedInputStream
 
getInFromFile(String) - Method in class org.archive.crawler.extractor.PDFParser
Read a file named 'doc' and store its bytes for later processing.
getInitialMarker(String, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Returns a URIFrontierMarker for the current, paused, job.
getInitialMarker(String, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns a URIFrontierMarker for the current, paused, job.
getInitialMarker(String, boolean) - Method in interface org.archive.crawler.framework.Frontier
Get a URIFrontierMarker initialized with the given regular expression at the 'start' of the Frontier.
getInitialMarker(String, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getInitialMarker(String, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
 
getInitialMarker(String) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Get a marker for beginning a scan over all contents
getInputSource() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getInputStream(String) - Static method in class org.archive.crawler.util.IoUtils
 
getInputStream(File, String) - Static method in class org.archive.crawler.util.IoUtils
Get inputstream.
getInputStream(File, long) - Method in class org.archive.io.ArchiveReader
Convenience method for constructors.
getInputStream() - Method in class org.archive.io.ArchiveReader
 
getInputStream() - Method in class org.archive.io.GzippedInputStream
 
getInputStream() - Method in class org.archive.net.DownloadURLConnection
 
getInstance() - Static method in class org.archive.io.SinkHandler
 
getInstance(String) - Static method in class org.archive.net.UURIFactory
 
getInstance(String, String) - Static method in class org.archive.net.UURIFactory
 
getInstance(UURI, String) - Static method in class org.archive.net.UURIFactory
 
getInstances() - Static method in class org.archive.crawler.Heritrix
 
getInt(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getInterpreter() - Method in class org.archive.crawler.deciderules.BeanShellDecideRule
Get the proper Interpreter instance -- either shared or local to this thread.
getInterpreter() - Method in class org.archive.crawler.processor.BeanShellProcessor
Get the proper Interpreter instance -- either shared or local to this thread.
getIntProperty(String, int) - Static method in class org.archive.util.PropertyUtils
 
getIntValue() - Method in class org.archive.crawler.util.StringIntPair
 
getIP() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the IP address for this host.
getIp() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getIp4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.arc.ARCRecord
 
getIp4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.ArchiveRecord
 
getIpFetched() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the time when the IP address for this host was last looked up.
getIPHostAddress(String) - Static method in class org.archive.util.InetAddressUtil
Returns InetAddress for passed host IF it's in IPv4 quads format (e.g.
getIpTTL() - Method in class org.archive.crawler.datamodel.CrawlHost
Get the TTL value from the DNS record for this host.
getIPValidityDuration(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Get the maximum time a DNS record is valid.
getIterator(boolean) - Method in class org.archive.queue.MemQueue
 
getIterator(boolean) - Method in interface org.archive.queue.Queue
Returns an iterator for the queue.
getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getJeLogsFilter() - Static method in class org.archive.crawler.util.CheckpointUtils
 
getJmxJobName() - Method in class org.archive.crawler.admin.CrawlJob
 
getJmxObjectName() - Static method in class org.archive.crawler.Heritrix
 
getJmxObjectName(String) - Static method in class org.archive.crawler.Heritrix
 
getJmxObjectName(String, String) - Static method in class org.archive.crawler.Heritrix
 
getJndiContainerName() - Static method in class org.archive.crawler.Heritrix
 
getJndiContext() - Static method in class org.archive.crawler.Heritrix
 
getJob(String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Return a job with the given UID.
getJobHandler() - Method in class org.archive.crawler.Heritrix
Get the job handler
getJobName() - Method in class org.archive.crawler.admin.CrawlJob
Returns this job's 'name'.
getJobPriority() - Method in class org.archive.crawler.admin.CrawlJob
Get this job's level of priority.
getJobsdir() - Static method in class org.archive.crawler.Heritrix
 
getKey(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
getKey() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getKey(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getKey(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getKey() - Method in class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
 
getLabel() - Method in class org.archive.util.anvl.Element
 
getLargestValue() - Method in class org.archive.util.Histotable
Return the largest value of any key that is larger than 0.
getLastCacheMissDiff() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getLastChain() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the last processor chain.
getLastSavedTime() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the time when this CrawlerSettings was last saved to persistent storage.
getLegalValues() - Method in class org.archive.crawler.settings.ComplexType
 
getLegalValues() - Method in class org.archive.crawler.settings.ListType
getLegalValues is not applicable to lists, so this method always returns null.
getLegalValues() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
getLegalValues() - Method in class org.archive.crawler.settings.SimpleType
Get the array of legal values for this Type.
getLegalValues() - Method in class org.archive.crawler.settings.Type
Get the legal values for this type.
getLegalValueType() - Method in class org.archive.crawler.settings.Type
Get the class that values of this Type must be instances of.
getLength() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getLength() - Method in interface org.archive.io.ArchiveRecordHeader
 
getLength() - Method in class org.archive.io.GzipHeader
 
getLength() - Method in class org.archive.util.anvl.ANVLRecord
 
getLevel() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
 
getLevel() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the severity level.
getLevel() - Method in class org.archive.io.SinkHandlerLogRecord
 
getLinkCount() - Method in class org.archive.crawler.extractor.CrawlUriSWFAction
 
getLinkHopCount() - Method in class org.archive.crawler.datamodel.CrawlURI
Deprecated.  
getLinkRules(Object) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
getListOfAllFiles() - Method in class org.archive.crawler.settings.SettingsHandler
Creates and returns a List of all files comprising the current settings framework.
getListOfAllFiles() - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
getLocalAttribute(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlerSettings object.
getLocalAttributeInfoList() - Method in class org.archive.crawler.settings.DataContainer
 
getLocalizedMessage() - Method in exception org.archive.io.RecoverableIOException
 
getLog14Date() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog14Date(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog14Date(Date) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog17Date() - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLog17Date(long) - Static method in class org.archive.util.ArchiveUtils
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
getLogger() - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
getLogger() - Method in class org.archive.io.ArchiveReader
 
getLoggerName() - Method in class org.archive.io.SinkHandlerLogRecord
 
getLoggers() - Method in class org.archive.crawler.datamodel.CrawlOrder
Returns the Map of the StatisticsTracking modules included in the configuration that this instance represents.
getLogin(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getLoginUri(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getLogPath(String) - Method in class org.archive.crawler.admin.CrawlJob
Returns the absolute path of the specified log.
getLogRegistrationMsg(String, MBeanServer, boolean) - Static method in class org.archive.util.JmxUtils
Return a string suitable for logging on registration.
getLogsDir() - Method in class org.archive.crawler.framework.CrawlController
 
getLogsDir() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getLogUnregistrationMsg(String, MBeanServer) - Static method in class org.archive.util.JmxUtils
 
getLogWriteInterval() - Method in class org.archive.crawler.framework.AbstractTracker
The number of seconds to wait between writing snapshot data to log file.
getLong(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getLoopingToes() - Method in class org.archive.crawler.framework.CrawlController
 
getMapOutlinkDecideRule(Object) - Method in class org.archive.crawler.processor.CrawlMapper
 
getMatchDomainURI() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getMatcher(CharSequence) - Method in class org.archive.util.PatternMatcherRecycler
Get a Matcher for the internal Pattern, against the given input sequence.
getMatcher(String, CharSequence) - Static method in class org.archive.util.TextUtils
Get a matcher object for a precompiled regex pattern.
getMatchExpression() - Method in interface org.archive.crawler.framework.FrontierMarker
Returns the regular expression that this marker uses.
getMatchExpression() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
getMatchHostURI() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getMatchReturnValue(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
getMaxCharPos() - Method in class org.archive.util.ms.PieceTable
Returns the maximum character position.
getMaxLength(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchFTP
Returns the max-length-bytes attribute for this FetchFTP and the given curi.
getMaxSize() - Method in class org.archive.crawler.framework.WriterPoolProcessor
Max size we want files to be (bytes).
getMaxSize() - Method in interface org.archive.io.WriterPoolSettings
 
getMaxToes() - Method in class org.archive.crawler.datamodel.CrawlOrder
Returns the set number of maximum toe threads.
getMaxToWrite() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getMBeanInfo() - Method in class org.archive.crawler.admin.CrawlJob
 
getMBeanInfo() - Method in class org.archive.crawler.Heritrix
 
getMBeanInfo() - Method in class org.archive.crawler.settings.ComplexType
 
getMBeanInfo(Object) - Method in class org.archive.crawler.settings.ComplexType
 
getMBeanInfo() - Method in class org.archive.crawler.settings.DataContainer
 
getMBeanInfo() - Method in class org.archive.util.JEApplicationMBean
 
getMbeanName() - Method in class org.archive.crawler.admin.CrawlJob
 
getMBeanName() - Method in class org.archive.crawler.Heritrix
 
getMBeanServer() - Static method in class org.archive.crawler.Heritrix
Get MBeanServer.
getMessage() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the error message.
getMessage() - Method in exception org.archive.io.RecoverableIOException
 
getMetadata() - Method in class org.archive.crawler.framework.WriterPoolProcessor
Return list of metadata to add to the first ARC file metadata record.
getMetaData() - Method in class org.archive.io.arc.ARCRecord
 
getMetadata() - Method in interface org.archive.io.WriterPoolSettings
 
getMetadataHeaderLinesTwoAndThree(String) - Method in class org.archive.io.arc.ARCWriter
 
getMetaDatas() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getMetaLine(String, String, String, long, long) - Method in class org.archive.io.arc.ARCWriter
 
getMidfetchRule(Object) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
getMimeDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
Returns a HashMap that contains information about distributions of encountered mime types.
getMimetype() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getMimetype() - Method in interface org.archive.io.ArchiveRecordHeader
 
getMimetype4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.ArchiveRecord
 
getMimetype4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.warc.WARCRecord
 
getModule(String) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getModule(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get a module by name.
getMtime() - Method in class org.archive.io.GzipHeader
 
getName() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getName() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getName() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the name of this CrawlerSettings object.
getName() - Method in interface org.archive.crawler.settings.refinements.Criteria
Returns the name of the Criteria type.
getName() - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
 
getName() - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
 
getName() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
getName() - Method in interface org.archive.crawler.url.CanonicalizationRule
 
getName() - Method in interface org.archive.io.arc.ARCLocation
 
getName() - Method in class org.archive.util.ms.DefaultEntry
 
getName() - Method in interface org.archive.util.ms.Entry
 
getNeedReset() - Method in class org.archive.util.JEMBeanHelper
Tell the MBean if the available set of functionality has changed.
getNewAlerts() - Method in class org.archive.crawler.Heritrix
 
getNewAlertsCount() - Method in class org.archive.crawler.Heritrix
 
getNewAll() - Method in interface org.archive.crawler.framework.AlertManager
 
getNewCount() - Method in interface org.archive.crawler.framework.AlertManager
 
getNewJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Get the handler's 'new job'
getNext() - Method in class org.archive.util.ms.DefaultEntry
 
getNext() - Method in interface org.archive.util.ms.Entry
 
getNextBlock(int) - Method in interface org.archive.util.ms.BlockFileSystem
Returns the number of the block that follows the given block.
getNextBlock(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getNextCheckpoint() - Method in class org.archive.crawler.framework.Checkpointer
 
getNextCheckpointName() - Method in class org.archive.crawler.framework.Checkpointer
 
getNextDirectory(List<File>) - Method in class org.archive.io.WriterPoolMember
 
getNextItemNumber() - Method in interface org.archive.crawler.framework.FrontierMarker
Returns the number of the next match after the marker.
getNextItemNumber() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
getNextJobUID() - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns a unique job ID.
getNextNearestItem(DatabaseEntry, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getNextProcessorChain() - Method in class org.archive.crawler.framework.ProcessorChain
Get the processor chain that the URI should be working through after finishing this one.
getNextReadyTime() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the time when the HQ will next be ready to issue a URI.
getNoJmxName() - Method in class org.archive.crawler.Heritrix
 
getNotificationInfo(Environment) - Method in class org.archive.util.JEMBeanHelper
No notifications are supported.
getNotificationsSequenceNumber() - Static method in class org.archive.crawler.admin.CrawlJob
 
getNotModifiedBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getNotModifiedUrls() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getNovelBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getNovelUrls() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getNullOrAttribute(String, Object) - Method in class org.archive.crawler.url.canonicalize.RegexRule
 
getNumActive() - Method in class org.archive.io.WriterPool
 
getNumberOfJournalEntries() - Method in class org.archive.crawler.admin.CrawlJob
 
getNumIdle() - Method in class org.archive.io.WriterPool
 
getObject(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getOffset() - Method in interface org.archive.io.arc.ARCLocation
 
getOffset() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getOffset() - Method in interface org.archive.io.ArchiveRecordHeader
 
getOIBC(String, Class<? super V>) - Method in class org.archive.crawler.framework.CrawlController
Implement 'big map' with ObjectIdentityBdbCache.
getOpenType(String) - Static method in class org.archive.util.JmxUtils
 
getOpenType(String, OpenType) - Static method in class org.archive.util.JmxUtils
 
getOperationList(Environment) - Method in class org.archive.util.JEMBeanHelper
Get mbean operation metadata for this environment.
getOperator() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the name of operator of this crawl from this CrawlerSettings object.
getOperator() - Method in class org.archive.crawler.settings.refinements.Refinement
 
getOptions() - Static method in class org.archive.io.ArchiveReader
 
getOrCreateSettingsObject(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get or create CrawlerSettings object for a host or domain.
getOrCreateSettingsObject(String, String) - Method in class org.archive.crawler.settings.SettingsHandler
 
getOrder() - Method in class org.archive.crawler.framework.CrawlController
 
getOrder() - Method in class org.archive.crawler.settings.SettingsHandler
Get the CrawlOrder.
getOrderFile() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getOrderFile() - Method in class org.archive.crawler.settings.XMLSettingsHandler
Get the File object pointing to the order file.
getOrdinal() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the ordinal (serial number) assigned at creation.
getOrganization() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the name of the organization running this crawl from this CrawlerSettings object.
getOrganization() - Method in class org.archive.crawler.settings.refinements.Refinement
 
getOrUse(K, Supplier<V>) - Method in class org.archive.util.CachedBdbMap
Deprecated. ObjectIdentityCache get-or-atomic-create method.
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbCache
 
getOrUse(K, Supplier<V>) - Method in interface org.archive.util.ObjectIdentityCache
Get the object under the given key/name, using (and remembering) the object supplied by the supplier if no prior mapping exists.
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityMemCache
 
getOs() - Method in class org.archive.io.GzipHeader
 
getOutCandidates() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns discovered candidate URIs.
getOutLinks() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns discovered links.
getOutObjects() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns all of the outbound objects.
getOutputDirs() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getOutputDirs() - Method in interface org.archive.io.WriterPoolSettings
 
getOutputStream() - Method in class org.archive.io.WriterPoolMember
 
getOwner() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the ComplexType owning the checked attribute.
getParams() - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Returns parameters associated with this connection manager.
getParent() - Method in class org.archive.crawler.settings.ComplexType
Get the parent of this ComplexType.
getParent() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the parent of this CrawlerSettings object.
getParent(UURI) - Method in class org.archive.crawler.settings.CrawlerSettings
Get the parent of this CrawlerSettings object.
getParentScope(String) - Method in class org.archive.crawler.settings.SettingsHandler
Strip off the leftmost part of a domain name.
getPassword(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getPath() - Method in class org.archive.net.LaxURI
 
getPathFromSeed() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getPathQuery() - Method in class org.archive.net.LaxURI
 
getPathRelativeToWorkingDirectory(String) - Method in class org.archive.crawler.settings.SettingsHandler
Transforms a relative path so that it is relative to a location that is regarded as a working dir for these settings.
getPathRelativeToWorkingDirectory(String) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Transforms a relative path so that it is relative to the location of the order file.
getPattern() - Method in class org.archive.util.PatternMatcherRecycler
 
getPayload() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getPendingExpenditure() - Method in class org.archive.crawler.frontier.WorkQueue
Return the tally of all URI costs currently inside this queue
getPendingJobs() - Method in class org.archive.crawler.admin.CrawlJobHandler
A List of all pending jobs
getPendingURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Returns the frontier's URI list based on the provided marker.
getPendingURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns the frontier's URI list based on the provided marker.
getPerDomainSettings() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getPerHostSettings() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getPersistentAList() - Method in class org.archive.crawler.datamodel.CrawlURI
 
getPool() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getPoolMaximumActive() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getPoolMaximumWait() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getPoolState() - Method in class org.archive.io.WriterPool
 
getPoolState(long) - Method in class org.archive.io.WriterPool
 
getPort() - Method in class org.archive.crawler.datamodel.CrawlServer
Get the port number for this server.
getPort() - Method in class org.archive.crawler.SimpleHttpServer
 
getPortNumber() - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
Get the port number that is to be checked against a URI.
getPosition() - Method in class org.archive.io.ArchiveRecord
 
getPosition() - Method in class org.archive.io.WriterPoolMember
Position in current physical file.
getPostprocessorChain() - Method in class org.archive.crawler.framework.CrawlController
Get the postprocessor chain.
getPredecessorCheckpoints() - Method in class org.archive.crawler.framework.Checkpointer
 
getPrefix() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getPrefix() - Method in interface org.archive.io.WriterPoolSettings
 
getPrefixClassKey(byte[]) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
 
getPrefixes() - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Synchronized get of prefix set to use.
getPrefixes(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Synchronized get of prefix set to use.
getPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Return the authentication URI, either absolute or relative, that serves as prerequisite for the passed curi.
getPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
getPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getPrerequisiteUri() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the prerequisite for this URI.
getPreservedFields() - Method in class org.archive.crawler.settings.ComplexType
Get a list of attribute names that the complex type should attempt to preserve if the module is exchanged with another one.
getPrevious() - Method in class org.archive.util.ms.DefaultEntry
 
getPrevious() - Method in interface org.archive.util.ms.Entry
 
getProcessedDocsPerSec() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getProcessor(Class) - Method in class org.archive.crawler.framework.ProcessorChain
Get the first processor that is of class classType or a subclass of it.
getProcessorChain(int) - Method in class org.archive.crawler.framework.ProcessorChainList
Get a processor chain by its index in the list of chains.
getProcessorChain(String) - Method in class org.archive.crawler.framework.ProcessorChainList
Get a processor chain by its name.
getProcessorChainList() - Method in class org.archive.crawler.framework.CrawlController
Get the list of processor chains.
getProcessorsReport() - Method in class org.archive.crawler.admin.CrawlJob
Get the Processors report for the running crawl.
getProfiles() - Method in class org.archive.crawler.admin.CrawlJobHandler
Returns a List of all known profiles.
getProgressStatistics() - Method in class org.archive.crawler.admin.StatisticsTracker
 
getProgressStatistics() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
getProgressStatisticsLine(Date) - Method in class org.archive.crawler.admin.StatisticsTracker
Return one line of current progress-statistics
getProgressStatisticsLine() - Method in class org.archive.crawler.admin.StatisticsTracker
Return one line of current progress-statistics
getProgressStatisticsLine() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
getPropertiesInputStream() - Static method in class org.archive.crawler.Heritrix
 
getProperty(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
getPropertyOrNull(String) - Static method in class org.archive.util.PropertyUtils
 
getQualifiedRecordID(Map<String, String>) - Method in interface org.archive.uid.Generator
 
getQualifiedRecordID(String, String) - Method in interface org.archive.uid.Generator
 
getQualifiedRecordID(Map<String, String>) - Method in class org.archive.uid.GeneratorFactory
 
getQualifiedRecordID(String, String) - Method in class org.archive.uid.GeneratorFactory
 
getQualifiedRecordID(String, String) - Method in class org.archive.uid.UUIDGenerator
 
getQualifiedRecordID(Map<String, String>) - Method in class org.archive.uid.UUIDGenerator
 
getQueueAssignmentPolicy(CandidateURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getQueueFor(CrawlURI) - Method in class org.archive.crawler.frontier.BdbFrontier
Return the work queue for the given CrawlURI's classKey.
getQueueFor(String) - Method in class org.archive.crawler.frontier.BdbFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getQueueFor(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the work queue for the given CrawlURI's classKey.
getQueueFor(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getRawInput() - Method in interface org.archive.util.ms.BlockFileSystem
Returns the raw input stream for this file system.
getRawInput() - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getReaderIdentifier() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getReaderIdentifier() - Method in class org.archive.io.ArchiveReader
 
getReaderIdentifier() - Method in interface org.archive.io.ArchiveRecordHeader
 
getReadReaders() - Static method in class org.archive.crawler.selftest.SelfTestCase
Returns the selftest read ARCReader.
getRealm(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
getRecordedFinishes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getRecordedInput() - Method in class org.archive.util.HttpRecorder
Return the internal RecordingInputStream
getRecordedOutput() - Method in class org.archive.util.HttpRecorder
 
getRecordedSize() - Method in class org.archive.crawler.datamodel.CrawlURI
Get size of data recorded (transferred)
getRecordID() - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
getRecordID() - Static method in class org.archive.io.warc.WARCWriter
Convenience method for getting Record-Ids.
getRecordID() - Method in interface org.archive.uid.Generator
 
getRecordID() - Method in class org.archive.uid.GeneratorFactory
 
getRecordID() - Method in class org.archive.uid.UUIDGenerator
 
getRecordIdentifier() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getRecordIdentifier() - Method in interface org.archive.io.ArchiveRecordHeader
 
getRedirectUri() - Method in class org.archive.crawler.admin.SeedRecord
 
getReference() - Method in class org.archive.crawler.settings.refinements.Refinement
Get the reference to this refinement's settings object.
getReference(ObjectName) - Static method in class org.archive.util.JndiUtils
 
getReferencedHost() - Method in class org.archive.net.UURI
Return the referenced host in the UURI, if any, also extracting the host of a DNS-lookup URI where necessary.
getRefinement(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Get a refinement with a given reference.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.FetchStatusMatchesRegExpDecideRule
Get the regular expression string to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
Get the regular expression string to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
Use a preset if configured to do so.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
Get the regular expressions list to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.MatchesRegExpDecideRule
Get the regular expression string to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.deciderules.PathologicalPathDecideRule
Construct the regexp string to be matched against the URI.
getRegexp(Object) - Method in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
getRegexp(Object) - Method in class org.archive.crawler.filter.PathologicalPathFilter
Deprecated. Construct the regexp string to be matched against the URI.
getRegexp(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated. Get the regular expressions list to match the URI against.
getRegexp(Object) - Method in class org.archive.crawler.filter.URIRegExpFilter
Deprecated. Get the regular expression string to match the URI against.
getRegexp() - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Get the regular expression to be matched against a URI.
getRegexpFileFilter(String) - Static method in class org.archive.util.FileUtils
Get a java.io.FileFilter that filters files based on a regular expression.
getRejectLogRules(Object) - Method in class org.archive.crawler.postprocessor.LinksScoper
 
getRelativePath() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Gets this path as a relative path from the base directory.
getRemaining() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getRemainingLength() - Method in class org.archive.io.RecordingOutputStream
Return number of bytes that could be recorded without hitting length limit
getReplayCharSequence() - Method in class org.archive.io.RecordingInputStream
 
getReplayCharSequence(String) - Method in class org.archive.io.RecordingInputStream
 
getReplayCharSequence() - Method in class org.archive.io.RecordingOutputStream
 
getReplayCharSequence(String) - Method in class org.archive.io.RecordingOutputStream
 
getReplayCharSequence(String, long) - Method in class org.archive.io.RecordingOutputStream
 
getReplayCharSequence() - Method in class org.archive.util.HttpRecorder
 
getReplayInputStream() - Method in class org.archive.io.RecordingInputStream
 
getReplayInputStream() - Method in class org.archive.io.RecordingOutputStream
 
getReplayInputStream(long) - Method in class org.archive.io.RecordingOutputStream
 
getReplayInputStream() - Method in class org.archive.util.HttpRecorder
 
getReplyStrings() - Method in class org.archive.net.ClientFTP
 
getReports() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getReports() - Method in class org.archive.crawler.framework.CrawlController
 
getReports() - Method in class org.archive.crawler.framework.ToePool
 
getReports() - Method in class org.archive.crawler.framework.ToeThread
 
getReports() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getReports() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
getReports() - Method in class org.archive.crawler.frontier.WorkQueue
 
getReports() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getReports() - Method in interface org.archive.util.Reporter
Get an array of report names offered by this Reporter.
getResponseContentLength() - Method in class org.archive.io.RecordingInputStream
 
getResponseContentLength() - Method in class org.archive.io.RecordingOutputStream
 
getResponseContentLength() - Method in class org.archive.util.HttpRecorder
 
getResult() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
getReverseSortedCopy(Map<String, AtomicLong>) - Method in class org.archive.crawler.admin.StatisticsSummary
Sort the entries of the given HashMap in descending order by their values, which must be AtomicLongs.
getReverseSortedCopy(Map<String, AtomicLong>) - Method in class org.archive.crawler.admin.StatisticsTracker
Sort the entries of the given HashMap in descending order by their values, which must be longs wrapped with AtomicLong.
getReverseSortedCopy(ObjectIdentityCache<String, AtomicLong>) - Method in class org.archive.crawler.admin.StatisticsTracker
Sort the entries of the given ObjectIdentityCache in descending order by their values, which must be longs wrapped with AtomicLong.
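The getReverseSortedCopy entries above describe ordering map entries by their AtomicLong values, largest first. A minimal sketch of that idea using only the JDK follows; the class name ReverseSortSketch, the method name reverseSortedCopy, and the LinkedHashMap return type are assumptions made for illustration and are not taken from StatisticsTracker or StatisticsSummary.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Heritrix.
final class ReverseSortSketch {

    /** Returns a copy of counts whose iteration order is by value, largest first. */
    static LinkedHashMap<String, AtomicLong> reverseSortedCopy(Map<String, AtomicLong> counts) {
        List<Map.Entry<String, AtomicLong>> entries = new ArrayList<>(counts.entrySet());
        // Compare the wrapped long values; swapping b and a gives descending order.
        entries.sort((a, b) -> Long.compare(b.getValue().get(), a.getValue().get()));
        // LinkedHashMap preserves insertion order, so the sorted order survives the copy.
        LinkedHashMap<String, AtomicLong> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, AtomicLong> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, AtomicLong> hosts = new HashMap<>();
        hosts.put("a.example", new AtomicLong(3));
        hosts.put("b.example", new AtomicLong(10));
        hosts.put("c.example", new AtomicLong(1));
        System.out.println(reverseSortedCopy(hosts).keySet());
    }
}
```

The copy shares the AtomicLong instances with the input map, so the ordering is a snapshot while the counters themselves may keep changing.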
getReverseSortedHostCounts(Map<String, AtomicLong>) - Method in class org.archive.crawler.admin.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
getReverseSortedHostsDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
getReverseSortedHostsDistribution() - Method in class org.archive.crawler.admin.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
getRobots() - Method in class org.archive.crawler.datamodel.CrawlServer
Get the robots exclusion policy for this server.
getRobotsDenials() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getRobotsFetchedTime() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getRobotsHonoringPolicy() - Method in class org.archive.crawler.datamodel.CrawlOrder
This method gets the RobotsHonoringPolicy object from the order file.
getRobotsValidityDuration(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Get the maximum time a robots.txt is valid.
getRoot() - Method in interface org.archive.util.ms.BlockFileSystem
Returns the root entry of the file system.
getRoot() - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getRootWebappName() - Static method in class org.archive.crawler.SimpleHttpServer
 
getRules(Object) - Method in class org.archive.crawler.deciderules.DecideRuleSequence
 
getRuntime(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
Returns the amount of time to allow the crawl to run before this processor interrupts.
getSchedulingDirective() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getSchedulingFor(CrawlURI, Link, int) - Method in class org.archive.crawler.postprocessor.LinksScoper
Determine scheduling for the curi.
getScope(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Decide whether to use host or domain scope.
getScope() - Method in class org.archive.crawler.framework.CrawlController
 
getScope() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the scope of this CrawlerSettings object.
getScratchDisk() - Method in class org.archive.crawler.framework.CrawlController
 
getScript() - Method in class org.archive.net.DownloadURLConnection
 
getScript() - Method in class org.archive.net.rsync.RsyncURLConnection
 
getSecondsSinceEpoch(String) - Static method in class org.archive.util.ArchiveUtils
 
getSeedCollection() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSeedFile(SettingsHandler) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
 
getSeedfile() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Dig through everything to get the crawl-global seeds file.
getSeedfile() - Method in class org.archive.crawler.framework.CrawlScope
 
getSeedForUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
Returns seed for urlString (null if seed not found).
getSeedRecordsSortedByStatusCode() - Method in class org.archive.crawler.admin.StatisticsSummary
Returns a sorted Iterator of seed records based on status code.
getSeedRecordsSortedByStatusCode() - Method in class org.archive.crawler.admin.StatisticsTracker
 
getSeedRecordsSortedByStatusCode(Iterator<String>) - Method in class org.archive.crawler.admin.StatisticsTracker
 
getSeedRecordsSortedByStatusCode() - Method in interface org.archive.crawler.framework.StatisticsTracking
Get a SeedRecord iterator for the job being monitored.
getSeeds() - Method in class org.archive.crawler.admin.StatisticsTracker
Get a seed iterator for the job being monitored.
getSeedStream(SettingsHandler) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Return seeds as a stream.
getSeedUrlToDiscoveredUrlsMap() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSelftestURL() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getSelftestURLWithTrailingSlash() - Static method in class org.archive.crawler.selftest.SelfTestCase
 
getSerialNo() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getSerialNo() - Method in class org.archive.io.WriterPool
Returns the atomic integer used to generate serial numbers for files.
getSerialNumber() - Method in class org.archive.crawler.framework.ToeThread
 
getSerialNumber() - Method in class org.archive.util.TimestampSerialno
 
getServer(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getServer(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getServer() - Method in class org.archive.crawler.SimpleHttpServer
 
getServerCache() - Method in class org.archive.crawler.framework.CrawlController
 
getServerDetail(MBeanServer) - Static method in class org.archive.util.JmxUtils
 
getServerFor(String) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlServer associated with name, creating if necessary.
getServerFor(CandidateURI) - Method in class org.archive.crawler.datamodel.ServerCache
Get the CrawlServer associated with curi.
getServerKey(CandidateURI) - Static method in class org.archive.crawler.datamodel.CrawlServer
Get key to use doing lookup on server instances.
getServerLogging() - Method in class org.archive.crawler.SimpleHttpServer
Setup log files.
getSessionBalance() - Method in class org.archive.crawler.frontier.WorkQueue
Return current session 'activity budget balance'
getSettings() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the CrawlerSettings for the checked attribute.
getSettings() - Method in class org.archive.crawler.settings.DataContainer
Get the settings object for which this DataContainer's data are valid.
getSettings() - Method in class org.archive.crawler.settings.refinements.Refinement
Get the CrawlerSettings object this refinement refers to.
getSettings(String, String) - Method in class org.archive.crawler.settings.SettingsCache
Get the effective settings for a host.
getSettings(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object in effect for a host or domain.
getSettings(String, UURI) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object in effect for a host or domain.
getSettings() - Method in class org.archive.io.WriterPool
 
getSettingsDir(String) - Method in class org.archive.crawler.datamodel.CrawlOrder
Return the full path to the directory named by key in settings.
getSettingsDir(String) - Method in class org.archive.crawler.framework.CrawlController
Return the full path to the directory named by key in settings.
getSettingsDir() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getSettingsDirectory() - Method in class org.archive.crawler.admin.CrawlJob
Returns the directory where the configuration files for this job are located.
getSettingsForHost(String) - Method in class org.archive.crawler.settings.SettingsHandler
 
getSettingsFromObject(Object, String) - Method in class org.archive.crawler.settings.ComplexType
Get settings object valid for a URI.
getSettingsFromObject(Object) - Method in class org.archive.crawler.settings.ComplexType
Get settings object valid for a URI.
getSettingsHandler() - Method in class org.archive.crawler.admin.CrawlJob
Returns the settings handler for this job.
getSettingsHandler() - Method in class org.archive.crawler.datamodel.CrawlServer
Get the settings handler.
getSettingsHandler() - Method in class org.archive.crawler.framework.CrawlController
 
getSettingsHandler() - Method in class org.archive.crawler.settings.ComplexType
 
getSettingsHandler() - Method in class org.archive.crawler.settings.CrawlerSettings
Get the SettingsHandler this CrawlerSettings object belongs to.
getSettingsHandler() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getSettingsObject(String, String) - Method in class org.archive.crawler.settings.SettingsCache
Get a settings object.
getSettingsObject(String) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object for a host or domain.
getSettingsObject(String, String) - Method in class org.archive.crawler.settings.SettingsHandler
Get CrawlerSettings object for a host/domain and a particular refinement.
getShortMessage() - Method in class org.archive.io.SinkHandlerLogRecord
 
getShutdownThread(boolean, int, String) - Static method in class org.archive.crawler.Heritrix
 
getSingleInstance() - Static method in class org.archive.crawler.Heritrix
 
getSink() - Method in class org.archive.util.ProcessUtils.StreamGobbler
 
getSize() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the size of the HQ.
getSize() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
Returns the number of URIs in all the HQs in this list
getSize() - Method in class org.archive.io.RecordingInputStream
 
getSize() - Method in class org.archive.io.RecordingOutputStream
 
getSize() - Method in class org.archive.io.ReplayInputStream
Total size of stream content.
getSizeBytes() - Method in interface org.archive.util.BloomFilter
The amount of memory in bytes consumed by the bloom bitfield.
getSizeBytes() - Method in class org.archive.util.BloomFilter64bit
 
getSlotState(long) - Method in class org.archive.util.AbstractLongFPSet
Check the state of a slot in the storage.
getSlotState(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getSmallBATCount() - Method in class org.archive.util.ms.HeaderBlock
 
getSmallBATStart() - Method in class org.archive.util.ms.HeaderBlock
 
getSortedByCounts() - Method in class org.archive.util.Histotable
 
getSortedByKeys() - Method in class org.archive.util.Histotable
 
getSortedDirContent(File, FilenameFilter) - Static method in class org.archive.util.FileUtils
 
getSource() - Method in class org.archive.crawler.extractor.Link
 
getStackTrace() - Method in exception org.archive.io.RecoverableIOException
 
getStartKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
getStat(Map<String, Map<String, Long>>, String, String) - Static method in class org.archive.io.warc.WARCWriter
 
getState() - Method in class org.archive.crawler.framework.CrawlController
 
getState() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the current state of the HQ.
getStateByName() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Same as getState() except this method returns a human readable name for the state instead of its constant integer value.
getStateDisk() - Method in class org.archive.crawler.framework.CrawlController
 
getStateJobFile(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
Find the state.job file in the job directory.
getStatistics() - Method in class org.archive.crawler.framework.CrawlController
 
getStatisticsTracking() - Method in class org.archive.crawler.admin.CrawlJob
 
getStats() - Method in class org.archive.io.warc.WARCWriter
 
getStatus() - Method in class org.archive.crawler.admin.CrawlJob
Get the current status of this CrawlJob
getStatus() - Method in class org.archive.crawler.Heritrix
 
getStatusCode() - Method in class org.archive.crawler.admin.SeedRecord
 
getStatusCode() - Method in class org.archive.io.arc.ARCRecord
Return status code for this record.
getStatusCode() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getStatusCode4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.arc.ARCRecord
 
getStatusCode4Cdx(ArchiveRecordHeader) - Method in class org.archive.io.ArchiveRecord
 
getStatusCodeDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
Return a HashMap representing the distribution of HTTP status codes for successfully fetched curis, where each key -> val pair maps (string)code -> (integer)count.
getStatusCodeDistribution() - Method in class org.archive.crawler.admin.StatisticsTracker
Return a HashMap representing the distribution of status codes for successfully fetched curis, where each key -> val pair maps (string)code -> (integer)count.
getStderr() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
getStdout() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
getStep() - Method in class org.archive.crawler.framework.ToeThread
 
getString(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
getStringValue() - Method in class org.archive.crawler.util.StringIntPair
 
getStrippedFileName() - Method in class org.archive.io.ArchiveReader
 
getStrippedFileName(String, String) - Static method in class org.archive.io.ArchiveReader
 
getSubContext(String) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubContext(CompoundName) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubDir(String) - Static method in class org.archive.crawler.Heritrix
Get and check for existence of expected subdir.
getSubDir(String, boolean) - Static method in class org.archive.crawler.Heritrix
Get and optionally check for existence of subdir.
getSubstats() - Method in class org.archive.crawler.datamodel.CrawlHost
 
getSubstats() - Method in class org.archive.crawler.datamodel.CrawlServer
 
getSubstats() - Method in interface org.archive.crawler.datamodel.CrawlSubstats.HasCrawlSubstats
 
getSubstats() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
 
getSubstats() - Method in class org.archive.crawler.frontier.WorkQueue
 
getSuccessBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getSuccessfullyCrawledUrls() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSuffix() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getSuffix() - Method in interface org.archive.io.WriterPoolSettings
 
getSurtAuthority(String) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getSurtForm() - Method in class org.archive.net.UURI
 
getTestName() - Method in class org.archive.crawler.selftest.SelfTestCase
Calculates test name by stripping SelfTest from current class name.
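The getTestName entry above derives a test name from a class name. A minimal sketch follows, assuming the "SelfTest" token is a trailing suffix of the simple class name; the index does not say where the token sits, and the class TestNameSketch and the sample name ExampleSelfTest are hypothetical.

```java
// Hypothetical helper, not part of Heritrix.
final class TestNameSketch {

    // Assumption: the token to strip appears at the end of the class name.
    static final String SUFFIX = "SelfTest";

    /** Strips a trailing "SelfTest" from a simple class name, if present. */
    static String testName(String simpleClassName) {
        if (simpleClassName.endsWith(SUFFIX)) {
            return simpleClassName.substring(0, simpleClassName.length() - SUFFIX.length());
        }
        // Names without the suffix are returned unchanged.
        return simpleClassName;
    }

    public static void main(String[] args) {
        System.out.println(testName("ExampleSelfTest"));
    }
}
```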
getText(String) - Static method in class org.archive.util.ms.Doc
Returns the text of the .doc file with the given file name.
getText(File) - Static method in class org.archive.util.ms.Doc
Returns the text of the given .doc file.
getText(SeekInputStream) - Static method in class org.archive.util.ms.Doc
Returns the text of the given .doc file.
getText(BlockFileSystem, int) - Static method in class org.archive.util.ms.Doc
Returns the text for the given .doc file.
getThreadContextSettingsHandler() - Static method in class org.archive.crawler.settings.SettingsHandler
 
getThreadNumber() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the number of the ToeThread responsible for processing this uri.
getThreadOneLine() - Method in class org.archive.crawler.admin.CrawlJob
 
getThreadsReport() - Method in class org.archive.crawler.admin.CrawlJob
Get the CrawlController's ToeThreads report for the running crawl.
getThrown() - Method in class org.archive.io.SinkHandlerLogRecord
 
getThrownToString() - Method in class org.archive.io.SinkHandlerLogRecord
 
getTimeout(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchFTP
Returns the timeout-seconds attribute for this FetchFTP and the given curi.
getTimestamp() - Method in class org.archive.crawler.datamodel.Checkpoint
 
getTimestamp() - Method in class org.archive.util.TimestampSerialno
 
getTimestampSerialNo() - Method in class org.archive.io.WriterPoolMember
 
getTimestampSerialNo(String) - Method in class org.archive.io.WriterPoolMember
Synchronize statically around getting the counter and timestamp so that no thread can get in between the getting of the timestamp and the allocation of the serial number, throwing the two out of alignment.
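The getTimestampSerialNo(String) entry above describes taking the timestamp and the serial number under one static lock so the pair stays aligned. A minimal sketch of that pattern follows; the class TimestampSerialSketch, the LOCK object, the millisecond timestamp, and the "timestamp-serial" output format are all assumptions for illustration and do not reflect WriterPoolMember's actual fields or format.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Heritrix.
final class TimestampSerialSketch {

    // A single class-wide lock, so every caller contends on the same monitor.
    private static final Object LOCK = new Object();

    /** Returns "timestamp-serial" with both parts obtained inside one critical section. */
    static String next(AtomicInteger serial) {
        synchronized (LOCK) {
            // No other thread can run between these two statements while the lock is held,
            // so a later timestamp can never be paired with an earlier serial number.
            long timestamp = System.currentTimeMillis();
            int number = serial.getAndIncrement();
            return timestamp + "-" + String.format(Locale.ROOT, "%05d", number);
        }
    }

    public static void main(String[] args) {
        AtomicInteger serial = new AtomicInteger();
        System.out.println(next(serial));
        System.out.println(next(serial));
    }
}
```

The AtomicInteger alone would make the increment safe; the lock is what ties the increment to the timestamp read.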
getTldBytes() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTldDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTldHostDistribution() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTmpDir() - Method in class org.archive.util.TmpDirTestCase
 
getTo() - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Get the end of the time frame to check against.
getToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getToePool() - Method in class org.archive.crawler.framework.CrawlController
 
getTopHQ() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
getTopLevelModule(String) - Method in class org.archive.crawler.settings.CrawlerSettings
 
getTopmostAssignedSurtPrefixPattern() - Static method in class org.archive.net.PublicSuffixes
 
getTopmostAssignedSurtPrefixRegex() - Static method in class org.archive.net.PublicSuffixes
 
getTopmostAssignedSurtPrefixRegex(BufferedReader) - Static method in class org.archive.net.PublicSuffixes
 
getTotal() - Method in class org.archive.util.Histotable
Return the total of all tallies.
getTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Retrieve the total expenditure level allowed by this queue.
getTotalBytes() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getTotalBytesWritten() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
getTotalDataWritten() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalDnsHostDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalDnsHostSize() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalDnsMimeSize() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalDnsMimeTypeDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalDnsStatusCodeDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalExpenditure() - Method in class org.archive.crawler.frontier.WorkQueue
Return the tally of all expenditures from this queue (dequeued items)
getTotalHostDnsDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalHostDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalHosts() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalHostSize() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalMimeSize() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalMimeTypeDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalScheduled() - Method in class org.archive.crawler.datamodel.CrawlSubstats
 
getTotalStatusCodeDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalTldDocuments() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTotalTldSize() - Method in class org.archive.crawler.admin.StatisticsSummary
 
getTransHops() - Method in class org.archive.crawler.datamodel.CandidateURI
Tally up the number of transitive (non-simple-link) hops at the end of this CandidateURI's pathFromSeed.
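The getTransHops entry above counts the non-simple-link hops at the end of a path. A minimal sketch follows, assuming the path from seed is a string with one character per hop and that 'L' marks a simple navigational link; both the encoding and the class TransHopsSketch are assumptions for illustration, not confirmed by this index.

```java
// Hypothetical helper, not part of Heritrix.
final class TransHopsSketch {

    // Assumption: 'L' is the hop code for a plain navigational link.
    static final char SIMPLE_LINK = 'L';

    /** Counts hops at the end of the path that come after the last simple link. */
    static int transHops(String pathFromSeed) {
        int count = 0;
        // Walk backwards from the most recent hop and stop at the first simple link.
        for (int i = pathFromSeed.length() - 1; i >= 0; i--) {
            if (pathFromSeed.charAt(i) == SIMPLE_LINK) {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(transHops("LLRE"));
    }
}
```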
getTrueOrFalse(String) - Static method in class org.archive.io.ArchiveReader
 
getType() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
getType(Object) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
Get the policy-type.
getType() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
getType() - Method in class org.archive.util.ms.DefaultEntry
 
getType() - Method in interface org.archive.util.ms.Entry
 
getTypeName(String) - Static method in class org.archive.crawler.settings.SettingsHandler
 
getUID() - Method in class org.archive.crawler.admin.CrawlJob
Returns this job's unique ID (UID) that was issued by the CrawlJobHandler() when this job was first created.
getUid(ObjectName) - Static method in class org.archive.util.JmxUtils
Returns the UID portion of the name key property of an object name representing a "CrawlService.Job" bean.
getUncheckedAttribute(Object, String) - Method in class org.archive.crawler.settings.ComplexType
Obtain the value of a specific attribute that is valid for a specific CrawlerSettings object.
getUnMatchedURI() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
getUnreadCount() - Method in class org.archive.io.SinkHandler
 
getUri() - Method in class org.archive.crawler.admin.SeedRecord
 
getURI() - Method in class org.archive.net.LaxURI
 
getUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
The total number of URIs queued in all the HQs belonging to this list.
getURIs() - Method in class org.archive.crawler.extractor.PDFParser
Get a list of URIs retrieved from the PDF during the extractURIs operation.
getURIsList(FrontierMarker, int, boolean) - Method in interface org.archive.crawler.framework.Frontier
Returns a list of all uncrawled URIs starting from a specified marker until numberOfMatches is reached.
getURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
getURIsList(FrontierMarker, int, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
Return list of urls.
getURIString() - Method in class org.archive.crawler.datamodel.CandidateURI
Deprecated. Use CandidateURI.toString().
getURL(String, String) - Method in class org.archive.crawler.extractor.CrawlUriSWFAction
Overwrite handling of discovered URIs.
getURL(String, String) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
Overwrite handling of discovered URIs.
getUrl() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getUrl() - Method in interface org.archive.io.ArchiveRecordHeader
 
getUserAgent(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlOrder
 
getUserAgent() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the user agent to use for crawling this URI.
getUserAgents(CrawlerSettings) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
If policy-type is most favored crawler of set, then this method gets a list of all user agents in that set.
getUserAgents() - Method in class org.archive.crawler.datamodel.Robotstxt
 
getUTF8Bytes() - Method in interface org.archive.io.UTF8Bytes
 
getUTF8Bytes() - Method in class org.archive.util.anvl.ANVLRecord
 
getUTF8Bytes() - Method in class org.archive.util.anvl.ANVLRecords
 
getUURI() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getValue() - Method in class org.archive.crawler.settings.ComplexType
Returns this object.
getValue() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Get the value of the checked attribute.
getValue() - Method in class org.archive.crawler.settings.ListType
Returns this object.
getValue() - Method in class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
 
getValue() - Method in class org.archive.util.anvl.Element
 
getVersion() - Method in class org.archive.crawler.CommandLineParser
 
getVersion() - Static method in class org.archive.crawler.Heritrix
Get the heritrix version.
getVersion() - Method in class org.archive.io.arc.ARCReader
Returns version of this ARC file.
getVersion() - Method in class org.archive.io.arc.ARCRecordMetaData
 
getVersion() - Method in class org.archive.io.ArchiveReader
 
getVersion() - Method in interface org.archive.io.ArchiveRecordHeader
 
getVia() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getViaContext() - Method in class org.archive.crawler.datamodel.CandidateURI
 
getWakeTime() - Method in class org.archive.crawler.frontier.WorkQueue
 
getWarsdir() - Static method in class org.archive.crawler.Heritrix
 
getWebappPath(String) - Method in class org.archive.crawler.SimpleHttpServer
Get path to named webapp.
getWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getXfl() - Method in class org.archive.io.GzipHeader
 
getXMLReader() - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
globalSettings() - Method in class org.archive.crawler.settings.ComplexType
Get the global settings object (aka order).
gotoEOR(ArchiveRecord) - Method in class org.archive.io.arc.ARCReader
Skip over any trailing new lines at end of the record so we're lined up ready to read the next.
gotoEOR(ArchiveRecord) - Method in class org.archive.io.arc.ARCReaderFactory.CompressedARCReader
 
gotoEOR(ArchiveRecord) - Method in class org.archive.io.ArchiveReader
Skip over any trailing new lines at end of the record so we're lined up ready to read the next.
gotoEOR(int) - Method in class org.archive.io.GzippedInputStream
Exhaust current GZIP member content.
gotoEOR() - Method in class org.archive.io.GzippedInputStream
Exhaust current GZIP member content.
gotoEOR(ArchiveRecord) - Method in class org.archive.io.warc.WARCReader
Skip over any trailing new lines at end of the record so we're lined up ready to read the next.
gotoEOR(ArchiveRecord) - Method in class org.archive.io.warc.WARCReaderFactory.CompressedWARCReader
 
GROUP - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
gui - Static variable in class org.archive.crawler.Heritrix
True if we're to put up a GUI.
GUI_PORT - Static variable in class org.archive.util.JmxUtils
 
guiHosts - Static variable in class org.archive.crawler.Heritrix
Hosts to bind the GUI webserver to.
guiPort - Static variable in class org.archive.crawler.Heritrix
Port to put the GUI up on.
gzip(byte[]) - Static method in class org.archive.io.GzippedInputStream
Gzip passed bytes.
GZIP_DUMP - Static variable in interface org.archive.io.ArchiveFileConstants
 
GZIP_HEADER_BEGIN - Static variable in interface org.archive.io.arc.ARCConstants
Start of a GZIP header that uses default deflater.
GZIP_SUFFIX - Static variable in class org.archive.crawler.io.CrawlerJournal
suffix to recognize gzipped files
gzipFile - Variable in class org.archive.crawler.io.CrawlerJournal
File we're writing journal to.
GzipHeader - Class in org.archive.io
Read in the GZIP header.
GzipHeader() - Constructor for class org.archive.io.GzipHeader
Shutdown constructor.
GzipHeader(InputStream) - Constructor for class org.archive.io.GzipHeader
Constructor.
gzipMemberSeek(long) - Method in class org.archive.io.GzippedInputStream
Seek to a gzip member.
gzipMemberSeek() - Method in class org.archive.io.GzippedInputStream
 
GzippedInputStream - Class in org.archive.io
Subclass of GZIPInputStream that can handle a stream made of multiple concatenated GZIP members/records.
GzippedInputStream(InputStream) - Constructor for class org.archive.io.GzippedInputStream
 
GzippedInputStream(InputStream, int) - Constructor for class org.archive.io.GzippedInputStream
 
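
The GzippedInputStream entry above refers to streams made of several concatenated GZIP members. The following is a minimal sketch of that file layout using only java.util.zip; it does not use the Heritrix class. The class name ConcatenatedGzipSketch and its helper methods are hypothetical names chosen for illustration. The JDK's GZIPInputStream decodes the concatenation as one continuous stream and does not expose member boundaries, which, judging by the gzipMemberSeek and gotoEOR entries in this index, is the part GzippedInputStream adds.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only: hypothetical class, not part of org.archive.io.
class ConcatenatedGzipSketch {

    /** Compress one payload into a single, complete GZIP member. */
    static byte[] gzipMember(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Append member b directly after member a, as in a compressed archive file. */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = new byte[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    /** Decompress every member in sequence and return the combined text. */
    static String gunzipAll(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] twoMembers = concat(gzipMember("record-1\n"), gzipMember("record-2\n"));
        System.out.print(gunzipAll(twoMembers));
    }
}
```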

H

handle401(HttpMethod, CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
Server is looking for basic/digest auth credentials (RFC2617).
handleAddProxyConnectionHeader(HttpMethod) - Method in class org.archive.httpclient.HttpRecorderMethod
If a 'Proxy-Connection' header has been added to the request, it'll be of a 'keep-alive' type.
handleJobAction(CrawlJobHandler, HttpServletRequest, HttpServletResponse, String, String, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Handle job action.
handlePrerequisite(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
The CrawlURI has a prerequisite; apply scoping and update Link to CandidateURI in manner analogous to outlink handling.
handlePrerequisites(CrawlURI) - Method in class org.archive.crawler.postprocessor.FrontierScheduler
 
Handler - Class in org.archive.net.md5
A protocol handler for an 'md5' URI scheme.
Handler() - Constructor for class org.archive.net.md5.Handler
 
Handler - Class in org.archive.net.rsync
A protocol handler that uses native rsync client to do copy.
Handler() - Constructor for class org.archive.net.rsync.Handler
 
Handler - Class in org.archive.net.s3
A protocol handler for an s3 scheme.
Handler() - Constructor for class org.archive.net.s3.Handler
 
handleValueError(Constraint.FailedCheck) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
 
handleValueError(Constraint.FailedCheck) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
handleValueError(Constraint.FailedCheck) - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
handleValueError(Constraint.FailedCheck) - Method in interface org.archive.crawler.settings.ValueErrorHandler
 
HARVESTER_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
hasAttributes() - Method in class org.archive.crawler.settings.DataContainer
 
hasBdbjeLogs() - Method in class org.archive.crawler.datamodel.Checkpoint
 
hasBeenLinkExtracted() - Method in class org.archive.crawler.datamodel.CrawlURI
If true then a link extractor has already claimed this CrawlURI and performed link extraction on the document content.
hasBeenLookedUp() - Method in class org.archive.crawler.datamodel.CrawlHost
Return true if the IP for this host has been looked up.
hasCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlServer
 
hasCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlURI
 
hasError() - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Has there been an error with severity (level) equal to or higher than this handler's set level.
hasError(Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
Has there been an error with severity (level) equal to or higher than specified.
hasErrors - Variable in class org.archive.crawler.datamodel.Robotstxt
 
hash(String) - Static method in class org.archive.crawler.settings.SoftSettingsHash
Make hash value from a String.
hash(CharSequence, int, int) - Method in class org.archive.util.BloomFilter64bit
Hashes the given sequence with the given hash function.
HASH_COUNT_KEY - Static variable in class org.archive.crawler.util.BloomUriUniqFilter
 
hashCode() - Method in class org.archive.crawler.datamodel.CrawlHost
 
hashCode() - Method in class org.archive.crawler.datamodel.CrawlServer
 
hashCode() - Method in class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory have the same hash code.
hashCode() - Method in class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
 
hashCode() - Method in class org.archive.crawler.settings.TextField
 
HashCrawlMapper - Class in org.archive.crawler.processor
Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey.
HashCrawlMapper(String) - Constructor for class org.archive.crawler.processor.HashCrawlMapper
Constructor.
hashSet - Variable in class org.archive.crawler.util.MemUriUniqFilter
 
hasIdenticalDigest(CrawlURI) - Static method in class org.archive.crawler.deciderules.recrawl.IdenticalDigestDecideRule
Utility method for testing if a CrawlURI's last two history entries (one being the most recent fetch) have identical content-digest information.
hasNext() - Method in interface org.archive.crawler.framework.FrontierMarker
Returns false if no more URIs can be found matching the expression beyond those already covered.
hasNext() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
hasNext() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
hasNext() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
hasNext() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Test whether any items remain; loads next item into holding 'next' field.
hasNext() - Method in class org.archive.crawler.util.TransformIterator
 
hasNext() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
hasNext() - Method in class org.archive.io.ArchiveReader.ArchiveRecordIterator
 
hasNext() - Method in class org.archive.util.iterator.CompositeIterator
 
hasNext() - Method in class org.archive.util.iterator.LookaheadIterator
Test whether any items remain; loads next item into holding 'next' field.
hasPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
hasPrerequisiteUri() - Method in class org.archive.crawler.datamodel.CrawlURI
 
hasRefinements() - Method in class org.archive.crawler.settings.CrawlerSettings
Returns true if this settings object has refinements attached to it.
hasRfc2617CredentialAvatar() - Method in class org.archive.crawler.datamodel.CrawlURI
 
hasScheme(String) - Static method in class org.archive.net.UURI
Test if passed String has likely URI scheme prefix.
hasSupportedScheme(String) - Static method in class org.archive.net.UURIFactory
Test of whether passed String has an allowed URI scheme.
HasViaDecideRule - Class in org.archive.crawler.deciderules
Rule applies the configured decision for any URI which has a 'via' (essentially, any URI other than a seed or certain kinds of mid-crawl adds).
HasViaDecideRule(String) - Constructor for class org.archive.crawler.deciderules.HasViaDecideRule
Usual constructor.
haveSeen(int, int) - Method in class org.archive.crawler.extractor.PDFParser
Indicates, based on a PDFObject's generation/id pair, whether the parser has already encountered this object (or a reference to it) so we don't loop infinitely on circuits within the PDF.
HEADER - Static variable in interface org.archive.io.ArchiveFileConstants
 
header - Variable in class org.archive.io.ArchiveRecord
 
HEADER_FIELD_KEYS - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_FIELD_NAME_KEYS - Static variable in class org.archive.io.arc.ARCReader
An array of the header field names found in the ARC file header on the 3rd line.
HEADER_FIELD_SEPARATOR - Static variable in interface org.archive.io.arc.ARCConstants
ARC header field separator character.
HEADER_FIELD_SEPARATOR - Static variable in interface org.archive.io.warc.WARCConstants
Header field separator character.
HEADER_KEY_BLOCK_DIGEST - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_CONCURRENT_TO - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_DATE - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_ETAG - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_FILENAME - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_ID - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_IP - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_LAST_MODIFIED - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_PAYLOAD_DIGEST - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_PROFILE - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_TRUNCATED - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_TYPE - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_KEY_URI - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_LENGTH_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
HEADER_LINE_ENCODING - Static variable in interface org.archive.io.warc.WARCConstants
 
HEADER_MD5_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
HEADER_PREDICTS_CHANGED - Static variable in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
HEADER_PREDICTS_MISSING - Static variable in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
HEADER_PREDICTS_MISSING - Static variable in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
HEADER_PREDICTS_UNCHANGED - Static variable in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
HEADER_TRUNC - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
HeaderBlock - Class in org.archive.util.ms
 
HeaderBlock(ByteBuffer) - Constructor for class org.archive.util.ms.HeaderBlock
 
headerFields - Variable in class org.archive.io.arc.ARCRecordMetaData
Map of record header fields.
headIndex - Variable in class org.archive.queue.StoredQueue
 
height() - Method in interface org.archive.queue.Stack
Deprecated. Number of items in the Stack.
Heritrix - Class in org.archive.crawler
Main class for Heritrix crawler.
Heritrix() - Constructor for class org.archive.crawler.Heritrix
Constructor.
Heritrix(boolean) - Constructor for class org.archive.crawler.Heritrix
 
Heritrix(String, boolean) - Constructor for class org.archive.crawler.Heritrix
Constructor.
Heritrix(String, boolean, CrawlJobHandler) - Constructor for class org.archive.crawler.Heritrix
Constructor.
HERITRIX_PROPERTIES_PREFIX - Static variable in class org.archive.crawler.Heritrix
Prefix used on our properties we'll add to the System.properties list.
HeritrixHttpMethodRetryHandler - Class in org.archive.crawler.fetcher
Retry handler that tries ten times to establish a connection and then, once established, if a GET method, tries ten times to get a response (if POST, it tries only once).
HeritrixHttpMethodRetryHandler() - Constructor for class org.archive.crawler.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixHttpMethodRetryHandler(int) - Constructor for class org.archive.crawler.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixProtocolSocketFactory - Class in org.archive.crawler.fetcher
Version of protocol socket factory that tries to get IP from heritrix IP cache -- if it's been set into the HttpConnectionParameters.
HeritrixProtocolSocketFactory() - Constructor for class org.archive.crawler.fetcher.HeritrixProtocolSocketFactory
Constructor.
HeritrixSSLProtocolSocketFactory - Class in org.archive.crawler.fetcher
Implementation of the commons-httpclient SSLProtocolSocketFactory so we can return SSLSockets whose trust manager is ConfigurableX509TrustManager.
HeritrixSSLProtocolSocketFactory() - Constructor for class org.archive.crawler.fetcher.HeritrixSSLProtocolSocketFactory
Shutdown constructor.
HIGH - Static variable in class org.archive.crawler.datamodel.CandidateURI
High scheduling priority.
HIGHEST - Static variable in class org.archive.crawler.datamodel.CandidateURI
Highest scheduling priority.
highestEncounteredLevel - Variable in class org.archive.crawler.admin.CrawlJobErrorHandler
 
historyDatabaseConfig() - Static method in class org.archive.crawler.processor.recrawl.PersistProcessor
 
historyDb - Variable in class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
 
Histotable<K> - Class in org.archive.util
Collect and report frequency information.
Histotable() - Constructor for class org.archive.util.Histotable
 
holder - Variable in class org.archive.crawler.datamodel.CrawlURI
 
holderCost - Variable in class org.archive.crawler.datamodel.CrawlURI
spot for an integer cost to be placed by external facility (frontier).
holderKey - Variable in class org.archive.crawler.datamodel.CrawlURI
 
honoringPolicy - Variable in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
honorRobots - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
hookupDatabase(Database, Class, StoredClassCatalog) - Method in class org.archive.queue.StoredQueue
 
HopsFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
HopsFilter(String) - Constructor for class org.archive.crawler.filter.HopsFilter
Deprecated.  
HopsPathMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp.
HopsPathMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.HopsPathMatchesRegExpDecideRule
Usual constructor.
HOST - Static variable in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
 
HOST - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
HOST - Static variable in class org.archive.util.JmxUtils
 
HOST_DEFERRED - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has been deferred for some amount of time, will become ready once that time has elapsed.
HOST_INACTIVE - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has been encountered and all available URIs for it have been processed already.
HOST_INPROCESS - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has URIs currently being processed.
HOST_READY - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has URIs ready to be emitted.
HOST_UNKNOWN - Static variable in interface org.archive.crawler.framework.FrontierHostStatistics
Host has not been encountered by the Frontier, or has been encountered but has been inactive so long that it has expired.
hostName - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Name of the host that this AdaptiveRevisitHostQueue represents
HOSTNAME_ADMINPORT_VARIABLE - Static variable in class org.archive.io.WriterPoolMember
Value to interpolate with actual hostname-port.
HOSTNAME_VARIABLE - Static variable in class org.archive.io.WriterPoolMember
Value to interpolate with actual hostname.
HostnameQueueAssignmentPolicy - Class in org.archive.crawler.frontier
QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI.
HostnameQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
hosts - Variable in class org.archive.crawler.datamodel.ServerCache
hostname -> CrawlHost.
hostsBytes - Variable in class org.archive.crawler.admin.StatisticsSummary
 
hostsBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
 
HostScope - Class in org.archive.crawler.scope
Deprecated. As of release 1.10.0. Replaced by DecidingScope.
HostScope(String) - Constructor for class org.archive.crawler.scope.HostScope
Deprecated.  
hostsDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
Keep track of hosts
hostsDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of hosts.
hostsDnsBytes - Variable in class org.archive.crawler.admin.StatisticsSummary
 
hostsDnsDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
 
hostsLastFinished - Variable in class org.archive.crawler.admin.StatisticsTracker
 
hostStatus(String) - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Get the status of a host.
HQSTATE_BUSY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ has maximum number of CrawlURI currently being processed.
HQSTATE_EMPTY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ contains no queued CrawlURIs elements.
HQSTATE_READY - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ has a CrawlURI ready for processing
HQSTATE_SNOOZED - Static variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
HQ is in a suspended state until it can be woken back up
HtmlFormCredential - Class in org.archive.crawler.datamodel.credential
Credential that holds everything needed to do a GET/POST to an HTML form.
HtmlFormCredential(String) - Constructor for class org.archive.crawler.datamodel.credential.HtmlFormCredential
Constructor.
HTTP - Static variable in class org.archive.net.UURIFactory
 
HTTP_PORT - Static variable in class org.archive.net.UURIFactory
 
HTTP_REQUEST_MIMETYPE - Static variable in interface org.archive.io.warc.WARCConstants
To be safe, let's use application type rather than message.
HTTP_RESPONSE_MIMETYPE - Static variable in interface org.archive.io.warc.WARCConstants
 
HTTP_SCHEME - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
HTTP_SCHEME - Static variable in class org.archive.net.LaxURI
 
HTTP_SCHEME_SLASHES - Static variable in class org.archive.net.UURIFactory
Pattern that looks for case of three or more slashes after the scheme.
HTTPContentDigest - Class in org.archive.crawler.extractor
A processor for calculating custom HTTP content digests in place of the default (if any) computed by the HTTP fetcher processors.
HTTPContentDigest(String) - Constructor for class org.archive.crawler.extractor.HTTPContentDigest
Constructor
HTTPMidFetchUnchangedFilter - Class in org.archive.crawler.filter
A mid fetch filter for HTTP fetcher processors.
HTTPMidFetchUnchangedFilter(String) - Constructor for class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
Constructor
HTTPMidFetchUnchangedFilter(String, String) - Constructor for class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
Constructor
HttpRecorder - Class in org.archive.util
Pairs together a RecordingInputStream and RecordingOutputStream to capture exactly a single HTTP transaction.
HttpRecorder() - Constructor for class org.archive.util.HttpRecorder
Constructor with limited access.
HttpRecorder(File, String, int, int) - Constructor for class org.archive.util.HttpRecorder
Create an HttpRecorder.
HttpRecorder(File, String) - Constructor for class org.archive.util.HttpRecorder
Create an HttpRecorder.
HttpRecorderGetMethod - Class in org.archive.httpclient
Override of GetMethod that marks the passed HttpRecorder with the transition from HTTP head to body and that forces a close on the http connection.
HttpRecorderGetMethod(String, HttpRecorder) - Constructor for class org.archive.httpclient.HttpRecorderGetMethod
 
HttpRecorderMarker - Interface in org.archive.util
A marker interface to denote a class with a gettable HttpRecorder.
httpRecorderMethod - Variable in class org.archive.httpclient.HttpRecorderGetMethod
Instance of http recorder method.
HttpRecorderMethod - Class in org.archive.httpclient
This class encapsulates the specializations supplied by the overrides HttpRecorderGetMethod and HttpRecorderPostMethod.
HttpRecorderMethod(HttpRecorder) - Constructor for class org.archive.httpclient.HttpRecorderMethod
 
httpRecorderMethod - Variable in class org.archive.httpclient.HttpRecorderPostMethod
Instance of http recorder method.
HttpRecorderPostMethod - Class in org.archive.httpclient
Override of PostMethod that marks the passed HttpRecorder with the transition from HTTP head to body and that forces a close on the responseConnection.
HttpRecorderPostMethod(String, HttpRecorder) - Constructor for class org.archive.httpclient.HttpRecorderPostMethod
 
HTTPS - Static variable in class org.archive.net.UURIFactory
 
HTTPS_PORT - Static variable in class org.archive.net.UURIFactory
 
HTTPS_SCHEME - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
HTTPS_SCHEME - Static variable in class org.archive.net.LaxURI
 
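
The HashCrawlMapper entry above describes mapping a URI's classKey to one of N crawler names by hashing. A minimal sketch of that idea follows, assuming String.hashCode() as the hash and a plain list of crawler names; the hash function and name layout actually used by HashCrawlMapper are not stated in this index, and HashMappingSketch and crawlerFor are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical class, not part of org.archive.crawler.processor.
class HashMappingSketch {

    /** Pick one crawler name for a queue key (assumed hash: String.hashCode()). */
    static String crawlerFor(String classKey, List<String> crawlerNames) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(classKey.hashCode(), crawlerNames.size());
        return crawlerNames.get(bucket);
    }

    public static void main(String[] args) {
        List<String> crawlers = Arrays.asList("crawler-a", "crawler-b", "crawler-c");
        for (String key : Arrays.asList("example.com", "archive.org", "example.org:8080")) {
            System.out.println(key + " -> " + crawlerFor(key, crawlers));
        }
    }
}
```

Because the bucket depends only on the key and the number of crawlers, every URI with the same classKey lands on the same crawler, which is the property a mapper of this kind relies on.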

I

IdenticalDigestDecideRule - Class in org.archive.crawler.deciderules.recrawl
Rule applies configured decision to any CrawlURIs whose prior-history content-digest matches the latest fetch.
IdenticalDigestDecideRule(String) - Constructor for class org.archive.crawler.deciderules.recrawl.IdenticalDigestDecideRule
Usual constructor.
IFRAME - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
IGNORE - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
ignored - Variable in class org.archive.crawler.scope.SeedFileIterator
 
IGNORED_SCHEME - Static variable in class org.archive.net.UURIFactory
 
IGNORED_SEEDS_FILENAME - Static variable in class org.archive.crawler.frontier.AbstractFrontier
file collecting report of ignored seed-file entries (if any)
ignoreLine - Variable in class org.archive.util.iterator.RegexpLineIterator
 
illegalElementError(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
IMAGES - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
IMAGES - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
IMAGES_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
IMAGES_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
ImageWaitEvaluator - Class in org.archive.crawler.postprocessor
A specialized ContentBasedWaitEvaluator.
ImageWaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.ImageWaitEvaluator
Constructor
IMPORT_URI_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
IMPORT_URIS_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
importFrom(Reader) - Method in class org.archive.util.SurtPrefixSet
Read a set of SURT prefixes from a reader source; keep sorted and with redundant entries removed.
importFromMixed(Reader, boolean) - Method in class org.archive.util.SurtPrefixSet
Import SURT prefixes from a reader with mixed URI and SURT prefix format.
importFromUris(Reader) - Method in class org.archive.util.SurtPrefixSet
 
importRecoverLog(String, boolean) - Method in interface org.archive.crawler.framework.Frontier
Recover earlier state by reading a recovery log.
importRecoverLog(String, boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
importRecoverLog(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Method is not supported by this Frontier implementation.
importRecoverLog(String, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
This method is not supported by this Frontier implementation.
importRecoverLog(File, CrawlController, boolean) - Static method in class org.archive.crawler.frontier.RecoveryJournal
Utility method for scanning a recovery journal and applying it to a Frontier.
importUri(String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Schedule a uri.
importUri(String, boolean, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Schedule a uri.
importUri(String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Schedule a uri.
importUri(String, boolean, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
Schedule a uri.
importUris(String, String, String) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(String, String, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(String, String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(InputStream, String, boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
importUris(InputStream, String, boolean, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Import URIs.
importUris(String, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
importUris(String, String, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
importUris(InputStream, String, boolean) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
IMPROPERESC - Static variable in class org.archive.net.UURIFactory
 
IMPROPERESC_REPLACE - Static variable in class org.archive.net.UURIFactory
 
in - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
in - Variable in class org.archive.io.ArchiveRecord
Stream to read this record from.
inactiveHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of inactive hosts.
inactiveQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All 'inactive' queues, not yet in active rotation.
incrementCacheCount(ObjectIdentityCache<String, AtomicLong>, String) - Static method in class org.archive.crawler.admin.StatisticsTracker
Increment a counter for a key in a given cache.
incrementCacheCount(ObjectIdentityCache<String, AtomicLong>, String, long) - Static method in class org.archive.crawler.admin.StatisticsTracker
Increment a counter for a key in a given cache by an arbitrary amount.
incrementConsecutiveConnectionErrors() - Method in class org.archive.crawler.datamodel.CrawlServer
 
incrementDeferrals() - Method in class org.archive.crawler.datamodel.CrawlURI
Increment the deferral count.
incrementDisregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of disregarded URIs.
incrementFailedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of failed URIs.
incrementFetchAttempts() - Method in class org.archive.crawler.datamodel.CrawlURI
Increment the number of attempts at getting the document referenced by this URI.
incrementHostCounters(CrawlURI) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
incrementMapCount(Map<String, AtomicLong>, String) - Static method in class org.archive.crawler.admin.StatisticsSummary
Increment a counter for a key in a given HashMap.
incrementMapCount(Map<String, AtomicLong>, String, long) - Static method in class org.archive.crawler.admin.StatisticsSummary
Increment a counter for a key in a given HashMap by an arbitrary amount.
incrementMapCount(ConcurrentMap<String, AtomicLong>, String) - Static method in class org.archive.crawler.admin.StatisticsTracker
Increment a counter for a key in a given HashMap.
incrementMapCount(ConcurrentMap<String, AtomicLong>, String, long) - Static method in class org.archive.crawler.admin.StatisticsTracker
Increment a counter for a key in a given HashMap by an arbitrary amount.
incrementPosition() - Method in class org.archive.io.ArchiveRecord
 
incrementPosition(long) - Method in class org.archive.io.ArchiveRecord
 
incrementQueuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementQueuedUriCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementSessionBalance(int) - Method in class org.archive.crawler.frontier.WorkQueue
Increase the internal running budget to be used before deactivating the queue
incrementSucceededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of successfully fetched URIs.
index - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
INDEX_FORMAT - Static variable in class org.archive.crawler.framework.Checkpointer
 
indexFor(int, int) - Static method in class org.archive.crawler.settings.SoftSettingsHash
Return index for hash code h.
indexOf(Object) - Method in class org.archive.crawler.settings.ListType
 
indexOfCurrentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
InetAddressUtil - Class in org.archive.util
InetAddress utility.
inheritFrom(CandidateURI) - Method in class org.archive.crawler.datamodel.CandidateURI
Inherit (copy) the relevant keys-values from the ancestor.
init(FilterConfig) - Method in class org.archive.crawler.admin.ui.RootFilter
 
initCause(Throwable) - Method in exception org.archive.io.RecoverableIOException
 
InitializationException - Exception in org.archive.crawler.framework.exceptions
InitializationExceptions should be thrown when there is a problem with the crawl's initialization, such as file creation problems, etc.
InitializationException() - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
InitializationException(String) - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
InitializationException(String, Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
InitializationException(Throwable) - Constructor for exception org.archive.crawler.framework.exceptions.InitializationException
 
initialize(CrawlController) - Method in class org.archive.crawler.admin.StatisticsTracker
 
initialize() - Method in class org.archive.crawler.extractor.PDFParser
Initialize opens the document for reading.
initialize(CrawlController) - Method in class org.archive.crawler.framework.AbstractTracker
Sets up the Logger (including logInterval) and registers with the CrawlController for CrawlStatus and CrawlURIDisposition events.
initialize(CrawlController, String) - Method in class org.archive.crawler.framework.Checkpointer
 
initialize(SettingsHandler) - Method in class org.archive.crawler.framework.CrawlController
Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl.
initialize(CrawlController) - Method in class org.archive.crawler.framework.CrawlScope
Initialize is called just before the crawler starts to run.
initialize(CrawlController) - Method in interface org.archive.crawler.framework.Frontier
Initialize the Frontier.
initialize(CrawlController) - Method in interface org.archive.crawler.framework.StatisticsTracking
Do initialization.
initialize(CrawlController) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.BdbFrontier
 
initialize(CrawlController) - Method in class org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated.  
initialize(CrawlController) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initializes the Frontier, given the supplied CrawlController.
initialize(File) - Method in class org.archive.crawler.io.CrawlerJournal
 
initialize(CrawlController) - Method in class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
initialize(String, CrawlJob, File, File) - Static method in class org.archive.crawler.selftest.SelfTestCase
Static initializer.
initialize() - Method in class org.archive.crawler.settings.SettingsHandler
Initialize the SettingsHandler.
initialize() - Method in class org.archive.crawler.settings.XMLSettingsHandler
Initialize the SettingsHandler.
initialize(File) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Initialize the SettingsHandler from a source.
initialize(int, boolean) - Method in class org.archive.crawler.SimpleHttpServer
Deprecated. Use initialize(Collection, port) instead
initialize(Collection<String>, int) - Method in class org.archive.crawler.SimpleHttpServer
Initialize the server.
initialize(Environment) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Method shared by constructors.
initialize(int, int) - Method in class org.archive.crawler.util.BloomUriUniqFilter
Initializer shared by constructors.
initialize(String) - Method in class org.archive.io.ArchiveReader
Convenience method used by subclass constructors.
initialize(String) - Method in class org.archive.io.warc.WARCReader
 
initialize(Environment, Class<? super V>, StoredClassCatalog) - Method in class org.archive.util.CachedBdbMap
Deprecated. Call this method when you have an instance when you used the default constructor or when you have a deserialized instance that you want to reconnect with an extant bdbje environment.
initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbCache
Call this method when you have an instance when you used the default constructor or when you have a deserialized instance that you want to reconnect with an extant bdbje environment.
initialized - Variable in class org.archive.crawler.deciderules.BeanShellDecideRule
 
initializeInstance() - Method in class org.archive.util.CachedBdbMap
Deprecated. Do any instance setup.
initialTasks() - Method in class org.archive.crawler.extractor.TrapSuppressExtractor
 
initialTasks() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
initialTasks() - Method in class org.archive.crawler.framework.Processor
Classes subclassing this one should override this method to perform processor specific actions.
initialTasks() - Method in class org.archive.crawler.framework.Scoper
 
initialTasks() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
initialTasks() - Method in class org.archive.crawler.postprocessor.AcceptRevisitProcessor
 
initialTasks() - Method in class org.archive.crawler.postprocessor.RejectRevisitProcessor
 
initialTasks() - Method in class org.archive.crawler.processor.BeanShellProcessor
 
initialTasks() - Method in class org.archive.crawler.processor.CrawlMapper
 
initialTasks() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
initialTasks() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
initialTasks() - Method in class org.archive.crawler.processor.recrawl.FetchHistoryProcessor
 
initialTasks() - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
initialTasks() - Method in class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
 
initialTasks() - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
initialTasks() - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
initOutputStream(CrawlURI) - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
initQueue() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
initQueuesOfQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initQueuesOfQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Set up the various queues-of-queues used by the frontier.
initStore() - Method in class org.archive.crawler.processor.recrawl.PersistLoadProcessor
 
initStore() - Method in class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
 
initTransientStats() - Method in class org.archive.util.CachedBdbMap
Deprecated.  
inner - Variable in class org.archive.io.CharSubSequence
 
inner - Variable in class org.archive.util.InterruptibleCharSequence
 
inner - Variable in class org.archive.util.iterator.TransformingIteratorWrapper
 
innerAccepts(Object) - Method in class org.archive.crawler.deciderules.DecidingFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.deciderules.DecidingScope
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.ContentTypeRegExpFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.HopsFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.HTTPMidFetchUnchangedFilter
 
innerAccepts(Object) - Method in class org.archive.crawler.filter.OrFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.SurtPrefixFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
innerAccepts(Object) - Method in class org.archive.crawler.framework.Filter
Classes subclassing this one should override this method to perform their custom determination of whether or not to accept the object given to it.
innerAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
Returns whether the given object (typically a CandidateURI) falls within this scope.
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.Constraint
The method all subclasses should implement to do the actual checking.
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.LegalValueListConstraint
 
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.LegalValueTypeConstraint
 
innerCheck(CrawlerSettings, ComplexType, Type, Object) - Method in class org.archive.crawler.settings.RegularExpressionConstraint
 
innerFinished(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
innerHasNext() - Method in class org.archive.io.ArchiveReader.ArchiveRecordIterator
 
innerNext() - Method in class org.archive.io.ArchiveReader.ArchiveRecordIterator
 
innerPredicate - Variable in class org.archive.util.Inverter
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.ChangeEvaluator
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.Extractor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorHTTP
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.extractor.HTTPContentDigest
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchDNS
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchFTP
Processes the given URI.
innerProcess(CrawlURI) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
innerProcess(CrawlURI) - Method in class org.archive.crawler.framework.WriterPoolProcessor
Writes a CrawlURI and its associated data to store file.
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.AcceptRevisitProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ContentBasedWaitEvaluator
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CrawlStateUpdater
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.FrontierScheduler
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Notes a CrawlURI's content size in its running tally.
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.RejectRevisitProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.WaitEvaluator
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.BeanShellProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.FetchHistoryProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.PersistLoadProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.PersistStoreProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.writer.ARCWriterProcessor
Writes a CrawlURI and its associated data to store file.
innerProcess(CrawlURI) - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.writer.MirrorWriterProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.writer.WARCWriterProcessor
Writes a CrawlURI and its associated data to store file.
innerRejectProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
 
innerSchedule(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
inProcessHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of hosts with URIs in process.
inProcessing - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Number of URIs belonging to this queue that are being processed at the moment.
inProcessing(String) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns true if this HQ has a CrawlURI matching the uri string currently being processed.
inProcessQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
all per-class queues from whom a URI is outstanding
input - Variable in class org.archive.crawler.scope.SeedFileIterator
 
inputWrap(InputStream) - Method in class org.archive.util.HttpRecorder
Wrap the provided stream with the internal RecordingInputStream. open() throws an exception if the RecordingInputStream is already open.
insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.WorkQueue
Insert the given curi, whether it is already present or not.
installThreadContextSettingsHandler() - Method in class org.archive.crawler.framework.CrawlController
Utility method to install this crawl's SettingsHandler into the 'global' (for this thread) holder, so that any subsequent deserialization operations in this thread can find it.
instance - Variable in class org.archive.util.Supplier
 
instanceMain(String[]) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
instanceMain(String[]) - Method in class org.archive.util.BenchmarkBlooms
 
InstancePerThread - Interface in org.archive.crawler.datamodel
indicates that a processor should have an instance per ToeThread
instantiateModuleTypeFromClassName(String, String) - Static method in class org.archive.crawler.settings.SettingsHandler
Instantiate a new ModuleType given its name and className.
INTEGER - Static variable in class org.archive.crawler.settings.SettingsHandler
Datatypes supported by the settings framework
INTEGER_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
IntegerList - Class in org.archive.crawler.settings
List of Integer values
IntegerList(String, String) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList.
IntegerList(String, String, IntegerList) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList and initializes it with the values from another IntegerList.
IntegerList(String, String, Integer[]) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList and initializes it with the values from an array of Integers.
IntegerList(String, String, int[]) - Constructor for class org.archive.crawler.settings.IntegerList
Creates a new IntegerList and initializes it with the values from an int array.
interrupt(String) - Method in class org.archive.crawler.Heritrix
 
INTERRUPT_OPER - Static variable in class org.archive.crawler.Heritrix
 
InterruptibleCharSequence - Class in org.archive.util
CharSequence that notices thread interrupts -- as might be necessary to recover from a loose regex on unexpected challenging input.
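
As a rough illustration of the idea described for InterruptibleCharSequence, the sketch below wraps a CharSequence and checks the thread's interrupt flag on every charAt() call, so a long-running regex match over the wrapper can be stopped from another thread. The class name InterruptCheckingCharSequence and the choice of RuntimeException are assumptions made for this example; this is not the Heritrix source.

```java
/**
 * Hypothetical sketch of an interrupt-aware CharSequence wrapper.
 * Not the org.archive.util implementation; names are illustrative.
 */
final class InterruptCheckingCharSequence implements CharSequence {
    private final CharSequence inner;

    InterruptCheckingCharSequence(CharSequence inner) {
        this.inner = inner;
    }

    @Override
    public char charAt(int index) {
        // A regex matcher reads its input through charAt(), so this is a
        // convenient place to notice that another thread asked us to stop.
        // isInterrupted() leaves the interrupt flag set for the caller.
        if (Thread.currentThread().isInterrupted()) {
            throw new RuntimeException("interrupted while scanning input");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Keep the interrupt check on any sub-range handed back to callers.
        return new InterruptCheckingCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}
```

A caller would pass the wrapper, rather than the raw input, to Pattern.matcher(...) and interrupt the matching thread from a watchdog if the match runs too long.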
InterruptibleCharSequence(CharSequence) - Constructor for class org.archive.util.InterruptibleCharSequence
 
INVALID_SUFFIX - Static variable in interface org.archive.io.ArchiveFileConstants
Suffix appended to 'broken' files.
invalidateFile(WriterPoolMember) - Method in class org.archive.io.WriterPool
 
InvalidFrontierMarkerException - Exception in org.archive.crawler.framework.exceptions
An exception that is thrown when there is an attempt to use a URIFrontierMarker that has become invalid.
InvalidFrontierMarkerException() - Constructor for exception org.archive.crawler.framework.exceptions.InvalidFrontierMarkerException
 
InvalidJobFileException - Exception in org.archive.crawler.admin
An exception that is thrown when a program encounters a jobfile that is corrupt or otherwise incomplete or invalid.
InvalidJobFileException(String) - Constructor for exception org.archive.crawler.admin.InvalidJobFileException
 
Inverter - Class in org.archive.util
A predicate that inverts another.
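
A minimal sketch of an inverting predicate follows. This index does not say which Predicate type Inverter accepts, so the example assumes java.util.function.Predicate; the class name InvertingPredicate is hypothetical, and only the field name innerPredicate is taken from the index.

```java
import java.util.function.Predicate;

/** Hypothetical sketch: a predicate that returns the opposite of another. */
final class InvertingPredicate<T> implements Predicate<T> {
    private final Predicate<T> innerPredicate;

    InvertingPredicate(Predicate<T> innerPredicate) {
        this.innerPredicate = innerPredicate;
    }

    @Override
    public boolean test(T value) {
        // Delegate to the wrapped predicate and flip its answer.
        return !innerPredicate.test(value);
    }
}
```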
Inverter(Predicate) - Constructor for class org.archive.util.Inverter
 
invoke(String, Object[], String[]) - Method in class org.archive.crawler.admin.CrawlJob
 
invoke(String, Object[], String[]) - Method in class org.archive.crawler.Heritrix
 
invoke(String, Object[], String[]) - Method in class org.archive.crawler.settings.ComplexType
 
invoke(String, Object[], String[]) - Method in class org.archive.util.JEApplicationMBean
 
invoke(Environment, String, Object[], String[]) - Method in class org.archive.util.JEMBeanHelper
Invoke an operation for the given environment.
IoUtils - Class in org.archive.crawler.util
Logging utils.
IoUtils() - Constructor for class org.archive.crawler.util.IoUtils
 
IoUtils - Class in org.archive.util
I/O Utility methods.
IoUtils() - Constructor for class org.archive.util.IoUtils
 
IP_ADDRESS - Static variable in class org.archive.crawler.extractor.ExtractorUniversal
Matches any string that begins with http:// or https:// followed by something that looks like an IP address (four numbers, none longer than 3 chars, separated by 3 dots).
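
The description above can be approximated with a regular expression along the following lines. The actual value of IP_ADDRESS is not shown in this index, so the pattern, the class name IpUrlSketch, and the helper looksLikeIpUrl are all assumptions made for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; the real ExtractorUniversal constant may differ. */
final class IpUrlSketch {
    // http:// or https://, then four runs of 1-3 digits separated by dots.
    // The lookahead rejects a fourth number longer than 3 digits. Octet
    // ranges are not checked, so 999.999.999.999 still "looks like" an IP.
    static final Pattern IP_URL =
            Pattern.compile("^https?://\\d{1,3}(?:\\.\\d{1,3}){3}(?!\\d).*$");

    static boolean looksLikeIpUrl(CharSequence candidate) {
        return IP_URL.matcher(candidate).matches();
    }
}
```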
IP_ADDRESS_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
IP_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header IP field.
IP_NEVER_EXPIRES - Static variable in class org.archive.crawler.datamodel.CrawlHost
Flag value indicating always-valid IP
IP_NEVER_LOOKED_UP - Static variable in class org.archive.crawler.datamodel.CrawlHost
Flag value indicating an IP has not yet been looked up
IPQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses target IP as basis for queue-assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy.
IPQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
IPV4_QUADS - Static variable in class org.archive.util.InetAddressUtil
ipv4 address.
is2XXSuccess() - Method in class org.archive.crawler.datamodel.CrawlURI
 
isActive() - Method in class org.archive.crawler.framework.ToeThread
Is this thread validly processing a URI, not paused, waiting for a URI, or interrupted?
isAlignedOnFirstRecord() - Method in class org.archive.io.arc.ARCReader
 
isARCSuffix(String) - Static method in class org.archive.io.arc.ARCReaderFactory
 
isARCType(String) - Method in class org.archive.io.Warc2Arc
 
isAtBeginning() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointErrors() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointFailed() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointing() - Method in class org.archive.crawler.admin.CrawlJob
 
isCheckpointing() - Method in class org.archive.crawler.framework.Checkpointer
 
isCheckpointing() - Method in class org.archive.crawler.framework.CrawlController
 
isCheckpointRecover(CrawlOrder) - Static method in class org.archive.crawler.framework.CrawlController
 
isCheckpointRecover() - Method in class org.archive.crawler.framework.CrawlController
 
isCommandLine() - Static method in class org.archive.crawler.Heritrix
 
isComplexType() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this attribute refers to a ComplexType.
isCompressed() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
isCompressed(File) - Method in class org.archive.io.arc.ARCReaderFactory
 
isCompressed(File) - Static method in class org.archive.io.arc.ARCUtils
 
isCompressed() - Method in class org.archive.io.ArchiveReader
 
isCompressed(File) - Method in class org.archive.io.ArchiveReaderFactory
 
isCompressed() - Method in class org.archive.io.WriterPoolMember
 
isCompressed() - Method in interface org.archive.io.WriterPoolSettings
 
isCompressedRepositionableStream(RepositionableStream) - Static method in class org.archive.io.GzippedInputStream
Tests whether the passed stream is a GZIP stream by reading in the HEAD.
isCompressedStream(InputStream) - Static method in class org.archive.io.GzippedInputStream
Tests whether the passed stream is a gzip stream by reading in the HEAD.
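
One common way to test a stream for gzip content is to peek at its first two bytes, which for gzip are always 0x1f 0x8b, and then rewind. The sketch below assumes a stream that supports mark/reset; the class name GzipSniffer and the method looksLikeGzip are hypothetical and are not the GzippedInputStream implementation.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of a gzip check based on the two magic bytes. */
final class GzipSniffer {
    static boolean looksLikeGzip(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2); // remember the current position so we can rewind
        try {
            int first = in.read();
            int second = in.read();
            // Every gzip member starts with the bytes 0x1f 0x8b.
            return first == 0x1f && second == 0x8b;
        } finally {
            in.reset(); // leave the stream where the caller had it
        }
    }
}
```

Because the stream is reset in the finally block, the caller can hand the same stream on to a decompressor or a plain reader after the check.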
isConnectionStaleCheckingEnabled() - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Deprecated. Use HttpConnectionParams.isStaleCheckingEnabled(), HttpConnectionManager.getParams().
isContentToProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
 
isCR(char) - Static method in class org.archive.util.anvl.ANVLRecord
 
isCrawling() - Method in class org.archive.crawler.admin.CrawlJob
 
isCrawling() - Method in class org.archive.crawler.admin.CrawlJobHandler
Is a crawl job being crawled?
ISCRAWLING_ATTR - Static variable in class org.archive.crawler.Heritrix
 
isCROrLF(char) - Static method in class org.archive.util.anvl.ANVLRecord
 
IsCrossTopmostAssignedSurtHopDecideRule - Class in org.archive.crawler.deciderules
Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars (AKA its 'topmost assigned SURT' or 'public suffix'.)
IsCrossTopmostAssignedSurtHopDecideRule(String) - Constructor for class org.archive.crawler.deciderules.IsCrossTopmostAssignedSurtHopDecideRule
 
isDate(String) - Method in class org.archive.io.arc.ARCReader
 
isDevelopment() - Static method in class org.archive.crawler.Heritrix
 
isDigest() - Method in class org.archive.io.ArchiveReader
 
isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
isEmpty(Object) - Method in class org.archive.crawler.filter.OrFilter
Deprecated.  
isEmpty() - Method in interface org.archive.crawler.framework.Frontier
Returns true if the frontier contains no more URIs to crawl.
isEmpty() - Method in class org.archive.crawler.frontier.AbstractFrontier
Frontier is empty only if all queues are empty and no URIs are in-process
isEmpty() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
isEmpty() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
isEmpty() - Method in class org.archive.crawler.settings.ListType
Returns true if this list contains no elements.
isEmpty(Object) - Method in class org.archive.crawler.settings.MapType
Returns true if this map is empty.
isEmpty() - Method in interface org.archive.queue.Queue
is the queue empty?
isEmpty() - Method in interface org.archive.queue.Stack
Deprecated.  
isEnabled() - Method in class org.archive.crawler.framework.Processor
 
isEnabled(Object) - Method in interface org.archive.crawler.url.CanonicalizationRule
 
isEnabled(Object) - Method in class org.archive.crawler.url.canonicalize.BaseRule
 
isEor() - Method in class org.archive.io.ArchiveRecord
 
isEveryTime() - Method in class org.archive.crawler.datamodel.credential.Credential
 
isEveryTime() - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
isEveryTime() - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
isExpectedMimeType(String, String) - Method in class org.archive.crawler.framework.Processor
 
isExpertSetting() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this Type should only show up in expert mode in UI.
isExpertSetting() - Method in class org.archive.crawler.settings.Type
Returns true if this Type should only show up in expert mode in UI.
isHeaderTruncatedFetch() - Method in class org.archive.crawler.datamodel.CrawlURI
 
isHeld() - Method in class org.archive.crawler.frontier.WorkQueue
Whether the queue is already in a lifecycle stage -- such as ready, in-progress, snoozed -- and thus should not be redundantly inserted to readyClassQueues
isHtmlExpectedHere(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorHTML
Test whether this HTML is so unexpected (eg in place of a GIF URI) that it shouldn't be scanned for links.
isHttpTransaction() - Method in class org.archive.crawler.datamodel.CrawlURI
Return true if this is an HTTP transaction.
isHttpTransactionContentToProcess(CrawlURI) - Method in class org.archive.crawler.extractor.Extractor
 
isHttpTransactionContentToProcess(CrawlURI) - Method in class org.archive.crawler.framework.Processor
 
isIndependentExtractors() - Method in class org.archive.crawler.extractor.Extractor
 
isInitialized() - Method in class org.archive.crawler.settings.ComplexType
Returns true if this ComplexType is initialized.
isInScope(CandidateURI) - Method in class org.archive.crawler.framework.Scoper
Schedule the given CandidateURI with the Frontier.
isInScope(CandidateURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
isIpExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Return true if ip should be looked up.
isLegitimateIPValue(String) - Method in class org.archive.io.arc.ARCReader
 
isLengthTruncatedFetch() - Method in class org.archive.crawler.datamodel.CrawlURI
 
isLF(char) - Static method in class org.archive.util.anvl.ANVLRecord
 
isLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLikelyUriHtmlContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLikelyUriJavascriptContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
 
isListLogicOR(Object) - Method in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
isListLogicOR(Object) - Method in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
isLocation() - Method in class org.archive.crawler.datamodel.CandidateURI
 
isMarked() - Method in class org.archive.io.RepositionableInputStream
 
isNew() - Method in class org.archive.crawler.admin.CrawlJob
Is this a new job?
isNumber(String) - Method in class org.archive.io.arc.ARCReader
 
isOpen() - Method in class org.archive.io.RecordingInputStream
 
isOpen() - Method in class org.archive.io.RecordingOutputStream
 
isOpenType(Class) - Static method in class org.archive.util.JmxUtils
 
isOpenType(String) - Static method in class org.archive.util.JmxUtils
 
isOverBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Check whether queue has temporarily or permanently exceeded its budget.
isOverridden(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Returns true if an element is overridden for this settings object.
isOverrideable() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this attribute could be overridden in per settings.
isOverrideable() - Method in class org.archive.crawler.settings.Type
Is this an 'overrideable' setting.
isOverrideLogger(Object) - Method in class org.archive.crawler.framework.Scoper
 
isParseHttpHeaders() - Method in class org.archive.io.arc.ARCReader
 
isPaused() - Method in class org.archive.crawler.framework.CrawlController
Tell if the controller is paused
isPausing() - Method in class org.archive.crawler.framework.CrawlController
 
isPost() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns true if this URI should be fetched by sending an HTTP POST request.
isPost(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
isPost(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
isPost(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
isPrerequisite() - Method in class org.archive.crawler.datamodel.CrawlURI
Returns true if this CrawlURI is a prerequisite.
isPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
 
isPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
isPrerequisite(CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
isProfile() - Method in class org.archive.crawler.admin.CrawlJob
Set if the job is considered to be a profile
isQuadAddress(CrawlURI, String, CrawlHost) - Method in class org.archive.crawler.fetcher.FetchDNS
 
isRead() - Method in class org.archive.io.SinkHandlerLogRecord
 
isReadable(File) - Static method in class org.archive.util.FileUtils
Test file exists and is readable.
isReadableWithExtensionAndMagic(File, String, String) - Static method in class org.archive.util.FileUtils
 
isReadOnly() - Method in class org.archive.crawler.admin.CrawlJob
Is job read only?
isRefinement() - Method in class org.archive.crawler.settings.CrawlerSettings
Returns true if this settings object is a refinement.
isRetired() - Method in class org.archive.crawler.frontier.WorkQueue
 
isRobotsExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Is the robots policy expired.
isRunning() - Method in class org.archive.crawler.admin.CrawlJob
Returns true if the job is being crawled.
isRunning() - Method in class org.archive.crawler.admin.CrawlJobHandler
Is the crawler accepting crawl jobs to run?
isRunning() - Method in class org.archive.crawler.framework.CrawlController
 
ISRUNNING_ATTR - Static variable in class org.archive.crawler.Heritrix
 
isSameHost(UURI, UURI) - Method in class org.archive.crawler.framework.CrawlScope
 
isSeed() - Method in class org.archive.crawler.datamodel.CandidateURI
 
isSeed(Object) - Method in class org.archive.crawler.framework.CrawlScope
Check if a URI is in the seeds.
isSingleInstance() - Static method in class org.archive.crawler.Heritrix
 
isStarted() - Method in class org.archive.crawler.Heritrix
 
isStats() - Method in class org.archive.crawler.admin.StatisticsSummary
 
isStrict() - Method in class org.archive.io.ArchiveReader
 
isStrict() - Method in class org.archive.io.ArchiveRecord
 
isSuccess() - Method in class org.archive.crawler.datamodel.CrawlURI
Ask this URI if it was a success or not.
isTimeTruncatedFetch() - Method in class org.archive.crawler.datamodel.CrawlURI
 
isTld(String) - Static method in class org.archive.util.ArchiveUtils
Return whether the given string represents a known top-level-domain (like "com", "org", etc.) per IANA as of 20100419
isTransient() - Method in class org.archive.crawler.settings.ModuleAttributeInfo
Returns true if this attribute should be hidden from UI and not be serialized to persistent storage.
isTransient() - Method in class org.archive.crawler.settings.Type
Returns true if this ComplexType should be saved to persistent storage.
isTruncatedFetch() - Method in class org.archive.crawler.datamodel.CrawlURI
TODO: Implement truncation using booleans rather than as this ugly String parse.
isType(Object, int) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
Check if policy is of a certain type.
isUnicode() - Method in class org.archive.util.ms.Piece
 
isValid() - Method in class org.archive.crawler.datamodel.Checkpoint
 
isValid() - Method in class org.archive.io.ArchiveReader
Test Archive file is valid.
isValidLoginPasswordString(String) - Static method in class org.archive.crawler.Heritrix
Test string is valid login/password string.
isValidRobots() - Method in class org.archive.crawler.datamodel.CrawlServer
If true then valid robots.txt information has been retrieved.
isValue() - Method in class org.archive.util.anvl.Element
 
isWARCSuffix(String) - Static method in class org.archive.io.warc.WARCReaderFactory
 
isWithinRefinementBounds(UURI) - Method in interface org.archive.crawler.settings.refinements.Criteria
Check if a uri is within the bounds of this criteria.
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
 
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.Refinement
Check if a URI is within the bounds of every criteria set for this refinement.
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
 
isWithinRefinementBounds(UURI) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
 
iterator(Object) - Method in class org.archive.crawler.datamodel.CredentialStore
 
iterator(Object) - Method in class org.archive.crawler.filter.OrFilter
Deprecated.  
iterator() - Method in class org.archive.crawler.framework.ProcessorChain
Get an iterator over the processors in this chain.
iterator() - Method in class org.archive.crawler.framework.ProcessorChainList
Get an iterator over the processor chains.
iterator(Object) - Method in class org.archive.crawler.settings.ComplexType
Get an Iterator over all the attributes in this ComplexType.
iterator() - Method in class org.archive.crawler.settings.ListType
Returns an iterator over the elements in this list in proper sequence.
iterator() - Method in class org.archive.crawler.settings.SoftSettingsHash
 
iterator() - Method in class org.archive.crawler.util.Transform
 
iterator() - Method in class org.archive.io.arc.ARCReaderFactory.CompressedARCReader
 
iterator() - Method in class org.archive.io.ArchiveReader
Returns an ArchiveRecord iterator.
iterator() - Method in class org.archive.io.GzippedInputStream
Returns a GZIP Member Iterator.
iterator() - Method in class org.archive.io.warc.WARCReaderFactory.CompressedWARCReader
 
iterator() - Method in class org.archive.queue.StoredQueue
 
iterators - Variable in class org.archive.util.iterator.CompositeIterator
 

J

JAR_SUFFIX - Static variable in class org.archive.crawler.Heritrix
 
JavaLiterals - Class in org.archive.util
Utility functions to escape or unescape Java literal strings.
JavaLiterals() - Constructor for class org.archive.util.JavaLiterals
 
JAVASCRIPT - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
JAVASCRIPT - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
JAVASCRIPT_STRING_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
JAVASCRIPT_STRING_EXTRACTOR - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
JEApplicationMBean - Class in org.archive.util
JEApplicationMBean is an example of how a JE application can incorporate JE monitoring into its existing MBean.
JEApplicationMBean(Environment) - Constructor for class org.archive.util.JEApplicationMBean
Instantiate a JEApplicationMBean
JEMBeanHelper - Class in org.archive.util
JEMBeanHelper is a utility class for the MBean implementation which wants to add management of a JE environment to its capabilities.
JEMBeanHelper(EnvironmentConfig, File, boolean) - Constructor for class org.archive.util.JEMBeanHelper
Instantiate a helper, specifying environment home and open capabilities.
JerichoExtractorHTML - Class in org.archive.crawler.extractor
Improved link-extraction from an HTML content-body using jericho-html parser.
JerichoExtractorHTML(String) - Constructor for class org.archive.crawler.extractor.JerichoExtractorHTML
 
JerichoExtractorHTML(String, String) - Constructor for class org.archive.crawler.extractor.JerichoExtractorHTML
 
JMX_PORT - Static variable in class org.archive.util.JmxUtils
 
JmxUtils - Class in org.archive.util
Static utility used by JMX.
JmxUtils() - Constructor for class org.archive.util.JmxUtils
 
JndiUtils - Class in org.archive.util
JNDI utilities.
JndiUtils() - Constructor for class org.archive.util.JndiUtils
 
JOB - Static variable in class org.archive.util.JmxUtils
 
JOB_KEYS - Static variable in class org.archive.crawler.Heritrix
 
JobConfigureUtils - Class in org.archive.crawler.admin.ui
Utility methods used configuring jobs in the admin UI.
JobConfigureUtils() - Constructor for class org.archive.crawler.admin.ui.JobConfigureUtils
 
JS_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for js-discovered urls without other context
JSSTRING - Static variable in class org.archive.crawler.extractor.CrawlUriSWFAction
 
JSSTRING - Static variable in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
 

K

KB_RATE_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
KEY - Static variable in class org.archive.util.JmxUtils
 
keys() - Method in class org.archive.crawler.datamodel.CandidateURI
 
keys - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
keySet() - Method in class org.archive.util.CachedBdbMap
Deprecated. The keySet of the diskMap is all relevant keys.
keySet() - Method in class org.archive.util.ObjectIdentityBdbCache
 
keySet() - Method in interface org.archive.util.ObjectIdentityCache
set of all keys
keySet() - Method in class org.archive.util.ObjectIdentityMemCache
 
kickUpdate() - Method in class org.archive.crawler.admin.CrawlJob
Forward a 'kick' update to current controller if any.
kickUpdate() - Method in class org.archive.crawler.admin.CrawlJobHandler
Forward a 'kick' update to current job if any.
kickUpdate() - Method in class org.archive.crawler.deciderules.BeanShellDecideRule
Set up (or reset) Interpreter variables, as appropriate based on thread-isolation setting.
kickUpdate() - Method in class org.archive.crawler.deciderules.DecideRule
Respond to a settings update, refreshing any internal settings-derived state.
kickUpdate() - Method in class org.archive.crawler.deciderules.DecideRuleSequence
 
kickUpdate() - Method in class org.archive.crawler.deciderules.DecidingFilter
Note that configuration updates may be necessary.
kickUpdate() - Method in class org.archive.crawler.deciderules.DecidingScope
Note that configuration updates may be necessary.
kickUpdate() - Method in class org.archive.crawler.deciderules.PathologicalPathDecideRule
Repetitions may have changed; refresh constructedRegexp
kickUpdate() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Re-read prefixes after an update.
kickUpdate() - Method in class org.archive.crawler.filter.OrFilter
Deprecated. Note that configuration updates may be necessary.
kickUpdate() - Method in class org.archive.crawler.filter.SurtPrefixFilter
Deprecated. Re-read prefixes after a settings update.
kickUpdate() - Method in class org.archive.crawler.framework.CrawlController
While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings.
kickUpdate() - Method in class org.archive.crawler.framework.CrawlScope
Take note of a situation (such as settings edit) where involved reconfiguration (such as reading from external files) may be necessary.
kickUpdate() - Method in class org.archive.crawler.framework.Filter
 
kickUpdate() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should consider updating configuration info that may have changed in external files.
kickUpdate() - Method in class org.archive.crawler.framework.Processor
 
kickUpdate() - Method in class org.archive.crawler.framework.ProcessorChain
 
kickUpdate() - Method in class org.archive.crawler.framework.ProcessorChainList
 
kickUpdate() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
kickUpdate() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
kickUpdate() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Accommodate any changes in settings.
kickUpdate() - Method in class org.archive.crawler.processor.BeanShellProcessor
Set up (or reset) Interpreter variables, as appropriate based on thread-isolation setting.
kickUpdate() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
kickUpdate() - Method in class org.archive.crawler.scope.ClassicScope
Take note of a situation (such as settings edit) where involved reconfiguration (such as reading from external files) may be necessary.
kickUpdate() - Method in class org.archive.crawler.scope.SurtPrefixScope
Deprecated. Re-read prefixes after an update.
kill() - Method in class org.archive.crawler.framework.ToeThread
Terminates a thread.
killThread(int, boolean) - Method in class org.archive.crawler.admin.CrawlJob
Kills a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.CrawlController
Kills a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.ToePool
Kills specified thread.
Kw3Constants - Interface in org.archive.crawler.writer
 
Kw3WriterProcessor - Class in org.archive.crawler.writer
Processor module that writes the results of successful fetches to files on disk.
Kw3WriterProcessor(String) - Constructor for class org.archive.crawler.writer.Kw3WriterProcessor
 

L

Label - Class in org.archive.util.anvl
 
Label(String) - Constructor for class org.archive.util.anvl.Label
 
lastCacheMiss - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
lastCacheMissDiff - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
lastIndexOf(Object) - Method in class org.archive.crawler.settings.ListType
 
lastLogPointTime - Variable in class org.archive.crawler.framework.AbstractTracker
Timestamp of when this logger last wrote something to the log
lastMaxBandwidthKB - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
lastPagesFetchedCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
lastProcessedBytesCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
lastReturned - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
Latin1ByteReplayCharSequence - Class in org.archive.io
Provides a (Replay)CharSequence view on recorded stream bytes (a prefix buffer and overflow backing file).
Latin1ByteReplayCharSequence(byte[], long, long, String) - Constructor for class org.archive.io.Latin1ByteReplayCharSequence
Constructor.
launch() - Method in class org.archive.crawler.Heritrix
Launch the crawler for a web UI.
launch(String, boolean) - Method in class org.archive.crawler.Heritrix
Launch the crawler for a web UI.
lax(BitSet) - Method in class org.archive.net.LaxURI
Given a BitSet -- typically one of the URI superclass's predefined static variables -- possibly replace it with a more-lax version to better match the character sets actually left unencoded in web browser requests
lax_abs_path - Static variable in class org.archive.net.LaxURI
 
lax_query - Static variable in class org.archive.net.LaxURI
 
lax_rel_segment - Static variable in class org.archive.net.LaxURI
 
LaxURI - Class in org.archive.net
URI subclass which allows partial/inconsistent encoding, matching the URIs which will be relayed in requests from popular web browsers (esp.
LaxURI(String, boolean, String) - Constructor for class org.archive.net.LaxURI
 
LaxURI(URI, URI) - Constructor for class org.archive.net.LaxURI
 
LaxURI(String, boolean) - Constructor for class org.archive.net.LaxURI
 
LaxURI() - Constructor for class org.archive.net.LaxURI
 
LaxURLCodec - Class in org.archive.net
 
LaxURLCodec(String) - Constructor for class org.archive.net.LaxURLCodec
 
LCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
LCURBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 
LEGAL_LIST_LOGIC - Static variable in class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
 
LEGAL_LIST_LOGIC - Static variable in class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
LegalValueListConstraint - Class in org.archive.crawler.settings
A constraint that checks that an attribute value matches one of the items in the list of legal values.
LegalValueListConstraint(Level, String) - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint.
LegalValueListConstraint(String) - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint using default severity level (Level.WARNING).
LegalValueListConstraint(Level) - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint using default error message.
LegalValueListConstraint() - Constructor for class org.archive.crawler.settings.LegalValueListConstraint
Constructs a new LegalValueListConstraint using default severity level (Level.WARNING) and default error message.
LegalValueTypeConstraint - Class in org.archive.crawler.settings
A constraint that checks that an attribute value is of the right type
LegalValueTypeConstraint(Level, String) - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint.
LegalValueTypeConstraint(String) - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint using default severity level (Level.WARNING).
LegalValueTypeConstraint(Level) - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint using default error message.
LegalValueTypeConstraint() - Constructor for class org.archive.crawler.settings.LegalValueTypeConstraint
Constructs a new LegalValueTypeConstraint using default severity level (Level.WARNING) and default error message.
length() - Method in class org.archive.crawler.settings.TextField
 
length() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Gets the length of this string.
length() - Method in class org.archive.io.CharSubSequence
 
length() - Method in class org.archive.io.GenericReplayCharSequence
 
length - Variable in class org.archive.io.GzipHeader
Total length of the gzip header.
length - Variable in class org.archive.io.Latin1ByteReplayCharSequence
Total length of character stream to replay minus the HTTP headers if present.
length() - Method in class org.archive.io.Latin1ByteReplayCharSequence
 
length() - Method in class org.archive.io.SeekReaderCharSequence
 
length() - Method in class org.archive.net.UURI
 
length() - Method in class org.archive.queue.MemQueue
 
length() - Method in interface org.archive.queue.Queue
get the number of elements in the queue
length() - Method in class org.archive.util.InterruptibleCharSequence
 
LENGTH_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File length field.
LENGTH_TRUNC - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
level - Variable in class org.archive.crawler.admin.CrawlJobErrorHandler
 
LEVELS_AS_ARRAY - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
All the levels of trust as an array from babe-in-the-wood to strict.
LexicalCrawlMapper - Class in org.archive.crawler.processor
A simple crawl splitter/mapper, dividing up CandidateURIs/CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
LexicalCrawlMapper(String) - Constructor for class org.archive.crawler.processor.LexicalCrawlMapper
Constructor.
LIKELY_URI_PATH - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
LIKELY_URI_PATH - Static variable in class org.archive.util.UriUtils
 
LINE_SEPARATOR - Static variable in interface org.archive.io.arc.ARCConstants
ARC file line separator character.
linePos - Variable in class org.archive.util.PaddingStringBuffer
 
LineReadingIterator - Class in org.archive.util.iterator
Utility class providing an Iterator interface over line-oriented text input, as a thin wrapper over a BufferedReader.
LineReadingIterator(BufferedReader) - Constructor for class org.archive.util.iterator.LineReadingIterator
 
lines - Variable in class org.archive.crawler.io.CrawlerJournal
line count
LINK - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
Link - Class in org.archive.crawler.extractor
Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found).
Link(CharSequence, CharSequence, CharSequence, char) - Constructor for class org.archive.crawler.extractor.Link
Create a Link with the given fields.
LINK - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
LinkExtractor - Interface in org.archive.extractor
LinkExtractor is a general interface for classes which, when given an InputStream and Charset, can scan for Links and return them via an Iterator interface.
linkExtractorFinished() - Method in class org.archive.crawler.datamodel.CrawlURI
Note that link extraction has been performed on this CrawlURI.
LinksScoper - Class in org.archive.crawler.postprocessor
Determine which extracted links are within scope.
LinksScoper(String) - Constructor for class org.archive.crawler.postprocessor.LinksScoper
 
list() - Method in class org.archive.util.ms.DefaultEntry
 
list(List<Entry>, Entry) - Static method in class org.archive.util.ms.DefaultEntry
 
list() - Method in interface org.archive.util.ms.Entry
 
listIterator() - Method in class org.archive.crawler.settings.ListType
 
listIterator(int) - Method in class org.archive.crawler.settings.ListType
 
ListType<T> - Class in org.archive.crawler.settings
Super type for all lists.
ListType(String, String) - Constructor for class org.archive.crawler.settings.ListType
Constructs a new ListType.
listUsedFiles(List<String>) - Method in class org.archive.crawler.fetcher.FetchHTTP
 
listUsedFiles(List<String>) - Method in class org.archive.crawler.framework.CrawlScope
 
listUsedFiles(List<String>) - Method in class org.archive.crawler.settings.ModuleType
Those Modules that use files on disk should list them all when this method is called.
littleChar(InputStream) - Static method in class org.archive.io.Endian
Reads the next little-endian unsigned 16-bit integer from the given stream.
littleInt(InputStream) - Static method in class org.archive.io.Endian
Reads the next little-endian signed 32-bit integer from the given stream.
littleShort(InputStream) - Static method in class org.archive.io.Endian
Reads the next little-endian signed 16-bit integer from the given stream.
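The three little-endian readers above can be illustrated with a minimal sketch. This is not the org.archive.io.Endian source: the class name LittleEndianSketch, the method names readU16LE, readS16LE and readS32LE, and the choice to throw EOFException on a short stream are all assumptions made for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper showing little-endian decoding; not the Heritrix class. */
final class LittleEndianSketch {
    private LittleEndianSketch() {}

    /** Reads one byte as 0..255; assumed behavior: fail if the stream ends. */
    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("stream ended mid-value");
        }
        return b;
    }

    /** Unsigned 16-bit value: the low byte arrives first, then the high byte. */
    static int readU16LE(InputStream in) throws IOException {
        int lo = readByte(in);
        int hi = readByte(in);
        return (hi << 8) | lo;
    }

    /** Signed 16-bit value: same two bytes, reinterpreted as a short. */
    static short readS16LE(InputStream in) throws IOException {
        return (short) readU16LE(in);
    }

    /** Signed 32-bit value assembled from four bytes, least significant first. */
    static int readS32LE(InputStream in) throws IOException {
        int b0 = readByte(in);
        int b1 = readByte(in);
        int b2 = readByte(in);
        int b3 = readByte(in);
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }
}
```

In this sketch the unsigned 16-bit reader returns an int so that values above 32767 stay positive; the signed variants rely on Java's narrowing cast to restore the sign.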
liveDisregardedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
URIs that are disregarded (for example, because of robots.txt rules)
liveFailedFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
liveQueuedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
total URIs queued to be visited
liveSucceededFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
load(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
load(InputStream) - Static method in class org.archive.util.anvl.ANVLRecord
Parses a single ANVLRecord from passed InputStream.
load(String) - Static method in class org.archive.util.anvl.ANVLRecord
Parse passed String for an ANVL Record.
loadCheckpointSerialNumber() - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
loadCookies(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
Load cookies from a file before the first fetch.
loadCookies() - Method in class org.archive.crawler.fetcher.FetchHTTP
Load cookies from the file specified in the order file.
loadFactor - Variable in class org.archive.util.AbstractLongFPSet
The load factor, as a fraction.
loadJob(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
Loads a job given a specific job file.
loadMap() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
Retrieve and parse the mapping specification from a local path or HTTP URL.
loadOptions(String) - Static method in class org.archive.crawler.admin.CrawlJobHandler
Loads options from a file.
loadProfile(File) - Method in class org.archive.crawler.admin.CrawlJobHandler
Load one profile.
loadProperties() - Static method in class org.archive.crawler.Heritrix
Load the heritrix.properties file.
loadSeeds() - Method in interface org.archive.crawler.framework.Frontier
Request that the Frontier load (or reload) crawl seeds, typically by contacting the Scope.
loadSeeds() - Method in class org.archive.crawler.frontier.AbstractFrontier
Load up the seeds.
loadSeeds() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Loads the seeds
LocalErrorFormatter - Class in org.archive.crawler.io
 
LocalErrorFormatter() - Constructor for class org.archive.crawler.io.LocalErrorFormatter
 
localErrors - Variable in class org.archive.crawler.framework.CrawlController
This logger is for job-scoped logging, specifically errors which happen and are handled within a particular processor.
LocalizedError - Class in org.archive.crawler.datamodel
 
LocalizedError(String, Throwable, String) - Constructor for class org.archive.crawler.datamodel.LocalizedError
 
localName - Variable in class org.archive.crawler.processor.CrawlMapper
name of the enclosing crawler (URIs mapped here stay put)
LOCATION_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Location field.
log(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Log to the main crawl.log
log - Variable in class org.archive.crawler.processor.recrawl.PersistLogProcessor
 
LOG_ERROR - Static variable in class org.archive.crawler.io.CrawlerJournal
prefix for error lines
LOG_OPER - Static variable in class org.archive.crawler.Heritrix
 
LOG_TIMESTAMP - Static variable in class org.archive.crawler.io.CrawlerJournal
prefix for timestamp lines
logGeneration - Variable in class org.archive.crawler.processor.CrawlMapper
Truncated timestamp prefix for diversion logs; when current time doesn't match, it's time to close all current logs.
logger - Static variable in class org.archive.crawler.extractor.AggressiveExtractorHTML
 
logger - Variable in class org.archive.crawler.postprocessor.WaitEvaluator
 
logger - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
logger - Static variable in class org.archive.crawler.url.canonicalize.RegexRule
 
logger - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Logging instance.
logger - Static variable in class org.archive.httpclient.HttpRecorderGetMethod
 
logger - Static variable in class org.archive.httpclient.HttpRecorderMethod
 
logger - Variable in class org.archive.io.arc.ARCReader
 
logger - Static variable in class org.archive.io.GenericReplayCharSequence
 
logger - Static variable in class org.archive.io.Latin1ByteReplayCharSequence
 
logger - Static variable in class org.archive.io.RecordingInputStream
 
logger - Static variable in class org.archive.io.RecordingOutputStream
 
logger - Variable in class org.archive.io.WriterPool
 
logger - Static variable in class org.archive.util.DevUtils
 
logger - Static variable in class org.archive.util.HttpRecorder
 
logger - Static variable in class org.archive.util.IoUtils
 
LOGGER - Static variable in class org.archive.util.ms.PieceTable
 
logger - Static variable in class org.archive.util.XmlUtils
 
logLocalizedErrors(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Take note of any processor-local errors that have been entered into the CrawlURI.
LOGNAME_CRAWL - Static variable in class org.archive.crawler.framework.CrawlController
 
LOGNAME_LOCAL_ERRORS - Static variable in class org.archive.crawler.framework.CrawlController
 
LOGNAME_PROGRESS_STATISTICS - Static variable in class org.archive.crawler.framework.CrawlController
 
LOGNAME_RECOVER - Static variable in interface org.archive.crawler.frontier.FrontierJournal
 
LOGNAME_RUNTIME_ERRORS - Static variable in class org.archive.crawler.framework.CrawlController
 
LOGNAME_URI_ERRORS - Static variable in class org.archive.crawler.framework.CrawlController
 
logNote(String) - Method in class org.archive.crawler.framework.AbstractTracker
 
logProgressStatistics(String) - Method in class org.archive.crawler.framework.CrawlController
Log to the progress statistics log.
LogReader - Class in org.archive.crawler.util
This class contains a variety of methods for reading log files (or other text files containing repeated lines with similar information).
LogReader() - Constructor for class org.archive.crawler.util.LogReader
 
logStdErr(Level, String) - Method in class org.archive.io.ArchiveReader
Log on stderr.
logUriError(URIException, UURI, CharSequence) - Method in class org.archive.crawler.framework.CrawlController
Log a URIException from deep inside other components to the crawl's shared log.
LogUtils - Class in org.archive.crawler.util
Logging utils.
LogUtils() - Constructor for class org.archive.crawler.util.LogUtils
 
LONG - Static variable in class org.archive.crawler.settings.SettingsHandler
 
LONG_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
longerThan(int) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Tests if this path is longer than a given value.
longestActiveQueue - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
longestPrefixLength(ConcurrentSkipListSet<String>, String) - Method in class org.archive.crawler.datamodel.RobotsDirectives
 
LongFPSet - Interface in org.archive.util.fingerprint
Set for holding primitive long fingerprints.
LongFPSetCache - Class in org.archive.util.fingerprint
Like a MemLongFPSet, but with fixed capacity and maximum size.
LongFPSetCache() - Constructor for class org.archive.util.fingerprint.LongFPSetCache
 
LongFPSetCache(int, float) - Constructor for class org.archive.util.fingerprint.LongFPSetCache
 
LongFPSetTestCase - Class in org.archive.util.fingerprint
JUnit test suite for LongFPSet.
LongFPSetTestCase(String) - Constructor for class org.archive.util.fingerprint.LongFPSetTestCase
Create a new LongFPSetTest object
longIntoByteArray(long, byte[], int) - Static method in class org.archive.util.ArchiveUtils
Copy the raw bytes of a long into a byte array, starting at the specified offset.
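As a rough illustration of copying a long into a byte array at an offset, the sketch below writes the eight bytes most significant first. The byte order is an assumption, since this index does not state which order ArchiveUtils uses, and the names LongBytesSketch, putLong and getLong are hypothetical.

```java
/** Hypothetical helper; byte order (big-endian) is assumed, not confirmed. */
final class LongBytesSketch {
    private LongBytesSketch() {}

    /** Writes the 8 bytes of value into dest starting at offset, high byte first. */
    static void putLong(long value, byte[] dest, int offset) {
        if (offset < 0 || dest.length - offset < Long.BYTES) {
            throw new IndexOutOfBoundsException("need 8 bytes at offset " + offset);
        }
        for (int i = 0; i < Long.BYTES; i++) {
            // Shift the wanted byte down to the low 8 bits, then truncate.
            dest[offset + i] = (byte) (value >>> (8 * (Long.BYTES - 1 - i)));
        }
    }

    /** Inverse of putLong, used here only to check the round trip. */
    static long getLong(byte[] src, int offset) {
        long value = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            // Mask with 0xFFL so negative bytes do not sign-extend.
            value = (value << 8) | (src[offset + i] & 0xFFL);
        }
        return value;
    }
}
```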
LongList - Class in org.archive.crawler.settings
List of Long values
LongList(String, String) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList.
LongList(String, String, LongList) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList and initializes it with the values from another LongList.
LongList(String, String, Long[]) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList and initializes it with the values from an array of Long.
LongList(String, String, long[]) - Constructor for class org.archive.crawler.settings.LongList
Creates a new LongList and initializes it with the values from an array of long.
lookahead() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Check if there's a next by trying to read it.
lookahead() - Method in class org.archive.util.iterator.LineReadingIterator
Loads next line into lookahead spot
lookahead() - Method in class org.archive.util.iterator.LookaheadIterator
Caches the next item if available.
lookahead() - Method in class org.archive.util.iterator.TransformingIteratorWrapper
 
LookaheadIterator<T> - Class in org.archive.util.iterator
Superclass for Iterators which must probe ahead to know if a 'next' exists, and thus have a cached next between a call to hasNext() and next().
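The probe-ahead pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the Heritrix source: the class names LookaheadSketch and LineLookaheadSketch are hypothetical, null is assumed to mean "nothing cached" (so null elements are not supported), and I/O errors are assumed to surface as UncheckedIOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical base class: caches one element between hasNext() and next(). */
abstract class LookaheadSketch<T> implements Iterator<T> {
    /** Cached next element, or null when nothing is cached. */
    protected T next;

    /** Tries to fill the cache; returns false when the source is exhausted. */
    protected abstract boolean lookahead();

    @Override
    public boolean hasNext() {
        // Only probe the source when nothing is cached yet.
        return next != null || lookahead();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = next;
        next = null; // hand the cached element out exactly once
        return result;
    }
}

/** Hypothetical subclass: iterates over the lines of a BufferedReader. */
final class LineLookaheadSketch extends LookaheadSketch<String> {
    private final BufferedReader reader;

    LineLookaheadSketch(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    protected boolean lookahead() {
        try {
            next = reader.readLine(); // null at end of input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return next != null;
    }
}
```

Because hasNext() consults the cache first, calling it repeatedly does not skip input.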
LookaheadIterator() - Constructor for class org.archive.util.iterator.LookaheadIterator
 
lookup(Object) - Method in interface org.archive.crawler.deciderules.ExternalGeoLookupInterface
 
lookupTable(String[]) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
 
LOOSE - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Trust any valid cert including self-signed certificates.
LowDiskPauseProcessor - Class in org.archive.crawler.postprocessor
Processor module which uses 'df -k', where available and with the expected output format (on Linux), to monitor available disk space and pause the crawl if free space on monitored filesystems falls below certain thresholds.
LowDiskPauseProcessor(String) - Constructor for class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
LowercaseRule - Class in org.archive.crawler.url.canonicalize
Lowercases the URL.
LowercaseRule(String) - Constructor for class org.archive.crawler.url.canonicalize.LowercaseRule
 
LRU<K,V> - Class in org.archive.util
A least-recently used cache.
LRU(int) - Constructor for class org.archive.util.LRU
Constructor.
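One common way to build a least-recently-used cache in Java is to subclass LinkedHashMap in access order. The sketch below shows that technique; whether org.archive.util.LRU is implemented this way is not stated in this index, and the class name LruSketch and the meaning of its int argument (maximum entry count) are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache built on LinkedHashMap's access-order mode. */
final class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    /** Assumed meaning of the argument: maximum number of entries to retain. */
    LruSketch(int maxEntries) {
        // Third argument true = iterate from least to most recently accessed.
        super(16, 0.75f, true);
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict once the cache is over capacity.
        return size() > maxEntries;
    }
}
```

Both get and put count as an access in this mode, so an entry that is read often stays in the cache while untouched entries are evicted first.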
LSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
LSQRBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 

M

m - Variable in class org.archive.util.BloomFilter64bit
The number of bits in this filter.
main(String[]) - Static method in class org.archive.crawler.extractor.ExtractorTool
 
main(String[]) - Static method in class org.archive.crawler.extractor.PDFParser
 
main(String[]) - Static method in class org.archive.crawler.Heritrix
Launch program.
main(String[]) - Static method in class org.archive.crawler.processor.recrawl.PersistProcessor
Utility main for importing a log into a BDB-JE environment or moving a database between environments (2 arguments), or simply dumping a log to stderr in a more readable format (1 argument).
main(String[]) - Static method in class org.archive.crawler.util.BenchmarkUriUniqFilters
Test the UriUniqFilter implementation (MemUriUniqFilter, BloomUriUniqFilter, or BdbUriUniqFilter) named in first argument against the file of one-per-line URIs named in the second argument.
main(String[]) - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
main(String[]) - Static method in class org.archive.io.arc.ARC2WCDX
 
main(String[]) - Static method in class org.archive.io.arc.ARCReader
Command-line interface to ARCReader.
main(String[]) - Static method in class org.archive.io.Arc2Warc
Command-line interface to Arc2Warc.
main(String[]) - Static method in class org.archive.io.warc.WARCReader
Command-line interface to WARCReader.
main(String[]) - Static method in class org.archive.io.Warc2Arc
Command-line interface to Warc2Arc.
main(String[]) - Static method in class org.archive.net.md5.Handler
Main dumps rsync file to STDOUT.
main(String[]) - Static method in class org.archive.net.PublicSuffixes
Utility method for dumping a regex String, based on a published public suffix list, which matches any SURT-form hostname up through the broadest 'private' (assigned/sold) domain-segment.
main(String[]) - Static method in class org.archive.net.rsync.Handler
Main dumps rsync file to STDOUT.
main(String[]) - Static method in class org.archive.net.s3.Handler
Main dumps rsync file to STDOUT.
main(String[]) - Static method in class org.archive.queue.QueueCat
 
main(String[]) - Static method in class org.archive.util.Base32
For testing, take a command-line argument in Base32, decode, print in hex, encode, print
main(String[]) - Static method in class org.archive.util.BenchmarkBlooms
 
main(String[]) - Static method in class org.archive.util.JndiUtils
Testing code.
main(String[]) - Static method in class org.archive.util.OneLineSimpleLogger
Test this logger.
main(String[]) - Static method in class org.archive.util.SURT
Allow class to be used as a command-line tool for converting URL lists (or naked host or host/path fragments implied to be HTTP URLs) to SURT form.
main(String[]) - Static method in class org.archive.util.SurtPrefixSet
Allow class to be used as a command-line tool for converting URL lists (or naked host or host/path fragments implied to be HTTP URLs) to implied SURT prefix form.
mainPart - Variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
The main part of this segment.
makeARCLocal(URLConnection) - Method in class org.archive.io.ArchiveReaderFactory
 
makeDecision(int, Object) - Method in class org.archive.crawler.deciderules.ExceedsDocumentLengthTresholdDecideRule
 
makeDecision(int, Object) - Method in class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
makeHeritable(String) - Method in class org.archive.crawler.datamodel.CandidateURI
Make the given key 'heritable', meaning its value will be added to descendant CandidateURIs.
makeJobsTabularData(List) - Method in class org.archive.crawler.Heritrix
 
makeLongFPSet() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
 
makeNonHeritable(String) - Method in class org.archive.crawler.datamodel.CandidateURI
Make the given key non-'heritable', meaning its value will not be added to descendant CandidateURIs.
makeQueue() - Method in class org.archive.queue.QueueTestBase
The abstract subclass constructor.
makeSpace() - Method in class org.archive.util.AbstractLongFPSet
Make additional space to keep the load under the target loadFactor level.
makeSpace() - Method in class org.archive.util.fingerprint.LongFPSetCache
 
makeSpace() - Method in class org.archive.util.fingerprint.MemLongFPSet
 
MANIFEST_CONFIG_FILE - Static variable in class org.archive.crawler.framework.CrawlController
abbreviation label for config files in manifest
MANIFEST_LOG_FILE - Static variable in class org.archive.crawler.framework.CrawlController
abbreviation label for log files in manifest
MANIFEST_REPORT - Static variable in class org.archive.crawler.framework.CrawlController
 
MANIFEST_REPORT_FILE - Static variable in class org.archive.crawler.framework.CrawlController
abbreviation label for report files in manifest
map(CandidateURI) - Method in class org.archive.crawler.processor.CrawlMapper
Look up the crawler node name to which the given CandidateURI should be mapped.
map(CandidateURI) - Method in class org.archive.crawler.processor.HashCrawlMapper
Look up the crawler node name to which the given CandidateURI should be mapped.
map - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
Mapping of classKey ranges (as represented by their start) to crawlers (by abstract name/filename)
map(CandidateURI) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
Look up the crawler node name to which the given CandidateURI should be mapped.
MAP - Static variable in class org.archive.crawler.settings.SettingsHandler
 
map - Variable in class org.archive.util.ObjectIdentityMemCache
 
mapString(String, String, long) - Static method in class org.archive.crawler.processor.HashCrawlMapper
 
MapType - Class in org.archive.crawler.settings
This class represents a container of settings.
MapType(String, String) - Constructor for class org.archive.crawler.settings.MapType
Construct a new MapType object.
MapType(String, String, Class) - Constructor for class org.archive.crawler.settings.MapType
Construct a new MapType object.
mark(int) - Method in class org.archive.io.RandomAccessInputStream
 
mark(int) - Method in class org.archive.io.RecordingInputStream
 
mark() - Method in class org.archive.io.RecordingOutputStream
When used alongside a mark-supporting RecordingInputStream, remember a position reachable by a future reset().
mark(int) - Method in class org.archive.io.RepositionableInputStream
 
mark(int) - Method in class org.archive.io.SeekInputStream
Marks the current position of the stream.
mark(int) - Method in class org.archive.io.SeekReader
Marks the current position of the stream.
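The mark(int) entries above follow the general mark/reset contract of java.io.InputStream. As a minimal sketch of that contract, the example below uses the JDK's ByteArrayInputStream rather than any org.archive.io class; the class name MarkResetSketch and the method readTwiceFromMark are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {
    // Reads one byte, marks, reads a second byte, resets, and reads that
    // second byte again. Returns the three bytes read as characters.
    public static String readTwiceFromMark(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported by this stream");
        }
        int first = in.read();   // consumed before the mark, so not replayed
        in.mark(16);             // remember this position (read limit: 16 bytes)
        int second = in.read();
        in.reset();              // return to the marked position
        int again = in.read();   // same byte as 'second'
        return (char) first + "" + (char) second + (char) again;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "xyz".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readTwiceFromMark(in));
    }
}
```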
markAsSeed() - Method in class org.archive.crawler.datamodel.CrawlURI
Deprecated.  
markAsSeen(int, int) - Method in class org.archive.crawler.extractor.PDFParser
Note that an object (id/generation pair) has been seen by this parser so that it can be handled differently when it is encountered again.
markContentBegin(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderMethod
 
markContentBegin() - Method in class org.archive.io.RecordingInputStream
 
markContentBegin() - Method in class org.archive.io.RecordingOutputStream
Remember the current position as the start of the "response body".
markContentBegin() - Method in class org.archive.util.HttpRecorder
Mark current position as the point where the HTTP headers end.
markPrerequisite(String, ProcessorChain) - Method in class org.archive.crawler.datamodel.CrawlURI
Do all actions associated with setting a CrawlURI as requiring a prerequisite.
markSupported() - Method in class org.archive.io.ArchiveRecord
 
markSupported() - Method in class org.archive.io.RandomAccessInputStream
 
markSupported() - Method in class org.archive.io.RecordingInputStream
 
markSupported() - Method in class org.archive.io.SeekInputStream
Returns true, since SeekInputStreams support mark/reset by default.
markSupported() - Method in class org.archive.io.SeekReader
Returns true, since SeekReaders support mark/reset by default.
MASSAGEHOST_PATTERN - Static variable in class org.archive.net.UURI
 
match(Class) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
match(Class, String) - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
matcherStack - Variable in class org.archive.extractor.RegexpJSLinkExtractor
 
matches(String, CharSequence) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the matches method of the String class.
MatchesFilePatternDecideRule - Class in org.archive.crawler.deciderules
Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches.
MatchesFilePatternDecideRule(String) - Constructor for class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
Usual constructor.
MatchesListRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps.
MatchesListRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.MatchesListRegExpDecideRule
Usual constructor.
MatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp.
MatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.MatchesRegExpDecideRule
Usual constructor.
MAX_ALLOWED_RECOVERABLES - Static variable in class org.archive.io.ArchiveReader
Maximum number of recoverable exceptions in a row.
MAX_ATTR_VAL_LENGTH - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
MAX_HEADER_MATERIAL - Static variable in class org.archive.io.RecordingOutputStream
Maximum amount of header material to accept without the content body beginning -- if more, throw a RecorderTooMuchHeaderException.
MAX_INT_CHAR_WIDTH - Static variable in class org.archive.util.ArchiveUtils
 
MAX_LINE_LENGTH - Static variable in interface org.archive.io.warc.WARCConstants
 
MAX_METADATA_LINE_LENGTH - Static variable in interface org.archive.io.arc.ARCConstants
Maximum length for a metadata line.
MAX_OUTLINKS - Static variable in class org.archive.crawler.datamodel.CrawlURI
Protection against outlink overflow.
MAX_URL_LENGTH - Static variable in class org.archive.net.UURI
Consider URIs too long for IE as illegal.
MAX_WARC_HEADER_LINE_LENGTH - Static variable in interface org.archive.io.warc.WARCConstants
Assumed maximum size of a Header Line.
maxEmbedHops - Variable in class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
MAXIMUM_SIZE - Static variable in class org.archive.util.anvl.ANVLRecord
Arbitrary upper bound on maximum size of ANVL Record.
maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Returns the maximum number of different keys this policy can create.
maxLength - Variable in class org.archive.io.RecordingOutputStream
maximum length of material to record before throwing exception
maxLinkHops - Variable in class org.archive.crawler.filter.HopsFilter
Deprecated.  
maxPathDepth - Variable in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
maxPending - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
size at which to force flush of pending items
maxRateBytesPerMs - Variable in class org.archive.io.RecordingOutputStream
maximum rate to record (adds delays to hit target rate)
maxReferralHops - Variable in class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
maxSegLen - Variable in class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
The maximum number of characters allowed in one file system path segment.
maxSpeculativeHops - Variable in class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
maxTransHops - Variable in class org.archive.crawler.filter.HopsFilter
Deprecated.  
maxTransHops - Variable in class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
maybeRelative(File, String) - Static method in class org.archive.util.FileUtils
Turn path into a File, relative to context (which may be ignored if path is absolute).
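The behavior described for maybeRelative can be sketched with plain java.io.File. This is an illustration of the description only, not the FileUtils source; the class name MaybeRelativeSketch is hypothetical.

```java
import java.io.File;

public class MaybeRelativeSketch {
    // Resolve 'path' against 'context' unless 'path' is already absolute,
    // in which case 'context' is ignored.
    public static File maybeRelative(File context, String path) {
        File f = new File(path);
        if (f.isAbsolute()) {
            return f;
        }
        return new File(context, path);
    }

    public static void main(String[] args) {
        File context = new File("jobs");
        // Relative path: resolved under the context directory.
        System.out.println(maybeRelative(context, "crawl.log"));
        // Absolute path: returned unchanged.
        String absolute = new File("state").getAbsolutePath();
        System.out.println(maybeRelative(context, absolute));
    }
}
```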
MBEAN_SERVER_DELEGATE - Static variable in class org.archive.util.JmxUtils
 
MD5 - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
Md5URLConnection - Class in org.archive.net.md5
Md5 URL connection.
Md5URLConnection(URL) - Constructor for class org.archive.net.md5.Md5URLConnection
 
MEDIUM - Static variable in class org.archive.crawler.datamodel.CandidateURI
Medium priority.
MemFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude all-in-memory FP-merging UriUniqFilter.
MemFPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
MemLongFPSet - Class in org.archive.util.fingerprint
Open-addressing in-memory hash set for holding primitive long fingerprints.
MemLongFPSet() - Constructor for class org.archive.util.fingerprint.MemLongFPSet
 
MemLongFPSet(int, float) - Constructor for class org.archive.util.fingerprint.MemLongFPSet
 
memMap - Variable in class org.archive.util.CachedBdbMap
Deprecated. The softreferenced cache of diskMap.
memMap - Variable in class org.archive.util.ObjectIdentityBdbCache
in-memory map of new/recent/still-referenced-elsewhere instances
MemQueue<T> - Class in org.archive.queue
An in-memory implementation of a Queue.
MemQueue() - Constructor for class org.archive.queue.MemQueue
Create a new, empty MemQueue
MemUriUniqFilter - Class in org.archive.crawler.util
A purely in-memory UriUniqFilter based on a HashSet, which remembers every full URI string it sees.
MemUriUniqFilter() - Constructor for class org.archive.crawler.util.MemUriUniqFilter
 
mergeDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
mergeDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
message(String, int) - Method in class org.archive.crawler.CommandLineParser
Print message and then exit.
messageArguments - Variable in class org.archive.crawler.settings.Constraint.FailedCheck
 
METADATA - Static variable in interface org.archive.io.warc.WARCConstants
 
METADATA_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
MIMETYPE - Static variable in class org.archive.util.anvl.ANVLRecord
 
MIMETYPE_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File mimetype field.
mimeTypeBytes - Variable in class org.archive.crawler.admin.StatisticsSummary
 
mimeTypeBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
 
mimeTypeDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
Keep track of the file types we see (mime type -> count)
mimeTypeDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of the file types we see (mime type -> count)
mimeTypeDnsBytes - Variable in class org.archive.crawler.admin.StatisticsSummary
 
mimeTypeDnsDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
 
MimetypeUtils - Class in org.archive.util
Class of mimetype utilities.
MimetypeUtils() - Constructor for class org.archive.util.MimetypeUtils
 
MIN_ROBOTS_RETRIES - Static variable in class org.archive.crawler.datamodel.CrawlServer
only check if robots-fetch is perhaps superfluous after this many tries
MINIMAL_GZIP_HEADER_LENGTH - Static variable in class org.archive.io.GzipHeader
Length of minimal GZIP header.
MINIMUM_RECORD_LENGTH - Static variable in interface org.archive.io.arc.ARCConstants
Minimum possible record length.
MirrorWriterProcessor - Class in org.archive.crawler.writer
Processor module that writes the results of successful fetches to files on disk.
MirrorWriterProcessor(String) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor
 
MirrorWriterProcessor.DirSegment - Class in org.archive.crawler.writer
This class represents one directory segment (component) of a URI path.
MirrorWriterProcessor.DirSegment(String, int, int, int, boolean, CrawlURI, Map, String, String, Set) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.DirSegment
Creates a DirSegment.
MirrorWriterProcessor.EndSegment - Class in org.archive.crawler.writer
This class represents the last segment (component) of a URI path.
MirrorWriterProcessor.EndSegment(String, int, int, int, boolean, CrawlURI, Map, String, String, String, int, boolean) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.EndSegment
Creates an EndSegment.
MirrorWriterProcessor.LumpyString - Class in org.archive.crawler.writer
This class represents a dynamically growable string consisting of substrings ("lumps") that are treated atomically.
MirrorWriterProcessor.LumpyString(String, int, int, int, int, Map, String) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Creates a LumpyString.
MirrorWriterProcessor.PathSegment - Class in org.archive.crawler.writer
This class represents one segment (component) of a URI path.
MirrorWriterProcessor.PathSegment(int, boolean, CrawlURI) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment
Creates a new PathSegment.
MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter - Class in org.archive.crawler.writer
This class implements a FilenameFilter that matches by name, ignoring case.
MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter(String) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.PathSegment.CaseInsensitiveFilenameFilter
Creates a CaseInsensitiveFilenameFilter.
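A FilenameFilter that matches by name while ignoring case could look roughly like the following. This is a sketch built on the standard java.io.FilenameFilter interface; the names CaseInsensitiveFilterSketch and IgnoreCaseNameFilter are hypothetical, and the actual nested class in MirrorWriterProcessor.PathSegment may differ.

```java
import java.io.File;
import java.io.FilenameFilter;

public class CaseInsensitiveFilterSketch {
    // Accepts a directory entry when its name equals the target, ignoring case.
    public static class IgnoreCaseNameFilter implements FilenameFilter {
        private final String target;

        public IgnoreCaseNameFilter(String target) {
            this.target = target;
        }

        @Override
        public boolean accept(File dir, String name) {
            return target.equalsIgnoreCase(name);
        }
    }

    public static void main(String[] args) {
        FilenameFilter filter = new IgnoreCaseNameFilter("Index.HTML");
        File dir = new File(".");
        System.out.println(filter.accept(dir, "index.html"));
        System.out.println(filter.accept(dir, "index.htm"));
        // Typical use: dir.list(filter) returns the matching entries, or null
        // if 'dir' is not a readable directory.
    }
}
```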
MirrorWriterProcessor.URIToFileReturn - Class in org.archive.crawler.writer
This class is returned by uriToFile.
MirrorWriterProcessor.URIToFileReturn(String, String, int) - Constructor for class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Creates a URIToFileReturn.
MISC - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
MISC - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
MISC_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
MISC_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
mkdirs() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.URIToFileReturn
Creates all directories in this path as needed.
ModuleAttributeInfo - Class in org.archive.crawler.settings
 
ModuleAttributeInfo(Type) - Constructor for class org.archive.crawler.settings.ModuleAttributeInfo
Construct a new instance of ModuleAttributeInfo.
ModuleAttributeInfo(ModuleAttributeInfo) - Constructor for class org.archive.crawler.settings.ModuleAttributeInfo
 
ModuleType - Class in org.archive.crawler.settings
Superclass of all modules that should be configurable.
ModuleType(String, String) - Constructor for class org.archive.crawler.settings.ModuleType
Creates a new ModuleType.
ModuleType(String) - Constructor for class org.archive.crawler.settings.ModuleType
Every subclass should implement this constructor
MOST_FAVORED - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
MOST_FAVORED_SET - Static variable in class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
MOTHER - Static variable in class org.archive.util.JmxUtils
Key for name of the Heritrix instance hosting a Job: i.e.
moveElementDown(String) - Method in class org.archive.crawler.settings.DataContainer
Move an attribute down one place in the list.
moveElementDown(CrawlerSettings, String) - Method in class org.archive.crawler.settings.MapType
Move an attribute down one place in the list.
moveElementUp(String) - Method in class org.archive.crawler.settings.DataContainer
Move an attribute up one place in the list.
moveElementUp(CrawlerSettings, String) - Method in class org.archive.crawler.settings.MapType
Move an attribute up one place in the list.
moveToNextGzipMember() - Method in class org.archive.io.GzippedInputStream
 
MULTIPLE_SLASHES - Static variable in class org.archive.net.UURIFactory
Pattern that looks for case of two or more slashes in a path.
multiThreadMode() - Method in class org.archive.crawler.framework.CrawlController
Go back to regular multi-thread mode, where all ToeThreads may proceed at once
mustBeCrawling() - Method in class org.archive.crawler.admin.CrawlJob
 

N

NAIVE_LIKELY_URI_PATTERN - Static variable in class org.archive.util.UriUtils
 
NAIVE_URI_EXCEPTIONS - Static variable in class org.archive.util.UriUtils
 
NAME - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
NAME - Static variable in class org.archive.util.JmxUtils
 
NAME_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
NAMED_FIELD_CHECKSUM_LABEL - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_DESCRIPTION - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_FILEDESC - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_IP_LABEL - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_RELATED_LABEL - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_TRUNCATED - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_TRUNCATED_VALUE_HEAD - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_TRUNCATED_VALUE_LENGTH - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_TRUNCATED_VALUE_TIME - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_TRUNCATED_VALUE_UNSPECIFIED - Static variable in interface org.archive.io.warc.WARCConstants
 
NAMED_FIELD_WARCFILENAME - Static variable in interface org.archive.io.warc.WARCConstants
 
NASCENT - Static variable in class org.archive.crawler.framework.CrawlController
 
NATURAL_LOG_OF_2 - Static variable in class org.archive.util.BloomFilter64bit
The natural logarithm of 2, used in the computation of the number of bits.
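For context, the textbook Bloom filter sizing rule relates the bit count m to the expected element count n and the number of hash functions k by m = n * k / ln 2, which is where a ln 2 constant typically enters. The sketch below shows only that standard formula; whether BloomFilter64bit computes its size exactly this way is an assumption, and the names BloomSizingSketch and bitsFor are hypothetical.

```java
public class BloomSizingSketch {
    public static final double NATURAL_LOG_OF_2 = Math.log(2);

    // Number of bits for which k hash functions is the optimal choice
    // given n expected elements: m = ceil(n * k / ln 2).
    public static long bitsFor(long n, int k) {
        return (long) Math.ceil(n * k / NATURAL_LOG_OF_2);
    }

    public static void main(String[] args) {
        // 1000 expected elements, 10 hash functions.
        System.out.println(bitsFor(1000, 10)); // prints 14427
    }
}
```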
NAVLINK_HOP - Static variable in class org.archive.crawler.extractor.Link
navigation links, like A/@HREF
NAVLINK_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for navlink urls without other context
NBSP - Static variable in class org.archive.net.UURIFactory
 
needsImmediateScheduling() - Method in class org.archive.crawler.datamodel.CandidateURI
 
needsPromptRetry(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Checks if a recently completed CrawlURI that did not finish successfully needs to be retried immediately (processed again as soon as politeness allows.)
needsRetrying(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Checks if a recently completed CrawlURI that did not finish successfully needs to be retried (processed again after some time elapses)
needsRetrying(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Checks if a recently completed CrawlURI that did not finish successfully needs to be retried (processed again after some time elapses)
needsSoonScheduling() - Method in class org.archive.crawler.datamodel.CandidateURI
 
NEWALERTCOUNT_ATTR - Static variable in class org.archive.crawler.Heritrix
 
newCount - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newDefaultInstance() - Static method in class org.archive.extractor.CharSequenceLinkExtractor
 
newDefaultInstance() - Static method in class org.archive.extractor.RegexpCSSLinkExtractor
 
newDefaultInstance() - Static method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
newDefaultInstance() - Static method in class org.archive.extractor.RegexpJSLinkExtractor
 
newFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
newFpsFile - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newInterpreter() - Method in class org.archive.crawler.deciderules.BeanShellDecideRule
Create a new Interpreter instance, preloaded with any supplied source file and the variables 'self' (this BeanShellDecideRule) and 'controller' (the CrawlController).
newInterpreter() - Method in class org.archive.crawler.processor.BeanShellProcessor
Create a new Interpreter instance, preloaded with any supplied source code or source file and the variables 'self' (this BeanShellProcessor) and 'controller' (the CrawlController).
newJob(CrawlJob, String, String, String, String, int) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new job.
newJob(File, String, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new job.
newline() - Method in class org.archive.util.PaddingStringBuffer
Forces a new line in the buffer.
newProfile(CrawlJob, String, String, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
Creates a new profile.
next() - Method in interface org.archive.crawler.framework.Frontier
Get the next URI that should be processed.
next() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
next() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the 'top' URI in the AdaptiveRevisitHostQueue.
next() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the next CrawlURI to be processed (and presumably visited/fetched) by a worker thread.
next() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
next() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
The common parts of next() across different types of iterators
next - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
next() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Return the next item.
next() - Method in class org.archive.crawler.util.TransformIterator
 
next - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
next() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
next - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
next() - Method in class org.archive.io.ArchiveReader.ArchiveRecordIterator
Tries to move to next record if we get RecoverableIOException.
next() - Method in class org.archive.util.iterator.CompositeIterator
 
next - Variable in class org.archive.util.iterator.LookaheadIterator
 
next() - Method in class org.archive.util.iterator.LookaheadIterator
Return the next item.
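The lookahead pattern behind an iterator with a buffered next field can be sketched as follows: hasNext() fills the buffer on demand and next() hands the buffered item over. This is a generic illustration, not the org.archive.util.iterator.LookaheadIterator source; the names LookaheadSketch, Lookahead, and nonBlank are hypothetical, and the sketch assumes null is never a valid element.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LookaheadSketch {
    // Base class: subclasses implement lookahead() to buffer the next item.
    public abstract static class Lookahead<T> implements Iterator<T> {
        protected T next; // buffered item, or null when nothing is buffered

        // Try to set 'next'; return false when the source is exhausted.
        protected abstract boolean lookahead();

        @Override
        public boolean hasNext() {
            return next != null || lookahead();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T result = next;
            next = null; // clear the buffer so the next call looks ahead again
            return result;
        }
    }

    // Example subclass: skip blank strings from an underlying iterator.
    public static Iterator<String> nonBlank(final Iterator<String> source) {
        return new Lookahead<String>() {
            @Override
            protected boolean lookahead() {
                while (source.hasNext()) {
                    String candidate = source.next();
                    if (!candidate.trim().isEmpty()) {
                        next = candidate;
                        return true;
                    }
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> it = nonBlank(Arrays.asList("a", " ", "b").iterator());
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```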
next() - Method in class org.archive.util.ms.PieceTable
Returns the next piece in the piece table.
nextEntry() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
nextFlushAllowableAfter - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
time-based throttle on flush-merge operations
nextIsValid - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
nextItemNumber - Variable in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
nextKey - Variable in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
Strong reference needed to avoid disappearance of key between hasNext and next
nextLink() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
nextLink() - Method in interface org.archive.extractor.LinkExtractor
Alternative to Iterator.next() which returns type Link.
nextLong() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
nextOrdinal - Variable in class org.archive.crawler.frontier.AbstractFrontier
ordinal numbers to assign to created CrawlURIs
nextProcessor() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the next processor to process this URI.
nextProcessorChain() - Method in class org.archive.crawler.datamodel.CrawlURI
Get the processor chain that should be processing this URI after the current chain is finished with it.
nextReadyTime - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Time (in milliseconds) when the HQ will next be ready to issue a URI for processing.
nextSerialNumber - Variable in class org.archive.crawler.framework.ToePool
 
nextWake - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
Task for next wake
NO_DIRECTIVES - Static variable in class org.archive.crawler.datamodel.Robotstxt
 
NO_MAX_IDLE - Static variable in class org.archive.io.WriterPool
Don't enforce a maximum number of idle instances in pool.
NO_TYPE_MIMETYPE - Static variable in class org.archive.util.MimetypeUtils
The 'no-type' content-type.
NoGzipMagicException - Exception in org.archive.io
 
NoGzipMagicException() - Constructor for exception org.archive.io.NoGzipMagicException
 
NOHEAD - Static variable in interface org.archive.io.ArchiveFileConstants
 
NON_HTML_PATH_EXTENSION - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
NON_HTML_PATH_EXTENSION - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
NONWHITESPACE_ENTRY_TRAILING_COMMENT - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
NoopUriUniqFilter - Class in org.archive.crawler.util
A UriUniqFilter that doesn't actually provide any uniqueness filter on presented items: all are passed through.
NoopUriUniqFilter() - Constructor for class org.archive.crawler.util.NoopUriUniqFilter
 
NORMAL - Static variable in class org.archive.crawler.datamodel.CandidateURI
Normal/low priority.
NORMAL - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Normal jsse behavior.
note(String) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Note item as seen, without passing through to receiver.
note(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
note(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
noteAboutToEmit(CrawlURI, WorkQueue) - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform fixups on a CrawlURI about to be returned via next().
noteAccess(long) - Method in class org.archive.util.fingerprint.LongFPSetCache
 
noteError(int) - Method in class org.archive.crawler.frontier.WorkQueue
Note an error and assess an extra penalty.
noteExhausted() - Method in class org.archive.crawler.scope.SeedFileIterator
Clean-up when hasNext() has returned false: close open files.
noteExhausted() - Method in class org.archive.util.iterator.TransformingIteratorWrapper
Any cleanup to occur when hasNext() is about to return false
noteExtractError(IOException, UURI, CharSequence) - Method in interface org.archive.extractor.ExtractErrorListener
Callback to report an extraction error.
noteLine() - Method in class org.archive.crawler.io.CrawlerJournal
Count and note a line
noteStart() - Method in class org.archive.crawler.framework.AbstractTracker
Notify tracker that crawl has begun.
noteStart() - Method in interface org.archive.crawler.framework.StatisticsTracking
Start the tracker's crawl timing.
NotExceedsDocumentLengthTresholdDecideRule - Class in org.archive.crawler.deciderules
 
NotExceedsDocumentLengthTresholdDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotExceedsDocumentLengthTresholdDecideRule
 
NotMatchesFilePatternDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp.
NotMatchesFilePatternDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotMatchesFilePatternDecideRule
Usual constructor.
NotMatchesListRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied regexp.
NotMatchesListRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotMatchesListRegExpDecideRule
Usual constructor.
NotMatchesRegExpDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied regexp.
NotMatchesRegExpDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotMatchesRegExpDecideRule
Usual constructor.
NOTMODIFIED - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
notModifiedBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
notModifiedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
notModifiedUrls - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
NotOnDomainsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set.
NotOnDomainsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotOnDomainsDecideRule
Usual constructor.
NotOnHostsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set.
NotOnHostsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotOnHostsDecideRule
Usual constructor.
NotSurtPrefixedDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set.
NotSurtPrefixedDecideRule(String) - Constructor for class org.archive.crawler.deciderules.NotSurtPrefixedDecideRule
Usual constructor.
NOVEL - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
novelBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
novelUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
novelUrls - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
NUM_RECORDS - Static variable in class org.archive.io.warc.WARCWriter
 
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter64bit
The number of weights used to create hash functions.
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorHTML
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorHTTP
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorJS
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorPDF
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorSWF
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.ExtractorUniversal
 
numberOfCURIsHandled - Variable in class org.archive.crawler.extractor.TrapSuppressExtractor
 
numberOfCURIsSuppressed - Variable in class org.archive.crawler.extractor.TrapSuppressExtractor
 
numberOfFormsProcessed - Variable in class org.archive.crawler.extractor.JerichoExtractorHTML
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorHTML
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorHTTP
 
numberOfLinksExtracted - Static variable in class org.archive.crawler.extractor.ExtractorJS
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorPDF
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorSWF
 
numberOfLinksExtracted - Variable in class org.archive.crawler.extractor.ExtractorUniversal
 

O

OBJECT - Static variable in class org.archive.crawler.settings.SettingsHandler
 
ObjectIdentityBdbCache<V> - Class in org.archive.util
A BDB JE backed object cache.
ObjectIdentityBdbCache() - Constructor for class org.archive.util.ObjectIdentityBdbCache
Constructor.
ObjectIdentityBdbCache.LowMemoryCanary - Class in org.archive.util
 
ObjectIdentityBdbCache.LowMemoryCanary() - Constructor for class org.archive.util.ObjectIdentityBdbCache.LowMemoryCanary
 
ObjectIdentityCache<K,V> - Interface in org.archive.util
An object cache for create-once-by-name-and-then-reuse objects.
ObjectIdentityMemCache<V> - Class in org.archive.util
Trivial all-in-memory object cache, using a single internal ConcurrentHashMap.
ObjectIdentityMemCache() - Constructor for class org.archive.util.ObjectIdentityMemCache
 
ObjectIdentityMemCache(int, float, int) - Constructor for class org.archive.util.ObjectIdentityMemCache
 
ObjectPlusFilesInputStream - Class in org.archive.io
Enhanced ObjectInputStream with support for restoring files that had been saved, in parallel with object serialization.
ObjectPlusFilesInputStream(InputStream, File) - Constructor for class org.archive.io.ObjectPlusFilesInputStream
Instantiate over the given stream and using the supplied auxiliary storage directory.
ObjectPlusFilesOutputStream - Class in org.archive.io
Enhanced ObjectOutputStream which maintains (a stack of) auxiliary directories and offers convenience methods for serialized objects to save their related disk files alongside their serialized version.
ObjectPlusFilesOutputStream(OutputStream, File) - Constructor for class org.archive.io.ObjectPlusFilesOutputStream
Constructor
objectToEntry(Object, DatabaseEntry) - Method in class org.archive.crawler.frontier.RecyclingSerialBinding
Copies superclass simply to allow different source for FastOutputStream.
OCCUPIED_SUFFIX - Static variable in interface org.archive.io.ArchiveFileConstants
Suffix given to files currently in use.
offer(E) - Method in class org.archive.queue.StoredQueue
 
OFFSET_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for offset field.
OFFSET_HEADER_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for the ARC Header Offset field.
oldFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
OnDomainsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set.
OnDomainsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.OnDomainsDecideRule
Usual constructor.
oneLineReportThreads() - Method in class org.archive.crawler.framework.CrawlController
 
OneLineSimpleLogger - Class in org.archive.util
Logger that writes entry on one line with less verbose date.
OneLineSimpleLogger() - Constructor for class org.archive.util.OneLineSimpleLogger
 
OnHostsDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set.
OnHostsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.OnHostsDecideRule
Usual constructor.
OP_BLOCK_URIS - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
OP_CHECKPOINT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_CLEAN - Static variable in class org.archive.util.JEMBeanHelper
 
OP_CLOSE - Static variable in class org.archive.util.JEApplicationMBean
This MBean provides a close operation to release the JE environment.
OP_DB_NAMES - Static variable in class org.archive.util.JEMBeanHelper
 
OP_DB_STAT - Static variable in class org.archive.crawler.admin.CrawlJob
 
OP_DB_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_ENV_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_ENV_STAT_STR - Static variable in class org.archive.util.JEMBeanHelper
 
OP_EVICT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_LOCK_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
OP_LOCK_STAT_STR - Static variable in class org.archive.util.JEMBeanHelper
 
OP_OPEN - Static variable in class org.archive.util.JEApplicationMBean
This MBean provides an open operation to open the JE environment.
OP_PAUSE - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
OP_SYNC - Static variable in class org.archive.util.JEMBeanHelper
 
OP_TERMINATE - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
OP_TXN_STAT - Static variable in class org.archive.util.JEMBeanHelper
 
open(Environment, DatabaseConfig) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
OPEN - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Trust anything given us.
open(InputStream) - Method in class org.archive.io.RecordingInputStream
 
open() - Method in class org.archive.io.RecordingOutputStream
Wrap the given stream, both recording and passing along any data written to this RecordingOutputStream.
open(OutputStream) - Method in class org.archive.io.RecordingOutputStream
Wrap the given stream, both recording and passing along any data written to this RecordingOutputStream.
open() - Method in class org.archive.util.ms.DefaultEntry
 
open() - Method in interface org.archive.util.ms.Entry
 
openConnection(URL) - Method in class org.archive.net.md5.Handler
 
openConnection(URL) - Method in class org.archive.net.rsync.Handler
 
openConnection(URL) - Method in class org.archive.net.s3.Handler
 
openDatabase(Environment, String) - Method in class org.archive.util.CachedBdbMap
Deprecated.  
openDatabase(Environment, String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
openDataConnection(int, String) - Method in class org.archive.net.ClientFTP
Opens a data connection.
openDbCount - Variable in class org.archive.util.CachedBdbMap.DbEnvironmentEntry
Deprecated.  
OPERATION_LIST - Static variable in class org.archive.crawler.Heritrix
 
ORDER_EXCLUDE - Static variable in class org.archive.crawler.admin.CrawlJob
Don't add the following crawl-order items.
ORDER_FILE_NAME - Static variable in class org.archive.crawler.admin.CrawlJobHandler
 
ordinal - Variable in class org.archive.crawler.datamodel.CrawlURI
Monotonically increasing number within a crawl; useful for tending towards breadth-first ordering.
OrFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and DecideRule.
OrFilter(String, String) - Constructor for class org.archive.crawler.filter.OrFilter
Deprecated.  
OrFilter(String) - Constructor for class org.archive.crawler.filter.OrFilter
Deprecated.  
org.archive.crawler - package org.archive.crawler
Introduction to Heritrix.
org.archive.crawler.admin - package org.archive.crawler.admin
Contains classes that the web UI uses to monitor and control crawls.
org.archive.crawler.admin.ui - package org.archive.crawler.admin.ui
 
org.archive.crawler.datamodel - package org.archive.crawler.datamodel
 
org.archive.crawler.datamodel.credential - package org.archive.crawler.datamodel.credential
Contains HTML form login and basic and digest credentials used by Heritrix when logging into sites.
org.archive.crawler.deciderules - package org.archive.crawler.deciderules
Provides classes for a simple decision rules framework.
org.archive.crawler.deciderules.recrawl - package org.archive.crawler.deciderules.recrawl
 
org.archive.crawler.event - package org.archive.crawler.event
 
org.archive.crawler.extractor - package org.archive.crawler.extractor
 
org.archive.crawler.fetcher - package org.archive.crawler.fetcher
 
org.archive.crawler.filter - package org.archive.crawler.filter
 
org.archive.crawler.framework - package org.archive.crawler.framework
 
org.archive.crawler.framework.exceptions - package org.archive.crawler.framework.exceptions
 
org.archive.crawler.frontier - package org.archive.crawler.frontier
 
org.archive.crawler.io - package org.archive.crawler.io
 
org.archive.crawler.postprocessor - package org.archive.crawler.postprocessor
 
org.archive.crawler.prefetch - package org.archive.crawler.prefetch
 
org.archive.crawler.processor - package org.archive.crawler.processor
 
org.archive.crawler.processor.recrawl - package org.archive.crawler.processor.recrawl
 
org.archive.crawler.scope - package org.archive.crawler.scope
 
org.archive.crawler.selftest - package org.archive.crawler.selftest
Provides the client-side aspect of the heritrix integration self test.
org.archive.crawler.settings - package org.archive.crawler.settings
Provides classes for the settings framework.
org.archive.crawler.settings.refinements - package org.archive.crawler.settings.refinements
 
org.archive.crawler.url - package org.archive.crawler.url
 
org.archive.crawler.url.canonicalize - package org.archive.crawler.url.canonicalize
 
org.archive.crawler.util - package org.archive.crawler.util
 
org.archive.crawler.writer - package org.archive.crawler.writer
 
org.archive.extractor - package org.archive.extractor
 
org.archive.httpclient - package org.archive.httpclient
Provides specializations on apache jakarta commons httpclient.
org.archive.io - package org.archive.io
 
org.archive.io.arc - package org.archive.io.arc
ARC file reading and writing.
org.archive.io.warc - package org.archive.io.warc
Experimental WARC Writer and Readers.
org.archive.net - package org.archive.net
 
org.archive.net.md5 - package org.archive.net.md5
 
org.archive.net.rsync - package org.archive.net.rsync
 
org.archive.net.s3 - package org.archive.net.s3
 
org.archive.queue - package org.archive.queue
 
org.archive.uid - package org.archive.uid
A unique ID generator.
org.archive.util - package org.archive.util
 
org.archive.util.anvl - package org.archive.util.anvl
Parsers and Writers for the (expired) Internet-Draft A Name-Value Language (ANVL).
org.archive.util.bdbje - package org.archive.util.bdbje
 
org.archive.util.fingerprint - package org.archive.util.fingerprint
 
org.archive.util.iterator - package org.archive.util.iterator
 
org.archive.util.ms - package org.archive.util.ms
Memory-efficient reading of .doc files.
OriginSeekInputStream - Class in org.archive.io
Alters the origin of some other SeekInputStream.
OriginSeekInputStream(SeekInputStream, long) - Constructor for class org.archive.io.OriginSeekInputStream
Constructor.
os - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The underlying output stream.
out - Variable in class org.archive.crawler.io.CrawlerJournal
Stream on which we record frontier events.
outLinks - Variable in class org.archive.crawler.datamodel.CrawlURI
All discovered outbound Links (navlinks, embeds, etc.) Can either contain Link instances or CandidateURI instances, or both.
outlinks(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorTool
 
outlinksSize() - Method in class org.archive.crawler.datamodel.CrawlURI
 
outOfScope(CandidateURI) - Method in class org.archive.crawler.framework.Scoper
Called when a CandidateUri is ruled out of scope.
outOfScope(CandidateURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
 
outOfScope(CandidateURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
Called when a CandidateUri is ruled out of scope.
output(String) - Method in class org.archive.io.arc.ARCReader
 
output(ARCReader, String) - Static method in class org.archive.io.arc.ARCReader
Write out the arcfile.
output(String) - Method in class org.archive.io.ArchiveReader
 
output(WARCReader, String) - Static method in class org.archive.io.warc.WARCReader
Write out the arcfile.
outputCdx(String) - Method in class org.archive.io.ArchiveRecord
 
outputRecord(String) - Method in class org.archive.io.arc.ARCReader
 
outputRecord(String) - Method in class org.archive.io.ArchiveReader
Output passed record using passed format specifier.
outputRecord(ArchiveReader, String) - Static method in class org.archive.io.ArchiveReader
Output passed record using passed format specifier.
outputTemplate - Variable in class org.archive.util.iterator.RegexpLineIterator
 
outputWrap(OutputStream) - Method in class org.archive.util.HttpRecorder
Wrap the provided stream with the internal RecordingOutputStream. open() throws an exception if RecordingOutputStream is already open.
overMaxRetries(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 

P

PaddingStringBuffer - Class in org.archive.util
StringBuffer-like utility which can add spaces to reach a certain column.
PaddingStringBuffer() - Constructor for class org.archive.util.PaddingStringBuffer
Create a new PaddingStringBuffer
padTo(int, int) - Static method in class org.archive.util.ArchiveUtils
Convert an int to a String, and pad it to pad spaces.
padTo(String, int) - Static method in class org.archive.util.ArchiveUtils
Pad the given String to pad characters wide by pre-pending spaces.
padTo(String, int, char) - Static method in class org.archive.util.ArchiveUtils
Pad the given String to pad characters wide by pre-pending padChar.
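The left-padding behaviour described for the padTo variants above can be illustrated with a minimal, self-contained sketch. It is not the ArchiveUtils source: the class name PadToSketch is hypothetical, and returning an over-long input unchanged is an assumption, since the index does not say how that case is handled.

```java
final class PadToSketch {
    /**
     * Left-pads s with padChar until it is pad characters wide.
     * Assumption: a string already at least pad wide is returned unchanged.
     */
    static String padTo(String s, int pad, char padChar) {
        if (s.length() >= pad) {
            return s;
        }
        StringBuilder sb = new StringBuilder(pad);
        for (int i = s.length(); i < pad; i++) {
            sb.append(padChar); // prepend padding first
        }
        return sb.append(s).toString(); // then the original text
    }

    public static void main(String[] args) {
        System.out.println(padTo("42", 5, '0'));
        System.out.println(padTo("abc", 6, ' '));
    }
}
```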
padTo(int) - Method in class org.archive.util.PaddingStringBuffer
Pad to a given column.
pageOutStaleEntries() - Method in class org.archive.util.ObjectIdentityBdbCache
An incremental, poll-based expunger.
parse(InputSource) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
parse(String) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
parse12DigitDate(String) - Static method in class org.archive.util.ArchiveUtils
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmm.
parse14DigitDate(String) - Static method in class org.archive.util.ArchiveUtils
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmss.
parse17DigitDate(String) - Static method in class org.archive.util.ArchiveUtils
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmssSSS.
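As a rough illustration of what parsing 12-, 14- and 17-digit date stamps involves, the sketch below uses only the JDK's SimpleDateFormat with a four-digit year (yyyyMMddHHmm, yyyyMMddHHmmss, yyyyMMddHHmmssSSS). It is not the ArchiveUtils implementation: the class name ArcDateSketch is hypothetical, and treating the stamps as UTC and rejecting out-of-range fields are assumptions made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class ArcDateSketch {
    /** Parses a 12-, 14- or 17-digit stamp, choosing the pattern by length. */
    static Date parse(String stamp) {
        String pattern;
        switch (stamp.length()) {
            case 12: pattern = "yyyyMMddHHmm"; break;
            case 14: pattern = "yyyyMMddHHmmss"; break;
            case 17: pattern = "yyyyMMddHHmmssSSS"; break;
            default:
                throw new IllegalArgumentException("Unsupported length: " + stamp);
        }
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: stamps are UTC
        fmt.setLenient(false);                        // reject e.g. month 13
        try {
            return fmt.parse(stamp);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a valid date stamp: " + stamp, e);
        }
    }

    public static void main(String[] args) {
        // Prints the parsed instant as milliseconds since the epoch.
        System.out.println(parse("20070101123045").getTime());
    }
}
```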
parseArcFilename(String) - Static method in class org.archive.io.arc.ARCUtils
 
parseAuthority(String, boolean) - Method in class org.archive.net.LaxURI
Coalesce the _host and _authority fields where possible.
parseDefineBits(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineBitsJPEG3(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineBitsLossless(InStream, int, boolean) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineButtonSound(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineFont(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineFont2(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineJPEG2(InStream, int) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineJPEGTables(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineShape(int, InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineSound(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineSprite(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseFilename(String) - Static method in class org.archive.net.UURI
 
parseFontInfo(InStream, int, boolean) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseHeaders(InputStream, String, long, boolean) - Method in class org.archive.io.warc.WARCRecord
Parse WARC Header Line and Named Fields.
parsePlaceObject2(InStream) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorTagParser
 
parseRevision(String) - Static method in class org.archive.io.Warc2Arc
 
parseUriReference(String, boolean) - Method in class org.archive.net.LaxURI
IA OVERRIDDEN IN LaxURI TO INCLUDE FIX FOR http://issues.apache.org/jira/browse/HTTPCLIENT-588 AND http://webteam.archive.org/jira/browse/HER-1268 . In order to avoid any possibility of conflict with non-ASCII characters, parse a URI reference as a String with the character encoding of the local system or the document.
PASS - Static variable in class org.archive.crawler.deciderules.DecideRule
 
patchLogging() - Static method in class org.archive.crawler.Heritrix
If the user hasn't altered the default logging parameters, tighten them up somewhat: some of our libraries are way too verbose at the INFO or WARNING levels.
PathDepthFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
PathDepthFilter(String) - Constructor for class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
PathologicalPathDecideRule - Class in org.archive.crawler.deciderules
Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (eg http://example.com/a/a/a/boo.html == 3 '/a' segments)
PathologicalPathDecideRule(String) - Constructor for class org.archive.crawler.deciderules.PathologicalPathDecideRule
Constructs a new PathologicalPathDecideRule.
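The check described for PathologicalPathDecideRule (counting identical, consecutive path segments, as in http://example.com/a/a/a/boo.html with 3 '/a' segments) can be sketched in plain Java. This is an illustration only, not the rule's actual implementation: the class RepeatedSegmentSketch, its method names, and the maxRepetitions threshold semantics are all hypothetical.

```java
import java.net.URI;

final class RepeatedSegmentSketch {
    /** Returns the longest run of identical, consecutive, non-empty path segments. */
    static int longestRun(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null) {
            return 0; // opaque URIs have no path
        }
        int best = 0;
        int run = 0;
        String prev = null;
        for (String seg : path.split("/")) {
            if (seg.isEmpty()) {
                continue; // skip the empty piece before the leading '/'
            }
            run = seg.equals(prev) ? run + 1 : 1;
            best = Math.max(best, run);
            prev = seg;
        }
        return best;
    }

    /** Assumption: a URI is "pathological" when a run exceeds maxRepetitions. */
    static boolean isPathological(String uri, int maxRepetitions) {
        return longestRun(uri) > maxRepetitions;
    }

    public static void main(String[] args) {
        String uri = "http://example.com/a/a/a/boo.html";
        System.out.println(longestRun(uri));
        System.out.println(isPathological(uri, 2));
    }
}
```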
PathologicalPathFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
PathologicalPathFilter(String) - Constructor for class org.archive.crawler.filter.PathologicalPathFilter
Deprecated. Constructs a new PathologicalPathFilter.
PathScope - Class in org.archive.crawler.scope
Deprecated. As of release 1.10.0. Replaced by DecidingScope.
PathScope(String) - Constructor for class org.archive.crawler.scope.PathScope
Deprecated.  
pattern - Variable in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
PatternMatcherRecycler - Class in org.archive.util
Utility class to retain a compiled Pattern and multiple corresponding Matcher instances for reuse.
PatternMatcherRecycler(Pattern) - Constructor for class org.archive.util.PatternMatcherRecycler
 
pause() - Method in class org.archive.crawler.admin.CrawlJob
 
pause() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should not release any URIs, instead holding all threads, until instructed otherwise.
pause() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
pause() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
PAUSE_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
PAUSED - Static variable in class org.archive.crawler.framework.CrawlController
 
pauseJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Cause the current job to pause.
PAUSING - Static variable in class org.archive.crawler.framework.CrawlController
 
PDFParser - Class in org.archive.crawler.extractor
Supports PDF parsing operations.
PDFParser(String) - Constructor for class org.archive.crawler.extractor.PDFParser
 
PDFParser(byte[]) - Constructor for class org.archive.crawler.extractor.PDFParser
 
peek() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns the URI with the earliest time of next processing.
peek(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Return the topmost queue item -- and remember it, such that even later higher-priority inserts don't change it.
peek() - Method in class org.archive.queue.MemQueue
 
peek() - Method in interface org.archive.queue.Queue
Give the top object in the queue, leaving it in place to be returned by future peek() or dequeue() invocations.
peek() - Method in interface org.archive.queue.Stack
Deprecated.  
peek() - Method in class org.archive.queue.StoredQueue
 
peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Returns first item from queue (does not delete)
pend(long, CandidateURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Place the given FP/CandidateURI pair into the pending set, awaiting a merge to determine if it's actually accepted.
pendDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pendDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pending() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Count of items added, but not yet filtered in or out.
pending() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pending() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
PENDING_JOBS_OPER - Static variable in class org.archive.crawler.Heritrix
 
pendingDeletes - Static variable in class org.archive.util.FileUtils
 
pendingSet - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
Items awaiting merge. TODO: consider only sorting just pre-merge. TODO: consider using a fastutil long->Object class. TODO: consider actually writing items to disk file, as in Najork/Heydon.
pendingUris - Variable in class org.archive.crawler.frontier.BdbFrontier
all URIs scheduled to be crawled
PERCENT_SIGN - Static variable in class org.archive.net.UURIFactory
 
percentOfDiscoveredUrisCompleted() - Method in class org.archive.crawler.admin.StatisticsTracker
This returns the number of completed URIs as a percentage of the total number of URIs encountered (should be inverse to the discovery curve)
performHeritrixShutDown() - Static method in class org.archive.crawler.Heritrix
Exit program.
performHeritrixShutDown(int) - Static method in class org.archive.crawler.Heritrix
Exit program.
persistKeyFor(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.PersistProcessor
Return a preferred String key for persisting the given CrawlURI's AList state.
PersistLoadProcessor - Class in org.archive.crawler.processor.recrawl
Load CrawlURI attributes stored from an earlier fetch out of persistent storage, for consultation during a recrawl.
PersistLoadProcessor(String) - Constructor for class org.archive.crawler.processor.recrawl.PersistLoadProcessor
Usual constructor
PersistLogProcessor - Class in org.archive.crawler.processor.recrawl
Log CrawlURI attributes from latest fetch for consultation by a later recrawl.
PersistLogProcessor(String) - Constructor for class org.archive.crawler.processor.recrawl.PersistLogProcessor
Usual constructor
PersistOnlineProcessor - Class in org.archive.crawler.processor.recrawl
Common superclass for persisting Processors which directly store/load to persistence (as opposed to logging for batch load later).
PersistOnlineProcessor(String, String) - Constructor for class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
Usual constructor
PersistProcessor - Class in org.archive.crawler.processor.recrawl
Superclass for Processors which utilize BDB-JE for URI state (including most notably history) persistence.
PersistProcessor(String, String) - Constructor for class org.archive.crawler.processor.recrawl.PersistProcessor
Usual constructor
PersistStoreProcessor - Class in org.archive.crawler.processor.recrawl
Store CrawlURI attributes from latest fetch to persistent storage for consultation by a later recrawl.
PersistStoreProcessor(String) - Constructor for class org.archive.crawler.processor.recrawl.PersistStoreProcessor
Usual constructor
Piece - Class in org.archive.util.ms
 
Piece(int, int, int, boolean) - Constructor for class org.archive.util.ms.Piece
 
pieceFor(int) - Method in class org.archive.util.ms.PieceTable
Returns the piece containing the given character position.
PieceReader - Class in org.archive.util.ms
 
PieceReader(PieceTable, SeekInputStream) - Constructor for class org.archive.util.ms.PieceReader
 
PieceTable - Class in org.archive.util.ms
The piece table of a .doc file.
PieceTable(SeekInputStream, int, int, int) - Constructor for class org.archive.util.ms.PieceTable
Constructor.
PIPE - Static variable in class org.archive.net.UURIFactory
 
PIPE_PATTERN - Static variable in class org.archive.net.UURIFactory
 
PLACEHOLDER_RECORD_LENGTH_STRING - Static variable in interface org.archive.io.warc.WARCConstants
Placeholder for length in Header line.
policyFor(CrawlerSettings, BufferedReader, RobotsHonoringPolicy) - Static method in class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
politenessDelayFor(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Update any scheduling structures with the new information in this CrawlURI.
poll() - Method in class org.archive.queue.StoredQueue
 
pop() - Method in interface org.archive.queue.Stack
Deprecated. Remove and return item from top of Stack
popAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesInputStream
Discard the top auxiliary directory.
popAuxiliaryDirectory() - Method in class org.archive.io.ObjectPlusFilesOutputStream
Remove the top subdirectory.
populate(CrawlURI, HttpClient, HttpMethod, String) - Method in class org.archive.crawler.datamodel.credential.Credential
 
populate(CrawlURI, HttpClient, HttpMethod, String) - Method in class org.archive.crawler.datamodel.credential.HtmlFormCredential
 
populate(CrawlURI, HttpClient, HttpMethod, String) - Method in class org.archive.crawler.datamodel.credential.Rfc2617Credential
 
populatePersistEnv(String, File) - Static method in class org.archive.crawler.processor.recrawl.PersistProcessor
Populates a new environment db from an old environment db or a persist log.
PortnumberCriteria - Class in org.archive.crawler.settings.refinements
A refinement criterion that checks if a URI matches a specific port number.
PortnumberCriteria() - Constructor for class org.archive.crawler.settings.refinements.PortnumberCriteria
Create a new instance of PortnumberCriteria.
PortnumberCriteria(String) - Constructor for class org.archive.crawler.settings.refinements.PortnumberCriteria
Create a new instance of PortnumberCriteria.
PORTREGEX - Static variable in class org.archive.net.UURIFactory
Authority port number regex.
pos - Variable in class org.archive.io.RecyclingFastBufferedOutputStream
The current position in the buffer.
position() - Method in class org.archive.io.ArchiveReader.RandomAccessBufferedInputStream
 
position(long) - Method in class org.archive.io.ArchiveReader.RandomAccessBufferedInputStream
 
position - Variable in class org.archive.io.ArchiveRecord
Position w/i the Record content, within in.
position() - Method in class org.archive.io.ArraySeekInputStream
Returns the position of the stream.
position(long) - Method in class org.archive.io.ArraySeekInputStream
Repositions the stream.
position() - Method in class org.archive.io.BufferedSeekInputStream
Returns the stream's current position.
position(long) - Method in class org.archive.io.BufferedSeekInputStream
Seeks to the given position.
position(long) - Method in class org.archive.io.GzippedInputStream
Seek to passed offset.
position() - Method in class org.archive.io.GzippedInputStream
 
position() - Method in class org.archive.io.OriginSeekInputStream
Returns the position of the underlying stream relative to the origin.
position(long) - Method in class org.archive.io.OriginSeekInputStream
Positions the underlying stream relative to the origin.
position() - Method in class org.archive.io.RandomAccessInputStream
 
position(long) - Method in class org.archive.io.RandomAccessInputStream
 
position(long) - Method in class org.archive.io.ReplayInputStream
Reposition the stream.
position() - Method in class org.archive.io.ReplayInputStream
 
position(long) - Method in class org.archive.io.RepositionableInputStream
 
position() - Method in class org.archive.io.RepositionableInputStream
 
position(long) - Method in class org.archive.io.SafeSeekInputStream
 
position() - Method in class org.archive.io.SafeSeekInputStream
 
position() - Method in class org.archive.util.ms.BlockInputStream
 
position(long) - Method in class org.archive.util.ms.BlockInputStream
 
position() - Method in class org.archive.util.ms.PieceReader
 
position(long) - Method in class org.archive.util.ms.PieceReader
 
postDeregister() - Method in class org.archive.crawler.admin.CrawlJob
 
postDeregister() - Method in class org.archive.crawler.Heritrix
 
postRegister(Boolean) - Method in class org.archive.crawler.admin.CrawlJob
 
postRegister(Boolean) - Method in class org.archive.crawler.Heritrix
 
postRestoreTasks - Variable in class org.archive.io.ObjectPlusFilesInputStream
 
postWriteRecordTasks() - Method in class org.archive.io.WriterPoolMember
Post file write tasks.
power - Variable in class org.archive.util.BloomFilter64bit
if bitfield is an exact power of 2 in length, it is this power
PreconditionEnforcer - Class in org.archive.crawler.prefetch
Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
PreconditionEnforcer(String) - Constructor for class org.archive.crawler.prefetch.PreconditionEnforcer
 
preDeregister() - Method in class org.archive.crawler.admin.CrawlJob
 
preDeregister() - Method in class org.archive.crawler.Heritrix
 
PredicatedDecideRule - Class in org.archive.crawler.deciderules
Rule which applies the configured decision only if a test evaluates to true.
PredicatedDecideRule(String) - Constructor for class org.archive.crawler.deciderules.PredicatedDecideRule
 
prefixFrom(String) - Method in class org.archive.crawler.deciderules.OnDomainsDecideRule
 
prefixFrom(String) - Method in class org.archive.crawler.deciderules.OnHostsDecideRule
 
prefixFrom(String) - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
prefixFromPlain(String) - Static method in class org.archive.util.SurtPrefixSet
Given a plain URI or hostname/hostname+path, deduce an implied SURT prefix from it.
PrefixSet - Class in org.archive.util
Utility class for maintaining sorted set of string prefixes.
PrefixSet() - Constructor for class org.archive.util.PrefixSet
 
PreJ15Utils - Class in org.archive.util
Deprecated. Will be removed post 1.10.0 Heritrix.
PreJ15Utils() - Constructor for class org.archive.util.PreJ15Utils
Deprecated.  
preNext(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
prepareHeritrixShutDown() - Static method in class org.archive.crawler.Heritrix
Prepares for program shutdown.
PREPARING - Static variable in class org.archive.crawler.framework.CrawlController
 
prepend(char) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Prepends one character, as a lump, to this string.
preRegister(MBeanServer, ObjectName) - Method in class org.archive.crawler.admin.CrawlJob
 
preRegister(MBeanServer, ObjectName) - Method in class org.archive.crawler.Heritrix
 
PREREQ_HOP - Static variable in class org.archive.crawler.extractor.Link
implied prerequisite links, like dns or robots
PREREQ_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for prerequisite without other context
PrerequisiteAcceptDecideRule - Class in org.archive.crawler.deciderules
Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position).
PrerequisiteAcceptDecideRule(String) - Constructor for class org.archive.crawler.deciderules.PrerequisiteAcceptDecideRule
 
Preselector - Class in org.archive.crawler.prefetch
If set to recheck the crawl's scope, gives a yes/no on whether a CrawlURI should be processed at all.
Preselector(String) - Constructor for class org.archive.crawler.prefetch.Preselector
Constructor.
preWriteRecordTasks() - Method in class org.archive.io.WriterPoolMember
Pre-write tasks.
primaryKeyBinding - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
A binding for the serialization of the primary key (URI string)
primaryUriDB - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Database containing the URI priority queue, indexed by the URI string.
printOutSeeds(SettingsHandler, String) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Print complete seeds list on passed in PrintWriter.
printOutSeeds(SettingsHandler, Writer) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Print complete seeds list on passed in PrintWriter.
printStackTrace() - Method in exception org.archive.io.RecoverableIOException
 
printStackTrace(PrintStream) - Method in exception org.archive.io.RecoverableIOException
 
printStackTrace(PrintWriter) - Method in exception org.archive.io.RecoverableIOException
 
printUsage(PrintWriter, int, String) - Method in class org.archive.crawler.CommandLineParser.HeritrixHelpFormatter
 
printUsage(PrintWriter, int, String, Options) - Method in class org.archive.crawler.CommandLineParser.HeritrixHelpFormatter
 
PRIORITY_AVERAGE - Static variable in class org.archive.crawler.admin.CrawlJob
average
PRIORITY_CRITICAL - Static variable in class org.archive.crawler.admin.CrawlJob
highest
PRIORITY_HIGH - Static variable in class org.archive.crawler.admin.CrawlJob
high
PRIORITY_LOW - Static variable in class org.archive.crawler.admin.CrawlJob
low
PRIORITY_MINIMAL - Static variable in class org.archive.crawler.admin.CrawlJob
lowest
process(CrawlURI) - Method in class org.archive.crawler.framework.Processor
Perform processing on the given CrawlURI.
processBdbLogs(File, String) - Method in class org.archive.crawler.framework.CrawlController
 
processedBytesAfterLastEmittedURI - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
processedDocsPerSec - Variable in class org.archive.crawler.admin.StatisticsSummary
 
processedDocsPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
processedDocsPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns the number of documents that have been processed per second over the life of the crawl (as of last snapshot)
processedKBPerSec() - Method in class org.archive.crawler.admin.StatisticsTracker
 
processedKBPerSec() - Method in interface org.archive.crawler.framework.StatisticsTracking
Calculates the rate, in KB per second, at which data has been processed over the life of the crawl (as of last snapshot).
processedSeedsRecords - Variable in class org.archive.crawler.admin.StatisticsSummary
Keep track of processed seeds
processedSeedsRecords - Variable in class org.archive.crawler.admin.StatisticsTracker
Record of seeds' latest actions.
processEmbed(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processEmbed(CrawlURI, CharSequence, CharSequence, char) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processEmbed(CharSequence, CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processForm(CrawlURI, Element) - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
 
processGeneralTag(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processGeneralTag(CrawlURI, Element, Attributes) - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
 
processGeneralTag(CharSequence, CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processingCleanup() - Method in class org.archive.crawler.datamodel.CrawlURI
Clean up after a run through the processing chain.
processingUriDB - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
A database containing those URIs that are currently being processed.
processLink(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Handle generic HREF cases.
processLink(CharSequence, CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processMeta(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Process metadata tags.
processMeta(CrawlURI, Element) - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
 
processMeta(CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
Processor - Class in org.archive.crawler.framework
Base class for URI processing classes.
Processor(String, String) - Constructor for class org.archive.crawler.framework.Processor
 
ProcessorChain - Class in org.archive.crawler.framework
This class groups together a number of processors that logically fit together.
ProcessorChain(MapType) - Constructor for class org.archive.crawler.framework.ProcessorChain
Construct a new processor chain.
ProcessorChainList - Class in org.archive.crawler.framework
A list of all the ProcessorChains.
ProcessorChainList(CrawlOrder) - Constructor for class org.archive.crawler.framework.ProcessorChainList
Constructs a new ProcessorChainList.
processorCount() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the total number of all processors in all the chains.
PROCESSORS_REPORT - Static variable in class org.archive.crawler.framework.CrawlController
 
processScript(CrawlURI, CharSequence, int) - Method in class org.archive.crawler.extractor.AggressiveExtractorHTML
 
processScript(CrawlURI, CharSequence, int) - Method in class org.archive.crawler.extractor.ExtractorHTML
 
processScript(CrawlURI, Element) - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
 
processScript(CharSequence, int) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processScriptCode(CrawlURI, CharSequence) - Method in class org.archive.crawler.extractor.ExtractorHTML
Extract the (java)script source in the given CharSequence.
processScriptCode(CharSequence) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processStyle(CrawlURI, CharSequence, int) - Method in class org.archive.crawler.extractor.ExtractorHTML
Process style text.
processStyle(CrawlURI, Element) - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
 
processStyle(CharSequence, int) - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
 
processStyleCode(CrawlURI, CharSequence, CrawlController) - Static method in class org.archive.crawler.extractor.ExtractorCSS
 
processURIString(String) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
 
ProcessUtils - Class in org.archive.util
Class to run an external process.
ProcessUtils() - Constructor for class org.archive.util.ProcessUtils
 
ProcessUtils.ProcessResult - Class in org.archive.util
Data structure to hold result of a process exec.
ProcessUtils.ProcessResult(String[], int, String, String) - Constructor for class org.archive.util.ProcessUtils.ProcessResult
 
ProcessUtils.StreamGobbler - Class in org.archive.util
Thread to gobble up an output stream.
ProcessUtils.StreamGobbler(InputStream, String) - Constructor for class org.archive.util.ProcessUtils.StreamGobbler
 
processXml(CrawlURI, CharSequence, CrawlController) - Static method in class org.archive.crawler.extractor.ExtractorXML
 
PROFILE_REVISIT_IDENTICAL_DIGEST - Static variable in interface org.archive.io.warc.WARCConstants
 
PROFILE_REVISIT_NOT_MODIFIED - Static variable in interface org.archive.io.warc.WARCConstants
 
profileLog - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
profileLog(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
profileLog - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
profileLog(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
PROFILES_DIR_NAME - Static variable in class org.archive.crawler.admin.CrawlJobHandler
Name of the profiles directory.
PROG_STATS - Static variable in class org.archive.crawler.admin.CrawlJob
 
PROGRESS_STATISTICS_LEGEND_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
PROGRESS_STATISTICS_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.admin.StatisticsTracker
 
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.framework.AbstractTracker
A method for logging current crawler state.
progressStatisticsEvent(EventObject) - Method in class org.archive.crawler.framework.CrawlController
Called whenever a progress statistics logging event occurs.
progressStatisticsLegend() - Method in class org.archive.crawler.framework.AbstractTracker
 
progressStatisticsLegend() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
progressStatisticsLegend(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
progressStatisticsLegend(PrintWriter) - Method in interface org.archive.util.ProgressStatisticsReporter
 
progressStatisticsLine(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
progressStatisticsLine(PrintWriter) - Method in interface org.archive.util.ProgressStatisticsReporter
 
ProgressStatisticsReporter - Interface in org.archive.util
 
PROPERTIES - Static variable in class org.archive.crawler.Heritrix
Name of the heritrix properties file.
PROPERTIES_KEY - Static variable in class org.archive.crawler.Heritrix
Name of the key to use specifying alternate heritrix properties on command line.
PropertyUtils - Class in org.archive.util
 
PropertyUtils() - Constructor for class org.archive.util.PropertyUtils
 
protocolCommandSent(ProtocolCommandEvent) - Method in class org.archive.net.ClientFTP
 
protocolReplyReceived(ProtocolCommandEvent) - Method in class org.archive.net.ClientFTP
 
PublicSuffixes - Class in org.archive.net
Utility class for making use of the information about 'public suffixes' at http://publicsuffix.org.
PublicSuffixes() - Constructor for class org.archive.net.PublicSuffixes
 
publish(LogRecord) - Method in class org.archive.io.SinkHandler
 
push(String) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFActions
 
push(Object) - Method in interface org.archive.queue.Stack
Deprecated. Add object to top of Stack
pushAuxiliaryDirectory(String) - Method in class org.archive.io.ObjectPlusFilesInputStream
Push another default storage directory for use until popped.
pushAuxiliaryDirectory(String) - Method in class org.archive.io.ObjectPlusFilesOutputStream
Add another subdirectory for any file-capture needs during the current serialization.
put(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Put the given CrawlURI in at the appropriate place.
put(String, Object) - Method in class org.archive.crawler.settings.DataContainer
 
put(String, MBeanAttributeInfo, Object) - Method in class org.archive.crawler.settings.DataContainer
 
put(String, CrawlerSettings) - Method in class org.archive.crawler.settings.SoftSettingsHash
Associates the specified settings object with the specified key in this hash.
put(SoftSettingsHash.SettingsEntry) - Method in class org.archive.crawler.settings.SoftSettingsHash
 
put(K, V) - Method in class org.archive.util.CachedBdbMap
Deprecated. Map.put() implementation.
putIfAbsent(K, V) - Method in class org.archive.util.CachedBdbMap
Deprecated. A composite putIfAbsent() over memMap and diskMap.
putInt(String, int) - Method in class org.archive.crawler.datamodel.CandidateURI
 
putLong(String, long) - Method in class org.archive.crawler.datamodel.CandidateURI
 
putObject(String, Object) - Method in class org.archive.crawler.datamodel.CandidateURI
 
putSettings(String, CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsCache
Add a settings object to the cache.
putString(String, String) - Method in class org.archive.crawler.datamodel.CandidateURI
 

Q

qualifyRecordID(URI, String, String) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
qualifyRecordID(URI, Map<String, String>) - Method in interface org.archive.uid.Generator
Append (or if already present, update) qualifiers to passed recordId.
qualifyRecordID(URI, Map<String, String>) - Method in class org.archive.uid.GeneratorFactory
 
qualifyRecordID(URI, Map<String, String>) - Method in class org.archive.uid.UUIDGenerator
 
QUERY_SAFE - Static variable in class org.archive.net.LaxURLCodec
 
Queue<T> - Interface in org.archive.queue
An abstract queue.
queue - Variable in class org.archive.queue.QueueTestBase
the queue object to be tested
QueueAssignmentPolicy - Class in org.archive.crawler.frontier
Establishes a mapping from CrawlURIs to String keys (queue names).
QueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.QueueAssignmentPolicy
 
QueueCat - Class in org.archive.queue
Command-line tool that displays serialized object streams in a line-oriented format.
QueueCat() - Constructor for class org.archive.queue.QueueCat
 
queueDb - Variable in class org.archive.queue.StoredQueue
 
queuedUriCount - Variable in class org.archive.crawler.admin.StatisticsTracker
 
queuedUriCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Number of URIs queued up and waiting for processing.
queuedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs queued up and waiting for processing.
queuedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
queuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Number of URIs queued up and waiting for processing.
queuedUriCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
queueMap - Variable in class org.archive.queue.StoredQueue
 
QueueOverbudgetDecideRule - Class in org.archive.crawler.deciderules
Applies configured decision to every candidate URI that would overbudget its queue.
QueueOverbudgetDecideRule(String) - Constructor for class org.archive.crawler.deciderules.QueueOverbudgetDecideRule
 
QueueTestBase - Class in org.archive.queue
JUnit test suite for Queue.
QueueTestBase(String) - Constructor for class org.archive.queue.QueueTestBase
Create a new QueueTestBase object.
quickCache - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
cache of most recently seen FPs
quickContains(long) - Method in class org.archive.util.AbstractLongFPSet
Low-cost, non-definitive (except when true) contains test.
quickContains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
quickContains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Do a contains() check that doesn't require laggy activity (eg disk IO).
quickContains(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
quickDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
quickDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
QUOT - Static variable in class org.archive.net.UURIFactory
 
QuotaEnforcer - Class in org.archive.crawler.prefetch
A simple quota enforcer.
QuotaEnforcer(String) - Constructor for class org.archive.crawler.prefetch.QuotaEnforcer
Constructor.

R

raAppend(int, String) - Method in class org.archive.util.PaddingStringBuffer
Append a string, right-aligned to the given column.
raAppend(int, int) - Method in class org.archive.util.PaddingStringBuffer
Append an int, right-aligned to the given column.
raAppend(int, long) - Method in class org.archive.util.PaddingStringBuffer
Append a long, right-aligned to the given column.
raf - Variable in class org.archive.io.RandomAccessOutputStream
 
RandomAccessInputStream - Class in org.archive.io
Wraps a RandomAccessFile with an InputStream interface.
RandomAccessInputStream(RandomAccessFile) - Constructor for class org.archive.io.RandomAccessInputStream
Constructor.
RandomAccessInputStream(File) - Constructor for class org.archive.io.RandomAccessInputStream
Constructor.
RandomAccessInputStream(File, long) - Constructor for class org.archive.io.RandomAccessInputStream
Constructor.
RandomAccessInputStream(RandomAccessFile, boolean, long) - Constructor for class org.archive.io.RandomAccessInputStream
 
RandomAccessOutputStream - Class in org.archive.io
Wraps a RandomAccessFile with an OutputStream interface.
RandomAccessOutputStream(RandomAccessFile) - Constructor for class org.archive.io.RandomAccessOutputStream
Wrap the given RandomAccessFile
RANGE - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
RANGE_PREFIX - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
RCURBRACKET - Static variable in class org.archive.net.UURIFactory
 
RCURBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 
read(String) - Method in interface org.archive.crawler.framework.AlertManager
 
read() - Method in class org.archive.io.arc.ARCRecord
 
read(byte[], int, int) - Method in class org.archive.io.arc.ARCRecord
 
read() - Method in class org.archive.io.ArchiveRecord
 
read(byte[], int, int) - Method in class org.archive.io.ArchiveRecord
 
read() - Method in class org.archive.io.ArraySeekInputStream
 
read(byte[], int, int) - Method in class org.archive.io.ArraySeekInputStream
 
read(byte[]) - Method in class org.archive.io.ArraySeekInputStream
 
read() - Method in class org.archive.io.BufferedSeekInputStream
 
read(byte[], int, int) - Method in class org.archive.io.BufferedSeekInputStream
 
read(byte[]) - Method in class org.archive.io.BufferedSeekInputStream
 
read() - Method in class org.archive.io.CompositeFileInputStream
 
read(byte[], int, int) - Method in class org.archive.io.CompositeFileInputStream
 
read(byte[]) - Method in class org.archive.io.CompositeFileInputStream
 
read() - Method in class org.archive.io.OriginSeekInputStream
 
read(byte[], int, int) - Method in class org.archive.io.OriginSeekInputStream
 
read(byte[]) - Method in class org.archive.io.OriginSeekInputStream
 
read() - Method in class org.archive.io.RandomAccessInputStream
 
read(byte[], int, int) - Method in class org.archive.io.RandomAccessInputStream
 
read(byte[]) - Method in class org.archive.io.RandomAccessInputStream
 
read() - Method in class org.archive.io.RecordingInputStream
 
read(byte[], int, int) - Method in class org.archive.io.RecordingInputStream
 
read(byte[]) - Method in class org.archive.io.RecordingInputStream
 
read() - Method in class org.archive.io.ReplayInputStream
 
read(byte[], int, int) - Method in class org.archive.io.ReplayInputStream
 
read(byte[]) - Method in class org.archive.io.RepositionableInputStream
 
read(byte[], int, int) - Method in class org.archive.io.RepositionableInputStream
 
read() - Method in class org.archive.io.RepositionableInputStream
 
read() - Method in class org.archive.io.SafeSeekInputStream
 
read(byte[], int, int) - Method in class org.archive.io.SafeSeekInputStream
 
read(byte[]) - Method in class org.archive.io.SafeSeekInputStream
 
read(long) - Method in class org.archive.io.SinkHandler
 
read - Variable in class org.archive.io.SinkHandlerLogRecord
 
read() - Method in class org.archive.util.ms.BlockInputStream
 
read(byte[], int, int) - Method in class org.archive.util.ms.BlockInputStream
 
read(byte[]) - Method in class org.archive.util.ms.BlockInputStream
 
read() - Method in class org.archive.util.ms.PieceReader
 
read(char[], int, int) - Method in class org.archive.util.ms.PieceReader
 
readAlert(String) - Method in class org.archive.crawler.Heritrix
 
readByte(InputStream) - Method in class org.archive.io.GzipHeader
Read a byte.
readByte(InputStream, CRC32) - Method in class org.archive.io.GzipHeader
Read a byte.
readByte(InputStream, CRC32, byte[], int, int) - Method in class org.archive.io.GzipHeader
Read a byte.
readContentTo(OutputStream) - Method in class org.archive.io.ReplayInputStream
 
readContentTo(OutputStream, int) - Method in class org.archive.io.ReplayInputStream
 
readCrawlReport() - Method in class org.archive.crawler.admin.StatisticsSummary
Reads duration time, processed docs/sec, bandwidth, and total size of crawl from crawl-report.txt.
reader - Variable in class org.archive.util.iterator.LineReadingIterator
 
READER_IDENTIFIER_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
 
readExpectedChar(InputStream, int) - Method in class org.archive.io.warc.WARCReader
 
readFileAsString(File) - Static method in class org.archive.util.FileUtils
Utility method to read an entire file as a String.
readFully() - Method in class org.archive.io.RecordingInputStream
 
readFully(InputStream, byte[]) - Static method in class org.archive.util.IoUtils
 
readFullyAsString(InputStream) - Static method in class org.archive.util.IoUtils
Read the entire stream to EOF, returning what's read as a String.
readFullyFrom(InputStream, long, byte[]) - Method in class org.archive.io.WriterPoolMember
Deprecated. Use WriterPoolMember.copyFrom(InputStream,long,boolean) instead
readFullyOrUntil(long) - Method in class org.archive.io.RecordingInputStream
Read all of a stream (or read until we time out or have read to the max).
readFullyTo(OutputStream) - Method in class org.archive.io.ReplayInputStream
 
readFullyToFile(InputStream, File) - Static method in class org.archive.util.IoUtils
Read the entire stream to EOF into the passed file.
readFullyToFile(InputStream, File, byte[]) - Static method in class org.archive.util.IoUtils
Read the entire stream to EOF into the passed file.
readHeader(InputStream) - Method in class org.archive.io.GzipHeader
Read in gzip header.
readHeader() - Method in class org.archive.io.GzippedInputStream
Read in the gzip header.
readHeaderTo(OutputStream) - Method in class org.archive.io.ReplayInputStream
 
readMaxValues(Object) - Method in class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
readObjectFromFile(Class<T>, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
readObjectFromFile(Class<T>, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
readOneTag() - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFReader
Overridden because a corrupt SWF file can cause us to try to read lengths that are hundreds of megabytes in size, causing an OutOfMemoryError.
readPrefixes() - Method in class org.archive.crawler.deciderules.OnDomainsDecideRule
Patch the SURT prefix set so that it only includes host-enforcing prefixes
readPrefixes() - Method in class org.archive.crawler.deciderules.OnHostsDecideRule
Patch the SURT prefix set so that it only includes host-enforcing prefixes
readPrefixes(Object) - Method in class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Patch the SURT prefix set so that it only includes the appropriate prefixes.
readPrefixes() - Method in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
readPublishedFileToSurtList(BufferedReader) - Static method in class org.archive.net.PublicSuffixes
Reads a file of the format promulgated by publicsuffix.org, ignoring comments and '!' exceptions/notations, converting domain segments to SURT-ordering.
readResponseBody(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
readResponseBody(HttpState, HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
readSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsHandler
Read the CrawlerSettings object from persistent storage.
readSettingsObject(CrawlerSettings, File) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Read the CrawlerSettings object from a specific file.
readSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
readToLimitFrom(InputStream, long, byte[]) - Method in class org.archive.io.WriterPoolMember
Deprecated. Use WriterPoolMember.copyFrom(InputStream,long,boolean) instead
readUuri(String) - Method in class org.archive.crawler.datamodel.CandidateURI
Read a UURI from a String, handling a null or URIException
readValid() - Method in class org.archive.crawler.datamodel.Checkpoint
 
readyClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues whose first item may be handed out.
readyFiller - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
single-thread access to ready-filling code
readyHosts() - Method in interface org.archive.crawler.framework.FrontierHostStatistics
Total number of hosts that have a URI ready for processing.
REBIND_JNDI_OPER - Static variable in class org.archive.crawler.Heritrix
 
receive(CandidateURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter.HasUriReceiver
 
receive(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
receive(CandidateURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Accept the given CandidateURI for scheduling, as it has passed the alreadyIncluded filter.
receive(CandidateURI) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
receiver - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
receiver - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
RECORD_IDENTIFIER_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File record field.
recordControlMessage(String, String) - Method in class org.archive.net.ClientFTP
 
recordDNS(CrawlURI, Record[]) - Method in class org.archive.crawler.fetcher.FetchDNS
 
RecorderIOException - Exception in org.archive.io
 
RecorderIOException() - Constructor for exception org.archive.io.RecorderIOException
 
RecorderIOException(String) - Constructor for exception org.archive.io.RecorderIOException
 
RecorderLengthExceededException - Exception in org.archive.io
Indicates a length exception thrown by the Recorder.
RecorderLengthExceededException() - Constructor for exception org.archive.io.RecorderLengthExceededException
 
RecorderLengthExceededException(String) - Constructor for exception org.archive.io.RecorderLengthExceededException
 
RecorderTimeoutException - Exception in org.archive.io
Indicates a timeout thrown by the RecordingInputStream.
RecorderTimeoutException() - Constructor for exception org.archive.io.RecorderTimeoutException
 
RecorderTimeoutException(String) - Constructor for exception org.archive.io.RecorderTimeoutException
 
RecorderTooMuchHeaderException - Exception in org.archive.io
Indicates a too-much-header-material exception thrown by the Recorder (specifically the RecordingOutputStream).
RecorderTooMuchHeaderException() - Constructor for exception org.archive.io.RecorderTooMuchHeaderException
 
RecorderTooMuchHeaderException(String) - Constructor for exception org.archive.io.RecorderTooMuchHeaderException
 
RecordingInputStream - Class in org.archive.io
Stream which records all data read from it, which it acquires from a wrapped input stream.
RecordingInputStream(int, String) - Constructor for class org.archive.io.RecordingInputStream
Create a new RecordingInputStream.
RecordingOutputStream - Class in org.archive.io
An output stream that records all writes to a wrapped output stream.
RecordingOutputStream(int, String) - Constructor for class org.archive.io.RecordingOutputStream
Create a new RecordingOutputStream.
recover(CrawlController) - Method in class org.archive.crawler.framework.Checkpointer
Call when recovering from a checkpoint.
RECOVER_LOG - Static variable in class org.archive.crawler.admin.CrawlJobHandler
String to indicate recovery should be based on the recovery log, not based on checkpointing.
RecoverableIOException - Exception in org.archive.io
A decorator on IOException for IOEs that are likely not fatal or at least merit retry.
RecoverableIOException(String) - Constructor for exception org.archive.io.RecoverableIOException
 
RecoverableIOException(IOException) - Constructor for exception org.archive.io.RecoverableIOException
 
RECOVERY_JOURNAL_STYLE - Static variable in class org.archive.crawler.admin.CrawlJob
 
RecoveryJournal - Class in org.archive.crawler.frontier
Helper class for managing a simple Frontier change-events journal which is useful for recovering from crawl problems.
RecoveryJournal(String, String) - Constructor for class org.archive.crawler.frontier.RecoveryJournal
Create a new recovery journal at the given location
RecoveryLogMapper - Class in org.archive.crawler.util
 
RecoveryLogMapper(String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
Normal constructor; if not-found seeds are encountered while loading recoverLogFileName, throws SeedUrlNotFoundException.
RecoveryLogMapper(String, String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
Constructor to use if you want to allow not-found seeds, logging them to seedNotFoundLogFileName.
recycleMatcher(Matcher) - Static method in class org.archive.util.TextUtils
 
RecyclingFastBufferedOutputStream - Class in org.archive.io
Lightweight, unsynchronised, aligned output stream buffering class.
RecyclingFastBufferedOutputStream(OutputStream, byte[]) - Constructor for class org.archive.io.RecyclingFastBufferedOutputStream
Creates a new fast buffered output stream by wrapping a given output stream, using a given buffer
RecyclingFastBufferedOutputStream(OutputStream, int) - Constructor for class org.archive.io.RecyclingFastBufferedOutputStream
Creates a new fast buffered output stream by wrapping a given output stream with a given buffer size.
RecyclingFastBufferedOutputStream(OutputStream) - Constructor for class org.archive.io.RecyclingFastBufferedOutputStream
Creates a new fast buffered output stream by wrapping a given output stream with a buffer of RecyclingFastBufferedOutputStream.DEFAULT_BUFFER_SIZE bytes.
RecyclingSerialBinding - Class in org.archive.crawler.frontier
A SerialBinding that recycles a single FastOutputStream per thread, avoiding reallocation of the internal buffer for either repeated serializations or because of mid-serialization expansions.
RecyclingSerialBinding(ClassCatalog, Class) - Constructor for class org.archive.crawler.frontier.RecyclingSerialBinding
Constructor.
reducePattern - Variable in class org.archive.crawler.processor.HashCrawlMapper
 
reduceSurtToTopmostAssigned(String) - Static method in class org.archive.net.PublicSuffixes
Truncate SURT to its topmost assigned domain segment; that is, the public suffix plus one segment, but as a SURT-ordered prefix.
REFER_HOP - Static variable in class org.archive.crawler.extractor.Link
referral/redirect links, like header 'Location:' on a 301/302 response
referentField - Static variable in class org.archive.util.CachedBdbMap
Deprecated. Reference to the Reference#referent Field.
referentField - Static variable in class org.archive.util.ObjectIdentityBdbCache
Reference to the Reference#referent Field.
REFERER - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
RefinedScope - Class in org.archive.crawler.scope
Superclass for Scopes which make use of "additional focus" to add items by pattern, or want to swap in alternative transitive filter.
RefinedScope(String) - Constructor for class org.archive.crawler.scope.RefinedScope
 
Refinement - Class in org.archive.crawler.settings.refinements
This class acts as a mapping between refinement criteria and a settings object.
Refinement(CrawlerSettings, String) - Constructor for class org.archive.crawler.settings.refinements.Refinement
Create a new instance of Refinement
Refinement(CrawlerSettings, String, String) - Constructor for class org.archive.crawler.settings.refinements.Refinement
Create a new instance of Refinement
refinementsIterator() - Method in class org.archive.crawler.settings.CrawlerSettings
Get a ListIterator over the refinements for this settings object.
refQueue - Variable in class org.archive.util.CachedBdbMap
Deprecated.  
refQueue - Variable in class org.archive.util.ObjectIdentityBdbCache
 
refreshHostToSettings() - Method in class org.archive.crawler.settings.SettingsCache
Make sure that no host strings point to the wrong settings.
refreshSeeds() - Method in class org.archive.crawler.framework.CrawlScope
Refresh seeds.
refreshSeeds() - Method in class org.archive.crawler.scope.SeedCachingScope
 
refund(int) - Method in class org.archive.crawler.frontier.WorkQueue
A URI should not have been charged against queue (eg it was disregarded); return the amount expended
RegexpCSSLinkExtractor - Class in org.archive.extractor
This extractor parses URIs from CSS-type files.
RegexpCSSLinkExtractor() - Constructor for class org.archive.extractor.RegexpCSSLinkExtractor
 
RegexpHTMLLinkExtractor - Class in org.archive.extractor
Basic link-extraction, from an HTML content-body, using regular expressions.
RegexpHTMLLinkExtractor() - Constructor for class org.archive.extractor.RegexpHTMLLinkExtractor
 
RegexpJSLinkExtractor - Class in org.archive.extractor
Uses regular expressions to find likely URIs inside JavaScript.
RegexpJSLinkExtractor() - Constructor for class org.archive.extractor.RegexpJSLinkExtractor
 
RegexpLineIterator - Class in org.archive.util.iterator
Utility class providing an Iterator interface over line-oriented text input.
RegexpLineIterator(Iterator<String>, String, String, String) - Constructor for class org.archive.util.iterator.RegexpLineIterator
 
RegexRule - Class in org.archive.crawler.url.canonicalize
General conversion rule.
RegexRule(String) - Constructor for class org.archive.crawler.url.canonicalize.RegexRule
 
RegexRule(String, String, String) - Constructor for class org.archive.crawler.url.canonicalize.RegexRule
 
registerContainerJndi() - Static method in class org.archive.crawler.Heritrix
 
registeredCrawlURIDispositionListeners - Variable in class org.archive.crawler.framework.CrawlController
 
registerFinishTask(Runnable) - Method in class org.archive.io.ObjectPlusFilesInputStream
Register a task to be done when the ObjectPlusFilesInputStream is closed.
registerHeritrix(Heritrix, String, boolean) - Static method in class org.archive.crawler.Heritrix
Register Heritrix with JNDI, JMX, and with the static hashtable of all Heritrix instances known to this JVM.
registerJndi(ObjectName) - Static method in class org.archive.crawler.Heritrix
 
registerMBean(Object, String, String) - Static method in class org.archive.crawler.Heritrix
 
registerMBean(MBeanServer, Object, String, String) - Static method in class org.archive.crawler.Heritrix
 
registerMBean(MBeanServer, Object, ObjectName) - Static method in class org.archive.crawler.Heritrix
 
registerValueErrorHandler(ValueErrorHandler) - Method in class org.archive.crawler.settings.SettingsHandler
Register an instance of ValueErrorHandler.
RegularExpressionConstraint - Class in org.archive.crawler.settings
A constraint that checks that a value matches a regular expression.
RegularExpressionConstraint(String, Level, String) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint.
RegularExpressionConstraint(String, String) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint using default severity level (Level.WARNING).
RegularExpressionConstraint(String, Level) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint using the default error message.
RegularExpressionConstraint(String) - Constructor for class org.archive.crawler.settings.RegularExpressionConstraint
Constructs a new RegularExpressionConstraint.
RegularExpressionCriteria - Class in org.archive.crawler.settings.refinements
A refinement criterion that tests whether a URI matches a regular expression.
RegularExpressionCriteria() - Constructor for class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Create a new instance of RegularExpressionCriteria.
RegularExpressionCriteria(String) - Constructor for class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Create a new instance of RegularExpressionCriteria initializing it with a regular expression.
reinit(Queue<String>, String) - Method in class org.archive.crawler.frontier.BdbFrontier
 
REJECT - Static variable in class org.archive.crawler.deciderules.DecideRule
 
RejectDecideRule - Class in org.archive.crawler.deciderules
Rule which answers REJECT to everything evaluated.
RejectDecideRule(String) - Constructor for class org.archive.crawler.deciderules.RejectDecideRule
 
RejectRevisitProcessor - Class in org.archive.crawler.postprocessor
Set a URI to not be revisited by the ARFrontier.
RejectRevisitProcessor(String) - Constructor for class org.archive.crawler.postprocessor.RejectRevisitProcessor
 
release() - Method in class org.archive.queue.MemQueue
 
release() - Method in interface org.archive.queue.Queue
Release any OS/IO resources associated with the Queue.
release() - Method in interface org.archive.queue.Stack
Deprecated. Release any OS resources, if necessary.
releaseConnection(HttpConnection) - Method in class org.archive.httpclient.SingleHttpConnectionManager
 
releaseConnection(HttpConnection) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
releaseContinuePermission() - Method in class org.archive.crawler.framework.CrawlController
Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode).
RELEVANT_TAG_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
RELEVANT_TAG_EXTRACTOR - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
Compiled relevant tag extractor.
relocate(long, long, long) - Method in class org.archive.util.AbstractLongFPSet
 
relocate(long, long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
remaining() - Method in class org.archive.io.ReplayInputStream
 
remove(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
remove(CrawlerSettings, Credential) - Method in class org.archive.crawler.datamodel.CredentialStore
Delete the credential name.
remove(CrawlerSettings, String) - Method in class org.archive.crawler.datamodel.CredentialStore
Delete the credential name.
remove(String) - Method in interface org.archive.crawler.framework.AlertManager
 
remove() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
remove(int) - Method in class org.archive.crawler.settings.ListType
 
remove(Object) - Method in class org.archive.crawler.settings.ListType
 
remove() - Method in class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
remove(String) - Method in class org.archive.crawler.settings.SoftSettingsHash
Removes the settings object identified by the key from this hash if present.
remove() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
remove() - Method in class org.archive.crawler.util.TransformIterator
 
remove() - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
remove() - Method in class org.archive.io.ArchiveReader.ArchiveRecordIterator
 
remove(long) - Method in class org.archive.io.SinkHandler
 
remove(long) - Method in class org.archive.util.AbstractLongFPSet
 
remove(Object) - Method in class org.archive.util.CachedBdbMap
Deprecated. Remove mapping for the given key.
remove(Object, Object) - Method in class org.archive.util.CachedBdbMap
Deprecated. Remove item matching both the key and value.
remove(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
remove(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Remove a fingerprint from the set, if it is there
remove() - Method in class org.archive.util.iterator.CompositeIterator
 
remove() - Method in class org.archive.util.iterator.LookaheadIterator
 
remove(int) - Method in class org.archive.util.SubList
 
removeAlert(String) - Method in class org.archive.crawler.Heritrix
 
removeAlistPersistentMember(Object) - Static method in class org.archive.crawler.datamodel.CrawlURI
 
removeAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
removeAt(long) - Method in class org.archive.util.AbstractLongFPSet
Remove the value at the given index, relocating its successors as necessary.
removeCredentialAvatar(CredentialAvatar) - Method in class org.archive.crawler.datamodel.CrawlURI
Remove the given credential avatar from this crawl URI.
removeCredentialAvatars() - Method in class org.archive.crawler.datamodel.CrawlURI
Remove all credential avatars from this crawl uri.
removeEldestEntry(Map.Entry<K, V>) - Method in class org.archive.util.LRU
 
removeElement(String) - Method in class org.archive.crawler.settings.DataContainer
Remove an attribute from the DataContainer.
removeElement(CrawlerSettings, String) - Method in class org.archive.crawler.settings.MapType
Remove an attribute from the map.
removeElementFromDefinition(String) - Method in class org.archive.crawler.settings.ComplexType
This method can only be called before the ComplexType has been initialized.
removeRefinement(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Remove a refinement from this settings object.
reopen(Environment) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Call after deserializing an instance of this class.
reorder() - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Method is called whenever something has been done that might have changed the value of the 'published' time of next ready.
reorder(AdaptiveRevisitHostQueue) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
This method reorders the host queues.
replace(K, V, V) - Method in class org.archive.util.CachedBdbMap
Deprecated. Replace entry for key only if currently mapped to given value.
replace(K, V) - Method in class org.archive.util.CachedBdbMap
Deprecated. Replace entry for key only if currently mapped to some value.
replaceAll(String, CharSequence, String) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the replaceAll method of the String class.
replaceFirst(String, CharSequence, String) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the replaceFirst method of the String class.
replaceOutlinks(Collection<CandidateURI>) - Method in class org.archive.crawler.datamodel.CrawlURI
Replace the current collection of links with the passed list.
ReplayCharSequence - Interface in org.archive.io
CharSequence interface with addition of a ReplayCharSequence.close() method.
ReplayInputStream - Class in org.archive.io
Replays the bytes recorded from a RecordingInputStream or RecordingOutputStream.
ReplayInputStream(byte[], long, long, String) - Constructor for class org.archive.io.ReplayInputStream
Constructor.
ReplayInputStream(byte[], long, String) - Constructor for class org.archive.io.ReplayInputStream
Constructor.
report() - Method in class org.archive.crawler.extractor.AggressiveExtractorHTML
 
report() - Method in class org.archive.crawler.extractor.ExtractorCSS
 
report() - Method in class org.archive.crawler.extractor.ExtractorDOC
 
report() - Method in class org.archive.crawler.extractor.ExtractorHTML
 
report() - Method in class org.archive.crawler.extractor.ExtractorHTTP
 
report() - Method in class org.archive.crawler.extractor.ExtractorImpliedURI
 
report() - Method in class org.archive.crawler.extractor.ExtractorJS
 
report() - Method in class org.archive.crawler.extractor.ExtractorPDF
Provide a human-readable textual summary of this Processor's state.
report() - Method in class org.archive.crawler.extractor.ExtractorSWF
 
report() - Method in class org.archive.crawler.extractor.ExtractorUniversal
 
report() - Method in class org.archive.crawler.extractor.ExtractorURI
 
report() - Method in class org.archive.crawler.extractor.ExtractorXML
 
report() - Method in class org.archive.crawler.extractor.JerichoExtractorHTML
 
report() - Method in class org.archive.crawler.extractor.TrapSuppressExtractor
Provide a human-readable textual summary of this Processor's state.
report() - Method in class org.archive.crawler.fetcher.FetchHTTP
 
report() - Method in class org.archive.crawler.framework.Processor
Compiles and returns a report (in human readable form) about the status of the processor.
report(int) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Returns a report detailing the status of this HQ.
report() - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
Reporter - Interface in org.archive.util
 
reportManifestTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
reportProcessorsTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
Compiles and returns a human readable report on the active processors.
reports - Variable in class org.archive.crawler.framework.CrawlController
Logger to hold job summary report.
REPORTS - Static variable in class org.archive.crawler.framework.CrawlController
 
REPORTS - Static variable in class org.archive.crawler.framework.ToePool
 
REPORTS - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.datamodel.CandidateURI
 
reportTo(PrintWriter) - Method in class org.archive.crawler.datamodel.CandidateURI
 
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
Compiles and returns a report on its status.
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
reportTo(String, PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
This method compiles a human readable report on the status of the frontier at the time of the call.
reportTo(String, PrintWriter) - Method in interface org.archive.util.Reporter
Make a report of the given name to the passed-in Writer; if the name is null, give the default report.
reportTo(PrintWriter) - Method in interface org.archive.util.Reporter
Make a default report to the passed-in Writer.
RepositionableInputStream - Class in org.archive.io
Wrapper around an InputStream to make a primitive Repositionable stream.
RepositionableInputStream(InputStream) - Constructor for class org.archive.io.RepositionableInputStream
 
RepositionableInputStream(InputStream, int) - Constructor for class org.archive.io.RepositionableInputStream
 
REQUEST - Static variable in interface org.archive.io.warc.WARCConstants
 
REQUEST_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
requestCrawlCheckpoint() - Method in class org.archive.crawler.framework.CrawlController
Request a checkpoint.
requestCrawlPause() - Method in class org.archive.crawler.framework.CrawlController
Stop the crawl temporarily.
requestCrawlResume() - Method in class org.archive.crawler.framework.CrawlController
Resume crawl from paused state
requestCrawlStart() - Method in class org.archive.crawler.framework.CrawlController
Operator requested crawl begin
requestCrawlStop() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
requestCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Operator requested for crawl to stop.
requestCrawlStop(String) - Method in class org.archive.crawler.framework.CrawlController
Operator requested for crawl to stop.
requestFlush() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Request that any pending items be added/dropped.
requestFlush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
requestFlush() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
REQUIRED_VERSION_1_HEADER_FIELDS - Static variable in interface org.archive.io.arc.ARCConstants
Version 1 required metadata fields.
reschedule(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Put near top of relevant hostQueue (but behind anything recently scheduled 'high').
rescheduled(CandidateURI) - Method in interface org.archive.crawler.frontier.FrontierJournal
 
rescheduled(CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
reset() - Method in class org.archive.extractor.CharSequenceLinkExtractor
Discard all state.
reset() - Method in interface org.archive.extractor.LinkExtractor
Discard all state and release any used resources.
reset() - Method in class org.archive.extractor.RegexpCSSLinkExtractor
 
reset() - Method in class org.archive.extractor.RegexpHTMLLinkExtractor
Discard all state.
reset() - Method in class org.archive.extractor.RegexpJSLinkExtractor
 
reset() - Method in class org.archive.io.RandomAccessInputStream
 
reset() - Method in class org.archive.io.RecordingInputStream
 
reset() - Method in class org.archive.io.RecordingOutputStream
When used alongside a mark-supporting RecordingInputStream, reset the position to that saved by previous mark().
reset() - Method in class org.archive.io.RepositionableInputStream
 
reset() - Method in class org.archive.io.SeekInputStream
Resets this stream to its marked position.
reset() - Method in class org.archive.io.SeekReader
Resets this stream to its marked position.
reset() - Method in class org.archive.util.PaddingStringBuffer
reset the buffer back to empty
resetAuthentication(String, String) - Static method in class org.archive.crawler.Heritrix
Replace existing administrator login info with new info.
resetAuthentication(String, String, String, String) - Method in class org.archive.crawler.SimpleHttpServer
Reset the administrator login info.
resetConsecutiveConnectionErrors() - Method in class org.archive.crawler.datamodel.CrawlServer
 
resetDeferrals() - Method in class org.archive.crawler.datamodel.CrawlURI
Reset deferrals counter.
resetFetchAttempts() - Method in class org.archive.crawler.datamodel.CrawlURI
Reset fetchAttempts counter.
resetInflater() - Method in class org.archive.io.GzippedInputStream
Move to next gzip member in the file.
resetLimits() - Method in class org.archive.io.RecordingOutputStream
Reset limits to effectively-unlimited defaults
resetState() - Method in class org.archive.crawler.extractor.PDFParser
Reinitialize the object as though a new one were created.
resetState(byte[]) - Method in class org.archive.crawler.extractor.PDFParser
Reset the object and initialize it with a new byte array (the document).
resetState(String) - Method in class org.archive.crawler.extractor.PDFParser
Reinitialize the object as though a new one were created, complete with a valid pointer to a document that can be read
resetStats() - Method in class org.archive.io.warc.WARCWriter
 
resize(int) - Method in class org.archive.crawler.settings.SoftSettingsHash
Rehashes the contents of this hash into a new HashMap instance with a larger capacity.
resolve(String) - Method in class org.archive.net.UURI
 
resolve(String, boolean) - Method in class org.archive.net.UURI
 
resolve(String, boolean, String) - Method in class org.archive.net.UURI
 
RESOURCE - Static variable in interface org.archive.io.warc.WARCConstants
 
RESOURCE_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
RESPONSE - Static variable in interface org.archive.io.warc.WARCConstants
 
RESPONSE_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
RESPONSE_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
responseBodyStart - Variable in class org.archive.io.ReplayInputStream
Where the response body starts, if marked
RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
restoreFile(File) - Method in class org.archive.io.ObjectPlusFilesInputStream
Restore a file from storage, using the name and length info on the serialization stream and the file from the current auxiliary directory, to the given File.
restoreFileTo(File) - Method in class org.archive.io.ObjectPlusFilesInputStream
Restore a file from storage, using the name and length info on the serialization stream and the file from the current auxiliary directory, to the given File.
restoreStatisticsTracker(MapType, String) - Method in class org.archive.crawler.framework.CrawlController
 
resume() - Method in class org.archive.crawler.admin.CrawlJob
 
resume(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Resumes this WorkQueue.
RESUME_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
resumeJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Cause the current job to resume crawling if it was paused.
retainAll(Collection) - Method in class org.archive.crawler.settings.ListType
 
retire() - Method in class org.archive.crawler.framework.ToeThread
Request that this thread retire (exit cleanly) at the earliest opportunity.
retiredQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
'retired' queues, no longer considered for activation.
retryDelayFor(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Return a suitable value to wait before retrying the given URI.
retryMethod(HttpMethod, IOException, int) - Method in class org.archive.crawler.fetcher.HeritrixHttpMethodRetryHandler
 
returnFile(WriterPoolMember) - Method in class org.archive.io.WriterPool
 
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.filter.OrFilter
Deprecated.  
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
returnTrueIfMatches(CrawlURI) - Method in class org.archive.crawler.framework.Filter
Checks to see if filter functionality should be inverted for this curi.
REVISIT - Static variable in interface org.archive.io.warc.WARCConstants
 
REVISIT_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
rewind() - Method in class org.archive.io.ArchiveReader
Rewinds stream to start of the Archive file.
RFC2396REGEX - Static variable in class org.archive.net.UURIFactory
RFC 2396-inspired regex.
Rfc2617Credential - Class in org.archive.crawler.datamodel.credential
A Basic/Digest auth RFC2617 credential.
Rfc2617Credential(String) - Constructor for class org.archive.crawler.datamodel.credential.Rfc2617Credential
Constructor.
ROBOTS_NOT_FETCHED - Static variable in class org.archive.crawler.datamodel.CrawlServer
 
robotsDenials - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
RobotsDirectives - Class in org.archive.crawler.datamodel
Represents the directives that apply to a user-agent (or set of user-agents)
RobotsDirectives() - Constructor for class org.archive.crawler.datamodel.RobotsDirectives
 
RobotsExclusionPolicy - Class in org.archive.crawler.datamodel
RobotsExclusionPolicy represents the actual policy adopted with respect to a specific remote server, usually constructed from consulting the robots.txt, if any, the server provided.
RobotsExclusionPolicy(CrawlerSettings, Robotstxt, RobotsHonoringPolicy) - Constructor for class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
RobotsExclusionPolicy(int) - Constructor for class org.archive.crawler.datamodel.RobotsExclusionPolicy
 
robotsFetched - Variable in class org.archive.crawler.datamodel.CrawlServer
 
RobotsHonoringPolicy - Class in org.archive.crawler.datamodel
RobotsHonoringPolicy represents the strategy used by the crawler for determining how robots.txt files will be honored.
RobotsHonoringPolicy(String) - Constructor for class org.archive.crawler.datamodel.RobotsHonoringPolicy
Creates a new instance of RobotsHonoringPolicy.
RobotsHonoringPolicy() - Constructor for class org.archive.crawler.datamodel.RobotsHonoringPolicy
 
Robotstxt - Class in org.archive.crawler.datamodel
Utility class for parsing and representing 'robots.txt' format directives, into a list of named user-agents and map from user-agents to RobotsDirectives.
Robotstxt(BufferedReader) - Constructor for class org.archive.crawler.datamodel.Robotstxt
 
robotstxtChecksum - Variable in class org.archive.crawler.datamodel.CrawlServer
 
ROOT_CONTEXT - Static variable in class org.archive.crawler.Heritrix
The root context for a webapp.
RootFilter - Class in org.archive.crawler.admin.ui
Filter that redirects accesses to 'index.jsp'.
RootFilter() - Constructor for class org.archive.crawler.admin.ui.RootFilter
 
rootUriMatch(CrawlController, CrawlURI) - Method in class org.archive.crawler.datamodel.credential.Credential
Test whether the passed curi matches this credential's rootUri.
rotate(String, String) - Method in class org.archive.io.GenerationFileHandler
Move the current file to a new filename with the storeSuffix in place of the activeSuffix; continuing logging to a new file under the original filename.
rotateLogFiles(String) - Method in class org.archive.crawler.framework.CrawlController
 
RSQRBRACKET - Static variable in class org.archive.net.UURIFactory
 
RSQRBRACKET_PATTERN - Static variable in class org.archive.net.UURIFactory
 
RsyncURLConnection - Class in org.archive.net.rsync
Rsync URL connection.
RsyncURLConnection(URL) - Constructor for class org.archive.net.rsync.RsyncURLConnection
 
rulesAccept(Object) - Method in class org.archive.crawler.framework.Processor
 
rulesAccept(DecideRule, Object) - Method in class org.archive.crawler.framework.Processor
 
run() - Method in class org.archive.crawler.fetcher.FetchHTTP.PostRestore
 
run() - Method in class org.archive.crawler.framework.AbstractTracker
Start thread.
run() - Method in class org.archive.crawler.framework.Checkpointer.CheckpointingThread
 
run() - Method in class org.archive.crawler.framework.ToeThread
(non-Javadoc)
run() - Method in class org.archive.crawler.frontier.WorkQueueFrontier.WakeTask
 
run() - Method in class org.archive.util.ProcessUtils.StreamGobbler
 
runFrontierRecover(String) - Method in class org.archive.crawler.framework.CrawlController
 
RUNNING - Static variable in class org.archive.crawler.framework.CrawlController
 
RuntimeErrorFormatter - Class in org.archive.crawler.io
Runtime exception log formatter.
RuntimeErrorFormatter() - Constructor for class org.archive.crawler.io.RuntimeErrorFormatter
 
runtimeErrors - Variable in class org.archive.crawler.framework.CrawlController
This logger contains unexpected runtime errors.
RuntimeLimitEnforcer - Class in org.archive.crawler.prefetch
A processor to enforce runtime limits on crawls.
RuntimeLimitEnforcer(String) - Constructor for class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 

S

S_BLOCKED_BY_CUSTOM_PROCESSOR - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Blocked by custom prefetcher processor.
S_BLOCKED_BY_QUOTA - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Blocked due to exceeding an established quota.
S_BLOCKED_BY_RUNTIME_LIMIT - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Blocked due to exceeding an established runtime.
S_BLOCKED_BY_USER - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
blocked from fetch by user setting.
S_CONNECT_FAILED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
HTTP connect failed
S_CONNECT_LOST - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
HTTP connect broken
S_DEEMED_CHAFF - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
'chaff' detection of traps/content of negligible value applied
S_DEEMED_NOT_FOUND - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
synthetic status, used when some other status (such as connection-lost) is considered by policy the same as a document-not-found
S_DEFERRED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
temporary status assigned URIs awaiting preconditions; appearance in logs is a bug
S_DELETED_BY_USER - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
deleted from frontier by user
S_DNS_SUCCESS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
DNS success
S_DOMAIN_PREREQUISITE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
DNS prerequisite failed, precluding attempt
S_DOMAIN_UNRESOLVABLE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
DNS lookup failed
S_GETBYNAME_SUCCESS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
InetAddress.getByName success
S_OTHER_PREREQUISITE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Some other prerequisite failed, precluding attempt
S_OUT_OF_SCOPE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
out-of-scope upon reexamination (only when scope changes during crawl)
S_PREREQUISITE_UNSCHEDULABLE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
A prerequisite could not be scheduled, precluding attempt
S_PROCESSING_THREAD_KILLED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Processing thread was killed
S_ROBOTS_PRECLUDED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
robots rules precluded fetch
S_ROBOTS_PREREQUISITE_FAILURE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Robots prerequisite failed, precluding attempt
S_RUNTIME_EXCEPTION - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
Unexpected runtime exception; see runtime-errors.log
S_SERIOUS_ERROR - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
severe java 'Error' conditions (OutOfMemoryError, StackOverflowError, etc.) during URI processing
S_TIMEOUT - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
HTTP timeout (before any meaningful response received)
S_TOO_MANY_EMBED_HOPS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
overstepped embed/trans hops
S_TOO_MANY_LINK_HOPS - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
overstepped link hops
S_TOO_MANY_RETRIES - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
multiple retries all failed
S_UNATTEMPTED - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
fetch never tried (perhaps protocol unsupported or illegal URI)
S_UNFETCHABLE_URI - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
URI recognized as unsupported or illegal
S_UNQUEUEABLE - Static variable in interface org.archive.crawler.datamodel.FetchStatusCodes
URI could not be queued in Frontier; when URIs are properly filtered for format, should never occur
SafeSeekInputStream - Class in org.archive.io
Enables multiple concurrent streams based on the same underlying stream.
SafeSeekInputStream(SeekInputStream) - Constructor for class org.archive.io.SafeSeekInputStream
Constructor.
sameDomainAs(CandidateURI) - Method in class org.archive.crawler.datamodel.CandidateURI
Compares the domain of this CandidateURI with that of another CandidateURI
saveCheckpointSerialNumber(File, int) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
saveCookies() - Method in class org.archive.crawler.fetcher.FetchHTTP
Saves cookies to the file specified in the order file.
saveCookies(String) - Method in class org.archive.crawler.fetcher.FetchHTTP
Saves cookies to a file.
saveHeader(String, HttpMethodBase, AList) - Method in class org.archive.crawler.processor.recrawl.FetchHistoryProcessor
Save a header from the given HTTP operation into the AList.
saveHeader(String, HttpMethodBase, ANVLRecord, String) - Method in class org.archive.crawler.writer.WARCWriterProcessor
Save a header from the given HTTP operation into the provider headers under a new name
saveHostStats(String, long) - Method in class org.archive.crawler.admin.StatisticsTracker
 
saveIgnoredItems(String, File) - Static method in class org.archive.crawler.frontier.AbstractFrontier
Dump ignored seed items (if any) to disk; delete file otherwise.
saveSourceStats(String, String) - Method in class org.archive.crawler.admin.StatisticsTracker
 
scanCheckpoints() - Method in class org.archive.crawler.admin.CrawlJob
Read all the checkpoints found in the job's checkpoints directory into Checkpoint instances
schedule(CandidateURI) - Method in interface org.archive.crawler.framework.Frontier
Schedules a CandidateURI.
schedule(CandidateURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
schedule(CandidateURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Arrange for the given CandidateURI to be visited, if it is not already scheduled/completed.
schedule(CandidateURI) - Method in class org.archive.crawler.postprocessor.FrontierScheduler
Schedule the given CandidateURI with the Frontier.
ScopePlusOneDecideRule - Class in org.archive.crawler.deciderules
Rule allows one level of discovery beyond configured scope.
ScopePlusOneDecideRule(String) - Constructor for class org.archive.crawler.deciderules.ScopePlusOneDecideRule
Constructor.
Scoper - Class in org.archive.crawler.framework
Base class for Scopers.
Scoper(String, String) - Constructor for class org.archive.crawler.framework.Scoper
Constructor.
scratchDir - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
scratchDirFor(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
Utility method to return a scratch dir for the given key's temp files.
secondaryUriDB - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Secondary index into the primary DB, URIs indexed by the time when they can next be processed again.
secondsSinceEpoch(String) - Static method in class org.archive.util.ArchiveUtils
 
SEED_DISPOSITION_DISREGARD - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Seed was disregarded
SEED_DISPOSITION_FAILURE - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Failed to crawl seed
SEED_DISPOSITION_NOT_PROCESSED - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Seed has not been processed
SEED_DISPOSITION_RETRY - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Failed to crawl seed, will retry
SEED_DISPOSITION_SUCCESS - Static variable in interface org.archive.crawler.framework.StatisticsTracking
Seed successfully crawled
SeedAcceptDecideRule - Class in org.archive.crawler.deciderules
Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true).
SeedAcceptDecideRule(String) - Constructor for class org.archive.crawler.deciderules.SeedAcceptDecideRule
 
SeedCachingScope - Class in org.archive.crawler.scope
A CrawlScope that caches its seed list for the convenience of scope-tests that are based on the seeds.
SeedCachingScope(String) - Constructor for class org.archive.crawler.scope.SeedCachingScope
 
SeedFileIterator - Class in org.archive.crawler.scope
Iterator wrapper for seeds file on disk.
SeedFileIterator(BufferedReader) - Constructor for class org.archive.crawler.scope.SeedFileIterator
Construct a SeedFileIterator over the input available from the supplied BufferedReader.
SeedFileIterator(BufferedReader, Writer) - Constructor for class org.archive.crawler.scope.SeedFileIterator
Construct a SeedFileIterator over the input available from the supplied BufferedReader, reporting any nonblank noncomment entries which don't generate a valid seed to the supplied Writer.
SeedListener - Interface in org.archive.crawler.scope
Implemented by components which want notifications of seed list changes from a Scope.
seedListeners - Variable in class org.archive.crawler.framework.CrawlScope
 
SeedRecord - Class in org.archive.crawler.admin
Record of all interesting info about the most-recent processing of a specific seed.
SeedRecord(CrawlURI, String) - Constructor for class org.archive.crawler.admin.SeedRecord
Create a record from the given CrawlURI and disposition string
SeedRecord(String, String) - Constructor for class org.archive.crawler.admin.SeedRecord
Constructor for when a CrawlURI is unavailable, such as when considering seeds not yet passed through as CrawlURIs.
SeedRecord(String, String, int, String) - Constructor for class org.archive.crawler.admin.SeedRecord
Create a record from the given URI, disposition, HTTP status code, and redirect URI.
seeds - Variable in class org.archive.crawler.scope.SeedCachingScope
 
SEEDS_REPORT_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
seedsEdittableSize(SettingsHandler) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Test whether seeds file is of a size that's reasonable to edit in an HTML textarea.
seedsIterator() - Method in class org.archive.crawler.framework.CrawlScope
Gets an iterator over all configured seeds.
seedsIterator(Writer) - Method in class org.archive.crawler.framework.CrawlScope
Gets an iterator over all configured seeds.
seedsIterator() - Method in class org.archive.crawler.scope.SeedCachingScope
 
SeedUrlNotFoundException - Exception in org.archive.crawler.util
 
SeedUrlNotFoundException(String) - Constructor for exception org.archive.crawler.util.SeedUrlNotFoundException
 
SeekInputStream - Class in org.archive.io
Base class for repositionable input streams.
SeekInputStream() - Constructor for class org.archive.io.SeekInputStream
 
SeekReader - Class in org.archive.io
Base class for repositionable readers.
SeekReader() - Constructor for class org.archive.io.SeekReader
 
SeekReaderCharSequence - Class in org.archive.io
 
SeekReaderCharSequence(SeekReader, int) - Constructor for class org.archive.io.SeekReaderCharSequence
 
selftest(String, int) - Static method in class org.archive.crawler.Heritrix
Run the selftest
SELFTEST - Static variable in class org.archive.crawler.selftest.SelfTestCase
Suffix for selftest classes.
SelfTestCase - Class in org.archive.crawler.selftest
Base class for integrated selftest unit tests.
SelfTestCase() - Constructor for class org.archive.crawler.selftest.SelfTestCase
 
SelfTestCase(String) - Constructor for class org.archive.crawler.selftest.SelfTestCase
 
SelfTestCrawlJobHandler - Class in org.archive.crawler.selftest
An override to gain access to end-of-crawljob message.
SelfTestCrawlJobHandler(File, String, String) - Constructor for class org.archive.crawler.selftest.SelfTestCrawlJobHandler
 
sendCheckpointEvent(File) - Method in class org.archive.crawler.framework.CrawlController
Send the checkpoint event.
sendCrawlStateChangeEvent(Object, String) - Method in class org.archive.crawler.framework.CrawlController
Send crawl change event to all listeners.
sendToQueue(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Send a CrawlURI to the appropriate subqueue.
SERIALIZED_CLASS_SUFFIX - Static variable in class org.archive.crawler.util.CheckpointUtils
 
serializeToByteArray(Object) - Static method in class org.archive.util.IoUtils
Utility method to serialize Object to byte[].
serializeToFile(Object, File) - Static method in class org.archive.util.IoUtils
Utility method to serialize an object to the given File.
serialVersionUID - Static variable in class org.archive.crawler.datamodel.Robotstxt
 
serialVersionUID - Static variable in class org.archive.crawler.frontier.WorkQueue
 
serialVersionUID - Static variable in class org.archive.crawler.settings.Constraint
 
seriousError(String) - Method in interface org.archive.crawler.frontier.FrontierJournal
Add a line noting a serious crawl error.
seriousError(String) - Method in class org.archive.crawler.io.CrawlerJournal
Note a serious error via a special log line
SERVER - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
SERVER_CACHE_KEY - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
ServerCache - Class in org.archive.crawler.datamodel
Server and Host cache.
ServerCache() - Constructor for class org.archive.crawler.datamodel.ServerCache
Constructor.
ServerCache(SettingsHandler) - Constructor for class org.archive.crawler.datamodel.ServerCache
This constructor creates a ServerCache that is all memory-based using Hashtables.
ServerCache(CrawlController) - Constructor for class org.archive.crawler.datamodel.ServerCache
Create a ServerCache that uses the given CrawlController to initialize the maps of servers and hosts.
serverInetAddr - Variable in class org.archive.crawler.fetcher.FetchDNS
 
servers - Variable in class org.archive.crawler.datamodel.ServerCache
hostname[:port] -> CrawlServer.
SERVICE - Static variable in class org.archive.util.JmxUtils
 
set(int, Double) - Method in class org.archive.crawler.settings.DoubleList
Replaces the element at the specified position in this list with the specified element.
set(int, Float) - Method in class org.archive.crawler.settings.FloatList
Replaces the element at the specified position in this list with the specified element.
set(int, Integer) - Method in class org.archive.crawler.settings.IntegerList
Replaces the element at the specified position in this list with the specified element.
set(int, Object) - Method in class org.archive.crawler.settings.ListType
Replaces the element at the specified position in this list with the specified element.
set(int, Long) - Method in class org.archive.crawler.settings.LongList
Replaces the element at the specified position in this list with the specified element.
set(int, String) - Method in class org.archive.crawler.settings.StringList
Replaces the element at the specified position in this list with the specified element.
set(int, E) - Method in class org.archive.util.SubList
 
setActive(WorkQueueFrontier, boolean) - Method in class org.archive.crawler.frontier.WorkQueue
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setAlignedOnFirstRecord(boolean) - Method in class org.archive.io.arc.ARCReader
 
setAList(AList) - Method in class org.archive.crawler.datamodel.CandidateURI
Called when making a copy of another CandidateURI.
setAsOrder(SettingsHandler) - Method in class org.archive.crawler.settings.ComplexType
 
setAt(long, long) - Method in class org.archive.util.AbstractLongFPSet
Set the stored value at the given slot.
setAt(long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
setAttribute(Attribute) - Method in class org.archive.crawler.admin.CrawlJob
 
setAttribute(Attribute) - Method in class org.archive.crawler.Heritrix
 
setAttribute(Attribute) - Method in class org.archive.crawler.settings.ComplexType
Set the value of a specific attribute of the ComplexType.
setAttribute(CrawlerSettings, Attribute) - Method in class org.archive.crawler.settings.ComplexType
Set the value of a specific attribute of the ComplexType.
setAttribute(Attribute) - Method in class org.archive.util.JEApplicationMBean
 
setAttribute(Environment, Attribute) - Method in class org.archive.util.JEMBeanHelper
Set an attribute value for the given environment.
setAttributeInternal(Attribute) - Method in class org.archive.crawler.admin.CrawlJob
 
setAttributes(AttributeList) - Method in class org.archive.crawler.admin.CrawlJob
 
setAttributes(AttributeList) - Method in class org.archive.crawler.Heritrix
 
setAttributes(AttributeList) - Method in class org.archive.crawler.settings.ComplexType
 
setAttributes(AttributeList) - Method in class org.archive.util.JEApplicationMBean
 
setAudience(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the recipient/customer for the crawl job product.
setAudience(String) - Method in class org.archive.crawler.settings.refinements.Refinement
 
setAuthentication(String, String, String) - Method in class org.archive.crawler.SimpleHttpServer
Setup a realm on the server named for the webapp and add to the passed webapp's context.
setAuthentication(String, String, String, String, String) - Method in class org.archive.crawler.SimpleHttpServer
 
SetBasedUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on an underlying UriSet (essentially a Set).
SetBasedUriUniqFilter() - Constructor for class org.archive.crawler.util.SetBasedUriUniqFilter
 
setBaseURI(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the (HTML) Base URI used for derelativizing internal URIs.
setBdbjeBkgrdThreads(EnvironmentConfig, List, String) - Method in class org.archive.crawler.framework.CrawlController
 
setBit(long) - Method in class org.archive.util.BloomFilter64bit
Changes the bit with index bitIndex in local bitvector.
setCapacity(int) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
setCharacterEncoding(String) - Method in class org.archive.util.HttpRecorder
 
setCheckpointErrors(boolean) - Method in class org.archive.crawler.framework.Checkpointer
 
setClassKey(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setCompressed(boolean) - Method in class org.archive.io.ArchiveReader
 
setConditionalGetHeader(CrawlURI, HttpMethod, String, String, String) - Method in class org.archive.crawler.fetcher.FetchHTTP
Set the given conditional-GET header, if the setting is enabled and a suitable value is available in the URI history.
setConnection(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderMethod
 
setConnectionStaleCheckingEnabled(boolean) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Deprecated. Use HttpConnectionParams.setStaleCheckingEnabled(boolean), HttpConnectionManager.getParams().
setConsoleHandler() - Static method in class org.archive.util.OneLineSimpleLogger
 
setContentBegin(int) - Method in class org.archive.io.arc.ARCRecordMetaData
 
setContentDigest(byte[]) - Method in class org.archive.crawler.datamodel.CrawlURI
Deprecated. Use CrawlURI.setContentDigest(String scheme, byte[])
setContentDigest(String, byte[]) - Method in class org.archive.crawler.datamodel.CrawlURI
 
setContentHandler(ContentHandler) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setContentSize(long) - Method in class org.archive.crawler.datamodel.CrawlURI
Sets the 'content size' for the URI, which is considered inclusive of all recorded material (such as protocol headers) or even material 'virtually' considered (as in material from a previous fetch confirmed unchanged with a server).
setContentType(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Set a fetched uri's content type.
setController(CrawlController) - Method in class org.archive.crawler.datamodel.CrawlOrder
 
setCount() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setCountryCode(String) - Method in class org.archive.crawler.datamodel.CrawlHost
Set country code for this host
setCrawlDelay(float) - Method in class org.archive.crawler.datamodel.RobotsDirectives
 
setCrawlJob(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJob.MBeanCrawlController
 
setCrawlOrderAttribute(String, ComplexType, Attribute) - Method in class org.archive.crawler.admin.CrawlJob
 
setCredentialDomain(CrawlerSettings, String) - Method in class org.archive.crawler.datamodel.credential.Credential
 
setDefaultNextProcessor(Processor) - Method in class org.archive.crawler.framework.Processor
Set the default next processor in the chain.
setDefaultProfile(CrawlJob) - Method in class org.archive.crawler.admin.CrawlJobHandler
Set the default profile.
setDescription(String) - Method in class org.archive.crawler.settings.ComplexType
Set the description of this ComplexType. The description should be suitable for showing in a user interface.
setDescription(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the description of this CrawlerSettings object.
setDescription(String) - Method in class org.archive.crawler.settings.refinements.Refinement
Set the description for this refinement.
setDestination(UriUniqFilter.HasUriReceiver) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Receiver of uniq URIs.
setDestination(UriUniqFilter.HasUriReceiver) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setDestination(UriUniqFilter.HasUriReceiver) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setDigest(String) - Method in class org.archive.io.arc.ARCRecordMetaData
 
setDigest(boolean) - Method in class org.archive.io.ArchiveReader
 
setDigest(String) - Method in class org.archive.io.RecordingInputStream
Sets a digest algorithm which may be applied to recorded data.
setDigest(MessageDigest) - Method in class org.archive.io.RecordingInputStream
Sets a digest function which may be applied to recorded data.
setDigest(String) - Method in class org.archive.io.RecordingOutputStream
Sets a digest function which may be applied to recorded data.
setDigest(MessageDigest) - Method in class org.archive.io.RecordingOutputStream
Sets a digest function which may be applied to recorded data.
setDocumentLocator(Locator) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
setDTDHandler(DTDHandler) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setEarliestNextURIEmitTime(long) - Method in class org.archive.crawler.datamodel.CrawlHost
Set the earliest time a URI for this host could be emitted.
setElement(String) - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
Set the name of the element that was being parsed when this exception occurred.
setEntityResolver(EntityResolver) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setEor(boolean) - Method in class org.archive.io.ArchiveRecord
 
setErrorHandler(ErrorHandler) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setErrorMessage(String) - Method in class org.archive.crawler.admin.CrawlJob
Set an error message for this job.
setErrorReportingLevel(Level) - Method in class org.archive.crawler.settings.SettingsHandler
Set the level for which notification of failed constraints will be fired.
setExpertSetting(boolean) - Method in class org.archive.crawler.settings.Type
Set if this Type should only show up in expert mode in UI.
setFeature(String, boolean) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setFetchStatus(int) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the overall/fetch status of this CrawlURI for its current trip through the processing loop.
setFile(String) - Method in exception org.archive.crawler.framework.exceptions.ConfigurationException
Store the name of the configuration file that was being parsed when this exception occurred.
setFile(File) - Method in class org.archive.net.DownloadURLConnection
 
setForceFetch(boolean) - Method in class org.archive.crawler.datamodel.CandidateURI
Method to signal that this URI should be fetched even though it already has been crawled.
setFrom(String) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Set the beginning of the time frame to check against.
setGetBit(long) - Method in class org.archive.util.BloomFilter64bit
Sets the bit with index bitIndex in local bitvector -- returning the old value.
setHeader(ArchiveRecordHeader) - Method in class org.archive.io.ArchiveRecord
 
setHeld() - Method in class org.archive.crawler.frontier.WorkQueue
Set isHeld to true
setHolder(Object) - Method in class org.archive.crawler.datamodel.CrawlURI
Remember a 'holder' to which some enclosing/queueing facility has assigned this CrawlURI.
setHolderCost(int) - Method in class org.archive.crawler.datamodel.CrawlURI
Remember a 'holderCost' which some enclosing/queueing facility has assigned this CrawlURI
setHolderKey(Object) - Method in class org.archive.crawler.datamodel.CrawlURI
Remember a 'holderKey' which some enclosing/queueing facility has assigned this CrawlURI.
setHttpRecorder(HttpRecorder) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the http recorder to be associated with this uri.
setIn(InputStream) - Method in class org.archive.io.ArchiveReader
 
setIntValue(int) - Method in class org.archive.crawler.util.StringIntPair
 
setIP(InetAddress, long) - Method in class org.archive.crawler.datamodel.CrawlHost
Set the IP address for this host.
setIsSeed(boolean) - Method in class org.archive.crawler.datamodel.CandidateURI
Set the isSeed attribute of this URI.
setJobPriority(int) - Method in class org.archive.crawler.admin.CrawlJob
Set this job's level of priority.
setLastSavedTime(Date) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the time when this CrawlerSettings was last saved to persistent storage.
setLegalValues(Object[]) - Method in class org.archive.crawler.settings.SimpleType
Set the array of legal values for this type.
setLegalValueType(Class) - Method in class org.archive.crawler.settings.Type
Set the class values of this Type must be an instance of.
setLevel(Level) - Method in class org.archive.crawler.admin.CrawlJobErrorHandler
 
setLimits(long, long, long) - Method in class org.archive.io.RecordingInputStream
Set limits to be enforced by internal recording-out
setLimits(long, long, long) - Method in class org.archive.io.RecordingOutputStream
Set limits on length, time, and rate to enforce.
setMaxPending(int) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setName(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the name of this CrawlerSettings object.
setNew(boolean) - Method in class org.archive.crawler.admin.CrawlJob
Set if the job is considered a new job or not.
setNextChain(ProcessorChain) - Method in class org.archive.crawler.framework.ProcessorChain
Set the processor chain that the URI should be working through after finishing this one.
setNextProcessor(Processor) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the next processor to process this URI.
setNextProcessorChain(ProcessorChain) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the next processor chain to process this URI.
setNextReadyTime(long) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Updates nextReadyTime (if smaller) with the supplied value
setNumberOfJournalEntries(int) - Method in class org.archive.crawler.admin.CrawlJob
 
setOperator(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the operator of this crawl job.
setOperator(String) - Method in class org.archive.crawler.settings.refinements.Refinement
 
setOrder(CrawlOrder) - Method in class org.archive.crawler.framework.CrawlController
 
setOrganization(String) - Method in class org.archive.crawler.settings.CrawlerSettings
Set the name of the organization who is running this crawl.
setOrganization(String) - Method in class org.archive.crawler.settings.refinements.Refinement
 
setOverrideable(boolean) - Method in class org.archive.crawler.settings.Type
Set if this Type should be overrideable.
setOwner(AdaptiveRevisitQueueList) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Set the AdaptiveRevisitQueueList object that contains this HQ.
setParams(HttpConnectionManagerParams) - Method in class org.archive.httpclient.ThreadLocalHttpConnectionManager
Assigns parameters for this connection manager.
setParseHttpHeaders(boolean) - Method in class org.archive.io.arc.ARCReader
 
setPathFromSeed(String) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setPool(WriterPool) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
setPortNumber(String) - Method in class org.archive.crawler.settings.refinements.PortnumberCriteria
Set the port number that is to be checked against a URI.
setPost(boolean) - Method in class org.archive.crawler.datamodel.CrawlURI
Set whether this URI should be fetched by sending an HTTP POST request.
setPrerequisite(boolean) - Method in class org.archive.crawler.datamodel.CrawlURI
Set if this CrawlURI is itself a prerequisite URI.
setPrerequisiteUri(Object) - Method in class org.archive.crawler.datamodel.CrawlURI
Set a prerequisite for this URI.
setPreservedFields(String[]) - Method in class org.archive.crawler.settings.ComplexType
Set a list of attribute names that the complex type should attempt to preserve if the module is exchanged with another one.
setProfileLog(File) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Set a File to receive a log for replay profiling.
setProfileLog(File) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setProfileLog(File) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setProperty(String, Object) - Method in class org.archive.crawler.settings.CrawlSettingsSAXSource
 
setRead() - Method in class org.archive.io.SinkHandlerLogRecord
Mark alert as seen (That is, isNew() no longer returns true).
setReaderIdentifier(String) - Method in class org.archive.io.ArchiveReader
 
setReadOnly() - Method in class org.archive.crawler.admin.CrawlJob
Once called no changes can be made to the settings for this job.
setReference(String) - Method in class org.archive.crawler.settings.refinements.Refinement
Set the reference to this refinement's settings object.
setRefinement(boolean) - Method in class org.archive.crawler.settings.CrawlerSettings
Mark this settings object as a refinement.
setRegexp(String) - Method in class org.archive.crawler.settings.refinements.RegularExpressionCriteria
Set the regular expression to be matched against a URI.
setRemove(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setRetired(boolean) - Method in class org.archive.crawler.frontier.WorkQueue
Set the retired status of this queue.
setRobots(RobotsExclusionPolicy) - Method in class org.archive.crawler.datamodel.CrawlServer
Set the robots exclusion policy for this server.
setRunning(boolean) - Method in class org.archive.crawler.admin.CrawlJob
Set if job is being crawled.
setSchedulingDirective(int) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setSessionBalance(int) - Method in class org.archive.crawler.frontier.WorkQueue
Set the session 'activity budget balance' to the given value
setSettingsHandler(SettingsHandler) - Method in class org.archive.crawler.datamodel.CrawlServer
Set the settings handler to be used by this server.
setSha1Digest() - Method in class org.archive.io.RecordingInputStream
Convenience method for setting SHA1 digest.
setSha1Digest() - Method in class org.archive.io.RecordingOutputStream
Convenience method for setting SHA1 digest.
setSize(int) - Method in class org.archive.crawler.framework.ToePool
Change the number of ToeThreads.
setSizes(CrawlURI, HttpRecorder) - Method in class org.archive.crawler.fetcher.FetchHTTP
Update CrawlURI internal sizes based on current transaction (and in the case of 304s, history)
setStackTrace(StackTraceElement[]) - Method in exception org.archive.io.RecoverableIOException
 
setStartKey(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
setStatus(String) - Method in class org.archive.crawler.admin.CrawlJob
Set the status of this CrawlJob.
setStatusCode(String) - Method in class org.archive.io.arc.ARCRecordMetaData
 
setStrict(boolean) - Method in class org.archive.io.ArchiveReader
 
setStrict(boolean) - Method in class org.archive.io.ArchiveRecord
 
setStringValue(String) - Method in class org.archive.crawler.util.StringIntPair
 
setThreadContextSettingsHandler(SettingsHandler) - Static method in class org.archive.crawler.settings.SettingsHandler
 
setThreadNumber(int) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the number of the ToeThread responsible for processing this uri.
settings - Variable in class org.archive.crawler.settings.ComplexType.Context
 
SettingsCache - Class in org.archive.crawler.settings
This class keeps a map of host names to settings objects.
SettingsCache(CrawlerSettings) - Constructor for class org.archive.crawler.settings.SettingsCache
Creates a new instance of the settings cache
SettingsFrameworkTestCase - Class in org.archive.crawler.settings
Set up a couple of settings to test different functions of the settings framework.
SettingsFrameworkTestCase() - Constructor for class org.archive.crawler.settings.SettingsFrameworkTestCase
 
settingsHandler - Variable in class org.archive.crawler.admin.CrawlJob
 
settingsHandler - Variable in class org.archive.crawler.datamodel.ServerCache
 
settingsHandler - Variable in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
SettingsHandler - Class in org.archive.crawler.settings
An instance of this class holds a hierarchy of settings.
SettingsHandler() - Constructor for class org.archive.crawler.settings.SettingsHandler
Create a new SettingsHandler object.
settingsToFilename(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Resolves the filename for a settings object into a file path.
setTo(String) - Method in class org.archive.crawler.settings.refinements.TimespanCriteria
Set the end of the time frame to check against.
setToResponseBodyStart() - Method in class org.archive.io.ReplayInputStream
 
setTotalBudget(long) - Method in class org.archive.crawler.frontier.WorkQueue
Set the total expenditure level allowable before queue is considered inherently 'over-budget'.
setTotalBytesWritten(long) - Method in class org.archive.crawler.framework.WriterPoolProcessor
 
setTransient(boolean) - Method in class org.archive.crawler.settings.Type
Set to false if this attribute should not be serialized to persistent storage.
setType(Object) - Method in class org.archive.crawler.settings.ModuleAttributeInfo
 
setUnresolvable(CrawlURI, CrawlHost) - Method in class org.archive.crawler.fetcher.FetchDNS
 
setUp() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
setup(UURI, UURI, InputStream, Charset, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
setup(UURI, UURI, CharSequence, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
setup(UURI, CharSequence, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
Convenience method for when source and base are same.
setup(UURI, InputStream, Charset, ExtractErrorListener) - Method in class org.archive.extractor.CharSequenceLinkExtractor
 
setup(UURI, UURI, InputStream, Charset, ExtractErrorListener) - Method in interface org.archive.extractor.LinkExtractor
Setup the LinkExtractor to operate on the given stream and charset, considering the given contextURI as the initial 'base' URI for resolving relative URIs.
setup(UURI, InputStream, Charset, ExtractErrorListener) - Method in interface org.archive.extractor.LinkExtractor
Convenience version of above for common case where source and base are same.
setUp() - Method in class org.archive.queue.QueueTestBase
 
setUp() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
 
setUp() - Method in class org.archive.util.TmpDirTestCase
 
setupCheckpointRecover() - Method in class org.archive.crawler.framework.CrawlController
Does setup of checkpoint recover.
setupCopyEnvironment(File) - Static method in class org.archive.crawler.processor.recrawl.PersistProcessor
 
setupCopyEnvironment(File, boolean) - Static method in class org.archive.crawler.processor.recrawl.PersistProcessor
 
setupCrawlController() - Method in class org.archive.crawler.admin.CrawlJob
 
setupForCrawlStart() - Method in class org.archive.crawler.admin.CrawlJob
 
setupPool(AtomicInteger) - Method in class org.archive.crawler.framework.WriterPoolProcessor
Set up pool of files.
setupPool(AtomicInteger) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
setupPool(AtomicInteger) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
setURI() - Method in class org.archive.net.LaxURI
Coalesce _scheme to existing instances, where appropriate.
setUserAgent(String) - Method in class org.archive.crawler.datamodel.CrawlURI
Set the user agent to use when crawling this URI.
setVersion(String) - Method in class org.archive.io.ArchiveReader
 
setVia(UURI) - Method in class org.archive.crawler.datamodel.CandidateURI
 
setWakeTime(long) - Method in class org.archive.crawler.frontier.WorkQueue
 
SHA1 - Static variable in class org.archive.crawler.fetcher.FetchHTTP
The digest algorithms to choose between: currently SHA-1 or MD5.
sharedInterpreter - Variable in class org.archive.crawler.deciderules.BeanShellDecideRule
 
sharedInterpreter - Variable in class org.archive.crawler.processor.BeanShellProcessor
 
sharedMap - Variable in class org.archive.crawler.deciderules.BeanShellDecideRule
 
sharedMap - Variable in class org.archive.crawler.processor.BeanShellProcessor
 
shouldBeForgotten(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
Some URIs, if they recur, deserve another chance at consideration: they might not be too many hops away via another path, or the scope may have been updated to allow them passage.
shouldCloseConnection(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderGetMethod
 
shouldCloseConnection(HttpConnection) - Method in class org.archive.httpclient.HttpRecorderPostMethod
 
shouldExtract(CrawlURI) - Method in class org.archive.crawler.extractor.ExtractorXML
 
shouldLoad(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.PersistProcessor
Whether the current CrawlURI's state should be loaded
shouldManifest() - Method in class org.archive.io.GenerationFileHandler
 
shouldMasquerade(CrawlURI) - Method in class org.archive.crawler.datamodel.RobotsHonoringPolicy
This method returns true if the crawler should masquerade as the user agent whose restrictions it opted to use.
shouldPause - Variable in class org.archive.crawler.frontier.AbstractFrontier
should the frontier hold any threads asking for URIs?
shouldRetire() - Method in class org.archive.crawler.framework.ToeThread
Whether this thread should cleanly retire at the earliest opportunity.
shouldrun - Variable in class org.archive.crawler.framework.AbstractTracker
 
shouldStore(CrawlURI) - Method in class org.archive.crawler.processor.recrawl.PersistProcessor
Whether the current CrawlURI's state should be persisted (to log or direct to database).
shouldTerminate - Variable in class org.archive.crawler.frontier.AbstractFrontier
should the frontier send an EndedException to any threads asking for URIs?
shouldWrite(CrawlURI) - Method in class org.archive.crawler.framework.WriterPoolProcessor
Whether the given CrawlURI should be written to archive files.
shutdown(int) - Static method in class org.archive.crawler.Heritrix
Shutdown all running heritrix instances and the JVM.
shutdown() - Static method in class org.archive.crawler.Heritrix
 
SHUTDOWN_OPER - Static variable in class org.archive.crawler.Heritrix
 
sigquitSelf() - Static method in class org.archive.util.DevUtils
Send this JVM process a SIGQUIT, producing a thread dump and possibly a heap histogram (if using -XX:+PrintClassHistogram).
SimpleHttpServer - Class in org.archive.crawler
Wrapper for embedded Jetty server.
SimpleHttpServer() - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleHttpServer(int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleHttpServer(boolean, String, String, int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
Deprecated. Use SimpleHttpServer(name,context,hosts,port,expandWebapps)
SimpleHttpServer(String, String, Collection<String>, int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
Constructor.
SimpleHttpServer(List, int, boolean) - Constructor for class org.archive.crawler.SimpleHttpServer
 
SimpleType - Class in org.archive.crawler.settings
A type that holds a Java type.
SimpleType(String, String, Object) - Constructor for class org.archive.crawler.settings.SimpleType
Create a new instance of SimpleType.
SimpleType(String, String, Object, Object[]) - Constructor for class org.archive.crawler.settings.SimpleType
Create a new instance of SimpleType.
SINGLE_SPACE - Static variable in interface org.archive.io.ArchiveFileConstants
 
SingleHttpConnectionManager - Class in org.archive.httpclient
An HttpClient-compatible HttpConnection "manager" that actually just gives out a new connection each time -- skipping the overhead of connection management, since we already throttle our crawler with external mechanisms.
SingleHttpConnectionManager() - Constructor for class org.archive.httpclient.SingleHttpConnectionManager
 
singleLineLegend() - Method in class org.archive.crawler.datamodel.CandidateURI
 
singleLineLegend() - Method in class org.archive.crawler.framework.CrawlController
 
singleLineLegend() - Method in class org.archive.crawler.framework.ToePool
 
singleLineLegend() - Method in class org.archive.crawler.framework.ToeThread
 
singleLineLegend() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
singleLineLegend() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
singleLineLegend() - Method in class org.archive.crawler.frontier.WorkQueue
 
singleLineLegend() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
singleLineLegend() - Method in interface org.archive.util.Reporter
Return a legend for the single-line summary report as a String.
singleLineReport() - Method in class org.archive.crawler.datamodel.CandidateURI
 
singleLineReport() - Method in class org.archive.crawler.framework.CrawlController
 
singleLineReport() - Method in class org.archive.crawler.framework.ToePool
 
singleLineReport() - Method in class org.archive.crawler.framework.ToeThread
 
singleLineReport() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
singleLineReport() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
singleLineReport() - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
singleLineReport() - Method in class org.archive.crawler.frontier.WorkQueue
 
singleLineReport(Reporter) - Static method in class org.archive.util.ArchiveUtils
Utility method to get a String singleLineReport from Reporter
singleLineReport() - Method in interface org.archive.util.Reporter
Return a short single-line summary report as a String.
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.datamodel.CandidateURI
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlController
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.AdaptiveRevisitQueueList
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
singleLineReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
singleLineReportTo(PrintWriter) - Method in interface org.archive.util.Reporter
Make a single-line summary report to the passed-in writer
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.AcceptDecideRule
 
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.ConfiguredDecideRule
 
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.DecideRule
If this rule is "one-way" -- can only return a single possible decision other than PASS -- return that decision.
singlePossibleNonPassDecision(Object) - Method in class org.archive.crawler.deciderules.RejectDecideRule
 
singleThreadMode() - Method in class org.archive.crawler.framework.CrawlController
Go to single thread mode, where only one ToeThread may proceed at a time.
SinkHandler - Class in org.archive.io
A handler that keeps an in-memory vector of all events deemed loggable by configuration.
SinkHandler() - Constructor for class org.archive.io.SinkHandler
 
SinkHandlerLogRecord - Class in org.archive.io
Version of LogRecord used by SinkHandler.
SinkHandlerLogRecord() - Constructor for class org.archive.io.SinkHandlerLogRecord
 
SinkHandlerLogRecord(LogRecord) - Constructor for class org.archive.io.SinkHandlerLogRecord
 
size() - Method in class org.archive.crawler.framework.ProcessorChain
Get the number of processors in this chain.
size() - Method in class org.archive.crawler.framework.ProcessorChainList
Get the number of processor chains.
size - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Size of queue.
size() - Method in class org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
size() - Method in class org.archive.crawler.settings.DataContainer
 
size() - Method in class org.archive.crawler.settings.ListType
Get the number of elements in this list.
size(Object) - Method in class org.archive.crawler.settings.MapType
Get the number of elements in this map.
size() - Method in class org.archive.crawler.settings.SoftSettingsHash
Returns the number of key-value mappings in this map.
size() - Method in class org.archive.crawler.util.Transform
 
size() - Method in class org.archive.queue.StoredQueue
 
size() - Method in interface org.archive.util.BloomFilter
The number of character sequences in the filter (considered to be the number of add()s that returned 'true')
size - Variable in class org.archive.util.BloomFilter64bit
The number of elements currently in the filter.
size() - Method in class org.archive.util.BloomFilter64bit
The number of character sequences in the filter.
size() - Method in class org.archive.util.CachedBdbMap
Deprecated.  
size() - Method in class org.archive.util.ObjectIdentityBdbCache
 
size() - Method in interface org.archive.util.ObjectIdentityCache
Count of name-to-object mappings contained.
size() - Method in class org.archive.util.ObjectIdentityMemCache
 
size() - Method in class org.archive.util.SubList
 
SIZE_ON_DISK - Static variable in class org.archive.io.warc.WARCWriter
 
skip(int) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
skip() - Method in class org.archive.io.ArchiveRecord
Skip over this record's content.
skip(long) - Method in class org.archive.io.ArchiveRecord
 
skip(long) - Method in class org.archive.io.BufferedSeekInputStream
 
skip(long) - Method in class org.archive.io.CompositeFileInputStream
 
skip(long) - Method in class org.archive.io.OriginSeekInputStream
 
skip(long) - Method in class org.archive.io.RandomAccessInputStream
 
skip(long) - Method in class org.archive.io.SafeSeekInputStream
 
skip(long) - Method in class org.archive.util.ms.BlockInputStream
 
skipHttpHeader() - Method in class org.archive.io.arc.ARCRecord
Skip over the HTTP header if one is present.
skipToProcessor(ProcessorChain, Processor) - Method in class org.archive.crawler.datamodel.CrawlURI
Set which processor should be the next processor to process this uri instead of using the default next processor.
skipToProcessorChain(ProcessorChain) - Method in class org.archive.crawler.datamodel.CrawlURI
Set which processor chain should be processing this uri next.
slash - Static variable in class org.archive.crawler.filter.PathDepthFilter
Deprecated.  
SLASH - Static variable in class org.archive.net.UURIFactory
 
SLASHDOTDOTSLASH - Static variable in class org.archive.net.UURIFactory
 
slots - Variable in class org.archive.util.fingerprint.MemLongFPSet
 
smear - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
snapshotAppendOnlyFile(File) - Method in class org.archive.io.ObjectPlusFilesOutputStream
Store a snapshot of an object's supporting file to the current auxiliary directory.
snoozedClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues held in snoozed state, sorted by wake time.
SoftSettingsHash - Class in org.archive.crawler.settings
 
SoftSettingsHash(int) - Constructor for class org.archive.crawler.settings.SoftSettingsHash
Constructs a new, empty SoftSettingsHash with the given initial capacity.
SoftSettingsHash.EntryIterator - Class in org.archive.crawler.settings
Iterator over all elements in hash.
SoftSettingsHash.EntryIterator() - Constructor for class org.archive.crawler.settings.SoftSettingsHash.EntryIterator
 
SoftSettingsHash.SettingsEntry - Class in org.archive.crawler.settings
The entries in this hash extend SoftReference, using the host string as the key.
SoftSettingsHash.SettingsEntry(String, CrawlerSettings, ReferenceQueue<? super String>, int, SoftSettingsHash.SettingsEntry) - Constructor for class org.archive.crawler.settings.SoftSettingsHash.SettingsEntry
Create new entry.
Sorts - Class in org.archive.crawler.util
 
Sorts() - Constructor for class org.archive.crawler.util.Sorts
 
sortStringIntHashMap(HashMap<String, Integer>) - Static method in class org.archive.crawler.util.Sorts
 
source - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
sourceContent - Variable in class org.archive.extractor.CharSequenceLinkExtractor
 
sourceHostDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of URL counts per host per seed
SPACE - Static variable in class org.archive.net.UURIFactory
 
spawn(int) - Method in class org.archive.crawler.framework.Processor
 
SPECULATIVE_HOP - Static variable in class org.archive.crawler.extractor.Link
speculative/aggressively extracted links, perhaps embed or nav, as in JavaScript
SPECULATIVE_MISC - Static variable in class org.archive.crawler.extractor.Link
stand-in value for speculative/aggressively extracted urls without other context
speculativeFixup(String, UURI) - Static method in class org.archive.util.UriUtils
Perform additional fixup of likely-URI Strings
split(String, CharSequence) - Static method in class org.archive.util.TextUtils
Utility method using a precompiled pattern instead of using the split method of the String class.
SQUOT - Static variable in class org.archive.net.UURIFactory
 
SSL_FACTORY_KEY - Static variable in class org.archive.crawler.fetcher.FetchHTTP
 
Stack - Interface in org.archive.queue
Deprecated. As of 1.10.0. Unused.
STANDARD_REPORT - Static variable in class org.archive.crawler.framework.ToePool
 
STANDARD_REPORT - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
standardReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
start() - Method in interface org.archive.crawler.framework.Frontier
Request that Frontier allow crawling to begin.
start() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
start() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
start() - Method in class org.archive.crawler.Heritrix
Start Heritrix.
start - Variable in class org.archive.io.CharSubSequence
 
START_CRAWLING_OPER - Static variable in class org.archive.crawler.Heritrix
 
START_OPER - Static variable in class org.archive.crawler.Heritrix
 
startCrawler() - Method in class org.archive.crawler.admin.CrawlJobHandler
Allow jobs to be crawled.
startCrawling() - Method in class org.archive.crawler.Heritrix
 
startDigest() - Method in class org.archive.io.RecordingInputStream
 
startDigest() - Method in class org.archive.io.RecordingOutputStream
Starts digesting recorded data, if a MessageDigest has been set.
startDocument() - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
 
STARTED - Static variable in class org.archive.crawler.framework.CrawlController
 
startElement(String, String, String, Attributes) - Method in class org.archive.crawler.settings.CrawlSettingsSAXHandler
Start of an element.
startEmbeddedWebserver(int, boolean, String) - Static method in class org.archive.crawler.Heritrix
Deprecated. Use startEmbeddedWebserver(hosts, port, adminLoginPassword)
startEmbeddedWebserver(Collection<String>, int, String) - Static method in class org.archive.crawler.Heritrix
Start up the embedded Jetty webserver instance.
startKey - Variable in class org.archive.crawler.frontier.BdbMultipleWorkQueues.BdbFrontierMarker
 
STARTLOG - Static variable in class org.archive.crawler.Heritrix
Heritrix start log file.
startNextJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
Start next crawl job.
startNextJobInternal() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
startServer() - Method in class org.archive.crawler.SimpleHttpServer
Start the server.
startsWith(byte[], byte[]) - Static method in class org.archive.util.ArchiveUtils
Verify that the array begins with the prefix.
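A prefix check of this kind can be sketched as follows. This is a minimal illustration of the behavior described above, not the ArchiveUtils source; the class name StartsWithSketch and the sample bytes are assumptions made for the example.

```java
public class StartsWithSketch {
    /**
     * Returns true if 'array' begins with every byte of 'prefix', in order.
     * A prefix longer than the array can never match.
     */
    public static boolean startsWith(byte[] array, byte[] prefix) {
        if (prefix.length > array.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (array[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative input: the first bytes of a gzip stream.
        byte[] data = {0x1f, (byte) 0x8b, 0x08};
        byte[] gzipMagic = {0x1f, (byte) 0x8b};
        System.out.println(startsWith(data, gzipMagic));
    }
}
```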
startTime - Variable in class org.archive.io.RecordingOutputStream
time recording begins for timeout, rate calculations
state - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Last known state of HQ -- ALL methods should use getState() to read this value, never read it directly.
statistics - Variable in class org.archive.crawler.framework.CrawlController
 
StatisticsLogFormatter - Class in org.archive.crawler.io
 
StatisticsLogFormatter() - Constructor for class org.archive.crawler.io.StatisticsLogFormatter
 
StatisticsSummary - Class in org.archive.crawler.admin
This class provides descriptive statistics of a finished crawl job by using the crawl report files generated by StatisticsTracker.
StatisticsSummary(CrawlJob) - Constructor for class org.archive.crawler.admin.StatisticsSummary
Constructor
StatisticsTracker - Class in org.archive.crawler.admin
This is an implementation of the AbstractTracker.
StatisticsTracker(String) - Constructor for class org.archive.crawler.admin.StatisticsTracker
 
StatisticsTracking - Interface in org.archive.crawler.framework
An interface for objects that want to collect statistics on running crawls.
STATUS_ABORTED - Static variable in class org.archive.crawler.admin.CrawlJob
Job was terminated by user input while crawling
STATUS_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
STATUS_ATTR - Static variable in class org.archive.crawler.Heritrix
 
STATUS_CHECKPOINTING - Static variable in class org.archive.crawler.admin.CrawlJob
Job is being checkpointed.
STATUS_CODE_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
STATUS_CREATED - Static variable in class org.archive.crawler.admin.CrawlJob
Initial value.
STATUS_DELETED - Static variable in class org.archive.crawler.admin.CrawlJob
Job was deleted by user, will not be displayed in UI.
STATUS_FINISHED - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally having completed its crawl.
STATUS_FINISHED_ABNORMAL - Static variable in class org.archive.crawler.admin.CrawlJob
Something went very wrong
STATUS_FINISHED_DATA_LIMIT - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally when the specified amount of data (MB) had been downloaded
STATUS_FINISHED_DOCUMENT_LIMIT - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally when the specified number of documents had been fetched.
STATUS_FINISHED_TIME_LIMIT - Static variable in class org.archive.crawler.admin.CrawlJob
Job finished normally when the specified time limit was hit.
STATUS_MISCONFIGURED - Static variable in class org.archive.crawler.admin.CrawlJob
Job could not be launched due to an InitializationException
STATUS_PAUSED - Static variable in class org.archive.crawler.admin.CrawlJob
Job was temporarily stopped.
STATUS_PENDING - Static variable in class org.archive.crawler.admin.CrawlJob
Job has been successfully submitted to a CrawlJobHandler
STATUS_PREPARING - Static variable in class org.archive.crawler.admin.CrawlJob
 
STATUS_PROFILE - Static variable in class org.archive.crawler.admin.CrawlJob
Job is actually a profile
STATUS_RUNNING - Static variable in class org.archive.crawler.admin.CrawlJob
Job is being crawled
STATUS_WAITING_FOR_PAUSE - Static variable in class org.archive.crawler.admin.CrawlJob
Job is going to be temporarily stopped after active threads are finished.
STATUSCODE_FIELD_KEY - Static variable in interface org.archive.io.arc.ARCConstants
Key for statuscode field.
statusCodeDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
Keep track of status codes
statusCodeDistribution - Variable in class org.archive.crawler.admin.StatisticsTracker
Keep track of fetch status codes
stestBackgroundImageExtraction() - Method in class org.archive.crawler.selftest.BackgroundImageExtractionSelfTestCase
Read ARC file for the background image and the file that contained it.
stestFrames() - Method in class org.archive.crawler.selftest.FramesSelfTestCase
Verify that all frames and their contents are found by the crawler.
stop() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
stop() - Method in class org.archive.crawler.Heritrix
Stop Heritrix.
STOP_CRAWLING_OPER - Static variable in class org.archive.crawler.Heritrix
 
STOP_OPER - Static variable in class org.archive.crawler.Heritrix
 
stopCrawler() - Method in class org.archive.crawler.admin.CrawlJobHandler
Stop future jobs from being crawled.
stopCrawling() - Method in class org.archive.crawler.admin.CrawlJob
 
stopCrawling() - Method in class org.archive.crawler.Heritrix
 
STOPPING - Static variable in class org.archive.crawler.framework.CrawlController
 
stopServer() - Method in class org.archive.crawler.SimpleHttpServer
Stop the running server.
store - Variable in class org.archive.crawler.processor.recrawl.PersistOnlineProcessor
 
storeDNSRecord(CrawlURI, String, CrawlHost, Record[]) - Method in class org.archive.crawler.fetcher.FetchDNS
 
StoredQueue<E extends java.io.Serializable> - Class in org.archive.queue
Queue backed by a JE Collections StoredSortedMap.
StoredQueue(Database, Class, StoredClassCatalog) - Constructor for class org.archive.queue.StoredQueue
Create a StoredQueue backed by the given Database.
STRAY_SPACING - Static variable in class org.archive.net.UURIFactory
 
STRICT - Static variable in class org.archive.httpclient.ConfigurableX509TrustManager
Strict trust.
strict - Variable in class org.archive.io.ArchiveRecord
 
strictAdd(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
An internal method for adding URIs to the queue.
STRING - Static variable in class org.archive.crawler.settings.SettingsHandler
 
STRING_LIST - Static variable in class org.archive.crawler.settings.SettingsHandler
 
STRING_URI_DETECTOR - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
STRING_URI_DETECTOR - Static variable in class org.archive.util.UriUtils
 
STRING_URI_DETECTOR_EXCEPTIONS - Static variable in class org.archive.util.UriUtils
 
StringIntPair - Class in org.archive.crawler.util
 
StringIntPair(String, int) - Constructor for class org.archive.crawler.util.StringIntPair
 
StringIntPairComparator - Class in org.archive.crawler.util
 
StringIntPairComparator() - Constructor for class org.archive.crawler.util.StringIntPairComparator
 
StringList - Class in org.archive.crawler.settings
List of String values.
StringList(String, String) - Constructor for class org.archive.crawler.settings.StringList
Creates a new StringList.
StringList(String, String, StringList) - Constructor for class org.archive.crawler.settings.StringList
Creates a new StringList and initializes it with the values from another StringList.
StringList(String, String, String[]) - Constructor for class org.archive.crawler.settings.StringList
Creates a new StringList and initializes it with the values from an array of Strings.
strings - Variable in class org.archive.extractor.RegexpJSLinkExtractor
 
StringToType(String, String) - Static method in class org.archive.crawler.settings.SettingsHandler
Convert a String object to an object of typeName.
stripExtension(String, String) - Static method in class org.archive.io.ArchiveReader
 
StripExtraSlashes - Class in org.archive.crawler.url.canonicalize
 
StripExtraSlashes(String) - Constructor for class org.archive.crawler.url.canonicalize.StripExtraSlashes
 
StripSessionCFIDs - Class in org.archive.crawler.url.canonicalize
Strip ColdFusion session ids.
StripSessionCFIDs(String) - Constructor for class org.archive.crawler.url.canonicalize.StripSessionCFIDs
 
StripSessionIDs - Class in org.archive.crawler.url.canonicalize
Strip known session ids.
StripSessionIDs(String) - Constructor for class org.archive.crawler.url.canonicalize.StripSessionIDs
 
stripToMinimal() - Method in class org.archive.crawler.datamodel.CrawlURI
Remove all attributes set on this uri.
StripUserinfoRule - Class in org.archive.crawler.url.canonicalize
Strip any 'userinfo' found on http/https URLs.
StripUserinfoRule(String) - Constructor for class org.archive.crawler.url.canonicalize.StripUserinfoRule
 
StripWWWNRule - Class in org.archive.crawler.url.canonicalize
Strip any 'www[0-9]*' found on http/https URLs IF they have some path/query component (content after third slash).
StripWWWNRule(String) - Constructor for class org.archive.crawler.url.canonicalize.StripWWWNRule
 
StripWWWRule - Class in org.archive.crawler.url.canonicalize
Strip any 'www' found on http/https URLs, IF they have some path/query component (content after third slash).
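The 'www[0-9]*' rule described above can be sketched with a single regular expression. This is a hedged illustration only: the class name StripWwwSketch, the method canonicalize, and the exact pattern are assumptions, and the pattern used by the real canonicalization rules may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripWwwSketch {
    // Assumed pattern: an http/https scheme, a 'www' label with optional
    // digits, then the rest of the authority, a slash, and at least one
    // character of path or query (the "content after third slash").
    private static final Pattern WWWN = Pattern.compile(
            "^(https?://)www[0-9]*\\.([^/]*/.+)$", Pattern.CASE_INSENSITIVE);

    /** Returns the URL without its leading 'wwwN.' label, or unchanged if the rule does not apply. */
    public static String canonicalize(String url) {
        Matcher m = WWWN.matcher(url);
        return m.matches() ? m.group(1) + m.group(2) : url;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("http://www2.example.com/index.html"));
        System.out.println(canonicalize("http://www.example.com/"));
    }
}
```

In this sketch a URL with nothing after the third slash is left untouched, which mirrors the "IF they have some path/query component" condition.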
StripWWWRule(String) - Constructor for class org.archive.crawler.url.canonicalize.StripWWWRule
 
SUBACTION - Static variable in class org.archive.crawler.admin.ui.JobConfigureUtils
 
SUBARRAY_LENGTH_IN_LONGS - Static variable in class org.archive.util.BloomFilter64bit
number of longs in one subarray
SUBARRAY_MASK - Static variable in class org.archive.util.BloomFilter64bit
mask for lowest SUBARRAY_POWER_OF_TWO bits
SUBARRAY_POWER_OF_TWO - Static variable in class org.archive.util.BloomFilter64bit
power-of-two to use as maximum size of bitfield subarrays
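How the three SUBARRAY constants might relate can be sketched as below. Everything here is an assumption made for illustration: the value 26, the reading of the power as a count of longs, and the helper names are hypothetical and are not taken from BloomFilter64bit itself.

```java
public class SubarrayIndexSketch {
    // Hypothetical value, chosen only for illustration.
    static final int SUBARRAY_POWER_OF_TWO = 26;
    // Assumed relationship: each subarray holds 2^power longs.
    static final int SUBARRAY_LENGTH_IN_LONGS = 1 << SUBARRAY_POWER_OF_TWO;
    // Mask selecting the lowest SUBARRAY_POWER_OF_TWO bits of an index.
    static final int SUBARRAY_MASK = SUBARRAY_LENGTH_IN_LONGS - 1;

    /** Which subarray holds the long at this overall long index. */
    public static int subarrayOf(long longIndex) {
        return (int) (longIndex >>> SUBARRAY_POWER_OF_TWO);
    }

    /** Position of that long inside its subarray. */
    public static int offsetOf(long longIndex) {
        return (int) (longIndex & SUBARRAY_MASK);
    }

    public static void main(String[] args) {
        long bitIndex = ((3L << SUBARRAY_POWER_OF_TWO) + 5) * 64 + 7;
        long longIndex = bitIndex >>> 6; // 64 bits per long
        System.out.println(subarrayOf(longIndex) + " / " + offsetOf(longIndex));
    }
}
```

Splitting one large bitfield into power-of-two subarrays lets the shift and the mask replace division and modulo when locating a bit.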
subclasses(Collection<? extends Object>, Class<Target>) - Static method in class org.archive.crawler.util.Transform
Returns a transform containing only objects of a given class.
SubElement - Class in org.archive.util.anvl
Abstract ANVL 'data element' sub-part.
SubElement() - Constructor for class org.archive.util.anvl.SubElement
 
SubElement(String) - Constructor for class org.archive.util.anvl.SubElement
 
subList(int, int) - Method in class org.archive.crawler.settings.ListType
 
SubList<E> - Class in org.archive.util
Universal sublist implementation.
SubList(List<E>, int, int) - Constructor for class org.archive.util.SubList
Constructor.
subSequence(int, int) - Method in class org.archive.crawler.settings.TextField
 
subSequence(int, int) - Method in class org.archive.io.CharSubSequence
 
subSequence(int, int) - Method in class org.archive.io.GenericReplayCharSequence
 
subSequence(int, int) - Method in class org.archive.io.Latin1ByteReplayCharSequence
 
subSequence(int, int) - Method in class org.archive.io.SeekReaderCharSequence
 
subSequence(int, int) - Method in class org.archive.net.UURI
 
subSequence(int, int) - Method in class org.archive.util.InterruptibleCharSequence
 
subset(CrawlURI, Class) - Method in class org.archive.crawler.datamodel.CredentialStore
Return set made up of all credentials of the passed type.
subset(CrawlURI, Class, String) - Method in class org.archive.crawler.datamodel.CredentialStore
Return set made up of all credentials of the passed type.
substats - Variable in class org.archive.crawler.datamodel.CrawlHost
 
substats - Variable in class org.archive.crawler.datamodel.CrawlServer
 
substats - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
 
substats - Variable in class org.archive.crawler.frontier.WorkQueue
Substats for all CrawlURIs in this group
substring(int, int) - Method in class org.archive.io.Latin1ByteReplayCharSequence
Deprecated. Please use subSequence() and then toString() directly
subtally(Map<String, Long>, long, long, long) - Method in class org.archive.io.warc.WARCWriter
 
subtract(Histotable<K>) - Method in class org.archive.util.Histotable
 
succeededFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of successfully processed URIs.
succeededFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
succeededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Number of successfully processed URIs.
succeededFetchCount() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
successBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
successDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
The CrawlURI has been successfully crawled.
SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
successfullyFetchedCount() - Method in class org.archive.crawler.admin.StatisticsTracker
 
successfullyFetchedCount() - Method in interface org.archive.crawler.framework.StatisticsTracking
Number of successfully processed URIs.
suite(String, CrawlJob, File, File) - Static method in class org.archive.crawler.selftest.AllSelfTestCases
Run all known tests in the selftest suite.
suite(String, CrawlJob, File, File, List) - Static method in class org.archive.crawler.selftest.AllSelfTestCases
Run list of passed tests.
summary() - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
SupplementaryLinksScoper - Class in org.archive.crawler.postprocessor
Run CandidateURI links carried in the passed CrawlURI through a filter and 'handle' rejections.
SupplementaryLinksScoper(String) - Constructor for class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
Supplier<V> - Class in org.archive.util
Class for optionally providing one instance of the parameterized type.
Supplier() - Constructor for class org.archive.util.Supplier
 
Supplier(V) - Constructor for class org.archive.util.Supplier
 
SURT - Class in org.archive.util
Sort-friendly URI Reordering Transform.
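The idea behind a sort-friendly reordering can be sketched as follows: the host labels are reversed so that URIs from the same domain sort next to each other. This is a simplified illustration under stated assumptions, not the SURT class itself. The class name SurtSketch and the method toSurt are hypothetical, the output shape follows the commonly documented "scheme://(tld,domain,)/path" form, and userinfo, ports, and case normalization are ignored.

```java
public class SurtSketch {
    /**
     * Reverses the dot-separated host labels of a scheme://host/path URI
     * and wraps them in parentheses, each label followed by a comma.
     * URIs without "://" are returned unchanged.
     */
    public static String toSurt(String uri) {
        int schemeEnd = uri.indexOf("://");
        if (schemeEnd < 0) {
            return uri;
        }
        int hostStart = schemeEnd + 3;
        int pathStart = uri.indexOf('/', hostStart);
        if (pathStart < 0) {
            pathStart = uri.length();
        }
        String[] labels = uri.substring(hostStart, pathStart).split("\\.");
        StringBuilder sb = new StringBuilder(uri.substring(0, hostStart)).append('(');
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.append(')').append(uri.substring(pathStart)).toString();
    }

    public static void main(String[] args) {
        System.out.println(toSurt("http://www.archive.org/movies"));
    }
}
```

With the host reversed, a plain String.startsWith test against a prefix such as "http://(org,archive," matches every host under that domain, which is the kind of prefix comparison the SURT-prefix entries in this index describe.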
SURT() - Constructor for class org.archive.util.SURT
 
SurtAuthorityQueueAssignmentPolicy - Class in org.archive.crawler.frontier
QueueAssignmentPolicy based on the SURT form of the hostname.
SurtAuthorityQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
SurtPrefixedDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set.
SurtPrefixedDecideRule(String) - Constructor for class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Usual constructor.
surtPrefixes - Variable in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
 
surtPrefixes - Variable in class org.archive.crawler.filter.SurtPrefixFilter
Deprecated.  
surtPrefixes - Variable in class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
SurtPrefixFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
SurtPrefixFilter(String) - Constructor for class org.archive.crawler.filter.SurtPrefixFilter
Deprecated.  
SurtPrefixScope - Class in org.archive.crawler.scope
Deprecated. As of release 1.10.0. Replaced by DecidingScope.
SurtPrefixScope(String) - Constructor for class org.archive.crawler.scope.SurtPrefixScope
Deprecated.  
SurtPrefixSet - Class in org.archive.util
Specialized TreeSet for keeping a set of String prefixes.
SurtPrefixSet() - Constructor for class org.archive.util.SurtPrefixSet
 
suspend(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Suspends this WorkQueue.
sweepHand - Variable in class org.archive.util.fingerprint.LongFPSetCache
 
sync() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Method used by BdbFrontier during checkpointing.
sync() - Method in class org.archive.util.CachedBdbMap
Deprecated. Sync in-memory map entries to backing disk store.
sync() - Method in class org.archive.util.ObjectIdentityBdbCache
Sync all in-memory map entries to backing disk store.
sync() - Method in interface org.archive.util.ObjectIdentityCache
force the persistent backend, if any, to be updated with all live object state
sync() - Method in class org.archive.util.ObjectIdentityMemCache
 
syncDirectories(File, FilenameFilter, File) - Static method in class org.archive.util.FileUtils
Use for case where files are being added to src.
SYSTEM_PREFIX - Static variable in class org.archive.crawler.Heritrix
Prefix used on other properties we'll add to the System.properties list (after stripping this prefix).
SYSTEM_PROPERTY_GENERATOR_KEY - Variable in class org.archive.uid.GeneratorFactory
 

T

tagDefineButton(int, Vector) - Method in class org.archive.crawler.extractor.CustomSWFTags
 
tagDefineButton(int, Vector) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
tagDefineButton2(int, boolean, Vector) - Method in class org.archive.crawler.extractor.CustomSWFTags
 
tagDefineButton2(int, boolean, Vector) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
tagDefineSprite(int) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
tagDoAction() - Method in class org.archive.crawler.extractor.CustomSWFTags
 
tagDoAction() - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
tagDoInActions(int) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
tagPlaceObject2(boolean, int, int, int, Matrix, AlphaTransform, int, String, int) - Method in class org.archive.crawler.extractor.ExtractorSWF.ExtractorSWFTags
 
tags - Variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
tail(String) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail' command
tail(String, int) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail -n' command
tail(RandomAccessFile, int) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail -n' command
tailIndex - Variable in class org.archive.queue.StoredQueue
 
tally(CrawlURI, CrawlSubstats.Stage) - Method in class org.archive.crawler.datamodel.CrawlSubstats
Examine the CrawlURI and, based on its status and internal values, update tallies.
tally(CrawlURI, CrawlSubstats.Stage) - Method in class org.archive.crawler.frontier.AbstractFrontier
Report CrawlURI to each of the three 'substats' accumulators (group/queue, server, host) for a given stage.
tally(String, long, long, long) - Method in class org.archive.io.warc.WARCWriter
 
tally(K) - Method in class org.archive.util.Histotable
Record one more occurrence of the given object key.
tally(K, long) - Method in class org.archive.util.Histotable
Record count more occurrence(s) of the given object key.
tallyCurrentPause() - Method in class org.archive.crawler.framework.AbstractTracker
For a current pause (if any), add paused time to total and reset
targetSize - Variable in class org.archive.crawler.framework.ToePool
 
targetSizeForReadyQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
Target (minimum) size to keep readyClassQueues
tearDown() - Method in class org.archive.crawler.settings.SettingsFrameworkTestCase
 
tearDown() - Method in class org.archive.queue.QueueTestBase
 
terminate() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should end the crawl, giving any worker ToeThread that asks for a next() an EndedException.
terminate() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
terminate() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
TERMINATE_CRAWL_JOB_OPER - Static variable in class org.archive.crawler.Heritrix
 
terminateCurrentJob() - Method in class org.archive.crawler.admin.CrawlJobHandler
 
testAdd() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check that we can add fingerprints
testCompressedARCFile(File) - Static method in class org.archive.io.arc.ARCReaderFactory
Check file is compressed and in ARC GZIP format.
testCompressedARCFile(File, boolean) - Static method in class org.archive.io.arc.ARCReaderFactory
Check file is compressed and in ARC GZIP format.
testCompressedARCFile(File) - Static method in class org.archive.io.arc.ARCUtils
Check file is compressed and in ARC GZIP format.
testCompressedARCFile(File, boolean) - Static method in class org.archive.io.arc.ARCUtils
Check file is compressed and in ARC GZIP format.
testCompressedARCStream(InputStream) - Static method in class org.archive.io.arc.ARCReaderFactory
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedARCStream(InputStream) - Static method in class org.archive.io.arc.ARCUtils
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedRepositionalStream(RepositionableStream) - Static method in class org.archive.io.arc.ARCUtils
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedStream(InputStream) - Static method in class org.archive.io.arc.ARCUtils
Tests passed stream is gzip stream by reading in the HEAD.
testCompressedWARCFile(File) - Static method in class org.archive.io.warc.WARCReaderFactory
Check file is compressed WARC.
testContains() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check that contains() does what we expect
testCount() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check count works ok
testDequeue() - Method in class org.archive.queue.QueueTestBase
test that dequeue works
testDequeueEmptyQueue() - Method in class org.archive.queue.QueueTestBase
check what happens when we dequeue on empty
testFilesInArc(List<File>) - Method in class org.archive.crawler.selftest.SelfTestCase
Test that all files in the passed list were found in the ARC.
testFilesInArc(List<File>, List<File>) - Method in class org.archive.crawler.selftest.SelfTestCase
Test that all files in the passed list were found in the ARC.
testGzipMagic(InputStream) - Method in class org.archive.io.GzipHeader
Test gzip magic is next in the stream.
testGzipMagic(InputStream, CRC32) - Method in class org.archive.io.GzipHeader
Test gzip magic is next in the stream.
testNoop() - Method in class org.archive.crawler.selftest.AltTestSuite
 
testNothing() - Method in class org.archive.crawler.selftest.SelfTestCase
 
testQueue() - Method in class org.archive.queue.QueueTestBase
test that queue puts things on, and they stay there :)
testRemove() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
test remove() works as expected
testRequiredField(Map, String) - Method in class org.archive.io.arc.ARCRecordMetaData
Test required field is present in hash.
testSmall() - Method in class org.archive.util.BloomFilterTestBase
 
testUncompressedARCFile(File) - Static method in class org.archive.io.arc.ARCUtils
Check file is uncompressed ARC file.
TestUtils - Class in org.archive.util
Utility methods useful in testing situations.
TestUtils() - Constructor for class org.archive.util.TestUtils
 
testWithZero() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check we can call add/remove/contains() with 0 as a value
TEXT - Static variable in class org.archive.crawler.settings.SettingsHandler
 
TextField - Class in org.archive.crawler.settings
Class to hold values for text fields.
TextField(String) - Constructor for class org.archive.crawler.settings.TextField
Constructs a new TextField object.
TextUtils - Class in org.archive.util
 
TextUtils() - Constructor for class org.archive.util.TextUtils
 
TextWaitEvaluator - Class in org.archive.crawler.postprocessor
A specialized ContentBasedWaitEvaluator.
TextWaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.TextWaitEvaluator
Constructor
THREAD_COUNT_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
threadContextSettingsHandler - Static variable in class org.archive.crawler.settings.SettingsHandler
 
threadCount() - Method in class org.archive.crawler.admin.StatisticsTracker
Get the total number of ToeThreads (sleeping and active)
threadInterpreter - Variable in class org.archive.crawler.deciderules.BeanShellDecideRule
 
threadInterpreter - Variable in class org.archive.crawler.processor.BeanShellProcessor
 
ThreadLocalHttpConnectionManager - Class in org.archive.httpclient
A simple, but thread-safe HttpClient HttpConnectionManager.
ThreadLocalHttpConnectionManager() - Constructor for class org.archive.httpclient.ThreadLocalHttpConnectionManager
 
THREADS_REPORT_OPER - Static variable in class org.archive.crawler.admin.CrawlJob
 
THREADS_SHORT_REPORT_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
timeoutMs - Variable in class org.archive.io.RecordingOutputStream
maximum time to record before throwing exception
TIMER_TRUNC - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
 
TimespanCriteria - Class in org.archive.crawler.settings.refinements
A refinement criterion that checks if a URI is requested within a specific time frame.
TimespanCriteria(String, String) - Constructor for class org.archive.crawler.settings.refinements.TimespanCriteria
Create a new instance of TimespanCriteria.
TIMESTAMP - Static variable in class org.archive.crawler.settings.SettingsHandler
 
timestamp17ToCalendar(String) - Static method in class org.archive.util.ArchiveUtils
Convert 17-digit date format timestamps (as found in crawl.log, for example) into a GregorianCalendar object.
timestamp_interval - Variable in class org.archive.crawler.io.CrawlerJournal
number of lines between timestamps
TimestampSerialno - Class in org.archive.util
Immutable data structure that holds a timestamp and an accompanying serial number.
TimestampSerialno(String, int) - Constructor for class org.archive.util.TimestampSerialno
 
TimestampSerialno(int) - Constructor for class org.archive.util.TimestampSerialno
 
tldBytes - Variable in class org.archive.crawler.admin.StatisticsSummary
 
tldDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
Keep track of TLDs
tldHostDistribution - Variable in class org.archive.crawler.admin.StatisticsSummary
 
TLDs - Static variable in class org.archive.crawler.extractor.ExtractorUniversal
Matches any string that begins with a TLD (no .) followed by a '/' slash or end of string.
TLDS - Static variable in class org.archive.util.ArchiveUtils
 
TMPDIR - Static variable in class org.archive.crawler.Heritrix
 
TMPDIR - Static variable in class org.archive.util.FileUtils
 
TmpDirTestCase - Class in org.archive.util
Base class for TestCases that want access to a tmp dir for the writing of files.
TmpDirTestCase() - Constructor for class org.archive.util.TmpDirTestCase
 
TmpDirTestCase(String) - Constructor for class org.archive.util.TmpDirTestCase
 
toArray() - Method in class org.archive.crawler.settings.ListType
 
toArray(X[]) - Method in class org.archive.crawler.settings.ListType
 
toeEnded() - Method in class org.archive.crawler.framework.CrawlController
Note that a ToeThread ended, possibly completing the crawl-stop.
toePaused() - Method in class org.archive.crawler.framework.CrawlController
Note that a ToeThread reached paused condition, possibly completing the crawl-pause.
ToePool - Class in org.archive.crawler.framework
A collection of ToeThreads.
ToePool(CrawlController) - Constructor for class org.archive.crawler.framework.ToePool
Constructor.
ToeThread - Class in org.archive.crawler.framework
One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
ToeThread(ToePool, int) - Constructor for class org.archive.crawler.framework.ToeThread
Create a ToeThread
TOKENIZED_PREFIX - Static variable in interface org.archive.io.arc.ARCConstants
Tokenized field prefix.
TooManyHopsDecideRule - Class in org.archive.crawler.deciderules
Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold.
TooManyHopsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.TooManyHopsDecideRule
Usual constructor.
TooManyPathSegmentsDecideRule - Class in org.archive.crawler.deciderules
Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold.
TooManyPathSegmentsDecideRule(String) - Constructor for class org.archive.crawler.deciderules.TooManyPathSegmentsDecideRule
Usual constructor.
topLevelModules() - Method in class org.archive.crawler.settings.CrawlerSettings
 
topmostAssignedSurtPrefixPattern - Static variable in class org.archive.net.PublicSuffixes
 
topmostAssignedSurtPrefixRegex - Static variable in class org.archive.net.PublicSuffixes
 
TopmostAssignedSurtQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Create a queueKey based on the SURT authority, reduced to the public-suffix-plus-one domain (topmost assignable domain).
TopmostAssignedSurtQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.TopmostAssignedSurtQueueAssignmentPolicy
 
toResourcePath(File) - Static method in class org.archive.crawler.settings.XMLSettingsHandler
Convert a File to a path that might be resolved from classpath/JAR resource sources.
toString() - Method in class org.archive.crawler.datamodel.CandidateURI
 
toString() - Method in class org.archive.crawler.datamodel.CrawlHost
 
toString() - Method in class org.archive.crawler.datamodel.CrawlServer
 
toString() - Method in class org.archive.crawler.datamodel.credential.CredentialAvatar
 
toString() - Method in class org.archive.crawler.extractor.Link
 
toString() - Method in class org.archive.crawler.framework.CrawlScope
 
toString() - Method in class org.archive.crawler.framework.Filter
 
toString() - Method in class org.archive.crawler.settings.ComplexType
 
toString() - Method in class org.archive.crawler.settings.Constraint.FailedCheck
Returns a human-readable string for the failed check.
toString() - Method in class org.archive.crawler.settings.TextField
 
toString() - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
Converts this LumpyString to a String.
toString() - Method in class org.archive.io.arc.ARCRecordMetaData
 
toString() - Method in interface org.archive.io.ArchiveRecordHeader
 
toString() - Method in class org.archive.io.CharSubSequence
 
toString() - Method in class org.archive.io.GenericReplayCharSequence
 
toString() - Method in class org.archive.io.Latin1ByteReplayCharSequence
 
toString() - Method in exception org.archive.io.RecoverableIOException
 
toString() - Method in class org.archive.io.SeekReaderCharSequence
 
toString() - Method in class org.archive.io.SinkHandlerLogRecord
 
toString() - Method in class org.archive.net.UURI
Override to cache result
toString() - Method in class org.archive.util.anvl.ANVLRecord
 
toString() - Method in class org.archive.util.anvl.ANVLRecords
 
toString() - Method in class org.archive.util.anvl.Element
 
toString() - Method in class org.archive.util.anvl.SubElement
 
toString() - Method in class org.archive.util.InterruptibleCharSequence
 
toString() - Method in class org.archive.util.ms.DefaultEntry
 
toString() - Method in class org.archive.util.ms.HeaderBlock
 
toString() - Method in class org.archive.util.ms.Piece
 
toString() - Method in class org.archive.util.PaddingStringBuffer
 
toString() - Method in class org.archive.util.ProcessUtils.ProcessResult
 
TOTAL_BYTES - Static variable in class org.archive.io.warc.WARCWriter
 
TOTAL_DATA_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
totalBytes - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
totalBytesCrawled() - Method in class org.archive.crawler.admin.StatisticsTracker
 
totalBytesCrawled() - Method in interface org.archive.crawler.framework.StatisticsTracking
Returns the total number of uncompressed bytes crawled.
totalBytesWritten() - Method in class org.archive.crawler.admin.StatisticsTracker
Deprecated. use totalBytesCrawled
totalBytesWritten() - Method in interface org.archive.crawler.framework.Frontier
Deprecated. misnomer; consult StatisticsTracker instead
totalBytesWritten() - Method in interface org.archive.crawler.framework.StatisticsTracking
Deprecated. misnomer; use totalBytesCrawled instead
totalBytesWritten() - Method in class org.archive.crawler.frontier.AbstractFrontier
Deprecated. misnomer; use StatisticsTracking figures instead
totalBytesWritten() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
totalCount() - Method in class org.archive.crawler.admin.StatisticsTracker
 
totalCount() - Method in interface org.archive.crawler.framework.StatisticsTracking
 
totalDataWritten - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalDnsHostDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalDnsHostSize - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalDnsMimeSize - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalDnsMimeTypeDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalDnsStatusCodeDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalFileTypeDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalHostDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalHosts - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalHostSize - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalKBPerSec - Variable in class org.archive.crawler.admin.StatisticsTracker
 
totalMimeSize - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalMimeTypeDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalProcessedBytes - Variable in class org.archive.crawler.admin.StatisticsTracker
 
totalProcessedBytes - Variable in class org.archive.crawler.frontier.AbstractFrontier
Used when bandwidth constraints are in effect.
TOTALS - Static variable in class org.archive.io.warc.WARCWriter
 
totalScheduled - Variable in class org.archive.crawler.datamodel.CrawlSubstats
 
totalStatusCodeDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalTldDocuments - Variable in class org.archive.crawler.admin.StatisticsSummary
 
totalTldSize - Variable in class org.archive.crawler.admin.StatisticsSummary
 
TRAILING_ESCAPED_SPACE - Static variable in class org.archive.net.UURIFactory
 
TransclusionDecideRule - Class in org.archive.crawler.deciderules
Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI.getPathFromSeed()) ends with at least one, but not more than, the given number of non-navlink ('L') hops.
TransclusionDecideRule(String) - Constructor for class org.archive.crawler.deciderules.TransclusionDecideRule
Usual constructor.
TransclusionFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
TransclusionFilter(String) - Constructor for class org.archive.crawler.filter.TransclusionFilter
Deprecated.  
transform(String) - Method in class org.archive.crawler.scope.SeedFileIterator
 
Transform<Original,Transformed> - Class in org.archive.crawler.util
A transformation of a collection.
Transform(Collection<? extends Original>, Transformer<Original, Transformed>) - Constructor for class org.archive.crawler.util.Transform
Constructor.
transform(Original) - Method in interface org.archive.crawler.util.Transformer
Transforms the given object.
transform(File, File, boolean) - Method in class org.archive.io.Arc2Warc
 
transform(ARCReader, File) - Method in class org.archive.io.Arc2Warc
 
transform(File, File, String, String, boolean) - Method in class org.archive.io.Warc2Arc
 
transform(WARCReader, ARCWriter) - Method in class org.archive.io.Warc2Arc
 
transform(String) - Method in class org.archive.util.iterator.RegexpLineIterator
Loads next item into lookahead spot, if available.
transform(Original) - Method in class org.archive.util.iterator.TransformingIteratorWrapper
 
TRANSFORMED_HOST_DELIM - Static variable in class org.archive.util.SURT
 
Transformer<Original,Transformed> - Interface in org.archive.crawler.util
Transforms objects from one thing into another.
TransformingIteratorWrapper<Original,Transformed> - Class in org.archive.util.iterator
Superclass for Iterators which transform and/or filter results from a wrapped Iterator.
TransformingIteratorWrapper() - Constructor for class org.archive.util.iterator.TransformingIteratorWrapper
 
TransformIterator<Original,Transformed> - Class in org.archive.crawler.util
 
TransformIterator(Iterator<? extends Original>, Transformer<Original, Transformed>) - Constructor for class org.archive.crawler.util.TransformIterator
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.BroadScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
 
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.DomainScope
Deprecated.  
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.HostScope
Deprecated.  
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.PathScope
Deprecated.  
transitiveAccepts(Object) - Method in class org.archive.crawler.scope.RefinedScope
 
transitiveFilter - Variable in class org.archive.crawler.scope.DomainScope
Deprecated.  
transitiveFilter - Variable in class org.archive.crawler.scope.HostScope
Deprecated.  
transitiveFilter - Variable in class org.archive.crawler.scope.PathScope
Deprecated.  
transitiveFilter - Variable in class org.archive.crawler.scope.RefinedScope
 
TrapSuppressExtractor - Class in org.archive.crawler.extractor
Pseudo-extractor that suppresses link-extraction of likely trap pages, by noticing when content's digest is identical to that of its 'via'.
TrapSuppressExtractor(String) - Constructor for class org.archive.crawler.extractor.TrapSuppressExtractor
Usual constructor.
trialWithParameters(long, int, long, long) - Method in class org.archive.util.BloomFilterTestBase
 
TRIMMED_ENTRY_TRAILING_COMMENT - Static variable in class org.archive.util.iterator.RegexpLineIterator
 
trimToMax(int) - Method in class org.archive.crawler.writer.MirrorWriterProcessor.LumpyString
If necessary, trims this string to a maximum length.
TRUNC_SUFFIX - Static variable in interface org.archive.crawler.datamodel.CoreAttributeConstants
Fetch truncation codes present in CrawlURI annotations.
truncate(String) - Static method in class org.archive.util.MimetypeUtils
Truncate passed mimetype.
TRUNCATED_VALUE_UNSPECIFIED - Static variable in interface org.archive.io.warc.WARCConstants
 
TRUNCATION_REGEX - Static variable in class org.archive.util.MimetypeUtils
Truncation regex.
Type - Class in org.archive.crawler.settings
Interface implemented by all element types.
Type(String, Object) - Constructor for class org.archive.crawler.settings.Type
Creates a new instance of Type.
TYPE - Static variable in interface org.archive.io.warc.WARCConstants
 
TYPE - Static variable in class org.archive.util.JmxUtils
 
TYPE_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File type field.
TYPES - Static variable in interface org.archive.io.warc.WARCConstants
 
TYPES_LIST - Static variable in interface org.archive.io.warc.WARCConstants
 
typesafe() - Method in class org.archive.crawler.settings.ListType
Returns a compile-time typesafe version of this list.

U

UID_ATTR - Static variable in class org.archive.crawler.admin.CrawlJob
 
unbindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
UNCALCULATED - Static variable in class org.archive.crawler.datamodel.CrawlURI
 
unescape(String) - Static method in class org.archive.util.JavaLiterals
 
unescapeHtml(CharSequence) - Static method in class org.archive.util.TextUtils
Replaces HTML Entity Encodings.
UnitCostAssignmentPolicy - Class in org.archive.crawler.frontier
A CostAssignment policy that uses a constant value of 1 for all CrawlURIs.
UnitCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
unpause() - Method in interface org.archive.crawler.framework.Frontier
Resumes the release of URIs to crawl, allowing worker ToeThreads to proceed.
unpause() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
unpause() - Method in class org.archive.crawler.frontier.AdaptiveRevisitFrontier
 
unpeek() - Method in class org.archive.crawler.frontier.WorkQueue
Forgive the peek, allowing a subsequent peek to return a different item.
unpeek() - Method in class org.archive.queue.MemQueue
 
unpeek() - Method in interface org.archive.queue.Queue
Releases queue from the obligation to return in the next peek()/dequeue() the same object as returned by any previous peek().
unregisterHeritrix(Heritrix) - Static method in class org.archive.crawler.Heritrix
 
unregisterMBean() - Method in class org.archive.crawler.admin.CrawlJob
 
unregisterMBean(MBeanServer, String, String) - Static method in class org.archive.crawler.Heritrix
 
unregisterMBean(MBeanServer, ObjectName) - Static method in class org.archive.crawler.Heritrix
 
unregisterValueErrorHandler(ValueErrorHandler) - Method in class org.archive.crawler.settings.SettingsHandler
Unregister an instance of ValueErrorHandler.
unsetAttribute(CrawlerSettings, String) - Method in class org.archive.crawler.settings.ComplexType
Unset an attribute on a per host level.
unzip(File, File) - Static method in class org.archive.crawler.util.IoUtils
Use ant to unjar.
unzip(File, File, boolean) - Static method in class org.archive.crawler.util.IoUtils
Use ant to unjar.
update(CrawlURI, boolean, long) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Update CrawlURI that has completed processing.
update(CrawlURI, boolean, long, boolean) - Method in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Update CrawlURI that has completed processing.
update(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Update the given CrawlURI, which should already be present.
updateGeneration(String) - Method in class org.archive.crawler.processor.CrawlMapper
Close and mark as finished all existing diversion logs, and arrange for new logs to use the new generation prefix.
updateRecoveryPaths(File, SettingsHandler, String) - Method in class org.archive.crawler.admin.CrawlJobHandler
 
updateRobots(CrawlURI) - Method in class org.archive.crawler.datamodel.CrawlServer
Update the robots exclusion policy.
updateWith(CrawlURI, String) - Method in class org.archive.crawler.admin.SeedRecord
A later/repeat report of the same seed has arrived; update with latest.
uri - Variable in class org.archive.crawler.settings.ComplexType.Context
 
URI_HEX_ENCODING - Static variable in class org.archive.net.UURIFactory
First percent sign in string followed by two hex chars.
URI_HISTORY_DBNAME - Static variable in class org.archive.crawler.processor.recrawl.PersistProcessor
name of history Database
URI_SPLITTER - Static variable in class org.archive.util.SURT
 
UriErrorFormatter - Class in org.archive.crawler.io
Formatter for 'uri-errors.log', of URIs so malformed they could not be instantiated.
UriErrorFormatter() - Constructor for class org.archive.crawler.io.UriErrorFormatter
 
uriErrors - Variable in class org.archive.crawler.framework.CrawlController
Special log for URI format problems, wherever they may occur.
URIListRegExpFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
URIListRegExpFilter(String) - Constructor for class org.archive.crawler.filter.URIListRegExpFilter
Deprecated.  
uriProcessing - Variable in class org.archive.crawler.framework.CrawlController
Crawl progress logger.
UriProcessingFormatter - Class in org.archive.crawler.io
Formatter for 'crawl.log'.
UriProcessingFormatter() - Constructor for class org.archive.crawler.io.UriProcessingFormatter
 
URIRegExpFilter - Class in org.archive.crawler.filter
Deprecated. As of release 1.10.0. Replaced by DecidingFilter and equivalent DecideRule.
URIRegExpFilter(String) - Constructor for class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
URIRegExpFilter(String, String) - Constructor for class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
URIRegExpFilter(String, String, String) - Constructor for class org.archive.crawler.filter.URIRegExpFilter
Deprecated.  
uris - Variable in class org.archive.extractor.RegexpCSSLinkExtractor
 
UriUniqFilter - Interface in org.archive.crawler.datamodel
A UriUniqFilter passes URI objects to a destination (receiver) if the passed URI object has not been previously seen.
UriUniqFilter.HasUriReceiver - Interface in org.archive.crawler.datamodel
URIs that have not been seen before 'visit' this 'Visitor'.
UriUtils - Class in org.archive.util
URI-related utilities.
UriUtils() - Constructor for class org.archive.util.UriUtils
 
URL_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File URL field.
URL_KEY - Static variable in interface org.archive.crawler.writer.Kw3Constants
 
usage() - Method in class org.archive.crawler.CommandLineParser
Print usage then exit.
usage(int) - Method in class org.archive.crawler.CommandLineParser
Print usage then exit.
usage(String, int) - Method in class org.archive.crawler.CommandLineParser
Print message then usage then exit.
userAgents - Variable in class org.archive.crawler.datamodel.Robotstxt
 
UTF8 - Static variable in interface org.archive.io.UTF8Bytes
 
UTF8 - Static variable in class org.archive.io.WriterPoolMember
 
UTF8Bytes - Interface in org.archive.io
Marker Interface for instances that can be serialized as UTF8 bytes.
UUIDGenerator - Class in org.archive.uid
Generates UUIDs, using java.util.UUID, formatted as URNs from the UUID namespace [See RFC4122].
UUIDGenerator() - Constructor for class org.archive.uid.UUIDGenerator
 
UURI - Class in org.archive.net
Usable URI.
UURI() - Constructor for class org.archive.net.UURI
Shutdown access to default constructor.
UURI(String, boolean, String) - Constructor for class org.archive.net.UURI
 
UURI(UURI, UURI) - Constructor for class org.archive.net.UURI
 
UURI(String, boolean) - Constructor for class org.archive.net.UURI
 
UURIFactory - Class in org.archive.net
Factory that returns UURIs.

V

valence - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Number of simultaneous connections permitted to this host.
VALID_DF_OUTPUT - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
 
validate() - Method in class org.archive.io.ArchiveReader
Validate the Archive file.
validate(int) - Method in class org.archive.io.ArchiveReader
Validate the Archive file.
validate(char[], BitSet) - Method in class org.archive.net.LaxURI
 
validate(char[], int, int, BitSet) - Method in class org.archive.net.LaxURI
 
validateMetaLine(String) - Method in class org.archive.io.arc.ARCWriter
Test that the metadata line is valid before writing.
VALIDITY_STAMP_FILENAME - Static variable in class org.archive.crawler.datamodel.Checkpoint
Name of file written with timestamp into valid checkpoints.
validityCheck(UURI) - Method in class org.archive.net.UURIFactory
Check the generated UURI.
validRobots - Variable in class org.archive.crawler.datamodel.CrawlServer
 
Value - Class in org.archive.util.anvl
TODO: Now values 'fold' but perhaps they shouldn't be stored folded.
Value(String) - Constructor for class org.archive.util.anvl.Value
 
ValueErrorHandler - Interface in org.archive.crawler.settings
If a ValueErrorHandler is registered with a SettingsHandler, only constraints with level Level.SEVERE will throw an InvalidAttributeValueException.
valueOf(String) - Static method in enum org.archive.crawler.datamodel.CrawlSubstats.Stage
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.util.ms.Entry.EntryType
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.archive.crawler.datamodel.CrawlSubstats.Stage
Returns an array containing the constants of this enum type, in the order they are declared.
values - Variable in class org.archive.util.fingerprint.MemLongFPSet
 
values() - Static method in enum org.archive.util.ms.Entry.EntryType
Returns an array containing the constants of this enum type, in the order they are declared.
VERSION_ATTR - Static variable in class org.archive.crawler.Heritrix
 
VERSION_FIELD_KEY - Static variable in interface org.archive.io.ArchiveFileConstants
Key for the Archive File version field.
VIDEO - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
VIDEO - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  
VIDEO_PATTERNS - Static variable in class org.archive.crawler.deciderules.MatchesFilePatternDecideRule
 
VIDEO_PATTERNS - Static variable in class org.archive.crawler.filter.FilePatternFilter
Deprecated.  

W

WagCostAssignmentPolicy - Class in org.archive.crawler.frontier
A CostAssignmentPolicy based on some wild guesses of kinds of URIs that should be deferred into the (potentially never-crawled) future.
WagCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.WagCostAssignmentPolicy
 
WaitEvaluator - Class in org.archive.crawler.postprocessor
A processor that determines when a URI should be revisited next.
WaitEvaluator(String) - Constructor for class org.archive.crawler.postprocessor.WaitEvaluator
Constructor
WaitEvaluator(String, String, Long, Long, Long, Double, Double) - Constructor for class org.archive.crawler.postprocessor.WaitEvaluator
Constructor
wakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Wake any queues sitting in the snoozed queue whose time has come.
wakeQueuesAsIfAtTime(long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Wake any queues sitting in the snoozed queue whose time has come.
wakeTimer - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
Timer for tasks which wake head item of snoozedClassQueues
wakeUpTime - Variable in class org.archive.crawler.frontier.AdaptiveRevisitHostQueue
Time (in milliseconds) when each URI 'slot' becomes available again.
Warc2Arc - Class in org.archive.io
Convert WARCs to (sort of) ARCs.
Warc2Arc() - Constructor for class org.archive.io.Warc2Arc
 
WARC_010_ID - Static variable in interface org.archive.io.warc.WARCConstants
 
WARC_010_MAGIC - Static variable in interface org.archive.io.warc.WARCConstants
 
WARC_FILE_EXTENSION - Static variable in interface org.archive.io.warc.WARCConstants
WARC file extension.
WARC_HEADER_ENCODING - Static variable in interface org.archive.io.warc.WARCConstants
 
WARC_ID - Static variable in interface org.archive.io.warc.WARCConstants
WARC-ID
WARC_MAGIC - Static variable in interface org.archive.io.warc.WARCConstants
WARC MAGIC. WARC files and records begin with this sequence.
WARC_VERSION - Static variable in interface org.archive.io.warc.WARCConstants
Hard-coded version for WARC files made with this code.
WARCConstants - Interface in org.archive.io.warc
WARC Constants used by WARC readers and writers.
WARCINFO - Static variable in interface org.archive.io.warc.WARCConstants
WARC Record Types.
WARCINFO_INDEX - Static variable in interface org.archive.io.warc.WARCConstants
 
WARCReader - Class in org.archive.io.warc
WARCReader.
WARCReader() - Constructor for class org.archive.io.warc.WARCReader
 
WARCReaderFactory - Class in org.archive.io.warc
Factory for WARC Readers.
WARCReaderFactory.CompressedWARCReader - Class in org.archive.io.warc
Compressed WARC file reader.
WARCReaderFactory.CompressedWARCReader(File) - Constructor for class org.archive.io.warc.WARCReaderFactory.CompressedWARCReader
Constructor.
WARCReaderFactory.CompressedWARCReader(File, long) - Constructor for class org.archive.io.warc.WARCReaderFactory.CompressedWARCReader
Constructor.
WARCReaderFactory.CompressedWARCReader(String, InputStream, boolean) - Constructor for class org.archive.io.warc.WARCReaderFactory.CompressedWARCReader
Constructor.
WARCReaderFactory.UncompressedWARCReader - Class in org.archive.io.warc
Uncompressed WARC file reader.
WARCReaderFactory.UncompressedWARCReader(File) - Constructor for class org.archive.io.warc.WARCReaderFactory.UncompressedWARCReader
Constructor.
WARCReaderFactory.UncompressedWARCReader(File, long) - Constructor for class org.archive.io.warc.WARCReaderFactory.UncompressedWARCReader
Constructor.
WARCReaderFactory.UncompressedWARCReader(String, InputStream) - Constructor for class org.archive.io.warc.WARCReaderFactory.UncompressedWARCReader
Constructor.
WARCRecord - Class in org.archive.io.warc
A WARC file Record.
WARCRecord(InputStream, String, long) - Constructor for class org.archive.io.warc.WARCRecord
Constructor.
WARCRecord(InputStream, ArchiveRecordHeader) - Constructor for class org.archive.io.warc.WARCRecord
Constructor.
WARCRecord(InputStream, String, long, boolean, boolean) - Constructor for class org.archive.io.warc.WARCRecord
Constructor.
WARCWriter - Class in org.archive.io.warc
WARC implementation.
WARCWriter() - Constructor for class org.archive.io.warc.WARCWriter
Shutdown constructor. Has default access so that an instance can be made to test utility methods.
WARCWriter(AtomicInteger, OutputStream, File, boolean, String, List<String>) - Constructor for class org.archive.io.warc.WARCWriter
Constructor.
WARCWriter(AtomicInteger, List<File>, String, String, boolean, long, List<String>) - Constructor for class org.archive.io.warc.WARCWriter
Constructor.
WARCWriterPool - Class in org.archive.io.warc
A pool of WARCWriters.
WARCWriterPool(WriterPoolSettings, int, int) - Constructor for class org.archive.io.warc.WARCWriterPool
Constructor
WARCWriterPool(AtomicInteger, WriterPoolSettings, int, int) - Constructor for class org.archive.io.warc.WARCWriterPool
Constructor
WARCWriterProcessor - Class in org.archive.crawler.writer
WARCWriterProcessor.
WARCWriterProcessor(String) - Constructor for class org.archive.crawler.writer.WARCWriterProcessor
 
warnHandle(Throwable, String) - Static method in class org.archive.util.DevUtils
Log a warning message to the logger 'org.archive.util.DevUtils' made of the passed 'note' and a stack trace based off passed exception.
WCDX_VERSION - Static variable in class org.archive.io.arc.ARC2WCDX
 
WebappLifecycle - Class in org.archive.crawler
Calls start and stop of Heritrix when Heritrix is bundled as a webapp.
WebappLifecycle() - Constructor for class org.archive.crawler.WebappLifecycle
 
weight - Variable in class org.archive.util.BloomFilter64bit
The random integers used to generate the hash functions.
WHITESPACE - Static variable in class org.archive.crawler.extractor.ExtractorHTML
 
WHITESPACE - Static variable in class org.archive.extractor.RegexpHTMLLinkExtractor
 
WHITESPACE - Static variable in class org.archive.extractor.RegexpJSLinkExtractor
 
workaroundCopyFile(File, File) - Static method in class org.archive.util.FileUtils
 
WorkQueue - Class in org.archive.crawler.frontier
A single queue of related URIs to visit, grouped by a classKey (typically "hostname:port" or similar)
WorkQueue(String) - Constructor for class org.archive.crawler.frontier.WorkQueue
 
workQueueDataOnDisk() - Method in class org.archive.crawler.frontier.BdbFrontier
 
workQueueDataOnDisk() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Returns true if the WorkQueue implementation of this Frontier stores its workload on disk instead of relying on serialization mechanisms.
WorkQueueFrontier - Class in org.archive.crawler.frontier
A common Frontier base using several queues to hold pending URIs.
WorkQueueFrontier(String, String) - Constructor for class org.archive.crawler.frontier.WorkQueueFrontier
Create the CommonFrontier
WorkQueueFrontier.WakeTask - Class in org.archive.crawler.frontier
 
WorkQueueFrontier.WakeTask() - Constructor for class org.archive.crawler.frontier.WorkQueueFrontier.WakeTask
 
wrapAsIOException(Throwable) - Static method in class org.archive.util.IoUtils
Wrap generic Throwable as a checked IOException
wrapInputStreamWithHttpRecord(File, String, InputStream, String) - Static method in class org.archive.util.HttpRecorder
Record the input stream for later playback by an extractor, etc.
write(CrawlURI, long, InputStream, String) - Method in class org.archive.crawler.writer.ARCWriterProcessor
 
write(String, CrawlURI) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
write(String, String, String, long, long, ByteArrayOutputStream) - Method in class org.archive.io.arc.ARCWriter
Deprecated. use input-stream version directly instead
write(String, String, String, long, long, InputStream) - Method in class org.archive.io.arc.ARCWriter
 
write(String, String, String, long, long, InputStream, boolean) - Method in class org.archive.io.arc.ARCWriter
Write a record with the given metadata/content.
write(WARCWriter, ARCRecord) - Method in class org.archive.io.Arc2Warc
 
write(int) - Method in class org.archive.io.RandomAccessOutputStream
 
write(byte[], int, int) - Method in class org.archive.io.RandomAccessOutputStream
 
write(byte[]) - Method in class org.archive.io.RandomAccessOutputStream
 
write(int) - Method in class org.archive.io.RecordingOutputStream
 
write(byte[], int, int) - Method in class org.archive.io.RecordingOutputStream
 
write(int) - Method in class org.archive.io.RecyclingFastBufferedOutputStream
 
write(byte[], int, int) - Method in class org.archive.io.RecyclingFastBufferedOutputStream
 
write(byte[]) - Method in class org.archive.io.WriterPoolMember
 
write(byte[], int, int) - Method in class org.archive.io.WriterPoolMember
 
write(int) - Method in class org.archive.io.WriterPoolMember
 
writeArchiveInfoPart(String, CrawlURI, ReplayInputStream, OutputStream) - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
writeAttribute(String, String, ComplexType, CrawlerSettings, Object) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
Write out attribute.
writeContentPart(String, CrawlURI, ReplayInputStream, OutputStream) - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
writeCrawlReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeEscapedForHTML(String, JspWriter) - Static method in class org.archive.util.TextUtils
Utility method for writing a (potentially large) String to a JspWriter, escaping it for HTML display, without constructing another large String of the whole content.
writeFrontierReport(String, PrintWriter) - Method in class org.archive.crawler.admin.CrawlJob
Write the requested frontier report to the given PrintWriter
writeFrontierReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
Write the Frontier's 'nonempty' report (if available)
writeFtpControlConversation(WARCWriter, String, URI, CrawlURI, ANVLRecord, String) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeHeaderPart(String, ReplayInputStream, OutputStream) - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
writeHostsReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeLine(String) - Method in class org.archive.crawler.io.CrawlerJournal
Write a line
writeLine(String, String) - Method in class org.archive.crawler.io.CrawlerJournal
Write a line of two strings
writeLine(String, String, String) - Method in class org.archive.crawler.io.CrawlerJournal
Write a line of three strings
writeLine(MutableString) - Method in class org.archive.crawler.io.CrawlerJournal
Write a line.
writeLongUriLine(String, CandidateURI) - Method in class org.archive.crawler.frontier.RecoveryJournal
 
writeManifestReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeMetadata(WARCWriter, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeMetadataRecord(String, String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
 
writeMimeFile(CrawlURI) - Method in class org.archive.crawler.writer.Kw3WriterProcessor
 
writeMimetypesReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeNewOrderFile(ComplexType, CrawlerSettings, HttpServletRequest, boolean) - Static method in class org.archive.crawler.admin.ui.JobConfigureUtils
This method updates a ComplexType with information passed to it by an HttpServletRequest.
writeObjectToFile(Object, File) - Static method in class org.archive.crawler.util.CheckpointUtils
Utility function to serialize an object to a file in current checkpoint dir.
writeObjectToFile(Object, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
writeProcessorsReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeRecord(String, String, String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
Deprecated. Use WARCWriter.writeRecord(String,String,String,String,URI,ANVLRecord,InputStream,long,boolean) instead
writeRecord(String, String, String, String, URI, ANVLRecord, InputStream, long, boolean) - Method in class org.archive.io.warc.WARCWriter
 
writeReportFile(String, String) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeReportLine(PrintWriter, Object...) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeReportToString(Reporter, String) - Static method in class org.archive.util.ArchiveUtils
Compose the requested report into a String.
writeRequest(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeRequestRecord(String, String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
 
writeResource(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeResourceRecord(String, String, String, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
 
writeResourceRecord(String, String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
 
writeResponse(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeResponseCodeReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeResponseRecord(String, String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
 
writeRevisitDigest(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeRevisitNotModified(WARCWriter, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.crawler.writer.WARCWriterProcessor
 
writeRevisitRecord(String, String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
 
WriterPool - Class in org.archive.io
Pool of Writers.
WriterPool(AtomicInteger, BasePoolableObjectFactory, WriterPoolSettings, int, int) - Constructor for class org.archive.io.WriterPool
Constructor
WriterPoolMember - Class in org.archive.io
Member of WriterPool.
WriterPoolMember(AtomicInteger, OutputStream, File, boolean, String) - Constructor for class org.archive.io.WriterPoolMember
Constructor.
WriterPoolMember(AtomicInteger, List<File>, String, boolean, long, String) - Constructor for class org.archive.io.WriterPoolMember
Constructor.
WriterPoolMember(AtomicInteger, List<File>, String, String, boolean, long, String) - Constructor for class org.archive.io.WriterPoolMember
Constructor.
WriterPoolProcessor - Class in org.archive.crawler.framework
Abstract implementation of a file pool processor.
WriterPoolProcessor(String) - Constructor for class org.archive.crawler.framework.WriterPoolProcessor
 
WriterPoolProcessor(String, String) - Constructor for class org.archive.crawler.framework.WriterPoolProcessor
 
WriterPoolSettings - Interface in org.archive.io
Settings object for a WriterPool.
writeSeedsReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.SettingsHandler
Write the CrawlerSettings object to persistent storage.
writeSettingsObject(CrawlerSettings) - Method in class org.archive.crawler.settings.XMLSettingsHandler
 
writeSettingsObject(CrawlerSettings, File) - Method in class org.archive.crawler.settings.XMLSettingsHandler
Write a CrawlerSettings object to a specified file.
writeSourceReportTo(PrintWriter) - Method in class org.archive.crawler.admin.StatisticsTracker
 
writeThreadsReport(String, PrintWriter) - Method in class org.archive.crawler.admin.CrawlJob
Write the requested threads report to the given PrintWriter
writeValidity() - Method in class org.archive.crawler.framework.Checkpointer
 
writeWarcinfoRecord(String) - Method in class org.archive.io.warc.WARCWriter
 
writeWarcinfoRecord(String, String) - Method in class org.archive.io.warc.WARCWriter
 
writeWarcinfoRecord(String, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
Write a warcinfo to current file.
writeWarcinfoRecord(String, String, URI, ANVLRecord, InputStream, long) - Method in class org.archive.io.warc.WARCWriter
Write a warcinfo to current file.
WSP - Static variable in interface org.archive.io.warc.WARCConstants
WSP: one of a space or horizontal tab character.

X

xestDefaultAbbreviated() - Method in class org.archive.util.BloomFilterTestBase
 
xestDefaultFull() - Method in class org.archive.util.BloomFilterTestBase
Test large (495MB), default-sized bloom at saturation for expected behavior and level of false-positives.
xestOversized() - Method in class org.archive.util.BloomFilterTestBase
Test very-large (almost 800MB, spanning more than Integer.MAX_VALUE bit indexes) bloom at saturation for expected behavior and level of false-positives.
xforceAccepts(Object) - Method in class org.archive.crawler.scope.ClassicScope
 
XML_ATTRIBUTE_CLASS - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ATTRIBUTE_FROM - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ATTRIBUTE_NAME - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ATTRIBUTE_TO - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_AUDIENCE - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_CONTENTMATCHES - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_CONTROLLER - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_DATE - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_DESCRIPTION - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_LIMITS - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_META - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_NAME - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_NEW_OBJECT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_OBJECT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_OPERATOR - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_ORGANIZATION - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_PORTNUMBER - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_REFERENCE - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_REFINEMENT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_REFINEMENTLIST - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_TIMESPAN - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ELEMENT_URIMATCHES - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ROOT_HOST_SETTINGS - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ROOT_ORDER - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_ROOT_REFINEMENT - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_SCHEMA - Static variable in class org.archive.crawler.settings.XMLSettingsHandler
 
XML_URI_EXTRACTOR - Static variable in class org.archive.crawler.extractor.ExtractorXML
 
XMLSettingsHandler - Class in org.archive.crawler.settings
A SettingsHandler which uses XML files as persistent storage.
XMLSettingsHandler(File) - Constructor for class org.archive.crawler.settings.XMLSettingsHandler
Create a new XMLSettingsHandler object.
XmlUtils - Class in org.archive.util
XML utilities for document/xpath actions.
XmlUtils() - Constructor for class org.archive.util.XmlUtils
 
xpathOrNull(Document, String) - Static method in class org.archive.util.XmlUtils
Evaluate an XPath against a Document, returning a String.

Z

ZERO_LENGTH_ENTRY - Static variable in class org.archive.crawler.util.BdbUriUniqFilter
 
ZeroCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy considering all URIs costless -- essentially disabling budgeting features.
ZeroCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 
zeroPadInteger(int) - Static method in class org.archive.util.ArchiveUtils
 

_

_connectAction_() - Method in class org.archive.net.ClientFTP
 


Copyright © 2003-2011 Internet Archive. All Rights Reserved.