org.archive.crawler.framework
Class ToeThread

java.lang.Object
  extended by java.lang.Thread
      extended by org.archive.crawler.framework.ToeThread
All Implemented Interfaces:
java.lang.Runnable, CoreAttributeConstants, FetchStatusCodes, HttpRecorderMarker, ProgressStatisticsReporter, Reporter

public class ToeThread
extends java.lang.Thread
implements CoreAttributeConstants, FetchStatusCodes, HttpRecorderMarker, Reporter, ProgressStatisticsReporter

One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.

Author:
Gordon Mohr

Nested Class Summary
 
Nested classes/interfaces inherited from class java.lang.Thread
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler
 
Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Fields inherited from interface org.archive.crawler.datamodel.CoreAttributeConstants
A_ANNOTATIONS, A_CONTENT_DIGEST, A_CONTENT_TYPE, A_CREDENTIAL_AVATARS_KEY, A_DELAY_FACTOR, A_DISTANCE_FROM_SEED, A_DNS_FETCH_TIME, A_DNS_SERVER_IP_LABEL, A_ETAG_HEADER, A_FETCH_BEGAN_TIME, A_FETCH_COMPLETED_TIME, A_FETCH_HISTORY, A_FORCE_RETIRE, A_FTP_CONTROL_CONVERSATION, A_FTP_FETCH_STATUS, A_HERITABLE_KEYS, A_HTML_BASE, A_HTTP_BIND_ADDRESS, A_HTTP_PROXY_HOST, A_HTTP_PROXY_PORT, A_HTTP_TRANSACTION, A_LAST_MODIFIED_HEADER, A_LOCALIZED_ERRORS, A_META_ROBOTS, A_MINIMUM_DELAY, A_MIRROR_PATH, A_PREREQUISITE_URI, A_REFERENCE_LENGTH, A_RETRY_DELAY, A_RRECORD_SET_LABEL, A_RUNTIME_EXCEPTION, A_SOURCE_TAG, A_STATUS, A_WRITTEN_TO_WARC, HEADER_TRUNC, LENGTH_TRUNC, TIMER_TRUNC, TRUNC_SUFFIX
 
Fields inherited from interface org.archive.crawler.datamodel.FetchStatusCodes
S_BLOCKED_BY_CUSTOM_PROCESSOR, S_BLOCKED_BY_QUOTA, S_BLOCKED_BY_RUNTIME_LIMIT, S_BLOCKED_BY_USER, S_CONNECT_FAILED, S_CONNECT_LOST, S_DEEMED_CHAFF, S_DEEMED_NOT_FOUND, S_DEFERRED, S_DELETED_BY_USER, S_DNS_SUCCESS, S_DOMAIN_PREREQUISITE_FAILURE, S_DOMAIN_UNRESOLVABLE, S_GETBYNAME_SUCCESS, S_OTHER_PREREQUISITE_FAILURE, S_OUT_OF_SCOPE, S_PREREQUISITE_UNSCHEDULABLE_FAILURE, S_PROCESSING_THREAD_KILLED, S_ROBOTS_PRECLUDED, S_ROBOTS_PREREQUISITE_FAILURE, S_RUNTIME_EXCEPTION, S_SERIOUS_ERROR, S_TIMEOUT, S_TOO_MANY_EMBED_HOPS, S_TOO_MANY_LINK_HOPS, S_TOO_MANY_RETRIES, S_UNATTEMPTED, S_UNFETCHABLE_URI, S_UNQUEUEABLE
 
Constructor Summary
ToeThread(ToePool g, int sn)
          Create a ToeThread
 
Method Summary
 CrawlController getController()
          Get the CrawlController associated with this thread.
 java.lang.String getCurrentProcessorName()
           
 HttpRecorder getHttpRecorder()
          Used to get the current thread's HttpRecorder instance.
 java.lang.String[] getReports()
          Get an array of report names offered by this Reporter.
 int getSerialNumber()
           
 java.lang.Object getStep()
           
 boolean isActive()
          Is this thread validly processing a URI (that is, not paused, not waiting for a URI, and not interrupted)?
protected  void kill()
          Terminates a thread.
 void progressStatisticsLegend(java.io.PrintWriter writer)
           
 void progressStatisticsLine(java.io.PrintWriter writer)
           
 void reportTo(java.io.PrintWriter writer)
          Make a default report to the passed-in Writer.
 void reportTo(java.lang.String name, java.io.PrintWriter pw)
          Compiles a report on this thread's status and writes it to the given PrintWriter.
 void retire()
          Request that this thread retire (exit cleanly) at the earliest opportunity.
 void run()
          Runs the worker loop: asks for CrawlURIs and processes them, repeating unless told otherwise.
 boolean shouldRetire()
          Whether this thread should cleanly retire at the earliest opportunity.
 java.lang.String singleLineLegend()
          Return a legend for the single-line summary report as a String.
 java.lang.String singleLineReport()
          Return a short single-line summary report as a String.
 void singleLineReportTo(java.io.PrintWriter w)
          Make a single-line summary report to the passed-in writer.
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ToeThread

public ToeThread(ToePool g,
                 int sn)
Create a ToeThread

Parameters:
g - the ToePool (the group of ToeThreads) this thread belongs to
sn - serial number
Method Detail

run

public void run()
Runs the worker loop: asks for CrawlURIs and processes them, repeating unless told otherwise.

Specified by:
run in interface java.lang.Runnable
Overrides:
run in class java.lang.Thread
See Also:
Thread.run()
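
The loop described for this class (ask for a URI, process it, repeat unless told otherwise) can be sketched as follows. This is a minimal illustration and not the Heritrix implementation: WorkerLoopSketch, its String-based queue standing in for the frontier, and the process() hook are all hypothetical, and unlike a real worker this sketch simply exits when the queue is empty instead of waiting for more work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for a ToeThread-style worker; names are illustrative only.
class WorkerLoopSketch extends Thread {
    private final Queue<String> frontier;                 // stand-in for the source of CrawlURIs
    private final List<String> processed = new ArrayList<>();
    private volatile boolean shouldRetire = false;        // set by retire(), read by the loop

    WorkerLoopSketch(Queue<String> frontier) {
        this.frontier = frontier;
    }

    public void retire() { shouldRetire = true; }

    public boolean shouldRetire() { return shouldRetire; }

    @Override
    public void run() {
        // The retire flag is checked between URIs, so an item in progress is always finished.
        while (!shouldRetire()) {
            String uri = frontier.poll();
            if (uri == null) {
                break;                                    // simplification: stop when nothing is queued
            }
            process(uri);
        }
    }

    protected void process(String uri) {
        processed.add(uri);
    }

    List<String> processed() { return processed; }

    // Runs one worker over three URIs and requests retirement while the second is processed.
    static List<String> demo() {
        Queue<String> frontier = new ConcurrentLinkedQueue<>(Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c"));
        WorkerLoopSketch worker = new WorkerLoopSketch(frontier) {
            @Override
            protected void process(String uri) {
                super.process(uri);
                if (uri.endsWith("/b")) {
                    retire();
                }
            }
        };
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.processed();
    }
}
```

In this sketch the third URI is left in the queue, because the retirement request is honoured at the next loop check rather than in the middle of an item.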

getSerialNumber

public int getSerialNumber()
Returns:
The toe thread's serial number.

getHttpRecorder

public HttpRecorder getHttpRecorder()
Used to get the current thread's HttpRecorder instance. Implementation of the HttpRecorderMarker interface.

Specified by:
getHttpRecorder in interface HttpRecorderMarker
Returns:
The HttpRecorder instance carried by this thread.
See Also:
HttpRecorderMarker.getHttpRecorder()

getController

public CrawlController getController()
Get the CrawlController associated with this thread.

Returns:
Returns the CrawlController.

kill

protected void kill()
Terminates a thread.

Calling this method asks this thread to stop processing as soon as possible (note: this may be never). It is meant to 'short circuit' hung threads.

The current crawl URI will have its fetch status set accordingly and will be returned to the frontier immediately.

As noted above, this does not guarantee that the thread will ever stop running. Once invoked, however, the thread will not try to communicate with other parts of the crawler, and it will terminate as soon as it regains control.
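
The behaviour described above can be pictured with a small single-threaded model. Everything here is hypothetical (KillSketch, its frontierLog list, the begin() and finish() hooks); the status name S_PROCESSING_THREAD_KILLED comes from the inherited FetchStatusCodes list, and its use by kill() is an assumption rather than something this page states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the documented kill() contract; not the Heritrix source.
class KillSketch {
    private volatile boolean killed = false;
    private final List<String> frontierLog = new ArrayList<>(); // stand-in for the frontier
    private String currentUri;

    // The worker starts on a URI.
    void begin(String uri) {
        currentUri = uri;
    }

    // Mark the worker as killed and hand the current URI straight back.
    void kill() {
        killed = true;
        if (currentUri != null) {
            // Assumption: this is the status the real method would record.
            frontierLog.add(currentUri + " -> S_PROCESSING_THREAD_KILLED");
            currentUri = null;
        }
    }

    // Called when the (possibly hung) processing step finally returns.
    void finish() {
        if (killed) {
            return; // a killed worker stays silent and just winds down
        }
        frontierLog.add(currentUri + " -> finished");
        currentUri = null;
    }

    List<String> frontierLog() { return frontierLog; }

    static List<String> demo() {
        KillSketch worker = new KillSketch();
        worker.begin("http://example.com/slow");
        worker.kill();    // operator gives up on the hung worker
        worker.finish();  // the step returns later; nothing more is reported
        return worker.frontierLog();
    }
}
```

The point of the flag is that the URI is reported exactly once, by kill(), even if the hung step eventually completes.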


getStep

public java.lang.Object getStep()
Returns:
The current step: for debugging and reporting, the abstract step this thread is currently at.

isActive

public boolean isActive()
Is this thread validly processing a URI (that is, not paused, not waiting for a URI, and not interrupted)?

Returns:
whether thread is actively processing a URI

retire

public void retire()
Request that this thread retire (exit cleanly) at the earliest opportunity.


shouldRetire

public boolean shouldRetire()
Whether this thread should cleanly retire at the earliest opportunity.

Returns:
True if should retire.

reportTo

public void reportTo(java.lang.String name,
                     java.io.PrintWriter pw)
Compiles a report on this thread's status and writes it to the given PrintWriter.

Specified by:
reportTo in interface Reporter
Parameters:
name - Report name.
pw - Where to print.

singleLineReportTo

public void singleLineReportTo(java.io.PrintWriter w)
Description copied from interface: Reporter
Make a single-line summary report to the passed-in writer.

Specified by:
singleLineReportTo in interface Reporter
Parameters:
w - PrintWriter to write to.

singleLineLegend

public java.lang.String singleLineLegend()
Description copied from interface: Reporter
Return a legend for the single-line summary report as a String.

Specified by:
singleLineLegend in interface Reporter
Returns:
String single-line summary legend

getReports

public java.lang.String[] getReports()
Description copied from interface: Reporter
Get an array of report names offered by this Reporter. A name in brackets indicates that a free-form String, in accordance with the informal description inside the brackets, may yield a useful report.

Specified by:
getReports in interface Reporter
Returns:
String array of report names, empty if there is only one report type
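
A caller might use the returned names as sketched below: request each named report in turn, or fall back to the default report when the array is empty. ReporterSketch is a hypothetical stand-in that declares only the three Reporter methods used here, and the "== name ==" separator is an invented convention for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in declaring only the Reporter methods this sketch needs.
interface ReporterSketch {
    String[] getReports();
    void reportTo(String name, PrintWriter pw);
    void reportTo(PrintWriter pw);
}

class AllReportsSketch {
    // Collects every report the reporter offers into one String.
    static String dumpAll(ReporterSketch reporter) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        String[] names = reporter.getReports();
        if (names.length == 0) {
            reporter.reportTo(pw);              // only one report type: use the default
        } else {
            for (String name : names) {
                pw.print("== " + name + " ==\n");
                reporter.reportTo(name, pw);
            }
        }
        pw.flush();
        return buffer.toString();
    }

    static String demo() {
        ReporterSketch single = new ReporterSketch() {
            public String[] getReports() { return new String[0]; }
            public void reportTo(String name, PrintWriter pw) { pw.print("status: idle\n"); }
            // The default report delegates with a null name.
            public void reportTo(PrintWriter pw) { reportTo(null, pw); }
        };
        return dumpAll(single);
    }
}
```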

reportTo

public void reportTo(java.io.PrintWriter writer)
Description copied from interface: Reporter
Make a default report to the passed-in Writer. Should be equivalent to reportTo(null, writer).

Specified by:
reportTo in interface Reporter
Parameters:
writer - to receive report

singleLineReport

public java.lang.String singleLineReport()
Description copied from interface: Reporter
Return a short single-line summary report as a String.

Specified by:
singleLineReport in interface Reporter
Returns:
String single-line summary report
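
One plausible way to relate this method to singleLineReportTo(PrintWriter) is to have the String variant write into an in-memory buffer. The sketch below shows that pattern only; SingleLineSketch is hypothetical, the sample line format is invented, and nothing here claims to match how ToeThread actually implements the method.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical base class: the String variant is derived from the writer variant.
abstract class SingleLineSketch {
    abstract void singleLineReportTo(PrintWriter w);

    String singleLineReport() {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        singleLineReportTo(pw);
        pw.flush();                 // make sure buffered text reaches the StringWriter
        return buffer.toString();
    }

    static String demo() {
        SingleLineSketch worker = new SingleLineSketch() {
            @Override
            void singleLineReportTo(PrintWriter w) {
                w.print("#7 ACTIVE http://example.com/");   // invented line format
            }
        };
        return worker.singleLineReport();
    }
}
```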

progressStatisticsLine

public void progressStatisticsLine(java.io.PrintWriter writer)
Specified by:
progressStatisticsLine in interface ProgressStatisticsReporter
Parameters:
writer - Where to write statistics.

progressStatisticsLegend

public void progressStatisticsLegend(java.io.PrintWriter writer)
Specified by:
progressStatisticsLegend in interface ProgressStatisticsReporter
Parameters:
writer - Where to write statistics legend.
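
The two progress-statistics methods are naturally used together: the legend is written once as a header, then one line per sample. The sketch below assumes that pairing; ProgressSketch is a hypothetical stand-in for ProgressStatisticsReporter, and it is also an assumption that the caller, not the reporter, supplies the line breaks.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for the two ProgressStatisticsReporter methods.
interface ProgressSketch {
    void progressStatisticsLegend(PrintWriter writer);
    void progressStatisticsLine(PrintWriter writer);
}

class ProgressLogSketch {
    // Writes the legend once, then the requested number of statistics lines.
    static String log(ProgressSketch source, int samples) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        source.progressStatisticsLegend(pw);
        pw.print('\n');
        for (int i = 0; i < samples; i++) {
            source.progressStatisticsLine(pw);
            pw.print('\n');
        }
        pw.flush();
        return buffer.toString();
    }

    static String demo() {
        final int[] done = {0};     // mutable counter captured by the anonymous class
        ProgressSketch counter = new ProgressSketch() {
            public void progressStatisticsLegend(PrintWriter w) { w.print("processed"); }
            public void progressStatisticsLine(PrintWriter w) {
                done[0] += 5;       // pretend five more URIs were handled since the last sample
                w.print(done[0]);
            }
        };
        return log(counter, 2);
    }
}
```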

getCurrentProcessorName

public java.lang.String getCurrentProcessorName()


Copyright © 2003-2011 Internet Archive. All Rights Reserved.