java.lang.Object
  javax.management.NotificationBroadcasterSupport
    org.archive.crawler.admin.CrawlJob
public class CrawlJob
A CrawlJob encapsulates a 'crawl order' with any and all information and methods needed by a CrawlJobHandler to accept and execute it.
A given crawl job may also be a 'profile' for a crawl. In that case it should not be executed as a crawl but can be edited and used as a template for creating new CrawlJobs.
All of its constructors are protected since only a CrawlJobHandler should construct new CrawlJobs.
CrawlJobHandler.newJob(CrawlJob, String, String, String, String, int), CrawlJobHandler.newProfile(CrawlJob, String, String, String), Serialized Form
Nested Class Summary | |
---|---|
class |
CrawlJob.MBeanCrawlController
Subclass of CrawlController that unregisters beans when stopped. |
Field Summary | |
---|---|
static java.lang.String[] |
ATTRIBUTE_ARRAY
|
static java.util.List |
ATTRIBUTE_LIST
|
static java.lang.String |
CHECKPOINT_OPER
|
static java.lang.String |
CRAWL_LOG_STYLE
|
static java.lang.String |
CRAWL_TIME_ATTR
|
static java.lang.String |
CRAWLJOB_JMXMBEAN_TYPE
|
static java.lang.String |
CURRENT_DOC_RATE_ATTR
|
static java.lang.String |
CURRENT_KB_RATE_ATTR
|
static java.lang.String |
DISCOVERED_COUNT_ATTR
|
static java.lang.String |
DOC_RATE_ATTR
|
static java.lang.String |
DOWNLOAD_COUNT_ATTR
|
static java.lang.String |
DUMP_URIS_OPER
|
static java.lang.String |
FRONTIER_REPORT_OPER
|
static java.lang.String |
FRONTIER_SHORT_REPORT_ATTR
|
static java.lang.String |
IMPORT_URI_OPER
|
static java.lang.String |
IMPORT_URIS_OPER
|
static java.lang.String |
KB_RATE_ATTR
|
static java.lang.String |
NAME_ATTR
|
static java.lang.String |
OP_DB_STAT
|
static java.util.List |
ORDER_EXCLUDE
Don't add the following crawl-order items. |
static java.lang.String |
PAUSE_OPER
|
static int |
PRIORITY_AVERAGE
average |
static int |
PRIORITY_CRITICAL
highest |
static int |
PRIORITY_HIGH
high |
static int |
PRIORITY_LOW
low |
static int |
PRIORITY_MINIMAL
lowest |
static java.lang.String |
PROG_STATS
|
static java.lang.String |
PROGRESS_STATISTICS_LEGEND_OPER
|
static java.lang.String |
PROGRESS_STATISTICS_OPER
|
static java.lang.String |
RECOVERY_JOURNAL_STYLE
|
static java.lang.String |
RESUME_OPER
|
static java.lang.String |
SEEDS_REPORT_OPER
|
protected XMLSettingsHandler |
settingsHandler
|
static java.lang.String |
STATUS_ABORTED
Job was terminated by user input while crawling |
static java.lang.String |
STATUS_ATTR
|
static java.lang.String |
STATUS_CHECKPOINTING
Job is being checkpointed. |
static java.lang.String |
STATUS_CREATED
Initial value. |
static java.lang.String |
STATUS_DELETED
Job was deleted by user, will not be displayed in UI. |
static java.lang.String |
STATUS_FINISHED
Job finished normally having completed its crawl. |
static java.lang.String |
STATUS_FINISHED_ABNORMAL
Something went very wrong |
static java.lang.String |
STATUS_FINISHED_DATA_LIMIT
Job finished normally when the specified amount of data (MB) had been downloaded |
static java.lang.String |
STATUS_FINISHED_DOCUMENT_LIMIT
Job finished normally when the specified number of documents had been fetched. |
static java.lang.String |
STATUS_FINISHED_TIME_LIMIT
Job finished normally when the specified timelimit was hit. |
static java.lang.String |
STATUS_MISCONFIGURED
Job could not be launched due to an InitializationException |
static java.lang.String |
STATUS_PAUSED
Job was temporarily stopped. |
static java.lang.String |
STATUS_PENDING
Job has been successfully submitted to a CrawlJobHandler |
static java.lang.String |
STATUS_PREPARING
|
static java.lang.String |
STATUS_PROFILE
Job is actually a profile |
static java.lang.String |
STATUS_RUNNING
Job is being crawled |
static java.lang.String |
STATUS_WAITING_FOR_PAUSE
Job is going to be temporarily stopped after active threads are finished. |
static java.lang.String |
THREAD_COUNT_ATTR
|
static java.lang.String |
THREADS_REPORT_OPER
|
static java.lang.String |
THREADS_SHORT_REPORT_ATTR
|
static java.lang.String |
TOTAL_DATA_ATTR
|
static java.lang.String |
UID_ATTR
|
Constructor Summary | |
---|---|
protected |
CrawlJob()
A shutdown Constructor. |
protected |
CrawlJob(java.io.File jobFile,
CrawlJobErrorHandler errorHandler)
A constructor for reloading jobs from disk. |
|
CrawlJob(java.lang.String UID,
java.lang.String name,
XMLSettingsHandler settingsHandler,
CrawlJobErrorHandler errorHandler,
int priority,
java.io.File dir)
A constructor for jobs. |
|
CrawlJob(java.lang.String UID,
java.lang.String name,
XMLSettingsHandler settingsHandler,
CrawlJobErrorHandler errorHandler,
int priority,
java.io.File dir,
java.lang.String status,
boolean isProfile,
boolean isNew)
|
protected |
CrawlJob(java.lang.String UIDandName,
XMLSettingsHandler settingsHandler,
CrawlJobErrorHandler errorHandler)
A constructor for profiles. |
Method Summary | |
---|---|
protected void |
addBdbjeAttributes(java.util.List<javax.management.openmbean.OpenMBeanAttributeInfo> attributes,
java.util.List<javax.management.MBeanAttributeInfo> bdbjeAttributes,
java.util.List<java.lang.String> bdbjeNamesToAdd)
|
protected void |
addBdbjeOperations(java.util.List<javax.management.openmbean.OpenMBeanOperationInfo> operations,
java.util.List<javax.management.MBeanOperationInfo> bdbjeOperations,
java.util.List<java.lang.String> bdbjeNamesToAdd)
|
protected void |
addCrawlOrderAttributes(ComplexType type,
java.util.List<javax.management.openmbean.OpenMBeanAttributeInfo> attributes)
|
protected javax.management.openmbean.OpenMBeanInfoSupport |
buildMBeanInfo()
Build up the MBean info for Heritrix main. |
protected void |
checkpoint()
|
void |
crawlCheckpoint(java.io.File checkpointDir)
Called by CrawlController when checkpointing. |
void |
crawlEnded(java.lang.String sExitMessage)
Called when a CrawlController has ended a crawl and is about to exit. |
void |
crawlEnding(java.lang.String sExitMessage)
Called when a CrawlController is ending a crawl (for any reason) |
void |
crawlPaused(java.lang.String statusMessage)
Called when a CrawlController is actually paused (all threads are idle). |
void |
crawlPausing(java.lang.String statusMessage)
Called when a CrawlController is going to be paused. |
void |
crawlResuming(java.lang.String statusMessage)
Called when a CrawlController is resuming a crawl that had been paused. |
void |
crawlStarted(java.lang.String message)
Called on crawl start. |
protected CrawlController |
createCrawlController()
|
long |
deleteURIsFromPending(java.lang.String regexpr)
Delete any URIs from the frontier of the current (paused) job that match the specified regular expression. |
long |
deleteURIsFromPending(java.lang.String uriPattern,
java.lang.String queuePattern)
Delete any URIs from the frontier of the current (paused) job that match the specified regular expression. |
void |
dumpUris(java.lang.String filename,
java.lang.String regexp,
int numberOfMatches,
boolean verbose)
|
protected void |
flush()
If it's a HostQueuesFrontier, it needs to be flushed for the queued. |
java.lang.Object |
getAttribute(java.lang.String attribute_name)
|
javax.management.AttributeList |
getAttributes(java.lang.String[] attributeNames)
|
CrawlController |
getController()
|
protected java.lang.Object |
getCrawlOrderAttribute(java.lang.String attribute_name)
|
protected java.lang.Object |
getCrawlOrderAttribute(java.lang.String attribute_name,
ComplexType ct)
|
java.lang.String |
getCrawlStatus()
|
java.io.File |
getDirectory()
Returns the path of the job's base directory. |
java.lang.String |
getDisplayName()
Return the combination of given name and UID most commonly used in administrative interface. |
CrawlJobErrorHandler |
getErrorHandler()
|
java.lang.String |
getErrorMessage()
Get the error message associated with this job. |
java.lang.String |
getFrontierOneLine()
|
java.lang.String |
getFrontierReport(java.lang.String reportName)
|
protected Heritrix |
getHostingHeritrix()
|
java.lang.String |
getIgnoredSeeds()
Utility method to get the stored list of ignored seed items (if any), from the last time the seeds were imported to the frontier. |
FrontierMarker |
getInitialMarker(java.lang.String regexpr,
boolean inCacheOnly)
Returns a URIFrontierMarker for the current, paused, job. |
java.lang.String |
getJmxJobName()
|
java.lang.String |
getJobName()
Returns this job's 'name'. |
int |
getJobPriority()
Get this job's level of priority. |
java.lang.String |
getLogPath(java.lang.String log)
Returns the absolute path of the specified log. |
javax.management.MBeanInfo |
getMBeanInfo()
|
protected javax.management.ObjectName |
getMbeanName()
|
protected static int |
getNotificationsSequenceNumber()
|
int |
getNumberOfJournalEntries()
|
java.util.ArrayList<java.lang.String> |
getPendingURIsList(FrontierMarker marker,
int numberOfMatches,
boolean verbose)
Returns the frontier's URI list based on the provided marker. |
java.lang.String |
getProcessorsReport()
Get the Processors report for the running crawl. |
java.lang.String |
getSettingsDirectory()
Returns the directory where the configuration files for this job are located. |
XMLSettingsHandler |
getSettingsHandler()
Returns the settings handler for this job. |
StatisticsTracking |
getStatisticsTracking()
|
java.lang.String |
getStatus()
Get the current status of this CrawlJob |
java.lang.String |
getThreadOneLine()
|
java.lang.String |
getThreadsReport()
Get the CrawlController's ToeThreads report for the running crawl. |
java.lang.String |
getUID()
Returns this job's unique ID (UID), which was issued by the CrawlJobHandler when this job was first created. |
void |
importUri(java.lang.String uri,
boolean forceFetch,
boolean isSeed)
Schedule a uri. |
void |
importUri(java.lang.String str,
boolean forceFetch,
boolean isSeed,
boolean isFlush)
Schedule a uri. |
protected int |
importUris(java.io.InputStream is,
java.lang.String style,
boolean forceRevisit)
|
protected int |
importUris(java.io.InputStream is,
java.lang.String style,
boolean forceRevisit,
boolean areSeeds)
Import URIs. |
java.lang.String |
importUris(java.lang.String fileOrUrl,
java.lang.String style,
boolean forceRevisit)
|
java.lang.String |
importUris(java.lang.String fileOrUrl,
java.lang.String style,
boolean forceRevisit,
boolean areSeeds)
|
java.lang.String |
importUris(java.lang.String file,
java.lang.String style,
java.lang.String force)
|
java.lang.Object |
invoke(java.lang.String operationName,
java.lang.Object[] params,
java.lang.String[] signature)
|
boolean |
isCheckpointing()
|
boolean |
isCrawling()
|
boolean |
isNew()
Is this a new job? |
boolean |
isProfile()
Returns true if the job is considered to be a profile. |
boolean |
isReadOnly()
Is job read only? |
boolean |
isRunning()
Returns true if the job is being crawled. |
void |
kickUpdate()
Forward a 'kick' update to current controller if any. |
void |
killThread(int threadNumber,
boolean replace)
Kills a thread. |
void |
mustBeCrawling()
|
protected void |
pause()
|
void |
postDeregister()
|
void |
postRegister(java.lang.Boolean registrationDone)
|
void |
preDeregister()
|
javax.management.ObjectName |
preRegister(javax.management.MBeanServer server,
javax.management.ObjectName on)
|
protected void |
resume()
|
java.util.Collection |
scanCheckpoints()
Read all the checkpoints found in the job's checkpoints directory into Checkpoint instances |
void |
setAttribute(javax.management.Attribute attribute)
|
protected void |
setAttributeInternal(javax.management.Attribute attribute)
|
javax.management.AttributeList |
setAttributes(javax.management.AttributeList attributes)
|
protected void |
setCrawlOrderAttribute(java.lang.String attribute_name,
ComplexType ct,
javax.management.Attribute attribute)
|
void |
setErrorMessage(java.lang.String string)
Set an error message for this job. |
void |
setJobPriority(int priority)
Set this job's level of priority. |
void |
setNew(boolean b)
Set if the job is considered a new job or not. |
void |
setNumberOfJournalEntries(int numberOfJournalEntries)
|
void |
setReadOnly()
Once called no changes can be made to the settings for this job. |
protected void |
setRunning(boolean b)
Set if job is being crawled. |
void |
setStatus(java.lang.String status)
Set the status of this CrawlJob. |
protected CrawlController |
setupCrawlController()
|
void |
setupForCrawlStart()
|
void |
stopCrawling()
|
protected void |
unregisterMBean()
|
void |
writeFrontierReport(java.lang.String reportName,
java.io.PrintWriter writer)
Write the requested frontier report to the given PrintWriter |
void |
writeThreadsReport(java.lang.String reportName,
java.io.PrintWriter writer)
Write the requested threads report to the given PrintWriter |
Methods inherited from class javax.management.NotificationBroadcasterSupport |
---|
addNotificationListener, getNotificationInfo, handleNotification, removeNotificationListener, removeNotificationListener, sendNotification |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int PRIORITY_MINIMAL
public static final int PRIORITY_LOW
public static final int PRIORITY_AVERAGE
public static final int PRIORITY_HIGH
public static final int PRIORITY_CRITICAL
public static final java.lang.String STATUS_CREATED
public static final java.lang.String STATUS_PENDING
public static final java.lang.String STATUS_RUNNING
public static final java.lang.String STATUS_DELETED
public static final java.lang.String STATUS_ABORTED
public static final java.lang.String STATUS_FINISHED_ABNORMAL
public static final java.lang.String STATUS_FINISHED
public static final java.lang.String STATUS_FINISHED_TIME_LIMIT
public static final java.lang.String STATUS_FINISHED_DATA_LIMIT
public static final java.lang.String STATUS_FINISHED_DOCUMENT_LIMIT
public static final java.lang.String STATUS_WAITING_FOR_PAUSE
public static final java.lang.String STATUS_PAUSED
public static final java.lang.String STATUS_CHECKPOINTING
public static final java.lang.String STATUS_MISCONFIGURED
public static final java.lang.String STATUS_PROFILE
public static final java.lang.String STATUS_PREPARING
protected transient XMLSettingsHandler settingsHandler
public static final java.lang.String RECOVERY_JOURNAL_STYLE
public static final java.lang.String CRAWL_LOG_STYLE
public static final java.lang.String CRAWLJOB_JMXMBEAN_TYPE
public static final java.lang.String NAME_ATTR
public static final java.lang.String UID_ATTR
public static final java.lang.String STATUS_ATTR
public static final java.lang.String FRONTIER_SHORT_REPORT_ATTR
public static final java.lang.String THREADS_SHORT_REPORT_ATTR
public static final java.lang.String TOTAL_DATA_ATTR
public static final java.lang.String CRAWL_TIME_ATTR
public static final java.lang.String DOC_RATE_ATTR
public static final java.lang.String CURRENT_DOC_RATE_ATTR
public static final java.lang.String KB_RATE_ATTR
public static final java.lang.String CURRENT_KB_RATE_ATTR
public static final java.lang.String THREAD_COUNT_ATTR
public static final java.lang.String DOWNLOAD_COUNT_ATTR
public static final java.lang.String DISCOVERED_COUNT_ATTR
public static final java.lang.String[] ATTRIBUTE_ARRAY
public static final java.util.List ATTRIBUTE_LIST
public static final java.lang.String IMPORT_URI_OPER
public static final java.lang.String IMPORT_URIS_OPER
public static final java.lang.String DUMP_URIS_OPER
public static final java.lang.String PAUSE_OPER
public static final java.lang.String RESUME_OPER
public static final java.lang.String FRONTIER_REPORT_OPER
public static final java.lang.String THREADS_REPORT_OPER
public static final java.lang.String SEEDS_REPORT_OPER
public static final java.lang.String CHECKPOINT_OPER
public static final java.lang.String PROGRESS_STATISTICS_OPER
public static final java.lang.String PROGRESS_STATISTICS_LEGEND_OPER
public static final java.lang.String PROG_STATS
public static final java.lang.String OP_DB_STAT
public static final java.util.List ORDER_EXCLUDE
Constructor Detail |
---|
protected CrawlJob()
public CrawlJob(java.lang.String UID, java.lang.String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, java.io.File dir)
Creates jobs that are ready to crawl.
UID - A unique ID for this job. Typically emitted by the CrawlJobHandler.
name - The name of the job.
settingsHandler - The associated settings.
errorHandler - The crawl job's settings error handler. null means none is set.
priority - Job priority.
dir - The directory that is considered this job's working directory.
protected CrawlJob(java.lang.String UIDandName, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler)
Any job created with this constructor will be considered a profile. Profiles are not stored on disk (only their settings files are stored on disk). This is because their data is predictable given any settings files.
UIDandName - A unique ID for this job. For profiles this is the same as name.
settingsHandler - The associated settings.
errorHandler - The crawl job's settings error handler. null means none is set.
public CrawlJob(java.lang.String UID, java.lang.String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, java.io.File dir, java.lang.String status, boolean isProfile, boolean isNew)
protected CrawlJob(java.io.File jobFile, CrawlJobErrorHandler errorHandler) throws InvalidJobFileException, java.io.IOException
CrawlJobHandler
.
Proper structure of a job file (TODO: Maybe one day make this an XML file)
Line 1. UID
Line 2. Job name (string)
Line 3. Job status (string)
Line 4. is job read only (true/false)
Line 5. is job running (true/false)
Line 6. job priority (int)
Line 7. number of journal entries
Line 8. setting file (with path)
Line 9. statistics tracker file (with path)
Line 10-?. error message (String, empty for null), can be many lines
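The line layout above can be read with a few lines of Java. The following is only a sketch of that layout, not the parsing code CrawlJob itself uses: the class name `JobFileRecord` and its field names are hypothetical, and it throws `IllegalArgumentException` where the real constructor declares `InvalidJobFileException`.

```java
import java.util.List;

/** Hypothetical holder for the fields of a job file; not part of Heritrix. */
final class JobFileRecord {
    final String uid;
    final String name;
    final String status;
    final boolean readOnly;
    final boolean running;
    final int priority;
    final int journalEntries;
    final String settingsFile;
    final String statisticsFile;
    final String errorMessage; // null when the file carries no error message

    /** Builds a record from the lines of a job file, in file order. */
    JobFileRecord(List<String> lines) {
        if (lines.size() < 9) {
            throw new IllegalArgumentException(
                "job file needs at least 9 lines, got " + lines.size());
        }
        uid = lines.get(0);                                     // Line 1
        name = lines.get(1);                                    // Line 2
        status = lines.get(2);                                  // Line 3
        readOnly = Boolean.parseBoolean(lines.get(3).trim());   // Line 4
        running = Boolean.parseBoolean(lines.get(4).trim());    // Line 5
        priority = Integer.parseInt(lines.get(5).trim());       // Line 6
        journalEntries = Integer.parseInt(lines.get(6).trim()); // Line 7
        settingsFile = lines.get(7);                            // Line 8
        statisticsFile = lines.get(8);                          // Line 9
        // Line 10 onward: the error message, possibly spanning many lines.
        String joined = String.join("\n", lines.subList(9, lines.size()));
        errorMessage = joined.isEmpty() ? null : joined;
    }
}
```

A caller could obtain the lines with `java.nio.file.Files.readAllLines(path)` and pass them to the constructor.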
jobFile - a file containing information about the job to load.
errorHandler - The crawl job's settings error handler. null means none is set.
InvalidJobFileException - if the specified file does not refer to a valid job file.
java.io.IOException - if I/O operations fail
Method Detail |
---|
public java.lang.String getUID()
CrawlJobHandler.getNextJobUID()
public java.lang.String getJobName()
getUID()
.
The name corresponds to the value of the 'name' tag in the 'meta' section of the settings file.
public java.lang.String getDisplayName()
public void setJobPriority(int priority)
priority - The level of priority
getJobPriority(), PRIORITY_MINIMAL, PRIORITY_LOW, PRIORITY_AVERAGE, PRIORITY_HIGH, PRIORITY_CRITICAL
public int getJobPriority()
setJobPriority(int), PRIORITY_MINIMAL, PRIORITY_LOW, PRIORITY_AVERAGE, PRIORITY_HIGH, PRIORITY_CRITICAL
public void setReadOnly()
public boolean isReadOnly()
public void setStatus(java.lang.String status)
status - Current status of CrawlJob (see constants defined here beginning with STATUS)
public java.lang.String getCrawlStatus()
public java.lang.String getStatus()
public XMLSettingsHandler getSettingsHandler()
public boolean isNew()
public boolean isProfile()
public void setNew(boolean b)
b - Is the job considered to be new.
public boolean isRunning()
protected void setRunning(boolean b)
b - Is job being crawled.
protected void unregisterMBean()
protected CrawlController setupCrawlController() throws InitializationException
InitializationException
protected CrawlController createCrawlController()
public void setupForCrawlStart() throws InitializationException
InitializationException
public void stopCrawling()
public java.lang.String getFrontierOneLine()
public java.lang.String getFrontierReport(java.lang.String reportName)
reportName - Name of report to write.
public void writeFrontierReport(java.lang.String reportName, java.io.PrintWriter writer)
reportName - Name of report to write.
writer - Where to write to.
public java.lang.String getThreadOneLine()
public java.lang.String getThreadsReport()
public void writeThreadsReport(java.lang.String reportName, java.io.PrintWriter writer)
reportName - Name of report to write.
writer - Where to write to.
public void killThread(int threadNumber, boolean replace)
Kills a thread. See ToePool.killThread(int, boolean).
threadNumber - Thread to kill.
replace - Should thread be replaced.
ToePool.killThread(int, boolean)
public java.lang.String getProcessorsReport()
public java.lang.String getSettingsDirectory()
public java.io.File getDirectory()
new File(getSettingsDirectory())
.
public java.lang.String getErrorMessage()
public void setErrorMessage(java.lang.String string)
string - the error message associated with this job
public int getNumberOfJournalEntries()
public void setNumberOfJournalEntries(int numberOfJournalEntries)
numberOfJournalEntries - The number of journal entries to set.
public CrawlJobErrorHandler getErrorHandler()
public java.util.Collection scanCheckpoints()
public java.lang.String getLogPath(java.lang.String log) throws javax.management.AttributeNotFoundException, javax.management.MBeanException, javax.management.ReflectionException
log
-
javax.management.AttributeNotFoundException
javax.management.ReflectionException
javax.management.MBeanException
protected void pause()
protected void resume()
protected void checkpoint() throws java.lang.IllegalStateException
java.lang.IllegalStateException - Thrown if crawl is not paused.
public boolean isCheckpointing()
protected void flush()
public long deleteURIsFromPending(java.lang.String regexpr)
regexpr - Regular expression to delete URIs by.
public long deleteURIsFromPending(java.lang.String uriPattern, java.lang.String queuePattern)
uriPattern - Regular expression to delete URIs by.
public java.lang.String importUris(java.lang.String file, java.lang.String style, java.lang.String force)
public java.lang.String importUris(java.lang.String fileOrUrl, java.lang.String style, boolean forceRevisit)
public java.lang.String importUris(java.lang.String fileOrUrl, java.lang.String style, boolean forceRevisit, boolean areSeeds)
fileOrUrl - Name of file with seeds.
style - What style of seeds: crawl log, recovery journal, or seeds file.
forceRevisit - Should we revisit even if seen before?
areSeeds - Is the file exclusively seeds?
protected int importUris(java.io.InputStream is, java.lang.String style, boolean forceRevisit)
protected int importUris(java.io.InputStream is, java.lang.String style, boolean forceRevisit, boolean areSeeds)
is - Stream to use as URI source.
style - Style in which URIs are rendered. Currently supports recoveryJournal, crawlLog, and seeds file format (i.e. default), where the default style is a UURI per line (comments allowed).
forceRevisit - Whether we should revisit this URI even if we've visited it previously.
areSeeds - Are the imported URIs seeds?
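As an illustration of the default style (one UURI per line, comments allowed), the sketch below reads such a stream into a list. It is not the Heritrix implementation: the class `DefaultStyleReaderSketch` is hypothetical, and treating lines that start with `#` as comments is an assumption, since this page does not state the comment syntax.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for a one-URI-per-line stream; not part of Heritrix. */
final class DefaultStyleReaderSketch {
    /** Returns the non-blank, non-comment lines of the stream, trimmed. */
    static List<String> readUris(InputStream is) {
        List<String> uris = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Assumption: '#' introduces a comment line.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                uris.add(trimmed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return uris;
    }
}
```

The recoveryJournal and crawlLog styles carry extra fields per line and would need their own parsing, which this sketch does not attempt.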
public void importUri(java.lang.String uri, boolean forceFetch, boolean isSeed) throws org.apache.commons.httpclient.URIException
uri - Uri to schedule.
forceFetch - Should it be forcefetched.
isSeed - True if seed.
org.apache.commons.httpclient.URIException
public void importUri(java.lang.String str, boolean forceFetch, boolean isSeed, boolean isFlush) throws org.apache.commons.httpclient.URIException
str - String that can be: 1. a UURI, 2. a snippet of the crawl.log line, or 3. a snippet from recover log. See importUris(InputStream, String, boolean) for how it subparses the lines from crawl.log and recover.log.
forceFetch - Should it be forcefetched.
isSeed - True if seed.
isFlush - If true, flush the frontier IF it implements flushing.
org.apache.commons.httpclient.URIException
public javax.management.MBeanInfo getMBeanInfo()
getMBeanInfo
in interface javax.management.DynamicMBean
protected javax.management.openmbean.OpenMBeanInfoSupport buildMBeanInfo() throws InitializationException
InitializationException
protected void addBdbjeAttributes(java.util.List<javax.management.openmbean.OpenMBeanAttributeInfo> attributes, java.util.List<javax.management.MBeanAttributeInfo> bdbjeAttributes, java.util.List<java.lang.String> bdbjeNamesToAdd)
protected void addBdbjeOperations(java.util.List<javax.management.openmbean.OpenMBeanOperationInfo> operations, java.util.List<javax.management.MBeanOperationInfo> bdbjeOperations, java.util.List<java.lang.String> bdbjeNamesToAdd)
protected void addCrawlOrderAttributes(ComplexType type, java.util.List<javax.management.openmbean.OpenMBeanAttributeInfo> attributes)
public java.lang.Object getAttribute(java.lang.String attribute_name) throws javax.management.AttributeNotFoundException
getAttribute
in interface javax.management.DynamicMBean
javax.management.AttributeNotFoundException
protected java.lang.Object getCrawlOrderAttribute(java.lang.String attribute_name)
protected java.lang.Object getCrawlOrderAttribute(java.lang.String attribute_name, ComplexType ct) throws javax.management.AttributeNotFoundException, javax.management.MBeanException, javax.management.ReflectionException
javax.management.AttributeNotFoundException
javax.management.MBeanException
javax.management.ReflectionException
public javax.management.AttributeList getAttributes(java.lang.String[] attributeNames)
getAttributes
in interface javax.management.DynamicMBean
public void setAttribute(javax.management.Attribute attribute) throws javax.management.AttributeNotFoundException
setAttribute
in interface javax.management.DynamicMBean
javax.management.AttributeNotFoundException
protected void setAttributeInternal(javax.management.Attribute attribute) throws javax.management.AttributeNotFoundException
javax.management.AttributeNotFoundException
protected void setCrawlOrderAttribute(java.lang.String attribute_name, ComplexType ct, javax.management.Attribute attribute) throws javax.management.AttributeNotFoundException, javax.management.InvalidAttributeValueException, javax.management.MBeanException, javax.management.ReflectionException
javax.management.AttributeNotFoundException
javax.management.InvalidAttributeValueException
javax.management.MBeanException
javax.management.ReflectionException
public javax.management.AttributeList setAttributes(javax.management.AttributeList attributes)
setAttributes
in interface javax.management.DynamicMBean
public java.lang.Object invoke(java.lang.String operationName, java.lang.Object[] params, java.lang.String[] signature) throws javax.management.ReflectionException
invoke
in interface javax.management.DynamicMBean
javax.management.ReflectionException
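Because CrawlJob exposes getAttribute, setAttribute, and invoke through javax.management.DynamicMBean, a JMX client reaches it through an MBeanServer rather than by calling the methods directly. The sketch below shows that call path against a stand-in bean, since Heritrix is not on the classpath here. The bean class, the attribute name `Status`, the operation name `pause`, and the ObjectName are all hypothetical; the actual strings behind STATUS_ATTR and PAUSE_OPER are not shown on this page.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanParameterInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

/** Stand-in DynamicMBean with one attribute and one operation (hypothetical names). */
final class SketchJobBean implements DynamicMBean {
    private String status = "RUNNING";

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if ("Status".equals(name)) return status;
        throw new AttributeNotFoundException(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException(attribute.getName() + " is read-only in this sketch");
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String n : names) {
            try {
                result.add(new Attribute(n, getAttribute(n)));
            } catch (AttributeNotFoundException ignored) {
                // Unknown names are skipped, as the DynamicMBean contract allows.
            }
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable in this sketch
    }

    @Override
    public Object invoke(String op, Object[] params, String[] signature) throws ReflectionException {
        if ("pause".equals(op)) {
            status = "PAUSED";
            return null;
        }
        throw new ReflectionException(new NoSuchMethodException(op));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = {
            new MBeanAttributeInfo("Status", "java.lang.String", "job status", true, false, false)
        };
        MBeanOperationInfo[] ops = {
            new MBeanOperationInfo("pause", "pause the job", new MBeanParameterInfo[0],
                "void", MBeanOperationInfo.ACTION)
        };
        return new MBeanInfo(SketchJobBean.class.getName(), "sketch bean", attrs, null, ops, null);
    }
}

final class JmxClientSketch {
    /** Registers the bean, reads Status, invokes pause, reads Status again. */
    static String[] pauseViaJmx() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("sketch.example:type=CrawlJobSketch,name=demo");
            server.registerMBean(new SketchJobBean(), name);
            try {
                String before = (String) server.getAttribute(name, "Status");
                server.invoke(name, "pause", new Object[0], new String[0]);
                String after = (String) server.getAttribute(name, "Status");
                return new String[] { before, after };
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Against a real CrawlJob, a client would presumably pass the constants declared in this class (for example STATUS_ATTR or PAUSE_OPER) instead of the literal names used here.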
public void mustBeCrawling()
public boolean isCrawling()
public java.lang.String getIgnoredSeeds()
public void kickUpdate()
CrawlController.kickUpdate()
public FrontierMarker getInitialMarker(java.lang.String regexpr, boolean inCacheOnly)
regexpr - A regular expression that each URI must match in order to be considered 'within' the marker.
inCacheOnly - Limit marker scope to 'cached' URIs.
getPendingURIsList(FrontierMarker, int, boolean), Frontier.getInitialMarker(String, boolean), FrontierMarker
public java.util.ArrayList<java.lang.String> getPendingURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose) throws InvalidFrontierMarkerException
marker - URIFrontier marker
numberOfMatches - Maximum number of matches to return
verbose - Should detailed info be provided on each URI?
InvalidFrontierMarkerException - When marker is inconsistent with the current state of the frontier.
getInitialMarker(String, boolean), FrontierMarker
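The marker pairing of getInitialMarker and getPendingURIsList can be pictured as a cursor over the pending URIs: the marker holds a regular expression and a position, and each list call returns at most numberOfMatches matching URIs. The sketch below models that idea over an in-memory list. It is not the Heritrix frontier: `MarkerSketch` and `PendingUriPagerSketch` are hypothetical, and both the cursor advancing on each call and the use of a full-string match are assumptions that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a frontier marker: a compiled pattern plus a cursor. */
final class MarkerSketch {
    final Pattern pattern;
    int nextIndex = 0; // position in the pending list where the next scan resumes

    MarkerSketch(String regexpr) {
        this.pattern = Pattern.compile(regexpr);
    }
}

final class PendingUriPagerSketch {
    /**
     * Returns up to numberOfMatches URIs that fully match the marker's pattern,
     * then leaves the marker positioned so the next call continues from there.
     */
    static List<String> nextBatch(List<String> pending, MarkerSketch marker, int numberOfMatches) {
        List<String> batch = new ArrayList<>();
        while (marker.nextIndex < pending.size() && batch.size() < numberOfMatches) {
            String uri = pending.get(marker.nextIndex++);
            if (marker.pattern.matcher(uri).matches()) {
                batch.add(uri);
            }
        }
        return batch;
    }
}
```

Calling `nextBatch` repeatedly with the same marker until it returns an empty list walks the whole pending list in pages. In the real API a marker that no longer fits the frontier is reported with InvalidFrontierMarkerException, which this sketch does not model.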
public void dumpUris(java.lang.String filename, java.lang.String regexp, int numberOfMatches, boolean verbose)
public void crawlStarted(java.lang.String message)
CrawlStatusListener
crawlStarted
in interface CrawlStatusListener
message - Start message.
public void crawlEnding(java.lang.String sExitMessage)
CrawlStatusListener
crawlEnding
in interface CrawlStatusListener
sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
CrawlJob
public void crawlEnded(java.lang.String sExitMessage)
CrawlStatusListener
crawlEnded
in interface CrawlStatusListener
sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
CrawlJob
public void crawlPausing(java.lang.String statusMessage)
CrawlStatusListener
crawlPausing
in interface CrawlStatusListener
statusMessage - Should be STATUS_WAITING_FOR_PAUSE. Passed for convenience
public void crawlPaused(java.lang.String statusMessage)
CrawlStatusListener
crawlPaused
in interface CrawlStatusListener
statusMessage - Should be STATUS_PAUSED. Passed for convenience
public void crawlResuming(java.lang.String statusMessage)
CrawlStatusListener
crawlResuming
in interface CrawlStatusListener
statusMessage - Should be STATUS_RUNNING. Passed for convenience
public void crawlCheckpoint(java.io.File checkpointDir) throws java.lang.Exception
CrawlStatusListener
Called by CrawlController when checkpointing.
crawlCheckpoint
in interface CrawlStatusListener
checkpointDir - Checkpoint dir. Write checkpoint state here.
java.lang.Exception - A fatal exception. Any exceptions that are let out of this checkpoint are assumed fatal and terminate further checkpoint processing.
public CrawlController getController()
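The CrawlStatusListener callbacks documented above (crawlStarted, crawlPausing, crawlPaused, crawlResuming, crawlEnded) each arrive with the status they imply, which suggests a listener can track job state by recording the message it is handed. The sketch below shows that pattern in isolation. It is an assumption about how such a listener might behave, not CrawlJob's actual code: the class is hypothetical and the status strings are placeholders, because this page does not show the values of the STATUS_* constants.

```java
/** Hypothetical status tracker that mirrors the listener callbacks; not part of Heritrix. */
final class StatusTrackerSketch {
    // Placeholder values: the real strings behind the STATUS_* constants are not shown here.
    static final String STATUS_CREATED = "created";
    static final String STATUS_RUNNING = "running";
    static final String STATUS_WAITING_FOR_PAUSE = "waiting-for-pause";
    static final String STATUS_PAUSED = "paused";
    static final String STATUS_FINISHED = "finished";

    private String status = STATUS_CREATED;

    /** Called on crawl start; the message is informational only. */
    void crawlStarted(String message) { status = STATUS_RUNNING; }

    /** Expected argument: STATUS_WAITING_FOR_PAUSE. */
    void crawlPausing(String statusMessage) { status = statusMessage; }

    /** Expected argument: STATUS_PAUSED. */
    void crawlPaused(String statusMessage) { status = statusMessage; }

    /** Expected argument: STATUS_RUNNING. */
    void crawlResuming(String statusMessage) { status = statusMessage; }

    /** Expected argument: one of the STATUS constants describing the exit. */
    void crawlEnded(String sExitMessage) { status = sExitMessage; }

    String getStatus() { return status; }
}
```

A pause request therefore passes through two states: waiting-for-pause while active threads finish, then paused once all threads are idle.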
public javax.management.ObjectName preRegister(javax.management.MBeanServer server, javax.management.ObjectName on) throws java.lang.Exception
preRegister
in interface javax.management.MBeanRegistration
java.lang.Exception
public void postRegister(java.lang.Boolean registrationDone)
postRegister
in interface javax.management.MBeanRegistration
public void preDeregister() throws java.lang.Exception
preDeregister
in interface javax.management.MBeanRegistration
java.lang.Exception
public void postDeregister()
postDeregister
in interface javax.management.MBeanRegistration
protected Heritrix getHostingHeritrix()
public java.lang.String getJmxJobName()
protected static int getNotificationsSequenceNumber()
protected javax.management.ObjectName getMbeanName()
public StatisticsTracking getStatisticsTracking()