Packages | |
---|---|
org.archive.hcc | |
org.archive.hcc.client | |
org.archive.hcc.util | |
org.archive.hcc.util.jmx |
The Heritrix Cluster Controller (hcc) is a set of packages that enable control of a cluster of heritrix instances running across multiple machines.
There are two main components: the controller itself and a client API for accessing it. The controller is essentially a facade with a DynamicMBean interface. Internally it finds all heritrix resources in a JNDI scope and then proxies all communication to them. It provides a set of attributes and methods that perform the general functions of finding, listing, and invoking operations on single remote instances or groups of them. The client translates the generic MBean interface into an easy-to-use, domain-specific interface, thus simplifying the work of programmers who want to build application-specific extensions without working directly against the generic JMX OpenDynamicMBean interface.
In order to bring up a cluster controller, you need the following pieces:
Since JNDI is basically a passive service, it must be running BEFORE you try to bring up heritrix. Be sure to read the instructions in the jndi.properties file in the heritrix jar to learn how to configure heritrix to talk to an external JNDI server. Another tip: at least the last time I checked, you must execute heritrix from the $HERITRIX_HOME directory if you want heritrix to read the jndi.properties file in $HERITRIX_HOME.
We've been using JBoss's JNDI service to fulfill this requirement. The aforementioned jndi.properties file contains instructions for configuring the JBoss JNDI server and client jars.
So let us assume you have your instance of heritrix started and it has successfully registered with the JNDI server. (You can verify using any JNDI viewer. JBoss comes with a web based JNDI viewer which we've found quite handy.)
It is possible to run the ClusterControllerBean in a JVM separate from the ClusterControllerClient interface. For the present, let us look at the simplest configuration: the ClusterControllerBean runs in the same JVM as the ClusterControllerClient, and both the JMX bean and the client run on the same box as the JNDI server (which in turn runs on the standard port, 1099).
Note that, to enable the JMX agent for local access, you need to set the system property com.sun.management.jmxremote. See Monitoring and Management Using JMX.
Given the above, your main class should look something like this:
    public static void main(String[] args) {
        // initialize the cluster controller bean
        ClusterControllerBean ccbean = new ClusterControllerBean();
        ccbean.init();

        // obtain a handle to the cluster controller client
        ClusterControllerClient cc = ClusterControllerClientManager.getDefaultClient();

        // list crawlers
        Collection crawlers = cc.listCrawlers();

        // create a new crawler
        Crawler crawler = cc.createCrawler();

        // etc...
    }
Once you've initialized the cluster controller bean, you should be able to view it using jconsole or any other standard jmx viewer.
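If jconsole is not at hand, a quick programmatic check is to list the MBeans registered in the local JVM. The sketch below uses only the standard JMX API. It assumes the cluster controller bean ends up on the platform MBean server, which this page does not confirm, and the class name ListLocalMBeans is purely illustrative.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ListLocalMBeans {

    // Returns the canonical names of all MBeans on the platform MBean server.
    // Assumption: the bean you are looking for is registered on this server.
    static Set<String> registeredNames() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> names = new TreeSet<>();
        // A null name and null query match every registered MBean.
        for (ObjectName name : server.queryNames(null, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : registeredNames()) {
            System.out.println(name);
        }
    }
}
```

Run it after ccbean.init() in the same JVM and look for an entry belonging to the cluster controller among the standard java.lang entries.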
Given the simple configuration above, the ClusterControllerClient is assumed to communicate with the ClusterControllerBean (MBean) via a JMX port on the local machine (the default is localhost:8849). If you need to change the host or port of the bean (i.e., you want to run the client on a different machine or in a process separate from the JMX bean), you can do so by specifying the following command-line parameters:
-Dorg.archive.hcc.client.jmxPort=8850 -Dorg.archive.hcc.client.host=myhost
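To illustrate how these two parameters could be consumed, here is a minimal sketch that reads them as system properties, falls back to the documented localhost:8849 default, and builds a JMX service URL. The class HccClientEndpoint and the RMI URL layout are assumptions made for illustration; the client's actual lookup logic is not shown on this page.

```java
import java.net.MalformedURLException;
import javax.management.remote.JMXServiceURL;

public class HccClientEndpoint {
    static final String HOST_PROPERTY = "org.archive.hcc.client.host";
    static final String PORT_PROPERTY = "org.archive.hcc.client.jmxPort";

    // Host from -Dorg.archive.hcc.client.host, else the documented default.
    static String resolveHost() {
        return System.getProperty(HOST_PROPERTY, "localhost");
    }

    // Port from -Dorg.archive.hcc.client.jmxPort, else the documented default.
    static int resolvePort() {
        return Integer.getInteger(PORT_PROPERTY, 8849);
    }

    // Assumption: the conventional RMI connector URL layout used by the
    // JDK's built-in JMX agent; hcc itself may build its URL differently.
    static JMXServiceURL serviceUrl(String host, int port) {
        String url = "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
        try {
            return new JMXServiceURL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX endpoint: " + url, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serviceUrl(resolveHost(), resolvePort()));
    }
}
```

Started with the parameters shown above, resolveHost() would return myhost and resolvePort() would return 8850.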
Other properties pertaining to the HeritrixClusterControllerBean can be specified in the hcc.properties file. The HeritrixClusterControllerBean will attempt to resolve hcc.properties in the following order:
The following properties can currently be set in hcc.properties:
org.archive.hcc.ClusterControllerBean.maxPerContainer | This property controls the default max number of heritrix instances per JVM. The max number can be set explicitly by host:port at runtime using the ClusterControllerClient if you need that level of specificity. |
org.archive.hcc.util.OrderJarFactory.settingsDefaultsDir | Specifies a default settings directory so you can specify some default settings that apply to all jobs. Please note that an order.xml file placed in the root directory will be ignored. Any user defined settings will take precedence over defaults set in this directory. |
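As a rough illustration of what an hcc.properties file with these two keys might contain and how it could be read, the sketch below parses one with java.util.Properties. The values (4 and /path/to/settings-defaults), the fallback argument, and the class HccPropertiesSketch are hypothetical; only the two property names come from the table above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class HccPropertiesSketch {
    static final String MAX_PER_CONTAINER =
            "org.archive.hcc.ClusterControllerBean.maxPerContainer";
    static final String SETTINGS_DEFAULTS_DIR =
            "org.archive.hcc.util.OrderJarFactory.settingsDefaultsDir";

    // Hypothetical file contents; the values are placeholders, not defaults.
    static final String EXAMPLE =
            MAX_PER_CONTAINER + "=4\n"
            + SETTINGS_DEFAULTS_DIR + "=/path/to/settings-defaults\n";

    // Parses properties-file text into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads maxPerContainer, using the caller's fallback when the key is absent.
    static int maxPerContainer(Properties props, int fallback) {
        String value = props.getProperty(MAX_PER_CONTAINER);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties props = parse(EXAMPLE);
        System.out.println(maxPerContainer(props, 1));
        System.out.println(props.getProperty(SETTINGS_DEFAULTS_DIR));
    }
}
```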