System Runtime Requirements

Java Runtime Environment

The Heritrix crawler is implemented purely in Java, so the only true requirement for running it is an installed JRE.

The Heritrix crawler uses Java 5.0 features, so your JRE must be version 5.0 (1.5.0) or later.
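To confirm that a JRE meets this minimum before launching a crawl, a small standalone check along the following lines may help. It is only a sketch and is not part of Heritrix: the class name JreVersionCheck and the helper majorVersion are hypothetical. It relies on the standard java.specification.version system property, which reports values such as "1.5" on older runtimes and "9" or "17" on newer ones.

```java
public class JreVersionCheck {

    // Hypothetical helper, not a Heritrix API: maps a value of the
    // java.specification.version property to a single major version number.
    // "1.5" becomes 5, "1.8" becomes 8, "9" becomes 9, "17" becomes 17.
    static int majorVersion(String spec) {
        String[] parts = spec.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Old-style numbering (1.x): the second field is the major version.
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major < 5) {
            System.err.println("JRE " + spec + " is too old; Java 5.0 (1.5.0) or later is required.");
            System.exit(1);
        }
        System.out.println("JRE " + spec + " meets the Java 5.0 minimum.");
    }
}
```

Compile and run it with the same java binary that will launch the crawler, since a machine may have several JREs installed.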

The distribution package currently includes all of the free/open source third-party libraries necessary to run Heritrix. See dependencies for the complete list. Licenses for these libraries are given in the dependencies section of the raw project.xml at the root of the src download, or here on SourceForge.

Hardware

The default heap size is 256 MB of RAM, which should be suitable for crawls that range over hundreds of hosts.
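To see how much heap a given JVM will actually make available, the standard Runtime.maxMemory() call can be queried, as in the sketch below. The class name HeapCheck and the helper toMegabytes are hypothetical and not part of Heritrix. The reported figure can be slightly lower than the configured maximum, because some JVMs exclude part of the heap from this value.

```java
public class HeapCheck {

    static final long BYTES_PER_MB = 1024L * 1024L;

    // Hypothetical helper: converts a byte count to whole megabytes (rounding down).
    static long toMegabytes(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        // maxMemory() returns the most memory the JVM will attempt to use, in bytes.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap available to this JVM: "
                + toMegabytes(maxBytes) + " MB");
    }
}
```

Running it with the standard -Xmx JVM option, for example java -Xmx256m HeapCheck, shows the effect of a particular heap setting.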

Linux

The Heritrix crawler has been built and tested primarily on Linux. It has seen some informal use on Macintosh, Windows 2000, and Windows XP, but it is not tested, packaged, or supported on platforms other than Linux at this time.