1. Introduction

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler.

This document explains how to create, configure and run crawls using Heritrix. It is intended for users of the software and assumes at least a general familiarity with web crawling.

For a general overview of Heritrix, see An Introduction to Heritrix.

If you want to build Heritrix from source, or would like to contribute and learn about the project's contribution conventions, see the Developer's Manual instead.