3. Coding conventions

Heritrix baselines on SUN's Code Conventions for the Java Programming Language [Sun Code Conventions]. It would be hard not to, since they say so little. They do at least specify a maximum line length of 80 characters.

We also favor much of what is written in the document Java Programming Style Guidelines [Java Programming Style Guidelines].

3.1. Tightenings on the SUN conventions

The following rules tighten the SUN conventions for Heritrix code.

3.1.1. No Tabs

No tabs in source code. Set your editor to indent with spaces.

3.1.2. Indent Width

Indents are 4 characters wide.

3.1.3. Function/Block Bracket Open on Same Line

The preference is to put the bracket that opens a function or block on the same line as the function declaration or block test, rather than on a new line of its own. For example:

if (true) {
    return true;
}
and
public static void main(String[] args) {
    System.out.println("Hello world");
}

3.1.4. File comment

Here is the Eclipse template we use for the file header comment:

/* ${type_name}
 * 
 * $$Id$$
 * 
 * Created on ${date}
 *
 * Copyright (C) ${year} Internet Archive.
 * 
 * This file is part of the Heritrix web crawler (crawler.archive.org).
 * 
 * Heritrix is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 * 
 * Heritrix is distributed in the hope that it will be useful, 
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Lesser Public License for more details.
 * 
 * You should have received a copy of the GNU Lesser Public License
 * along with Heritrix; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */
${package_declaration}

3.2. Long versus int

We will be performing multi-billion resource crawls, which may have to pass over billions of pages that don't make the time/priority cut. Our access tools will range over tens if not hundreds of billions of resources. We may often archive resources larger than 2GB. Keep these scales in mind when choosing between 'int' (maximum value 2^31 - 1, around 2 billion) and 'long' (maximum value 2^63 - 1, around 9 quintillion) in your code.
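As a minimal sketch of why the choice matters, the following example totals two resource sizes, once into an 'int' and once into a 'long'. The class and method names (CounterWidth, totalBytesAsInt, totalBytesAsLong) are hypothetical and not taken from the Heritrix code base.

```java
public class CounterWidth {

    /** Sums sizes into an int; the total silently wraps past Integer.MAX_VALUE. */
    static int totalBytesAsInt(long[] sizes) {
        int total = 0;
        for (long size : sizes) {
            // Compound assignment narrows the long result back to int
            // without a compile error or a runtime exception.
            total += size;
        }
        return total;
    }

    /** Sums sizes into a long; safe for totals far beyond 2 billion. */
    static long totalBytesAsLong(long[] sizes) {
        long total = 0L;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical resources of 1.5 billion bytes each.
        long[] sizes = {1_500_000_000L, 1_500_000_000L};
        System.out.println(totalBytesAsInt(sizes));  // prints -1294967296
        System.out.println(totalBytesAsLong(sizes)); // prints 3000000000
    }
}
```

The 'int' version gives no warning when it overflows, which is why counters, byte offsets and sizes that can plausibly pass 2 billion are better declared as 'long' from the start.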

3.3. Unit test code in same package

"[A] popular convention is to place all test classes in a parallel directory structure. This allows you to use the same Java package names for your tests, while keeping the source files separate. To be honest, we do not like this approach because you must look in two different directories to find files." (Section 4.11.3, Java Extreme Programming Cookbook, by Eric M. Burke and Brian M. Coyner.) We agree, so we put unit test classes beside the classes they test in the source tree, naming each one after the class it tests with a Test suffix.

Another advantage is that test classes in the same package can reach the default-access (package-private) methods and members of the class under test, something a test in another package cannot do.
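The layout can be sketched as follows. UriQueue and UriQueueTest are hypothetical names used only for illustration, and a plain check stands in for the assertions a unit test framework would provide. In a real source tree each class would sit in its own file in the same directory, with the same package declaration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class under test, e.g. org/example/crawl/UriQueue.java
class UriQueue {
    private final Deque<String> pending = new ArrayDeque<>();

    public void add(String uri) {
        pending.addLast(uri);
    }

    // Default (package-private) access: visible only inside the same package.
    int pendingCount() {
        return pending.size();
    }
}

// Hypothetical test beside it, e.g. org/example/crawl/UriQueueTest.java
class UriQueueTest {
    static void testPendingCountTracksAdds() {
        UriQueue queue = new UriQueue();
        queue.add("http://example.org/a");
        queue.add("http://example.org/b");
        // Allowed only because the test shares the package of UriQueue.
        if (queue.pendingCount() != 2) {
            throw new AssertionError("expected 2 pending, got " + queue.pendingCount());
        }
    }

    public static void main(String[] args) {
        testPendingCountTracksAdds();
        System.out.println("UriQueueTest passed");
    }
}
```

If UriQueueTest were moved to a different package, the call to pendingCount() would no longer compile, and the method would have to be made public just to be tested.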

3.4. Log Message Format

A suggested format for source code log messages can be found on the Subversion site, under Log Messages.