org.archive.util
Class ArchiveUtils

java.lang.Object
  extended by org.archive.util.ArchiveUtils

public class ArchiveUtils
extends java.lang.Object

Miscellaneous useful methods.


Field Summary
static int MAX_INT_CHAR_WIDTH
           
static java.util.Set<java.lang.String> TLDS
           
 
Constructor Summary
ArchiveUtils()
           
 
Method Summary
static java.lang.String addImpliedHttpIfNecessary(java.lang.String u)
          Given a string that may be a plain host or host/path (without URI scheme), add an implied http:// if necessary.
static boolean byteArrayEquals(byte[] lhs, byte[] rhs)
          Check that two byte arrays are equal.
static long byteArrayIntoLong(byte[] bytearray)
           
static long byteArrayIntoLong(byte[] bytearray, int offset)
          Byte array into long.
static long classnameBasedUID(java.lang.Class<?> class1, int version)
          Generate a long UID based on the given class and version number.
static java.lang.String doubleToString(double val, int maxFractionDigits)
          Converts a double to a string.
static java.lang.String formatBytesForDisplay(long amount)
          Takes a byte size and formats it for display with 'friendly' units.
static java.lang.String formatMillisecondsToConventional(long time)
          Convert milliseconds value to a human-readable duration
static java.lang.String formatMillisecondsToConventional(long time, boolean toMs)
          Convert milliseconds value to a human-readable duration
static java.lang.String get12DigitDate()
          Utility function for creating arc-style date stamps in the format yyyyMMddHHmm.
static java.lang.String get12DigitDate(java.util.Date d)
           
static java.lang.String get12DigitDate(long date)
          Utility function for creating arc-style date stamps in the format yyyyMMddHHmm.
static java.lang.String get14DigitDate()
          Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss.
static java.lang.String get14DigitDate(java.util.Date d)
           
static java.lang.String get14DigitDate(long date)
          Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss.
static java.lang.String get17DigitDate()
          Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS.
static java.lang.String get17DigitDate(java.util.Date date)
           
static java.lang.String get17DigitDate(long date)
          Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS.
static java.util.Date getDate(java.lang.String d)
          Parses an ARC-style date.
static java.lang.String getLog14Date()
          Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
static java.lang.String getLog14Date(java.util.Date date)
          Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
static java.lang.String getLog14Date(long date)
          Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
static java.lang.String getLog17Date()
          Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
static java.lang.String getLog17Date(long date)
          Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC.
static java.util.Date getSecondsSinceEpoch(java.lang.String timestamp)
           
static boolean isTld(java.lang.String dom)
          Return whether the given string represents a known top-level-domain (like "com", "org", etc.) per IANA as of 20100419
static void longIntoByteArray(long l, byte[] array, int offset)
          Copy the raw bytes of a long into a byte array, starting at the specified offset.
static java.lang.String padTo(int i, int pad)
          Convert an int to a String and pad it with spaces to a width of pad characters.
static java.lang.String padTo(java.lang.String s, int pad)
          Pad the given String to pad characters wide by pre-pending spaces.
static java.lang.String padTo(java.lang.String s, int pad, char padChar)
          Pad the given String to pad characters wide by pre-pending padChar.
static java.util.Date parse12DigitDate(java.lang.String date)
          Utility function for parsing arc-style date stamps in the format yyyyMMddHHmm.
static java.util.Date parse14DigitDate(java.lang.String date)
          Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmss.
static java.util.Date parse17DigitDate(java.lang.String date)
          Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmssSSS.
static java.lang.String secondsSinceEpoch(java.lang.String timestamp)
           
static java.lang.String singleLineReport(Reporter rep)
          Utility method to get a String singleLineReport from Reporter
static boolean startsWith(byte[] array, byte[] prefix)
          Verify that the array begins with the prefix.
static java.util.Calendar timestamp17ToCalendar(java.lang.String timestamp17String)
          Convert 17-digit date format timestamps (as found in crawl.log, for example) into a GregorianCalendar object.
static java.lang.String writeReportToString(Reporter rep, java.lang.String name)
          Compose the requested report into a String.
static java.lang.String zeroPadInteger(int i)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_INT_CHAR_WIDTH

public static int MAX_INT_CHAR_WIDTH

TLDS

public static java.util.Set<java.lang.String> TLDS
Constructor Detail

ArchiveUtils

public ArchiveUtils()
Method Detail

get17DigitDate

public static java.lang.String get17DigitDate()
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS. Date stamps are in the UTC time zone.

Returns:
the date stamp

get14DigitDate

public static java.lang.String get14DigitDate()
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss. Date stamps are in the UTC time zone.

Returns:
the date stamp

get12DigitDate

public static java.lang.String get12DigitDate()
Utility function for creating arc-style date stamps in the format yyyyMMddHHmm. Date stamps are in the UTC time zone.

Returns:
the date stamp

getLog17Date

public static java.lang.String getLog17Date()
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC. Use current time. Format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

Returns:
the date stamp

getLog17Date

public static java.lang.String getLog17Date(long date)
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC. Format is yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

Parameters:
date - Date to format.
Returns:
the date stamp

getLog14Date

public static java.lang.String getLog14Date()
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC. Use current time. Format is yyyy-MM-dd'T'HH:mm:ss'Z'

Returns:
the date stamp

getLog14Date

public static java.lang.String getLog14Date(long date)
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC. Format is yyyy-MM-dd'T'HH:mm:ss'Z'

Parameters:
date - long timestamp to format.
Returns:
the date stamp

getLog14Date

public static java.lang.String getLog14Date(java.util.Date date)
Utility function for creating log timestamps, in W3C/ISO8601 format, assuming UTC. Format is yyyy-MM-dd'T'HH:mm:ss'Z'

Parameters:
date - Date to format.
Returns:
the date stamp
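
The two log formats above can be reproduced with the JDK alone. The following is a minimal sketch, not the library's implementation; the class name LogDateSketch and its methods are hypothetical, and only the two pattern strings come from the documentation above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class; only the pattern strings are taken from the docs above.
final class LogDateSketch {
    /** 17-digit log form: yyyy-MM-dd'T'HH:mm:ss.SSS'Z' */
    static String log17(long epochMillis) {
        return format(epochMillis, "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    }

    /** 14-digit log form: yyyy-MM-dd'T'HH:mm:ss'Z' */
    static String log14(long epochMillis) {
        return format(epochMillis, "yyyy-MM-dd'T'HH:mm:ss'Z'");
    }

    private static String format(long epochMillis, String pattern) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        // The literal 'Z' in the pattern is only correct if we really format in UTC.
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }
}
```

For example, LogDateSketch.log17(1000000000123L) yields 2001-09-09T01:46:40.123Z, and log14 of the same value yields 2001-09-09T01:46:40Z.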

get17DigitDate

public static java.lang.String get17DigitDate(long date)
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmssSSS. Date stamps are in the UTC time zone

Parameters:
date - milliseconds since epoch
Returns:
the date stamp
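
As an illustration of the arc-style stamps, here is a minimal JDK-only sketch that formats milliseconds since the epoch in UTC. The class ArcStampSketch and its format method are hypothetical and are not part of ArchiveUtils; only the patterns are taken from the method descriptions.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not the library's code.
final class ArcStampSketch {
    /** Formats epoch milliseconds with the given pattern, always in UTC. */
    static String format(long epochMillis, String pattern) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long t = 1_000_000_000_123L; // 2001-09-09T01:46:40.123Z
        System.out.println(format(t, "yyyyMMddHHmmssSSS")); // 20010909014640123
        System.out.println(format(t, "yyyyMMddHHmmss"));    // 20010909014640
        System.out.println(format(t, "yyyyMMddHHmm"));      // 200109090146
    }
}
```

The shorter stamps are truncations of the longer one, so the three forms sort consistently as strings.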

get17DigitDate

public static java.lang.String get17DigitDate(java.util.Date date)

get14DigitDate

public static java.lang.String get14DigitDate(long date)
Utility function for creating arc-style date stamps in the format yyyyMMddHHmmss. Date stamps are in the UTC time zone

Parameters:
date - milliseconds since epoch
Returns:
the date stamp

get14DigitDate

public static java.lang.String get14DigitDate(java.util.Date d)

get12DigitDate

public static java.lang.String get12DigitDate(long date)
Utility function for creating arc-style date stamps in the format yyyyMMddHHmm. Date stamps are in the UTC time zone

Parameters:
date - milliseconds since epoch
Returns:
the date stamp

get12DigitDate

public static java.lang.String get12DigitDate(java.util.Date d)

getDate

public static java.util.Date getDate(java.lang.String d)
                              throws java.text.ParseException
Parses an ARC-style date. If the passed String is fewer than 12 characters long, it is padded. At a minimum, the String should contain a year (at least 4 characters). Parsing also fails if the day or month is incompletely specified. Depends on the above getXXDigitDate methods.

Parameters:
d - A 4-17 digit date in ARC-style formatting (yyyy to yyyyMMddHHmmssSSS).
Returns:
A Date object representing the passed String.
Throws:
java.text.ParseException
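
The padding idea can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class PartialArcDateSketch is hypothetical, and the choice of defaults for missing fields (January 1st, midnight) is an assumption, since the documentation above only says that short strings are padded.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; the default values for missing fields are an assumption.
final class PartialArcDateSketch {
    // Assumed defaults: month 01, day 01, time 00:00:00.000.
    // The first four characters stand in for the year and are never copied.
    private static final String DEFAULTS = "yyyy0101000000000";

    static Date parse(String d) throws ParseException {
        if (d == null || !d.matches("\\d{4,17}")) {
            throw new ParseException("Expected 4-17 digits: " + d, 0);
        }
        // A half-specified month (5 digits) or day (7 digits) is rejected.
        if (d.length() == 5 || d.length() == 7) {
            throw new ParseException("Incomplete month or day: " + d, d.length() - 1);
        }
        // Fill the missing tail from the defaults, then parse as a full 17-digit stamp.
        String full = d + DEFAULTS.substring(d.length());
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmssSSS", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f.parse(full);
    }
}
```

Under these assumptions, "1970" parses to the epoch itself and "20010909014640" parses to 1000000000000 milliseconds since the epoch.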

parse17DigitDate

public static java.util.Date parse17DigitDate(java.lang.String date)
                                       throws java.text.ParseException
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmssSSS. Date stamps are in the UTC time zone. The whole string will not be parsed, only the first 17 digits.

Parameters:
date - an arc-style formatted date stamp
Returns:
the Date corresponding to the date stamp string
Throws:
java.text.ParseException - if the input string was malformed

parse14DigitDate

public static java.util.Date parse14DigitDate(java.lang.String date)
                                       throws java.text.ParseException
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmmss. Date stamps are in the UTC time zone. The whole string will not be parsed, only the first 14 digits.

Parameters:
date - an arc-style formatted date stamp
Returns:
the Date corresponding to the date stamp string
Throws:
java.text.ParseException - if the input string was malformed

parse12DigitDate

public static java.util.Date parse12DigitDate(java.lang.String date)
                                       throws java.text.ParseException
Utility function for parsing arc-style date stamps in the format yyyyMMddHHmm. Date stamps are in the UTC time zone. The whole string will not be parsed, only the first 12 digits.

Parameters:
date - an arc-style formatted date stamp
Returns:
the Date corresponding to the date stamp string
Throws:
java.text.ParseException - if the input string was malformed
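
The "only the first N digits" behaviour can be sketched by truncating explicitly before parsing. The class ArcStampPrefixSketch and its parseFirst method are hypothetical names used for illustration; how the library itself limits the parse is not stated in this documentation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper showing explicit truncation before parsing.
final class ArcStampPrefixSketch {
    /**
     * Parses only as many leading characters as the pattern has letters,
     * e.g. 12 for yyyyMMddHHmm, and ignores the rest of the string.
     */
    static Date parseFirst(String stamp, String pattern) throws ParseException {
        int n = pattern.length();
        if (stamp == null || stamp.length() < n) {
            throw new ParseException("Need at least " + n + " characters: " + stamp, 0);
        }
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false);
        return f.parse(stamp.substring(0, n));
    }
}
```

With the 17-digit input 20010909014640123, parsing against yyyyMMddHHmm drops the seconds and milliseconds and yields 2001-09-09T01:46:00Z.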

timestamp17ToCalendar

public static java.util.Calendar timestamp17ToCalendar(java.lang.String timestamp17String)
Convert 17-digit date format timestamps (as found in crawl.log, for example) into a GregorianCalendar object. Useful so you can convert into milliseconds-since-epoch. Note: it is possible to compute milliseconds-since-epoch using parse17DigitDate(java.lang.String).UTC(), but that method is deprecated in favor of using Calendar.getTimeInMillis().

I probably should have dug into all the utility methods in DateFormat.java to parse the timestamp, but this was easier. If someone wants to fix this to use those methods, please have at it!

Mike Schwartz, schwartz at CodeOnTheRoad dot com.

Parameters:
timestamp17String - a 17-digit timestamp in the format yyyyMMddHHmmssSSS
Returns:
Calendar set to timestamp17String.
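
A minimal sketch of the conversion, assuming the timestamp is in UTC (the description above does not state which time zone the returned Calendar uses). The class Timestamp17Sketch is a hypothetical name and is not part of ArchiveUtils.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; UTC is an assumption.
final class Timestamp17Sketch {
    static Calendar toCalendar(String ts) {
        if (ts == null || !ts.matches("\\d{17}")) {
            throw new IllegalArgumentException("Expected 17 digits: " + ts);
        }
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
        cal.clear(); // start from a blank calendar so no field keeps the current time
        cal.set(Integer.parseInt(ts.substring(0, 4)),
                Integer.parseInt(ts.substring(4, 6)) - 1, // Calendar months are zero-based
                Integer.parseInt(ts.substring(6, 8)),
                Integer.parseInt(ts.substring(8, 10)),
                Integer.parseInt(ts.substring(10, 12)),
                Integer.parseInt(ts.substring(12, 14)));
        cal.set(Calendar.MILLISECOND, Integer.parseInt(ts.substring(14, 17)));
        return cal;
    }
}
```

Calling toCalendar("20010909014640123").getTimeInMillis() then gives 1000000000123, the milliseconds-since-epoch value mentioned above.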

secondsSinceEpoch

public static java.lang.String secondsSinceEpoch(java.lang.String timestamp)
                                          throws java.text.ParseException
Parameters:
timestamp - A 14-digit timestamp or the prefix of a 14-digit timestamp: E.g. '20010909014640' or '20010101' or '1970'.
Returns:
Seconds since the epoch as a String, zero-padded on the left so that it is always as wide as Integer.MAX_VALUE written as a string (so that sorting the resulting strings works properly).
Throws:
java.text.ParseException
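
The three example inputs can be worked through with a JDK-only sketch. The class SecondsSinceEpochSketch is hypothetical, and filling missing fields with January 1st at midnight UTC is an assumption made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; the defaults for missing fields are assumed.
final class SecondsSinceEpochSketch {
    // Assumed tail for short inputs; the first four characters are a year placeholder.
    private static final String DEFAULTS = "yyyy0101000000";

    static String secondsSinceEpoch(String timestamp) throws ParseException {
        if (timestamp == null || !timestamp.matches("\\d{4,14}")) {
            throw new ParseException("Expected 4-14 digits: " + timestamp, 0);
        }
        String full = timestamp + DEFAULTS.substring(timestamp.length());
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        long seconds = f.parse(full).getTime() / 1000L;
        // Zero-pad to 10 characters, the width of Integer.MAX_VALUE as a string.
        return String.format(Locale.ROOT, "%010d", seconds);
    }
}
```

Under these assumptions '20010909014640' gives 1000000000, '20010101' gives 0978307200, and '1970' gives 0000000000, so the strings sort in the same order as the instants they represent.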

getSecondsSinceEpoch

public static java.util.Date getSecondsSinceEpoch(java.lang.String timestamp)
                                           throws java.text.ParseException
Parameters:
timestamp - A 14-digit timestamp or the prefix of a 14-digit timestamp: E.g. '20010909014640' or '20010101' or '1970'.
Returns:
A date.
Throws:
java.text.ParseException
See Also:
secondsSinceEpoch(String)

zeroPadInteger

public static java.lang.String zeroPadInteger(int i)
Parameters:
i - Integer to prefix with zeros. If passed 2005, returns the String 0000002005. The String width is the width of Integer.MAX_VALUE as a string (10 digits).
Returns:
Padded String version of i.
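
The width rule can be shown in a few lines of plain Java. ZeroPadSketch is a hypothetical class used only for illustration, and it assumes a non-negative input.

```java
import java.util.Locale;

// Hypothetical helper, assuming non-negative input.
final class ZeroPadSketch {
    // Width of Integer.MAX_VALUE (2147483647) written as a string: 10 characters.
    static final int WIDTH = Integer.toString(Integer.MAX_VALUE).length();

    static String zeroPad(int i) {
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", i);
    }

    public static void main(String[] args) {
        System.out.println(zeroPad(2005)); // 0000002005
    }
}
```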

padTo

public static java.lang.String padTo(int i,
                                     int pad)
Convert an int to a String and pad it with spaces to a width of pad characters.

Parameters:
i - the int
pad - the width to pad to.
Returns:
String w/ padding.

padTo

public static java.lang.String padTo(java.lang.String s,
                                     int pad)
Pad the given String to pad characters wide by prepending spaces. s should not be null. If s is already as wide as or wider than pad, it is returned unchanged.

Parameters:
s - the String to pad
pad - the width to pad to.
Returns:
String w/ padding.

padTo

public static java.lang.String padTo(java.lang.String s,
                                     int pad,
                                     char padChar)
Pad the given String to pad characters wide by prepending padChar. s should not be null. If s is already as wide as or wider than pad, it is returned unchanged.

Parameters:
s - the String to pad
pad - the width to pad to.
padChar - The pad character to use.
Returns:
String w/ padding.
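
A minimal sketch of the left-padding behaviour described above. PadToSketch is a hypothetical class, not the library's code.

```java
// Hypothetical helper mirroring the documented behaviour.
final class PadToSketch {
    static String padTo(String s, int pad, char padChar) {
        if (s.length() >= pad) {
            return s; // already wide enough: returned unchanged, never truncated
        }
        StringBuilder sb = new StringBuilder(pad);
        for (int i = s.length(); i < pad; i++) {
            sb.append(padChar); // padding goes in front of the original text
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + padTo("42", 5, ' ') + "]");     // [   42]
        System.out.println("[" + padTo("7", 3, '0') + "]");      // [007]
        System.out.println("[" + padTo("abcdef", 3, '*') + "]"); // [abcdef]
    }
}
```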

byteArrayEquals

public static boolean byteArrayEquals(byte[] lhs,
                                      byte[] rhs)
Check that two byte arrays are equal. Either may be null.

Parameters:
lhs - a byte array
rhs - another byte array.
Returns:
true if they are both equal (or both null)

doubleToString

public static java.lang.String doubleToString(double val,
                                              int maxFractionDigits)
Converts a double to a string.

Parameters:
val - The double to convert
maxFractionDigits - Maximum number of digits to include after the '.'
Returns:
the double as a string.
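
One way to get this effect with the JDK is java.text.NumberFormat. The sketch below is an assumption about a reasonable approach, not a statement of how the library does it; the class DoubleToStringSketch is hypothetical, and the choice of Locale.US and of disabling digit grouping is mine.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; locale and grouping choices are assumptions.
final class DoubleToStringSketch {
    static String doubleToString(double val, int maxFractionDigits) {
        NumberFormat f = NumberFormat.getNumberInstance(Locale.US);
        f.setGroupingUsed(false);                       // no thousands separators
        f.setMaximumFractionDigits(maxFractionDigits);  // cap digits after '.'
        return f.format(val);
    }

    public static void main(String[] args) {
        System.out.println(doubleToString(3.14159, 2)); // 3.14
        System.out.println(doubleToString(1.0, 3));     // 1
    }
}
```

Because only a maximum is set, trailing zeros are dropped rather than padded.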

formatBytesForDisplay

public static java.lang.String formatBytesForDisplay(long amount)
Takes a byte size and formats it for display with 'friendly' units.

This involves converting it to the largest unit (of B, KB, MB, GB, TB) for which the amount will be > 1.

Additionally, at least 2 significant digits are always displayed.

Displays as bytes (B): 0 - 1023
Displays as kilobytes (KB): 1024 - 2097151 (~2 MB)
Displays as megabytes (MB): 2097152 - 4294967295 (~4 GB)
Displays as gigabytes (GB): 4294967296 - infinity

Negative numbers will be returned as '0 B'.

Parameters:
amount - the amount of bytes
Returns:
A string containing the amount, properly formatted.
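
The unit thresholds listed above can be expressed directly. This sketch covers only the choice of unit; the exact number formatting of the real method is not reproduced here, and the class ByteUnitSketch is hypothetical.

```java
// Hypothetical helper showing only the unit selection from the ranges above.
final class ByteUnitSketch {
    static String unitFor(long amount) {
        if (amount < 1024L) {
            return "B";                       // 0 - 1023 (negatives are shown as '0 B')
        }
        if (amount < 2L * 1024 * 1024) {
            return "KB";                      // 1024 - 2097151
        }
        if (amount < 4L * 1024 * 1024 * 1024) {
            return "MB";                      // 2097152 - 4294967295
        }
        return "GB";                          // 4294967296 and up
    }
}
```

Switching units at 2 MB and 4 GB rather than at 1 MB and 1 GB keeps at least two significant digits in front of the unit.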

formatMillisecondsToConventional

public static java.lang.String formatMillisecondsToConventional(long time)
Convert milliseconds value to a human-readable duration

Parameters:
time - duration in milliseconds
Returns:
Human readable string version of passed time

formatMillisecondsToConventional

public static java.lang.String formatMillisecondsToConventional(long time,
                                                                boolean toMs)
Convert milliseconds value to a human-readable duration

Parameters:
time - duration in milliseconds
toMs - whether to include milliseconds in the output
Returns:
Human readable string version of passed time
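
The documentation does not say what the output looks like, so the following sketch picks a format of its own (for example 1h2m3s4ms) purely to illustrate the idea. The class DurationSketch and its output format are assumptions, and negative durations are rejected for simplicity.

```java
// Hypothetical helper; the d/h/m/s/ms output format is an assumption.
final class DurationSketch {
    private static final long[] SIZES = {86_400_000L, 3_600_000L, 60_000L, 1_000L};
    private static final String[] LABELS = {"d", "h", "m", "s"};

    static String format(long time, boolean toMs) {
        if (time < 0) {
            throw new IllegalArgumentException("Negative duration: " + time);
        }
        StringBuilder sb = new StringBuilder();
        long rest = time;
        for (int i = 0; i < SIZES.length; i++) {
            long n = rest / SIZES[i];
            rest %= SIZES[i];
            if (n > 0) {
                sb.append(n).append(LABELS[i]); // skip units that would print as zero
            }
        }
        if (toMs && rest > 0) {
            sb.append(rest).append("ms");
        }
        if (sb.length() == 0) {
            sb.append(toMs ? "0ms" : "0s");     // nothing printed so far
        }
        return sb.toString();
    }
}
```

With this format, 3723004 milliseconds prints as 1h2m3s4ms with toMs set and as 1h2m3s without it.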

classnameBasedUID

public static long classnameBasedUID(java.lang.Class<?> class1,
                                     int version)
Generate a long UID based on the given class and version number. Using this instead of the default will assume serialization compatibility across class changes unless version number is intentionally bumped.

Parameters:
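class1 - the class on which to base the UID
version - the version number; bump it to signal a serialization-incompatible change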
Returns:
UID based off class and version number.
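
To make the idea concrete, here is one way such a UID could be derived: the hash of the class name in the upper 32 bits plus the version number. This particular formula is an assumption for illustration and may differ from what the library computes; ClassUidSketch is a hypothetical class.

```java
// Hypothetical helper; the exact formula is an assumption.
final class ClassUidSketch {
    static long uid(Class<?> cls, int version) {
        // The name hash fills the high 32 bits; the version is added on top.
        return ((long) cls.getName().hashCode() << 32) + version;
    }
}

// Typical use: the UID changes only when the class name or the version changes,
// not when fields or methods of the class are edited.
final class ExampleRecord implements java.io.Serializable {
    private static final long serialVersionUID = ClassUidSketch.uid(ExampleRecord.class, 1);
    int value;
}
```

The default serialVersionUID is computed from the class structure, so it changes with almost any edit; a name-and-version scheme like this keeps old serialized data readable until the version is deliberately bumped.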

longIntoByteArray

public static void longIntoByteArray(long l,
                                     byte[] array,
                                     int offset)
Copy the raw bytes of a long into a byte array, starting at the specified offset.

Parameters:
l - the long whose bytes are copied
array - the destination byte array
offset - index in array at which writing starts

byteArrayIntoLong

public static long byteArrayIntoLong(byte[] bytearray)

byteArrayIntoLong

public static long byteArrayIntoLong(byte[] bytearray,
                                     int offset)
Byte array into long.

Parameters:
bytearray - Array to convert to a long.
offset - Offset into array at which we start decoding the long.
Returns:
Long made of the bytes of array beginning at offset offset.
See Also:
longIntoByteArray(long, byte[], int)
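
The pairing of longIntoByteArray and byteArrayIntoLong amounts to a round trip, which can be sketched with java.nio.ByteBuffer. The byte order is not stated in this documentation; the sketch assumes big-endian (ByteBuffer's default), and LongBytesSketch is a hypothetical class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; big-endian byte order is an assumption.
final class LongBytesSketch {
    /** Writes the 8 bytes of l into array, starting at offset. */
    static void longIntoByteArray(long l, byte[] array, int offset) {
        ByteBuffer.wrap(array, offset, Long.BYTES).putLong(l);
    }

    /** Reads 8 bytes from array, starting at offset, back into a long. */
    static long byteArrayIntoLong(byte[] array, int offset) {
        return ByteBuffer.wrap(array, offset, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        byte[] buf = new byte[16];
        longIntoByteArray(0x0102030405060708L, buf, 3);
        System.out.println(byteArrayIntoLong(buf, 3) == 0x0102030405060708L); // true
    }
}
```

Both calls throw IndexOutOfBoundsException if fewer than 8 bytes are available at the offset.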

addImpliedHttpIfNecessary

public static java.lang.String addImpliedHttpIfNecessary(java.lang.String u)
Given a string that may be a plain host or host/path (without URI scheme), add an implied http:// if necessary.

Parameters:
u - string to evaluate
Returns:
string with http:// added if no scheme already present
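
How "no scheme already present" is decided is not spelled out above. The sketch below uses a simple heuristic of my own choosing: treat the input as scheme-less when it has no colon, or when a period appears before the first colon (as in host.example:8080). ImpliedHttpSketch is a hypothetical class and the heuristic is an assumption.

```java
// Hypothetical helper; the scheme-detection heuristic is an assumption.
final class ImpliedHttpSketch {
    static String addImpliedHttpIfNecessary(String u) {
        int colon = u.indexOf(':');
        int period = u.indexOf('.');
        // No colon at all, or a dot before the first colon (so the colon is
        // more likely a port separator than the end of a scheme).
        if (colon == -1 || (period >= 0 && period < colon)) {
            return "http://" + u;
        }
        return u;
    }

    public static void main(String[] args) {
        System.out.println(addImpliedHttpIfNecessary("example.com/path"));   // http://example.com/path
        System.out.println(addImpliedHttpIfNecessary("example.com:8080/x")); // http://example.com:8080/x
        System.out.println(addImpliedHttpIfNecessary("https://example.com"));// https://example.com
    }
}
```

A dot-less host with a port, such as localhost:8080, is left unchanged by this heuristic, which is one of its limits.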

startsWith

public static boolean startsWith(byte[] array,
                                 byte[] prefix)
Verify that the array begins with the prefix.

Parameters:
array - the byte array to check
prefix - the bytes expected at the start of array
Returns:
true if array is identical to prefix for the first prefix.length positions
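
The check described above is a bounded element-by-element comparison. A minimal sketch follows; BytePrefixSketch is a hypothetical class, and returning false when the prefix is longer than the array is an assumption.

```java
// Hypothetical helper; behaviour for an over-long prefix is assumed.
final class BytePrefixSketch {
    static boolean startsWith(byte[] array, byte[] prefix) {
        if (prefix.length > array.length) {
            return false; // the array cannot contain a prefix longer than itself
        }
        for (int i = 0; i < prefix.length; i++) {
            if (array[i] != prefix[i]) {
                return false;
            }
        }
        return true; // also true for an empty prefix
    }
}
```

A typical use is sniffing a magic number, for example checking whether a buffer begins with the gzip bytes 0x1f 0x8b.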

singleLineReport

public static java.lang.String singleLineReport(Reporter rep)
Utility method to get a String singleLineReport from Reporter

Parameters:
rep - Reporter to get singleLineReport from
Returns:
String of report

writeReportToString

public static java.lang.String writeReportToString(Reporter rep,
                                                   java.lang.String name)
Compose the requested report into a String. DANGEROUS IF REPORT CAN BE LARGE.

Parameters:
rep - Reporter
name - String name of report to compose
Returns:
String of report

isTld

public static boolean isTld(java.lang.String dom)
Return whether the given string represents a known top-level-domain (like "com", "org", etc.) per IANA as of 20100419

Parameters:
dom - candidate string
Returns:
boolean true if recognized as TLD
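
A lookup like this is a set-membership test. The sketch below uses a three-entry sample set in place of the real TLDS field, and lower-cases the input before the lookup; both the sample data and the case handling are assumptions, and TldSketch is a hypothetical class.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; SAMPLE_TLDS is a tiny stand-in for the real list.
final class TldSketch {
    static final Set<String> SAMPLE_TLDS =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("com", "org", "net")));

    static boolean isTld(String dom) {
        // Case-insensitive lookup is an assumption, not documented behaviour.
        return dom != null && SAMPLE_TLDS.contains(dom.toLowerCase(Locale.ROOT));
    }
}
```

Since the real list is a snapshot dated 20100419, top-level domains delegated after that date would not be recognized.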


Copyright © 2003-2011 Internet Archive. All Rights Reserved.