org.archive.io
Class Warc2Arc

java.lang.Object
  extended by org.archive.io.Warc2Arc

public class Warc2Arc
extends java.lang.Object

Converts WARCs to (sort of) ARCs. A WARC can be 1 GB in size, that is, 10x the default ARC size, so the tool takes a directory as its output location and will write multiple ARCs for a single large WARC. Only resource records of type text/dns or "application/http; msgtype=response" are written. All others, such as metadata and request records, are skipped.
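The record filter described above can be illustrated with a small standalone sketch. It is not the Heritrix implementation of isARCType: the class name ArcTypeFilterSketch, the helper normalize, and the choice to ignore case and whitespace when comparing types are all assumptions made for illustration.

```java
import java.util.Locale;

/** Hypothetical stand-in for the record filter described above; not the Heritrix source. */
final class ArcTypeFilterSketch {
    private ArcTypeFilterSketch() {}

    /** Lowercases the value and removes all whitespace, so spacing around ';' does not matter. */
    static String normalize(String mimetype) {
        return mimetype.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
    }

    /**
     * Returns true only for the two types the class description says are written:
     * text/dns and application/http; msgtype=response (assumed matching rules).
     */
    static boolean isArcType(String mimetype) {
        if (mimetype == null || mimetype.trim().isEmpty()) {
            return false; // a record without a type is skipped
        }
        String cleaned = normalize(mimetype);
        return cleaned.equals("text/dns")
            || cleaned.equals("application/http;msgtype=response");
    }
}
```

Under these assumptions, ArcTypeFilterSketch.isArcType("application/http; msgtype=request") returns false, which matches the statement that request records are skipped.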

Version:
$Date: 2007-03-09 23:57:28 +0000 (Fri, 09 Mar 2007) $ $Revision: 4977 $
Author:
stack

Constructor Summary
Warc2Arc()
           
 
Method Summary
protected  boolean isARCType(java.lang.String mimetype)
           
static void main(java.lang.String[] args)
          Command-line interface to Warc2Arc.
(package private) static java.lang.String parseRevision(java.lang.String version)
           
 void transform(java.io.File warc, java.io.File dir, java.lang.String prefix, java.lang.String suffix, boolean force)
           
protected  void transform(WARCReader reader, ARCWriter writer)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Warc2Arc

public Warc2Arc()

Method Detail

parseRevision

static java.lang.String parseRevision(java.lang.String version)

transform

public void transform(java.io.File warc,
                      java.io.File dir,
                      java.lang.String prefix,
                      java.lang.String suffix,
                      boolean force)
               throws java.io.IOException,
                      java.text.ParseException
Throws:
java.io.IOException
java.text.ParseException

transform

protected void transform(WARCReader reader,
                         ARCWriter writer)
                  throws java.io.IOException,
                         java.text.ParseException
Throws:
java.io.IOException
java.text.ParseException

isARCType

protected boolean isARCType(java.lang.String mimetype)

main

public static void main(java.lang.String[] args)
                 throws org.apache.commons.cli.ParseException,
                        java.io.IOException,
                        java.text.ParseException
Command-line interface to Warc2Arc.

Parameters:
args - Command-line arguments.
Throws:
org.apache.commons.cli.ParseException - Failed parse of the command line.
java.io.IOException
java.text.ParseException


Copyright © 2003-2011 Internet Archive. All Rights Reserved.