org.archive.io
Class Warc2Arc
java.lang.Object
org.archive.io.Warc2Arc
public class Warc2Arc
- extends java.lang.Object
Converts WARCs to (sort of) ARCs.
WARCs can be 1 GB in size, ten times the default ARC size. The converter therefore takes an output directory rather than a single output file, and it will write multiple ARCs for a single large WARC.
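The following is a minimal sketch of how one large WARC can end up as several ARCs. It assumes that the writer starts a new file once the current one has reached a size limit and that it never splits a record across files. The class `RolloverSketch`, its `split` method, and the 100 MB limit (one tenth of a 1 GB WARC, per the ratio above) are illustrative assumptions, not part of the Warc2Arc or ARCWriter API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of size-based rollover (not Heritrix code). Records are
 * appended to the current output file until it reaches maxSize; the next
 * record then opens a new file. Records are never split, so a file can
 * exceed maxSize by at most one record.
 */
class RolloverSketch {

    /** Returns the byte count of each output file that would be produced. */
    static List<Long> split(long[] recordSizes, long maxSize) {
        List<Long> files = new ArrayList<>();
        long current = 0;
        boolean open = false;
        for (long size : recordSizes) {
            // Open a new file if none is open yet or the current one is full.
            if (!open || current >= maxSize) {
                if (open) {
                    files.add(current);
                }
                current = 0;
                open = true;
            }
            current += size;
        }
        if (open) {
            files.add(current); // close the last file
        }
        return files;
    }

    public static void main(String[] args) {
        // Assumed limit: about 100 MB per ARC.
        long maxArcSize = 100_000_000L;
        // 1,000 records of 1 MB each stand in for a 1 GB WARC.
        long[] records = new long[1000];
        Arrays.fill(records, 1_000_000L);
        System.out.println(split(records, maxArcSize).size() + " ARC files");
    }
}
```

Running `main` prints `10 ARC files`: each output file holds 100 one-megabyte records before the next record triggers a rollover.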
Only records whose content type is text/dns or application/http; msgtype=response are written. All others, such as metadata and request records, are skipped.
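The filtering rule above can be expressed as a small predicate. This is a hypothetical re-creation based only on the class description: the class name `ArcTypeFilter` and the normalization (lower-casing and removing whitespace so that `application/http;msgtype=response` also matches) are assumptions, and the real `isARCType(String)` may compare MIME types differently.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the record filter described for Warc2Arc:
 * keep only text/dns and application/http; msgtype=response records.
 */
class ArcTypeFilter {

    static boolean isArcType(String mimetype) {
        if (mimetype == null) {
            return false;
        }
        // Assumed normalization: ignore case and all whitespace.
        String m = mimetype.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
        return m.equals("text/dns")
            || m.equals("application/http;msgtype=response");
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/dns",
            "application/http; msgtype=response",
            "application/http; msgtype=request",
            "application/warc-fields"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isArcType(s) ? "kept" : "skipped"));
        }
    }
}
```

With these samples, the first two are reported as `kept` and the last two as `skipped`.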
- Version:
- $Date: 2007-03-09 23:57:28 +0000 (Fri, 09 Mar 2007) $ $Revision: 4977 $
- Author:
- stack
Method Summary
| Modifier and Type | Method and Description |
| protected boolean | isARCType(java.lang.String mimetype) |
| static void | main(java.lang.String[] args): Command-line interface to Warc2Arc. |
| (package private) static java.lang.String | parseRevision(java.lang.String version) |
| void | transform(java.io.File warc, java.io.File dir, java.lang.String prefix, java.lang.String suffix, boolean force) |
| protected void | transform(WARCReader reader, ARCWriter writer) |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Warc2Arc
public Warc2Arc()
parseRevision
static java.lang.String parseRevision(java.lang.String version)
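This page does not document `parseRevision`. Judging only from its name and the `$Revision: 4977 $` keyword in the version line above, a plausible job is extracting the revision number from an expanded Subversion keyword string. The sketch below is a hypothetical re-implementation under that assumption. The class name `RevisionSketch` and the fallback of returning the input unchanged when no keyword is found are also assumptions; the real method may behave differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pull the number out of "$Revision: NNNN $". */
class RevisionSketch {

    // Matches an expanded SVN keyword such as "$Revision: 4977 $".
    private static final Pattern REVISION =
        Pattern.compile("\\$Revision:\\s*(\\d+)\\s*\\$");

    static String parseRevision(String version) {
        if (version == null) {
            return null;
        }
        Matcher m = REVISION.matcher(version);
        // Assumed fallback: return the input unchanged if no keyword is present.
        return m.find() ? m.group(1) : version;
    }

    public static void main(String[] args) {
        String version =
            "$Date: 2007-03-09 23:57:28 +0000 (Fri, 09 Mar 2007) $ $Revision: 4977 $";
        System.out.println(parseRevision(version));
    }
}
```

Applied to the version string on this page, the sketch prints `4977`.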
transform
public void transform(java.io.File warc,
java.io.File dir,
java.lang.String prefix,
java.lang.String suffix,
boolean force)
throws java.io.IOException,
java.text.ParseException
- Throws:
java.io.IOException
java.text.ParseException
transform
protected void transform(WARCReader reader,
ARCWriter writer)
throws java.io.IOException,
java.text.ParseException
- Throws:
java.io.IOException
java.text.ParseException
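Given the class description, `transform(WARCReader, ARCWriter)` presumably runs a read-filter-write loop: iterate over the WARC records, keep those whose MIME type is an ARC type, and write them to the ARC writer. The minimal sketch below shows only that loop. Heritrix's `WARCReader` and `ARCWriter` are replaced by a hypothetical `WarcRecordStub` type and a plain `List` so that the example runs without the library. The real method would also need to translate each WARC record's headers into an ARC record header, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of a WARC-to-ARC filter-and-copy loop. */
class TransformSketch {

    /** Hypothetical stand-in for a WARC record; holds only what this sketch uses. */
    static final class WarcRecordStub {
        final String url;
        final String mimetype;

        WarcRecordStub(String url, String mimetype) {
            this.url = url;
            this.mimetype = mimetype;
        }
    }

    /**
     * Appends every record whose MIME type passes isArcType to out (a stand-in
     * for an ARCWriter) and returns the number of records skipped.
     */
    static int transform(Iterable<WarcRecordStub> records,
                         Predicate<String> isArcType,
                         List<WarcRecordStub> out) {
        int skipped = 0;
        for (WarcRecordStub r : records) {
            if (isArcType.test(r.mimetype)) {
                out.add(r);
            } else {
                skipped++;
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<WarcRecordStub> warc = List.of(
            new WarcRecordStub("dns:example.org", "text/dns"),
            new WarcRecordStub("http://example.org/", "application/http; msgtype=request"),
            new WarcRecordStub("http://example.org/", "application/http; msgtype=response"));
        List<WarcRecordStub> arc = new ArrayList<>();
        // Filter matching the two content types named in the class description.
        int skipped = transform(warc,
            m -> m.equals("text/dns") || m.equals("application/http; msgtype=response"),
            arc);
        System.out.println("written=" + arc.size() + " skipped=" + skipped);
    }
}
```

Running `main` prints `written=2 skipped=1`. Passing the filter in as a `Predicate` keeps the loop independent of exactly how ARC types are recognized.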
isARCType
protected boolean isARCType(java.lang.String mimetype)
main
public static void main(java.lang.String[] args)
throws org.apache.commons.cli.ParseException,
java.io.IOException,
java.text.ParseException
- Command-line interface to Warc2Arc.
- Parameters:
args
- Command-line arguments.
- Throws:
org.apache.commons.cli.ParseException
- Failed parse of the command line.
java.io.IOException
java.text.ParseException
Copyright © 2003-2011 Internet Archive. All Rights Reserved.