org.archive.crawler.util
Class DiskFPMergeUriUniqFilter

java.lang.Object
  extended by org.archive.crawler.util.FPMergeUriUniqFilter
      extended by org.archive.crawler.util.DiskFPMergeUriUniqFilter
All Implemented Interfaces:
UriUniqFilter

public class DiskFPMergeUriUniqFilter
extends FPMergeUriUniqFilter

A crude FPMergeUriUniqFilter implementation that keeps the overall fingerprint (FP) record in a data file of raw longs on disk.
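
To illustrate what "a disk data file of raw longs" can look like, here is a minimal sketch using java.io.DataOutputStream and java.io.DataInputStream, the stream types of the newFps and oldFps fields. The class RawLongFileSketch and its methods are hypothetical names for illustration only; this is not the Heritrix implementation, and the absence of any header or delimiter is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of org.archive.crawler.util.
class RawLongFileSketch {

    /** Write each fingerprint as 8 big-endian bytes (assumed: no header, no delimiter). */
    static void writeFps(File file, long[] fps) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (long fp : fps) {
                out.writeLong(fp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Read the fingerprints back; the entry count follows from the file length. */
    static long[] readFps(File file) {
        int n = (int) (file.length() / Long.BYTES);
        long[] fps = new long[n];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < n; i++) {
                fps[i] = in.readLong();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fps;
    }

    /** Create a scratch file that is deleted when the JVM exits. */
    static File tempFile() {
        try {
            File f = File.createTempFile("fps", ".dat");
            f.deleteOnExit();
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout a file holding n fingerprints is exactly 8 * n bytes long, so a count can be derived from the file size as well as tracked separately.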

Author:
gojomo

Nested Class Summary
 class DiskFPMergeUriUniqFilter.DataFileLongIterator
           
 
Nested classes/interfaces inherited from class org.archive.crawler.util.FPMergeUriUniqFilter
FPMergeUriUniqFilter.PendingItem
 
Nested classes/interfaces inherited from interface org.archive.crawler.datamodel.UriUniqFilter
UriUniqFilter.HasUriReceiver
 
Field Summary
(package private)  long count
           
(package private)  java.io.File currentFps
           
(package private)  long newCount
           
(package private)  java.io.DataOutputStream newFps
           
(package private)  java.io.File newFpsFile
           
(package private)  java.io.DataInputStream oldFps
           
(package private)  java.io.File scratchDir
           
 
Fields inherited from class org.archive.crawler.util.FPMergeUriUniqFilter
DEFAULT_MAX_PENDING, FLUSH_DELAY_FACTOR, maxPending, mergeDupAtLast, mergeDuplicateCount, nextFlushAllowableAfter, pendDupAtLast, pendDuplicateCount, pendingSet, profileLog, quickCache, quickDupAtLast, quickDuplicateCount, receiver
 
Constructor Summary
DiskFPMergeUriUniqFilter(java.io.File scratchDir)
           
 
Method Summary
protected  void addNewFp(long fp)
          Add an FP (which may be an old or new FP) to the new complete list.
protected  it.unimi.dsi.fastutil.longs.LongIterator beginFpMerge()
          Begin merging pending candidates with the complete list.
 long count()
           
protected  void finishFpMerge()
          Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate).
 
Methods inherited from class org.archive.crawler.util.FPMergeUriUniqFilter
add, addForce, addNow, close, createFp, flush, forget, note, pend, pending, profileLog, requestFlush, setDestination, setMaxPending, setProfileLog
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

count

long count

scratchDir

java.io.File scratchDir

currentFps

java.io.File currentFps

newFpsFile

java.io.File newFpsFile

newFps

java.io.DataOutputStream newFps

newCount

long newCount

oldFps

java.io.DataInputStream oldFps
Constructor Detail

DiskFPMergeUriUniqFilter

public DiskFPMergeUriUniqFilter(java.io.File scratchDir)
Method Detail

beginFpMerge

protected it.unimi.dsi.fastutil.longs.LongIterator beginFpMerge()
Description copied from class: FPMergeUriUniqFilter
Begin merging pending candidates with the complete list. Return an iterator that yields all previously known FPs in turn.

Specified by:
beginFpMerge in class FPMergeUriUniqFilter
Returns:
Iterator over all previously-known FPs

addNewFp

protected void addNewFp(long fp)
Description copied from class: FPMergeUriUniqFilter
Add an FP (which may be an old or new FP) to the new complete list. Should only be called after beginFpMerge() and before finishFpMerge().

Specified by:
addNewFp in class FPMergeUriUniqFilter
Parameters:
fp - the FP to add

finishFpMerge

protected void finishFpMerge()
Description copied from class: FPMergeUriUniqFilter
Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate).

Specified by:
finishFpMerge in class FPMergeUriUniqFilter
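
The three merge hooks above (beginFpMerge, addNewFp, finishFpMerge) suggest a single-pass merge of the previously known FPs with the pending candidates. The following is a minimal in-memory sketch of that idea, not the Heritrix code: the class FpMergeSketch, its fields, and the use of java.util.PrimitiveIterator.OfLong in place of fastutil's LongIterator are all assumptions, as is the natural signed ordering of the longs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.TreeSet;

// Hypothetical illustration; names and structure are assumptions.
class FpMergeSketch {
    /** Stand-in for the new complete list that addNewFp(long) would build. */
    final List<Long> merged = new ArrayList<>();
    /** Candidates that were not already present in the old list. */
    final List<Long> novel = new ArrayList<>();

    /** Merge sorted old FPs with sorted pending candidates in one pass. */
    void merge(long[] sortedOld, TreeSet<Long> pending) {
        // Plays the role of the iterator returned by beginFpMerge().
        PrimitiveIterator.OfLong old = Arrays.stream(sortedOld).iterator();
        boolean haveOld = old.hasNext();
        long curOld = haveOld ? old.nextLong() : 0L;

        for (long cand : pending) {
            // Copy every old FP that sorts before the candidate.
            while (haveOld && curOld < cand) {
                merged.add(curOld);
                haveOld = old.hasNext();
                if (haveOld) {
                    curOld = old.nextLong();
                }
            }
            if (haveOld && curOld == cand) {
                // Duplicate: the old FP is copied once the iterator advances past it.
                continue;
            }
            merged.add(cand);
            novel.add(cand);
        }
        // Copy any old FPs that remain (where finishFpMerge() would then clean up).
        while (haveOld) {
            merged.add(curOld);
            haveOld = old.hasNext();
            if (haveOld) {
                curOld = old.nextLong();
            }
        }
    }
}
```

Because both inputs are sorted, each old FP is read once and each output FP is written once, which is what makes a sequential disk file workable as the backing store.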

count

public long count()
Returns:
Count of already-seen URIs.


Copyright © 2003-2011 Internet Archive. All Rights Reserved.