org.archive.crawler.util
Class SetBasedUriUniqFilter

java.lang.Object
  extended by org.archive.crawler.util.SetBasedUriUniqFilter
All Implemented Interfaces:
UriUniqFilter
Direct Known Subclasses:
BdbUriUniqFilter, BloomUriUniqFilter, FPUriUniqFilter, MemUriUniqFilter, NoopUriUniqFilter

public abstract class SetBasedUriUniqFilter
extends java.lang.Object
implements UriUniqFilter

UriUniqFilter based on an underlying UriSet (essentially a Set).
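
The split between this abstract class and its subclasses can be sketched as follows. This is a minimal illustration, not the Heritrix source: SketchReceiver, SketchSetBasedFilter and SketchMemFilter are hypothetical stand-ins, URIs are plain strings rather than CandidateURI, and the body of add is an assumption about how the set operations might be used.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for UriUniqFilter.HasUriReceiver; not the Heritrix type.
interface SketchReceiver {
    void receive(String uri);
}

// Hypothetical stand-in for the abstract base: subclasses supply only the set operations.
abstract class SketchSetBasedFilter {
    protected SketchReceiver receiver;
    protected long duplicateCount = 0;

    protected abstract boolean setAdd(CharSequence key);    // true if the key was newly added
    protected abstract boolean setRemove(CharSequence key); // true if the key was present
    protected abstract long setCount();

    public void setDestination(SketchReceiver receiver) {
        this.receiver = receiver;
    }

    // Assumed behavior: forward only items whose key the set has not seen before.
    public void add(String key, String value) {
        if (setAdd(key)) {
            receiver.receive(value);
        } else {
            duplicateCount++;
        }
    }

    public long count() {
        return setCount();
    }
}

// In-memory subclass backed by a java.util.HashSet.
class SketchMemFilter extends SketchSetBasedFilter {
    private final Set<String> seen = new HashSet<>();

    @Override protected boolean setAdd(CharSequence key) { return seen.add(key.toString()); }
    @Override protected boolean setRemove(CharSequence key) { return seen.remove(key.toString()); }
    @Override protected long setCount() { return seen.size(); }
}
```

A subclass backed by a different store (a disk database, a Bloom filter, a fingerprint set) would only need to replace the three set operations.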

Author:
gojomo

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.archive.crawler.datamodel.UriUniqFilter
UriUniqFilter.HasUriReceiver
 
Field Summary
protected  long duplicateCount
           
protected  long duplicatesAtLastSample
           
protected  java.io.PrintWriter profileLog
           
protected  UriUniqFilter.HasUriReceiver receiver
           
 
Constructor Summary
SetBasedUriUniqFilter()
           
 
Method Summary
 void add(java.lang.String key, CandidateURI value)
          Add given uri, if not already present.
 void addForce(java.lang.String key, CandidateURI value)
          Add given uri, all the way through to underlying destination, even if already present.
 void addNow(java.lang.String key, CandidateURI value)
          Immediately add uri.
 void close()
          Close down any allocated resources.
 long count()
           
 void forget(java.lang.String key, CandidateURI value)
          Forget that an item was seen.
 void note(java.lang.String key)
          Note item as seen, without passing through to receiver.
 long pending()
          Count of items added, but not yet filtered in or out.
protected  void profileLog(java.lang.String key)
           
 long requestFlush()
          Request that any pending items be added/dropped.
protected abstract  boolean setAdd(java.lang.CharSequence key)
           
protected abstract  long setCount()
           
 void setDestination(UriUniqFilter.HasUriReceiver receiver)
          Receiver of unique URIs.
 void setProfileLog(java.io.File logfile)
          Set a File to receive a log for replay profiling.
protected abstract  boolean setRemove(java.lang.CharSequence key)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

receiver

protected UriUniqFilter.HasUriReceiver receiver

profileLog

protected java.io.PrintWriter profileLog

duplicateCount

protected long duplicateCount

duplicatesAtLastSample

protected long duplicatesAtLastSample
Constructor Detail

SetBasedUriUniqFilter

public SetBasedUriUniqFilter()
Method Detail

setAdd

protected abstract boolean setAdd(java.lang.CharSequence key)

setRemove

protected abstract boolean setRemove(java.lang.CharSequence key)

setCount

protected abstract long setCount()

count

public long count()
Specified by:
count in interface UriUniqFilter
Returns:
Count of already seen URIs.

pending

public long pending()
Description copied from interface: UriUniqFilter
Count of items added, but not yet filtered in or out. Some implementations may buffer up large numbers of pending items to be evaluated in a later large batch/scan/merge with disk files.

Specified by:
pending in interface UriUniqFilter
Returns:
Count of items added but not yet evaluated.

setDestination

public void setDestination(UriUniqFilter.HasUriReceiver receiver)
Description copied from interface: UriUniqFilter
Receiver of unique URIs. Items that have not been seen before are passed through to this object.

Specified by:
setDestination in interface UriUniqFilter
Parameters:
receiver - Object that will be passed items. Must implement the HasUriReceiver interface.

profileLog

protected void profileLog(java.lang.String key)

add

public void add(java.lang.String key,
                CandidateURI value)
Description copied from interface: UriUniqFilter
Add given uri, if not already present.

Specified by:
add in interface UriUniqFilter
Parameters:
key - Usually a canonicalized version of value. This is the key used for lookups, forgets, and insertions on the already-included list.
value - item to add.

addNow

public void addNow(java.lang.String key,
                   CandidateURI value)
Description copied from interface: UriUniqFilter
Immediately add uri.

Specified by:
addNow in interface UriUniqFilter
Parameters:
key - Usually a canonicalized version of uri. This is the key used for lookups, forgets, and insertions on the already-included list.
value - item to add.

addForce

public void addForce(java.lang.String key,
                     CandidateURI value)
Description copied from interface: UriUniqFilter
Add given uri, all the way through to underlying destination, even if already present. (Sometimes a URI must be fetched, or refetched, for example when DNS or robots info expires or the operator forces a refetch. A normal add() or addNow() would drop the URI without forwarding it on once it is determined to already be in the filter.)

Specified by:
addForce in interface UriUniqFilter
Parameters:
key - Usually a canonicalized version of uri. This is the key used for lookups, forgets, and insertions on the already-included list.
value - item to add.
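
The difference between a normal add and a forced add can be sketched as follows. ForceSketchFilter is a hypothetical, simplified class, not the Heritrix implementation: URIs are plain strings, and a list named delivered stands in for the receiver.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified filter; URIs are plain strings here instead of CandidateURI.
class ForceSketchFilter {
    private final Set<String> seen = new HashSet<>();
    final List<String> delivered = new ArrayList<>(); // stands in for the receiver

    // Normal add: drop the item if its key is already in the set.
    void add(String key, String value) {
        if (seen.add(key)) {
            delivered.add(value);
        }
    }

    // Forced add: record the key, then deliver whether or not it was already present.
    void addForce(String key, String value) {
        seen.add(key);
        delivered.add(value);
    }
}
```

With this sketch, adding the same robots.txt key twice delivers it once, while a following addForce with that key delivers it again.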

note

public void note(java.lang.String key)
Description copied from interface: UriUniqFilter
Note item as seen, without passing through to receiver.

Specified by:
note in interface UriUniqFilter
Parameters:
key - Usually a canonicalized version of a URI. This is the key used for lookups, forgets, and insertions on the already-included list.

forget

public void forget(java.lang.String key,
                   CandidateURI value)
Description copied from interface: UriUniqFilter
Forget that an item was seen.

Specified by:
forget in interface UriUniqFilter
Parameters:
key - Usually a canonicalized version of a URI. This is the key used for lookups, forgets, and insertions on the already-included list.
value - item to forget.
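
How note and forget interact with add can be sketched as follows. NoteForgetSketch is a hypothetical, simplified class, not the Heritrix implementation: forget takes only the key here, URIs are plain strings, and a list named delivered stands in for the receiver.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified filter showing how note and forget change what add delivers.
class NoteForgetSketch {
    private final Set<String> seen = new HashSet<>();
    final List<String> delivered = new ArrayList<>(); // stands in for the receiver

    // Deliver the item only if its key has not been seen.
    void add(String key, String value) {
        if (seen.add(key)) {
            delivered.add(value);
        }
    }

    // Mark the key as seen without delivering anything.
    void note(String key) {
        seen.add(key);
    }

    // Remove the key so that a later add is treated as new.
    void forget(String key) {
        seen.remove(key);
    }

    long count() {
        return seen.size();
    }
}
```

In this sketch, an add that follows note for the same key delivers nothing, and an add that follows forget delivers the item again.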

requestFlush

public long requestFlush()
Description copied from interface: UriUniqFilter
Request that any pending items be added/dropped. Implementors may ignore the request if a flush would be too expensive/too soon.

Specified by:
requestFlush in interface UriUniqFilter
Returns:
Number added.

close

public void close()
Description copied from interface: UriUniqFilter
Close down any allocated resources. It makes sense to call this when checkpointing.

Specified by:
close in interface UriUniqFilter

setProfileLog

public void setProfileLog(java.io.File logfile)
Description copied from interface: UriUniqFilter
Set a File to receive a log for replay profiling.

Specified by:
setProfileLog in interface UriUniqFilter
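
One way a profile log of this kind could work is sketched below. ProfileLogSketch is hypothetical, and the one-key-per-line format and the wrapping of IOException are assumptions; the actual log format used by this class is not documented on this page.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical sketch: write each key to a file so that the sequence can be replayed later.
class ProfileLogSketch {
    private PrintWriter profileLog; // stays null until a log file is set

    // Open the given file for writing; assumed to replace any existing content.
    void setProfileLog(File logfile) {
        try {
            profileLog = new PrintWriter(new BufferedWriter(new FileWriter(logfile)));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Record one key per line (assumed format); a no-op when no log file is set.
    void profileLog(String key) {
        if (profileLog != null) {
            profileLog.println(key);
        }
    }

    // Flush and release the log file.
    void close() {
        if (profileLog != null) {
            profileLog.close();
        }
    }
}
```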


Copyright © 2003-2011 Internet Archive. All Rights Reserved.