java.lang.Object
  org.archive.crawler.frontier.BdbMultipleWorkQueues
public class BdbMultipleWorkQueues
A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs. Reading the groupings from specific per-grouping (per-classKey/per-Host) starting points allows this to act as a collection of independent queues.
For how the bdb keys are made, see calculateInsertKey(CrawlURI).
TODO: refactor, improve naming.
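The idea of one ordered store behaving as many independent queues can be sketched with a sorted map in place of the BerkeleyDB database. This is a toy model only: the class name PrefixQueuesSketch, the String keys, the NUL separator, and the hex counter are all assumptions made for illustration, and the real key layout is whatever calculateInsertKey(CrawlURI) produces.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model (not the Heritrix implementation): many queues in one sorted map. */
final class PrefixQueuesSketch {
    // Stand-in for the ordered BDB database; String keys sort lexicographically.
    private final TreeMap<String, String> store = new TreeMap<>();
    private long counter = 0;

    /** Hypothetical origin key: the classKey followed by a NUL separator. */
    static String originKey(String classKey) {
        return classKey + '\u0000';
    }

    /** Appends a URI to the queue for classKey using a fixed-width ordinal suffix. */
    void put(String classKey, String uri) {
        store.put(originKey(classKey) + String.format("%016x", counter++), uri);
    }

    /** Removes and returns the head of one queue, or null if that queue is empty. */
    String poll(String classKey) {
        String origin = originKey(classKey);
        // Nearest entry at or after the queue's starting point.
        Map.Entry<String, String> head = store.ceilingEntry(origin);
        // Stay inside this queue's key range; otherwise we would read a neighbour's item.
        if (head == null || !head.getKey().startsWith(origin)) {
            return null;
        }
        String uri = head.getValue();
        store.remove(head.getKey());
        return uri;
    }

    public static void main(String[] args) {
        PrefixQueuesSketch queues = new PrefixQueuesSketch();
        queues.put("b.example", "http://b.example/1");
        queues.put("a.example", "http://a.example/1");
        System.out.println(queues.poll("a.example"));
        System.out.println(queues.poll("b.example"));
    }
}
```

Each queue is just a contiguous key range, so reading from a queue's origin key yields that queue's oldest item regardless of the order in which the different queues were filled.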
Nested Class Summary | |
---|---|
class |
BdbMultipleWorkQueues.BdbFrontierMarker
Marker for remembering a position within the BdbMultipleWorkQueues. |
Constructor Summary | |
---|---|
BdbMultipleWorkQueues(com.sleepycat.je.Environment env,
com.sleepycat.bind.serial.StoredClassCatalog classCatalog,
boolean recycle)
Create the multi queue in the given environment. |
Method Summary | |
---|---|
void |
addCap(byte[] origin)
Add a dummy 'cap' entry at the given insertion key. |
(package private) static com.sleepycat.je.DatabaseEntry |
calculateInsertKey(CrawlURI curi)
Calculate the insertKey that places a CrawlURI in the desired spot. |
(package private) static byte[] |
calculateOriginKey(java.lang.String classKey)
Calculate the 'origin' key for a virtual queue of items with the given classKey. |
void |
close()
Clean up. |
void |
delete(CrawlURI item)
Delete the given CrawlURI from persistent store. |
long |
deleteMatchingFromQueue(java.lang.String match,
java.lang.String queue,
com.sleepycat.je.DatabaseEntry headKey)
Delete all CrawlURIs matching the given expression. |
protected void |
forAllPendingDo(org.apache.commons.collections.Closure c)
Utility method to perform action for all pending CrawlURI instances. |
CrawlURI |
get(com.sleepycat.je.DatabaseEntry headKey)
Get the next nearest item after the given key. |
protected com.sleepycat.je.DatabaseEntry |
getFirstKey()
|
java.util.List |
getFrom(FrontierMarker m,
int maxMatches)
|
FrontierMarker |
getInitialMarker(java.lang.String regexpr)
Get a marker for beginning a scan over all contents. |
protected com.sleepycat.je.OperationStatus |
getNextNearestItem(com.sleepycat.je.DatabaseEntry headKey,
com.sleepycat.je.DatabaseEntry result)
|
void |
put(CrawlURI curi,
boolean overwriteIfPresent)
Put the given CrawlURI in at the appropriate place. |
(package private) void |
sync()
Method used by BdbFrontier during checkpointing. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public BdbMultipleWorkQueues(com.sleepycat.je.Environment env, com.sleepycat.bind.serial.StoredClassCatalog classCatalog, boolean recycle) throws com.sleepycat.je.DatabaseException
Parameters:
env - bdb environment to use
classCatalog - Class catalog to use.
recycle - True if we are to reuse db content if any.
Throws:
com.sleepycat.je.DatabaseException
Method Detail |
---|
public long deleteMatchingFromQueue(java.lang.String match, java.lang.String queue, com.sleepycat.je.DatabaseEntry headKey) throws com.sleepycat.je.DatabaseException
Parameters:
match -
queue -
headKey -
Throws:
com.sleepycat.je.DatabaseException
public java.util.List getFrom(FrontierMarker m, int maxMatches) throws com.sleepycat.je.DatabaseException
Parameters:
m - marker
maxMatches -
Throws:
com.sleepycat.je.DatabaseException
public FrontierMarker getInitialMarker(java.lang.String regexpr)
Parameters:
regexpr -
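How getInitialMarker and getFrom might cooperate can be sketched as a resumable, regex-filtered scan over a sorted map. The semantics here are assumed from the method summaries rather than confirmed by the source: the names MarkerScanSketch and Marker, the String keys, and the choice to match the pattern against the whole stored URI are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Toy model (not the Heritrix implementation) of a marker-driven scan. */
final class MarkerScanSketch {
    /** Hypothetical marker: a pattern plus the key at which the next scan resumes. */
    static final class Marker {
        final Pattern pattern;
        String nextKey; // null once the scan has reached the end

        Marker(String regex, String startKey) {
            this.pattern = Pattern.compile(regex);
            this.nextKey = startKey;
        }

        boolean hasNext() {
            return nextKey != null;
        }
    }

    // Stand-in for the ordered database of pending URIs.
    private final TreeMap<String, String> store = new TreeMap<>();

    void put(String key, String uri) {
        store.put(key, uri);
    }

    /** Starts a scan at the first key in the store. */
    Marker initialMarker(String regex) {
        return new Marker(regex, store.isEmpty() ? null : store.firstKey());
    }

    /** Returns up to maxMatches matching URIs and advances the marker. */
    List<String> getFrom(Marker m, int maxMatches) {
        List<String> matches = new ArrayList<>();
        if (m.nextKey == null) {
            return matches;
        }
        String resume = null;
        for (Map.Entry<String, String> e : store.tailMap(m.nextKey, true).entrySet()) {
            if (matches.size() >= maxMatches) {
                resume = e.getKey(); // first entry not yet examined
                break;
            }
            if (m.pattern.matcher(e.getValue()).matches()) {
                matches.add(e.getValue());
            }
        }
        m.nextKey = resume;
        return matches;
    }

    public static void main(String[] args) {
        MarkerScanSketch scan = new MarkerScanSketch();
        scan.put("k1", "http://a.example/x.html");
        scan.put("k2", "http://a.example/y.png");
        Marker m = scan.initialMarker(".*\\.html");
        System.out.println(scan.getFrom(m, 10));
    }
}
```

Keeping the resume position in the marker rather than in the store lets a caller page through a large set of pending items in batches without holding a cursor open between calls.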
protected com.sleepycat.je.DatabaseEntry getFirstKey() throws com.sleepycat.je.DatabaseException
Throws:
com.sleepycat.je.DatabaseException
public CrawlURI get(com.sleepycat.je.DatabaseEntry headKey) throws com.sleepycat.je.DatabaseException
TODO: hold within a queue's range
Parameters:
headKey - Key prefix that marks the beginning of the range in pendingUrisDB we're interested in.
Throws:
com.sleepycat.je.DatabaseException
protected com.sleepycat.je.OperationStatus getNextNearestItem(com.sleepycat.je.DatabaseEntry headKey, com.sleepycat.je.DatabaseEntry result) throws com.sleepycat.je.DatabaseException
Throws:
com.sleepycat.je.DatabaseException
public void put(CrawlURI curi, boolean overwriteIfPresent) throws com.sleepycat.je.DatabaseException
Parameters:
curi -
Throws:
com.sleepycat.je.DatabaseException
static byte[] calculateOriginKey(java.lang.String classKey)
Parameters:
classKey - String key to derive origin byte key from
static com.sleepycat.je.DatabaseEntry calculateInsertKey(CrawlURI curi)
Parameters:
curi -
public void delete(CrawlURI item) throws com.sleepycat.je.DatabaseException
Parameters:
item -
Throws:
com.sleepycat.je.DatabaseException
void sync()
The backing bdbje database has been marked deferred write so we save on writes to disk. This means there is no guarantee that the disk holds what is in memory unless sync is called (calling sync on the bdbje Environment is not sufficient).
Package access only because only Frontiers of this package would ever need access.
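The consequence of deferred write can be illustrated with a toy model in which writes land in memory and reach the durable copy only on an explicit sync. Nothing here uses bdbje classes: DeferredWriteSketch and its two maps are hypothetical stand-ins for the in-memory database state and the on-disk state.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not bdbje) of deferred-write semantics. */
final class DeferredWriteSketch {
    private final Map<String, String> memory = new HashMap<>(); // what readers see
    private final Map<String, String> disk = new HashMap<>();   // what survives a crash

    /** Writes go to memory only, which is what saves on disk writes. */
    void put(String key, String value) {
        memory.put(key, value);
    }

    /** Reads the live, in-memory value. */
    String get(String key) {
        return memory.get(key);
    }

    /** Copies the in-memory state to the durable copy, as a checkpoint would. */
    void sync() {
        disk.clear();
        disk.putAll(memory);
    }

    /** Reads what would be recoverable after a crash at this moment. */
    String readDurable(String key) {
        return disk.get(key);
    }

    public static void main(String[] args) {
        DeferredWriteSketch db = new DeferredWriteSketch();
        db.put("k", "v");
        System.out.println("durable before sync: " + db.readDurable("k"));
        db.sync();
        System.out.println("durable after sync: " + db.readDurable("k"));
    }
}
```

In this model a checkpoint that skipped sync() would record a durable state missing every write made since the previous sync, which is the situation the note above warns about.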
public void close()
public void addCap(byte[] origin)
Parameters:
origin - key at which to insert the cap
protected void forAllPendingDo(org.apache.commons.collections.Closure c) throws com.sleepycat.je.DatabaseException
Parameters:
c - Closure action to perform
Throws:
com.sleepycat.je.DatabaseException