org.archive.crawler.frontier
Class AntiCalendarCostAssignmentPolicy

java.lang.Object
  extended by org.archive.crawler.frontier.CostAssignmentPolicy
      extended by org.archive.crawler.frontier.UnitCostAssignmentPolicy
          extended by org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy

public class AntiCalendarCostAssignmentPolicy
extends UnitCostAssignmentPolicy

CostAssignmentPolicy that further penalizes URIs containing calendar-suggestive strings by adding one extra unit of cost. It will also penalize some 'innocent' URIs, but this causes notable problems only when large volumes of chaff that the policy misses end up ranked higher than the 'wheat' it wrongly catches.
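The penalty described above can be sketched as a plain function. Every URI starts at a unit cost of 1, and one more unit is added when a calendar-suggestive pattern is found anywhere in the URI. The pattern here is a hypothetical stand-in, because this page does not document the actual value of CALENDARISH. The last call shows the kind of 'innocent' URI that such a policy also catches.

```java
import java.util.regex.Pattern;

public class CalendarCostSketch {
    // Hypothetical calendar-suggestive pattern; NOT the documented value of CALENDARISH.
    // Case-insensitive: matches "calendar", "month", "date", or a standalone 19xx/20xx year.
    static final Pattern CALENDARISH_SKETCH = Pattern.compile(
            "(?i)(calendar|month|date|\\b(19|20)\\d\\d\\b)");

    // Unit cost for every URI, plus one extra unit when the pattern is found.
    static int costOf(String uri) {
        int cost = 1;
        if (CALENDARISH_SKETCH.matcher(uri).find()) {
            cost++;
        }
        return cost;
    }

    public static void main(String[] args) {
        System.out.println(costOf("http://example.com/about.html"));                // 1
        System.out.println(costOf("http://example.com/calendar?month=5&year=2031")); // 2
        // An 'innocent' URI penalized anyway: "update" contains "date".
        System.out.println(costOf("http://example.com/news/update.html"));          // 2
    }
}
```

Using `find()` rather than `matches()` means the pattern may hit anywhere in the URI. This is what makes the check cheap and broad, and also why unrelated words can trigger it.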

Author:
gojomo

Field Summary
static java.lang.String CALENDARISH
           
 
Constructor Summary
AntiCalendarCostAssignmentPolicy()
           
 
Method Summary
 int costOf(CrawlURI curi)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CALENDARISH

public static java.lang.String CALENDARISH
Constructor Detail

AntiCalendarCostAssignmentPolicy

public AntiCalendarCostAssignmentPolicy()
Method Detail

costOf

public int costOf(CrawlURI curi)
Overrides:
costOf in class UnitCostAssignmentPolicy
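The "Overrides" relationship can be illustrated with a stripped-down hierarchy. In this sketch the subclass asks its parent for the unit cost and adds one. All class names below are hypothetical stand-ins for the Heritrix types, a plain `String` replaces `CrawlURI`, and the pattern is an assumed simplification of CALENDARISH. It illustrates the override pattern and is not the Heritrix source.

```java
import java.util.regex.Pattern;

public class OverrideSketch {
    // Hypothetical stand-in for CostAssignmentPolicy: maps a URI to an integer cost.
    static abstract class CostPolicy {
        abstract int costOf(String uri);
    }

    // Hypothetical stand-in for UnitCostAssignmentPolicy: every URI costs exactly 1.
    static class UnitCostPolicy extends CostPolicy {
        @Override
        int costOf(String uri) {
            return 1;
        }
    }

    // Hypothetical stand-in for AntiCalendarCostAssignmentPolicy: the inherited cost,
    // plus one extra unit when the (assumed) calendar pattern is found.
    static class AntiCalendarPolicy extends UnitCostPolicy {
        static final Pattern CALENDARISH_SKETCH = Pattern.compile("(?i)calendar|month|year");

        @Override
        int costOf(String uri) {
            int cost = super.costOf(uri);
            if (CALENDARISH_SKETCH.matcher(uri).find()) {
                cost++;
            }
            return cost;
        }
    }

    public static void main(String[] args) {
        CostPolicy policy = new AntiCalendarPolicy();
        System.out.println(policy.costOf("http://example.com/index.html"));    // 1
        System.out.println(policy.costOf("http://example.com/cal?year=2031")); // 2
    }
}
```

Delegating to `super.costOf` instead of hard-coding 1 keeps the penalty additive. If the parent's base cost changed, the extra unit would still sit on top of it.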


Copyright © 2003-2011 Internet Archive. All Rights Reserved.