org.archive.crawler.datamodel
Class Robotstxt
java.lang.Object
org.archive.crawler.datamodel.Robotstxt
- All Implemented Interfaces:
- java.io.Serializable
public class Robotstxt
- extends java.lang.Object
- implements java.io.Serializable
Utility class for parsing and representing 'robots.txt' format
directives as a list of named user-agents and a map from user-agents
to RobotsDirectives.
- See Also:
- Serialized Form
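Example usage (a minimal sketch: the robots.txt body and the crawler name "mycrawler" are illustrative, and RobotsDirectives is assumed to expose an allows(String path) check):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.StringReader;

    public class RobotstxtDemo {
        public static void main(String[] args) throws IOException {
            // Illustrative robots.txt body; in a crawl this would come
            // from fetching /robots.txt on the target host.
            String body =
                "User-agent: *\n" +
                "Disallow: /private/\n" +
                "Crawl-delay: 5\n";
            Robotstxt robots =
                new Robotstxt(new BufferedReader(new StringReader(body)));
            // Find the section applicable to this (hypothetical) crawler...
            RobotsDirectives directives = robots.getDirectivesFor("mycrawler");
            // ...and test a path against it (allows(String) assumed).
            System.out.println(directives.allows("/private/x.html"));
        }
    }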
Constructor Summary
Robotstxt(java.io.BufferedReader reader)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
serialVersionUID
static final long serialVersionUID
- See Also:
- Constant Field Values
userAgents
java.util.LinkedList<java.lang.String> userAgents
agentsToDirectives
java.util.Map<java.lang.String,RobotsDirectives> agentsToDirectives
hasErrors
boolean hasErrors
NO_DIRECTIVES
static RobotsDirectives NO_DIRECTIVES
Robotstxt
public Robotstxt(java.io.BufferedReader reader)
throws java.io.IOException
- Throws:
java.io.IOException
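A caller-side sketch for parsing from an already-open stream (the helper, its InputStream parameter, and the UTF-8 charset choice are all assumptions, not part of this class):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    // Hypothetical helper: wrap an open stream (e.g. an HTTP response
    // body) in the BufferedReader the constructor expects.
    static Robotstxt parseRobots(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, "UTF-8")); // charset is an assumption
        try {
            return new Robotstxt(reader); // may throw IOException while reading
        } finally {
            reader.close();
        }
    }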
allowsAll
public boolean allowsAll()
- Does this policy effectively allow everything, i.e. contain
no disallow or timing (crawl-delay) directives?
- Returns:
true if the policy contains no disallow or crawl-delay directives; false otherwise
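For example, a fetch decision might short-circuit on this check (the robots and userAgent variables are assumed from surrounding context):

    if (robots.allowsAll()) {
        // No Disallow or Crawl-delay rules at all: every path is
        // fetchable without consulting per-agent directives.
    } else {
        RobotsDirectives directives = robots.getDirectivesFor(userAgent);
        // consult 'directives' before fetching each URI
    }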
getUserAgents
public java.util.List<java.lang.String> getUserAgents()
- Returns:
the user-agent names for which this robots.txt defines directives
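A sketch that enumerates the named sections (the robots variable is assumed from context):

    for (String agent : robots.getUserAgents()) {
        System.out.println("robots.txt has a section for: " + agent);
    }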
getDirectivesFor
public RobotsDirectives getDirectivesFor(java.lang.String ua)
- Parameters:
ua - the user-agent string to look up
- Returns:
the RobotsDirectives applicable to the given user-agent
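A lookup sketch; the crawler name is illustrative, and RobotsDirectives.allows(String path) is assumed (the static NO_DIRECTIVES field suggests a non-null default is returned when no section matches, but that is an assumption here):

    // Match this crawler's user-agent against the file's sections.
    RobotsDirectives directives = robots.getDirectivesFor("mycrawler");
    // Test individual paths against the matched section
    // (allows(String) assumed on RobotsDirectives).
    boolean ok = directives.allows("/cgi-bin/search");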
Copyright © 2003-2011 Internet Archive. All Rights Reserved.