org.archive.crawler.extractor
Class Link

java.lang.Object
  extended by org.archive.crawler.extractor.Link
All Implemented Interfaces:
java.io.Serializable

public class Link
extends java.lang.Object
implements java.io.Serializable

Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found). As such, it is a suitably generic item to be returned from generic link-extraction utility code.

Author:
gojomo
See Also:
Serialized Form

Field Summary
static char EMBED_HOP
          embedded links necessary to render the page, like IMG/@SRC
static java.lang.String EMBED_MISC
          stand-in value for embeds without other context
static java.lang.String JS_MISC
          stand-in value for js-discovered urls without other context
static char NAVLINK_HOP
          navigation links, like A/@HREF
static java.lang.String NAVLINK_MISC
          stand-in value for navlink urls without other context
static char PREREQ_HOP
          implied prerequisite links, like dns or robots
static java.lang.String PREREQ_MISC
          stand-in value for prerequisite urls without other context
static char REFER_HOP
          referral/redirect links, like header 'Location:' on a 301/302 response
static char SPECULATIVE_HOP
          speculative/aggressively extracted links, perhaps embed or nav, as in javascript
static java.lang.String SPECULATIVE_MISC
          stand-in value for speculative/aggressively extracted urls without other context
 
Constructor Summary
Link(java.lang.CharSequence source, java.lang.CharSequence destination, java.lang.CharSequence context, char hopType)
          Create a Link with the given fields.
 
Method Summary
static java.lang.CharSequence elementContext(java.lang.CharSequence element, java.lang.CharSequence attribute)
          Create a suitable XPath-like context from an element name and optional attribute name.
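 java.lang.CharSequence getContext()
          Returns the context in which the link was found.
 java.lang.CharSequence getDestination()
          Returns the destination URI.
 char getHopType()
          Returns the hop type.
 java.lang.CharSequence getSource()
          Returns the source URI.
 java.lang.String toString()
          Returns a string representation of this Link.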
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

EMBED_MISC

public static final java.lang.String EMBED_MISC
stand-in value for embeds without other context


JS_MISC

public static final java.lang.String JS_MISC
stand-in value for js-discovered urls without other context


NAVLINK_MISC

public static final java.lang.String NAVLINK_MISC
stand-in value for navlink urls without other context


SPECULATIVE_MISC

public static final java.lang.String SPECULATIVE_MISC
stand-in value for speculative/aggressively extracted urls without other context


PREREQ_MISC

public static final java.lang.String PREREQ_MISC
stand-in value for prerequisite urls without other context


NAVLINK_HOP

public static final char NAVLINK_HOP
navigation links, like A/@HREF

See Also:
Constant Field Values

PREREQ_HOP

public static final char PREREQ_HOP
implied prerequisite links, like dns or robots

See Also:
Constant Field Values

EMBED_HOP

public static final char EMBED_HOP
embedded links necessary to render the page, like IMG/@SRC

See Also:
Constant Field Values

SPECULATIVE_HOP

public static final char SPECULATIVE_HOP
speculative/aggressively extracted links, perhaps embed or nav, as in javascript

See Also:
Constant Field Values

REFER_HOP

public static final char REFER_HOP
referral/redirect links, like header 'Location:' on a 301/302 response

See Also:
Constant Field Values
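
The hop-type constants above are single characters, so client code can branch on them directly. The following is a minimal sketch rather than Heritrix code: HopSketch and describe are hypothetical names, and the character values assigned here are assumptions, since this page only links to the Constant Field Values listing.

```java
// Hypothetical helper, not part of Heritrix. The constant names mirror the
// fields documented above; the character values are ASSUMED for illustration.
class HopSketch {
    static final char NAVLINK_HOP = 'L';      // assumed value
    static final char PREREQ_HOP = 'P';       // assumed value
    static final char EMBED_HOP = 'E';        // assumed value
    static final char SPECULATIVE_HOP = 'X';  // assumed value
    static final char REFER_HOP = 'R';        // assumed value

    // Map a hop-type character to a short human-readable label.
    static String describe(char hopType) {
        switch (hopType) {
            case NAVLINK_HOP:     return "navigation";
            case PREREQ_HOP:      return "prerequisite";
            case EMBED_HOP:       return "embed";
            case SPECULATIVE_HOP: return "speculative";
            case REFER_HOP:       return "referral";
            default:              return "unknown";
        }
    }

    public static void main(String[] args) {
        char[] hops = { NAVLINK_HOP, PREREQ_HOP, EMBED_HOP, SPECULATIVE_HOP, REFER_HOP };
        for (char hop : hops) {
            System.out.println(hop + " = " + describe(hop));
        }
    }
}
```

With Heritrix on the classpath, the same switch could presumably be written against Link.NAVLINK_HOP and its siblings instead of the local constants.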

Constructor Detail

Link

public Link(java.lang.CharSequence source,
            java.lang.CharSequence destination,
            java.lang.CharSequence context,
            char hopType)
Create a Link with the given fields.

Parameters:
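source - the URI in which the link was discovered
destination - the URI the link points to
context - the context in which the link was found, indicating the type of reference
hopType - the hop type of the link, such as one of the *_HOP constants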
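
To illustrate how the four constructor arguments fit together, here is a minimal sketch. LinkSketch is a hypothetical stand-in that mirrors only the constructor and getters documented on this page; it is not the Heritrix class, and the example URIs and the hop-type value 'L' are assumptions.

```java
// Hypothetical stand-in for org.archive.crawler.extractor.Link. It copies the
// documented constructor and getter signatures and nothing else.
final class LinkSketch {
    private final CharSequence source;
    private final CharSequence destination;
    private final CharSequence context;
    private final char hopType;

    LinkSketch(CharSequence source, CharSequence destination,
               CharSequence context, char hopType) {
        this.source = source;
        this.destination = destination;
        this.context = context;
        this.hopType = hopType;
    }

    CharSequence getSource() { return source; }
    CharSequence getDestination() { return destination; }
    CharSequence getContext() { return context; }
    char getHopType() { return hopType; }
}

class LinkSketchDemo {
    // One edge of the web graph: a page whose A/@HREF points at another page.
    static LinkSketch example() {
        return new LinkSketch(
                "http://example.com/",            // source (made-up URI)
                "http://example.com/about.html",  // destination (made-up URI)
                "A/@HREF",                        // context, as in the field docs
                'L');                             // assumed navlink hop value
    }

    public static void main(String[] args) {
        LinkSketch link = example();
        System.out.println(link.getSource() + " -> " + link.getDestination()
                + " [" + link.getHopType() + " " + link.getContext() + "]");
    }
}
```

Against the real class, the call would presumably pass Link.NAVLINK_HOP rather than a character literal.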

Method Detail

getContext

public java.lang.CharSequence getContext()
Returns:
the context in which the link was found

getDestination

public java.lang.CharSequence getDestination()
Returns:
the destination URI of the link

getSource

public java.lang.CharSequence getSource()
Returns:
the source URI of the link

getHopType

public char getHopType()
Returns:
the hop type of the link

elementContext

public static java.lang.CharSequence elementContext(java.lang.CharSequence element,
                                                    java.lang.CharSequence attribute)
Create a suitable XPath-like context from an element name and optional attribute name.

Parameters:
element - the element name
attribute - the attribute name (optional)
Returns:
the XPath-like context built from the element and attribute
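
The exact output format is not spelled out here, but the A/@HREF and IMG/@SRC examples in the field descriptions suggest an "element/@attribute" shape. The sketch below assumes that shape; ContextSketch is a hypothetical class, and returning the bare element name when no attribute is given is a guess, not documented behavior.

```java
// Hypothetical re-creation of an XPath-like context builder. The "/@" format
// is inferred from the A/@HREF and IMG/@SRC examples; it is an assumption.
class ContextSketch {
    static CharSequence elementContext(CharSequence element, CharSequence attribute) {
        if (attribute == null) {
            // Assumed handling of the optional attribute: element name only.
            return element.toString();
        }
        return element + "/@" + attribute;
    }

    public static void main(String[] args) {
        System.out.println(elementContext("A", "HREF"));
        System.out.println(elementContext("IMG", "SRC"));
        System.out.println(elementContext("IMG", null));
    }
}
```

The real method may treat a missing attribute differently, so check its result before relying on it.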

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object


Copyright © 2003-2011 Internet Archive. All Rights Reserved.