org.archive.net
Class UURI

java.lang.Object
  extended by org.apache.commons.httpclient.URI
      extended by org.archive.net.LaxURI
          extended by org.archive.net.UURI
All Implemented Interfaces:
java.io.Serializable, java.lang.CharSequence, java.lang.Cloneable, java.lang.Comparable

public class UURI
extends LaxURI
implements java.lang.CharSequence, java.io.Serializable

Usable URI. This class wraps URI, adding caching and extra methods. It cannot be instantiated directly; obtain instances through UURIFactory.

We used to use java.net.URI for parsing URIs but ran across quirky behaviors and bugs. java.net.URI is not subclassable (it is final), and it is unlikely to change any time soon (see Gordon's considered petition, "java.net.URI should have loose/tolerant/compatibility option (or allow reuse)").

This class tries to cache calculated strings, such as the extracted host and the string form of this UURI, rather than having the parent class rerun its calculation every time.
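A minimal sketch of the caching idea, assuming a hypothetical `CachedUri` class that computes a derived string once on first use and reuses it afterwards. It uses `java.net.URI` as a stand-in for the parent class; the real UURI fields and the parent's computation may differ.

```java
import java.net.URI;

/** Hypothetical illustration of caching derived strings (not the actual UURI code). */
final class CachedUri {
    private final URI delegate;   // stand-in for the parent URI state
    private String cachedString;  // computed lazily on first call, then reused
    private String cachedHost;    // likewise for the host

    CachedUri(URI delegate) {
        this.delegate = delegate;
    }

    @Override
    public String toString() {
        if (cachedString == null) {
            // Ask the parent representation only once.
            cachedString = delegate.toString();
        }
        return cachedString;
    }

    String getHost() {
        if (cachedHost == null) {
            // Recomputed on each call only while the host is null (no host present).
            cachedHost = delegate.getHost();
        }
        return cachedHost;
    }
}
```

Because strings are immutable, an unsynchronized lazy cache like this is a benign race: at worst two threads compute the same value.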

Author:
gojomo, stack
See Also:
URI, Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.commons.httpclient.URI
org.apache.commons.httpclient.URI.DefaultCharsetChanged, org.apache.commons.httpclient.URI.LocaleToCharsetMap
 
Field Summary
static java.lang.String MASSAGEHOST_PATTERN
           
static int MAX_URL_LENGTH
          URIs longer than this are considered illegal because they are too long for Internet Explorer (IE).
 
Fields inherited from class org.archive.net.LaxURI
HTTP_SCHEME, HTTPS_SCHEME, lax_abs_path, lax_query, lax_rel_segment
 
Fields inherited from class org.apache.commons.httpclient.URI
_authority, _fragment, _host, _is_abs_path, _is_hier_part, _is_hostname, _is_IPv4address, _is_IPv6reference, _is_net_path, _is_opaque_part, _is_reg_name, _is_rel_path, _is_server, _opaque, _path, _port, _query, _scheme, _uri, _userinfo, abs_path, absoluteURI, allowed_abs_path, allowed_authority, allowed_fragment, allowed_host, allowed_IPv6reference, allowed_opaque_part, allowed_query, allowed_reg_name, allowed_rel_path, allowed_userinfo, allowed_within_authority, allowed_within_path, allowed_within_query, allowed_within_userinfo, alpha, alphanum, authority, control, defaultDocumentCharset, defaultDocumentCharsetByLocale, defaultDocumentCharsetByPlatform, defaultProtocolCharset, delims, digit, disallowed_opaque_part, disallowed_rel_path, domainlabel, escaped, fragment, hash, hex, hier_part, host, hostname, hostport, IPv4address, IPv6address, IPv6reference, mark, net_path, opaque_part, param, path, path_segments, pchar, percent, port, protocolCharset, query, reg_name, rel_path, rel_segment, relativeURI, reserved, rootPath, scheme, segment, server, space, toplabel, unreserved, unwise, URI_reference, uric, uric_no_slash, userinfo, within_userinfo
 
Constructor Summary
protected UURI()
          Shuts off access to the default constructor.
protected UURI(java.lang.String uri, boolean escaped)
           
protected UURI(java.lang.String uri, boolean escaped, java.lang.String charset)
           
protected UURI(UURI base, UURI relative)
           
 
Method Summary
 char charAt(int index)
           
protected  void coalesceHostAuthorityStrings()
          The two String fields cachedHost and cachedAuthorityMinusUserInfo are usually identical; if so, coalesce into a single instance.
protected  void coalesceUriStrings()
          The two String fields cachedString and cachedEscapedURI are usually identical; if so, coalesce into a single instance.
 int compareTo(java.lang.Object arg0)
           
 boolean equals(java.lang.Object obj)
          Tests whether this UURI is equal to another object.
static UURI from(java.lang.Object o)
          Convenience method for finding the UURI inside an Object likely to have (or be/imply) one.
 java.lang.String getAuthorityMinusUserinfo()
          Return the authority minus userinfo (if any).
 java.lang.String getEscapedURI()
           
 java.lang.String getHost()
           
 java.lang.String getHostBasename()
          Strips www variants from the host.
 java.lang.String getReferencedHost()
          Return the referenced host in the UURI, if any, also extracting the host of a DNS-lookup URI where necessary.
 java.lang.String getSurtForm()
           
static boolean hasScheme(java.lang.String possibleUrl)
          Tests whether the passed String has a likely URI scheme prefix.
 int length()
           
static java.lang.String parseFilename(java.lang.String pathOrUri)
           
 UURI resolve(java.lang.String uri)
           
 UURI resolve(java.lang.String uri, boolean e)
           
 UURI resolve(java.lang.String uri, boolean e, java.lang.String charset)
           
 java.lang.CharSequence subSequence(int start, int end)
           
 java.lang.String toString()
          Overridden to cache the result.
 
Methods inherited from class org.archive.net.LaxURI
decode, decode, getPath, getPathQuery, getURI, lax, parseAuthority, parseUriReference, setURI, validate, validate
 
Methods inherited from class org.apache.commons.httpclient.URI
clone, encode, equals, getAboveHierPath, getAuthority, getCurrentHierPath, getDefaultDocumentCharset, getDefaultDocumentCharsetByLocale, getDefaultDocumentCharsetByPlatform, getDefaultProtocolCharset, getEscapedAboveHierPath, getEscapedAuthority, getEscapedCurrentHierPath, getEscapedFragment, getEscapedName, getEscapedPath, getEscapedPathQuery, getEscapedQuery, getEscapedURIReference, getEscapedUserinfo, getFragment, getName, getPort, getProtocolCharset, getQuery, getRawAboveHierPath, getRawAuthority, getRawCurrentHierPath, getRawCurrentHierPath, getRawFragment, getRawHost, getRawName, getRawPath, getRawPathQuery, getRawQuery, getRawScheme, getRawURI, getRawURIReference, getRawUserinfo, getScheme, getURIReference, getUserinfo, hasAuthority, hasFragment, hashCode, hasQuery, hasUserinfo, indexFirstOf, indexFirstOf, indexFirstOf, indexFirstOf, isAbsoluteURI, isAbsPath, isHierPart, isHostname, isIPv4address, isIPv6reference, isNetPath, isOpaquePart, isRegName, isRelativeURI, isRelPath, isServer, normalize, normalize, prevalidate, removeFragmentIdentifier, resolvePath, setDefaultDocumentCharset, setDefaultProtocolCharset, setEscapedAuthority, setEscapedFragment, setEscapedPath, setEscapedQuery, setFragment, setPath, setQuery, setRawAuthority, setRawFragment, setRawPath, setRawQuery
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

MAX_URL_LENGTH

public static final int MAX_URL_LENGTH
URIs longer than this are considered illegal because they are too long for Internet Explorer (IE).

See Also:
Constant Field Values
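A minimal sketch of the kind of length guard MAX_URL_LENGTH implies. The hypothetical `UriLengthCheck` takes the limit as a parameter because the constant's actual value is not shown on this page.

```java
/** Hypothetical length guard in the spirit of MAX_URL_LENGTH. */
final class UriLengthCheck {
    private UriLengthCheck() {}

    /** Returns true if the URI string is non-null and no longer than maxLength. */
    static boolean isAcceptableLength(String uri, int maxLength) {
        return uri != null && uri.length() <= maxLength;
    }
}
```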

MASSAGEHOST_PATTERN

public static final java.lang.String MASSAGEHOST_PATTERN
See Also:
Constant Field Values
Constructor Detail

UURI

protected UURI()
Shuts off access to the default constructor.


UURI

protected UURI(java.lang.String uri,
               boolean escaped,
               java.lang.String charset)
        throws org.apache.commons.httpclient.URIException
Parameters:
uri - String representation of an absolute URI.
escaped - True if uri is already in escaped form.
charset - Charset to use.
Throws:
org.apache.commons.httpclient.URIException

UURI

protected UURI(UURI base,
               UURI relative)
        throws org.apache.commons.httpclient.URIException
Parameters:
base - Base UURI against which relative is resolved.
relative - UURI to resolve against base.
Throws:
org.apache.commons.httpclient.URIException

UURI

protected UURI(java.lang.String uri,
               boolean escaped)
        throws org.apache.commons.httpclient.URIException,
               java.lang.NullPointerException
Parameters:
uri - String representation of a URI.
escaped - True if uri is already in escaped form.
Throws:
java.lang.NullPointerException
org.apache.commons.httpclient.URIException
Method Detail

resolve

public UURI resolve(java.lang.String uri)
             throws org.apache.commons.httpclient.URIException
Parameters:
uri - URI as string that is resolved relative to this UURI.
Returns:
UURI that uses this UURI as base.
Throws:
org.apache.commons.httpclient.URIException

resolve

public UURI resolve(java.lang.String uri,
                    boolean e)
             throws org.apache.commons.httpclient.URIException
Parameters:
uri - URI as string that is resolved relative to this UURI.
e - True if uri is escaped.
Returns:
UURI that uses this UURI as base.
Throws:
org.apache.commons.httpclient.URIException

resolve

public UURI resolve(java.lang.String uri,
                    boolean e,
                    java.lang.String charset)
             throws org.apache.commons.httpclient.URIException
Parameters:
uri - URI as string that is resolved relative to this UURI.
e - True if uri is escaped.
charset - Charset to use.
Returns:
UURI that uses this UURI as base.
Throws:
org.apache.commons.httpclient.URIException
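To illustrate what "resolved relative to this UURI" means, here is a sketch that uses `java.net.URI.resolve` as a stand-in. This is not the UURI code path: LaxURI is more tolerant than java.net.URI, and `URI.create` throws `IllegalArgumentException` rather than `URIException`. The class name `ResolveDemo` is hypothetical.

```java
import java.net.URI;

/** Illustrates relative-reference resolution, using java.net.URI as a stand-in. */
final class ResolveDemo {
    private ResolveDemo() {}

    /** Resolves relative against base; the resulting path is normalized. */
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }
}
```

For example, resolving `c.html` against `http://example.com/a/b.html` replaces the last path segment, while `../d` climbs one directory.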

equals

public boolean equals(java.lang.Object obj)
Tests whether this UURI is equal to another object.

Overrides:
equals in class org.apache.commons.httpclient.URI
Parameters:
obj - an object to compare
Returns:
true if two URI objects are equal
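This page does not say which fields UURI compares. A common choice for a class that caches its string form is to base `equals`, `hashCode`, and `compareTo` on that string. The sketch below assumes that choice, using a hypothetical `StringKeyedUri` class.

```java
import java.util.Objects;

/** Hypothetical value class that compares URIs by their string form. */
final class StringKeyedUri implements Comparable<StringKeyedUri> {
    private final String form;

    StringKeyedUri(String form) {
        this.form = Objects.requireNonNull(form);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof StringKeyedUri)) return false;
        return form.equals(((StringKeyedUri) obj).form);
    }

    @Override
    public int hashCode() {
        return form.hashCode(); // consistent with equals
    }

    @Override
    public int compareTo(StringKeyedUri other) {
        return form.compareTo(other.form); // lexicographic order of string forms
    }

    @Override
    public String toString() {
        return form;
    }
}
```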

getHostBasename

public java.lang.String getHostBasename()
                                 throws org.apache.commons.httpclient.URIException
Strips www variants (matches of www[0-9]*\.) from the host. If calling getHostBasename becomes a performance issue, consider adding a hostBasename member that is set on initialization.

Returns:
Host's basename.
Throws:
org.apache.commons.httpclient.URIException
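A minimal sketch of the www-stripping rule described above. It assumes the pattern is anchored at the start of the host and matched case-sensitively; the page only gives the pattern `www[0-9]*\.`. `HostBasename` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that strips a www variant from a host name. */
final class HostBasename {
    // www[0-9]*\. anchored at the start of the host (anchoring is an assumption).
    private static final Pattern WWW_PREFIX = Pattern.compile("^www[0-9]*\\.");

    private HostBasename() {}

    static String of(String host) {
        return host == null ? null : WWW_PREFIX.matcher(host).replaceFirst("");
    }
}
```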

toString

public java.lang.String toString()
Overridden to cache the result.

Specified by:
toString in interface java.lang.CharSequence
Overrides:
toString in class org.apache.commons.httpclient.URI
Returns:
String representation of this URI

getEscapedURI

public java.lang.String getEscapedURI()
Overrides:
getEscapedURI in class org.apache.commons.httpclient.URI

coalesceUriStrings

protected void coalesceUriStrings()
The two String fields cachedString and cachedEscapedURI are usually identical; if so, coalesce into a single instance.
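The coalescing idea can be sketched as follows, using a hypothetical holder class `UriStrings` whose field names mirror those mentioned above. When the two strings are equal but are distinct objects, pointing both fields at one instance lets the duplicate be garbage-collected. The same technique applies to cachedHost and cachedAuthorityMinusUserInfo.

```java
/** Hypothetical holder illustrating the coalescing technique. */
final class UriStrings {
    String cachedString;
    String cachedEscapedURI;

    /** If both cached strings are equal, make them share a single instance. */
    void coalesce() {
        if (cachedString != null && cachedString.equals(cachedEscapedURI)) {
            cachedEscapedURI = cachedString;
        }
    }
}
```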


getHost

public java.lang.String getHost()
                         throws org.apache.commons.httpclient.URIException
Overrides:
getHost in class org.apache.commons.httpclient.URI
Throws:
org.apache.commons.httpclient.URIException

coalesceHostAuthorityStrings

protected void coalesceHostAuthorityStrings()
The two String fields cachedHost and cachedAuthorityMinusUserInfo are usually identical; if so, coalesce into a single instance.


getReferencedHost

public java.lang.String getReferencedHost()
                                   throws org.apache.commons.httpclient.URIException
Return the referenced host in the UURI, if any, also extracting the host of a DNS-lookup URI where necessary.

Returns:
the target or topic host of the URI
Throws:
org.apache.commons.httpclient.URIException
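A sketch of extracting the referenced host, assuming DNS-lookup URIs use the opaque form `dns:hostname`. The page does not name the scheme, so that form is an assumption here. `java.net.URI` stands in for UURI, and `ReferencedHost` is a hypothetical name.

```java
import java.net.URI;

/** Hypothetical helper returning the host a URI refers to. */
final class ReferencedHost {
    private ReferencedHost() {}

    static String of(String uri) {
        URI u = URI.create(uri);
        if ("dns".equalsIgnoreCase(u.getScheme())) {
            // Opaque URI such as dns:www.example.com: the host is the scheme-specific part.
            return u.getSchemeSpecificPart();
        }
        return u.getHost(); // null if the URI has no host
    }
}
```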

getSurtForm

public java.lang.String getSurtForm()
Returns:
The SURT (Sort-friendly URI Reordering Transform) form of this UURI.
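A simplified sketch of the SURT idea: reverse the host labels so that URIs from the same domain sort together. It assumes the commonly seen shape `http://(org,archive,www,)/path`. It deliberately ignores ports, userinfo, and case normalization, so it is not a full SURT implementation. `SimpleSurt` is a hypothetical name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified SURT builder (scheme, host, path, query only). */
final class SimpleSurt {
    private SimpleSurt() {}

    static String of(String uri) {
        URI u = URI.create(uri);
        if (u.getHost() == null) {
            throw new IllegalArgumentException("URI has no host: " + uri);
        }
        // www.archive.org -> [org, archive, www]
        List<String> labels = new ArrayList<>(Arrays.asList(u.getHost().split("\\.")));
        Collections.reverse(labels);
        StringBuilder sb = new StringBuilder(u.getScheme()).append("://(");
        for (String label : labels) {
            sb.append(label).append(','); // each label is followed by a comma
        }
        sb.append(')');
        String path = u.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();
    }
}
```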

getAuthorityMinusUserinfo

public java.lang.String getAuthorityMinusUserinfo()
                                           throws org.apache.commons.httpclient.URIException
Returns the authority minus userinfo, if any. If no userinfo is present, returns the authority unchanged.

Returns:
The authority stripped of any userinfo if present.
Throws:
org.apache.commons.httpclient.URIException
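A minimal sketch of stripping userinfo from an authority string of the form `[userinfo@]host[:port]`. `AuthorityUtil` is a hypothetical name. The choice of `lastIndexOf` is an assumption made to tolerate a stray `@` in lax userinfo.

```java
/** Hypothetical helper that removes userinfo from an authority string. */
final class AuthorityUtil {
    private AuthorityUtil() {}

    static String minusUserinfo(String authority) {
        if (authority == null) {
            return null;
        }
        // Everything before the last '@' is treated as userinfo.
        int at = authority.lastIndexOf('@');
        return at < 0 ? authority : authority.substring(at + 1);
    }
}
```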

length

public int length()
Specified by:
length in interface java.lang.CharSequence

charAt

public char charAt(int index)
Specified by:
charAt in interface java.lang.CharSequence

subSequence

public java.lang.CharSequence subSequence(int start,
                                          int end)
Specified by:
subSequence in interface java.lang.CharSequence
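The page does not document length, charAt, and subSequence. A natural implementation for a class that caches its string form is to delegate all three to that string. The sketch below assumes that design, using a hypothetical `UriChars` class. Implementing CharSequence lets such an object be passed straight to `java.util.regex` matchers.

```java
/** Hypothetical CharSequence backed by a cached string form. */
final class UriChars implements CharSequence {
    private final String cached; // e.g. the cached toString() value

    UriChars(String cached) {
        this.cached = cached;
    }

    @Override
    public int length() {
        return cached.length();
    }

    @Override
    public char charAt(int index) {
        return cached.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return cached.subSequence(start, end);
    }

    @Override
    public String toString() {
        return cached;
    }
}
```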

compareTo

public int compareTo(java.lang.Object arg0)
Specified by:
compareTo in interface java.lang.Comparable
Overrides:
compareTo in class org.apache.commons.httpclient.URI

from

public static UURI from(java.lang.Object o)
Convenience method for finding the UURI inside an Object likely to have (or be/imply) one.

Parameters:
o - Object that is, has, or implies a UURI
Returns:
the UURI found, or null if none
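A sketch of the "is, has, or implies" dispatch. The actual types UURI.from inspects are not listed here. This version assumes a hypothetical `HasUri` interface for objects that carry a URI, and treats any CharSequence as implying a URI. `java.net.URI` stands in for UURI.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical interface for objects that carry a URI. */
interface HasUri {
    URI getUri();
}

/** Hypothetical finder mirroring the from(Object) contract. */
final class UriFinder {
    private UriFinder() {}

    /** Returns the URI that o is, has, or implies; null if none is found. */
    static URI from(Object o) {
        if (o instanceof URI) {
            return (URI) o; // o is a URI
        }
        if (o instanceof HasUri) {
            return ((HasUri) o).getUri(); // o has a URI
        }
        if (o instanceof CharSequence) {
            try {
                return new URI(o.toString()); // o implies a URI
            } catch (URISyntaxException e) {
                return null;
            }
        }
        return null;
    }
}
```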

hasScheme

public static boolean hasScheme(java.lang.String possibleUrl)
Tests whether the passed String has a likely URI scheme prefix.

Parameters:
possibleUrl - URL string to examine.
Returns:
True if the passed string looks like it could be a URL.
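A sketch of a scheme-prefix test based on the RFC 3986 scheme syntax (a letter followed by letters, digits, `+`, `-`, or `.`, then `:`). Whether UURI uses exactly this pattern is an assumption. `SchemeCheck` is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Hypothetical scheme-prefix test. */
final class SchemeCheck {
    // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ':'
    private static final Pattern SCHEME_PREFIX =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.\\-]*:");

    private SchemeCheck() {}

    static boolean hasScheme(String possibleUrl) {
        return possibleUrl != null && SCHEME_PREFIX.matcher(possibleUrl).find();
    }
}
```

Note that this pattern also accepts Windows drive letters such as `C:\dir` as one-letter schemes. An implementation that must distinguish file paths may want to require a longer scheme.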

parseFilename

public static java.lang.String parseFilename(java.lang.String pathOrUri)
                                      throws java.net.URISyntaxException
Parameters:
pathOrUri - A file path or a URI.
Returns:
Path parsed from passed pathOrUri.
Throws:
java.net.URISyntaxException
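A sketch of one plausible reading of parseFilename. If the input has a scheme, it is parsed as a `java.net.URI` (consistent with the declared `URISyntaxException`) and only its path is kept. The method then returns the final path segment. Reducing the result to the last segment is an assumption drawn from the method's name, since the Returns line above says only "path". `FilenameParser` is hypothetical.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;

/** Hypothetical sketch of extracting a file name from a path or URI. */
final class FilenameParser {
    // Assumed scheme test (RFC 3986 scheme syntax followed by ':').
    private static final Pattern SCHEME_PREFIX =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.\\-]*:");

    private FilenameParser() {}

    static String parseFilename(String pathOrUri) throws URISyntaxException {
        String path = pathOrUri;
        if (SCHEME_PREFIX.matcher(pathOrUri).find()) {
            // Treat the input as a URI and keep only its path component.
            path = new URI(pathOrUri).getPath();
        }
        // Return the last segment of the path.
        return new File(path).getName();
    }
}
```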


Copyright © 2003-2011 Internet Archive. All Rights Reserved.