java.lang.Object
  org.apache.commons.httpclient.URI
    org.archive.net.UURIFactory
public class UURIFactory
Factory that returns UURIs. It escapes and fixes up URIs, massaging them in accordance with RFC 2396 and to match browser practice. For example, it removes any '..' that appears first in the path (as IE does), converts backslashes to forward slashes, and discards any fragment (anchor) portion of the URI. This class also rejects URIs that are longer than IE's maximum allowed length.
TODO: Test logging.
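The fixups listed in the class description can be pictured with a small standalone sketch. This is not the UURIFactory implementation: the class name `UriFixupSketch`, the method `fixup`, and the length limit of 2083 (the figure commonly cited for IE; this page does not state the value) are all assumptions made for illustration, and only the JDK is used.

```java
/** Illustrative only: a rough picture of the fixups described above, not UURIFactory itself. */
final class UriFixupSketch {
    // Assumed limit; the actual maximum enforced by UURIFactory is not stated on this page.
    static final int ASSUMED_MAX_LENGTH = 2083;

    static String fixup(String uri) {
        if (uri.length() > ASSUMED_MAX_LENGTH) {
            throw new IllegalArgumentException("URI too long: " + uri.length());
        }
        // Convert backslashes to forward slashes.
        String s = uri.replace('\\', '/');
        // Discard the fragment (anchor), if any.
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash);
        }
        // Locate the path: first '/' after "scheme://authority", ending before any '?'.
        int schemeEnd = s.indexOf("://");
        if (schemeEnd < 0) {
            return s;
        }
        int authorityStart = schemeEnd + 3;
        int query = s.indexOf('?', authorityStart);
        int pathEnd = query < 0 ? s.length() : query;
        int pathStart = s.indexOf('/', authorityStart);
        if (pathStart < 0 || pathStart >= pathEnd) {
            return s;
        }
        String path = s.substring(pathStart, pathEnd);
        // Remove '..' segments that appear first in the path.
        while (path.startsWith("/../")) {
            path = path.substring(3);
        }
        if (path.equals("/..")) {
            path = "/";
        }
        return s.substring(0, pathStart) + path + s.substring(pathEnd);
    }

    public static void main(String[] args) {
        // Prints: http://example.com/a/b.html
        System.out.println(fixup("http://example.com/../a\\b.html#top"));
    }
}
```

The real class works through `getInstance` and returns UURI objects rather than strings; the sketch only shows the order-independent string transformations the description names.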
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.commons.httpclient.URI |
---|
org.apache.commons.httpclient.URI.DefaultCharsetChanged, org.apache.commons.httpclient.URI.LocaleToCharsetMap |
Field Summary | |
---|---|
(package private) static java.lang.String | ACCEPTABLE_ASCII_DOMAIN - Characters we'll accept in the domain label part of a URI authority: ASCII letters-digits-hyphen (LDH) plus underscore, with single intervening '.' characters. |
static java.lang.String | APOSTROPH |
static java.lang.String | BACKSLASH |
static java.lang.String | BACKSLASH_PATTERN |
static java.lang.String | CIRCUMFLEX |
static java.lang.String | CIRCUMFLEX_PATTERN |
static char | COLON |
static java.lang.String | COMMERCIAL_AT |
static java.lang.String | DOT |
static java.lang.String | EMPTY_STRING |
static java.lang.String | ESCAPED_APOSTROPH |
static java.lang.String | ESCAPED_BACKSLASH |
static java.lang.String | ESCAPED_CIRCUMFLEX |
static java.lang.String | ESCAPED_LCURBRACKET |
static java.lang.String | ESCAPED_LSQRBRACKET |
static java.lang.String | ESCAPED_PIPE |
static java.lang.String | ESCAPED_QUOT |
static java.lang.String | ESCAPED_RCURBRACKET |
static java.lang.String | ESCAPED_RSQRBRACKET |
static java.lang.String | ESCAPED_SPACE |
static java.lang.String | ESCAPED_SQUOT |
static java.lang.String | HTTP |
static java.lang.String | HTTP_PORT |
(package private) static java.util.regex.Pattern | HTTP_SCHEME_SLASHES - Pattern that looks for case of three or more slashes after the scheme. |
static java.lang.String | HTTPS |
static java.lang.String | HTTPS_PORT |
static int | IGNORED_SCHEME |
static java.lang.String | IMPROPERESC |
static java.lang.String | IMPROPERESC_REPLACE |
static java.lang.String | LCURBRACKET |
static java.lang.String | LCURBRACKET_PATTERN |
static java.lang.String | LSQRBRACKET |
static java.lang.String | LSQRBRACKET_PATTERN |
(package private) static java.util.regex.Pattern | MULTIPLE_SLASHES - Pattern that looks for case of two or more slashes in a path. |
static java.lang.String | NBSP |
static char | PERCENT_SIGN |
static java.lang.String | PIPE |
static java.lang.String | PIPE_PATTERN |
(package private) static java.util.regex.Pattern | PORTREGEX - Authority port number regex. |
static java.lang.String | QUOT |
static java.lang.String | RCURBRACKET |
static java.lang.String | RCURBRACKET_PATTERN |
(package private) static java.util.regex.Pattern | RFC2396REGEX - RFC 2396-inspired regex. |
static java.lang.String | RSQRBRACKET |
static java.lang.String | RSQRBRACKET_PATTERN |
static java.lang.String | SLASH |
static java.lang.String | SLASHDOTDOTSLASH |
static java.lang.String | SPACE |
static java.lang.String | SQUOT |
static java.lang.String | STRAY_SPACING |
static java.lang.String | TRAILING_ESCAPED_SPACE |
static java.lang.String | URI_HEX_ENCODING - First percent sign in string followed by two hex chars. |
Fields inherited from class org.apache.commons.httpclient.URI |
---|
_authority, _fragment, _host, _is_abs_path, _is_hier_part, _is_hostname, _is_IPv4address, _is_IPv6reference, _is_net_path, _is_opaque_part, _is_reg_name, _is_rel_path, _is_server, _opaque, _path, _port, _query, _scheme, _uri, _userinfo, abs_path, absoluteURI, allowed_abs_path, allowed_authority, allowed_fragment, allowed_host, allowed_IPv6reference, allowed_opaque_part, allowed_query, allowed_reg_name, allowed_rel_path, allowed_userinfo, allowed_within_authority, allowed_within_path, allowed_within_query, allowed_within_userinfo, alpha, alphanum, authority, control, defaultDocumentCharset, defaultDocumentCharsetByLocale, defaultDocumentCharsetByPlatform, defaultProtocolCharset, delims, digit, disallowed_opaque_part, disallowed_rel_path, domainlabel, escaped, fragment, hash, hex, hier_part, host, hostname, hostport, IPv4address, IPv6address, IPv6reference, mark, net_path, opaque_part, param, path, path_segments, pchar, percent, port, protocolCharset, query, reg_name, rel_path, rel_segment, relativeURI, reserved, rootPath, scheme, segment, server, space, toplabel, unreserved, unwise, URI_reference, uric, uric_no_slash, userinfo, within_userinfo |
Method Summary | |
---|---|
protected void | checkHttpSchemeSpecificPartSlashPrefix(org.apache.commons.httpclient.URI base, java.lang.String scheme, java.lang.String schemeSpecificPart) - If http(s) scheme, check scheme specific part begins '//'. |
protected java.lang.String | escapeWhitespace(java.lang.String uri) - Escape any whitespace found. |
static UURI | getInstance(java.lang.String uri) |
static UURI | getInstance(java.lang.String uri, java.lang.String charset) |
static UURI | getInstance(UURI base, java.lang.String relative) |
static boolean | hasSupportedScheme(java.lang.String possibleUrl) - Test of whether passed String has an allowed URI scheme. |
protected UURI | validityCheck(UURI uuri) - Check the generated UURI. |
Methods inherited from class org.apache.commons.httpclient.URI |
---|
clone, compareTo, decode, decode, encode, equals, equals, getAboveHierPath, getAuthority, getCurrentHierPath, getDefaultDocumentCharset, getDefaultDocumentCharsetByLocale, getDefaultDocumentCharsetByPlatform, getDefaultProtocolCharset, getEscapedAboveHierPath, getEscapedAuthority, getEscapedCurrentHierPath, getEscapedFragment, getEscapedName, getEscapedPath, getEscapedPathQuery, getEscapedQuery, getEscapedURI, getEscapedURIReference, getEscapedUserinfo, getFragment, getHost, getName, getPath, getPathQuery, getPort, getProtocolCharset, getQuery, getRawAboveHierPath, getRawAuthority, getRawCurrentHierPath, getRawCurrentHierPath, getRawFragment, getRawHost, getRawName, getRawPath, getRawPathQuery, getRawQuery, getRawScheme, getRawURI, getRawURIReference, getRawUserinfo, getScheme, getURI, getURIReference, getUserinfo, hasAuthority, hasFragment, hashCode, hasQuery, hasUserinfo, indexFirstOf, indexFirstOf, indexFirstOf, indexFirstOf, isAbsoluteURI, isAbsPath, isHierPart, isHostname, isIPv4address, isIPv6reference, isNetPath, isOpaquePart, isRegName, isRelativeURI, isRelPath, isServer, normalize, normalize, parseAuthority, parseUriReference, prevalidate, removeFragmentIdentifier, resolvePath, setDefaultDocumentCharset, setDefaultProtocolCharset, setEscapedAuthority, setEscapedFragment, setEscapedPath, setEscapedQuery, setFragment, setPath, setQuery, setRawAuthority, setRawFragment, setRawPath, setRawQuery, setURI, toString, validate, validate |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
static final java.util.regex.Pattern RFC2396REGEX
From RFC 2396 (URI Generic Syntax, August 1998):

B. Parsing a URI Reference with a Regular Expression

As described in Section 4.3, the generic URI syntax is not sufficient to disambiguate the components of some forms of URI. Since the "greedy algorithm" described in that section is identical to the disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential four components and fragment identifier of a URI reference.

The following line is the regular expression for breaking-down a URI reference into its components.

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
     12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to

    http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

    $1 = http:
    $2 = http
    $3 = //www.ics.uci.edu
    $4 = www.ics.uci.edu
    $5 = /pub/ietf/uri/
    $6 = <undefined>
    $7 = <undefined>
    $8 = #Related
    $9 = Related

where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the four components and fragment as

    scheme    = $2
    authority = $4
    path      = $5
    query     = $7
    fragment  = $9
The regex used here differs from the RFC regex in three ways:
(1) it has Java escaping of regex characters;
(2) it allows a URI made of a fragment only (an extra group was added, so group indexing is off by one after the scheme);
(3) the scheme is limited to legal scheme characters.
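For reference, the unmodified Appendix B expression can be tried directly with `java.util.regex`. The sketch below uses the RFC's regex as published (with Java string escaping only), not the adjusted RFC2396REGEX value, which this page does not show; the class name `Rfc2396RegexSketch` and the method `split` are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the RFC 2396 Appendix B regex as published; not the adjusted RFC2396REGEX. */
final class Rfc2396RegexSketch {
    // The backslash before '?' is doubled because this is a Java string literal.
    static final Pattern RFC2396 = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent components are null. */
    static String[] split(String uriReference) {
        Matcher m = RFC2396.matcher(uriReference);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not parseable: " + uriReference);
        }
        // Group numbers follow the RFC: scheme=$2, authority=$4, path=$5, query=$7, fragment=$9.
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        // Prints: [http, www.ics.uci.edu, /pub/ietf/uri/, null, Related]
        System.out.println(Arrays.toString(split("http://www.ics.uci.edu/pub/ietf/uri/#Related")));
    }
}
```

Because the class's own pattern adds an extra group, the group numbers after the scheme would each be one higher there than in this sketch.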
public static final java.lang.String SLASHDOTDOTSLASH
public static final java.lang.String SLASH
public static final java.lang.String HTTP
public static final java.lang.String HTTP_PORT
public static final java.lang.String HTTPS
public static final java.lang.String HTTPS_PORT
public static final java.lang.String DOT
public static final java.lang.String EMPTY_STRING
public static final java.lang.String NBSP
public static final java.lang.String SPACE
public static final java.lang.String ESCAPED_SPACE
public static final java.lang.String TRAILING_ESCAPED_SPACE
public static final java.lang.String PIPE
public static final java.lang.String PIPE_PATTERN
public static final java.lang.String ESCAPED_PIPE
public static final java.lang.String CIRCUMFLEX
public static final java.lang.String CIRCUMFLEX_PATTERN
public static final java.lang.String ESCAPED_CIRCUMFLEX
public static final java.lang.String QUOT
public static final java.lang.String ESCAPED_QUOT
public static final java.lang.String SQUOT
public static final java.lang.String ESCAPED_SQUOT
public static final java.lang.String APOSTROPH
public static final java.lang.String ESCAPED_APOSTROPH
public static final java.lang.String LSQRBRACKET
public static final java.lang.String LSQRBRACKET_PATTERN
public static final java.lang.String ESCAPED_LSQRBRACKET
public static final java.lang.String RSQRBRACKET
public static final java.lang.String RSQRBRACKET_PATTERN
public static final java.lang.String ESCAPED_RSQRBRACKET
public static final java.lang.String LCURBRACKET
public static final java.lang.String LCURBRACKET_PATTERN
public static final java.lang.String ESCAPED_LCURBRACKET
public static final java.lang.String RCURBRACKET
public static final java.lang.String RCURBRACKET_PATTERN
public static final java.lang.String ESCAPED_RCURBRACKET
public static final java.lang.String BACKSLASH
public static final java.lang.String BACKSLASH_PATTERN
public static final java.lang.String ESCAPED_BACKSLASH
public static final java.lang.String STRAY_SPACING
public static final java.lang.String IMPROPERESC_REPLACE
public static final java.lang.String IMPROPERESC
public static final java.lang.String COMMERCIAL_AT
public static final char PERCENT_SIGN
public static final char COLON
public static final java.lang.String URI_HEX_ENCODING
static final java.util.regex.Pattern PORTREGEX
static final java.lang.String ACCEPTABLE_ASCII_DOMAIN
static final java.util.regex.Pattern HTTP_SCHEME_SLASHES
static final java.util.regex.Pattern MULTIPLE_SLASHES
public static final int IGNORED_SCHEME
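The Field Summary describes several of these constants in words only (three or more slashes after the scheme, two or more slashes in a path, LDH-plus-underscore domain labels, a percent sign followed by two hex characters). The sketch below shows patterns that would satisfy those descriptions. The actual constant values are not given on this page, so every pattern, the class name `FieldPatternSketch`, and the helper methods are assumptions, including the choice to rewrite rather than merely detect.

```java
import java.util.regex.Pattern;

/** Hypothetical patterns matching the Field Summary descriptions; not the real constant values. */
final class FieldPatternSketch {
    // Three or more slashes after an http(s) scheme, e.g. "http:////example.com".
    static final Pattern SCHEME_SLASHES =
            Pattern.compile("^(https?:)/{3,}", Pattern.CASE_INSENSITIVE);
    // Two or more consecutive slashes.
    static final Pattern MULTI_SLASH = Pattern.compile("/{2,}");
    // Labels of ASCII letters, digits, hyphen, or underscore, separated by single dots.
    static final Pattern ASCII_DOMAIN =
            Pattern.compile("[A-Za-z0-9_-]+(?:\\.[A-Za-z0-9_-]+)*");
    // A percent sign followed by two hex characters.
    static final Pattern HEX_ESCAPE = Pattern.compile("%[0-9A-Fa-f]{2}");

    /** Reduces the run of slashes after the scheme to exactly two. */
    static String collapseSchemeSlashes(String uri) {
        return SCHEME_SLASHES.matcher(uri).replaceFirst("$1//");
    }

    /** Reduces every run of slashes in a path to a single slash. */
    static String collapsePathSlashes(String path) {
        return MULTI_SLASH.matcher(path).replaceAll("/");
    }

    /** True if the whole host consists of acceptable labels and single dots. */
    static boolean isAcceptableDomain(String host) {
        return ASCII_DOMAIN.matcher(host).matches();
    }

    /** True if the string contains at least one %XX escape. */
    static boolean hasHexEscape(String s) {
        return HEX_ESCAPE.matcher(s).find();
    }
}
```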
Method Detail |
---|
public static UURI getInstance(java.lang.String uri) throws org.apache.commons.httpclient.URIException
Parameters:
uri - URI as string.
Throws:
org.apache.commons.httpclient.URIException
public static UURI getInstance(java.lang.String uri, java.lang.String charset) throws org.apache.commons.httpclient.URIException
Parameters:
uri - URI as string.
charset - Character encoding of the passed uri string.
Throws:
org.apache.commons.httpclient.URIException
public static UURI getInstance(UURI base, java.lang.String relative) throws org.apache.commons.httpclient.URIException
Parameters:
base - Base URI to use when resolving the passed relative URI.
relative - URI as string.
Throws:
org.apache.commons.httpclient.URIException
public static boolean hasSupportedScheme(java.lang.String possibleUrl)
Parameters:
possibleUrl - URL string to examine.
protected UURI validityCheck(UURI uuri) throws org.apache.commons.httpclient.URIException
Parameters:
uuri - Created uuri to check.
Returns:
uuri, so that this check can easily be inlined.
Throws:
org.apache.commons.httpclient.URIException
protected void checkHttpSchemeSpecificPartSlashPrefix(org.apache.commons.httpclient.URI base, java.lang.String scheme, java.lang.String schemeSpecificPart) throws org.apache.commons.httpclient.URIException
Throws:
org.apache.commons.httpclient.URIException
See Also:
Section 3.1. Common Internet Scheme Syntax
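The rule this method is summarized as enforcing (for http and https, the scheme-specific part must begin with '//') might look roughly like the following. This is a sketch under assumptions: the page does not describe the role of the `base` parameter, so the sketch omits it, and it throws `IllegalArgumentException` as a stand-in for `URIException` to stay free of external dependencies. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical stand-in for the slash-prefix rule; ignores the 'base' parameter entirely. */
final class SlashPrefixSketch {
    static void checkSlashPrefix(String scheme, String schemeSpecificPart) {
        if (scheme == null) {
            return; // No scheme, nothing to check.
        }
        String s = scheme.toLowerCase(Locale.ROOT);
        boolean httpLike = s.equals("http") || s.equals("https");
        if (httpLike && (schemeSpecificPart == null || !schemeSpecificPart.startsWith("//"))) {
            // The real method throws org.apache.commons.httpclient.URIException.
            throw new IllegalArgumentException(
                    "http(s) scheme-specific part must begin with '//': " + schemeSpecificPart);
        }
    }
}
```

Under this sketch, `checkSlashPrefix("http", "//example.com/")` and `checkSlashPrefix("mailto", "a@example.com")` return normally, while `checkSlashPrefix("http", "example.com")` throws.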
protected java.lang.String escapeWhitespace(java.lang.String uri)
Parameters:
uri - URI string to check.
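As an illustration of what escaping whitespace in a URI string can involve, the sketch below percent-encodes each ASCII whitespace character. Which characters the real method treats as whitespace, and how it handles them, is not stated on this page, so the behavior here is an assumption; the class name `WhitespaceEscapeSketch` is hypothetical.

```java
/** Hypothetical whitespace escaper: percent-encodes ASCII whitespace, leaves the rest untouched. */
final class WhitespaceEscapeSketch {
    static String escapeWhitespace(String uri) {
        StringBuilder out = null; // Allocated lazily, only if something needs escaping.
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            // Restrict to ASCII so a two-digit %XX escape is always valid.
            boolean ws = c < 0x80 && Character.isWhitespace(c);
            if (ws) {
                if (out == null) {
                    out = new StringBuilder(uri.length() + 8);
                    out.append(uri, 0, i);
                }
                out.append(String.format("%%%02X", (int) c));
            } else if (out != null) {
                out.append(c);
            }
        }
        // Return the original instance when no whitespace was found.
        return out == null ? uri : out.toString();
    }

    public static void main(String[] args) {
        // Prints: http://example.com/a%20b%09c
        System.out.println(escapeWhitespace("http://example.com/a b\tc"));
    }
}
```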