Network Working Group E. Hammer-Lahav
Internet-Draft Yahoo!
Intended status: Informational January 9, 2009
Expires: July 13, 2009
HTTP-based Resource Descriptor Discovery
draft-hammer-discovery-00
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 13, 2009.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Abstract
This memo describes an HTTP-based process for obtaining information
about a resource identified by a URI. The 'information about a
Hammer-Lahav Expires July 13, 2009 [Page 1]
Internet-Draft Resource Discovery January 2009
resource' - a resource descriptor - typically provides machine-
readable information that aims to assist and enhance the interaction
with the resource. This memo only defines the process for locating
and obtaining the descriptor, but leaves the descriptor format and
its interpretation out of scope.
Table of Contents
1. Introduction
2. Notational Conventions
3. Scope
4. Resource Discovery and Service Discovery
5. Discovery Workflow
6. 'describedby' Link Relationship
7. Method Selection
8. Obtaining Descriptor Location
8.1. <LINK> Element
8.2. HTTP Link Header
8.3. Site-Meta Document
8.3.1. Site-Wide Links
8.3.2. <link-template> Element
8.3.3. DNS Verification for Non-HTTP(S) URIs
8.3.4. Method Workflow
9. Caching
10. Security Considerations
11. IANA Considerations
Appendix A. Method Suitability Analysis
Appendix A.1. Requirements
Appendix A.2. Analysis
Appendix A.2.1. HTTP Response Header
Appendix A.2.2. HTTP Response Header Via HEAD
Appendix A.2.3. HTTP Content Negotiation
Appendix A.2.4. HTTP Header Negotiation
Appendix A.2.5. <Link> Element
Appendix A.2.6. HTTP OPTIONS Method
Appendix A.2.7. WebDAV PROPFIND Method
Appendix A.2.8. Custom HTTP Method
Appendix A.2.9. Static Resource URI Transformation
Appendix A.2.10. Dynamic Resource URI Transformation
Appendix B. Acknowledgments
12. References
12.1. Normative References
12.2. Informative References
Author's Address
1. Introduction
This memo aims to provide a uniform and easily implementable process
for locating resource descriptors. With the development of
interoperability specifications comes the need to enable compliant
services and resources to declare their conformance to these
specifications. There is a growing need to describe resources in a
way that does not depend on their internal structure, or even the
availability of an HTTP-accessible representation of these resources.
For example, while an end-user is reading a web page such as a blog
article, the user-agent can discover whether the content of this page
was generated from an Atom feed or Atom entry and whether that feed
supports Atom authoring. It can discover whether there is an
iCalendar-formatted or CalDAV calendar associated with the page, or
where other content by the same page author might be found.
In an example related to the identity space, an end-user can use a
URI as an identifier for signing into web services, and in turn, the
web service can discover more information about the user's resources
and preferences, such as to whom the user has delegated their identity
management, where they keep their address book or list of social
network friends, where their profile information is stored to reduce
signup registration requirements, and what other services they use
which may enhance their interaction with the web service.
2. Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Scope
The scope of this memo is intentionally restricted to locating
resource descriptors, leaving out their format. Given the wide range
of use cases and information that can be provided 'about a resource',
no single descriptor format can adequately accommodate all needs.
However, the process in which the desired descriptor is located
should be consistent across use cases and formats.
4. Resource Discovery and Service Discovery
Resource discovery provides a process for obtaining information about
a resource identified with a URI. It allows resource-providers to
describe their resources in a machine-readable format, enabling
automatic interoperability by user-agents and resource-consuming
applications. Discovery enables applications to utilize a wide range
of web services and resources across multiple providers without the
need to know about their capabilities in advance, reducing the need
for manual configuration and resource-specific software.
When discussing discovery, it is important to differentiate between
resource discovery and service discovery. Both types attempt to
associate capabilities with resources, but they approach it from
opposite ends.
Service discovery centers around identifying the location of
qualified resources, typically finding an endpoint capable of certain
protocols and capabilities. In contrast, resource discovery begins
with a resource, trying to find which capabilities it supports.
A simple way to distinguish between the two types of discovery is to
define the questions they are each trying to answer:
Resource-Discovery: Given a resource, what are its attributes:
capabilities, characteristics, and relationships to other
resources?
Service-Discovery: Given a set of attributes, which available
resources match the desired set and what is their location?
While this memo deals exclusively with resource discovery, it is
important to note that the two discovery types are closely related
and are usually used in tandem. In fact, a typical use case will
switch between service discovery and resource discovery multiple
times in a single workflow, and can start with either one.
One reason for this dependency between the two discovery types is
that resource descriptors usually contain not only a list of
capabilities, but also relationships to other resources. Since those
relationships are usually typed, the process in which an application
chooses which links to use is in fact service discovery.
Applications use resource discovery to obtain the list of links, and
service discovery to choose the relevant links. In another common
example, the application uses service discovery to find a resource
with a given capability, then uses resource discovery to find out
what other capabilities it supports.
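To make the distinction concrete, the following minimal Python sketch models a descriptor as a list of typed links. The `Link` type, the sample URIs, and the relationship names are illustrative assumptions and are not defined by this memo.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    rel: str   # typed relationship, e.g. "feed"
    href: str  # URI of the linked resource


# Hypothetical descriptors, keyed by resource URI.
DESCRIPTORS: Dict[str, List[Link]] = {
    "http://example.com/blog/1": [
        Link("feed", "http://example.com/blog/feed"),
        Link("author", "http://example.com/people/jane"),
    ],
}


def resource_discovery(resource_uri: str) -> List[Link]:
    """Given a resource, return what is known about it."""
    return DESCRIPTORS.get(resource_uri, [])


def service_discovery(links: List[Link], wanted_rel: str) -> List[str]:
    """Given a desired attribute, return the locations that match it."""
    return [link.href for link in links if link.rel == wanted_rel]


# Resource discovery yields the links; service discovery picks among them.
links = resource_discovery("http://example.com/blog/1")
feeds = service_discovery(links, "feed")
```

Here the application first asks what the resource offers, then chooses the links relevant to its task, which mirrors the tandem use described above.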
Unless otherwise noted, the term 'discovery' is used in this memo to
mean resource discovery.
5. Discovery Workflow
Discovery can be performed before or after a resource is obtained.
Performing discovery ahead of accessing the resource allows a
resource-consumer to learn more about the properties of the resource.
For example, a consumer can learn about the protocols supported by
the resource and, if understood, utilize them to interact with it.
In many cases, discovery is performed after the resource has been
obtained, based on the content of the resource and the way in which
the user-agent interacts with it (or based on human interactions).
Most web applications make strong assumptions about the resources
they interact with, mostly due to lack of a standard discovery
protocol for web resources. Such assumptions are not likely to
disappear even with the introduction of a discovery workflow. In
many cases, discovery will be used as a secondary step for enhancing
the interaction with a resource rather than the first step of
determining how to interact with it at all.
The focus of this memo is on the first step in discovery: identifying
the location of the resource descriptors. The overall discovery
workflow includes two additional steps, for three in total:
1. The location of the resource descriptor document is obtained
using the resource URI. It does not matter how the resource URI
has been obtained, just that a URI is known. Once the descriptor
location has been identified, the descriptor document is
retrieved.
2. The resource descriptor document is parsed based on the
descriptor document format used. For example, two such formats
are POWDER (Protocol for Web Description Resources [POWDER]) and XRD
(Extensible Resource Descriptor [XRD] [[replace with new XRD
specification reference]]).
3. The information about the resource contained within the
descriptor document is processed to find out its capabilities,
characteristics, and relationships to other resources.
Capabilities are usually described with identifiers or
description languages that the consuming application can match to
a database of known capabilities or process via an interpreter.
While the process described in this memo utilizes the HTTP protocol
[RFC2616] for locating descriptors, it can be used with any URI
scheme and is not limited to just the 'http' and 'https' URI schemes.
HTTP is an ideal framework for performing discovery activities on web
resources, but it does not clearly define a mechanism for attaching a
descriptor or metadata to a resource identified with a URI.
6. 'describedby' Link Relationship
The first step when performing discovery is to identify the location
of the resource descriptor document for the desired resource. This
can be simply described as a link between the URI of the resource and
the URI of the descriptor. Links are one of the most fundamental
building blocks of the web, and provide all that is necessary to
define the relationship between a resource and its descriptors.
The purpose of this memo is to define a consistent set of methods
using HTTP through which this link information is obtained when
performing discovery. The web provides a large number of methods for
defining links between resources, but in order to achieve
interoperability, the selection has to be narrowed down to a much
smaller set of options.
Since a single resource can have many descriptors, the descriptor
link relationship has a one-to-many structure. In the case of
multiple descriptors, selecting which descriptor to use is
application-specific. It can involve factors such as the descriptor
document format, accessibility, and other typed relationships, and as
such is beyond the scope of this memo.
All the methods described in this memo build directly on the typed-
relationships framework defined in [I-D.nottingham-http-link-header].
The relationship type between a resource and its descriptor used for
discovery is 'describedby' which was originally defined by [POWDER]
as a generic relationship type as follows:
Relationship type: describedby
Purpose: To link a resource to a description that applies to that
resource
Documentation: http://www.w3.org/TR/powder-dr/#assoc-linking
Note: The relationship A 'describedby' B does not imply that B is
a POWDER file (the Media Type does that), simply that B provides a
description of A. This is the only constraint placed on A and B by
asserting the describedby relationship.
[[NOTE: the 'describedby' link relationship type has been
submitted to IANA for review, approval, and inclusion in the Atom
Link Relations registry. The Atom Link Relations registry is
expected to be replaced by a generic Link Relations registry as
defined in [I-D.nottingham-http-link-header] section 4.2.]].
For example, the following HTTP response header (fragment) returned
with the HTTP representation of the resource
http://example.com/resource/1:
HTTP/1.1 200 OK
Server: example.com
Link: <http://example.com/resource/1;about>;
rel="describedby"; type="application/xrd+xml"
defines a link between the resource http://example.com/resource/1 and
its descriptor located at http://example.com/resource/1;about, which
is hinted to use the XRD [XRD] document format.
The methods described in this memo all result in one or more link
relationships with type 'describedby'. Two out of the three methods
use existing link mechanisms as-is, by simply specifying the
relationship type used. The third defines a new mechanism for
dynamically constructing links using templates.
7. Method Selection
Due to the wide range of use cases requiring resource descriptors,
and the desire to reuse as much as possible, no single solution has
been found to sufficiently cover the requirements for linking between
the resource URI and the descriptor URI. An analysis of the
potential methods considered and the reason for their inclusion or
rejection can be found in Appendix A. A discussion regarding the
architectural issues around discovery can be found in [Uniform
Access].
Obtaining the link information between the resource URI and the
descriptor URI is accomplished using one of three methods. The
criteria used to determine which methods a resource-provider SHOULD
support and resource-consumer SHOULD attempt to use are based on a
combination of factors:
o The document type of the available resource representation (text/
html, application/atom+xml, image/png, unknown, etc.).
o The URI scheme (http, https, mailto, xmpp, etc.).
o The availability of an HTTP-accessible representation for the
resource (a representation of the resource that can be retrieved
using an HTTP GET request).
o The ability, desire, or applicability of the resource-consumer to
directly interact and retrieve a resource representation (which
might be unknown to it).
When selecting a method to use, the following requirements of each
method are considered (each method is described in detail in
Section 8):
o <LINK> Element: Limited to resources with an accessible markup
representation with direct support for typed-relationships using
the <LINK> element, such as HTML [W3C.REC-html401-19991224] and
Atom [RFC4287]. Other document types are allowed as long as the
semantics of their <LINK> element are fully compatible with the
link framework defined in [I-D.nottingham-http-link-header]. This
method requires full retrieval of the resource representation
before any discovery information about it is available. While
HTTP is the most common transport for HTML and Atom documents,
this method is transport independent.
o HTTP Link Header: Limited to resources with an accessible
representation using the HTTP protocol [RFC2616], or resources for
which an HTTP GET or HEAD request returns a valid HTTP response
header. This method uses the Link header defined in
[I-D.nottingham-http-link-header]. This method requires the
retrieval of the resource representation header (using an HTTP GET
or HEAD request).
o Site-Meta Document: A known-location based solution used for any
resources identified by a URI with a DNS-resolvable authority
component (i.e. an authority that can be directly mapped to an IP
address). This method uses the Site-Meta document defined in
[I-D.nottingham-site-meta]. This method does not require any
direct interaction with the resource.
The methods are listed in order of applicability, from the most
restrictive method to the most generic method. This ordering,
however, does not imply the order in which multiple applicable
methods should be attempted (which is application specific).
Because different methods are more
appropriate in different circumstances, all three methods described
are considered equal and can be attempted in any order. To ensure
interoperability, the following rules MUST be observed:
o Resource-providers MUST support at least one of the three methods
for each resource for which discovery information is to be made
available. If more than one method is supported, all methods MUST
produce the same resource descriptor location (either by returning
the same descriptor URI or a different descriptor URI that leads
to the same descriptor URI after following HTTP redirections).
o Resource-consumers SHOULD support all three methods and attempt
each in their preferred order until a descriptor URI is obtained
successfully. Resource-consumers SHOULD NOT attempt additional
methods after a previous method has concluded successfully.
8. Obtaining Descriptor Location
To obtain the location of the resource descriptor using the resource
URI, the resource-consumer SHALL proceed as follows:
1. Select one of the three methods as defined in Section 7. In many
cases, only some of the methods will be applicable. If more than
one method is available, the resource-consumer SHOULD pick the
method most efficient for its needs.
2. Perform the steps described below for the selected method. If
successful, the method will produce the descriptor location. If
the method fails, repeat the process from the previous step by
selecting another method. If no method is left, the discovery
process fails.
3. Once the desired descriptor URI has been obtained, the descriptor
document is obtained via an HTTP GET request to the identified
URI. The resource-consumer MUST obey all HTTP 301 and 302
redirects and the descriptor document is considered valid only if
contained within an HTTP response with the HTTP 200 response
code.
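The selection loop in steps 1 and 2 could be sketched as follows. This is only an illustration: the `Method` callable type and the three stand-in functions are assumptions made for the example, and real methods would perform the retrieval and parsing described in the following subsections.

```python
from typing import Callable, Iterable, Optional

# A method takes the resource URI and returns a descriptor URI,
# or None when the method fails or does not apply.
Method = Callable[[str], Optional[str]]


def obtain_descriptor_location(resource_uri: str,
                               methods: Iterable[Method]) -> Optional[str]:
    """Try each method in the consumer's preferred order and stop at
    the first one that yields a descriptor URI."""
    for method in methods:
        location = method(resource_uri)
        if location is not None:
            return location
    return None  # no method left: the discovery process fails


# Stand-ins for the three methods (hypothetical behaviour for the demo).
def link_element(uri: str) -> Optional[str]:
    return None  # e.g. the representation is not HTML or Atom


def link_header(uri: str) -> Optional[str]:
    return uri + ";about"


def site_meta(uri: str) -> Optional[str]:
    raise AssertionError("not reached once an earlier method succeeds")


found = obtain_descriptor_location(
    "http://example.com/resource/1",
    [link_element, link_header, site_meta])
```

Stopping at the first success reflects the rule in Section 7 that resource-consumers SHOULD NOT attempt additional methods after a previous method has concluded successfully.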
8.1. <LINK> Element
Resources with an HTML [W3C.REC-html401-19991224] or an Atom
[RFC4287] representation MAY include a <LINK> element with the
'describedby' relationship type to link between the resource and its
descriptor.
For example:
<LINK href="http://example.com/resource;about"
rel="describedby" type="application/powder+xml">
A resource-consumer trying to obtain the location of the resource's
descriptor using this method SHALL:
1. Retrieve a valid representation of the resource using the
applicable transport for that resource URI. If the resource
representation is obtained using HTTP, the resource-consumer MUST
only use it if the HTTP response containing the representation
carries a valid HTTP 200 response code. If any other response
code is returned, the method fails. [[This is written
specifically about the request producing the representation,
ignoring any potential redirects that might have occurred prior.
Should redirects be explicitly mentioned here?]]
2. Parse the document as defined by the document specification and
look for <LINK> elements with a 'rel' attribute value containing
the 'describedby' relationship (a multiple relationship 'rel'
attribute value is allowed and MUST be handled by the consumer,
for example 'rel="describedby copyright"').
3. The resource-consumer SHOULD examine any available 'type'
attributes as hints for the document format used by the
descriptor document. If more than one link is found, the
descriptor mime-type SHOULD be used to narrow down the selection.
4. The descriptor location is obtained from the value of the 'href'
attribute on the selected <LINK> element.
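Steps 2 to 4 can be sketched for an HTML representation using Python's standard `html.parser` module. The class and function names are hypothetical, the sketch assumes the representation has already been retrieved with a 200 response, and it does not resolve relative 'href' values; an Atom document would need an XML parser instead.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class DescribedByFinder(HTMLParser):
    """Collect (href, type) pairs from <LINK> elements whose 'rel'
    value contains the 'describedby' relationship."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":  # HTMLParser lowercases tag names
            return
        attr = dict(attrs)
        # 'rel' may carry several space-separated relationships.
        rels = (attr.get("rel") or "").lower().split()
        if "describedby" in rels and attr.get("href"):
            self.links.append((attr["href"], attr.get("type")))


def find_descriptor(html: str,
                    preferred_type: Optional[str] = None) -> Optional[str]:
    """Return the first matching descriptor location, using the 'type'
    attribute as a hint when a preferred media type is given."""
    finder = DescribedByFinder()
    finder.feed(html)
    for href, media_type in finder.links:
        if preferred_type is None or media_type == preferred_type:
            return href
    return None
```

Calling `find_descriptor(page, "application/powder+xml")` narrows the selection by media type, while omitting the second argument accepts the first 'describedby' link found.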
8.2. HTTP Link Header
Resources with an accessible HTTP representation MAY include a Link
header in the HTTP response header as defined by
[I-D.nottingham-http-link-header] with a 'rel' parameter value set to
'describedby'.
For example:
Link: <http://example.com/resource;about>; rel="describedby";
type="application/powder+xml"
A resource-consumer trying to obtain the location of the resource's
descriptor using this method SHALL:
1. Retrieve a valid HTTP response header for the representation of
the resource using an HTTP GET or HEAD request. The resource-
consumer MUST follow HTTP redirections 301 and 302. The
resulting header MUST only be used for the purpose of discovery
if the HTTP response containing the header has one of the
following HTTP response codes: 200, 303, and 401. If any other
response code is returned, the method fails.
2. Parse the HTTP response header and look for a Link header with a
'rel' parameter value containing the 'describedby' relationship
(a multiple relationship 'rel' parameter value is allowed and
MUST be handled by the consumer, for example 'rel="describedby
copyright";').
3. The resource-consumer SHOULD examine any available 'type'
parameters as hints for the document format used by the
descriptor document. If more than one link is found, the
descriptor mime-type SHOULD be used to narrow down the selection.
4. The descriptor location is obtained from the URI-reference
located between the '<' and '>' characters of the selected Link header.
If the HTTP response code is 303, any descriptor location found
describes the requested resource, and not the 'See Other' resource
indicated by the Location header. If
the response code is 401, any descriptor location MUST only be used
in association with obtaining access to the resource, which once
obtained, must be queried again for its descriptor location which MAY
be different from the unauthorized response.
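The header processing in steps 1 to 4 might look like the sketch below. It is deliberately simplified: the regular expressions are an assumption that covers common Link header values such as the example above, not the full grammar of [I-D.nottingham-http-link-header], and the function names are hypothetical. The HTTP request itself, including following 301 and 302 redirects, is left out.

```python
import re
from typing import Dict, List, Optional, Tuple

USABLE_STATUS = {200, 303, 401}  # response codes allowed by this section

# One link: <target> followed by zero or more ;name=value parameters.
_LINK = re.compile(
    r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a Link header value into (target, parameters) pairs."""
    links = []
    for target, params in _LINK.findall(value):
        parsed = {}
        for name, quoted, bare in _PARAM.findall(params):
            parsed[name.lower()] = quoted or bare
        links.append((target, parsed))
    return links


def descriptor_from_response(status: int, link_value: str,
                             preferred_type: Optional[str] = None
                             ) -> Optional[str]:
    """Return the descriptor location, or None if the method fails."""
    if status not in USABLE_STATUS:
        return None
    for target, params in parse_link_header(link_value):
        # 'rel' may carry several space-separated relationships.
        if "describedby" not in params.get("rel", "").lower().split():
            continue
        if preferred_type is None or params.get("type") == preferred_type:
            return target
    return None
```

A consumer receiving a 401 response would still need to apply the restriction above and query the resource again once access has been obtained.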
8.3. Site-Meta Document
The descriptor location of resources identified with a URI which
contains a DNS-resolvable authority component MAY be obtained from
the Site-Meta document [I-D.nottingham-site-meta] via a templatized
map used to transform the resource URI to the descriptor URI. The
templatized link is provided by an extension (defined by this memo)
to the Site-Meta schema for describing link templates. Using a
template provided by Site-Meta for URIs under its authority, a
resource URI can be deconstructed and then reconstructed to form the
URI of the descriptor location.
For example, given the resource identified by http://example.com/r/1,
the Site-Meta document for its authority example.com is obtained from
http://example.com/site-meta. The Site-Meta document defines a
template in which the resource URI is converted to the descriptor URI
by appending ';about' to the URI:
<metadata>
<link-template template="{uri};about"
rel="describedby"
type="application/powder+xml" />
</metadata>
resulting in the descriptor location URI
http://example.com/r/1;about.
8.3.1. Site-Wide Links
Site-Meta defines a method for locating site-wide metadata for web
sites. Its primary objective is to avoid the need for further
known-location solutions by creating one last such resource which can point
to other resources. It can be considered a registry for "known-
location" resources to avoid further intrusion into the site's naming
authority, hopefully the last such resource.
In the context of discovery, Site-Meta offers a convenient location
for storing information about how to map between resource URIs and
their descriptor URIs at an authority level. Site-Meta provides a
method for obtaining descriptor locations that does not depend on the
availability of an HTTP representation (or 303 See Other response)
for resources. It can also, with an additional authority
verification step (described in Section 8.3.3), provide descriptor
locations for URIs with schemes other than 'http' and 'https'.
The elements defined by Site-Meta are meant to contain site-wide
information. Unlike Link headers included in the HTTP response to
requests for the domain root resource (obtained via 'GET / HTTP/1.1'
or 'HEAD / HTTP/1.1') which are specific to the root resource, links
in Site-Meta are between the abstract 'web site' entity and the
linked resources. It is critical not to confuse the root resource of
a domain authority with the abstract 'web site' entity described by
Site-Meta.
For this reason, any <meta> element containing a linked resource with
relationship type 'describedby' identifies the location of the
abstract 'web site' entity description, which by itself cannot be
described using a URI. While this is a valid application of the
'describedby' relationship type, it is beyond the scope of this memo.
8.3.2. <link-template> Element
The <link-template> element is defined as a child of the root
<metadata> element and identical to the Site-Meta <meta> element with
the following differences:
o It cannot contain any value or child elements and must be self-
closing.
o The 'href' attribute in the <meta> element is replaced by the
'template' attribute.
o The OPTIONAL 'scheme' attribute is added.
o The resulting relationship is defined as between the individual
resource used as an input to the template and the resulting
descriptor URI, and not in relation to the abstract 'web site'
entity.
The 'scheme' attribute serves as a filter indicating which URI schemes
are meant to be transformed using the provided template. This
OPTIONAL attribute is meant to allow different handling of different
URI schemes. The attribute value is a space-separated list of
lowercase scheme names. If omitted, the template is meant to be
applied to any URI scheme.
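As an illustration of the filter, the following Python sketch decides whether a template applies to a given resource URI. The function name is hypothetical; `urlsplit` from the standard library is used only to extract the (lowercased) scheme.

```python
from typing import Optional
from urllib.parse import urlsplit


def template_applies(scheme_attr: Optional[str], resource_uri: str) -> bool:
    """Return True when a <link-template> with the given 'scheme'
    attribute value should be used for resource_uri."""
    if scheme_attr is None:
        return True                # attribute omitted: any URI scheme
    allowed = scheme_attr.split()  # space-separated lowercase names
    return urlsplit(resource_uri).scheme in allowed
```

For instance, a template carrying `scheme="mailto http https"` would be skipped for an 'xmpp' URI, leaving room for a separate template dedicated to that scheme.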
The 'template' attribute defines a URI template with a very simple
syntax. The attribute value is used to construct a valid URI by
substituting the variable enclosed in '{}' with the value of the
variable.
In the example above, the 'uri' variable is replaced with the actual
resource URI (the resource URI http://example.com replaces the
'{uri}' string, which results in http://example.com;about). If the
variable name is prefixed by a '%' character, any character in the
variable value other than an unreserved character MUST be
percent-encoded per
[RFC3986].
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
For example, the following template when used with the resource URI
http://example.com:
<metadata>
<link-template template="http://example.com?describe={%uri}"
rel="describedby"
type="application/xrd+xml"
scheme="mailto http https" />
</metadata>
produces the descriptor URI:
http://example.com?describe=http%3A%2F%2Fexample.com.
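The substitution rule can be sketched in a few lines of Python. The function name is hypothetical, and the sketch relies on `urllib.parse.quote` with an empty `safe` set, which in Python 3.7 and later leaves exactly the unreserved characters listed above unencoded.

```python
from urllib.parse import quote


def expand_template(template: str, resource_uri: str) -> str:
    """Substitute the single 'uri' variable defined by this draft."""
    # Percent-encode everything except unreserved characters.
    encoded = quote(resource_uri, safe="")
    return (template
            .replace("{%uri}", encoded)        # percent-encoded form
            .replace("{uri}", resource_uri))   # verbatim form


verbatim = expand_template("{uri};about", "http://example.com/r/1")
encoded = expand_template("http://example.com?describe={%uri}",
                          "http://example.com")
```

The two calls reproduce the two examples in this section, appending ';about' in the first case and embedding the encoded resource URI in a query parameter in the second.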
[[This initial draft only defines a single 'uri' variable. However,
it is expected that future revisions will define a larger template
vocabulary which will be based on the URI structure definition and
include: uri, scheme, authority, domain, port, path, query, fragment,
and username (for 'mailto' URIs).]]
[[Site-Meta is pending a major revision of its document format which
is likely to replace its XML structure with a simpler text based
structure. This memo will be revised as soon as the new Site-Meta
draft is published to reflect these changes. However, the changes
are expected to only change the document formatting.]]
8.3.3. DNS Verification for Non-HTTP(S) URIs
Site-Meta uses the HTTP protocol for providing metadata about the
abstract 'web site' entity. This raises the issue whether an HTTP
server can speak authoritatively for a non-HTTP resource, namely, a
resource identified by a URI with a scheme other than 'http' or
'https'.
From a deployment perspective, many organizations separate the
administration responsibilities of their HTTP resources from other
resources such as email (SMTP) or instant messaging (XMPP). It would
be an unexpected behavior in such organizations if the HTTP server
provides authoritative information about identifiers belonging to
other departments.
The process defined in this memo was designed with ease of deployment
as one of its top priorities. Since it is unlikely that protocols
such as SMTP will introduce their own discovery extensions (or that
such extensions will realize any significant deployment in the
foreseeable future),
Site-Meta must be able to provide authoritative information regarding
the descriptor location of non-HTTP(S) resources.
To obtain such authority, the owner of the domain (as represented by
the administrator of its DNS records) MUST declare that Site-Meta is
indeed allowed to provide such information. To do so, the domain DNS
records MUST include, for each domain or sub-domain for which the
HTTP server has such authority, a TXT record with the exact value of
'/site-meta non-http delegation enabled'. [[The DNS record type and
value are included in this draft only as a straw man proposal and are
likely to change based on feedback received from the DNS community.
Proposals for such a record are requested.]]
Resource-consumers MUST verify the existence of such a DNS record
before obtaining and utilizing the Site-Meta document for the
discovery of non-HTTP(S) resources. If no such record is found, the
method fails.
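The verification decision itself is small and can be sketched as below. The DNS lookup is intentionally left out: the sketch assumes the TXT record values for the domain have already been fetched by some resolver, and the function names are hypothetical. The record value is the straw man value quoted above and is expected to change.

```python
from typing import Iterable

# Straw man TXT record value from this draft; likely to change.
DELEGATION_VALUE = "/site-meta non-http delegation enabled"


def non_http_delegation_enabled(txt_records: Iterable[str]) -> bool:
    """txt_records holds the TXT record values already looked up for
    the domain; the lookup itself is outside this sketch."""
    return any(record == DELEGATION_VALUE for record in txt_records)


def site_meta_allowed(scheme: str, txt_records: Iterable[str]) -> bool:
    """Decide whether Site-Meta may be used for a URI of this scheme."""
    if scheme.lower() in ("http", "https"):
        return True  # no DNS verification is needed for HTTP(S) URIs
    return non_http_delegation_enabled(txt_records)
```

A consumer would call `site_meta_allowed("mailto", records)` before retrieving the Site-Meta document and treat a False result as failure of the method.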
8.3.4. Method Workflow
A resource-consumer trying to obtain the location of the resource's
descriptor using this method SHALL:
1. Examine the resource URI and extract its authority component as
defined by [RFC3986] section 3:
  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
If the authority contains an '@' character, the '@' character and
everything to its left is removed. For example, in the URI
mailto:username@example.com the authority component
username@example.com is stripped of the '@' character and the
characters to its left, leaving example.com as the extracted
authority value used in follow-up steps.
2. If the URI scheme being discovered is not 'http' or 'https', the
resource-consumer MUST perform DNS verification as described in
Section 8.3.3 to ensure that the HTTP protocol service for that
domain has the authority to relay discovery location for other
schemes.
3. Retrieve the Site-Meta document for the extracted authority as
defined by [I-D.nottingham-site-meta] section 4, by making an
HTTP GET request:
GET /site-meta HTTP/1.1
Host: example.com
If the request fails to retrieve a valid Site-Meta document, the
method fails. [[Should the method require the use of HTTPS when
retrieving the Site-Meta document when performing discovery on
'https' scheme URIs?]]
4. Parse the Site-Meta document and look for <link-template> elements
with a 'rel' attribute value containing the 'describedby'
relationship (a multiple relationship 'rel' attribute value is
allowed and MUST be handled by the consumer, for example
'rel="describedby copyright"').
5. The resource-consumer SHOULD examine any available 'type'
attributes as hints for the document format used by the
descriptor document. If more than one link template is found,
the descriptor mime-type SHOULD be used to narrow down the
selection.
6. The descriptor location is constructed by applying the template
obtained from the 'template' attribute of the selected <link-
template> element on the resource URI.
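Steps 1, 4, 5, and 6 can be sketched in Python as shown here. This is
an illustration under stated assumptions, not a definitive
implementation: the <site-meta> root element, the absence of an XML
namespace, the 'application/xrds+xml' type value, and the '{uri}'
placeholder syntax are all hypothetical, since the Site-Meta document
format and the template syntax are defined elsewhere. The DNS check of
step 2 and the HTTP request of step 3 are omitted.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlsplit


def extract_authority(uri):
    """Step 1: authority with any '@' and everything left of it removed."""
    parts = urlsplit(uri)
    # URIs such as mailto:user@example.com have no '//' authority, so the
    # scheme-specific part is used, following the example in step 1.
    authority = parts.netloc or parts.path
    return authority.rpartition("@")[2]


def select_template(site_meta_xml, preferred_type=None):
    """Steps 4-5: pick a 'describedby' link template, using type as a hint."""
    root = ET.fromstring(site_meta_xml)
    candidates = [
        el for el in root.iter("link-template")
        if "describedby" in el.get("rel", "").lower().split()
    ]
    if len(candidates) > 1 and preferred_type is not None:
        typed = [el for el in candidates if el.get("type") == preferred_type]
        candidates = typed or candidates
    return candidates[0].get("template") if candidates else None


def apply_template(template, uri):
    """Step 6, assuming a hypothetical '{uri}' placeholder syntax."""
    return template.replace("{uri}", quote(uri, safe=""))


# Hypothetical Site-Meta document used only for illustration.
SITE_META = """
<site-meta>
  <link-template rel="describedby copyright" type="application/xrds+xml"
                 template="http://example.com/describe?uri={uri}"/>
  <link-template rel="copyright" template="http://example.com/legal"/>
</site-meta>
"""

resource = "mailto:username@example.com"
authority = extract_authority(resource)
template = select_template(SITE_META, "application/xrds+xml")
location = apply_template(template, resource)
```

The 'rel' value is split on whitespace so that a multiple relationship
value such as 'describedby copyright' still matches, as step 4
requires.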
9. Caching
Resource-consumers MUST obey all HTTP caching headers and directives
and discard any cached descriptor location as defined by the
resource-provider. The ability to cache descriptor locations was a
key requirement in selecting which methods to include in the
discovery workflow. It is critical that such information is cached
as defined by HTTP.
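One way a resource-consumer might track the lifetime of a cached
descriptor location is sketched below. It is a deliberately simplified
Python illustration: only the 'max-age', 'no-store', and 'no-cache'
Cache-Control directives are considered, 'no-cache' is conservatively
treated as "do not reuse", and the class and function names are
hypothetical. A conforming consumer would rely on a complete HTTP
cache implementation.

```python
import time


def max_age_seconds(cache_control):
    """Return the max-age from a Cache-Control value, or None if absent.

    Returns 0 when 'no-store' or 'no-cache' is present. Only these three
    directives are handled; this is not a complete HTTP cache.
    """
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        name, _, value = directive.partition("=")
        if name.strip() == "max-age" and value.strip().isdigit():
            return int(value.strip())
    return None


class LocationCache:
    """Hypothetical cache of descriptor locations keyed by resource URI."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}

    def store(self, resource_uri, location, cache_control):
        age = max_age_seconds(cache_control)
        if age:  # skip entries with no usable lifetime
            self._entries[resource_uri] = (location, self._clock() + age)

    def lookup(self, resource_uri):
        entry = self._entries.get(resource_uri)
        if entry is None:
            return None
        location, expires = entry
        if self._clock() >= expires:
            del self._entries[resource_uri]  # discard the expired location
            return None
        return location
```

The injectable clock exists only to make expiry easy to exercise in
tests; by default the cache uses the system time.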
10. Security Considerations
The methods used to perform discovery are not secure, private or
integrity-guaranteed, and due caution should be exercised when using
them. Applications that perform discovery should consider the attack
vectors opened by automatically following, trusting, or otherwise
using links gathered from <LINK> elements, HTTP Link headers, or
Site-Meta documents.
11. IANA Considerations
This memo includes no request to IANA. The relationship type
'describedby' used by this memo is pending approval by the IANA and
must be fully registered before this memo can become final. If for
any reason the 'describedby' relationship type fails to register with
the IANA, it is expected that this memo will define a new
relationship type.
Appendix A. Method Suitability Analysis
The following analysis attempts to list all the methods proposed for
addressing resource discovery. It has been previously published as
an article at [Discovery and HTTP] and is included here to provide
background information as to why certain methods have been selected
while others were rejected from the discovery process. It has been
updated to match the terms used in this memo and its structure.
Appendix A.1. Requirements
Getting from a resource URI to its descriptor document can be
implemented in many ways. The problem is that none of the current
methods address all of the requirements presented by the common use
cases. The requirements are simple, but the more we try to address,
the less elegant and accessible the process becomes. While working
on the now defunct XRDS-Simple specification [XRDS-Simple] and
talking to companies and individuals about it, the following
requirements emerged for any proposed process:
Self Declaration:
Allow resources to declare the availability of descriptor
information and its location. When a resource is accessed, it
needs to have a way to communicate to the resource-consumer
that it supports the discovery protocol and to indicate the
location of such descriptor.
This is useful when the consumer is able to interact, or is
already interacting, with the resource but can enhance its
interaction with additional information. For example, accessing
a blog page can be enhanced if the consumer learns that the page
was generated from an Atom feed or Atom entry and that the feed
supports Atom authoring.
Direct Descriptor Access:
Enable direct retrieval of the resource descriptor without
interacting with the resource itself. Before a resource is
accessed, the resource-consumer should have a way to obtain the
resource descriptor without accessing the resource. This is
important for two reasons.
First, accessing an unknown resource may have undesirable
consequences. After all, the information contained in the
descriptor is supposed to inform the consumer how to interact
with the resource. The second is efficiency - removing the
need to first obtain the resource in order to get its
descriptor (reducing HTTP round-trips, network bandwidth, and
application latency).
Web Architecture Compliant:
Work with well-established web infrastructure. This may sound
obvious but it is in fact the most complex requirement.
Deploying new extensions to the HTTP protocol is a complicated
endeavor. Besides getting applications to support a new header,
method, or content negotiation, existing caches and proxies
must be enhanced to properly handle these requests, and they
must not fail performing their normal duties without such
enhancements.
For example, a new content negotiation method may cause an
existing cache to serve the wrong data to a non-discovery
consumer due to its inability to distinguish the metadata
request from the resource representation request.
Scale and Technology Agnostic:
Support large and small web providers regardless of the size of
operations and deployment. Any solution must work for a small
hosted web site as well as the world's largest search engine. It
must be flexible enough to allow developers with restricted
access to the full HTTP protocol (such as limited access to
request or response headers) to be able to both provide and
consume resource descriptors. Any solution should also support
caching as much as possible and allow reuse of source code and
data.
Extensible:
Accommodate future enhancements and unknown descriptor formats.
It should support the existing set of descriptor formats such
as XRD and POWDER, as well as new descriptor relationships that
might emerge in the future. In addition, the solution should
not depend on the descriptor format itself and should work
equally well with any document format - it should aim to keep
the road and destination separate.
Appendix A.2. Analysis
The following is a list of proposed and implemented methods trying to
address resource discovery. Each method is reviewed for its
compliance with the requirements identified previously. The [-],
[+], or [+-] symbols next to each requirement indicate how well the
method complies with the requirement.
Appendix A.2.1. HTTP Response Header
When a resource representation is retrieved using an HTTP GET
request, the server includes in the response a header pointing to the
location of the descriptor document. For example, POWDER uses the
'Link' response header to create an association between the resource
and its descriptor. XRDS [XRDS] (based on the Yadis protocol
[Yadis]) uses a similar approach, but since the Link header was not
available when Yadis was first drafted, it defines a custom header
X-XRDS-Location which serves a similar but less generic purpose.
[+] Self Declaration - using the Link header, any resource can point
to its descriptor documents.
[-] Direct Descriptor Access - the header is only accessible when
requesting the resource itself via an HTTP GET request. While
HTTP GET is meant to be a safe operation, it is still possible for
some resources to have side-effects.
[+] Web Architecture Compliant - uses the Link header which is an
IETF Internet Standard [[Currently a standards-track draft]], and
is consistent with HTTP protocol design.
[-] Scale and Technology Agnostic - since discovery accounts for a
small percentage of resource requests, sending the extra Link header
with every response is wasteful. For some hosted servers, access to
HTTP headers is limited and will prevent implementation.
[+] Extensible - the Link header provides built-in extensibility by
allowing new link relationships, mime-types, and other extensions.
Minimum roundtrips to retrieve the resource descriptor: 2
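As an illustration of the HTTP Response Header method, a consumer
could extract 'describedby' targets from a Link header value along the
lines of the Python sketch below. The parser is intentionally
simplified and is an assumption of this example: it does not handle
commas or semicolons inside quoted parameter values, and the function
name and sample header are hypothetical.

```python
import re

# One link-value: a <target> followed by its parameters, up to the next comma.
LINK_RE = re.compile(r'<([^>]*)>([^,]*)')
# One parameter: ;name="quoted value" or ;name=token
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def describedby_links(header_value):
    """Return (target, type) pairs for links whose rel includes 'describedby'."""
    found = []
    for target, raw_params in LINK_RE.findall(header_value):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        if "describedby" in params.get("rel", "").lower().split():
            found.append((target, params.get("type")))
    return found


# Hypothetical header value used only for illustration.
header = ('<http://example.com/resource;about>; rel="describedby"; '
          'type="application/xrds+xml", '
          '<http://example.com/legal>; rel="copyright"')
links = describedby_links(header)
```

The returned 'type' value can then serve as a hint for the descriptor
format, and is None when the link carries no 'type' parameter.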
Appendix A.2.2. HTTP Response Header Via HEAD
Same as the HTTP Response Header method but used with an HTTP HEAD
request. The idea of using the HEAD method is to solve the wasteful
overhead of including the Link header in every reply. By limiting
the appearance of the Link header only to HEAD responses, typical GET
requests are not encumbered by the extra bytes.
[+] Self Declaration - Same as the HTTP Response Header method.
[-] Direct Descriptor Access - Same as the HTTP Response Header
method.
[-] Web Architecture Compliant - HTTP HEAD should return the exact
same response as HTTP GET with the sole exception that the
response body is omitted. By adding headers only to the HEAD
response, this solution violates the HTTP protocol and might not
work properly with proxies as they can return the header of the
cached GET request.
[+] Scale and Technology Agnostic - solves the wasted bandwidth
associated with the HTTP Response Header method, but still suffers
from the limitation imposed by requiring access to HTTP headers.
[+] Extensible - Same as the HTTP Response Header method.
Minimum roundtrips to retrieve the resource descriptor: 2
Appendix A.2.3. HTTP Content Negotiation
Using the Accept request header, the consumer informs the server it
is interested in the descriptor and not the resource itself, to which
the server responds with the descriptor document or its location. In
Yadis, the consumer sends an HTTP GET (or HEAD) request to the
resource URI with an Accept header listing the content-type
application/xrds+xml. This informs the server of the consumer's
discovery interest; the server in turn may reply with the descriptor
document itself, redirect to it, or return its location via the
X-XRDS-Location response header.
[-] Self Declaration - does not address this requirement, as it
focuses on the consumer declaring its intentions.
[+] Direct Descriptor Access - provides a simple method for directly
requesting the descriptor document.
[-] Web Architecture Compliant - while it can be argued that the
descriptor can be considered another representation of the
resource, it is very much external to it. Using the Accept header
to request a separate resource (as opposed to a different
representation of the same resource) violates web architecture.
It also prevents using the discovery content-type as a valid
(self-standing) web resource having its own descriptor.
[-] Scale and Technology Agnostic - requires access to HTTP request
and response headers, as well as the registration of multiple
handlers for the same resource URI based on the Accept header. In
addition, improper use or implementation of the Vary header in
conjunction with the Accept header will cause caches to serve the
descriptor document instead of the resource itself - a great
concern to large providers with frequently visited front-pages.
[-] Extensible - applies an implicit relationship type to the
descriptor mime-type, limiting descriptor formats to a single
purpose. It also prevents existing mime-types from being used as a
descriptor format.
Minimum roundtrips to retrieve the resource descriptor: 1
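The Yadis-style exchange described in this section can be sketched in
Python as follows. The sketch only builds the request and classifies a
response from its headers; it does not perform any network I/O, and
redirects are assumed to be followed by whatever HTTP client sends the
request. The function names and return values are hypothetical.

```python
from urllib.request import Request

XRDS_TYPE = "application/xrds+xml"


def build_discovery_request(resource_uri):
    """Build (but do not send) a GET request asking for the descriptor."""
    return Request(resource_uri, headers={"Accept": XRDS_TYPE}, method="GET")


def interpret_response(headers):
    """Classify a response given its headers as a plain dict.

    Returns ("descriptor", None) when the body is the descriptor itself,
    ("location", uri) when X-XRDS-Location points to it, or
    ("none", None) when the server gave no discovery information.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip()
    if content_type.lower() == XRDS_TYPE:
        return ("descriptor", None)
    if "x-xrds-location" in lowered:
        return ("location", lowered["x-xrds-location"])
    return ("none", None)
```

Because the same resource URI yields different responses depending on
the Accept header, the caching concern noted above applies to any
intermediary that ignores the Vary header.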
Appendix A.2.4. HTTP Header Negotiation
Similar to the HTTP Content Negotiation method, this solution uses a
custom HTTP request header to inform the server of the consumer's
discovery intentions. The server responds by serving the same
resource representation (via an HTTP GET or HEAD request) with the
relevant Link headers. It attempts to solve the HTTP Response Header
waste issue by allowing the consumer to explicitly request the
inclusion of Link headers. One such header can be called 'Request-
links' to inform the server the consumer would like it to include
certain Link headers of a given 'rel' type in its reply.
[+] Self Declaration - same as HTTP Response Header with the option
of selective inclusion.
[-] Direct Descriptor Access - does not address.
[-] Web Architecture Compliant - HTTP does not include any mechanism
for header negotiation and any custom solution will break existing
caches.
[+-] Scale and Technology Agnostic - requires advanced access to HTTP
headers on both the consumer and provider sides, but solves the
bandwidth waste issue of the HTTP Response Header method.
[+] Extensible - builds on top of Link header extensibility.
Minimum roundtrips to retrieve the resource descriptor: 2
Appendix A.2.5. <Link> Element
Embeds the location of the descriptor document within the resource
representation by leveraging the HTML <Link> header element (as
opposed to the HTTP header). Applies to HTML resource
representations or similar markup-based formats with support for
'Link'-like elements such as Atom. POWDER uses the <Link> element in
this manner, while XRDS uses the HTML <meta> element with an
'http-equiv' attribute equal to X-XRDS-Location (to create an
embedded version of the X-XRDS-Location custom header).
[+] Self Declaration - similar to HTTP Response Header method but
limited to HTML resources.
[-] Direct Descriptor Access - the method requires fetching the
entire resource representation in order to obtain the descriptor
location. In addition, it requires changing the resource HTML
representation which makes discovery an intrusive process.
[+] Web Architecture Compliant - uses the <Link> element as
designed.
[+] Scale and Technology Agnostic - while this solution requires
direct retrieval of the resource and manipulation of its content,
it is extremely accessible in many platforms.
[-] Extensible - extensibility is restricted to HTML representations
or similar markup formats with support for a similar element.
Minimum roundtrips to retrieve the resource descriptor: 2
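A consumer using the <Link> Element method has to parse the retrieved
representation. The Python sketch below collects descriptor locations
from an HTML document using the standard library parser. It is an
illustration only: the class name and sample markup are hypothetical,
and reading the location from the 'content' attribute of the <meta>
element is an assumption based on common 'http-equiv' usage.

```python
from html.parser import HTMLParser


class DescriptorLinkParser(HTMLParser):
    """Collect descriptor locations embedded in an HTML representation."""

    def __init__(self):
        super().__init__()
        self.locations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rels = attributes.get("rel", "").lower().split()
            if "describedby" in rels and attributes.get("href"):
                self.locations.append(attributes["href"])
        elif tag == "meta":
            # Embedded equivalent of the X-XRDS-Location header (assumed
            # to carry the location in the 'content' attribute).
            if attributes.get("http-equiv", "").lower() == "x-xrds-location":
                if attributes.get("content"):
                    self.locations.append(attributes["content"])


# Hypothetical markup used only for illustration.
HTML = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="describedby" href="http://example.com/powder.xml">
  <meta http-equiv="X-XRDS-Location" content="http://example.com/xrds">
</head><body></body></html>
"""

parser = DescriptorLinkParser()
parser.feed(HTML)
```

Note that the whole representation must be fetched before the parser
can run, which is the Direct Descriptor Access limitation identified
above.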
Appendix A.2.6. HTTP OPTIONS Method
The HTTP OPTIONS method is used to interact with the HTTP server with
regard to its capabilities and communication-related information
about its resources. The OPTIONS method, together with an optional
request header, can be used to request both the descriptor location
and descriptor content itself.
[-] Self Declaration - does not address.
[+] Direct Descriptor Access - provides a clean mechanism for
requesting descriptor information about a resource without
interacting with it.
[+] Web Architecture Compliant - uses an existing HTTP feature.
[-] Scale and Technology Agnostic - requires consumer and provider
access to the OPTIONS HTTP method. Also does not support caching
which makes this solution inefficient.
[+] Extensible - built into the OPTIONS method.
Minimum roundtrips to retrieve the resource descriptor: 1
Appendix A.2.7. WebDAV PROPFIND Method
Similar to the HTTP OPTIONS method, the WebDAV PROPFIND method
defined in [RFC4918] can be used to request resource specific
properties, one of which can hold the location of the descriptor
document. PROPFIND, unlike OPTIONS, cannot return the descriptor
itself, unless it is returned in the required PROPFIND schema (a
multi-status XML element). Other alternatives include URIQA [URIQA],
an HTTP extension which defines a method called MGET, and ARK
(Archival Resource Key) [ARK] - a method similar to PROPFIND that
allows the retrieval of resource attributes using keys (which
describe the resource).
[-] Self Declaration - does not address.
[+-] Direct Descriptor Access - does not require interaction with
the resource, but does require at least two requests to get the
descriptor (get location, get document).
[+] Web Architecture Compliant - uses an HTTP extension with less
support than core HTTP, but still based on published standards.
[-] Scale and Technology Agnostic - same as the HTTP OPTIONS Method.
[+-] Extensible - uses extensible protocols but at the same time
depends on solutions that have already gone beyond the standard
HTTP protocol, which makes further extensions more complex and
unsupported.
Minimum roundtrips to retrieve the resource descriptor: 2
Appendix A.2.8. Custom HTTP Method
Similar to the HTTP OPTIONS Method, a new method can be defined (such
as DISCOVER) to return (or redirect to) the descriptor document. The
new method can allow caching.
[-] Self Declaration - does not address.
[+] Direct Descriptor Access - same as the HTTP OPTIONS Method.
[-] Web Architecture Compliant - depends heavily on extending every
platform to support the extension. Unlikely to be supported by
existing proxy services and caches.
[-] Scale and Technology Agnostic - same as HTTP OPTIONS Method with
the additional burden on smaller sites requiring access to the new
protocol.
[+] Extensible - new protocol that can extend as needed.
Minimum roundtrips to retrieve the resource descriptor: 1
Appendix A.2.9. Static Resource URI Transformation
Instead of using HTTP facilities to access the descriptor location,
this method defines a template to transform any resource URI to the
descriptor document URI. This can be done by adding a prefix or
suffix to the resource URI, which turns it into a new resource URI.
The new URI points to the descriptor document. For example, to fetch
the descriptor document for http://example.com/resource, the consumer
makes an HTTP GET request to http://example.com/resource;about using
a static template that adds the ';about' suffix.
[-] Self Declaration - does not address.
[+] Direct Descriptor Access - creates a unique URI for the
descriptor document.
[+-] Web Architecture Compliant - uses basic HTTP facilities but
intrudes on the domain authority namespace as it defines a static
template for URI transformation that is not likely to be
compatible with many existing URI naming conventions.
[+-] Scale and Technology Agnostic - depending on the static mapping
chosen. Some hosted environments will have a problem gaining
access to the mapped URI based on the URI format chosen.
[-] Extensible - provides a very specific and limited method to map
between resources and their descriptor, since each relationship
type must mint its own static template.
Minimum roundtrips to retrieve the resource descriptor: 1
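The static transformation itself is trivial, as the following Python
sketch shows. The mapping table and function name are hypothetical;
only the ';about' suffix comes from the example in this section.

```python
# Hypothetical static templates, one per relationship type. Only the
# ';about' suffix comes from the example in this section.
STATIC_SUFFIXES = {
    "describedby": ";about",
}


def static_descriptor_uri(resource_uri, rel="describedby"):
    """Map a resource URI to its descriptor URI using a fixed suffix."""
    return resource_uri + STATIC_SUFFIXES[rel]
```

A relationship type missing from the table raises a KeyError, which
mirrors the extensibility limitation noted above: each new
relationship type must mint its own static template.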
Appendix A.2.10. Dynamic Resource URI Transformation
Same as the Static Resource URI Transformation method but with the
ability for each domain authority to specify its own discovery
transformation template. This can be done by placing a configuration
file at a known location (such as robots.txt) which contains the
template needed to perform the URL mapping. The consumer first
obtains the configuration document (which may be cached using normal
HTTP facilities), parses it, then uses that information to transform
the resource URI and access the descriptor document.
[+-] Self Declaration - does not address individual resources, but
allows entire domains to declare their support (and how to use
it).
[+-] Direct Descriptor Access - once the mapping template has been
obtained, descriptors can be accessed directly.
[+-] Web Architecture Compliant - uses an existing known-location
design pattern (such as robots.txt) and standard HTTP facilities.
The use of a known-location is not ideal and is considered a
violation of web architecture, but if it serves as the last of its
kind, it can be tolerated. An alternative to the known-location
approach can be using DNS to store either the location of the
mapping or the map template itself, but DNS adds a layer of
complexity not always available.
[+-] Scale and Technology Agnostic - works well at the URI authority
level (domain) but is inefficient at the URI path level (resource
path) and harder to implement when different paths within the same
domain need to use different templates. With the decreasing cost
of custom domains and sub-domains hosting, this will not be an
issue for most services, but it does require sharing configuration
at the domain/sub-domain level.
[+-] Extensible - can be, depending on the schema used to format the
known-location configuration document.
Minimum roundtrips to retrieve the resource descriptor: initially 2,
1 after caching
Appendix B. Acknowledgments
With the exception of the Site-Meta template extension, very little
of this memo is original work. Many communities and individuals have
been working on solving discovery for many years and this work is a
direct result of their hard and dedicated efforts.
Inspiration for this memo derived from previous work on a descriptor
format called XRDS-Simple, which in turn derived from another
descriptor format, XRDS. Previous discovery workflows include Yadis
which is currently used by the OpenID community. While suffering
from significant shortcomings, Yadis was a breakthrough approach to
performing discovery using extremely restricted hosting environments,
and this memo has strived to preserve as much of that spirit as
possible.
The use of Link elements and headers and the introduction of the
'describedby' relationship type in this memo is a direct result of
the dedicated work and contribution of Phil Archer to the W3C POWDER
specification and Jonathan Rees to the W3C review of Uniform Access
to Information About. The Site-Meta approach was first proposed by
Mark Nottingham as an alternative to attaching links directly to
resource representations.
The author wishes to thank the OASIS XRI community for their
support, encouragement, and enthusiasm for this work. Special thanks
go to Lisa Dusseault, Mark Nottingham, Drummond Reed, John Panzer,
and Joseph Holsten for their invaluable feedback.
The author takes all responsibility for errors and omissions.
12. References
12.1. Normative References
[]
Nottingham, M., "Link Relations and HTTP Header Linking",
draft-nottingham-http-link-header-03 (work in progress),
November 2008.
[I-D.nottingham-site-meta]
Nottingham, M. and E. Hammer-Lahav,
"draft-nottingham-site-meta-00",
draft-nottingham-site-meta-00 (work in progress),
October 2008.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom
Syndication Format", RFC 4287, December 2005.
[RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed
Authoring and Versioning (WebDAV)", RFC 4918, June 2007.
[W3C.REC-html401-19991224]
Jacobs, I., Hors, A., and D. Raggett, "HTML 4.01
Specification", World Wide Web Consortium
Recommendation REC-html401-19991224, December 1999,
<http://www.w3.org/TR/1999/REC-html401-19991224>.
12.2. Informative References
[ARK] Kunze, J. and R. Rodgers, "The ARK Identifier Scheme",
<http://www.cdlib.org/inside/diglib/ark/arkspec.html>.
[Discovery and HTTP]
Hammer-Lahav, E., "Discovery and HTTP",
<http://www.hueniverse.com/hueniverse/2008/09/discovery-and-h.html>.
[POWDER] Archer, P., Ed., Smith, K., Ed., and A. Perego, Ed.,
"POWDER: Protocol for Web Description Resources",
<http://www.w3.org/TR/powder-dr/>.
[URIQA] Nokia, "The URI Query Agent Model",
<http://sw.nokia.com/uriqa/URIQA.html>.
[Uniform Access]
Rees, J., "Uniform Access to Information About",
<http://w3.org/2001/tag/doc/more-uniform-access.html>.
[XRD] Hammer-Lahav, E., Ed., "XRD 1.0".
[XRDS] Wachob, G., Reed, D., Chasen, L., Tan, W., and S.
Churchill, "Extensible Resource Identifier (XRI)
Resolution V2.0",
<http://docs.oasis-open.org/xri/2.0/specs/xri-resolution-V2.0.html>.
[XRDS-Simple]
Hammer-Lahav, E., "XRDS-Simple 1.0",
<http://xrds-simple.net/core/1.0/>.
[Yadis] Miller, J., "Yadis Specification 1.0",
<http://yadis.org/papers/yadis-v1.0.pdf>.
Author's Address
Eran Hammer-Lahav
Yahoo!
Email: eran@hueniverse.com
URI: http://hueniverse.com