Network Working Group M. Nottingham
Internet-Draft E. Hammer-Lahav
Intended status: Informational February 10, 2009
Expires: August 14, 2009
Host Metadata for the Web
draft-nottingham-site-meta-01
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on August 14, 2009.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Abstract
This memo describes a method for locating host-specific metadata for
the Web.
Nottingham & Hammer-Lahav Expires August 14, 2009 [Page 1]
Internet-Draft Host Metadata for the Web February 2009
Table of Contents
1. Introduction
2. Notational Conventions
3. The host-meta File Format
3.1. The Link host-meta Field
4. Discovering host-meta Files
5. Minting New meta-fields
6. Security Considerations
7. IANA Considerations
7.1. application/host-meta Media Type Registration
7.2. The host-meta Field Registry
7.2.1. Registration Template
7.2.2. The Link host-meta field
8. References
8.1. Normative References
8.2. Informative References
Appendix A. Acknowledgements
Appendix B. Frequently Asked Questions
B.1. Is this mechanism appropriate for all kinds of metadata?
B.2. Why not use OPTIONS * with content negotiation to discover
different types of metadata directly?
B.3. Why not use a META tag or microformat in the root resource?
B.4. Why not use response headers on the root resource, and have
clients use HEAD?
B.5. Why scope metadata to an authority?
B.6. Why /host-meta?
B.7. Aren't you concerned about pre-empting an authority's URI
namespace?
B.8. Why use link relations instead of media types to identify kinds of
metadata?
B.9. What impact does this have on existing mechanisms, such as P3P and
robots.txt?
B.10. Why not (insert existing similar mechanism here)?
Appendix C. Document History
Authors' Addresses
1. Introduction
It is increasingly common for Web-based protocols to require the
discovery of policy or metadata before making a request. For
example, the Robots Exclusion Protocol specifies a way for automated
processes to obtain permission to access resources; likewise, the
Platform for Privacy Preferences [W3C.REC-P3P-20020416] tells user-
agents how to discover privacy policy beforehand.
While there are several ways to access per-resource metadata (e.g.,
HTTP headers, WebDAV's PROPFIND [RFC4918]), the overhead associated
with them often precludes their use in these scenarios.
When this happens, it is common to designate a "well-known location"
for such metadata, so that it can be easily located. However, this
approach has the drawback of risking collisions, both with other such
designated "well-known locations" and with pre-existing resources.
To address this, this memo proposes a single (and hopefully last)
"well-known location", /host-meta, which acts as a directory to the
interesting metadata about a particular authority. Future mechanisms
that require authority-wide metadata can easily include an entry in
the host-meta resource, thereby making their metadata cheaply
available (indeed, because it can be cached, the more mechanisms that
use it, the more efficient it becomes) without impinging on others'
URI space.
Note that the metadata provided by a host-meta resource is explicitly
scoped to apply to the entire authority (in the URI [RFC3986] sense)
associated with it (using the process described in Section 4); it
does not apply to a subset, nor does it apply to other authorities
(e.g., using another port, or a different hostname in the same
domain). However, individual mechanisms (e.g., a relation type in
the Link field) MAY reduce or expand this scope. This should only be
done after careful consideration of the consequences upon security,
administration, interoperability and network load.
Please discuss this draft on the www-talk@w3.org [1] mailing list.
2. Notational Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
This document uses the Augmented Backus-Naur Form (ABNF) notation of
[RFC5234], and explicitly includes the following rules from it: CRLF
(CR LF), OCTET (any 8-bit sequence of data), DIGIT, ALPHA, and WSP
(white space).
3. The host-meta File Format
The host-meta file format is an extremely simple textual language
that allows an authority to convey metadata about itself and its
resources.
Its syntax is similar to that of HTTP header-fields [RFC2616], but
has a few differences:
o White space is permissible both before and after the block of
fields, and
o fields MUST NOT be folded across multiple lines.
Furthermore, this format's use diverges from HTTP header-fields in a
number of ways:
o The fields are transferred as the message body, not as headers,
and
o rather than being related to a message, the fields in host-meta
pertain to the entire associated authority (see Section 4), and
o the permissible field-names are constrained by the host-meta field
registry. This specification defines one such field, Link.
host-meta = *( WSP / CRLF )
*( meta-field CRLF )
*( WSP / CRLF )
meta-field = field-name ":" [ field-value ]
field-name = 1*tchar
field-value = *( field-content / WSP )
field-content = <field content>
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
For example,
Link: </robots.txt>; rel="robots"
Link: </w3c/p3p.xml>; rel="privacy"; type="application/p3p.xml"
Link: <http://example.net/example>; rel="http://example.com/rel"
As with HTTP headers, field-names are not case-sensitive,
unrecognised field-names SHOULD be silently ignored when parsing this
format, and ordering of fields SHOULD NOT be considered significant
unless specified otherwise. Additionally, although the syntax does
not explicitly allow empty lines between fields, parsers SHOULD
silently discard them (i.e., be permissive in what they accept).
Field content is constrained by the specification indicated by its
associated field-name.
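The parsing rules above can be sketched in Python as follows. This is
a minimal illustration, not a reference implementation: the function
names parse_host_meta and known_fields are hypothetical, and the
sketch is more lenient than the ABNF in two assumed ways (it accepts
bare LF line endings and ignores white space around each line).

```python
import re

# 1*tchar from the ABNF above: the listed punctuation plus DIGIT and ALPHA.
FIELD_NAME = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def parse_host_meta(text):
    """Return a list of (lowercased field-name, field-value) pairs."""
    fields = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # Empty lines are not in the grammar, but parsers SHOULD
            # silently discard them.
            continue
        name, sep, value = stripped.partition(":")
        if not sep or not FIELD_NAME.fullmatch(name):
            raise ValueError("not a host-meta field line: %r" % line)
        # Field-names are not case-sensitive, so normalise to lowercase.
        fields.append((name.lower(), value.strip()))
    return fields


def known_fields(fields, recognised=("link",)):
    """Silently drop fields whose names are not recognised."""
    return [(name, value) for name, value in fields if name in recognised]
```

A ValueError here corresponds to a representation that is not in the
host-meta format; a caller would then treat the resource as being used
for some other purpose.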
3.1. The Link host-meta Field
The "Link" host-meta field uses the syntax of the Link HTTP header-
field [I-D.nottingham-http-link-header] to convey links whose context
is the entire authority, rather than a single resource. For example,
Link: </terms>; rel="license"
indicates that the URI "/terms" refers to a license for all resources
associated with the authority.
The Link host-meta field differs from the Link header in the
following respects:
o Its context is defined as all resources that share its authority,
by default (although this MAY be overridden by a representation
obtained from the indicated resource), and
o When the link URI is relative, its base URI is the root resource
of the authority. For example, in the example above, if the
authority is "example.com", the full link URI would be
"http://example.com/terms".
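As an illustration of the base URI rule, the following Python sketch
resolves a link target against the root resource of an authority. The
function name resolve_link_target and the default "http" scheme are
assumptions made for the example; this memo does not define them.

```python
from urllib.parse import urljoin


def resolve_link_target(authority, target, scheme="http"):
    """Resolve a Link target against the authority's root resource."""
    # The base URI is the root resource of the authority,
    # e.g. "http://example.com/".
    root = "%s://%s/" % (scheme, authority)
    # Absolute targets are returned unchanged by urljoin.
    return urljoin(root, target)
```

With the authority "example.com", the target "/terms" from the example
above resolves to "http://example.com/terms", while an absolute target
such as "http://example.net/example" is left as it is.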
4. Discovering host-meta Files
The metadata for a given authority can be discovered by dereferencing
the path /host-meta on the same authority. For example, for an HTTP
URI [RFC2616], the following request would obtain metadata for the
authority "www.example.com:80":
GET /host-meta HTTP/1.1
Host: www.example.com
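A client that starts from an arbitrary URI might derive the host-meta
location along the following lines. This is a sketch only; the
function name host_meta_uri is hypothetical, and it assumes that the
scheme and authority of the input are reused unchanged while the path,
query and fragment are discarded.

```python
from urllib.parse import urlsplit, urlunsplit


def host_meta_uri(uri):
    """Return the /host-meta URI on the same authority as `uri`."""
    parts = urlsplit(uri)
    if not parts.netloc:
        raise ValueError("URI has no authority component: %r" % uri)
    # Keep scheme and authority; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/host-meta", "", ""))
```

For example, "http://www.example.com/a/b?q=1" maps to
"http://www.example.com/host-meta".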
The semantics of the protocol used for access to the resource apply.
Therefore, if the resource indicates the client should try a
different request (in HTTP, the 301, 302, 303 or 307 response status
code), the client SHOULD attempt to do so; note that this implies
that the host-meta file for one authority MAY be retrieved from a
different authority. Likewise, if the resource is not available or
existent (in HTTP, the 404 or 410 status code), the client SHOULD
infer that metadata is not available via this mechanism.
If a representation is successfully obtained, but is not in the
format described above, clients SHOULD infer that the authority is
using this URI for other purposes, and not process it as a host-meta
file.
To aid in this process, authorities using this mechanism SHOULD
correctly label host-meta responses with the "application/host-meta"
internet media type.
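The client behaviour described in this section can be summarised as a
small decision function. The sketch below is illustrative: the name
interpret_response and the returned labels are hypothetical, and the
handling of an unlabelled 200 response or of status codes not
mentioned above is not specified by this memo, so the sketch merely
reports those cases instead of choosing a behaviour.

```python
def interpret_response(status, content_type=None, location=None):
    """Classify an HTTP response to a GET on /host-meta.

    Returns a (label, detail) pair; the labels are illustrative only.
    """
    if status in (301, 302, 303, 307):
        # The client SHOULD try the indicated request, possibly on
        # another authority.
        return ("redirect", location)
    if status in (404, 410):
        # Metadata is not available via this mechanism.
        return ("absent", None)
    if status == 200:
        # Ignore media type parameters and case when comparing.
        media_type = (content_type or "").split(";", 1)[0].strip().lower()
        if media_type == "application/host-meta":
            return ("parse", None)
        # Assumption: left to the caller, e.g. try to parse the body and
        # ignore it if it is not in the host-meta format.
        return ("unlabelled", media_type or None)
    # Assumption: other status codes are outside the text above.
    return ("other", status)
```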
5. Minting New meta-fields
Applications that wish to mint new meta-fields for use in the host-
meta format MUST register them in the host-meta field-registry,
following the procedures in Section 7.2. Field-names MUST conform to
the field-name ABNF in Section 3, and field-value syntax MUST be well-
defined (e.g., using ABNF, or a reference to the syntax of an
existing header field-value). Field-values SHOULD use the ISO-8859-1
character encoding. If a field-value applies to a scope other than
the entire authority, that scope MUST be well-defined.
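Someone preparing a registration could check a candidate field
mechanically before submitting it. The following Python sketch is one
possible check, under the assumption that a sample field-value is
available as a string; the function name check_new_field is
hypothetical, and the check does not cover the requirement that the
field-value syntax and scope be well-defined.

```python
import string

# Characters allowed by the tchar rule in Section 3.
TCHAR = set("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def check_new_field(name, sample_value):
    """Return a list of problems found with a proposed meta-field."""
    problems = []
    if not name or any(ch not in TCHAR for ch in name):
        problems.append("field-name does not match 1*tchar")
    try:
        sample_value.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Only a SHOULD in the text above, so this is a warning.
        problems.append("field-value is not representable in ISO-8859-1")
    return problems
```

An empty list means neither check found a problem.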
6. Security Considerations
The metadata returned by the /host-meta resource is presumed to be
under the control of the appropriate authority and representative of
all resources contained by it. If this resource is compromised or
otherwise under the control of another party, it may represent a risk
to the security of the server and data served by it, depending on
what mechanisms use /host-meta.
Scoping metadata to a single authority is the default in host-meta.
Thus "http://example.com/", "https://example.com/" and
"http://www.example.com/" all have different host-meta files with
separate and non-overlapping scopes of applicability. Applications
that change the scope of metadata can incur security risks without
careful consideration.
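To make the default scoping concrete, the following Python sketch
derives a comparison key from a URI's authority. The function name
scope_key is hypothetical, and the sketch assumes that a missing port
is filled in from the scheme's default (80 for "http", 443 for
"https") and that userinfo is ignored.

```python
from urllib.parse import urlsplit

# Assumed default ports for the two schemes used in the examples.
DEFAULT_PORTS = {"http": 80, "https": 443}


def scope_key(uri):
    """Return (host, port) identifying the authority that scopes metadata."""
    parts = urlsplit(uri)
    # urlsplit lowercases the hostname; the port is None when absent.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme.lower())
    return (parts.hostname, port)
```

Under these assumptions the three URIs above yield three distinct
keys, so metadata obtained for one of them would not be applied to
the others by default.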
7. IANA Considerations
7.1. application/host-meta Media Type Registration
The host-meta format can be identified with the following media type:
MIME media type name: application
MIME subtype name: host-meta
Mandatory parameters: None.
Optional parameters: None.
Encoding considerations: field-values may specify any encoding for
their contents, although it is expected that most will use ISO-
8859-1 or a subset thereof (for both historic and interoperability
purposes).
Security considerations: As defined in this specification. [[update
upon publication]]
Interoperability considerations: There are no known interoperability
issues.
Published specification: This specification. [[update upon
publication]]
Applications which use this media type: No known applications
currently use this media type.
Additional information:
Magic number(s):
File extension: None.
Fragment identifiers: None.
Base URI: None.
Macintosh File Type code: TEXT
Person and email address to contact for further information: Mark
Nottingham <mnot@mnot.net>
Intended usage: COMMON
Author/Change controller: This specification's author(s). [[update
upon publication]]
7.2. The host-meta Field Registry
This document establishes the host-meta field registry as the
namespace of field-names for use in meta-fields. Although some meta-
fields may be similar to message headers, both syntactically and
semantically, the host-meta field registry is separate from the
message header field registry [RFC3864]. See Section 5 for details and
requirements for registered meta-fields.
meta-fields may be registered on the advice of a Designated Expert
(appointed by the IESG or their delegate), with a Specification
Required (using terminology from [RFC5226]).
Registration requests consist of the completed registration template
(Section 7.2.1), typically published in an RFC or Open Standard (in the
sense described by [RFC2026], section 7). However, to allow for the
allocation of values prior to publication, the Designated Expert may
approve registration once they are satisfied that an RFC (or other
Open Standard) will be published.
Upon receiving a registration request (usually via IANA), the
Designated Expert should request review and comment from the apps-
discuss mailing list (or a successor designated by the APPS Area
Directors). Before a period of 30 days has passed, the Designated
Expert will either approve or deny the registration request,
communicating this decision both to the review list and to IANA.
Denials should include an explanation and, if applicable, suggestions
as to how to make the request successful.
7.2.1. Registration Template
Field name: The name requested for the new meta-field. This MUST
conform to the host-meta field specification details noted in
Section 3.
Change controller: For RFCs, state "IETF". For other open
standards, give the name of the publishing body (e.g., ANSI, ISO,
ITU, W3C, etc.). A postal address, home page URI, telephone and
fax numbers may also be included.
Specification document(s): Reference to document that specifies the
field, preferably including a URI that can be used to retrieve a
copy of the document. An indication of the relevant sections may
also be included, but is not required.
Related information: Optionally, citations to additional documents
containing further relevant information.
7.2.2. The Link host-meta field
This specification registers one host-meta field.
Field name: Link
Change controller: IETF
Specification document(s): [[this document]]
Related information: [I-D.nottingham-http-link-header]
8. References
8.1. Normative References
[I-D.nottingham-http-link-header]
Nottingham, M., "Link Relations and HTTP Header Linking",
draft-nottingham-http-link-header-03 (work in progress),
November 2008.
[RFC2026] Bradner, S., "The Internet Standards Process -- Revision
3", BCP 9, RFC 2026, October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
IANA Considerations Section in RFCs", BCP 26, RFC 5226,
May 2008.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008.
8.2. Informative References
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC3864] Klyne, G., Nottingham, M., and J. Mogul, "Registration
Procedures for Message Header Fields", BCP 90, RFC 3864,
September 2004.
[RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed
Authoring and Versioning (WebDAV)", RFC 4918, June 2007.
[W3C.REC-P3P-20020416]
Marchiori, M., "The Platform for Privacy Preferences 1.0
(P3P1.0) Specification", W3C REC REC-P3P-20020416,
April 2002.
URIs
[1] <http://lists.w3.org/Archives/Public/www-talk/>
Appendix A. Acknowledgements
We would like to acknowledge the contributions of everyone who
provided feedback and use cases for this draft; in particular, Phil
Archer, Dirk Balfanz, Tim Bray, Paul Hoffman, Barry Leiba, Ashok
Malhotra, Breno de Medeiros, and John Panzer. The authors take all
responsibility for errors and omissions.
Appendix B. Frequently Asked Questions
B.1. Is this mechanism appropriate for all kinds of metadata?
No. The primary use cases are described in the introduction; when
it's necessary to discover metadata or policy before a resource is
accessed, and/or it's necessary to describe metadata for a whole
authority (or large portions of it), host-meta is appropriate. In
other cases (e.g., fine-grained metadata that doesn't need to be
known ahead of time), other mechanisms are more appropriate.
B.2. Why not use OPTIONS * with content negotiation to discover
different types of metadata directly?
Two reasons: a) OPTIONS is not cacheable -- a severe problem for
scaling -- and b) it is not well-supported in browsers, and difficult
to configure in servers.
B.3. Why not use a META tag or microformat in the root resource?
This constrains an authority's root resource to be HTML or a similar
format. While such formats are extremely common, they aren't universal
(e.g., mobile sites, machine-to-machine communication, etc.). Also,
some root resources are very large, which would place additional
overhead on clients and intervening networks.
B.4. Why not use response headers on the root resource, and have
clients use HEAD?
The headers on a root resource pertain to that resource, not the
whole site. While it is possible to mint new message headers that
apply to the whole site, such a header would need to be sent on every
response for the root resource, whether it was useful or not, with
the potential for substantially increasing the size of those
responses (which are often popular, and not very cacheable).
B.5. Why scope metadata to an authority?
The alternative is to allow scoping to be dynamic and determined
locally, but this has its own issues, which usually come down to a)
an unreasonable number of requests to determine authoritative
metadata, and b) increased complexity, with a higher likelihood of
implementation and interoperability (or even security) problems.
Besides, many mechanisms on the Web already presume a single
authority scope (e.g., robots.txt, P3P, cookies, javascript
security), and the effort and cost required to mint a new URI
authority is small and shrinking.
B.6. Why /host-meta?
It's short, descriptive and, according to search indices, not widely
used.
B.7. Aren't you concerned about pre-empting an authority's URI
namespace?
Yes, but it's unfortunately a necessary (and already present) evil;
this proposal tries to minimise future abuses.
B.8. Why use link relations instead of media types to identify kinds of
metadata?
A link relation declares the intent and use of the link (or inline
content, when present); a media type defines the format and
processing model for those bits.
B.9. What impact does this have on existing mechanisms, such as P3P and
robots.txt?
None, until they choose to use this mechanism.
B.10. Why not (insert existing similar mechanism here)?
We are aware that there are several existing proposals with similar
functionality. In our estimation, none have gained sufficient
traction. This may be because they were perceived to be too complex,
or tied too closely to one use case.
Appendix C. Document History
[[RFC Editor: please remove this section before publication.]]
o -01
* Changed "site-meta" to "host-meta" after feedback.
* Changed from XML to text-based header-like format.
* Removed capability for generic inline content.
* Added registry for host-meta fields.
* Clarified scope of metadata application.
* Added security consideration about HTTP vs. HTTPS, expanding
scope.
Authors' Addresses
Mark Nottingham
Email: mnot@mnot.net
URI: http://www.mnot.net/
Eran Hammer-Lahav
Email: eran@hueniverse.com
URI: http://hueniverse.com/