Network Working Group A. Bryan
Internet-Draft N. McNab
Intended status: Standards Track H. Nordstrom
Expires: April 16, 2010
A. Ford
Roke Manor Research
October 13, 2009
Metalink/HTTP: Mirrors and Checksums in HTTP Headers
draft-bryan-metalinkhttp-09
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 16, 2010.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
Bryan, et al. Expires April 16, 2010 [Page 1]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
Abstract
This document specifies Metalink/HTTP: Mirrors and Checksums in HTTP
Headers, an alternative to the Metalink XML-based download
description format. Metalink/HTTP describes multiple download
locations (mirrors), Peer-to-Peer, checksums, digital signatures, and
other information using existing standards for HTTP headers. Clients
can transparently use this information to make file transfers more
robust and reliable.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Operation Overview . . . . . . . . . . . . . . . . . . . . 4
1.2. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3. Notational Conventions . . . . . . . . . . . . . . . . . . 4
2. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Mirrors / Multiple Download Locations . . . . . . . . . . . . 5
3.1. Mirror Priority . . . . . . . . . . . . . . . . . . . . . 6
3.2. Mirror Geographical Location . . . . . . . . . . . . . . . 6
3.3. Coordinated Mirror Policies . . . . . . . . . . . . . . . 6
3.4. Mirror Depth . . . . . . . . . . . . . . . . . . . . . . . 7
4. Peer-to-Peer / Metainfo . . . . . . . . . . . . . . . . . . . 7
4.1. Metalink/XML Files . . . . . . . . . . . . . . . . . . . . 7
5. OpenPGP Signatures . . . . . . . . . . . . . . . . . . . . . . 8
6. Checksums of Whole Files . . . . . . . . . . . . . . . . . . . 8
7. Client / Server Multi-source Download Interaction . . . . . . 8
7.1. Error Prevention, Detection, and Correction . . . . . . . 10
7.1.1. Error Prevention (Early File Mismatch Detection) . . . 11
7.1.2. Error Correction . . . . . . . . . . . . . . . . . . . 12
8. Multi-server Performance . . . . . . . . . . . . . . . . . . . 12
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
10. Security Considerations . . . . . . . . . . . . . . . . . . . 13
10.1. URIs and IRIs . . . . . . . . . . . . . . . . . . . . . . 13
10.2. Spoofing . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.3. Cryptographic Hashes . . . . . . . . . . . . . . . . . . . 14
10.4. Signing . . . . . . . . . . . . . . . . . . . . . . . . . 14
11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
11.1. Normative References . . . . . . . . . . . . . . . . . . . 14
11.2. Informative References . . . . . . . . . . . . . . . . . . 15
Appendix A. Acknowledgements and Contributors . . . . . . . . . . 15
Appendix B. Comparisons to Similar Options (to be removed by
RFC Editor before publication) . . . . . . . . . . . 15
Appendix C. Document History (to be removed by RFC Editor
before publication) . . . . . . . . . . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17
Bryan, et al. Expires April 16, 2010 [Page 2]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
1. Introduction
Metalink/HTTP is an alternative representation of Metalink
information, which is usually presented as an XML-based document
format [draft-bryan-metalink]. Metalink/HTTP attempts to provide as
much functionality as the Metalink/XML format by using existing
standards such as Web Linking [draft-nottingham-http-link-header],
Instance Digests in HTTP [RFC3230], and ETags. Metalink/HTTP is used
to list information about a file to be downloaded. This can include
lists of multiple URIs (mirrors), Peer-to-Peer information,
checksums, and digital signatures.
Identical copies of a file are frequently accessible in multiple
locations on the Internet over a variety of protocols (such as FTP,
HTTP, and Peer-to-Peer). In some cases, users are shown a list of
these multiple download locations (mirrors) and must manually select
a single one on the basis of geographical location, priority, or
bandwidth. This distributes the load across multiple servers, and
should also increase throughput and resilience. At times, however,
individual servers can be slow, outdated, or unreachable, but this
can not be determined until the download has been initiated. Users
will rarely have sufficient information to choose the most
appropriate server, and will often choose the first in a list which
may not be optimal for their needs, and will lead to a particular
server getting a disproportionate share of load. The use of
suboptimal mirrors can lead to the user canceling and restarting the
download to try to manually find a better source. During downloads,
errors in transmission can corrupt the file. There are no easy ways
to repair these files. For large downloads this can be extremely
troublesome. Any of the number of problems that can occur during a
download lead to frustration on the part of users.
Some popular sites automate the process of selecting mirrors using
DNS load balancing, both to approximately balance load between
servers, and to direct clients to nearby servers with the hope that
this improves throughput. Indeed, DNS load balancing can balance
long-term server load fairly effectively, but it is less effective at
delivering the best throughput to users when the bottleneck is not
the server but the network.
This document describes a mechanism by which the benefit of mirrors
can be automatically and more effectively realized. All the
information about a download, including mirrors, checksums, digital
signatures, and more can be transferred in coordinated HTTP Headers.
This Metalink transfers the knowledge of the download server (and
mirror database) to the client. Clients can fallback to other
mirrors if the current one has an issue. With this knowledge, the
client is enabled to work its way to a successful download even under
Bryan, et al. Expires April 16, 2010 [Page 3]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
adverse circumstances. All this is done transparently to the user
and the download is much more reliable and efficient. In contrast, a
traditional HTTP redirect to a mirror conveys only extremely minimal
information - one link to one server, and there is no provision in
the HTTP protocol to handle failures. Furthermore, in order to
provide better load distribution across servers and potentially
faster downloads to users, Metalink/HTTP facilitates multi-source
downloads, where portions of a file are downloaded from multiple
mirrors (and optionally, Peer-to-Peer) simultaneously.
[[ Discussion of this draft should take place on IETF HTTP WG mailing
list at ietf-http-wg@w3.org or the Metalink discussion mailing list
located at metalink-discussion@googlegroups.com. To join the list,
visit http://groups.google.com/group/metalink-discussion . ]]
1.1. Operation Overview
Detailed discussion of Metalink operation is covered in Section 2;
this section will present a very brief, high-level overview of how
Metalink achieves its goals.
Upon connection to a Metalink/HTTP server, a client will receive
information about other sources of the same resource and a checksum
of the whole resource. The client will then be able to request
chunks of the file from the various sources, scheduling appropriately
in order to maximise the download rate.
1.2. Examples
A brief Metalink server response with checksum, mirrors, .metalink,
and OpenPGP signature:
Link: <http://www2.example.com/example.ext>; rel="duplicate"
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
Link: <http://example.com/example.ext.torrent>; rel="describedby";
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel="describedby";
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel="describedby";
type="application/pgp-signature"
Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
1.3. Notational Conventions
This specification describes conformance of Metalink/HTTP.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
Bryan, et al. Expires April 16, 2010 [Page 4]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
document are to be interpreted as described in BCP 14, [RFC2119], as
scoped to those conformance targets.
2. Requirements
In this context, "Metalink" refers to Metalink/HTTP which consists of
mirrors and checksums in HTTP Headers as described in this document.
"Metalink/XML" refers to the XML format described in
[draft-bryan-metalink].
Metalink servers are HTTP servers that use the Link header
[draft-nottingham-http-link-header] to present a list of mirrors of a
resource to a client. They MUST provide checksums of files via
Instance Digests in HTTP [RFC3230], whether requested or not. Mirror
and checksum information provided by the originating Metalink server
MUST be considered authoritative. Metalink servers and their
associated mirror servers MUST all share the same ETag policy (ETag
Synchronization), i.e. based on the file contents (checksum) and not
server-unique filesystem metadata. The emitted ETag MAY be
implemented the same as the Instance Digest for simplicity.
Mirror servers are typically FTP or HTTP servers that "mirror"
another server. That is, they provide identical copies of (at least
some) files that are also on the mirrored server. Mirror servers MAY
be Metalink servers. Mirror servers MUST support serving partial
content. Mirror servers SHOULD support Instance Digests in HTTP
[RFC3230]. HTTP mirror servers MUST share the same ETag policy as
the originating Metalink server.
Metalink clients use the mirrors provided by a Metalink server with
Link header [draft-nottingham-http-link-header]. Metalink clients
MUST support HTTP and MAY support FTP, BitTorrent, or other download
methods. Metalink clients MUST switch downloads from one mirror to
another if the mirror becomes unreachable. Metalink clients SHOULD
support multi-source, or parallel, downloads, where portions of a
file are downloaded from multiple mirrors simultaneously (and
optionally, from Peer-to-Peer sources). Metalink clients MUST
support Instance Digests in HTTP [RFC3230] by requesting and
verifying checksums. Metalink clients MAY make use of digital
signatures if they are offered.
3. Mirrors / Multiple Download Locations
Mirrors are specified with the Link header
[draft-nottingham-http-link-header] and a relation type of
"duplicate" as defined in Section 9.
Bryan, et al. Expires April 16, 2010 [Page 5]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
A brief Metalink server response with two mirrors only:
Link: <http://www2.example.com/example.ext>; rel="duplicate";
pri=1; pref=1
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
pri=2; geo="gb"; depth=1
[[Some organizations have many mirrors. Only send a few mirrors, or
only use the Link header if Want-Digest is used?]]
It is up to the server to choose how many Link headers to send. Such
a decision could be a hard-coded limit, a random selection, based on
file size, or based on server load.
3.1. Mirror Priority
Mirror servers are listed in order of priority (from most preferred
to least) or have a "pri" value, where mirrors with lower values are
used first.
This is purely an expression of the server's preferences; it is up to
the client what it does with this information, particularly with
reference to how many servers to use at any one time. A client MUST
respect the server's priority ordering, however.
[[Would it make more sense to use qvalue-style policies here, i.e.
q=1.0 through q=0.0 ?]]
3.2. Mirror Geographical Location
Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2
two letter country code for the geographical location of the physical
server the IRI is used to access. A client may use this information
to select a mirror, or set of mirrors, that are geographically near
(if the client has access to such information), with the aim of
reducing network load at inter-country bottlenecks.
3.3. Coordinated Mirror Policies
There are two types of mirror servers: preferred and normal.
Optimally, HTTP mirror servers will share the same ETag policy as the
Metalink server, provide Instance Digests, or both. These mirrors
are preferred, and make it possible to detect early on, before data
is transferred, if the file requested matches the desired file.
Preferred mirror servers MUST share the same ETag policy or MUST
support Instance Digests. Preferred HTTP mirror servers have a
"pref" value of 1.
Bryan, et al. Expires April 16, 2010 [Page 6]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
[[Suggestion to relax earlier MUSTs: In order for clients to identify
servers that have coordinated ETag policies, the ETag MUST begin with
"Metalink:", e.g.
ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="
]]
3.4. Mirror Depth
Some mirrors may mirror single files, whole directories, or multiple
directories.
Mirror servers MAY have a "depth" value, where "depth=0" is the
default. A value of 0 means ONLY that file is mirrored. A value of
1 means that file and all other files and subdirectories in the
directory are mirrored. A value of 2 means the directory above, and
all files and subdirectories, are mirrored.
A mirror with a depth value of 4:
Link: <http://www2.example.com/dir1/dir2/dir3/dir4/dir5/example.ext>;
rel="duplicate"; pri=1; pref=1; depth=4
Is the above example, 4 directories up are mirrored, from /dir2/ on
down.
4. Peer-to-Peer / Metainfo
Metainfo files, which describe ways to download a file over Peer-to-
Peer networks or otherwise, are specified with the Link header
[draft-nottingham-http-link-header] and a relation type of
"describedby" and a type parameter that indicates the MIME type of
the metadata available at the IRI.
A brief Metalink server response with .torrent and .metalink:
Link: <http://example.com/example.ext.torrent>; rel="describedby";
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel="describedby";
type="application/metalink4+xml"
4.1. Metalink/XML Files
Full Metalink/XML files for a given resource can be specified as
shown in Section 4. This is particularly useful for providing
metadata such as checksums of chunks, allowing a client to recover
Bryan, et al. Expires April 16, 2010 [Page 7]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
from partial errors (see Section 7.1.2).
5. OpenPGP Signatures
OpenPGP signatures are specified with the Link header
[draft-nottingham-http-link-header] and a relation type of
"describedby" and a type parameter of "application/pgp-signature".
A brief Metalink server response with OpenPGP signature only:
Link: <http://example.com/example.ext.asc>; rel="describedby";
type="application/pgp-signature"
6. Checksums of Whole Files
Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
files they describe with mirrors. Mirror servers SHOULD as well.
A brief Metalink server response with checksum:
Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
7. Client / Server Multi-source Download Interaction
Metalink clients begin a download with a standard HTTP [RFC2616] GET
request to the Metalink server. A Range limit is optional, not
required.
GET /distribution/example.ext HTTP/1.1
Host: www.example.com
The Metalink server responds with the data and these headers:
Bryan, et al. Expires April 16, 2010 [Page 8]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 14867603
Content-Type: application/x-cd-image
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: <http://www2.example.com/example.ext>; rel="duplicate"
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
Link: <http://example.com/example.ext.torrent>; rel="describedby";
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel="describedby";
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel="describedby";
type="application/pgp-signature"
Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
From the Metalink server response the client learns some or all of
the following metadata about the requested object, in addition to
also starting to receive the object:
o Mirror profile link.
o Instance Digest.
o Object size.
o ETag.
o Peer-to-peer information.
o Digital signature.
o Metalink/XML, which can include partial file checksums to repair a
file.
(Alternatively, the client could have requested a HEAD only, and then
skipped to making the following decisions on every available mirror
server found via the Link headers)
If the object is large and gets delivered slower than expected then
the Metalink client starts a number of parallel ranged downloads (one
per selected mirror server other than the first) using mirrors
provided by the Link header with "duplicate" relation type, using the
location of the original GET request in the "Referer" header field.
The size and number of ranges requested from each server is for the
client to decide, based upon the performance observed from each
server. Further discussion of performance considerations is
presented in Section 8.
If no Range limit was given in the original request then work from
the tail of the object (the first request is still running and will
eventually catch up), otherwise continue after the range requested in
the first request. If no Range was provided, the original connection
must be terminated once all parts of the resource have been
retrieved. It is recommended that a HEAD request is undertaken
Bryan, et al. Expires April 16, 2010 [Page 9]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
first, so that the client can find out if there are any Link headers,
and then Range-based requests are undertaken to the mirror servers as
well as on the original connection.
If ETags are coordinated between mirrors, If-Match conditions based
on the ETag SHOULD be used to quickly detect out-of-date mirrors by
using the ETag from the Metalink server response. If no indication
of ETag syncronisation/knowledge is given then If-Match should not be
used, and optimally there will be an Instance Digest in the mirror
response which we can use to detect a mismatch early, and if not then
a mismatch won't be detected until the completed object is verified.
One of the client requests to a mirror server:
GET /example.ext HTTP/1.1
Host: www2.example.com
Range: bytes=7433802-
If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Referer: http://www.example.com/distribution/example.ext
The mirror servers respond with a 206 Partial Content HTTP status
code and appropriate "Content-Length" and "Content Range" header
fields. The mirror server response, with data, to the above request:
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 7433801
Content-Range: bytes 7433802-14867602/14867603
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=
If the first request was not Range limited then abort it by closing
the connection when it catches up with the other parallel downloads
of the same object.
Downloads from mirrors that do not have the same file size as the
Metalink server MUST be aborted.
Once the download has completed, the Metalink client MUST verify the
checksum of the file.
7.1. Error Prevention, Detection, and Correction
Error prevention, or early file mismatch detection, is possible
before file transfers with the use of file sizes, ETags, and Instance
Digests. Error dectection requires Instance Digests, or checksums,
to determine after transfers if there has been an error. Error
correction, or download repair, is possible with partial file
checksums.
Bryan, et al. Expires April 16, 2010 [Page 10]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
7.1.1. Error Prevention (Early File Mismatch Detection)
In HTTP terms, the requirement is that merging of ranges from
multiple responses must be verified with a strong validator, which in
this context is the same as either Instance Digest or a strong ETag.
In most cases it is sufficient that the Metalink server provides
mirrors and Instance Digest information, but operation will be more
robust and efficient if the mirror servers do implement a
synchronized ETag as well. In fact, the emitted ETag may be
implemented the same as the Instance Digest for simplicity, but there
is no need to specify how the ETag is generated, just that it needs
to be shared among the mirror servers. If the mirror server provides
neither synchronized ETag or Instance Digest, then early detection of
mismatches is not possible unless file length also differs. Finally,
the error is still detectable, after the download has completed, when
the merged response is verified.
ETag can not be used for verifying the integrity of the received
content. But it is a guarantee issued by the Metalink server that
the content is correct for that ETag. And if the ETag given by the
mirror server matches the ETag given by the master server, then we
have a chain of trust where the master server authorizes these
responses as valid for that object.
This guarantees that a mismatch will be detected by using only the
synchronized ETag from a master server and mirror server, even
alerted by the mirror servers themselves by responding with an error,
preventing accidental merges of ranges from different versions of
files with the same name. This even includes many malicious attacks
where the data on the mirror has been replaced by some other file,
but not all.
Synchronized ETag can not strictly protect against malicious attacks
or server or network errors replacing content, but neither can
Instance Digest on the mirror servers as the attacker most certainly
can make the server seemingly respond with the expected Instance
Digest even if the file contents have been modified, just as he can
with ETag, and the same for various system failures also causing bad
data to be returned. The Metalink client has to rely on the Instance
Digest returned by the Metalink master server in the first response
for the verification of the downloaded object as a whole.
If the mirror servers do return an Instance Digest, then that is a
bonus, just as having them return the right set of Link headers is.
The set of trusted mirrors doing that can be substituted as master
servers accepting the initial request if one likes.
The benefit of having slave mirror servers (those not trusted as
Bryan, et al. Expires April 16, 2010 [Page 11]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
masters) return Instance Digest is that the client then can detect
mismatches early even if ETag is not used. Both ETag and slave
mirror Instance Digest do provide value, but just one is sufficient
for early detection of mismatches. If none is provided then early
detection of mismatches is not possible unless the file length also
differs, but the error is still detected when the merged response is
verified.
7.1.2. Error Correction
If the object checksum does not match the Instance Digest then fetch
the Metalink/XML or other recovery profile link, where partial file
checksums can be found, allowing detection of which server returned
bad information. If the Instance Digest computation does not match
then the client needs to fetch the partial file checksums and from
there figure out what of the downloaded data can be recovered and
what needs to be fetched again. If no partial checksums are
available, then the client MUST fetch the complete object from a
trusted Metalink server.
Partial file checksums can be used to detect errors during the
download.
8. Multi-server Performance
When opting to download simultaneously from multiple mirrors, there
are a number of factors (both within and outside the influence of the
client software) that are relevant to the performance achieved:
o The number of servers used simultaneously.
o The ability to pipeline sufficient or sufficiently large range
requests to each server so as to avoid connections going idle.
o The ability to pipeline sufficiently few or sufficiently small
range requests to servers so that all the servers finish their
final chunks simultaneously.
o The ability to switch between mirrors dynamically so as to use the
fastest mirrors at any moment in time
Obviously we do not want to use too many simultaneous connections, or
other traffic sharing a bottleneck link will be starved. But at the
same time, good performance requires that the client can
simultaneously download from at least one fast mirror while exploring
whether any other mirror is faster. Based on laboratory experiments,
we suggest a good default number of simultaneous connections is
probably four, with three of these being used for the best three
mirrors found so far, and one being used to evaluate whether any
other mirror might offer better performance.
Bryan, et al. Expires April 16, 2010 [Page 12]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
The size of chunks chosen by the client should be sufficiently large
that the chunk request headers and reponse headers represent neglible
overhead, and sufficiently large that they can be pipelined
effectively without needing a very high rate of chunk requests. At
the same time, the amount of time wasted waiting for the last chunk
to download from the last server after all the other servers have
finished should be minimized. Thus we currently recommend that a
chunk size of at least 10KBytes should be used. If the file being
transfered is very large, or the download speed very high, this can
be increased to perhaps 1MByte. As network bandwidths increase, we
expect these numbers to increase appropriately, so that the time to
transfer a chunk remains significantly larger than the latency of
requesting a chunk from a server.
9. IANA Considerations
Accordingly, IANA has made the following registration to the Link
Relation Type registry.
o Relation Name: duplicate
o Description: Refers to a resource whose available representations
are byte-for-byte identical with the corresponding representations of
the context IRI.
o Reference: This specification.
o Notes: This relation is for static resources. That is, an HTTP GET
request on any duplicate will return the same representation. It
does not make sense for dynamic or POSTable resources and should not
be used for them.
10. Security Considerations
10.1. URIs and IRIs
Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986]
and Section 8 of [RFC3987] for security considerations related to
their handling and use.
10.2. Spoofing
There is potential for spoofing attacks where the attacker publishes
Metalinks with false information. In that case, this could deceive
unaware downloaders that they are downloading a malicious or
worthless file. Also, malicious publishers could attempt a
Bryan, et al. Expires April 16, 2010 [Page 13]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
distributed denial of service attack by inserting unrelated IRIs into
Metalinks.
10.3. Cryptographic Hashes
Currently, some of the digest values defined in Instance Digests in
HTTP [RFC3230] are considered insecure. These include the whole
Message Digest family of algorithms which are not suitable for
cryptographically strong verification. Malicious people could
provide files that appear to be identical to another file because of
a collision, i.e. the weak cryptographic hashes of the intended file
and a substituted malicious file could match.
If a Metalink contains whole file hashes as described in Section 6,
it SHOULD include "sha" which is SHA-1, as specified in [RFC3174], or
stronger. It MAY also include other hashes.
10.4. Signing
Metalinks should include digital signatures, as described in
Section 5.
Digital signatures provide authentication, message integrity, and
non-repudiation with proof of origin.
11. References
11.1. Normative References
[ISO3166-1]
International Organization for Standardization, "ISO 3166-
1:2006. Codes for the representation of names of
countries and their subdivisions -- Part 1: Country
codes", November 2006.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC3174] Eastlake, D. and P. Jones, "US Secure Hash Algorithm 1
(SHA1)", RFC 3174, September 2001.
[RFC3230] Mogul, J. and A. Van Hoff, "Instance Digests in HTTP",
RFC 3230, January 2002.
Bryan, et al. Expires April 16, 2010 [Page 14]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005.
[draft-nottingham-http-link-header]
Nottingham, M., "Web Linking",
draft-nottingham-http-link-header-06 (work in progress),
July 2009.
11.2. Informative References
[draft-bryan-metalink]
Bryan, A., Ed., Tsujikawa, T., McNab, N., and P. Poeml,
"The Metalink Download Description Format",
draft-bryan-metalink-16 (work in progress), August 2009.
Appendix A. Acknowledgements and Contributors
Thanks to the Metalink community, Mark Handley, Mark Nottingham,
Daniel Stenberg, Tatsuhiro Tsujikawa, Peter Poeml, and Matt Domsch.
Support for simultaneous download from multiple mirrors is based upon
work by Mark Handley and Javier Vela Diago, who also provided
validation of the benefits of this approach.
Appendix B. Comparisons to Similar Options (to be removed by RFC Editor
before publication)
This draft, compared to the Metalink/XML format
[draft-bryan-metalink] :
o (+) Reuses existing HTTP standards without much new besides a Link
Relation Type. It's more of a collection/coordinated feature set.
o (?) The existing standards don't seem to be widely implemented.
o (+) No XML dependency, unless we use Metalink/XML for partial file
checksums.
o (+) Existing Metalink/XML clients can be easily converted to
support this as well.
o (+) Coordination of mirror servers is preferred, but not required.
Coordination may be difficult or impossible unless you are in
control of all servers on the mirror network.
Bryan, et al. Expires April 16, 2010 [Page 15]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
o (---) Requires changes to server software.
o (-?) Tied to HTTP, not as generic. FTP/P2P clients won't be
using it unless they also support HTTP, unlike Metalink/XML.
o (-) Requires server-side support. Metalink/XML can be created by
user (or server, but server component/changes not required).
o (-) Also, Metalink/XML files are easily mirrored on all servers.
Even if usage in that case is not as transparent, it still gives
access to users at all mirrors (FTP included) to all download
information with no changes needed to the server.
o (-) Not portable/archivable/emailable. Metalink/XML is used to
import/export transfer queues. Not as easy for search engines to
index?
o (-) No way to show mirror geographical location (yet).
o (-) Not as rich metadata.
o (-) Not able to add multiple files to a download queue or create
directory structure.
Appendix C. Document History (to be removed by RFC Editor before
publication)
[[ to be removed by the RFC editor before publication as an RFC. ]]
Known issues concerning this draft:
o Use of Link header to describe Mirrors. Only send a few mirrors
with Link header, or only send them if Want-Digest is used? Some
organizations have many mirrors.
o ADDRESSED IN THIS DRAFT: A way to differentiate between mirrors
that have synchronized ETags and those that don't.
o ADDRESSED IN THIS DRAFT: Do we want a way to show that whole
directories are mirrored, instead of individual files?
o Will we use Metalink/XML for partial file checksums? That would
add XML dependency to apps for an important feature.
o Do we need an official MIME type for .torrent files or allow
"application/x-bittorrent"?
-09 : October 12, 2009.
o Mirror location, coordination, and depth.
o Split HTTP Digest Algorithm Values Registration into
draft-bryan-http-digest-algorithm-values-update.
-08 : October 4, 2009.
o Clarifications.
-07 : September 29, 2009.
o Preferred mirror servers.
-06 : September 24, 2009.
Bryan, et al. Expires April 16, 2010 [Page 16]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
o Add Mismatch Detection, Error Recovery, and Digest Algorithm
values.
o Remove Content-MD5 and Want-Digest.
-05 : September 19, 2009.
o ETags, preferably matching the Instance Digests.
-04 : September 17, 2009.
o Temporarily remove .torrent.
-03 : September 16, 2009.
o Mention HEAD request, negotiate mirrors if Want-Digest is used.
-02 : September 6, 2009.
o Content-MD5 for partial file checksums.
-01 : September 1, 2009.
o Link Relation Type Registration: "duplicate"
-00 : August 24, 2009.
o Initial draft.
Authors' Addresses
Anthony Bryan
Pompano Beach, FL
USA
Email: anthonybryan@gmail.com
URI: http://www.metalinker.org
Neil McNab
Email: neil@nabber.org
URI: http://www.nabber.org
Henrik Nordstrom
Email: henrik@henriknordstrom.net
URI: http://www.henriknordstrom.net/
Bryan, et al. Expires April 16, 2010 [Page 17]
Internet-Draft Metalink/HTTP: Mirrors and Checksums October 2009
Alan Ford
Roke Manor Research
Old Salisbury Lane
Romsey, Hampshire SO51 0ZN
UK
Phone: +44 1794 833 465
Email: alan.ford@roke.co.uk
Bryan, et al. Expires April 16, 2010 [Page 18]