INTERNET DRAFT EXPIRES SEPT 1998 INTERNET DRAFT
Network Working Group Heiko W.Rupp
Experimental 8.3.1998
A Protocol for the Transmission of Net News Articles
over IP multicast.
<draft-rfced-exp-rupp-04.txt>
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."
To learn the current status of any Internet-Draft, please check
the "1id-abstracts.txt" listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
Distribution of this document is unlimited.
Abstract
Mcntp (Multicast News Transfer Protocol) provides a way to use the IP
multicast infrastructure to transmit NetNews articles between news
servers. Doing so reduces the bandwidth needed for the transmission
of articles, which today is mostly done via NNTP. It does not affect
how news reading clients communicate with servers.
Overview and Rationale
NetNews is bulk data that is produced in large quantities every day
around the world. On the Internet, NetNews is usually distributed
with NNTP [1]. To achieve fast and redundant distribution, many news
servers communicate with many others, thus imposing a higher load on
the underlying network than necessary.
Assume the following scenario:
+--------- R1
/
S -- A ------- B -------- R2
\
+--------- R3
A sender S that wants to transmit articles via NNTP to receivers
R1...R3 has to transmit them three times across the link from
A to B.
IP multicast [2] provides an efficient way to distribute datagrams to
groups of receivers on the Internet. With it, articles would traverse
the link from A to B only once, reducing the load on that link.
This cannot be done with existing news transfer technology, as it is
based on TCP [10], which cannot be multicast. The protocol described
in this memo is designed to put news articles into datagrams and
distribute them via IP multicast to receivers that are interested in
the specific newsgroup. For more information about NetNews, refer to
[1] and [7].
Protocol overview
This section shows how news articles are propagated with Mcntp.
Basically, three parties are involved:
+ A multicast directory service, MD, that coordinates the
assignment between multicast groups and newsgroups.
+ A multicast sender, MS, that sends news articles over an
IP multicast infrastructure.
+ A multicast receiver, MR, that gets packets from the IP multicast
infrastructure and processes them further.
The relationship between them can be pictured as follows:
directory +---------+ directory
+------------- > | | ----------------+
\|/ | MD | \|/
+---------+ | | +--------+
| | +---------+ | |
| MS | | MR |
| | ------------ articles -------> | |
+---------+ +--------+
MS and MR can be built into existing news server software or
implemented as separate processes that communicate with the news
servers (e.g. via NNTP); this does not matter to the protocol.
MD can be implemented either within MS or as a separate process that
communicates with MS. A practical approach is to have one MD per
sender host, so that communication between MD and MS is fast and
reliable while not too many resources are needed.
The protocol itself consists of two parts, which are presented in
the next two sections: the distribution of articles and the directory
service.
Packet format -- Distribution of articles
To send articles via IP multicast, they have to be encapsulated into
UDP packets. The following diagram shows how this can be done:
+---------------------------------------+
| Magic |
+----+----+----+----+---------+---------+
| Ver|Rev |Comp|Cryp|Reserved | Offset |
+----+----+----+----+---------+---------+
| Original length |
+---------------------------------------+
| Length as sent |
+---------------------------------------+
| Sender-ID |
+---------------------------------------+
| Message-id |
+---------------------------------------+
| Data |
+---------------------------------------+
All entries are in network byte order. The fields have the following
meaning and types:
+ Magic (32-bit): The String ``McNt''
+ Ver (4-bit): Protocol version -- currently 1
+ Rev (4-bit): Protocol revision -- currently 1
+ Comp (4-bit): Compression method used. Currently only two
methods are defined:
0 Article is not compressed
1 Article is compressed via zlib [8]
+ Cryp (4-bit): Encryption method used. See below
+ Reserved (8-bit): Reserved for future extensions.
+ Offset (8-bit): Offset of article data from packet start
+ Original length (32-bit): Length of article before compression,
encryption and signing.
+ Length as sent (including digital signature) (32-bit): Size of
the ``Data'' field (see below)
+ Sender-ID: Identification of the sender host terminated by a
null byte (see below).
+ Message-ID: The message id of the article in the form it is
defined in RFC 1036 [7], terminated by a null byte.
+ Data: The signed article data after possible compression and
encryption.
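The layout above can be illustrated with a short Python sketch. It is
not a reference implementation: the names pack_article and
unpack_article are hypothetical, and the sketch assumes that Ver and
Comp occupy the high-order four bits of their bytes, that the
identifiers are ASCII, and that Offset counts bytes from the start of
the packet to the Data field. The memo does not state these details
explicitly.

```python
import struct

MAGIC = b"McNt"
# Magic, Ver/Rev, Comp/Cryp, Reserved, Offset, Original length, Length as sent
FIXED = struct.Struct("!4sBBBBII")

def pack_article(sender_id, message_id, data, original_length,
                 comp=0, cryp=0, ver=1, rev=1):
    """Build one article packet; `data` is the already signed payload."""
    ids = (sender_id.encode("ascii") + b"\0" +
           message_id.encode("ascii") + b"\0")
    offset = FIXED.size + len(ids)
    if offset > 0xFF:
        raise ValueError("header does not fit the 8-bit Offset field")
    fixed = FIXED.pack(MAGIC, (ver << 4) | rev, (comp << 4) | cryp,
                       0, offset, original_length, len(data))
    return fixed + ids + data

def unpack_article(packet):
    """Split a packet into its header fields and the Data field."""
    (magic, ver_rev, comp_cryp, _reserved, offset,
     original_length, sent_length) = FIXED.unpack_from(packet)
    if magic != MAGIC:
        raise ValueError("not an Mcntp article packet")
    sender_end = packet.index(b"\0", FIXED.size)
    msgid_end = packet.index(b"\0", sender_end + 1)
    data = packet[offset:offset + sent_length]
    if offset <= msgid_end or len(data) != sent_length:
        raise ValueError("inconsistent Offset or length")
    return {
        "ver": ver_rev >> 4, "rev": ver_rev & 0x0F,
        "comp": comp_cryp >> 4, "cryp": comp_cryp & 0x0F,
        "original_length": original_length,
        "sender_id": packet[FIXED.size:sender_end].decode("ascii"),
        "message_id": packet[sender_end + 1:msgid_end].decode("ascii"),
        "data": data,
    }
```

Because Offset is only 8 bits wide, the two identifiers together can
take at most 239 bytes in this sketch (255 minus the 16 fixed bytes).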
This memo does not specify an encryption method for the case that the
field ``Cryp'' is set to anything other than 0; the involved parties
(i.e. the senders of encrypted news and their receivers) have to
agree on the method they want to use. If both compression and
encryption are used, the article data is compressed first and the
result is then encrypted.
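The ordering of compression and encryption can be sketched in Python
as follows. The function names are hypothetical, and the encryption
step is passed in as a plain callable because the memo leaves the
method open; signing is a separate, later step and is not shown here.

```python
import zlib

COMP_NONE, COMP_ZLIB = 0, 1

def encode_payload(article, compress=True, encrypt=None):
    """Return (comp, cryp, original_length, payload) for one article.

    `encrypt` is an optional (cryp_id, function) pair that sender and
    receivers have agreed on out of band.
    """
    payload, comp, cryp = article, COMP_NONE, 0
    if compress:
        payload, comp = zlib.compress(article), COMP_ZLIB  # RFC 1950 stream
    if encrypt is not None:
        cryp, encrypt_fn = encrypt
        payload = encrypt_fn(payload)      # always applied after compression
    return comp, cryp, len(article), payload

def decode_payload(comp, cryp, original_length, payload, decrypt_fn=None):
    """Undo encode_payload: decrypt first, then decompress."""
    if cryp != 0:
        if decrypt_fn is None:
            raise ValueError("no decryption method agreed for Cryp=%d" % cryp)
        payload = decrypt_fn(payload)
    if comp == COMP_ZLIB:
        payload = zlib.decompress(payload)
    elif comp != COMP_NONE:
        raise ValueError("unknown compression method %d" % comp)
    if len(payload) != original_length:
        raise ValueError("length does not match the Original length field")
    return payload
```

Compressing before encrypting matters in practice: well-encrypted data
looks random and would no longer compress.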
All articles must be signed before they are sent onto the net. This is
accomplished by running the RIPEMD-160 message digest algorithm [11]
over the (possibly compressed and encrypted) article and then
RSA-encrypting the message digest with the private key that belongs
to the sender-id. The receiver decrypts the signature of the article
with the public key of the sender-id and runs RIPEMD-160 over the
data to see whether it has been altered on the way. An article with
an invalid signature or a non-matching message digest has to be
discarded. The sender-id can be the path entry or the hostname of the
sending site; there can also be more than one key pair per site, e.g.
to have different keys for different newsgroups. The sender-id has to
be treated in a case-insensitive manner.
The message digest is encrypted in the following way. The 20-byte
RIPEMD-160 message digest and the first 28 bytes of the (possibly
compressed and encrypted) message are concatenated to form a 48-byte
buffer. This buffer is then encrypted with the appropriate RSA
private key and prepended to the original message without its first
28 bytes:
+----------------+---------------------------------+
| Message digest | message |
+----------------+-----------+---------------------+
1 28 n
| \
+-----------------------+----------------------------------+
| Signature | message without first 28 Bytes |
+-----------------------+----------------------------------+
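The construction of the Data field shown in the diagram can be
sketched in Python. The function names are hypothetical. The RSA
operations are passed in as callables so that no particular RSA
library is implied, and the receiver is assumed to know the signature
length from the sender's key size. Whether hashlib offers
``ripemd160'' depends on the local OpenSSL build, and the handling of
messages shorter than 28 bytes is an assumption, as the memo does not
cover that case.

```python
import hashlib

DIGEST_LEN = 20      # RIPEMD-160 output size in bytes
PREFIX_LEN = 28      # message bytes folded into the signature buffer

def ripemd160(data):
    # May raise ValueError if the local OpenSSL build lacks RIPEMD-160.
    return hashlib.new("ripemd160", data).digest()

def sign_message(message, rsa_private_op, digest_fn=ripemd160):
    """Return the Data field: signature followed by message[28:]."""
    if len(message) < PREFIX_LEN:
        raise ValueError("message shorter than 28 bytes")
    buffer = digest_fn(message) + message[:PREFIX_LEN]   # 20 + 28 = 48 bytes
    return rsa_private_op(buffer) + message[PREFIX_LEN:]

def verify_message(data, signature_len, rsa_public_op, digest_fn=ripemd160):
    """Recover the message from a Data field, or raise ValueError."""
    buffer = rsa_public_op(data[:signature_len])
    if len(buffer) != DIGEST_LEN + PREFIX_LEN:
        raise ValueError("invalid signature")
    digest, prefix = buffer[:DIGEST_LEN], buffer[DIGEST_LEN:]
    message = prefix + data[signature_len:]
    if digest_fn(message) != digest:
        raise ValueError("message digest does not match")
    return message
```

Folding the first 28 bytes of the message into the signed buffer means
that a receiver without the sender's public key cannot reconstruct the
complete message.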
To send an article off, it is encapsulated and then just sent to the
appropriate multicast group. There is no feedback from the receiver
to the sender when an article is received.
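Sending a finished packet takes only a few lines with the BSD socket
interface, here via Python's socket module. The function names are
hypothetical, and the sketch covers IPv4 only; the group address,
port and TTL would come from the directory service.

```python
import socket

MAX_UDP_PAYLOAD = 65507      # 65535 minus IPv4 and UDP headers

def open_sender(ttl):
    """Create a UDP socket whose multicast packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

def send_article(sock, packet, group, port):
    """Fire and forget: one datagram per article, no acknowledgement."""
    if len(packet) > MAX_UDP_PAYLOAD:
        raise ValueError("article does not fit into one UDP packet")
    return sock.sendto(packet, (group, port))
```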
Directory service
To establish the relation between newsgroups and multicast groups, a
directory service exists; it has been referred to as MD above. When
a sender MS wants to propagate a newsgroup, it asks the directory
service for a multicast group it can use to distribute articles,
waits for the reply, and starts to send. The directory server
registers this group in its tables and periodically distributes the
table over IP multicast. For this purpose, the multicast group
``mcntp-directory.mcast.net'' has officially been assigned by the
IANA. The UDP port to which announcements are sent has officially
been assigned by the IANA as UDP port number 5418 with the name
``mcntp''.
Announcements should not be sent too often, to keep traffic low, but
often enough that new receivers don't have to wait too long before
they can receive articles. Once a minute is assumed to be a good
value. Announcements can be sent less often if they are transmitted
immediately after a change in the directory.
If more than one directory server is involved (e.g. if there is more
than one sender site), each directory server has to listen for
announcement packets on ``mcntp-directory.mcast.net''. If a server
does not receive a packet within five times the announcement period
(e.g. five minutes), it can consider itself alone on the net and can
choose the multicast groups as it wishes. The usage scenarios below
explain this further.
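The ``alone on the net'' rule amounts to a simple timeout. A minimal
Python sketch, with a hypothetical class name and the one-minute
period suggested above as the default:

```python
ANNOUNCE_PERIOD = 60.0        # seconds; "once a minute"
ALONE_FACTOR = 5              # five announcement periods

class DirectoryPeerWatch:
    """Tracks when another directory server was last heard."""

    def __init__(self, started_at, period=ANNOUNCE_PERIOD):
        self.period = period
        self.last_heard = started_at      # start of the listening window

    def announcement_received(self, now):
        self.last_heard = now

    def is_alone(self, now):
        """True once no foreign announcement arrived for five periods."""
        return now - self.last_heard >= ALONE_FACTOR * self.period
```

Timestamps are passed in explicitly (for example from
time.monotonic()) so that the logic does not depend on the wall clock.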
Groups that are local to an organisation (e.g. an ISP), or that
should otherwise stay within its bounds, must be transported within
the range of the administratively scoped multicast addresses [12].
When a receiver (MR) wants to receive a newsgroup, it listens on
``mcntp-directory.mcast.net'' for announcements, parses them, and
then joins the appropriate multicast groups.
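How a receiver might select the groups to join can be sketched in
Python. The names are hypothetical; the sketch assumes that each
announced newsgroup entry is a single shell-style pattern such as
``comp.*'' and that announcements have already been parsed into
tuples.

```python
import fnmatch
import socket
import struct

def groups_to_join(entries, wanted_newsgroups):
    """Pick the (group, port) pairs that carry any wanted newsgroup.

    `entries` are (multicast_group, port, ttl, newsgroup_pattern)
    tuples taken from a directory announcement.
    """
    selected = set()
    for group, port, _ttl, pattern in entries:
        if any(fnmatch.fnmatchcase(name, pattern)
               for name in wanted_newsgroups):
            selected.add((group, port))
    return sorted(selected)

def membership_request(group, interface="0.0.0.0"):
    """IPv4 argument for setsockopt(IPPROTO_IP, IP_ADD_MEMBERSHIP, ...)."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))
```

For each selected pair the receiver would bind a UDP socket to the
port and pass the membership request to setsockopt() to join the
group.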
Multicast groups that are no longer in use (e.g. because the sender
has stopped working) must be removed from the announcement.
The format of those announcement packets is:
+-----------+------+-----+--------+
| Magic | Vers | Rev | Offset |
+-----------+------+-----+--------+
| Length |
+---------------------------------+
| rmd160 |
+---------------------------------+------+
| Sender-ID | pad1 |
+-----------------+------+---+-----------+---+ -+
| Multicast group | Port |TTL| Newsgroup |pad| |
+-----------------+------+---+-----------+---+ |
... repeat ... | NG lines
+-----------------+------+---+-----------+---+ |
| Multicast group | Port |TTL| Newsgroup |pad| |
+-----------------+------+---+-----------+---+ -+
All numbers are in network byte order. The fields have the following
meaning and types:
+ Magic (16-bit): The Bytes 0xabba.
+ Vers (4-bit): Protocol version (see below).
+ Rev (4-bit): Protocol revision (currently 1).
+ Offset (8-bit): Offset of NG-lines from packet start.
+ Length (32-bit): Total packet length.
+ rmd160 (160-bit): RIPEMD-160 message digest over the rest of
the packet.
+ Sender-ID : Identification of sender host, terminated by a
null byte (see below).
+ Pad1: Padding to next 4-Byte boundary filled with null
bytes.
+ Multicast group (32-bit *): The associated multicast group.
+ Port (16-bit): UDP Port to use for this group.
+ TTL (8-bit): Time to live for multicast packets.
+ Newsgroup: Name of the Newsgroup(s), terminated by a null
byte. See also below.
+ Pad: Padding of the string to the next 4-byte boundary,
filled with null bytes.
The protocol version (Vers) is currently 1 for IPv4 and 11 for IPv6.
The multicast group field (*) is 32 bits in size for IPv4 and 128
bits for IPv6.
The length field is 32 bits in size to support IPv6 jumbo datagrams.
The sender-ID is normally the fully qualified domain name of the
host that sends the announcement. As is common practice with
NetNews, this can also be the (possibly shorter) entry that the host
puts in the ``Path:'' header when an article passes through it. This
entry has to be treated in a case-insensitive manner.
The rmd160 is computed over the sender-id field and all lines with
newsgroup to multicast group relations in the packet with the
RIPEMD-160 message digest algorithm.
The lines with newsgroup to multicast group relations are repeated as
often as needed to announce all groups. The TTL can be used by
clients to find out if packets that come from this source can reach
them, or if the sender is too far away. Note that all entries have
to fit into one UDP packet.
The sender-id and the newsgroup entries are padded to the next
4-byte boundary in order to make processing easier.
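The announcement format can be sketched in Python for the IPv4 case
(Vers 1). This is an illustration under several assumptions that the
memo leaves open: Vers occupies the high-order four bits of its byte,
padding aligns fields relative to the start of the packet, and the
digest covers everything that follows the rmd160 field, padding
included. The function names are hypothetical, and the availability
of ``ripemd160'' in hashlib depends on the local OpenSSL build.

```python
import hashlib
import socket
import struct

HEAD = struct.Struct("!HBBI")      # Magic, Vers/Rev, Offset, Length
DIGEST_LEN = 20

def ripemd160(data):
    return hashlib.new("ripemd160", data).digest()

def _pad4(raw):
    """Append null bytes up to the next 4-byte boundary."""
    return raw + b"\0" * (-len(raw) % 4)

def pack_announcement(sender_id, entries, digest_fn=ripemd160, vers=1, rev=1):
    """entries: iterable of (multicast_group, port, ttl, newsgroup)."""
    sender = _pad4(sender_id.encode("ascii") + b"\0")
    lines = b""
    for group, port, ttl, newsgroup in entries:
        lines += _pad4(socket.inet_aton(group) +
                       struct.pack("!HB", port, ttl) +
                       newsgroup.encode("ascii") + b"\0")
    offset = HEAD.size + DIGEST_LEN + len(sender)
    if offset > 0xFF:
        raise ValueError("sender-id too long for the 8-bit Offset field")
    body = sender + lines
    head = HEAD.pack(0xABBA, (vers << 4) | rev, offset, offset + len(lines))
    return head + digest_fn(body) + body

def parse_announcement(packet, digest_fn=ripemd160):
    """Return (sender_id, entries) or raise ValueError."""
    magic, _vers_rev, offset, length = HEAD.unpack_from(packet)
    if magic != 0xABBA or length != len(packet):
        raise ValueError("not a valid announcement packet")
    body = packet[HEAD.size + DIGEST_LEN:]
    if digest_fn(body) != packet[HEAD.size:HEAD.size + DIGEST_LEN]:
        raise ValueError("message digest does not match")
    sender_id = body[:body.index(b"\0")].decode("ascii")
    entries, pos = [], offset
    while pos < length:
        group = socket.inet_ntoa(packet[pos:pos + 4])
        port, ttl = struct.unpack_from("!HB", packet, pos + 4)
        end = packet.index(b"\0", pos + 7)
        entries.append((group, port, ttl, packet[pos + 7:end].decode("ascii")))
        pos = end + 1
        pos += -pos % 4                 # skip padding to the next boundary
    return sender_id, entries
```

With a 28-byte fixed part, every NG line starts on a 4-byte boundary
under these assumptions.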
TTL values of articles have to be chosen, especially for use on the
MBONE, so that newsgroups that are only of local relevance
(e.g. campus groups or groups local to a town) are not distributed
beyond their normal distribution area. As already mentioned above,
articles that are only of local relevance must be distributed within
the administratively scoped group range [12].
The relation between multicast groups and newsgroups can range from
one multicast group per newsgroup, through one multicast group per
news hierarchy (e.g. comp.*), to all articles in a single group. As
current implementations of kernels and routers become inefficient
with too many multicast groups, the use of one multicast group per
newsgroup is deprecated.
Reliability Considerations
As UDP is an unreliable service, provisions for the reliable
distribution of articles are needed. There are some approaches to
reliable multicast (XTP [4], KLG [5], RMTP [6] and others), all of
which suffer from one problem or another. Specifically, they need
additional hardware or software and usually require kernel
modifications.
As there is already a reliable transport of NetNews via NNTP, there
is no need for a reliable transport via IP multicast. Articles need
not arrive in order, so it is no problem if one is missing from the
multicast: lost or missing articles can easily be transmitted via an
additional NNTP feed.
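On the receiving side, combining the two transports comes down to
remembering which Message-IDs have already arrived. The following
Python sketch uses hypothetical names; an actual news server would
consult its own history database instead of an in-memory set.

```python
class ArticleHistory:
    """Remembers Message-IDs so that each article is stored only once."""

    def __init__(self):
        self._seen = set()

    def has(self, message_id):
        return message_id in self._seen

    def accept(self, message_id):
        """Return True if the article is new, whatever transport sent it."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True

def handle_multicast(history, message_id):
    # Called for every Mcntp packet that passed the signature check.
    return "store" if history.accept(message_id) else "drop duplicate"

def handle_nntp_offer(history, message_id):
    # Called when a peer offers an article over the additional NNTP feed;
    # only articles that the multicast missed are requested.
    return "refuse" if history.has(message_id) else "request"
```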
As UDP packets can be at most 64 kBytes in size and every Mcntp
packet has to fit in one UDP packet, there is no provision for
distributing news articles larger than about 63 kBytes (other
than compressing them). This does not matter much in practice, as
recent research has shown that more than 95% of all news articles
are smaller than 64 kBytes [9]. The remaining 5% can still be
transferred via NNTP. Some hosts may have problems receiving UDP
packets as large as 64 kBytes, so in practical use a limit of
16 kBytes on the article size would be appropriate. This still covers
over 90% of all articles.
Usage Scenarios
These scenarios show how mcntp can be used in day-to-day operation.
The main difference between local and MBone-wide usage lies in the
multicast groups that are used for distribution, as stated above.
For local use within an organisation there could be one central
sending site that redistributes all news articles it receives via
mcntp. No further action is needed.
When more than one directory server (MD) is involved, each directory
server must wait on startup for announcement packets from other MD
processes, register the groups they contain in its tables, and make
decisions based on those tables. The possible decisions are the
following:
Use
If the group in which the sender (MS) wants to send is already
distributed over multicast, then the articles are distributed in
the existing group else a new multicast group is used. For
example: if de.* is already distributed over multicast group
a.b.c.d then use that group.
New
Always create new multicast groups that don't clash with the ones
that already exist. For example: if comp.* is already
distributed on group a.b.c.e and the sender (MS) wants to
distribute comp.foo, don't use group a.b.c.e, but create a new
one.
Standby
Only send articles for a specified newsgroup when no one else is
doing so. This can be used to implement backup functionality. For
example: sender A is sending comp.*. If a directory packet that no
longer contains comp.* arrives at site B, B can start to send
comp.*. When it sees announcements from A again, it stops the
distribution of comp.*.
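The three decisions can be sketched as one Python function. All names
are hypothetical: the table is assumed to map announced newsgroup
patterns to multicast groups learned from other MDs, and the address
pool in the example allocator is taken from the administratively
scoped range purely for illustration.

```python
import fnmatch

def find_group(table, newsgroup):
    """Return the multicast group already announced for `newsgroup`."""
    for pattern, group in table.items():
        if fnmatch.fnmatchcase(newsgroup, pattern):
            return group
    return None

def decide(policy, newsgroup, table, allocate):
    """Return the multicast group MS should send to, or None to stay silent.

    `allocate(table)` must return a group that is not yet in use.
    """
    existing = find_group(table, newsgroup)
    if policy == "use":
        return existing if existing is not None else allocate(table)
    if policy == "new":
        return allocate(table)
    if policy == "standby":
        return None if existing is not None else allocate(table)
    raise ValueError("unknown policy: %r" % policy)

def allocate_from_pool(table, pool=("239.255.0.1", "239.255.0.2",
                                    "239.255.0.3")):
    """Example allocator: first pool address not found in the table."""
    used = set(table.values())
    for candidate in pool:
        if candidate not in used:
            return candidate
    raise RuntimeError("no free multicast group")
```

In the standby case the table shrinks when the other sender's
announcements stop, at which point decide() starts to return a group.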
For use in an environment where multiple organisations are involved
(e.g. on the MBone), the following could be deployed: every
participant utilizes the ``use'' method described above. Each site
only sends articles that are produced locally (e.g. by customers) and
that are not distributed via mcntp by another site. No articles
received from news peers should be distributed that way. After some
delay (at least 10 seconds), articles that are distributed via mcntp
are offered to peers over NNTP as usual. The set of groups that is
distributed must be negotiated between the involved organisations.
With the current Usenet groups this could be:
- rec.*
- comp.*,news.*,gnu.*
- talk.*,misc.*
- humanities.*,sci.*,soc.*
- alt.*
Usage over Satellite connections
While terrestrial bandwidth is cheap in some regions of the world,
there are other regions where this is not the case. Those regions
can, however, be reached by satellite beams. There are already some
NetNews-over-satellite mechanisms in place, which often use their
own proprietary protocols. With mcntp, transport of articles can use
the same equipment that is in place for IP communications. A possible
setup can be found in [13]. This setup also has the advantage that no
backchannel is needed, which allows the use of small and cheap
antennas on the receiver side.
Summary
The distribution of NetNews articles via IP multicast can be a way to
decrease the network bandwidth used to distribute them. Articles are
delivered quickly via an unreliable protocol; later, the holes are
filled via a reliable, already existing protocol. Compression of
articles can further reduce the network load. With encryption,
private newsgroups can be established on a public IP multicast
infrastructure. A prototype of a reference implementation [13]
already shows that Mcntp is fast and can be used as an alternative to
classical transports. Using zlib to compress articles reduces the
transferred volume (including protocol headers) to about 65% of the
original article volume.
In cooperation with Orion Network Systems [14], mcntp has proven its
use for distribution of NetNews over a unidirectional satellite
connection.
Security Considerations
With the classical NNTP based distribution, every host on the path of
an article keeps track of it in the logfiles, making it possible to
find the sender of forged or abusive articles with the aid of the
administrators of the newshosts along the path. For the distribution
of NetNews over IP multicast, this is no longer true: routers don't
log packets flowing by and as the sender address of IP packets can be
forged, a sender can't be traced. This fact can be used to inject
forged news articles without being traceable.
To prevent the unnoticed injection of articles, an mcntp receiver
only accepts articles from senders that it trusts. This trust is
built by digitally signing the article with the private key of the
sender and verifying the signature at the receiver site. Receivers
must accept only articles with good signatures.
The RIPEMD-160 message digest algorithm has been chosen as it is
more secure than MD5 while still being fast enough. The RSA
encryption algorithm has been chosen as reference implementations
exist for use inside the US (from RSA Inc.) and outside
(rsaeuro by J.S.A.Kapp).
The key size for the RSA algorithm must be at least 512 bits to
prevent cracking of the key.
References
[1] RFC 977 -- B. Kantor, P. Lapsley, "Network News Transfer Protocol:
A Proposed Standard for the Stream-Based Transmission of News".
[2] RFC 1112 -- S. Deering, "Host extensions for IP multicasting",
08/01/1989.
[3] RFC 768 -- J. Postel, "User Datagram Protocol", 08/28/1980.
[4] XTP -- W.T. Strayer, B.J. Dempsey, A.C. Weaver, "XTP: The Xpress
Transfer Protocol", Addison-Wesley.
[5] KLG -- M. Hofmann, "Zuverlaessige Kommunikation in heterogenen
Netzen", Thesis at "Institut fuer Telematik, CS Dept. Univ
Karlsruhe"
[6] RMTP -- J.C. Lin, S. Paul, "RMTP: A Reliable Multicast
Transport Protocol".
[7] RFC 1036 -- M. Horton, R. Adams, "Standard for interchange of
USENET messages", 12/01/1987.
[8] RFC 1950 -- L. Deutsch, J. Gailly, "ZLIB Compressed Data Format
Specification version 3.3", 05/23/1996.
[9] http://www.pilhuhn.de/mcntp/histo/ -- Some Statistics about size
distribution of NetNews
[10] RFC 793 -- J. Postel, "Transmission Control Protocol", 09/01/1981.
[11] H. Dobbertin, A. Bosselaers, B. Preneel, "RIPEMD-160: A
Strengthened Version of RIPEMD" 04/18/1996. An earlier version
appeared in "Fast Software Encryption,LNCS 1039" Springer Verlag,
1996, pp. 71-82.
[12] [draft-ietf-mboned-admin-ip-space-04.txt/number of rfc] David
Meyer, "Administratively Scoped IP Multicast", [date of rfc]
[13] http://www.pilhuhn.de/mcntp/
[14] http://www.onsi.com/
Author's Address
Heiko W.Rupp
Gerwigstr. 5
D-76131 Karlsruhe
Phone: +49 721 9661524
EMail: hwr@pilhuhn.de