INTERNET-DRAFT                                         Maurizio Codogno
draft-codogno-mime-nntp8bit-00.txt                                CSELT
Expires: February 11, 1999                        Date: August 06, 1998


                 The MIME application/nntp8bit Content-type


   Status of this Memo

   This document is an Internet Draft; Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and Working Groups.  Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months.  They may be updated, replaced, or obsoleted by other
   documents at any time.  It is not appropriate to use Internet Drafts
   as reference material or to cite them other than as a "working draft"
   or "work in progress".

   Please check the abstract listing in each Internet Draft directory
   for the current status of this or any other Internet Draft.

   To view the entire list of current Internet-Drafts, please check
   the "1id-abstracts.txt" listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
   (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
   (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
   (US West Coast).

Abstract

   The application/nntp8bit content-type is proposed and defined as an
   efficient and simple way to transmit raw ("binary") data over an NNTP
   connection, taking into account the foreseeable limitations of that
   standard.

1.  Introduction

   Usenet News [NNTP, NEWS] is a very popular data transmission format:
   at the time of writing, there are tens of thousands of different
   discussion groups, and the traffic generated per site could be as
   much as 10 GB/day.

   The vast majority of the data consists of binary files (images,
   audio or video clips, software programs...), which make up as much
   as 90% of the global traffic. Unfortunately, the two main ways used
   to encode binary data, namely UUENCODE and MIME
   application/octet-stream with Content-Transfer-Encoding base64, add
   a 33% overhead to the size of the file sent.

   The new specification of the NNTP protocol now being worked out
   [NEWNNTP] requires an 8-bit-wide channel, and the companion new
   definition of the Usenet Message Format [USEFOR] does not object to
   the presence of 8-bit data. There is however a problem, which does not
Codogno                   Expires February 1999                 [Page 1]


Internet Draft            application/nntp8bit              August, 1998

   allow raw data to be sent directly: the body of an article cannot
   contain an ASCII NUL (0x00) character, and ASCII CR and LF (0x0d,
   0x0a) must appear together. Moreover, each line in the body must be
   at most 998 octets long, and must end with the CR-LF sequence (not
   counted in the 998-octet limit).
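
   The constraints just listed can be checked mechanically. The
   following is only a minimal sketch in C: the function name
   body_is_transmittable is hypothetical and not part of this
   proposal, and the sketch assumes that a body whose last line lacks
   the final CR-LF is to be rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the draft): returns 1 if buf[0..len)
 * satisfies the body constraints described in the text, 0 otherwise. */
int body_is_transmittable(const unsigned char *buf, size_t len)
{
    size_t line_len = 0;   /* octets on the current line, CRLF excluded */
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];

        if (c == 0x00)
            return 0;                      /* NUL is never allowed */
        if (c == 0x0d) {
            if (i + 1 >= len || buf[i + 1] != 0x0a)
                return 0;                  /* CR without a following LF */
            line_len = 0;                  /* CRLF closes the line */
            i += 2;
            continue;
        }
        if (c == 0x0a)
            return 0;                      /* LF without a preceding CR */
        if (++line_len > 998)
            return 0;                      /* line longer than 998 octets */
        i++;
    }
    /* Assumption: the last line must also be closed by CRLF. */
    return line_len == 0;
}
```

   Raw binary data will almost always fail this check, which is the
   reason why some coding is needed before transmission.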

   A rather simple way to cope with these limitations is to define a
   MIME Content Type which codes the data in such a way as to comply
   with them. This solution has been preferred to the definition of a
   new Content Transfer Encoding because it is simpler to get working:
   if a newsreader does not understand the format, it is possible to
   save the article and process it with an external filter.

2.  application/nntp8bit Registration Information

   The following form is copied from RFC 1590, Appendix A: registration
   of the new media type will be duly performed.


     To:  IANA@isi.edu
     Subject:  Registration of new Media Type content-type/subtype

     Media Type name:           application

     Media subtype name:        nntp8bit

     Required parameters:       Type, a media type/subtype

     Optional parameters:       Name, the name of the file

     Encoding considerations:   it must be encoded "8bit" or "binary".

     Security considerations:   NONE

     Published specification:   RFC-REL (this document).

     Person & email address to contact for further information:
                                Maurizio Codogno
                                CSELT CF/IM Dept.
                                Via G. Reiss Romoli, 274
                                I-10148 Torino TO
                                Italy
                                +39 011 228 6132
                                <mau@beatles.cselt.it>


3.  Definition of the coding

   Since it is expected that, at least in the beginning, the MIME type
   application/nntp8bit will not be widely deployed, the specification
   of the coding has deliberately been kept simple. Moreover, it can be
   assumed that most binary files sent over Usenet News are already
   compressed: therefore, it was thought simplest just to escape the
   offending characters. A single exception has been made: since
   someone may send uncompressed files, which appear to contain a
   large number of NUL characters, NUL is coded with a single octet.

   Since no chunk of data between CRLF pairs can be longer than 998
   octets, it is also necessary to add CRLF pairs in suitable places.
   The coding algorithm, written in pseudo-C, runs as follows:

   ----------------- cut ----------------------
   int nchar=0;    /* octets written on the current output line */
   int c;          /* int rather than char, so that EOF is distinct */
   const int NUL=0x00, CR=0x0d, LF=0x0a;
   const int X80=0x80, X81=0x81, X8A=0x8a, X8D=0x8d;

   while ((c=getchar()) != EOF) {
      if (c == NUL)                /* NUL gets a single-octet code */
         { printf("%c",X80); nchar++; }
      else if (c == CR)
         { printf("%c%c",X81,X8D); nchar+=2; }
      else if (c == LF)
         { printf("%c%c",X81,X8A); nchar+=2; }
      else if (c == X80)
         { printf("%c%c",X81,X80); nchar+=2; }
      else if (c == X81)           /* must differ from the X80 escape */
         { printf("%c%c",X81,X81); nchar+=2; }
      else
         { printf("%c",c); nchar++; }

      /* break the line before it can exceed 998 octets */
      if (nchar >= 997)
         { printf("%c%c",CR,LF); nchar=0; }
   }
   ----------------- cut ----------------------

   The decoding algorithm is the following:

   ----------------- cut ----------------------
   int c;          /* int rather than char, so that EOF is distinct */
   const int NUL=0x00, CR=0x0d, LF=0x0a;
   const int X80=0x80, X81=0x81, X8A=0x8a, X8D=0x8d;

   while ((c=getchar()) != EOF) {
      if (c == CR)
         c=getchar(); /* eat the LF of the inserted CRLF */
      else if (c == X80)
         printf("%c",NUL);
      else if (c == X81) {
         c=getchar(); /* get escaped char */
         if (c == X80) printf("%c",X80);
         else if (c == X81) printf("%c",X81);
         else if (c == X8A) printf("%c",LF);
         else if (c == X8D) printf("%c",CR);
      }
      else
         printf("%c",c);
   }
   ----------------- cut ----------------------

   Note that a real implementation should of course check for malformed
   input data, and return an appropriate error message.
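
   As an illustration of such checks, the following is a minimal
   sketch in C of a decoder which works on memory buffers and rejects
   malformed input. The function name nntp8bit_decode, its signature
   and the choice of -1 as the error value are assumptions made for
   this example only; they are not part of this specification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer-based decoder. Returns the number of octets
 * written to out, or -1 if the input is malformed or out is too small. */
long nntp8bit_decode(const unsigned char *in, size_t inlen,
                     unsigned char *out, size_t outcap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned char c = in[i++];
        unsigned char decoded;

        if (c == 0x0d) {                 /* line break added by the coder */
            if (i >= inlen || in[i] != 0x0a)
                return -1;               /* CR not followed by LF */
            i++;
            continue;
        }
        if (c == 0x0a || c == 0x00)
            return -1;                   /* never produced by the coder */
        if (c == 0x80) {
            decoded = 0x00;              /* single-octet code for NUL */
        } else if (c == 0x81) {
            if (i >= inlen)
                return -1;               /* truncated escape sequence */
            switch (in[i++]) {
            case 0x80: decoded = 0x80; break;
            case 0x81: decoded = 0x81; break;
            case 0x8a: decoded = 0x0a; break;
            case 0x8d: decoded = 0x0d; break;
            default:   return -1;        /* unknown escape sequence */
            }
        } else {
            decoded = c;                 /* every other octet is literal */
        }
        if (o >= outcap)
            return -1;                   /* output buffer exhausted */
        out[o++] = decoded;
    }
    return (long)o;
}
```

   A caller would typically treat a negative return value as a
   damaged article and refuse to store the partial result.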

   The overhead induced by this coding can be roughly measured as
   follows:

   - four octets out of 256 are coded with two octets, increasing
     the total size by 1.6% on average;
   - there are two extra octets every 997 or 998 octets, adding a
     further 0.2%;
   - there is the MIME header overhead, which is negligible for large
     files.

   It is therefore possible to code a typical article with about 2%
   overhead, rather than the 33% of UUENCODE or base64 encoding.
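
   The estimate above can be reproduced with a few lines of C. This is
   only a sketch: both function names are hypothetical, the first one
   mirrors the counting done by the coding algorithm of this section,
   and the second one assumes that all 256 octet values are equally
   likely, which is roughly true for compressed data only. The MIME
   header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in octets of the coded form of in[0..len),
 * counting escapes and inserted CRLF pairs exactly as the coder does. */
size_t nntp8bit_encoded_size(const unsigned char *in, size_t len)
{
    size_t total = 0, nchar = 0, i;

    assert(in != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        /* CR, LF, 0x80 and 0x81 take two octets, all the others one */
        size_t n = (c == 0x0d || c == 0x0a || c == 0x80 || c == 0x81) ? 2 : 1;

        total += n;
        nchar += n;
        if (nchar >= 997) {              /* the coder inserts CRLF here */
            total += 2;
            nchar = 0;
        }
    }
    return total;
}

/* Expected relative overhead for uniformly distributed octets:
 * 4/256 for the escapes plus about 2/997 for the line breaks. */
double nntp8bit_expected_overhead(void)
{
    return 4.0 / 256.0 + 2.0 / 997.0;
}
```

   Under these assumptions the expected overhead is 0.0156 + 0.0020,
   that is slightly less than 1.8%.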

4.  User Agent Requirements

   User agents that do not recognize application/nntp8bit shall, in
   accordance with [MIME], treat the entire entity as
   application/octet-stream. This is acceptable, since the data may
   then be saved as an external file which can be processed offline.

   MIME User Agents that recognize application/nntp8bit will decode the
   stream of data and present it to the user as a file with content
   defined in the Type parameter.
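
   To make the role of the parameters concrete, the following sketch
   in C builds the MIME header lines that a posting agent might emit
   for an application/nntp8bit entity. The function name
   format_nntp8bit_headers is hypothetical, and the lowercase spelling
   and quoting of the parameters are assumptions based on the usual
   MIME parameter syntax; the registration form only states that Type
   is required, Name is optional, and the encoding is "8bit" or
   "binary".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the header lines into out. Returns the
 * number of characters written, or -1 if out is too small. name may
 * be NULL, since that parameter is optional. */
int format_nntp8bit_headers(char *out, size_t outsz,
                            const char *type, const char *name)
{
    int n;

    assert(out != NULL && type != NULL);
    if (name != NULL)
        n = snprintf(out, outsz,
                     "Content-Type: application/nntp8bit;"
                     " type=\"%s\"; name=\"%s\"\r\n"
                     "Content-Transfer-Encoding: 8bit\r\n",
                     type, name);
    else
        n = snprintf(out, outsz,
                     "Content-Type: application/nntp8bit;"
                     " type=\"%s\"\r\n"
                     "Content-Transfer-Encoding: 8bit\r\n",
                     type);
    /* snprintf reports the length it needed; treat truncation as error */
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

   A receiving user agent would read the Type parameter back from
   such a header to decide how to present the decoded file.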


4.1 Recursion

   MIME is a recursive structure.  Hence one must expect an
   application/nntp8bit entity to contain other application/nntp8bit
   entities.  When an application/nntp8bit entity is being processed for
   display or storage, any enclosed application/nntp8bit entities shall
   be processed as though they were being stored.


5.  Further work

   It would be possible to define a way to process articles that are
   split before transmission because of their large size. Two possible
   ways to do this are:

   - add an optional MIME parameter which says which part of the file
     is being sent;
   - use an escape sequence "0x81 0xnn", with nn going from 01 to 79,
     at the beginning of the stream data to indicate which part is
     being sent.

   The latter system limits the size of the complete file being sent,
   but it is more compact.





6.  Security considerations

   It may be possible to prepare a coded stream which can execute
   malicious programs, if a newsreader cannot understand this MIME Media
   Type. It should however be noted that the specifications for Usenet
   messages would allow such a message anyway, so no new security issue
   is introduced.


7.  Acknowledgments

   [I hope someone in the USEFOR IETF group will help me!]
   The author, however, takes full responsibility for all errors
   contained in this document.

8.  References


[MIME]      Borenstein, N. and Freed, N., "MIME (Multipurpose Internet
            Mail Extensions): Mechanisms for Specifying and Describing
            the Format of Internet Message Bodies", June 1992, RFC 1341.

[NEWS]      Horton, M., Adams, R., "Standard for Interchange of USENET
            Messages", December 1987, AT&T Bell Labs and Center for
            Seismic Studies, RFC 1036.

[NEWNNTP]   Barber, S. "Network News Transport Protocol", work in
            progress, ftp://ds.internic.net/internet-drafts/draft-ietf-
            nntpext-base-04.txt

[NNTP]      Kantor, B., Lapsley, P., "Network News Transfer Protocol",
            February 1986, U.C. San Diego and U.C. Berkeley, RFC 977.

[USEFOR]    Ritter, D., N., "User Article Format", work in progress,
            ftp://ds.internic.net/internet-drafts/draft-ietf-usefor-
            article-01.txt


9.  Author's address

   Maurizio Codogno
   CSELT CF/IM Dept.
   Via G. Reiss Romoli, 274
   I-10148 Torino TO
   Italy
   +39 011 228 6132
   <mau@beatles.cselt.it>







