INTERNET-DRAFT                                                J. Salsman
Filename: <draft-salsman-www-device-upload-07.txt>         Cisco Systems
to be submitted to the W3C, HTML activity for forms          1 June 1999

               Form-based Device Input in HTML

Status of this Memo

   This draft extends an experimental protocol for the Internet
   community.  This draft does not specify an Internet standard
   of any kind.  Discussion and suggestions for improvement are
   requested.  Distribution of this memo is unlimited.

   This document is an Internet-Draft and is in full conformance
   with all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

1.  Abstract

   Currently, HTML forms allow the producer of the form to request
   information -- including files of data -- from the operator reading
   the form.  However, this capability is limited because HTML forms
   don't provide a way to ask the operator to submit input from
   arbitrary sources such as audio devices like microphones.  Since
   input and upload from various devices is a feature that will
   benefit many applications, this draft proposes an extension to the
   HTML INPUT TYPE=FILE form element specified in [RFC 1867] to allow
   information providers to express requests for uploads from audio
   and other devices uniformly.  This draft also discusses MIME audio
   data types that facilitate useful audio uploads, security
   considerations, audio usability and quality, and a backward
   compatibility strategy allowing new user agents to utilize HTML
   written with earlier proposals for audio input in mind.  The
   motivations include language instruction assistance, voice
   transcription, and high-quality transmission under low-bandwidth
   conditions.

2.  Introduction

   The following protocol extensions are defined by this memo:

   - a DEVICE attribute to be used in HTML with the INPUT element
   along with the TYPE=FILE attribute-value, which identifies the
   peripheral device from which the input file is to be taken.  The
   following nine device names are suggested in this memo:  microphone,
   mic, filesystem, files, camera, keyboard, scanner, serial, any.

   - an HTTP request header named Client-File-Maxlength, whose value is
   a decimal integer giving the number of bytes of input buffer
   available to the client for storage of input data for file upload.

   - two parameters for the MIME Content-Disposition header:

     - 'device' -- the lowercase value of the DEVICE attribute, or
     'unavailable' if the requested device is supported but unavailable,
     or 'unsupported' if the device is not supported or the MIME type(s)
     requested in the ACCEPT attribute are not supported.

     - 'alternates' -- a space-separated list of MIME types which
     indicates the types available from the requested device when the
     requested type(s) are unsupported.

   - and the TYPE=AUDIO HTML Input element extension, first published
   in 1993 by Dave Raggett within the HTML+ Discussion Document, and
   implemented before the introduction of [RFC 1867].

3.  Extensions

   Section 3.1 of [RFC 1867] provides for the presentation of an
   arbitrary "widget" to specify input for file uploads.  When an
   INPUT tag of type FILE is encountered with a DEVICE attribute, the
   associated value (such as MICROPHONE, or MIC) should select the use
   of a widget capable of buffering and editing real-time input (such
   as speech) from the specified device, instead of selecting a file.

   If an ACCEPT attribute also has a value in a device file input
   element, the browser may constrain the MIME type of uploaded data to
   match one of the types listed in the value of the ACCEPT attribute.
   If the value of the DEVICE attribute is FILESYSTEM
   (or FILES) then the INPUT element may be treated as usual according
   to [RFC 1867] except that the subset of files presented to the
   operator to choose from may be constrained by the specified list of
   MIME types instead of a pattern of file names or extensions.  Please
   note that without DEVICE=FILES the ACCEPT attribute will probably be
   treated as a filename pattern.

   If the value of the DEVICE attribute is ANY, the operator may be
   offered a choice of all available supported devices and files,
   restricted to the choices compatible with the MIME types specified
   in the ACCEPT attribute, if present.

   File-upload forms are submitted with ENCTYPE="multipart/form-data"
   -- an alternate FORM tag specification for sending MIME content.
   Each "part" of such a submission, representing the value of each
   input element in the submitted form, is given a Content-Disposition
   header, and in the case of a file upload may also have a
   Content-Type header representing the MIME type of the data being
   uploaded.  Files uploaded using the extensions in this memo
   SHOULD [RFC 2119] include a Content-Type header when the file type
   is known or can be accurately determined by the client browser.
   Since no original filename as specified in section 3.3 of [RFC 1867]
   will be available for arbitrary peripheral input, parameters of the
   'content-disposition: form-data' submission headers SHOULD include a
   'device' parameter with the lowercase value of the DEVICE attribute
   of the associated form input element, unless the device or MIME
   type(s) specified are unsupported in which case the value of the
   'device' header parameter may be 'unsupported', or unless the device
   is unavailable in which case the value SHOULD be 'unavailable'.
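
   As an illustration only, the rule above for choosing the 'device'
   parameter might be sketched on the client side as follows.  The
   function name and its arguments are hypothetical and not part of
   this memo, and the sketch assumes the field name contains no
   characters that need quoting.

```python
def disposition_header(field_name, device_attr, supported=True, available=True):
    """Build the Content-Disposition line for one multipart/form-data part.

    Hypothetical helper: 'supported' and 'available' stand for whatever
    checks the browser performs on the device and the requested types.
    """
    if not supported:
        # Device, or the MIME type(s) requested in ACCEPT, not supported.
        device = "unsupported"
    elif not available:
        # Device is supported but cannot be used right now.
        device = "unavailable"
    else:
        # Normal case: the lowercase value of the DEVICE attribute.
        device = device_attr.lower()
    return 'Content-Disposition: form-data; name="%s"; device="%s"' % (
        field_name, device)
```

   For example, disposition_header("SPEECH1", "MIC") yields a header
   whose 'device' parameter is "mic".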

   If the MIME types requested are unsupported, an additional parameter
   'alternates' may be included with a space-separated list of MIME
   types of the same content-type which may be supported as alternatives
   for the specified device.  The 'alternates' parameter is not intended
   as a content negotiation feature, but merely as a way for a server
   to log upload failures caused by the lack of type conversion
   facilities.  The content-disposition header parameter syntax is
   described in [RFC 2183], which, along with [RFC 1867], contains
   examples of the protocols extended by this memo.
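
   A server wishing to log such failures might read the two parameters
   roughly as sketched below.  This is a minimal sketch using the
   header parser from the Python standard library; the function name
   and the shape of the returned record are assumptions made for
   illustration.

```python
from email.parser import HeaderParser


def upload_status(part_headers):
    """Extract 'device' and 'alternates' from one part's header block."""
    msg = HeaderParser().parsestr(part_headers)
    device = msg.get_param("device", header="content-disposition")
    alternates = msg.get_param("alternates", failobj="",
                               header="content-disposition")
    return {
        "device": device,
        # Either reserved value means no usable data was uploaded.
        "failed": device in ("unsupported", "unavailable"),
        # 'alternates' is a space-separated list of MIME types.
        "alternates": alternates.split(),
    }
```

   The 'alternates' list is only recorded here; consistent with the
   text above, nothing in the sketch attempts content negotiation.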

   There may be significant limitations on the client browser's
   ability to buffer input for upload.  Browsers may provide an
   estimate of the default MAXLENGTH available for device input and
   upload through the HTTP header 'Client-file-maxlength:' followed
   immediately by the decimal representation of the number of bytes
   representing the content-length available to the browser for
   buffering (reference: section 14 of RFC 2068.)  A server may also
   provide information about the largest file size it can accept for
   upload, by using the MAXLENGTH attribute with a value representing
   the decimal integer of bytes in the HTML form's INPUT elements.

   Furthermore, when multiple similar devices are present, the VALUE
   attribute may be used to disambiguate between them numerically.
   Under most conditions the operator SHOULD be allowed to select the
   device from ambiguous sources of input, or to re-select it if one
   was specified with a VALUE attribute.  The VALUE attribute may also
   be used to specify alternative methods of input when its value is
   nonnumeric.

   If real time events, such as those described and proposed by
   Gregory S. Aist in "A General Architecture for a Real-Time
   Discourse Agent and a Case Study in Computerized Oral Reading
   Tutoring" (Carnegie Mellon University Computational Linguistics
   Program, 6 December 1996), are required, then the Real-time
   Transport Protocol (RTP, currently RFC 1889) may be used instead.
   Because of security concerns discussed in section 9 below, HTML
   scripts MUST NOT be able to invoke a form submission when the form
   involves any kind of file upload without explicit instructions
   from the session operator to the contrary.

3.1.  Examples

   In this short form, the HTML author has requested the upload of
   sampled microphone input from the operator upon form submission:

     <FORM ENCTYPE="multipart/form-data" METHOD=POST ACTION="_URL_">
       Say something:  <INPUT NAME=SPEECH1 TYPE=FILE DEVICE=MIC>
       <INPUT TYPE=SUBMIT VALUE="Send Speech">
     </FORM>

   Below, the full device name MICROPHONE is used rather than the
   abbreviation MIC.  The author of the HTML has requested that the
   data input from the microphone be encoded as either the MIME type
   Audio/L16 -- sixteen bit signed linear audio samples
   (most-significant byte first) -- as specified in [RFC 1890] section
   4.4.8, with a single (monaural) channel and a sample rate of 11,025
   samples per second, or an unspecified extended MIME Audio type
   named 'x-cepstral-voc'.  Please note that MIME types are here
   separated by spaces.  A string of the form ";parameter=value"
   modifies the preceding MIME type; space before such a parameter is
   optional.

     <INPUT NAME=SPEECH2 TYPE=FILE DEVICE=MICROPHONE
       ACCEPT="audio/L16;rate=11025 ;channels=1 audio/x-cepstral-voc">
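
   The separator rules for the ACCEPT value can be sketched as a small
   parser.  This is one possible reading of the rules, not a normative
   grammar; the function name is hypothetical, and treating a bare
   top-level type such as "text" as "text/*" follows the note on
   optional "/*" elsewhere in this section.

```python
import re


def parse_accept(value):
    """Split an ACCEPT value into (mime_type, parameters) pairs."""
    # Space before ";parameter=value" is optional, so drop it first.
    joined = re.sub(r"\s+;", ";", value.strip())
    entries = []
    for token in joined.split():
        mime, *params = token.split(";")
        mime = mime.lower()
        if "/" not in mime:
            mime += "/*"  # a bare "text" is read as "text/*"
        pairs = dict(p.split("=", 1) for p in params if "=" in p)
        entries.append((mime, pairs))
    return entries
```

   Applied to the ACCEPT value of the microphone example, the sketch
   yields two entries: audio/l16 with the parameters rate=11025 and
   channels=1, and audio/x-cepstral-voc with no parameters.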

   Below, the form element may be used to upload a file as usual,
   except that the files to select from may be constrained to text
   files, without regard to their filenames or extensions.  Please
   note that "/*" after "text" is optional.

     <INPUT NAME=FILE1 TYPE=FILE DEVICE=FILES ACCEPT="text/*">

   The next two examples show how these extensions may be used to
   request input from other kinds of devices, such as the second of two
   or more cameras connected to the system running the browser.
   Please note that the VALUE is only a suggestion; the browser
   operator should still be offered a choice among multiple devices,
   with the only difference being the default selection.

     <INPUT NAME=PICTURE1 TYPE=FILE DEVICE=CAMERA VALUE=2>
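
   A browser might turn a numeric VALUE into a default selection along
   the following lines.  The sketch assumes that devices are numbered
   from 1 (so that VALUE=2 means the second camera) and that at least
   one device is present; both the numbering and the function name are
   assumptions, not requirements of this memo.

```python
def default_device_index(value, device_count):
    """Return the 0-based index of the device to preselect.

    The result is only a default; the operator may still choose
    any of the available devices.
    """
    if value is not None and value.isascii() and value.isdigit():
        wanted = int(value)
        if 1 <= wanted <= device_count:
            return wanted - 1
    # Nonnumeric, missing, or out-of-range VALUE: fall back to the first.
    return 0
```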

   For this next example only, if there is only one keyboard, the
   operator's preferred editor may be invoked, but the filename should
   be unique and not influenced in any way by the string "EMACS".  If
   that value influences the selection of an editor, it should do so
   with a pre-specified table (such as a "mailcap" file) and should not
   be used as any part of the command string of the editor executed.

     <INPUT NAME=ESSAY1 TYPE=FILE DEVICE=KEYBOARD VALUE=EMACS>
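
   The table-driven selection described above might look like the
   following sketch.  The table contents, the default editor, and the
   function name are hypothetical; the point is only that the VALUE
   string is used as a lookup key and never becomes part of the
   command or of the filename.

```python
import os
import tempfile

# Pre-specified table, analogous to a "mailcap" file (entries assumed).
EDITOR_TABLE = {"emacs": ["emacs"], "vi": ["vi"]}
DEFAULT_EDITOR = ["vi"]


def editor_command(value):
    """Return (argv, path) for editing a fresh, uniquely named file."""
    # Unknown or hostile values simply select the default editor.
    argv = list(EDITOR_TABLE.get((value or "").lower(), DEFAULT_EDITOR))
    # The filename is generated by the system, not derived from VALUE.
    fd, path = tempfile.mkstemp(prefix="upload-", suffix=".txt")
    os.close(fd)
    return argv + [path], path
```

   The returned argument list is meant to be executed directly rather
   than through a shell, so that no part of the form is ever
   interpreted as a command.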

   The example below requests the operator to select images from any
   device, including the filesystem, for upload to the server, as long
   as they are no larger than 100 KB (and no larger than any value
   specified in the Client-file-maxlength HTTP header; the smaller of
   the two values should take precedence).

     <INPUT NAME=PICTURE2 TYPE=FILE DEVICE=ANY ACCEPT="image"
       MAXLENGTH=100000>
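
   Taking the smallest of the limits in force can be sketched as
   follows.  The function name is hypothetical, and the sketch assumes
   that a missing or malformed limit is simply ignored rather than
   treated as an error.

```python
def effective_maxlength(*limits):
    """Return the smallest valid byte limit, or None if there is none.

    Each limit is the raw text of a MAXLENGTH attribute or of a
    Client-file-maxlength header value, or None when absent.
    """
    valid = []
    for raw in limits:
        text = "" if raw is None else str(raw).strip()
        if text.isascii() and text.isdigit():
            valid.append(int(text))
    return min(valid) if valid else None
```

   With MAXLENGTH=100000 and a client buffer of 65536 bytes, the
   sketch returns 65536.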

4.  Compatibility with earlier forms of audio input

   Audio microphone input was proposed in late 1993 and implemented
   (with url-encoded submission data) in experimental versions of
   X-Mosaic in Germany no later than 1995.  To accommodate the syntax
   of these earlier extensions, a browser might interpret an HTML
   statement such as

     <INPUT TYPE=AUDIO ...>

   as the device input form

     <INPUT TYPE=FILE DEVICE=MICROPHONE ...>

   with all other attribute/value pairs of the original INPUT element
   kept the same as specified.  This would retain compatibility for
   all implementations of which the author of this draft is aware.

5.  User interface usability and quality concerns for audio

   An audio sample is customarily recorded on computer equipment with
   a dialog routine capable of allowing the user to record, pause,
   play back, erase, or otherwise edit the recording.  Browsers may
   provide the operator with the same kind of dialog routine for audio
   device input.  If a MAXLENGTH has been specified or is in force
   because of limited buffer size, the buffer size used and remaining
   may be displayed as a dynamic bar graph (or as a percentage if
   graphics are unavailable).  A display of time in seconds used and
   remaining in the buffer may also be provided.
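
   For uncompressed linear audio, the time and percentage figures
   follow directly from the byte counts.  The sketch below assumes
   sixteen-bit monaural samples at 11,025 samples per second by
   default; the function name and its parameters are hypothetical.

```python
def buffer_seconds(total_bytes, used_bytes, rate=11025, channels=1,
                   bytes_per_sample=2):
    """Return (seconds used, seconds remaining, percent used)."""
    bytes_per_second = rate * channels * bytes_per_sample
    used = used_bytes / bytes_per_second
    remaining = max(total_bytes - used_bytes, 0) / bytes_per_second
    percent = 100.0 * min(used_bytes, total_bytes) / total_bytes
    return used, remaining, percent
```

   At the default settings each second of audio occupies 22,050
   bytes, so a 220,500 byte buffer holds ten seconds of speech.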

   Most MIME types defined for audio do not provide high-quality audio
   encodings.  The 'audio/basic' and other types which use a sample rate
   of 8,000 samples per second truncate the audio spectrum at 4,000 Hz
   according to the Nyquist theorem, discarding information important
   for discerning consonants.  Also, audio/basic and other MIME Audio
   types use a sample size of eight bits, which does not usually provide
   enough dynamic range for accurate automatic speech recognition unless
   published automatic gain control algorithms are reliably used.  If
   sixteen-bit signed audio encodings are used according to section
   4.4.8 of [RFC 1890], the sample rate -- specified as the 'rate'
   parameter of the MIME type 'audio/l16' -- should be at least 11,025
   or 16,000 samples per second to provide sufficient information for
   automatic speech recognition.  Otherwise, the audio feature
   extraction encoding of the speech recognition algorithm may be used
   to provide a more compact representation to shorten the upload.
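
   The figures above can be checked with a short calculation.  The
   sketch assumes uniformly quantized (linear) samples, for which each
   bit contributes about 6 dB of dynamic range; companded encodings
   such as mu-law behave differently.  The function names are
   hypothetical.

```python
import math


def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


def dynamic_range_db(bits):
    """Approximate dynamic range of linear samples of the given size."""
    return 20 * math.log10(2 ** bits)
```

   Under these assumptions a rate of 8,000 samples per second gives a
   limit of 4,000 Hz, and eight-bit and sixteen-bit linear samples
   give roughly 48 dB and 96 dB respectively.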

6.  HTML Document Type Description changes

   Along with the extension to the HTML InputType entity described in
   section 4, this proposal makes an addition to the HTML DTD for the
   INPUT element ATTLIST of an #IMPLIED attribute DEVICE of type
   CDATA, and reserves an #IMPLIED attribute CONNEG, also of type
   CDATA.

   Contemporary revisions of HTML are being defined as modules within
   XML, which involves a different DTD structure.  The preceding
   paragraph was written with the HTML 3.2 DTD in mind, and is no
   longer up to date.  It still serves to disambiguate the syntax
   of the proposal.  It is worth noting that all attribute values must
   be quoted in XML, and "empty" tags (unitary elements which are not
   used to bracket text, e.g., input but not form) must end with space
   followed by a forward-slash before the trailing angle bracket.  For
   example:

     <input type="file" device="camera" accept="video/mpeg" />

   Registration of new DEVICE names not suggested in this draft will
   be administered by the W3C HTML activity or delegated to IANA as
   described in BCP 26 (RFC 2434) at the option of the W3C HTML
   activity.  The official definition of the assigned DEVICE values
   may be reflected in the comments of the DTD when approved and
   published by the W3C or other authority, immediately following the
   definition of the DEVICE attribute.

7.  Motivations

   The primary motivation for these extensions is to add the capability
   of speech input to Web-based educational systems. [MR 1, 2]  Other
   motivations include the development of "dictation servers" [MR 3]
   capable of transforming spoken audio uploaded through an HTTP session
   to the corresponding text suitable for sending in email or including
   in another document, for example.  Natural language continuous speech
   recognition software conforming to standard APIs for automatic
   dictation is, as of this writing, available for free in small
   quantities, so there is ample reason to believe that transcription
   servers might soon become commonplace on the Web with these
   extensions.  These extensions could also help hearing-impaired
   people, who could use a "phonology server" to practice improving
   pronunciation.

   Larry Masinter, author of [RFC 1867] and member of the IETF Content
   Negotiation Working Group, has indicated that graphical paper
   scanners might be used for applications such as OCR and bar-code
   upload.  "DEVICE=SCANNER" and "DEVICE=SERIAL" are suggested.

   Finally, it is important to note that the addition of this proposal
   will allow web-enabled devices, such as radio telephones, to
   transmit high-quality asynchronous content, such as voicemail, under
   conditions of very low bandwidth.

8.  Scaling considerations

   The protocol proposed in this draft has been proven to scale for
   very large files, but is not intended for open-ended uploads of
   content of indeterminate length.  RTP (RFC 1889) is much more
   appropriate for such open-ended transmission of device input.

9.  Security considerations

   Browser operators may not want to send their files, recordings,
   pictures, video, or other device inputs to arbitrary sites without
   their explicit permission and direction.  Therefore, browser
   authors are encouraged to disallow the submission of forms which
   include any kind of file upload by any means other than the
   standard HTML operator-controlled buttons for form submission
   without explicit instruction from the session operator to the
   contrary.  Accordingly, the SIZE parameter, document style sheets,
   and document layers may be prevented from obscuring any kind of
   file upload widget, especially those capable of accepting a default
   filename.  Furthermore, just as the operator may take direct action
   to initiate, terminate, review and edit recording as described in
   section 5, browser authors are encouraged to prevent
   HTML scripts from taking those and similar actions, unless for
   example the operator has specifically enabled such script actions
   with a security option.  Even then, such preferences may be
   specified by the operator to reset after an interval or at the end
   of the session.  Finally, explicit information may be needed to
   ensure that the operator is informed when files are being uploaded.

10.  Author's address and acknowledgments

   James Salsman
   W3C Representative
   Cisco Systems, San Jose, California

   Bovik Research Inst., a non-profit organization
   1285 Montecito Ave Apt 57
   Mountain View, CA  94043

   Email:  jps@bovik.org, jsalsman@cisco.com
   Phone:  (650) 967-2737

   Larry Masinter and Harald Alvestrand contributed excellent advice.
   Ed Tecot contributed the means of device and media independence.
   David McMillian contributed to the description of capabilities of
   the audio widget.  Syracuse Language Systems, The Learning Co.,
   and EduSoft, Ltd., contributed much of the inspiration; Jack Mostow
   et al. did much more work for younger grades.  Dave Raggett helped
   integrate into the fast-paced development of HTML.  Keith Moore
   provided invaluable comments and assistance.

11.  Provisional copyright statement and permissions

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that this paragraph is included on all such copies
   and derivative works.  However, this document itself may not be
   modified in any way, except as needed for the purpose of developing
   Internet and W3C standards or as required to translate it into
   languages other than English.  Copyright (c) 1999.

   The limited permissions granted above are perpetual and will not be
   revoked by the author or his successors or assigns.  Moreover, the
   author will not make any effort to restrict the use of the
   information contained in this document.

   This document and the information contained herein is provided on
   an "as is" basis and the author disclaims all warranties, express
   or implied, including but not limited to any warranty that the use
   of the information herein will not infringe on any rights or any
   implied warranties of merchantability or fitness for a particular
   purpose.  In the opinion of the author, use of the information
   contained in this document does not infringe on any rights.

References

[RFC 1867] Form-based File Upload in HTML.  E. Nebel & L. Masinter,
           November 1995.

[RFC 1890] RTP Profile for Audio and Video Conferences with Minimal
           Control.  H. Schulzrinne, January 1996.

[RFC 2068] Hypertext Transfer Protocol -- HTTP/1.1.  R. Fielding,
           J. Gettys, J. Mogul, H. Frystyk, & T. Berners-Lee,
           January 1997.

[RFC 2119] Key words for use in RFCs to Indicate Requirement Levels.
           S. Bradner, March 1997.

[RFC 2183] Communicating Presentation Information in Internet
           Messages:  The Content-Disposition Header.  R. Troost,
           S. Dorner, & K. Moore, August 1997.

[MR 1] http://www.cs.cmu.edu/~listen
           Literacy instruction by a reading tutor that listens,
           from Carnegie Mellon's Project LISTEN.

[MR 2] http://www.ordinate.com
           Over-the-phone automated testing of English fluency,
           listening, and vocabulary from Ordinate Corporation.

[MR 3] http://www.cybertranscriber.com
           Automatic transcription from spoken dictation from
           Speech Machines Corporation.

END OF DRAFT
Filename: <draft-salsman-www-device-upload-07.txt>

changes from -05 to -06:
  title shortened
  'type' parameter --> Content-type header (doesn't break CGI.pm)
  itemized: HTML element attributes, HTTP request,
    content-disposition parameters (IANA parameter templates to
    be filed separately upon approval)
  removed server-file-maxlength due to redundancy with HTML MAXLENGTH
  removed CONNEG reservations and disclaimed content negotiation
  grammar adjusted with RFC 2119 in mind
  format changes and section renumbering for Application Area
  additional examples, incl. XML tag style
  references put in RFC style

changes from -06 to -07:
  corrected historical notes to reference dsr@w3.org's HTML+ draft
  deleted reference to 'Content-disposition: file' which never occurs

:jps