Internationalization of the Hypertext Markup Language
RFC 2070

Document Type RFC - Historic (January 1997; No errata)
Obsoleted by RFC 2854
Last updated 2013-03-02
Stream IETF
Network Working Group                                       F. Yergeau
Request for Comments: 2070                           Alis Technologies
Category: Standards Track                                     G. Nicol
                                          Electronic Book Technologies
                                                              G. Adams
                                                              Spyglass
                                                             M. Duerst
                                                  University of Zurich
                                                          January 1997

         Internationalization of the Hypertext Markup Language

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent.  Initially,
   the application of HTML on the World Wide Web was seriously
   restricted by its reliance on the ISO-8859-1 coded character set,
   which is appropriate only for Western European languages.  Despite
   this restriction, HTML has been widely used with other languages,
   using other coded character sets or character encodings, at the
   expense of interoperability.

   This document is meant to address the issue of the
   internationalization (i18n, i followed by 18 letters followed by n)
   of HTML by extending the specification of HTML and giving additional
   recommendations for proper internationalization support.  A foremost
   consideration is to make sure that HTML remains a valid application
   of SGML, while enabling its use with all languages of the world.

Table of Contents

   1.  Introduction .................................................. 2
     1.1. Scope ...................................................... 2
     1.2. Conformance ................................................ 3
   2. The document character set ..................................... 4
     2.1. Reference processing model ................................. 4
     2.2. The document character set ................................. 6
     2.3. Undisplayable characters ................................... 8

Yergeau, et. al.            Standards Track                     [Page 1]
RFC 2070               HTML Internationalization            January 1997

   3. The LANG attribute ............................................. 8
   4. Additional entities, attributes and elements ................... 9
     4.1. Full Latin-1 entity set .................................... 9
     4.2. Markup for language-dependent presentation ................ 10
   5. Forms ..........................................................16
     5.1. DTD additions ..............................................16
     5.2. Form submission ............................................17
   6. External character encoding issues .............................18
   7. HTML public text ...............................................20
     7.1. HTML DTD ...................................................20
     7.2. SGML declaration for HTML ..................................35
     7.3. ISO Latin 1 character entity set ...........................37
   8. Security Considerations.........................................40
   Bibliography ......................................................40
   Authors' Addresses ................................................43

1.  Introduction

   The Hypertext Markup Language (HTML) is a markup language used to
   create hypertext documents that are platform independent.  Initially,
   the application of HTML on the World Wide Web was seriously
   restricted by its reliance on the ISO-8859-1 coded character set,
   which is appropriate only for Western European languages.  Despite
   this restriction, HTML has been widely used with other languages,
   using other coded character sets or character encodings, through
   various ad hoc extensions to the language [TAKADA].
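
   The coverage limit described above can be illustrated with a short
   sketch.  It is not part of this specification: the helper name
   fits_latin1 and the sample strings are assumptions chosen for
   illustration only.  It checks whether a string can be encoded in
   ISO-8859-1 at all.

```python
# Illustrative sketch only; fits_latin1 and the sample strings are
# hypothetical and do not come from the RFC.
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "French": "déjà vu",
    "Greek": "Ελληνικά",
    "Japanese": "日本語",
}

# Map each language name to whether its sample fits in ISO-8859-1.
results = {name: fits_latin1(text) for name, text in samples.items()}

for name, ok in results.items():
    status = "representable" if ok else "not representable"
    print(f"{name}: {status} in ISO-8859-1")
```

   Only the Western European sample encodes successfully; text in the
   other scripts needs a different coded character set or character
   encoding.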

   This document is meant to address the issue of the
   internationalization of HTML by extending the specification of HTML
   and giving additional recommendations for proper internationalization
   support.  It is based in large part on a paper by one of the authors
   on multilingualism on the WWW [NICOL].  A foremost consideration is
   to make sure that HTML remains a valid application of SGML, while
   enabling its use with all languages of the world.

   The specific issues addressed are the SGML document character set to
   be used for HTML, the proper treatment of the charset parameter
   associated with the "text/html" content type and the specification of