Internet Draft M. Duerst
<draft-duerst-ruby-00.txt> University of Zurich
Expires 28 February 1997 30 August 1996
Ruby in the Hypertext Markup Language
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working doc-
uments of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute work-
ing documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet-
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress".
To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
Distribution of this document is unlimited. Please send comments to
the author at <mduerst@ifi.unizh.ch>. This document is intended to
become an informational RFC, and its contents are designed for
adoption in other standards and specifications.
Abstract
The Hypertext Markup Language (HTML) is a markup language used to
create hypertext documents that are platform independent. Initially
HTML was designed primarily for Western European languages; most of
the basic internationalization issues that limited its usefulness for
other languages have since been addressed. Ruby are important
phonetic annotations used mainly for ideographic characters in East
Asia. This document proposes markup for ruby in HTML and explains its
usage.
Expires 28 February 1997 [Page 1]
Internet Draft Ruby in HTML 28 August 1996
Table of contents
1. Introduction
   1.1 General
   1.2 Notational Conventions
2. Syntax
   2.1 The RUBY Attribute
   2.2 Usage Limitations
   2.3 Changes to the DTD
   2.4 Nested Attributes
3. Guidelines for Implementation
4. Design Considerations
Acknowledgements
Bibliography
Author's Address
1. Introduction
1.1 General
The Hypertext Markup Language (HTML) [RFC1866] is a simple markup
language used to create hypertext documents that are platform inde-
pendent. The main features for full international use of HTML are
described in [HTML-I18N]. This draft describes markup for an addi-
tional feature needed for international HTML, namely ruby. Ruby are
short phonetic annotations for ideographic characters used throughout
East Asia.
Ruby are placed to the right of their base characters in vertical
text, and above them in horizontal text. They are rendered at about
half the size of their base characters.
Ruby are used frequently in Japan in most kinds of publications, such
as books and magazines, but also in China, especially in schoolbooks.
With the increasing international use of the WWW, new and highly
beneficial uses of ruby may also emerge.
In texts that are stored electronically and enriched with structural
markup, ruby can also be very useful for applications other than
rendering. In particular, they should be of great value for
searching, indexing, and text-to-speech conversion.
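To make the searching use concrete, here is a minimal Python sketch,
assuming the annotations have already been extracted from the markup
as (base text, ruby) pairs. The function names build_reading_index and
find_by_reading are hypothetical and not part of this proposal; the
base text follows the notation of section 1.2.

```python
from collections import defaultdict


def build_reading_index(pairs):
    """Map each ruby reading to the base texts annotated with it.

    `pairs` is an iterable of (base_text, ruby) tuples; how they are
    extracted from the markup is left open in this sketch.
    """
    index = defaultdict(list)
    for base, ruby in pairs:
        index[ruby].append(base)
    return dict(index)


def find_by_reading(index, query):
    """Return base texts whose reading contains `query` as a substring."""
    return [base for reading, bases in index.items() if query in reading
            for base in bases]


index = build_reading_index([("KO HAYASHI", "kobayashi"),
                             ("O HAYASHI", "obayashi"),
                             ("KO MURA", "komura")])
print(find_by_reading(index, "bayashi"))  # ['KO HAYASHI', 'O HAYASHI']
```

A reader who knows only the pronunciation of a name can thus find it
even without knowing which ideographs are used to write it.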
The name "ruby" is the name of the 5.5 point type size in British
terminology; this was the size most used for ruby. In Japan, the term
"furigana" is also used.
1.2 Notational Conventions
In the examples in this document, ideographic characters are denoted
as space-separated strings of uppercase letters. Annotation charac-
ters are denoted by lowercase letters.
2. Syntax
2.1 The RUBY Attribute
A ruby annotation is a string of ruby characters associated with a
string of characters from the base text. This association is
expressed by introducing an attribute RUBY to the inline elements of
HTML. Examples of inline elements are <EM>, <STRONG>, <Q>, and
<SPAN>. <SPAN> is the generic phrase-level element; other than
carrying attributes, it has no particular semantics. As ruby are
usually not combined with other kinds of markup, <SPAN> will be used
most of the time to place ruby on base characters. Here is an
example:
<SPAN RUBY="kobayashi">KO HAYASHI</SPAN>
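As an illustration of how an application might recover the
association expressed by the RUBY attribute, the following Python
sketch uses the standard library's html.parser module to collect
(base text, ruby) pairs. The class name RubyExtractor is hypothetical.
The sketch assumes reasonably well-formed markup in which non-void
elements have explicit end tags, and it reports nested RUBY attributes
(section 2.4) separately, without choosing between them.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they are skipped so that the
# open-element stack stays balanced.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class RubyExtractor(HTMLParser):
    """Collect (base text, ruby) pairs from elements carrying RUBY."""

    def __init__(self):
        super().__init__()
        self.stack = []    # for each open element: its RUBY value or None
        self.buffers = []  # text collected for each open annotated element
        self.pairs = []    # finished (base_text, ruby) tuples, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # HTMLParser lowercases attribute names, so RUBY arrives as "ruby".
        ruby = dict(attrs).get("ruby")
        self.stack.append(ruby)
        if ruby is not None:
            self.buffers.append([])

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        ruby = self.stack.pop()
        if ruby is not None:
            base = "".join(self.buffers.pop()).strip()
            self.pairs.append((base, ruby))

    def handle_data(self, data):
        # Base text belongs to every annotated element that is still open.
        for buf in self.buffers:
            buf.append(data)


parser = RubyExtractor()
parser.feed('<SPAN RUBY="kobayashi">KO HAYASHI</SPAN>')
parser.close()
print(parser.pairs)  # [('KO HAYASHI', 'kobayashi')]
```

The same pairs could feed indexing or text-to-speech as described in
section 1.1.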
2.2 Usage Limitations
The length of a group of base characters or the number of ruby char-
acters per base character are not limited by this specification.
However, authors and tools are requested to keep these numbers
reasonably low; otherwise, it will be very difficult even for a
sophisticated renderer to produce a good display. Likewise, this
specification does not limit the types of base characters to which
ruby can be attached, or the types of characters that can be used as
ruby.
In Japanese, a group of base characters contains about two characters
on average, with groups of four or five characters still being
common. The number of ruby per base character also averages close to
two, although examples with five are known. For both linguistic and
typographic reasons, it is not possible to restrict ruby to single
base characters.
For Chinese texts annotated with Pinyin romanization, the average
number of ruby per base character is closer to four; for Chinese
texts with bopomofo annotations, the average number of ruby per base
character is again around two. For other combinations of base charac-
ters and ruby, these numbers can be different.
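Since this specification imposes no limits, an authoring tool could
at most warn about unusually long groups. The sketch below assumes
hypothetical thresholds loosely based on the figures above (five base
characters per group, five ruby per base character); the names
RubyGroup, check_group, MAX_BASE_CHARS and MAX_RUBY_PER_BASE are
illustrative only. Characters are counted with Python's len(), i.e. as
Unicode code points.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical advisory thresholds, loosely derived from the figures in
# this section. The specification itself imposes no limit.
MAX_BASE_CHARS = 5
MAX_RUBY_PER_BASE = 5.0


@dataclass
class RubyGroup:
    base: str  # base characters, e.g. ideographs
    ruby: str  # phonetic annotation attached to the whole group


def check_group(group: RubyGroup) -> List[str]:
    """Return advisory warnings for one group; an empty list means none."""
    n_base = len(group.base)
    if n_base == 0:
        return ["empty base text"]
    warnings = []
    if n_base > MAX_BASE_CHARS:
        warnings.append(f"{n_base} base characters in one group")
    ratio = len(group.ruby) / n_base
    if ratio > MAX_RUBY_PER_BASE:
        warnings.append(f"{ratio:.1f} ruby characters per base character")
    return warnings


print(check_group(RubyGroup("小林", "こばやし")))  # []
```

Here two base characters carry four kana, i.e. two ruby per base
character, which matches the typical Japanese case described above.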
2.3 Changes to the DTD
This section describes the changes to the HTML DTD necessary to
include the RUBY attribute. The description is based on the DTD in
[HTML-I18N]. For that DTD, the only change necessary is to add the
following line to the "attrs" parameter entity (macro):
RUBY CDATA #IMPLIED -- phonetic annotation for ideographs --
For other versions of HTML, other changes may be necessary.
2.4 Nested Attributes
If RUBY attributes are present on several levels of nested inline
elements, these attributes are to be treated as alternatives, not as
cumulative annotations. For example,
<SPAN RUBY="kobayashi">
<SPAN RUBY="ko">KO</SPAN>
<SPAN RUBY="bayashi">HAYASHI</SPAN>
</SPAN>
could be interpreted as
<SPAN RUBY="kobayashi">
<SPAN>KO</SPAN>
<SPAN>HAYASHI</SPAN>
</SPAN>
to distribute the ruby evenly over the base characters, or as
<SPAN>
<SPAN RUBY="ko">KO</SPAN>
<SPAN RUBY="bayashi">HAYASHI</SPAN>
</SPAN>
which allows the ruby to be split correctly when a line break falls
between KO and HAYASHI.
NOTE -- the above is designed to allow extremely
sophisticated renderers to do high-quality line breaking.
However, the author of this draft knows of no display
algorithm or software that is currently able to perform
this function, and therefore recommends that authors not
use this feature.
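The two interpretations above can be expressed as a small Python
sketch over a hypothetical tree model of nested inline elements; Span,
outermost and innermost are illustrative names, not part of this
proposal. Given the NOTE above, the sketch only clarifies the
semantics and is not meant to encourage use of the feature. One
simplification: if an element with RUBY contains both annotated
children and unannotated text, innermost() leaves that text without
ruby.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Span:
    """Hypothetical model of an inline element with an optional RUBY attribute."""
    ruby: Optional[str]
    children: List["Node"] = field(default_factory=list)


Node = Union[str, Span]
Group = Tuple[str, Optional[str]]  # (base text, ruby or None)


def text_of(node: Node) -> str:
    """Concatenate all base text below a node, ignoring ruby."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node.children)


def has_ruby(node: Node) -> bool:
    """True if the node or any descendant carries a RUBY attribute."""
    return isinstance(node, Span) and (
        node.ruby is not None or any(has_ruby(c) for c in node.children)
    )


def plain(text: str) -> List[Group]:
    # Whitespace-only text between elements is not a base-character group.
    return [(text, None)] if text.strip() else []


def outermost(node: Node) -> List[Group]:
    """First interpretation: the outermost RUBY applies; inner ones are ignored."""
    if isinstance(node, str):
        return plain(node)
    if node.ruby is not None:
        return [(text_of(node), node.ruby)]
    return [g for child in node.children for g in outermost(child)]


def innermost(node: Node) -> List[Group]:
    """Second interpretation: the innermost RUBY apply, so ruby can split at line breaks."""
    if isinstance(node, str):
        return plain(node)
    if node.ruby is not None and not any(has_ruby(c) for c in node.children):
        return [(text_of(node), node.ruby)]
    return [g for child in node.children for g in innermost(child)]


example = Span("kobayashi", [Span("ko", ["KO"]), " ", Span("bayashi", ["HAYASHI"])])
print(outermost(example))  # [('KO HAYASHI', 'kobayashi')]
print(innermost(example))  # [('KO', 'ko'), ('HAYASHI', 'bayashi')]
```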
3. Guidelines for Implementation
This document does not specify any particular implementation for the
rendering of ruby. The following are some possibilities, listed by
increasing typographic quality, with some comments.
- Display ruby inline, after their base characters, in parentheses.
In this case, an option to switch off ruby display is almost
mandatory, because texts with many ruby will otherwise be difficult
to read. For other implementations, an option to switch off ruby
display may also be a good idea, but it is less necessary.
- Place ruby above their base characters, at half the height of the
base characters. Use fixed spacing. If the ruby are longer than their
corresponding base characters, leave some blank space after the base
characters. Always keep a group of base characters and their ruby on
the same line.
- Same as the previous option, but expand the ruby proportionally if
they are shorter than their associated base characters.
- If the ruby are longer than their associated base characters, test
whether the preceding or following characters of the base text have
associated ruby. If not (particularly if these characters are not
ideographic), let the ruby extend over those neighboring characters
to avoid blank space.
- Use nested ruby attributes for highest-quality rendering including
line-breaks (very difficult to implement).
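The first three options in the list above can be sketched in Python as
follows. The sketch assumes that every base glyph has width 1.0 and
every ruby glyph width 0.5, and places ruby glyphs left-aligned over
their base; the names render_inline and layout_group are hypothetical.
Extending ruby over neighboring characters (the fourth option) and
line breaking with nested ruby are not covered.

```python
from typing import List, Optional, Sequence, Tuple

Group = Tuple[str, Optional[str]]  # (base text, ruby or None)

# Ruby glyphs at about half the size of base glyphs (section 1.1).
RUBY_SCALE = 0.5


def render_inline(groups: Sequence[Group], show_ruby: bool = True) -> str:
    """First option: ruby in parentheses after their base characters."""
    parts = []
    for base, ruby in groups:
        if ruby and show_ruby:
            parts.append(f"{base}({ruby})")
        else:
            parts.append(base)
    return "".join(parts)


def layout_group(n_base: int, n_ruby: int,
                 expand: bool = False) -> Tuple[float, List[float]]:
    """Second and third options for one group of base characters.

    Widths are in units of one base glyph. Returns the horizontal advance
    of the whole group and the start offset of each ruby glyph.
    """
    ruby_width = n_ruby * RUBY_SCALE
    # If the ruby are longer, the extra width is left blank after the base.
    advance = max(float(n_base), ruby_width)
    if expand and n_ruby and ruby_width < n_base:
        step = n_base / n_ruby  # third option: spread ruby over the base
    else:
        step = RUBY_SCALE       # second option: fixed spacing
    return advance, [i * step for i in range(n_ruby)]


print(render_inline([("KO HAYASHI", "kobayashi"), (" SAN", None)]))
# KO HAYASHI(kobayashi) SAN
print(layout_group(1, 4))               # (2.0, [0.0, 0.5, 1.0, 1.5])
print(layout_group(2, 2, expand=True))  # (2.0, [0.0, 1.0])
```

A line breaker following the second option would treat each group's
advance as an unbreakable unit, so that a group and its ruby always
stay on the same line.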
More strict implementation specifications with examples can be found
in [JIS95].
4. Design Considerations
Besides the solution proposed in this document, various alternatives
for ruby markup were discussed. They all turned out to be more com-
plex than having ruby as an attribute, without significant additional
benefits. For some more details about these proposals, please see
[DUR96].
Some solutions, defining one or more elements for base characters and
ruby, would have made ruby visible even in browsers not aware of the
new markup. However, to provide reasonable rendering in these cases,
complicated rules about the removal of parentheses would have had to
be introduced.
Using an attribute to indicate ruby also has the disadvantage that
only the whole string of ruby, but not individual characters in it,
can be given a special appearance. As it is highly unlikely that such
a feature would ever be used, this is not really a problem.
Acknowledgements
I am grateful in particular to the following persons for their advice
and help: Junichiro Kida, Literary Critic, Japan; Yasuo Kida, Apple
Japan; Tatsuo L. Kobayashi, Just Systems, Japan; Francois Yergeau,
Alis Technology, Canada; Gavin Nicol, ETB, Tokyo; Martin Brian, The
SGML Centre, UK; the organizers of the 8th Unicode conference; the
participants of the I18N workshop at the 1996 WWW conference in
Paris.
Bibliography
[DUR96] M.J. Duerst, "Ruby in HTML",
<http://www.ifi.unizh.ch/groups/mml/people/mduerst/ruby/ruby.html>,
May 1996.
[GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
Oxford University Press, 1990.
[RFC1866] T. Berners-Lee and D. Connolly, "Hypertext Markup
Language - 2.0", RFC 1866, MIT/W3C, November 1995.
[HTML-I18N] F. Yergeau, G. Nicol, G. Adams, and M. Duerst, "Inter-
nationalization of the Hypertext Markup Language",
Work in progress (draft-ietf-html-i18n-05.txt), August
1996.
[JIS95] Japanese Industrial Standards Committee, "Line compo-
sition Rules in Japanese", Japanese Industrial Stan-
dard JIS X 4051-1995 (in Japanese).
Author's Address
Martin J. Duerst
Multimedia-Laboratory
Department of Computer Science
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Switzerland
Tel: +41 1 257 43 16
Fax: +41 1 363 00 35
E-mail: mduerst@ifi.unizh.ch
NOTE -- Please write the author's name with a u-umlaut wherever
possible, e.g. in HTML as D&uuml;rst.