Network Working Group                         D. Crocker (editor)
Internet-Draft:  DRAFT-DRUMS-ABNF-01.{txt,ps}  Internet Mail Consortium
Expiration <4/97>



          Augmented BNF for Syntax Specifications: ABNF



STATUS OF THIS MEMO

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as ``work
in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).


TABLE OF CONTENTS

1. INTRODUCTION
2. RULE DEFINITION
     2.1. Rule Naming
     2.2. Rule Form
     2.3. End-of-Rule
     2.4. Terminal Values
     2.5. External Encodings
3. OPERATORS
     3.1. Rule1 Rule2:  Concatenation
     3.2. Rule1 / Rule2:  Alternatives
     3.3. Incremental Alternatives
     3.4. (Rule1 Rule2):  Sequence Group
     3.5. {Rule1 Rule2}:  Set Group
     3.6. *Rule:  Repetition
     3.7. nRule:  Specific Repetition
     3.8. [RULE]:  Optional
     3.9. #Rule:  Lists
     3.10. Value Ranges
     3.11. Operator Precedence
4. ; COMMENTS
5. ABNF DEFINITION OF ABNF...
6. APPENDIX A - CORE
7. ACKNOWLEDGEMENTS
8. CONTACT



1.   INTRODUCTION

Internet technical specifications often need to define a format
syntax and are free to employ whatever notation their authors
deem useful.  Over the years, a modified version of Backus-Naur
Form (BNF), called Augmented BNF, has been popular among many
Internet specifications.  It balances compactness with reasonable
representational power.  In the early days of the Arpanet, each
specification contained its own definition of ABNF.  This
included the email specifications, RFC733 and then RFC822, which
have come to be the common citations for defining ABNF.  The
current document separates out that definition, to permit
selective reference.  Predictably, it also provides some
enhancements.

The differences between standard BNF and the ABNF defined here
involve naming rules, repetition, alternatives, order-independence,
rules that add alternatives to existing rules, lists, and value
ranges.  Appendix A (Core) supplies rule
definitions for a core lexical analyzer, of the type common to
several Internet specifications.  It is provided as a convenience
and is otherwise separate from the meta language defined in the
body of this document.



2.   RULE DEFINITION

2.1. Rule Naming

The name of a rule is simply the name itself; that is, a
sequence of characters, not beginning with a digit, with an
asterisk ("*"), or with a number (pound) sign ("#").  (This
avoids ambiguity with the various repetition mechanisms, defined
below.)
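
The naming constraint above can be illustrated with a minimal Python
sketch.  The function name is_valid_rule_name is hypothetical and is
not part of ABNF; it checks only the first-character restriction
stated in this section, not the fuller <name> rule given in Section 5.

```python
# Hypothetical helper, for illustration only.
FORBIDDEN_FIRST = set("0123456789*#")

def is_valid_rule_name(name):
    """Return True if name is non-empty and does not begin with a
    digit, an asterisk, or a number sign."""
    return bool(name) and name[0] not in FORBIDDEN_FIRST

print(is_valid_rule_name("mumble"))   # True
print(is_valid_rule_name("2DIGIT"))   # False
```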

Unlike original BNF, angle brackets ("<", ">") are not required.
However, angle brackets may be used around a rule reference whenever
their presence will facilitate discerning the use of a rule
name.  This is typically restricted to rule name references in
free-form prose, or to distinguish partial rules that combine
into a string not separated by linear white space, such as shown
in the discussion about repetition, below.

2.2. Rule Form

A rule is defined by the following sequence:

     name =  elements

where <name> is the name of the rule and <elements> is one or
more rules or terminal specifications.  The equal sign separates
the name from the definition of the rule.  The elements are a
sequence of one or more rule names and/or value definitions,
combined according to the various operators, defined in this
document, such as alternative and repetition.

NOTE: The battle between human factors and purity within the
mathematics community includes a lingering core of authors who
prefer additional, awkward typing effort for rules.  For that
community, the use of "::=" is permitted after the rule name, in
place of the simpler "=".

2.3. End-of-Rule

Formally the grammar requires a one-token look-ahead to find the
"=" token, which indicates that the previous token is the name of
a new rule.  Informally, rules start in column 1, with rule
continuation indicated by blank (linear white space) in column 1.
In some documentation, "column 1" might be virtual, with a
consistent indentation from the left margin, for all rules.
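
The informal column-1 convention can be sketched in Python as
follows.  This is an illustration under stated assumptions, not a
full ABNF reader: the function name join_rules is hypothetical, blank
lines are simply skipped, and the "virtual column 1" case (a
consistent indentation for all rules) is not handled.

```python
def join_rules(lines):
    """Group physical lines into rules.  A line starting in column 1
    begins a new rule; a line starting with white space continues the
    previous rule."""
    rules = []
    for line in lines:
        if not line.strip():
            continue  # blank line between rules
        if line[0] in " \t" and rules:
            rules[-1] += " " + line.strip()  # continuation line
        else:
            rules.append(line.strip())       # new rule
    return rules

text = [
    "element = el-component / grouping /",
    "          range",
    "",
    "dval = 1*DIGIT",
]
for rule in join_rules(text):
    print(rule)
```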

2.4. Terminal Values

Rules resolve into terminal values.  Values within ABNF are
represented as decimal numbers.  Hence, an ABNF parser processes
a sequence of characters.  Each character is represented as a
decimal number.  Terminals are specified by pure decimal numbers.
Hence:

     CR = 13

specifies the ASCII value for carriage return.

A sequence of values that can be represented as simple,
graphical characters may be specified as a string of
literals, enclosed in quotation marks.  The character set for
these strings is ASCII.  Hence:

     rulename = "abc"

is equivalent to

     rulename = 97 98 99
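
The equivalence between a quoted string and its decimal values can be
checked with a short Python sketch.  The function name
literal_to_values is hypothetical; the only assumption is that the
literal is restricted to ASCII, as stated above.

```python
def literal_to_values(text):
    """Map an ASCII string literal to its sequence of decimal values."""
    return list(text.encode("ascii"))

print(literal_to_values("abc"))  # [97, 98, 99]
```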

2.5. External Encodings

External representations of these characters will vary according
to constraints in the storage or transmission environment.
Hence, the same ABNF-based grammar may have multiple external
encodings, such as one for a 7-bit ASCII environment, another for
a binary octet environment and still a different one when 16-bit
Unicode is used.  Encoding details are beyond the scope of ABNF,
although Appendix A (Core) provides definitions for a 7-bit ASCII
environment as has been common to much of the Internet.

By separating external encoding from the syntax, it is intended
that alternate encoding environments can be used for the same
syntax.
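
As a rough illustration of this separation, the same terminal values
can be written out under two different external encodings.  The
function names below are hypothetical, and the choice of placing the
most significant octet first in the 16-bit case is an assumption made
only for this sketch; ABNF itself does not define either encoding.

```python
def encode_7bit(values):
    """Encode each value as one octet, rejecting values outside the
    7-bit ASCII range."""
    if any(v < 0 or v > 127 for v in values):
        raise ValueError("value outside the 7-bit ASCII range")
    return bytes(values)

def encode_16bit(values):
    """Encode each value as two octets, most significant octet first
    (an assumption for this sketch)."""
    return b"".join(v.to_bytes(2, "big") for v in values)

values = [97, 98, 99]          # the terminal values of "abc"
print(encode_7bit(values))     # b'abc'
print(encode_16bit(values))    # b'\x00a\x00b\x00c'
```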



3.   OPERATORS

3.1. Rule1 Rule2:  Concatenation

A rule can define a simple, ordered string of values -- i.e., a
concatenation of contiguous characters -- by listing a sequence
of rule names.  For example:

     foo =  "a"

     bar =  "b"

     mumble =  foo bar foo

So that the rule <mumble> defines the string "aba".

LINEAR WHITE SPACE:  Concatenation is at the core of the ABNF
parsing model.  A string of contiguous characters (values) is
parsed according to the rules defined in ABNF.  For Internet
specifications, there is some history of permitting linear white
space (space and horizontal tab) to be freely and implicitly
interspersed around major constructs, such as
delimiting special characters or atomic strings.

     This specification for ABNF does NOT provide such implicit
     specification.

Any grammar which wishes to permit linear white space around
delimiters or string segments must specify it explicitly.
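
The concatenation example and the white space rule can be sketched
together in Python.  The dictionary encoding of rules and the
function names expand and matches are hypothetical; the sketch covers
only literals and concatenation, and none of the other operators.

```python
# A rule body is either a string literal or a list of rule names.
RULES = {
    "foo": "a",
    "bar": "b",
    "mumble": ["foo", "bar", "foo"],
}

def expand(name):
    """Expand a rule built from literals and concatenation only."""
    body = RULES[name]
    if isinstance(body, str):
        return body
    return "".join(expand(part) for part in body)

def matches(name, text):
    """Exact match: no white space is implicitly permitted."""
    return text == expand(name)

print(matches("mumble", "aba"))    # True
print(matches("mumble", "a b a"))  # False
```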

3.2. Rule1 / Rule2:  Alternatives

Elements separated by slash ("/") are alternatives.   Therefore,

     foo / bar

will accept <foo> or <bar>.

3.3. Incremental Alternatives

It is sometimes convenient to specify a list of alternatives in
fragments.  That is, an initial rule may define one or more
alternatives, with later rule definitions adding to the set of
alternatives.  This is particularly useful for otherwise-
independent specifications which derive from the same parent rule
set, such as often occurs with parameter lists.  ABNF permits
this incremental definition through the construct:

     oldrule =/ <additional alternative(s)>

So that the rule set

     ruleset = alt1 / alt2

     ruleset =/ alt3 / alt4

     ruleset =/ alt5

is the same as specifying

     ruleset = alt1 / alt2 / alt3 / alt4 / alt5
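
One way to picture the incremental form is as a table of alternatives
that later definitions extend.  The Python sketch below is
illustrative only: merge_rules is a hypothetical name, each input
line is assumed to hold one complete definition, and alternatives are
split naively on "/".

```python
def merge_rules(lines):
    """Collect the alternatives of each rule, treating "=" as a new
    definition and "=/" as an addition to an existing one."""
    rules = {}
    for line in lines:
        if "=/" in line:
            name, body = line.split("=/", 1)
            # Extend the existing list (or start one if none exists).
            target = rules.setdefault(name.strip(), [])
        else:
            name, body = line.split("=", 1)
            target = rules[name.strip()] = []
        target.extend(alt.strip() for alt in body.split("/"))
    return rules

merged = merge_rules([
    "ruleset = alt1 / alt2",
    "ruleset =/ alt3 / alt4",
    "ruleset =/ alt5",
])
print(" / ".join(merged["ruleset"]))  # alt1 / alt2 / alt3 / alt4 / alt5
```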

3.4. (Rule1 Rule2):  Sequence Group

Elements enclosed in parentheses are treated as a single
element, whose contents are strictly ordered.  Thus,

     (elem foo) / (bar blat) elem

allows the token sequences (elem foo elem) and (bar blat elem).
Without the grouping, the rule:

     elem foo / bar elem

would match (elem foo elem) or (elem bar elem).  The local
grouping notation is also used within free text to set off an
element sequence from the prose.

3.5. {Rule1 Rule2}:  Set Group

Elements enclosed in curly brackets ("{", "}") are treated as a single,
unordered element.  Its contents may occur in any order.  Hence:

     {elem foo} bar

would match (elem foo bar) and (foo elem bar).

NOTE: Specifying alternatives is quite different from specifying
set grouping.  Alternatives indicate the matching of exactly one
(sub-)rule out of the total grouping.  The set mechanism
indicates the matching of a string which contains all of the
elements within the group; however the elements may occur in any
order.
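
The difference can be made concrete by enumerating what a set group
accepts.  The Python sketch below is a minimal illustration; the
function name set_group_sequences is hypothetical, and the elements
are treated as opaque tokens.

```python
from itertools import permutations

def set_group_sequences(members, tail):
    """List every token sequence accepted by {members} followed by
    the tokens in tail."""
    return [list(order) + list(tail) for order in permutations(members)]

for sequence in set_group_sequences(["elem", "foo"], ["bar"]):
    print(sequence)
# ['elem', 'foo', 'bar']
# ['foo', 'elem', 'bar']
```

Every accepted sequence contains all members of the group, whereas an
alternative would accept exactly one of them.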

3.6. *Rule:  Repetition

The operator "*" preceding an element indicates repetition. The
full form is:

     <l>*<m>element

where <l> and <m> are optional decimal values, indicating at
least <l> and at most <m> occurrences of element.

Default values are 0 and infinity so that <*element> allows any
number, including zero; <1*element> requires at least one;
<3*3element> allows exactly 3 and <1*2element> allows one or two.
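
The bounds and their defaults can be sketched in Python.  The names
parse_repeat and allowed are hypothetical, and an unbounded maximum
is represented here by None; this is an illustration of the counting
rule only, not a parser for elements.

```python
def parse_repeat(prefix):
    """Return (minimum, maximum) for a prefix of the form <l>*<m>.
    A missing <l> defaults to 0; a missing <m> means no upper bound,
    shown here as None."""
    low, high = prefix.split("*")
    return (int(low) if low else 0, int(high) if high else None)

def allowed(prefix, count):
    """Return True if count occurrences satisfy the repetition prefix."""
    low, high = parse_repeat(prefix)
    return count >= low and (high is None or count <= high)

print(parse_repeat("*"))     # (0, None)
print(parse_repeat("1*"))    # (1, None)
print(parse_repeat("3*3"))   # (3, 3)
print(allowed("1*2", 3))     # False
```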

3.7. nRule:  Specific Repetition

A rule of the form:

     <n>element

is equivalent to

     <n>*<n>element

That is, exactly <n> occurrences of <element>.  Thus 2DIGIT is
a 2-digit number, and 3ALPHA is a string of three alphabetic
characters.
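
As a rough analogy, an exact repetition corresponds to a counted
quantifier in a regular expression.  The Python sketch below assumes
that DIGIT means "0" through "9" and ALPHA means the ASCII letters,
as in Appendix A; the function name matches_exact is hypothetical.

```python
import re

def matches_exact(text, count, char_class):
    """Return True if text is exactly count characters, each drawn
    from the regular expression class char_class."""
    pattern = "(?:%s){%d}" % (char_class, count)
    return re.fullmatch(pattern, text) is not None

print(matches_exact("42", 2, "[0-9]"))       # True   (2DIGIT)
print(matches_exact("abc", 3, "[A-Za-z]"))   # True   (3ALPHA)
print(matches_exact("4", 2, "[0-9]"))        # False
```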

3.8. [RULE]:  Optional

Square brackets enclose optional elements:

     [foo bar]

is equivalent to

     *1(foo bar).

3.9. #Rule:  Lists

A construct "#" is defined as being similar to "*", for a list
sequence:

     <l>#<m>element

indicates at least <l> and at most <m> elements, each separated
by one or more commas (","). This makes the usual form of lists
very easy; a rule such as:

     element *("," element)

can therefore be shown as

     1#element

Wherever this construct is used, null elements are allowed, but
do not contribute to the count of elements present.  That is,

     element,,element

is permitted, but counts as only two elements.  Therefore,
where at least one element is required, at least one non-null
element must be present.

Default values are 0 and infinity so that <#element> allows any
number, including zero; <1#element> requires at least one; and
<1#2element> allows one or two.
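
The counting rule for null elements can be sketched in Python.  The
function names are hypothetical, elements are assumed to contain no
commas themselves, and an unbounded maximum is represented by None.

```python
def count_list_elements(text):
    """Count the non-null, comma-separated elements of a list."""
    return sum(1 for item in text.split(",") if item)

def list_allowed(text, low=0, high=None):
    """Return True if the number of non-null elements lies between
    low and high (None meaning no upper bound)."""
    count = count_list_elements(text)
    return count >= low and (high is None or count <= high)

print(count_list_elements("element,,element"))  # 2
print(list_allowed(",,", low=1))                # False
```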

3.10.     Value Ranges

Values separated by double periods ("..") specify a range of
values.   Values may be specified in decimal or with rule
references.  The form:

     12..15

represents the range of data values from 12 to 15, inclusively.
When the values are specified using rules rather than explicit
decimal numbers, the rules must reduce to single, decimal values.
Hence:

     CR = 13

     LF = 10

     smallrange = LF..CR

is valid and indicates the value range 10 to 13.
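
A range test, with bounds given either as decimal values or as rule
references, can be sketched in Python.  The function names and the
rule names "lo" and "hi" are hypothetical and serve only to show how
a reference is reduced to a single decimal value before comparison.

```python
def resolve(bound, rules):
    """Return a bound as an integer, looking up rule names in rules."""
    return bound if isinstance(bound, int) else rules[bound]

def in_range(value, low, high, rules=None):
    """Return True if value lies in the inclusive range low..high."""
    rules = rules or {}
    return resolve(low, rules) <= value <= resolve(high, rules)

print(in_range(13, 12, 15))                          # True
print(in_range(16, 12, 15))                          # False
print(in_range(5, "lo", "hi", {"lo": 1, "hi": 9}))   # True
```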

3.11.     Operator Precedence

The various mechanisms described above have the following
precedence, from highest (binding tightest) at the top to lowest
at the bottom:

     Repetition, List

     Grouping, Optional

     Alternative



4.   ; COMMENTS

A semi-colon starts a comment that continues to the end of line.
This is a simple way of including useful notes in parallel with
the specifications.
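
Removing comments from a line of ABNF can be sketched in Python.  The
function name strip_comment is hypothetical.  Treating a semi-colon
inside a quoted literal as data rather than as the start of a comment
is an assumption of this sketch; the text above does not address that
case.

```python
def strip_comment(line):
    """Drop a ";" comment, ignoring semi-colons inside quoted
    literals (an assumption of this sketch)."""
    in_quotes = False
    for index, char in enumerate(line):
        if char == '"':
            in_quotes = not in_quotes
        elif char == ";" and not in_quotes:
            return line[:index].rstrip()
    return line.rstrip()

print(strip_comment("CR = 13  ; carriage return"))   # CR = 13
```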



5.   ABNF DEFINITION OF ABNF...
                            ; modified version from one
                               submitted by Paul Overell.  The
                               errors are of course, mine. /d


     rule           =  name ("=" / "=/")  elements [comment]
                            ; gotta start somewhere
                            ; continues if next line starts
                               with white space
                            ; basic rules definition and
                               incremental alternatives


     name           =  ALPHA *(ALPHA / DIGIT / "-")
                            ; need to beef this up for richer
                               set of characters

     comment        =  ";" *CHAR CRLF

     elements       =  1*element *("/" 1*element)
                            ; concatenation and alternatives

     element        =  el-component / grouping / repeating /
                    range

     el-component   =  element / name

     grouping       =  sequence / set / option

     sequence       =  "(" name ")"

     set            =  "{" name "}"

     option         =  "[" name "]"

     repeating      =  ( [number] ("*" / "#") [number] rule )
                            ; repetition and list
                       / exact-repetition

     exact-repetition =  number name

     range          =  (name / dval) ".." (name / dval)
                            ; Defines a sequence of values.

     dval           =  1*DIGIT



6.   APPENDIX A - CORE

This Appendix is provided as a convenient core for specific
grammars.  The definitions may be used as a core set of rules.

Certain basic rules are in uppercase, such as SPACE, HTAB,
CRLF, DIGIT, ALPHA, etc.



     ALPHA          = "a".."z" / "A".."Z"

     CHAR           =  0..127

     CR             =  13

     CRLF           =  CR LF

     DIGIT          =  "0".."9"

     LF             =  10

     QCHAR          =  <ascii character excepting " and \>
                    / ( "\\" CHAR )

     SPACE          =  32

     HTAB           =  9

     ...

                            ; Well, this could probably go on
                               for awhile.  How much do we want
                               to stuff in here?

Externally, data are represented as "network virtual ASCII",
namely 7-bit ASCII in an 8-bit field, with the high (8th) bit
set to zero.



7.   ACKNOWLEDGEMENTS

The syntax for ABNF was originally specified in RFC #733.  Ken L.
Harrenstien, of SRI International, was responsible for re-coding
the BNF into an augmented BNF that makes the representation
smaller and easier to understand.  The current round of
specification was part of the DRUMS working group, with
significant contributions from Paul Overell, Bill McQuillan,
Keith Moore and Chris Newman.



8.   CONTACT

David H. Crocker

Internet Mail Consortium
675 Spruce Dr.
Sunnyvale, CA 94086 USA

<dcrocker@imc.org>

Phone:    +1 408 246 8253
Fax:      +1 408 249 6205