Applications Area Working Group S. Leonard
Internet-Draft Penango, Inc.
Intended Status: Informational September 22, 2014
Expires: March 26, 2015
The text/markdown Media Type
draft-ietf-appsawg-text-markdown-02.txt
Abstract
This document registers the text/markdown media type for use with
Markdown, a family of plain text formatting syntaxes that optionally
can be converted to formal markup languages such as HTML.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
[[TODO: add table of contents.]]
Leonard Exp. March 26, 2015 [Page 1]
Internet-Draft The text/markdown Media Type September 2014
1. Introduction
1.1. On Formats
In computer systems, textual data is stored and processed using a
continuum of techniques. On the one end is plain text: a linear
sequence of characters in some character set (code), possibly
interrupted by line breaks, page breaks, or other control characters.
Plain text provides /some/ fixed facilities for formatting
instructions, namely codes in the character set that have meanings
other than "represent this character on the output medium"; however,
these facilities are not particularly extensible. Compare with
[RFC6838] Section 4.2.1. Applications may neuter the effects of these
special characters by prohibiting them or by ignoring their dictated
meanings, as is the case with how modern applications treat most
control characters in US-ASCII. On this end, any text reader or
editor that interprets the character set can be used to see or
manipulate the text. If some characters are corrupted, the corruption
is unlikely to affect the ability of a computer system to process the
text (even if the human meaning is changed).
On the other end is binary format: a sequence of instructions
intended for some computer application to interpret and act upon.
Binary formats are flexible in that they can store non-textual data
efficiently (perhaps storing no text at all, or only storing certain
kinds of text for very specialized purposes). Binary formats require
an application to be coded specifically to handle the format; no
partial interoperability is possible. Furthermore, if even one byte
or bit are corrupted in a binary format, it may prevent an
application from processing any of the data correctly.
Between these two extremes lies formatted text, i.e., text that
includes non-textual information coded in a particular way, that
affects the interpretation of the text by computer programs.
Formatted text is distinct from plain text and binary format in that
the non-textual information is encoded into textual characters, which
are assigned specialized meanings /not/ defined by the character set.
With a regular text editor and a standard keyboard (or other standard
input mechanism), a user can enter these textual characters to
express the non-textual meanings. For example, a character like "<"
no longer means "LESS-THAN SIGN"; it means the start of a tag or
element that affects the document in some way.
On the formal end of the spectrum is markup, a family of languages
for annotating a document in such a way that the annotations are
syntactically distinguishable from the text. Markup languages are
(reasonably) well-specified and tend to follow (mostly) standardized
syntax rules. Examples of markup languages include SGML, HTML, XML,
Leonard Exp. March 26, 2015 [Page 2]
Internet-Draft The text/markdown Media Type September 2014
and LaTeX. [[TODO: CITE.]] Standardized rules lead to
interoperability between markup processors, but a skill requirement
for new (human) users of the language that they learn these rules in
order to do useful work. This imposition makes markup less accessible
for non-technical users (i.e., users who are unwilling or unable to
invest in the requisite skill development).
informal /---------formatted text----------\ formal
<------v-------------v-------------v-----------------------v---->
plain text informal markup formal markup binary format
(Markdown) (HTML, XML, etc.)
Figure 1: Degrees of Formality in Data Storage Formats for Text
On the informal end of the spectrum are lightweight markup languages.
In comparison with formal markup like XML, lightweight markup uses
simple syntax, and is designed to be easy for humans to enter with
basic text editors. Markdown, the subject of this document, is an
/informal/ plain text formatting syntax that is intentionally
targeted at non-technical users (i.e., users upon whom little to no
skill development is imposed) using unspecialized tools (i.e., text
boxes). Jeff Atwood once described these informal markup languages as
"humane" [HUMANE].
1.2. Markdown Design Philosophy
Markdown specifically is a family of syntaxes that are based on the
original work of John Gruber with substantial contributions from
Aaron Swartz, released in 2004 [MARKDOWN]. Since its release a number
of web or web-facing applications have incorporated Markdown into
their text entry systems, frequently with custom extensions. Fed up
with the complexity and security pitfalls of formal markup languages
(e.g., HTML5) and proprietary binary formats (e.g., commercial word
processing software), yet unwilling to be confined to the
restrictions of plain text, many users have turned to Markdown for
document processing. Whole toolchains now exist to support Markdown
for online and offline projects.
Informality is a bedrock premise of Gruber's design. Gruber created
Markdown after disastrous experiences with strict XML and XHTML
processing of syndicated feeds. In Mark Pilgrim's "thought
experiment", several websites went down because one site included
invalid XHTML in a blog post, which was automatically copied via
trackbacks across other sites [DIN2MD]. These scenarios led Gruber to
believe that clients (e.g., web browsers) SHOULD try to make sense of
data that they receive, rather than rejecting data simply because it
fails to adhere to strict, unforgiving standards. (In [DIN2MD],
Gruber compared Postel's Law [RFC0793] with the XML standard, which
Leonard Exp. March 26, 2015 [Page 3]
Internet-Draft The text/markdown Media Type September 2014
says: "Once a fatal error is detected [...] the processor MUST NOT
continue normal processing" [XML1.0-3].) As a result, there is no
such thing as "invalid" Markdown; there is no standard demanding
adherence to the Markdown syntax; there is no governing body that
guides or impedes its development. If the Markdown syntax does not
result in the "right" output (defined as output that the author
wants, not output that adheres to some dictated system of rules),
Gruber's view is that the author either should keep on experimenting,
or should change the processor to address the author's particular
needs (see [MARKDOWN] Readme and [MD102b8] perldoc; see also
[CATPICS]).
1.3. Uses of Markdown
Since its introduction in 2004, Markdown has enjoyed remarkable
success. Markdown works for users for three key reasons. First, the
markup instructions (in text) look similar to the markup that they
represent; therefore the cognitive burden to learn the syntax is low.
Second, the primary arbiter of the syntax's success is *running
code*. The tool that converts the Markdown to a presentable format,
and not a series of formal pronouncements by a standards body, is the
basis for whether syntactic elements matter. Third, Markdown has
become something of an Internet meme [INETMEME], in that Markdown
gets received, reinterpreted, and reworked as additional communities
encounter it. There are communities that are using Markdown for
scholarly writing [CITE], for screenplays [CITE], for mathematical
formulae [CITE], and even for music annotation [CITE]. Clearly, a
screenwriter has no use for specialized Markdown syntax for
mathematicians; likewise, mathematicians do not need to identify
characters or props in common ways. The overall gist is that all of
these communities can take the common elements of Markdown (which are
rooted in the common elements of HTML circa 2004) and build on them
in ways that best fit their needs.
1.4. Uses of Labeling Markdown Content as text/markdown
To support identifying and conveying Markdown (as distinguished from
plain text), this document defines a media type and parameters that
indicate, in broad strokes, the author's intent on how to interpret
the Markdown. This registration draws particular inspiration from the
text/troff registration [RFC4263]; troff is an informal plain text
formatting syntax primarily intended for output to monospace line-
oriented printers and screen devices. In that sense, Markdown is a
kind of troff for modern computing.
The primary purpose of an Internet media type is to label "content"
on the Internet, as distinct from "files". Content is any computer-
readable format that can be represented as a primary sequence of
Leonard Exp. March 26, 2015 [Page 4]
Internet-Draft The text/markdown Media Type September 2014
octets, along with type-specific metadata (parameters) and type-
agnostic metadata (protocol dependent). From this description, it is
apparent that appending ".markdown" to the end of a filename is not a
sufficient means to identify Markdown. Filenames are properties of
files in file systems, but Markdown frequently exists in databases or
content management systems (CMSes) where the file metaphor does not
apply. One CMS [RAILFROG] uses media types to select appropriate
processing, so a media type is necessary for the safe and
interoperable use of Markdown.
Unlike complete HTML documents, [MDSYNTAX] provides no means to
include metadata into the content stream. Several derivative flavors
have invented metadata incorporation schemes (e.g., [MULTIMD]), but
these schemes only address specific use cases. In general, the
metadata must be supplied via supplementary means in an encapsulating
protocol, format, or convention. The relationship between the content
and the metadata is not directly addressed by this specification;
however, by identifying Markdown with a media type, Markdown content
can participate as a first-class citizen with a wide spectrum of
metadata schemes.
Finally, registering a media type through the IETF process is not
trivial. Markdown can no longer be considered a "vendor"-specific
innovation, but the registration requirements even in the vendor tree
have proven to be overly burdensome for most Markdown implementers.
Moreover, registering hundreds of Markdown variants with distinct
media types would impede interoperability: virtually all Markdown
content can be processed by virtually any Markdown processor, with
varying degrees of success. The goal of this specification is to
reduce all of these burdens by having one media type that
accommodates diversity and eases registration.
1.3. Requirements Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
Leonard Exp. March 26, 2015 [Page 5]
Internet-Draft The text/markdown Media Type September 2014
2. Example
The following is an example of Markdown as an e-mail attachment:
MIME-Version: 1.0
Content-Type: text/markdown; charset=UTF-8; flavor=Original;
processor="Markdown.pl-1.0.2b8 --html4tags"
Content-Disposition: attachment; filename=readme.md
Sample HTML 4 Markdown
=============
This is some sample Markdown. [Hooray!][foo]
(Remember that link names are not case-sensitive.)
Bulleted Lists
-------
Here are some bulleted lists...
* One Potato
* Two Potato
* Three Potato
- One Tomato
- Two Tomato
- Three Tomato
More Information
-----------
[.markdown, .md](http://daringfireball.net/projects/markdown/)
has more information.
[fOo]: http://example.com/some/foo/location
'This Title Will Not Work with Markdown.pl-1.0.1'
3. Markdown Media Type Registration Application
This section provides the media type registration application for the
text/markdown media type (see [RFC6838], Section 5.6).
Type name: text
Subtype name: markdown
Required parameters: charset. Per Section 4.2.1 of [RFC6838],
charset is REQUIRED. There is no default value. UTF-8 is
Leonard Exp. March 26, 2015 [Page 6]
Internet-Draft The text/markdown Media Type September 2014
RECOMMENDED; however, neither [MDSYNTAX] nor popular
implementations at the time of this registration actually require
or assume any particular encoding. In fact, many Markdown
processors can get along just fine by operating on character codes
that lie in the Portable Character Set (i.e., printable US-ASCII),
blissfully oblivious to coded values outside of that range.
Optional parameters:
The following parameters reflect the author's intent regarding the
content. A detailed specification can be found in Section 4.
flavor: The variant, or "flavor" of the Markdown content, with
optional rules (qualifiers). Default value: "Original".
processor: A specific Markdown implementation, with optional
arguments. Default value: none (receiver's choice).
output-type: The Content-Type (Internet media type) of the output,
with optional parameters. Default value: "text/html".
Encoding considerations: Text.
Security considerations:
Markdown interpreted as plain text is relatively harmless. A text
editor need only display the text. The editor SHOULD take care to
handle control characters appropriately, and to limit the effect of
the Markdown to the text editing area itself; malicious Unicode-
based Markdown could, for example, surreptitiously change the
directionality of the text. An editor for normal text would already
take these control characters into consideration, however.
Markdown interpreted as a precursor to other formats, such as HTML,
carry all of the security considerations as the target formats. For
example, HTML can contain instructions to execute scripts, redirect
the user to other webpages, download remote content, and upload
personally identifiable information. Markdown also can contain
islands of formal markup, such as HTML. These islands of formal
markup may be passed as-is, transformed, or ignored (perhaps
because the islands are conditional or incompatible) when the
Markdown is interpreted into the target format. Since Markdown may
have different interpretations depending on the tool and the
environment, a better approach is to analyze (and sanitize or
block) the output markup, rather than attempting to analyze the
Markdown.
Specific security considerations apply to the optional parameters;
Leonard Exp. March 26, 2015 [Page 7]
Internet-Draft The text/markdown Media Type September 2014
for details, consult Section 4.
Interoperability considerations:
Markdown flavors are designed to be broadly compatible with humans
("humane"), but not necessarily with each other. Therefore, syntax
in one Markdown flavor may be ignored or treated differently in
another flavor. The overall effect is a general degradation of the
output, proportional to the quantity of flavor-specific Markdown
used in the text. When it is desirable to reflect the author's
intent in the output, stick with the flavor identified in the
flavor parameter.
Published specification: This specification.
Applications that use this media type:
Markdown conversion tools, Markdown WYSIWYG editors, and plain text
editors and viewers; target markup processors indirectly use
Markdown (e.g., web browsers for Markdown converted to HTML).
Additional information:
Magic number(s): None
File extension(s): .md, .markdown
Macintosh file type code(s): TEXT
Person & email address to contact for further information:
Sean Leonard <dev+ietf@seantek.com>
Restrictions on usage: None.
Author/Change controller: Sean Leonard <dev+ietf@seantek.com>
Intended usage: COMMON
Provisional registration? Yes
4. Optional Parameters
The following optional parameters can be used by an author to
indicate the author's intent regarding how the Markdown ought to be
processed. For security and accuracy, IANA registries will be
created. However, authors who wish to use custom values by private
agreement may do so via an extension mechanism; all unregistered
identifiers MUST start with an exclamation mark "!".
Leonard Exp. March 26, 2015 [Page 8]
Internet-Draft The text/markdown Media Type September 2014
All identifiers are case-sensitive; receivers MUST compare for exact
equality. Identifiers MUST NOT be registered if another registration
differs only in the casing, as these registrations may cause
confusion.
The following ABNF definitions are used in this section:
EXTCHAR = <any character outside the US-ASCII range,
essentially amounting to any Unicode
code point beyond U+007F without requiring
Unicode or any particular encoding>
REXTCHAR = <EXTCHAR without spaces (Zs category) or
control characters>
Figure X: ABNF Used in This Section
The discussion in this section presumes that the parameter values are
discrete strings. When encoded in protocols such as MIME [RFC2045],
however, the value strings MUST be escaped properly.
4.1. flavor
The flavor parameter indicates the Markdown variant in which the
author composed the content. The overall intent of this parameter is
to provide a facility for Markdown tools, such as graphical editors,
to be able to broadly categorize the content and perform useful
services such as syntax highlighting without resorting to executing
the Markdown processor. Of course, actual recipients may use this
information for any useful purpose, including picking and configuring
an appropriate Markdown processor. The entire parameter is case-
sensitive.
An IANA registry of flavors will be created as discussed in Section
5. A flavor identifier is composed of two or more Unicode characters
excluding spaces (Zs category), control characters, the hyphen-minus
"-", quotation marks """, and the plus sign "+"; however, ASCII
characters alone SHOULD be used. Additionally, registered flavor
identifiers MUST NOT begin with "!", the exclamation mark. By
convention, flavor identifiers start with a capital letter (when
using Roman characters), but this is not a requirement. Unregistered
flavor identifiers MUST begin with "!" (plus two additional
characters).
When omitted, the default value is "Original". Its meaning is covered
in Section 5. Generators MUST NOT emit empty flavor parameters, but
parsers MUST treat empty flavor parameters the same as if omitted.
Leonard Exp. March 26, 2015 [Page 9]
Internet-Draft The text/markdown Media Type September 2014
The full ABNF of the flavor parameter is:
flavor-param = flavor *( *WSP rule ) *WSP
flavor = registered-fid / unregistered-fid
registered-fid = fid-char 1*("!" / fid-char)
unregistered-fid = "!" 2*fid-char
fid-char = %d35-%d42 / %d44 / %d46-%d126 / REXTCHAR
rule = "+" (should-rule / any-rule)
should-rule = should-rule-char [ *(should-rule-char / "_")
should-rule-char ]
any-rule = 1*rule-char
rule-char = %d35-%d42 / %d44-%d126 / REXTCHAR
Figure X: ABNF of the flavor parameter
4.1.1. flavor rules
[[TODO: consider. This section is mainly inspired from pandoc.]]
Most flavors are self-contained, with no options. However, some
flavors have optional rules that may be applied with discretion. For
those flavors where optional rules are an integral feature, the
author MAY indicate that those extra rules be applied in a plus sign-
delimited list.
Because Markdown has no inherent concept of validity, authors SHOULD
be aware that receivers are not required to honor these optional
rules--the special characters in the Markdown content may well be
interpreted as plain text, rather than Markdown markup. Generally
speaking, defining a new (simple) flavor is preferable to defining a
complex flavor with multiple optional rules.
A flavor rule identifier is composed of any sequence of Unicode
characters excluding spaces (Zs category), control characters,
quotation marks """, exclamation marks "!", and the plus sign "+";
however, lowercase ASCII letters and the underscore "_" alone SHOULD
be used, where the underscore SHOULD NOT be at the beginning or end.
The syntax for flavor rules derives in significant part from pandoc
[PANDOC].
[[TODO: There are no requirements about exclamation marks for
unregistered rules...flavor rules SHOULD be registered along with the
Leonard Exp. March 26, 2015 [Page 10]
Internet-Draft The text/markdown Media Type September 2014
flavor, but a receiver does not need to reject the flavor parameter
simply because it does not recognize a rule...it can just ignore the
rule.]]
4.2. processor
The processor parameter indicates the specific Markdown
implementation that the author intends be used. The purpose of this
parameter is to control the automatic processing of Markdown into
some output format, but of course actual recipients may use this
information for any useful purpose. The entire parameter is case-
sensitive.
An IANA registry of processors will be created as discussed in
Section 5. A processor identifier is composed of two or more Unicode
characters excluding spaces (Zs category), control characters, the
hyphen-minus "-", quotation marks """, the less-than sign "<", and
the greater-than sign ">"; however, ASCII characters alone SHOULD be
used. Additionally, registered processor identifiers MUST NOT begin
with "!", the exclamation mark. Unregistered processor identifiers
MUST begin with "!" (plus two additional characters).
When omitted, the default value is to use whatever processor the
receiver prefers. Generators MUST NOT emit empty processor
parameters, but parsers MUST treat empty processor parameters the
same as if omitted.
The full ABNF of the processor parameter is:
processor-param = processor [ "-" version ]
*( 1*WSP argument ) *WSP
processor = registered-pid / unregistered-pid
registered-pid = pid-char 1*("!" / pid-char)
unregistered-pid = "!" 2*pid-char
version = pid-char *("!" / pid-char)
argument = regular-argument / uri-argument
regular-argument = 1*(regular-char / quoted-chars)
pid-char = %d35-%d44 / %d46-%d59 / %d61 /
%d63-126 / REXTCHAR
regular-char = %d33 / %d35-%d59 / %d61 / %d63-126 / REXTCHAR
Leonard Exp. March 26, 2015 [Page 11]
Internet-Draft The text/markdown Media Type September 2014
quoted-chars = DQUOTE *pqcontent DQUOTE
pqcontent = %d1-%d33 / %d35-127 / EXTCHAR / DQUOTE DQUOTE
uri-argument = "<" URI-reference ">" ; from [RFC3986]
Figure X: processor parameter ABNF
4.2.1. processor version
For better precision, an author MAY include the processor version.
The version is delimited from the processor identifier with a hyphen-
minus "-"; the version string itself is an opaque string. Version
strings (e.g., "2.0", "3.0.5") are registered and updated along with
the processor registration. Updates to processor registrations SHOULD
only add new versions when those new versions have a material
difference on the interpretation of the Markdown content. If a
processor has a version "2014.10" and a version "2014.11", for
example, but "2014.11" only provides performance updates, then the
processor registration SHOULD NOT separately register the "2014.11"
version. The repertoire of the version string is the same as the
processor identifier (and like the processor identifier, ASCII
characters alone SHOULD be used).
A receiver that recognizes the processor but not the processor
version MAY use any version of the processor, preferably the latest
version.
4.2.2. processor arguments
Processor arguments MAY be supplied for finer-grained control over
how the processor behaves. Multiple arguments and URI references are
supported.
4.2.2.1. Quoted Arguments
According to the ABNF above, arguments are delimited by whitespace.
Quotation marks are used to support zero-length arguments, as well as
whitespace or quotation marks in a single argument. If a quotation
mark appears anywhere in the argument, the following text is
considered quoted; two successive quotation marks "" mean one
quotation mark. A single quotation mark ends the quoting. Because of
this rule, quotation marks do not have to appear at the termini of an
argument; embedded quotation marks start (and end) quoting within a
single argument. For example:
a""b
means:
ab
Leonard Exp. March 26, 2015 [Page 12]
Internet-Draft The text/markdown Media Type September 2014
for the actual argument.
4.2.2.2. URI Reference Arguments
Certain processors can take supplementary content, such as metadata,
from other resources. To support these workflows, an author MAY use
the URI delimiters <> to signal a URI, such as cid: or mid: URLs
[RFC2392] in the context of MIME messages. The URI MUST comply with
[RFC3986], and MAY be a relative reference if the subject Markdown
content has a base URI. The receiver is to interpret this as a
request to retrieve the resource, and to supply that resource in a
local reference form that the processor can use (e.g., via a
temporary file). The URI MUST be entire argument; the URI cannot be
combined with other text to constitute the argument (and the ABNF
above supports this restriction). The reason for this restriction is
security, so that a maliciously constructed argument string cannot
resolve to some other file reference (such as parent directories like
../ or special files such as /dev/hd0). If the processor accepts URI
strings directly, the string is to be supplied as a regular string
without <> delimiters. For security reasons, direct file references
MUST NOT be included in the processor arguments.
The prior paragraph notwithstanding, certain workflows may require
file references. In such cases, file: URLs [RFC1738] (including
relative references) are appropriate. The receiver SHOULD apply the
same security and privacy analyses to file: URLs as it would to any
other URI.
4.2.2.3. Appropriate Arguments and Security Considerations
Not all arguments are appropriate for inclusion in the processor
parameter. Appropriate arguments are basically limited to those that
affect the output markup, without side-effects. Arguments MUST NOT
identify input sources or output destinations. For example, if a
processor normally reads Markdown input using the arguments "-i
filename" or "< filename" (i.e., from standard input), those
arguments MUST be omitted. Arguments that have no bearing on the
output MUST be omitted as well, such as arguments that control
verbosity of the processor (-v) or that cause side-effects (such as
writing diagnostic messages to some other file). Of course, if
warnings or errors are signaled within the output, arguments enabling
that output MAY be used.
When in doubt, a receiver SHOULD omit arguments with unknown or
undocumented effects, and MAY ignore author-supplied arguments
entirely, but SHALL NOT reorder arguments. An author has very little
assurance that a receiver will honor unregistered arguments.
Consequently, the burden is squarely on processor registrants
Leonard Exp. March 26, 2015 [Page 13]
Internet-Draft The text/markdown Media Type September 2014
(Section 5.2) to document their arguments properly.
For security reasons, the parsed argument array (or a string
unambiguously representing the delimited argument array) MUST be
passed directly to the processor. Emitting the argument array as-is
in a batch script (for example) may cause risky side effects, such as
automatic substitutions, alias activation, or macro execution. The
arguments in this parameter MUST be encoded to preserve characters
outside of US-ASCII, and to signal the required encoding to the
receiver. When going between (system) processes, some implementations
may interpret character codes based on locale environment variables.
Therefore, it is not sufficient to pass arguments from this parameter
"as-is" to the processor: the routine MUST change the locale or
transform the arguments to an appropriate character encoding so that
there is no ambiguity. Furthermore, the NUL character (%d0, U+0000)
is not permitted because most common operating systems use that code
point as a delimiter.
4.2.3. Examples of processor parameters
[[TODO: provide examples.]]
4.3. output-type
The output-type parameter indicates the Internet media type (and
parameters) of the output from the processor.
When omitted, the default value is "text/html". Generators MUST NOT
emit empty output-type parameters, but parsers MUST treat empty
output-type parameters the same as if omitted.
The default value of text/html ought to be suitable for the majority
of current purposes. However, Markdown is increasingly becoming
integral to workflows where HTML is not the target output; examples
range from TeX [CITE], to PDF [CITE], to OPML [CITE], and even to
entire e-books [CITE].
Security provides a significant motivator for this parameter. Most
Markdown processors emit byte (octet) streams; without a well-defined
means for a Markdown processor to pass metadata onwards, it is
perilous for post-processing to assume that the content is always
HTML. A processor might emit PostScript (application/postscript)
content, for example, in which case an HTML sanitizer would fail to
excise dangerous instructions.
The value of output-type is an Internet media type with optional
parameters. The syntax (including case sensitivity considerations) is
the same as specified in [RFC2045] for the Content-Type header (with
Leonard Exp. March 26, 2015 [Page 14]
Internet-Draft The text/markdown Media Type September 2014
updates over time), namely:
type "/" subtype *(";" parameter)
; Matching of media type and subtype
; is ALWAYS case-insensitive.
Figure X: Content-Type ABNF (from [RFC2045])
The Internet media type in the output-type parameter MUST be
observed. Processors or processor arguments that conflict with the
output-type parameter MUST be re-chosen, ignored, or rejected.
Although arbitrary optional parameters may be passed along with the
Internet media type, receivers are under no obligation to honor or
interpret them in any particular way. For example, the parameter
value "text/plain; format=flowed; charset=ISO-2022-JP" obligates the
receiver to output text/plain (and to treat the output as plain text-
-no sneaking in or labeling the output as HTML!). In contrast, such a
parameter value neither obligates the receiver to follow [RFC3676]
(for flowed output) nor to output ISO-2022-JP Japanese character
encoding (see [RFC1468]).
Markdown implementations for all kinds of formats already exist,
including formats that are not registered Internet media types, or
that are inexpressible as Internet media types. For example, one
Markdown processor for the mass media industry outputs formatted
screenplays [CITE to fountain.io]: none of applicable media types
application/pdf, text/html, or text/plain adequately distinguish this
kind of output. Such distinctions SHOULD be made in the processor
parameter (and to a lesser extent, the flavor parameter),
underscoring that the primary concern of the output-type parameter is
making technical and security-related decisions.
The output-type parameter does not distinguish between fragment
content and whole-document content. A Markdown processor MAY (and
typically will) output HTML or XHTML fragment content, without
preambles or postambles such as <!DOCTYPE>, <html>, <head>, </head>,
<body>, </body>, or </html> elements. Receivers MUST be aware of this
behavior and take appropriate precautions.
[[TODO: consider.]]
The author may specify the output-type "text/markdown", which has a
special meaning. "text/markdown" means that the author does not want
to invoke Markdown processing at all: the receiver SHOULD view the
Markdown source as-is. In this case, the processor choice has little
practical effect because the Markdown is not actually processed, but
other tools can use the flavor parameter (and secondarily if so
inclined, the processor parameter) to perform useful services such as
Leonard Exp. March 26, 2015 [Page 15]
Internet-Draft The text/markdown Media Type September 2014
syntax highlighting. This output-type is not the default because one
generally assumes that Markdown is meant for composing rather than
reading: readers expect to see the output format (or dual-display of
the output and the Markdown). However, if authors are collaboratively
editing a document or are discussing Markdown, "text/markdown" may
make sense. While the optional parameter output-type may be used
recursively (as a sneaky way to stash the author's follow-on or
secondary intent), receivers are not obligated to recognize it;
optional parameters internal to output-type MAY be ignored.
5. IANA Considerations
IANA is asked to register the media type text/markdown in the
Standards tree using the application provided in Section 2 of this
document.
IANA is also asked to establish a subtype registry called "Markdown
Parameters". The registry has two sub-registries: a registry of
flavors and a registry of processors.
5.1. Registry of Flavors
Each entry in this registry shall consist of a flavor identifier and
information about the flavor, as follows:
5.1.1. Flavor Template
Identifier: [Identifier]
Description: [Concise, prose description of the syntax,
with emphasis on its purpose, the community
that it addresses, and notable variations
from [MDSYNTAX] or another flavor.]
Documentation: [References to documentation.]
Rules:
{for each rule}
Identifier: [Identifier]
Description: [Concise, prose description of the rule.]
Documentation: [References to documentation.]
Responsible Parties:
{for each party}
([type: individual, corporate, representative])
[Name] <contact info 1>...<contact info n>
Currently Maintained? [Yes/No]
Leonard Exp. March 26, 2015 [Page 16]
Internet-Draft The text/markdown Media Type September 2014
Tools:
{for each tool}
Name: [Name]
Version(s): [Significant version or versions that
implement the flavor]
Type: ["Processor" or some other type]
Reference(s): <contact info 1>...<contact info n>
Purpose: [Concise, prose description of the tool.]
A responsible party can be an individual author or maintainer, a
corporate author or maintainer (plus an individual contact), or a
representative of a community of interest dedicated to the Markdown
syntax.
Multiple tools MAY be listed, but only one is necessary for a
successful registration. If a tool is a Markdown processor, it MUST
be registered; however, any Markdown-related tool (for example,
graphical editors, emacs "major modes", web apps) is acceptable. The
purpose of the tool requirement is to ensure that the flavor is
actually used in practice.
5.1.2. Initial Registration
The registry shall have the following initial registration:
Identifier: Original
Description: Gruber's original Markdown syntax.
Documentation: [MDSYNTAX]
Rules: None.
Responsible Parties:
(individual) John Gruber <http://daringfireball.net/>
<comments@daringfireball.net>
Currently Maintained? No
Tools:
Name: Markdown.pl
Version(s): 1.0.1, 1.0.2b8
Type: Processor
Reference(s): <http://daringfireball.net/projects/markdown/>
Purpose: Converts Markdown to HTML or XHTML circa 2004.
5.1.3. Reserved Identifiers
Leonard Exp. March 26, 2015 [Page 17]
Internet-Draft The text/markdown Media Type September 2014
The flavors registry SHALL have the following identifiers RESERVED.
No one is allowed to register them (or any case variations of them).
Standard
Common
Markdown
5.1.4. Standard of Review
Registrations are made by a highly constrained Expert Review
[RFC5226] that amounts more-or-less to First-Come, First-Served with
sanity checking.
The designated expert SHALL review the flavor registration. The
identifier MUST comply with the syntax specified in this document.
Additionally, the identifier MUST NOT differ from other registered
identifiers merely by case. The description and documentation SHOULD
provide sufficient guidance to an implementer to implement a tool to
handle the flavor. The designated expert SHOULD warn the registrant
if the description and documentation are inadequate; however,
inadequacy (in the opinion of the designated expert) will not bar a
registration.
All references (including contact information) MUST be verified as
functional at the time of the registration.
If rules are included in the registration, the rule identifiers MUST
comply with the syntax specified in this document. The description
and documentation of each rule SHOULD provide sufficient guidance to
an implementer to implement a tool to handle the rule. The designated
expert SHOULD warn the registrant if the description and
documentation are inadequate; however, inadequacy (in the opinion of
the designated expert) will not bar a registration.
The designated expert MUST determine that all tools listed in the
registration are real implementations. If a tool is a Markdown
processor, the processor MUST be registered in the Registry of
Flavors in Section 5.2. The designated expert MAY request that the
registrant provide evidence that a tool actually works (for example,
that it passes certain test suites); however, the failure of a tool
to work according to the flavor registration will not bar a
registration. (For example, not even Gruber's own Markdown.pl
implementation complies with [MDSYNTAX]. C'est la vie!)
If a registration is being updated, the designated expert SHOULD
verify that the updating registrant matches the contact information
on the prior registration, and if not, that the updating registrant
has authority from the prior registrant to update it. All fields may
be updated except the Identifier, which is permanent: not even case
Leonard Exp. March 26, 2015 [Page 18]
Internet-Draft The text/markdown Media Type September 2014
may be changed.
5.2. Registry of Processors
Each entry in this registry SHALL consist of a processor identifier
and information about the processor, as follows:
5.2.1. Processor Template
Identifier: [Identifier]
Description: [Concise, prose description of the processor,
with emphasis on its purpose, the community
that it addresses, and notable variations
from [MDSYNTAX] or another flavor.]
Documentation: [References to documentation.]
Versions:
{for each version}
Identifier: [Identifier]
Description: [Optional, concise, prose description of the
version. "N/A" SHALL be used to indicate no description.]
Arguments:
{in general}
Argument Ordering: [Concise, prose description of how
arguments need to be ordered.]
{for each argument}
Argument Syntax: [Syntax here; multiple consecutive argument
positions are allowed, separated by a single space. Use
braces for variable information (add : for example input),
<URI> for URI references, and .. for sequences of arguments
with # as a placeholder for the number of arguments or
..-.. to indicate the first character of the subsequent
argument that ends the sequence, e.g.:
-c
--title {title: "The Rain in Spain"}
--metadata <URI>
--bullet-chars:{#} {char 1}..{char #}
--verbs {verb: walk, run, sleep}..-..
]
Description: [Concise, prose description of the argument.]
Documentation: [References to documentation.]
Output Type(s): [Internet media types, comma-separated
(with optional LWSP)]
Leonard Exp. March 26, 2015 [Page 19]
Internet-Draft The text/markdown Media Type September 2014
Security Considerations: [Sufficient description of risks and
other considerations; "N/A" or
"None" responses are insufficient.]
Responsible Parties:
{for each party}
([type: individual, corporate, representative])
[Name] <contact info 1>...<contact info n>
Currently Maintained? [Yes/No]
A responsible party can be an individual author or maintainer, a
corporate author or maintainer (plus an individual contact), or a
representative of a community of interest dedicated to the Markdown
processor.
5.2.2. Initial Registration
The registry shall have the following initial registration:
Identifier: Markdown.pl
Description: Gruber's original Markdown processor, written in
Perl. Requires Perl 5.6.0 or later. "Welcome to
the 21st Century." Works with Movable Type 2.6+,
Blosxom 2.0+, BBEdit 6.1+, and the command-line.
Documentation: [MARKDOWN]
Versions:
Identifier: 1.0.1
Description: The 2004-12-17 version.
Identifier: 1.0.2b8
Description: The 2007-05-09 version. Fixes many bugs
and adds several new features; see
VERSION HISTORY in Markdown.pl.
Leonard Exp. March 26, 2015 [Page 20]
Internet-Draft The text/markdown Media Type September 2014
Arguments:
Argument Syntax: --html4tags
Description:
"Use the --html4tags command-line switch to produce HTML
output from a Unix-style command line."
Without this argument, Markdown.pl outputs XHTML style
tags by default, e.g.: <br />. Even though XHTML style
is the default, the output SHOULD be analyzed as
text/html; the processor makes no attempt to make
its output well-formed application/html+xml
(not surprising--see the design philosophy).
Documentation: [MARKDOWN]
Output Type: text/html
Security Considerations: The security of this implementation
has not been fully analyzed.
Responsible Parties:
(individual) John Gruber <http://daringfireball.net/>
<comments@daringfireball.net>
Currently Maintained? No [[TODO: maybe?]]
5.2.3. Reserved Identifiers
The processors registry SHALL have the following identifiers
RESERVED. No one is allowed to register them (or any case variations
of them).
Standard
Markdown
md
5.2.4. Standard of Review
Registrations are First-Come, First-Served [RFC5226]. The checks
prescribed by this section can be performed automatically.
The identifier MUST comply with the syntax specified in this
document. Additionally, the identifier MUST NOT differ from other
registered identifiers merely by case. The description and
documentation SHOULD provide sufficient guidance to an implementer to
know how to invoke the processor and handle the output.
All references (including contact information) MUST be verified as
functional at the time of the registration.
If arguments are included in the registration, the Argument Syntax
Leonard Exp. March 26, 2015 [Page 21]
Internet-Draft The text/markdown Media Type September 2014
MUST comply with the template instructions in Section 5.2.1. Each
description and documentation field SHOULD provide sufficient
guidance to an implementer to know how to invoke the processor and
handle the output.
The Security Considerations field is not optional; it MUST be
provided.
If a registration is being updated, the contact information MUST
either match the prior registration and be verified, or the prior
registrant MUST confirm that the updating registrant has authority to
update the registration. All fields may be updated except the
Identifier, which is permanent: not even case may be changed.
6. Security Considerations
See the answer to the Security Considerations template questions in
Section 2.
Security considerations for the optional parameters are integrated
throughout Section 4.
7. References
7.1. Normative References
[MARKDOWN] Gruber, J., "Daring Fireball: Markdown", December 2004,
<http://daringfireball.net/projects/markdown/>.
[MDSYNTAX] Gruber, J., "Daring Fireball: Markdown Syntax
Documentation", December 2004,
<http://daringfireball.net/projects/markdown/syntax>.
[RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform
Resource Locators (URL)", RFC 1738, December 1994.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part One: Format of Internet Message
Bodies", RFC 2045, November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66, RFC
3986, January 2005.
[RFC5226] Narten, T., and H. Alvestrand, "Guidelines for Writing an
Leonard Exp. March 26, 2015 [Page 22]
Internet-Draft The text/markdown Media Type September 2014
IANA Considerations Section in RFCs", RFC 5226, May 2008.
[RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
October 2008.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
Specifications and Registration Procedures", BCP 13, RFC
6838, January 2013.
7.2. Informative References
[HUMANE] Atwood, J., "Is HTML a Humane Markup Language?", May 2008,
<http://blog.codinghorror.com/is-html-a-humane-markup-
language/>.
[DIN2MD] Gruber, J., "Dive Into Markdown", March 2004,
<http://daringfireball.net/2004/03/dive_into_markdown>.
[MD102b8] Gruber, J., "[ANN] Markdown.pl 1.0.2b8", May 2007,
<http://six.pairlist.net/pipermail/markdown-discuss/2007-
May/000615.html>, <http://daringfireball.net/projects/
downloads/Markdown_1.0.2b8.tbz>.
[CATPICS] Gruber, J. and M. Arment, "The Talk Show: Ep. 88: 'Cat
Pictures' (Side 1)", July 2014,
<http://daringfireball.net/thetalkshow/2014/07/19/ep-088>.
[INETMEME] Solon, O., "Richard Dawkins on the internet's hijacking of
the word 'meme'", June 2013,
<http://www.wired.co.uk/news/archive/2013-06/20/richard-
dawkins-memes>, <http://www.webcitation.org/6HzDGE9Go>.
[MULTIMD] Penney, F., "MultiMarkdown", April 2014,
<http://fletcherpenney.net/multimarkdown/>.
[PANDOC] MacFarlane, J., "Pandoc", 2014,
<http://johnmacfarlane.net/pandoc/>.
[RAILFROG] Railfrog Team, "Railfrog", April 2009,
<http://railfrog.com/>.
[RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC
793, September 1981.
[RFC2392] Levinson, E., "Content-ID and Message-ID Uniform Resource
Locators", RFC 2392, August 1998.
[RFC4263] Lilly, B., "Media Subtype Registration for Media Type
Leonard Exp. March 26, 2015 [Page 23]
Internet-Draft The text/markdown Media Type September 2014
text/troff", RFC 4263, January 2006.
[XML1.0-3] Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
F. Yergeau, "Extensible Markup Language (XML) 1.0 (Third
Edition)", World Wide Web Consortium Recommendation REC-
xml-20040204, February 2004,
<http://www.w3.org/TR/2004/REC-xml-20040204#dt-fatal>.
[TODO] [[Add remaining references.]]
Appendix A. Change Log
This draft is a continuation from draft-ietf-appsawg-text-markdown-
01.txt. These technical changes were made:
1. The entire document was reorganized: optional parameters now
have their own section, and the Introduction section is
divided into four subsections.
2. The Introduction section provides substantial background
information, along with goals and use cases for both Markdown
and the Internet media type registration.
3. The rules parameter was reverted back to flavor, and flavor
was beefed up.
4. The processor parameters were consolidated and simplified.
5. Dependencies on POSIX were removed.
6. The output-type parameter was added.
7. Unregistered identifiers can be used with their own ! syntax.
8. The IANA Considerations section was fleshed out in great
detail, with emphasis on easing the registration process.
9. Security considerations were weaved throughout the
specification. Overall, most of the complexity in this
specification comes directly from the security considerations.
Those considerations are necessary since a lot of bad things
can and will happen when HTML, URIs, and executable code get
together.
10. Changed the example in Section 2 to use initially registered
identifiers.
11. Added output-type="text/markdown" for recursive handling
(i.e., don't process this Markdown, just show it like it is).
Leonard Exp. March 26, 2015 [Page 24]
Internet-Draft The text/markdown Media Type September 2014
Author's Address
Sean Leonard
Penango, Inc.
5900 Wilshire Boulevard
21st Floor
Los Angeles, CA 90036
USA
EMail: dev+ietf@seantek.com
URI: http://www.penango.com/
Leonard Exp. March 26, 2015 [Page 25]