Ballot for draft-ietf-payload-rtp-ttml
Yes
No Objection
Note: This ballot was opened for revision 03 and is now closed.
Thank you for writing this -- I found it interesting and useful.
Thank you for the work done in this document. The unusual wording of 'RTP carriage' in section 4.2.1 is interesting. -éric
Thanks for addressing my DISCUSS and COMMENT points. I have preserved them below for posterity.

---------------------------------------------------------------------------

Thanks for the work everyone put into this document. I think it's not quite ready to publish, due to one ambiguity, one critical missing feature, and the lack of guidance around fragmentation. I also have two comments that I consider very important, although they don't quite rise to the level of blocking publication.

As always, it's possible that my DISCUSS points are off-base, and I'd be happy to be corrected if I've misunderstood anything here.

---------------------------------------------------------------------------

§4.1:

> When the document spans more
> than one RTP packet, the entire document is obtained by
> concatenating User Data Words from each contributing packet in
> ascending order of Sequence Number.

This is underspecified, in that it doesn't make it clear whether it would be valid to split a single UTF-8 or UTF-16 character between RTP packets, and it is nearly certain that different implementations will make different assumptions on this point, leading to interop failures.

For example, the UTF-8 encoding of '¢' is 0xC2 0xA2. Would it be valid to place the "0xC2" in one packet and the "0xA2" in a subsequent packet? Without specifying this, it is quite likely that some implementations will use, e.g., UTF-8 strings to accumulate the contents of RTP packets; and most such libraries will emit errors or exhibit unexpected behavior if units of less than a character are added at any time. (The same point holds for splitting a UTF-16 character across packets.)

I don't think it much matters which choice you make (explicitly allowing or explicitly forbidding splitting characters between packets), but it does need to be explicit.
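To make the '¢' example concrete, here is a minimal sketch in Python. The two payloads are hypothetical and stand in for the User Data Words of two consecutive RTP packets. It contrasts a receiver that decodes each payload on its own with one that buffers incomplete byte sequences between payloads. It is an illustration of the ambiguity, not a statement of what the draft requires.

```python
import codecs

# Hypothetical payloads: the UTF-8 encoding of '¢' (0xC2 0xA2) is split
# so that 0xC2 ends the first payload and 0xA2 starts the second.
payloads = [b"price: 5\xc2", b"\xa2 each"]


def decode_strict(chunks):
    """Decode each payload independently; fails if a character is split."""
    return "".join(chunk.decode("utf-8") for chunk in chunks)


def decode_incremental(chunks):
    """Carry incomplete byte sequences over from one payload to the next."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    # final=True raises if the document ends in the middle of a character.
    return text + decoder.decode(b"", final=True)


try:
    decode_strict(payloads)
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

combined = decode_incremental(payloads)
```

A receiver written like `decode_strict` only interoperates with senders that never split characters, while one written like `decode_incremental` tolerates either choice, which is why the specification needs to say which behavior senders may rely on.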
I have a slight personal preference for requiring that characters cannot be split (both for ease of implementation on the receiving end and to more smoothly handle missing data due to extended packet loss), but leave it to the authors and working group to decide.

---------------------------------------------------------------------------

Unlike other definitions to convey non-loss-resilient data on RTP streams, this document had no defined mechanism to deal with packet loss. This makes it unusable on the public Internet, where packet loss is an inevitable feature of the network. The existing text-in-RTP specifications define procedures to deal with such loss (see, e.g., RFC 4103 section 4 and RFC 4396 section 5).

---------------------------------------------------------------------------

This format is rather unique in that it, alone among all other RTP text formats, is designed to send monolithic documents that may stretch into the multiple kilobyte range. While fragmentation is mentioned as a possibility, the document provides no implementation guidance about when to fragment documents, and what sizes each fragment should assume. RFC 4396 section 4.4 is an example of the kind of information I would expect to see in a document like this, with emphasis on the fact that TTML documents are going to frequently exceed the PMTU for a typical network connection.

---------------------------------------------------------------------------

§1:

> TTML (Timed Text Markup Language)[TTML2] is a media type for
> describing timed text such as closed captions (also known as
> subtitles) in television workflows or broadcasts as XML.
Although superficially similar, there are important distinctions between subtitles (intended to help a hearing audience exclusively with spoken dialog, typically because the audio is in a different language or otherwise difficult to understand) and closed captions (intended to aid deaf or hard-of-hearing viewers by providing a direct, word-for-word transcription of dialog as well as descriptions of all other audio present). Calling one "also known as" the other is incorrect. I suggest rephrasing as:

   TTML (Timed Text Markup Language)[TTML2] is a media type for
   describing timed text such as closed captions and subtitles
   in television workflows or broadcasts as XML.

---------------------------------------------------------------------------

§4.2.1.1:

> The TTML document instance MUST use the "media" value of the
> "ttp:timeBase" parameter attribute on the root element.

This statement makes an assumption that the "http://www.w3.org/ns/ttml#parameter" namespace MUST be mapped to the "ttp" prefix, which is both bad form and probably not what is intended. I suggest rephrasing as:

   The TTML document instance MUST include a "timeBase" element from
   the "http://www.w3.org/ns/ttml#parameter" namespace containing
   the value "media".
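The prefix point can be illustrated with a short Python sketch. The two documents below are hypothetical and minimal. The draft text quoted above calls timeBase a parameter attribute on the root element, so the sketch reads it as an attribute. It looks the value up by namespace URI rather than by prefix, using the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

TTP_NS = "http://www.w3.org/ns/ttml#parameter"

# Two hypothetical documents carrying the same attribute; only the
# prefix bound to the parameter namespace differs.
doc_ttp = ('<tt xmlns="http://www.w3.org/ns/ttml" '
           f'xmlns:ttp="{TTP_NS}" ttp:timeBase="media"/>')
doc_other = ('<tt xmlns="http://www.w3.org/ns/ttml" '
             f'xmlns:p="{TTP_NS}" p:timeBase="media"/>')


def time_base(xml_text):
    """Return the timeBase attribute of the root element, or None.

    ElementTree stores namespaced attribute names as "{uri}local",
    so the lookup is independent of the prefix used in the document.
    """
    root = ET.fromstring(xml_text)
    return root.get(f"{{{TTP_NS}}}timeBase")
```

Both `time_base(doc_ttp)` and `time_base(doc_other)` yield the same value, which is the behavior a requirement phrased in terms of the namespace, not the "ttp" prefix, would describe.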
I agree with Adam’s DISCUSS.
I would recommend starting some new top-level sections within what is currently Section 4.2, rather than going down to six levels of subsections (4.2.1.2.1.2), which can get confusing when other people are citing parts of this document. Please respond to the Gen-ART review.
Thanks for this clear and well-written document!

Section 2

   The term "word" refers to byte aligned or 32-bit aligned words of
   data in a computing sense and not to refer to linguistic words that
   might appear in the transported text.

Either of byte-aligned and 4-byte-aligned, as opposed to aligned to one of those and in multiples of the other in length?

Section 4

I find myself feeling like I would benefit from a brief discussion of the relationship between documents and the RTP stream before getting into the details of the payload format (e.g., "one document per subtitle", "many documents per stream but each document contains some minutes of data", or "totally up to the profile in use").

Even having finished the I-D I'm still wondering: it's clear that we only have a single TTML stream in a given RTP stream, and a given RTP packet has (part of) a TTML document in the epoch of the timestamp of the RTP packet, and I can only have one document active at a time. On the flip side, different documents must belong to different epochs. So it seems that I could either make large documents stuck on a single timestamp, or small documents with (relatively) rapidly advancing timestamps, regardless of how I need to actually split the TTML content into packets in order to meet MTU requirements (and possibly packet pacing ones). Given that this is RTP and we're used to ignoring things with old timestamps, I mostly expect the latter to be more common, but would appreciate some guidance in the document [sic]. This seems to roughly be Adam's third Discuss point.

Section 4.2.1.2

   If the TTML document payload is assessed to be invalid then it MUST
   be discarded.  When processing a valid document, the following
   requirements apply.

Does this imply that I have to wait for the entire document to arrive before I start processing it?

   Each TTML document becomes active at the epoch E.  E MUST be set to

nit: I suggest s/the/its/, since there is not a global distinguished epoch.
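The question of splitting TTML content into packets to fit the MTU, raised in the Section 4 comment above and in Adam's DISCUSS, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the `max_payload` parameter are hypothetical, fragments are cut only on UTF-8 character boundaries (one of the two choices the DISCUSS asks the draft to make explicit), and reassembly ignores sequence number wraparound and packet loss.

```python
def fragment_utf8(document: str, max_payload: int) -> list[bytes]:
    """Split a document into chunks of at most max_payload bytes,
    never cutting inside a UTF-8 character."""
    if max_payload < 4:
        # A UTF-8 character is at most 4 bytes long.
        raise ValueError("max_payload must hold at least one UTF-8 character")
    data = document.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_payload, len(data))
        # Continuation bytes look like 0b10xxxxxx; if the cut would land
        # on one, back up until the next chunk starts on a character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks


def reassemble(packets: list[tuple[int, bytes]]) -> str:
    """Concatenate payloads in ascending sequence number order.

    Simplification: assumes no wraparound and no missing packets.
    """
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered).decode("utf-8")
```

Because every fragment produced this way is valid UTF-8 on its own, a receiver can decode fragments as they arrive; whether it may also act on a partial document is the separate question asked under Section 4.2.1.2.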
Most of the security considerations I can think of apply more to the TTML format itself rather than the RTP payload. I might include a short note that the text contents are meant to be interpreted by a human, and content from untrusted sources should be viewed with appropriate levels of skepticism.
Small comment on Sec 4.1. Maybe:

OLD
"These bits are reserved for future use and MUST be set to 0x0."

NEW
"These bits are reserved for future use and MUST be set to 0x0 and ignored at receive."
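The suggested "set to 0x0 and ignored at receive" behavior might look like the following Python sketch. The header layout here is an assumption made for illustration (a 16-bit reserved field followed by a 16-bit length field, both in network byte order) and is not taken from the ballot text. The function names are hypothetical.

```python
import struct

HEADER = struct.Struct("!HH")  # assumed layout: reserved (16 bits), length (16 bits)


def build_payload(user_data: bytes) -> bytes:
    """Sender side: reserved bits are always written as 0x0."""
    return HEADER.pack(0x0, len(user_data)) + user_data


def parse_payload(payload: bytes) -> bytes:
    """Receiver side: read the reserved bits but never act on them."""
    if len(payload) < HEADER.size:
        raise ValueError("payload shorter than header")
    _reserved, length = HEADER.unpack_from(payload)
    # _reserved is deliberately ignored so that a future revision can
    # assign meaning to these bits without breaking this receiver.
    user_data = payload[HEADER.size:HEADER.size + length]
    if len(user_data) != length:
        raise ValueError("payload shorter than declared length")
    return user_data
```

A receiver that instead rejected non-zero reserved bits would fail against any later sender that starts using them, which is the forward-compatibility concern behind the proposed wording.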