Summary: Has a DISCUSS. Needs one more YES or NO OBJECTION position to pass.
I support Magnus' point about the time-ordering of adjacent frames in a packet. Additionally, I am not sure that there's quite enough here to be interoperably implementable. Specifically, we seem to be lacking a description of how an encoder or decoder knows which TSVCIS parameters, and in what order, to byte-pack or unpack, respectively. One might surmise that there is a canonical listing in [TSVCIS], but this document does not say that, and furthermore [TSVCIS] is only listed as an informative reference. (I couldn't get my hands on my copy, at least on short notice.) If we limited ourselves to treating the TSVCIS parameters as an entirely opaque blob (codec, convey these N octets to the peer with the appropriate one- or two-byte trailer for payload type identification and framing), that would be interoperably implementable, since the black-box bits are up to some other codec to interpret. In a similar vein, we mention but do not completely specify the potential for using CODB as an end-to-end framing bit, in Section 3.1 (see Comment), which is not interoperably implementable without further details.
Where is [TSVCIS] available? Is [NRLVDR] the same as https://apps.dtic.mil/dtic/tr/fulltext/u2/a588068.pdf ? A URL in the references would be helpful.

Is additional TSVCIS data only present after 2400 bps MELPe and the first thing to get dropped under bandwidth pressure? The abstract and introduction imply this by calling out MELPe 2400 bps speech parameters explicitly, but Section 3 says that TSVCIS augments standard 600, 1200, and 2400 bps MELP frames.

It's helpful that Section 3.3 gives some general guidance for decoding this payload type ("[t]he way to determine the number of TSVCIS/MELPe frames is to identify each frame type and length"), but I think some generic considerations would be very helpful to the reader much earlier, along the lines of "MELPe and TSVCIS data payloads are decoded from the end, using the CODA and CODB (and, if necessary, CODC and others) bits to determine the type of payload. For MELPe payloads the type also indicates the payload length, whereas for TSVCIS data an additional length field is present, in one of two possible formats. A TSVCIS coder frame consists of a MELPe data payload followed by zero or one TSVCIS data payload; after the TSVCIS payload's presence/length is determined, then the preceding MELPe payload can be determined and decoded. Per Section 3.3, multiple TSVCIS frames can be present in a single RTP packet." This (or something like it) would also serve to clarify the role of the COD* bits, which is otherwise only implicitly introduced.

Section 1.1

RFC 2736 is BCP 36 (but it's updated by RFC 8088 which is for some reason an Informational document and not part of BCP 36?!).

Section 2

   In addition to the augmented speech data, the TSVCIS specification identifies which speech coder and framing bits are to be encrypted, and how they are protected by forward error correction (FEC) techniques (using block codes).
   At the RTP transport layer, only the speech coder related bits need to be considered and are conveyed in unencrypted form. In most IP-based network deployments, standard

Am I reading this correctly that this text is just summarizing what's in the TSVCIS spec in terms of what needs to be in unencrypted form, so the "only the speech coder related bits[...]" is not new information from this document? I'm not sure I agree with the conclusion, regardless -- won't the (MELPe) speech coder bits be enough to convey the semantic content of the audio stream, something that one might desire to keep confidential?

   link encryption methods (SRTP, VPNs, FIPS 140 link encryptors or Type 1 Ethernet encryptors) would be used to secure the RTP speech contents. Further, it is desirable to support the highest voice quality between endpoints which is only possible without the overhead of FEC.

I think I'm missing a step in how this conclusion was reached.

   TSVCIS will be characterized. Depending on the bandwidth available (and FEC requirements), a varying number of TSVCIS specific speech coder parameters need to be transported. These are first byte-packed and then conveyed from encoder to decoder.

Per the Discuss point, how do I know which parameters need to be transported, and in what order?

   Byte packing of TSVCIS speech data into packed parameters is processed as per the following example:

      Three-bit field: bits A, B, and C (A is MSB, C is LSB)
      Five-bit field: bits D, E, F, G, and H (D is MSB, H is LSB)

           MSB                                              LSB
            0      1      2      3      4      5      6      7
        +------+------+------+------+------+------+------+------+
        |   H  |   G  |   F  |   E  |   D  |   C  |   B  |   A  |
        +------+------+------+------+------+------+------+------+

   This packing method places the three-bit field "first" in the lowest bits followed by the next five-bit field. Parameters may be split between octets with the most significant bits in the earlier octet. Any unfilled bits in the last octet MUST be filled with zero.

I agree with Adam that this is very unclear.
A is the MSB of the three-bit field but the LSB of the octet overall? We probably need an example of splitting a parameter across octets as well, to get the bit ordering right.

Section 3.1

   It should be noted that CODB for both the 2400 and 600 bps modes MAY deviate from the values in Table 1 when bit 55 is used as an end-to-end framing bit. Frame decoding would remain distinct as CODA being

Where is the use of CODB as an end-to-end framing bit defined? If we're going to provide neither a complete description of how to do it nor a reference to a better description, we probably shouldn't mention it at all.

Section 3.2

   RTP packet. The packed parameters are counted in octets (TC). In the preferred placement, shown in Figure 6, a single trailing octet SHALL be appended to include a two-bit rate code, CODA and CODB,

I'd consider saying something about this being the preferred format ("placement") due to its shorter length than the alternative, and say that it "SHOULD be used for TSVCIS payloads with TC less than or equal to 77 octets".

Section 3.3

When a longer packetization interval is used, is that indicated by signaling or RTP timestamps or otherwise?

   TSVCIS coder frames in a single RTP packet MAY be of different coder bitrates. With the exception for the variable length TSVCIS parameter frames, the coder rate bits in the trailing byte identify the contents and length as per Table 1.

Maybe also note that the penultimate octet gives the length there?

   Information describing the number of frames contained in an RTP packet is not transmitted as part of the RTP payload. The way to determine the number of TSVCIS/MELPe frames is to identify each frame type and length thereby counting the total number of octets within the RTP packet.

terminology nit: if a frame is the combination of MELPe and TSVCIS payload data units then there are two layers of decoding to get a length for the frame, since we have to get the TSVCIS length and then the MELPe length.
Section 4.2

   Parameter "ptime" cannot be used for the purpose of specifying the

nit: missing article ("The parameter")

   will be impossible to distinguish which mode is about to be used (e.g., when ptime=68, it would be impossible to distinguish if the packet is carrying one frame of 67.5 ms or three frames of 22.5 ms).

So how is the operating mode determined, then? (I think this is the same question I asked above)

Section 4.4

   For example, if offerer bitrates are "2400,600" and answer bitrates are "600,2400", the initial bitrate is 600. If other bitrates are provided by the answerer, any common bitrate between the offer and answer MAY be used at any time in the future. Activation of these other common bitrates is beyond the scope of this document.

It seems important to specify whether this requires a new O/A exchange or can be done "spontaneously" by just encoding different frame types. (It seems like the latter is possible, on first glance, and this is implied by Section 3.3's discussion of mixing them in a single packet.)

Section 5

Please expand PLC at first use (not second).

Section 6

I don't understand the PLC usage. Is the idea that a receiver, on seeing an SSRC gap, constructs fictitious PLC frames to "fill the gap" and passes the resulting stream to the decoder?

Section 8

   and important considerations in [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms. The rest of this section discusses the security-impacting properties of the payload format itself.

I thought we described TSVCIS itself (much earlier in the document) as requiring encryption for some data; wouldn't that translate to a "MUST" here and not a "SHOULD"?
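
To make the Section 4.2 ambiguity concrete, here is a minimal sketch that lists the frame combinations matching a given ptime. It uses only the two frame durations quoted from the draft (22.5 ms and 67.5 ms). The function name, the `max_frames` bound, and the rule that ptime is the total duration rounded up to a whole millisecond are assumptions made for this illustration, not something the draft states.

```python
import math
from itertools import combinations_with_replacement

# Frame durations quoted in the draft text above (milliseconds).
FRAME_MS = (22.5, 67.5)


def packetizations_for_ptime(ptime_ms, max_frames=4):
    """List frame-duration combinations whose total rounds up to ptime_ms.

    Assumption: ptime is the summed duration rounded up to a whole millisecond.
    """
    matches = []
    for count in range(1, max_frames + 1):
        for combo in combinations_with_replacement(FRAME_MS, count):
            if math.ceil(sum(combo)) == ptime_ms:
                matches.append(combo)
    return matches


print(packetizations_for_ptime(68))
# [(67.5,), (22.5, 22.5, 22.5)]
```

Two distinct packetizations map to ptime=68, which is why some other indication of the operating mode seems to be needed.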
s/Department of Defense/US Department of Defense/
Section 2. Per “In most IP-based network deployments, standard link encryption methods (SRTP, VPNs, FIPS 140 link encryptors or Type 1 Ethernet encryptors) would be used to secure the RTP speech contents.”, the inclusion of SRTP in this list of “link encryption” methods was surprising. The other methods typically provide a service-agnostic tunnel, but SRTP is application specific (and doesn’t protect a link).

Section 8. Per “Applications SHOULD use one or more appropriate strong security mechanisms”, what exactly is the “SHOULD” requiring?
Like Éric, this is far outside my area of expertise, so I'm balloting "NoObj" in the "I read the protocol action, and I trust the sponsoring AD so have no problem and / or this is outside my area of expertise." sense of the term.
Thanks for the work the authors and working group put into this document. I have a handful of comments of varying importance.

---------------------------------------------------------------------------

§2:

>  At the RTP transport layer, only the
>  speech coder related bits need to be considered and are conveyed in

Nit: "...speech-coder-related bits..."

>  Depending on the bandwidth available
>  (and FEC requirements), a varying number of TSVCIS specific speech

Nit: "...TSVCIS-specific..."

---------------------------------------------------------------------------

§2:

>  Byte packing of TSVCIS speech data into packed parameters is
>  processed as per the following example:
>
>     Three-bit field: bits A, B, and C (A is MSB, C is LSB)
>     Five-bit field: bits D, E, F, G, and H (D is MSB, H is LSB)
>
>          MSB                                              LSB
>           0      1      2      3      4      5      6      7
>       +------+------+------+------+------+------+------+------+
>       |   H  |   G  |   F  |   E  |   D  |   C  |   B  |   A  |
>       +------+------+------+------+------+------+------+------+
>
>  This packing method places the three-bit field "first" in the lowest
>  bits followed by the next five-bit field. Parameters may be split
>  between octets with the most significant bits in the earlier octet.

I've read over this example several times and I still can't make sense of how I might go about implementing the intended packing. I can kind of make out an implication that there are some TSVCIS parameters that I'm supposed to... bit reverse into a byte? I think? But then we get to the notion of "earlier" octets, with MSB (which one? TSVCIS or RTP?) bits appearing in these "earlier" octets, and I'm at a near complete loss. Once I get into concrete examples (e.g., Figure 2), parameter MSBs appear to be in what I think most people would term "later" bytes rather than "earlier" bytes.

This explanation really needs clarification, as I suspect that readers will have several conflicting interpretations of what this is supposed to mean.
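
To illustrate how the quoted example admits conflicting implementations, the following sketch packs the same two fields under two readings. Neither is confirmed by the draft: `pack_as_drawn` follows the figure literally (octet bits, MSB to LSB, are H G F E D C B A), while `pack_low_bits_first` is one plausible alternative in which each field keeps its own bit order and the first field simply occupies the low bits. Both function names are hypothetical.

```python
def pack_as_drawn(three_bit, five_bit):
    """Literal reading of the figure: octet, MSB to LSB, is H G F E D C B A."""
    a, b, c = ((three_bit >> shift) & 1 for shift in (2, 1, 0))
    d, e, f, g, h = ((five_bit >> shift) & 1 for shift in (4, 3, 2, 1, 0))
    octet = 0
    for bit in (h, g, f, e, d, c, b, a):  # shift bits in from the octet MSB
        octet = (octet << 1) | bit
    return octet


def pack_low_bits_first(three_bit, five_bit):
    """Alternative reading: octet, MSB to LSB, is D E F G H A B C."""
    return ((five_bit & 0x1F) << 3) | (three_bit & 0x07)


# Only the MSB of each field is set (A = 1, D = 1).
print(format(pack_as_drawn(0b100, 0b10000), "08b"))        # 00001001
print(format(pack_low_bits_first(0b100, 0b10000), "08b"))  # 10000100
```

The two readings agree only for bit patterns that are symmetric under reversal, so two implementations that each pick a different reading would not interoperate.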
---------------------------------------------------------------------------

§4.1:

>  tcmax: specifies the TSVCIS maximum value for TC supported or
>  desired ranging from 1 to 255. If "tcmax" is not present, a
>  default value of 35 is used.
>
>  [EDITOR NOTE - the value of 35 is suggested based on a
>  preferred 8kbps TSVCIS coder bitrate.]

It's unclear to me whether this EDITOR NOTE is intended to be left in the final document. It doesn't appear to be a note for the RFC Editor (as it's not actionable from an editing perspective), but neither does it look like the kind of thing that typically appears in a published RFC. Please either remove this text, or make sure the intended disposition of this text is clearly indicated.
Thank you for the work put into this document. It is far from my area of expertise, so I have reviewed only parts of it and my comments below may not be relevant.

Regards,

-éric

== COMMENTS ==

-- Section 2.3 --

C.1) "The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here": possibly outside of my expertise area, but why not request a payload type in this document? I see no other documents related to TSVCIS in the AVTCORE WG that could do it. It also appears to contradict sections 4 and 7.

-- Section 3.1 --

C.2) Probably worth explaining why CODC is sometimes "not available"? Is this not always present as hinted in the text below? If so, then what about using "not present"?

== NITS ==

-- Section 3 --

N.1) in "the timestamp is, as always, that", could perhaps replace "as always" by "as specified in XYZ"
Thanks for addressing the discusses and comments. I leave this comment as something that could have been more explicit about decoding; however, one skilled in the art will figure it out.

A. Section 3.3:

   TSVCIS coder frames in a single RTP packet MAY be of different coder bitrates. With the exception for the variable length TSVCIS parameter frames, the coder rate bits in the trailing byte identify the contents and length as per Table 1.

If I understand this correctly, in an RTP payload that contains multiple frames of different bitrates, the safest way of decoding the payload is to work from the end of the payload towards the start, identifying one frame at a time. Then, after having figured out how many frames are actually present, one can calculate the timestamp value for each frame.
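
The end-to-start walk described above might be sketched as follows. This is an illustration only: the length lookup is passed in as a callback, and the `toy_unit_length` rule (rate code in the top two bits of the trailing octet, the fixed lengths, and a penultimate count octet for the variable-length case) is invented for the example. The real codes, bit positions, and lengths are those of Table 1 and Section 3.2 of the draft.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: given the payload and the index just past a unit's
# last octet, return that unit's total length in octets (0 if unknown).
LengthFn = Callable[[bytes, int], int]


def split_units_from_end(payload: bytes, unit_length: LengthFn) -> List[Tuple[int, int]]:
    """Walk backward from the last octet; return (start, end) spans in forward order."""
    spans: List[Tuple[int, int]] = []
    end = len(payload)
    while end > 0:
        length = unit_length(payload, end)
        if length <= 0 or length > end:
            raise ValueError(f"cannot parse unit ending at octet {end}")
        spans.append((end - length, end))
        end -= length
    spans.reverse()
    return spans


# Toy stand-in for Table 1: these codes and lengths are invented for this example.
TOY_FIXED_LENGTHS = {0: 7, 1: 7, 2: 11}


def toy_unit_length(payload: bytes, end: int) -> int:
    code = payload[end - 1] >> 6  # toy rule: rate code in the top two bits
    if code in TOY_FIXED_LENGTHS:
        return TOY_FIXED_LENGTHS[code]
    if end < 2:
        return 0
    # Toy variable-length unit: the penultimate octet counts the data octets,
    # and two trailer octets (count, then rate code) follow the data.
    return payload[end - 2] + 2


packet = bytes(7) + bytes(10) + b"\x80" + b"\xaa\xbb\xcc\x03\xc0"
print(split_units_from_end(packet, toy_unit_length))
# [(0, 7), (7, 18), (18, 23)]
```

Once the spans are known in forward order, per-frame timestamps can be derived by accumulating each frame's duration from the RTP timestamp of the packet.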