Telechat Review of draft-ietf-speechsc-mrcpv2-

Request Review of draft-ietf-speechsc-mrcpv2
Requested rev. no specific revision (document currently at 28)
Type Telechat Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2011-11-29
Requested 2011-11-30
Other Reviews Genart Last Call review of - by Miguel García (diff)
Secdir Early review of - by Catherine Meadows (diff)
Review State Completed
Reviewer Miguel García
Review review-ietf-speechsc-mrcpv2-genart-telechat-garcia-2012-01-12
Posted at
Draft last updated 2012-01-12
Review completed: 2012-01-12


I have been selected as the General Area Review Team (Gen-ART)
reviewer for this draft. For background on Gen-ART, please see the FAQ at

Please resolve these comments along with any other comments you may receive.

Document: draft-ietf-speechsc-mrcpv2-24.txt
Reviewer: Miguel Garcia <miguel.a.garcia at>
Review Date: 2011-05-03
IETF LC End Date: 2011-04-13

Summary: The document is on the right track, but has some issues that 

should be addressed before publication as a standards track RFC.

Major issues: none

Minor issues:

- Section 4.2 reads:

   To remain backwards compatible with conventional SDP usage,
   the format field of the m-line MUST have the arbitrarily-selected
   value of "1".

The comment is that in other protocols, for example, MSRP [RFC4975] it
has been selected to use an asterisk "*" in the format field. I wonder if
it is possible to unify criteria, in order to allow usage of conventional
SDP parsers.

- Although at the general level the draft says that it reuses the SDP
offer/answer model, I think it would be good to explicitly mention it,
just explicitly indicate that "we aren't doing anything new here". For
example, in Section 4.2, the text reads

   When the client wants to add a media processing resource to the
   session, it issues a SIP re-INVITE transaction.

and I think it would be better to say:

   When the client wants to add a media processing resource to the
   session, it issues a new SDP offer, according to the procedures of
   RFC 3264 [RFC3264], in a SIP re-INVITE request.

Also, on page 17, the text reads:

   When the client wants to de-allocate the resource from this session,
   it issues a SIP re-INVITE transaction with the server.  The SDP MUST
   offer the control m-line with port 0.

It could be completed with:

   When the client wants to de-allocate the resource from this session,
   it issues a new SDP offer, according to RFC 3264 [RFC3264], where the
   control m-line port MUST be set to 0. This SDP offer is sent in a SIP
   re-INVITE request.

- Section 4.2 describes the usage of the a=setup attribute, but it does
not clarify if setting up MRCP sessions MUST always use the a=setup
attribute or not. I think the answer is yes, at least for TCP or TLS
transports (those defined in this version of the draft). But there should
be a "MUST use the 'setup' attribute as per RFC 4145" somewhere. I
recommend to add the following sentence at the beginning of the last
paragraph in page 16:

   MRCVv2 clients and servers using TCP as a transport protocol MUST use
the procedures specified in RFC 4145 [RFC4145] for setting up the TCP
connection, with the considerations described hereby. Similarly,  MRCVv2
clients and servers using TCP/TLS as a transport protocol MUST use the
procedures specified in RFC 4572 [RFC4572] for setting up the TLS
connection, with the considerations described hereby. The 'setup
attribute, as described ... [continue with the current text]

- Section 5.1. The ABNF of "mrcp-version", at the top of page 26, comes
out of the blue: there is no "mrcp-version" in the ABNF on page 25, so,
it is difficult to find out how to put "mrcp-version" into context.

- Section 5.4, in the client failure table, the description of the 407
contains a normative text "MAY BE". I think these tables are an
informative summary. All the normative text should be written in detail
elsewhere in the document. You can safely turn them lowercase.

- On Section 8.5.1 the spec talks about using the multipart/mixed Media 

Type. It is not clear to me if implementation of multipart/mixed is a 

MUST or a MAY that is negotiated.

- On Section 8.6, first paragraph, the text reads:

  The SPEAK" method can carry voice and prosody....

I guess the "can" should be replaced by a normative "MAY".

- On Sections 9.4.9, 10.4.8, and 11.4.11, there is no indication of the 

possible values of the "media-type-value". I guess you want to say that 

these are intended to be MIME types (so far, the word "MIME type" is not 

written), and in that case, you should also say that possible values are 

any of the values included in the MIME media types registry maintained by 


- On Sections 9.4.11, 10.4.3, and 11.4.16, the ABNF merely reads:

    completion-cause       = "Completion-Cause" ":" 3DIGIT SP
                                              1*VCHAR CRLF

 I am assuming that when the ABNF refers to "3DIGIT", these are supposed 

to be any of the values included in the "Cause-Code" column of the table 

in the same section. Similarly, I am assuming that the "1*VCHAR" should 

include any of the values of the "Cause-Name" column in the same table. I 

think this should be spelled out. I suggest to change the ABNF for this one:

    completion-cause       = "Completion-Cause" ":" cause-code SP
                                             cause-name CRLF
    cause-code                 = 3DIGIT
    cause-name                = *VCHAR

And then do the missing mapping with a text similar to:

"The 'cause-code' contains a numerical value selected from the Cause-Code 

column of the following table. The 'cause-name' contains the 

corresponding token selected from the Cause-Name column.

Note that the resolution of this issue should be also applied to Sections 

10.4.3 and 11.4.16.

- Question: In Sections 9.4.21 and 10.4.6, should "1*UTFCHAR" be included 

in quotes? It is true that there is no other subfield in the same header, 

meaning that there is no intention to parse the text. But somehow I feel 

safer if you enclosed the text in quotes. Also, remember that this text 

is coming from other protocol beyond your control, so you never know, for 

example, if the other protocol is going to add CRLF or something weird 

that will crash the recipient of the header.

Note that the resolution of this issue should also be applied to Section 


- On Section 9.4.26, the ABNF reads:

   recognition-mode   = "Recognition-Mode" ":" 1*ALPHA CRLF

However, there is only to values for choice, "normal" and "hotword". I 

think the following ABNF represents better that there is only two 

possible values:

  recognition-mode   = "Recognition-Mode" ":"
                                        normal-value / hotword-value
  normal-value            = "normal"
  hotword-value          = "hotword"

- Section 9.4.40. I wonder why the Phrase-NL is defined as UTFCHAR, while 

most of the other headers are defined as VCHAR

- Section 9.9, third paragraph, the text reads:

   "If the client needs to explicitly control grammar weights for the 

recognition operation, it must employ method 3 below. "

  I guess the "must" should be a "MUST".

- Section 9.9, the paragraph that goes after bullet point number "3", the 

text reads:

  "In addition to performing recognition on the input, the recognizer 

may also enroll the collected utterance in a personal grammar if the ..."

  Here the "may" should be a "MAY".

- Section 9.9, bullet point number 1 on page 114:

  "the recognizer must complete ...."

  I guess the "must" should be a "MUST". There is another instance with 

the same text on bullet point number 1 on page 115.

- Section 9.9, bullet point number 2 on page 115:

  "the RECOGNIZE completes with ..."

  I guess it should be normative: "the RECOGNIZE MUST complete with ..."

- Section 9.17, the text reads:

  "The END-PHRASE-ENROLLMENT method may be called ONLY during an active 

phrase enrollment session. "

  Two issues: First, the "may" should be a "MAY". Second, I guess the 

readability is increased if the sentence is turned into active. What 

about this:

  "The client MAY call the END-PHRASE-ENROLLMENT method ONLY during ..."

- Section 9.17, the text reads:

  "... the client can abort ..."

  I guess the "can" should be a "MAY".

- Section 9.17, second paragraph. The text should clearly indicate the 

name of header where the 'location/URI' should be included.

- In Section 10.4.7, I would like to clarify that if the record-uri is 

not sent in the header, then the actual audio should be sent as a MIME 

body. I would suggest to replace:

If this header field is not specified in the RECORD request, the server 

MUST capture the audio and send it in the "STOP" response or the RECORD- 

COMPLETE event as a message body.

with this text:

If this header field is not specified in the RECORD request, the server 

MUST capture the audio, MUST encode it as a MIME body, and MUST send it 

in the "STOP" response or the RECORD- COMPLETE event.

- In Section 10.4.7, I believe you should write normative text at the end 

of paragraphs 1 and in paragraph 3. Also, in paragraph 3, you need to add 

references to all the valid schemes and add the 'cid' scheme. And perhaps 

write all the scheme names in lowercase.

                                                 In this case, the response

 carrying the audio content MUST include a Content ID (cid) [RFC2392] 

value in this header

 pointing to the Content-ID in the message body.

And in the the third paragraph:

Implementations MUST only use 'http', 'https' [RFC2616], 'file' 

[RFC3986], and 'cid' [RFC2392] schemes in the URI. Other URI schemes MUST 

NOT be used. Note, however, that implementations already exist that 

support other schemes.

- Section 10.5. Perhaps a reference to Section 10.4.7 should be added, 

because the details are described in 10.4.7.

- Sections 4.2 and 10.6. The text says that if physical security is 

provided, one can void TLS and merely used TCP. I have the feeling that 

the connection of these two concepts (physical security and lack of TLS) 

is not sufficiently justified for the security experts. One could access 

the resources of the MRCP server remotely, via a TCP connection, not 

using TLS. Or even sniff the network. So, physical security is not enough 

for voiding TLS.

- Section 11.4.2. I am missing more normative strength text here. For 


  + a couple of occurrences of "is required" should be set to "MUST be 


  + "may" should be "MAY"
  + "can" should be "MAY"

- Section I am missing more normative strength text. In 

particular, a couple of occurrences of "must" in the first paragraph 

should be "MUST".

- Section 11.6, 7th paragraph. I am missing more normative strength text. 

When the text say:

  "... operations may be performed on the verifier resource". The "may" 

should be a "MAY".

- Section 12.1, bullet point 1. Although the text says that clients and 

servers MUST support digest authentication, I think they SHOULD use it 

(isn't that the intention). In particular, the server SHOULD authenticate 

the client using SIP digest authentication. Also, a reference to RFC 3261 

should be added here.

- Section 12.1, bullet point 2. I agree that clients and servers SHOULD 

employ 'sips' URIs, but I guess the 'sips' is just the format of the 

underlying TLS security. I guess you should add "including that clients 

and servers SHOULD setup up TLS [ref] connections".

- Section 12.3. The last word of this section, "recommended", should be 

normative: "RECOMMENDED".

- Section 13.1.6 describes a mechanism where vendor-specific extensions 

use the reverse DNS mechanism, for example., "". Then, if 

the vendor-specific extension is connected to DNS to avoid clashes in 

names, why is there a need for an expert review policy prior to its 

registration? I see a contradiction in having a self-managing registry by 

avoiding clashes due to the connection to DNS, and then having anything 

else than a volunteer registry.

- Section 13.7.2 registers new SDP "att-field" at the *session-level* (at 

least according to the title in Section 13.7.2. However, the text of both 

registers reads "media-level" in the type of attribute. So, are these two 

session-level or media-level SDP attributes? I think they are 

media-level, in which case, they both should be written in Section 

13.7.3, which is the SDP "att-field" registration at the media-level.

- I think it is not correct to have normative text (capitalized) in the 

examples, because examples are informative by nature. The text around the 

examples should describe what is happening. If there is a need for 

normative text, then it should be already written elsewhere, and if it 

isn't, it should be written in the normative sections, but not in the 

examples. So, I found the following instances of normative text in 

Section 14:

  + RECOMMENDED on page 186
  + A couple of MUST on page 187
  + MUST on page 189

- Sections 16.3 and 16.3. If these are schema definitions, then why don't 

they start with:

  <?xml version ="1.0" encoding="UTF-8"?>
  and the namespace definition, like any other XML schema document?


- Expand acronyms at first occurrence. This includes: SSML, NLSML, VXML, RTSP

- General to all the examples containing SDP. In Section 4.3. I think the 

"o=" lines are not correct, in particular, the IP address in the o= line 

is set to "" in both the offer and the answer. So, the first 

thing is that it should be a different value in the offer and the answer. 

The second thing is that it should typically be equal to the IP address 

that we see in the c= line, unless you have a very good reason for it. 

Since it is not the goal of the spec to deal with these rare use cases, I 

would recommend to set the IP address in the o= line to become equal to 

that in the c= line.

This  also applies to the example in Section 7 and the examples in 

Section 14.

- General to all the examples containing SDP. According to RFC 3264 you 

must include a "t=0 0" line in all the SDP offers and answers.

- General to all the examples containing SDP. There is an "s=" line that 

contains text. According to RFC 3264, it is RECOMMENDED that it contains 

a white space or a dash "-". If you want to add a subject, the SIP 

Subject header field serves the purpose.

- Section 5, add a formal reference to ISO 8859-1 when the text mentions it.

- It would be good to have numbers in those tables in Section 5.4. I
mean, there should be a caption saying "Table 1: Success 2xx response codes".

- In Sections 9.4.21 and 10.4.6, add a formal reference to RFC3629 when 

you name "UTF-8".

- Section, 1st bullet point:
  s/contains an float/contains a float

- Section, add a formal reference to ISO 8601 when the text 

mentions it.

- Section 9.9, bullet point numbered "3". Add a reference to the document 

that defines the  "3C grammar weights"

- Section 9.9, the first paragraph on page 114,
 s/The No-Input-Timer MUST BE started/The No-Input-Timer MUST be started

- Section 10, add a formal reference to RFC 2326 when mentioning RTSP.

- Around Section 10.4.7, it would be good to write a simple example 

showing a message body that includes the audio. This is to see an example 

of a 'cid' URI scheme.

- General: Names of URI schemes should be written in lowercase, for 

example: 'https' URI scheme. There is an instance in Section 10.6, page 

135. Other instances of "SIPS:" are in Section 12.1

- Starting in Section 10.6, but with many occurrences after that, there 

is a message from the Server to the Client which is written as 

"MRCP/2/0", where the second "/" should actually be a ".", such as: 


- Sections 9.20 and 9.21. The third message of the example starts with 

this line:


The request-id value is 543267. However, the request-id of the client is 

543266. I guess this request-id should also be 543266.

- Section 10.6, 10.7, 10.8. The request-id of the request is set to 

543257. However, the request-id of the following responses are all set 

to 456234. I guess they should all be set to the same value.

- Section 10.8, page 137, first paragraph:
  s/audio ./audio.

- Section 11, fourth paragraph:

- Section 11.1. The request-id of the fifth message is set to 314164, 

however, it should be set to 314162, because it is a response to the 

forth message.

- Section 11.5.1. There is a closing bracket ")" at the end of the first 

sentence, for which there isn't an opening bracket.

- Section 13.1.1. The registry of the set-cookie and set-cookie2 point to 

the wrong RFC. They should point to RFCs 2109 and 2965, respectively.

- Section 13.1.3. I would suggest to split this long registry and create 

five separate registries, one per resource type and the Generic. This 

simplifies if in the future someone wants to add a new, e.g., Synthesizer 

header. With the current registry, IANA will add the new at the end of 

the registry, after the last header of the Verifier, and most people will 

miss this new hypothetical Synthesizer header. If you create five 

different registries, then the problem will be solved.

- Sections 13.7.1 and 13.7.2. There are two tokens being registered in 

each Section. Can you add an empty line in between these two 

registrations (within the same Section) for the sake of readability?

- Section 14.1. The example at the top of page 182, the "m=" line reads:

  m=audio 49170 RTP/AVP 0 96

  However, there is no "a=rtpmap:96" line and an "a=fmtp:96" describing 

what the media type "96" is. I guess 96 represents a telephone event, in 

which case, you should add:

  a=rtpmap:96 telephone-event/8000
  a=fmtp:96 0-15

- Next to last paragraph on page 187, the text reads "... turn around and 

issued a ...".


- First sentence on page 189:
  s/a SIP BYE/a SIP BYE request

- Idnits reveals:
 ** Obsolete normative reference: RFC 3388 (Obsoleted by RFC 5888)
 ** Obsolete normative reference: RFC 2109 (Obsoleted by RFC 2965)
 ** Obsolete normative reference: RFC 4646 (Obsoleted by RFC 5646)
 ** Downref: Normative reference to an Experimental RFC: RFC 2483

 -- Obsolete informational reference (is this intentional?): RFC 2234 

(Obsoleted by RFC 4234)


      Miguel Garcia
Miguel A. Garcia
Ericsson Spain