Skip to main content

Telechat Review of draft-ietf-speechsc-mrcpv2-
review-ietf-speechsc-mrcpv2-genart-telechat-garcia-2012-01-12-00

Request Review of draft-ietf-speechsc-mrcpv2
Requested revision No specific revision (document currently at 28)
Type Telechat Review
Team General Area Review Team (Gen-ART) (genart)
Deadline 2011-11-29
Requested 2011-11-30
Authors Daniel C. Burnett , Saravanan Shanmugham
I-D last updated 2012-01-12
Completed reviews Genart Last Call review of -?? by Miguel Angel García
Genart Telechat review of -?? by Miguel Angel García
Secdir Early review of -?? by Catherine Meadows
Assignment Reviewer Miguel Angel García
State Completed
Request Telechat review on draft-ietf-speechsc-mrcpv2 by General Area Review Team (Gen-ART) Assigned
Completed 2012-01-12
review-ietf-speechsc-mrcpv2-genart-telechat-garcia-2012-01-12-00
I have been selected as the General Area Review Team (Gen-ART)
reviewer for this draft. For background on Gen-ART, please see the FAQ at
<

http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>

Please resolve these comments along with any other comments you may receive.

Document: draft-ietf-speechsc-mrcpv2-24.txt
Reviewer: Miguel Garcia <miguel.a.garcia at ericsson.com>
Review Date: 2011-05-03
IETF LC End Date: 2011-04-13



Summary: The document is on the right track, but has some issues that 


should be addressed before publication as a standards track RFC.




Major issues: none

Minor issues:

- Section 4.2 reads:

   To remain backwards compatible with conventional SDP usage,
   the format field of the m-line MUST have the arbitrarily-selected
   value of "1".

The comment is that in other protocols, for example, MSRP [RFC4975] it
has been selected to use an asterisk "*" in the format field. I wonder if
it is possible to unify criteria, in order to allow usage of conventional
SDP parsers.

- Although at the general level the draft says that it reuses the SDP
offer/answer model, I think it would be good to explicitly mention it,
just explicitly indicate that "we aren't doing anything new here". For
example, in Section 4.2, the text reads

   When the client wants to add a media processing resource to the
   session, it issues a SIP re-INVITE transaction.

and I think it would be better to say:

   When the client wants to add a media processing resource to the
   session, it issues a new SDP offer, according to the procedures of
   RFC 3264 [RFC3264], in a SIP re-INVITE request.

Also, on page 17, the text reads:

   When the client wants to de-allocate the resource from this session,
   it issues a SIP re-INVITE transaction with the server.  The SDP MUST
   offer the control m-line with port 0.

It could be completed with:

   When the client wants to de-allocate the resource from this session,
   it issues a new SDP offer, according to RFC 3264 [RFC3264], where the
   control m-line port MUST be set to 0. This SDP offer is sent in a SIP
   re-INVITE request.

- Section 4.2 describes the usage of the a=setup attribute, but it does
not clarify if setting up MRCP sessions MUST always use the a=setup
attribute or not. I think the answer is yes, at least for TCP or TLS
transports (those defined in this version of the draft). But there should
be a "MUST use the 'setup' attribute as per RFC 4145" somewhere. I
recommend to add the following sentence at the beginning of the last
paragraph in page 16:

   MRCVv2 clients and servers using TCP as a transport protocol MUST use
the procedures specified in RFC 4145 [RFC4145] for setting up the TCP
connection, with the considerations described hereby. Similarly,  MRCVv2
clients and servers using TCP/TLS as a transport protocol MUST use the
procedures specified in RFC 4572 [RFC4572] for setting up the TLS
connection, with the considerations described hereby. The 'setup
attribute, as described ... [continue with the current text]

- Section 5.1. The ABNF of "mrcp-version", at the top of page 26, comes
out of the blue: there is no "mrcp-version" in the ABNF on page 25, so,
it is difficult to find out how to put "mrcp-version" into context.

- Section 5.4, in the client failure table, the description of the 407
contains a normative text "MAY BE". I think these tables are an
informative summary. All the normative text should be written in detail
elsewhere in the document. You can safely turn them lowercase.



- On Section 8.5.1 the spec talks about using the multipart/mixed Media 


Type. It is not clear to me if implementation of multipart/mixed is a 


MUST or a MAY that is negotiated.




- On Section 8.6, first paragraph, the text reads:

  The SPEAK" method can carry voice and prosody....

I guess the "can" should be replaced by a normative "MAY".



- On Sections 9.4.9, 10.4.8, and 11.4.11, there is no indication of the 


possible values of the "media-type-value". I guess you want to say that 


these are intended to be MIME types (so far, the word "MIME type" is not 


written), and in that case, you should also say that possible values are 


any of the values included in the MIME media types registry maintained by 


IANA.




- On Sections 9.4.11, 10.4.3, and 11.4.16, the ABNF merely reads:

    completion-cause       = "Completion-Cause" ":" 3DIGIT SP
                                              1*VCHAR CRLF



 I am assuming that when the ABNF refers to "3DIGIT", these are supposed 


to be any of the values included in the "Cause-Code" column of the table 


in the same section. Similarly, I am assuming that the "1*VCHAR" should 


include any of the values of the "Cause-Name" column in the same table. I 


think this should be spelled out. I suggest to change the ABNF for this one:




    completion-cause       = "Completion-Cause" ":" cause-code SP
                                             cause-name CRLF
    cause-code                 = 3DIGIT
    cause-name                = *VCHAR

And then do the missing mapping with a text similar to:



"The 'cause-code' contains a numerical value selected from the Cause-Code 


column of the following table. The 'cause-name' contains the 


corresponding token selected from the Cause-Name column.






Note that the resolution of this issue should be also applied to Sections 


10.4.3 and 11.4.16.






- Question: In Sections 9.4.21 and 10.4.6, should "1*UTFCHAR" be included 


in quotes? It is true that there is no other subfield in the same header, 


meaning that there is no intention to parse the text. But somehow I feel 


safer if you enclosed the text in quotes. Also, remember that this text 


is coming from other protocol beyond your control, so you never know, for 


example, if the other protocol is going to add CRLF or something weird 


that will crash the recipient of the header.






Note that the resolution of this issue should also be applied to Section 


10.4.6.




- On Section 9.4.26, the ABNF reads:

   recognition-mode   = "Recognition-Mode" ":" 1*ALPHA CRLF



However, there is only to values for choice, "normal" and "hotword". I 


think the following ABNF represents better that there is only two 


possible values:




  recognition-mode   = "Recognition-Mode" ":"
                                        normal-value / hotword-value
  normal-value            = "normal"
  hotword-value          = "hotword"



- Section 9.4.40. I wonder why the Phrase-NL is defined as UTFCHAR, while 


most of the other headers are defined as VCHAR




- Section 9.9, third paragraph, the text reads:



   "If the client needs to explicitly control grammar weights for the 


recognition operation, it must employ method 3 below. "




  I guess the "must" should be a "MUST".



- Section 9.9, the paragraph that goes after bullet point number "3", the 


text reads:






  "In addition to performing recognition on the input, the recognizer 


may also enroll the collected utterance in a personal grammar if the ..."




  Here the "may" should be a "MAY".

- Section 9.9, bullet point number 1 on page 114:

  "the recognizer must complete ...."



  I guess the "must" should be a "MUST". There is another instance with 


the same text on bullet point number 1 on page 115.




- Section 9.9, bullet point number 2 on page 115:

  "the RECOGNIZE completes with ..."

  I guess it should be normative: "the RECOGNIZE MUST complete with ..."

- Section 9.17, the text reads:



  "The END-PHRASE-ENROLLMENT method may be called ONLY during an active 


phrase enrollment session. "






  Two issues: First, the "may" should be a "MAY". Second, I guess the 


readability is increased if the sentence is turned into active. What 


about this:




  "The client MAY call the END-PHRASE-ENROLLMENT method ONLY during ..."

- Section 9.17, the text reads:

  "... the client can abort ..."

  I guess the "can" should be a "MAY".



- Section 9.17, second paragraph. The text should clearly indicate the 


name of header where the 'location/URI' should be included.






- In Section 10.4.7, I would like to clarify that if the record-uri is 


not sent in the header, then the actual audio should be sent as a MIME 


body. I would suggest to replace:






If this header field is not specified in the RECORD request, the server 


MUST capture the audio and send it in the "STOP" response or the RECORD- 


COMPLETE event as a message body.




with this text:



If this header field is not specified in the RECORD request, the server 


MUST capture the audio, MUST encode it as a MIME body, and MUST send it 


in the "STOP" response or the RECORD- COMPLETE event.






- In Section 10.4.7, I believe you should write normative text at the end 


of paragraphs 1 and in paragraph 3. Also, in paragraph 3, you need to add 


references to all the valid schemes and add the 'cid' scheme. And perhaps 


write all the scheme names in lowercase.




                                                 In this case, the response


 carrying the audio content MUST include a Content ID (cid) [RFC2392] 


value in this header



 pointing to the Content-ID in the message body.

And in the the third paragraph:



Implementations MUST only use 'http', 'https' [RFC2616], 'file' 


[RFC3986], and 'cid' [RFC2392] schemes in the URI. Other URI schemes MUST 


NOT be used. Note, however, that implementations already exist that 


support other schemes.






- Section 10.5. Perhaps a reference to Section 10.4.7 should be added, 


because the details are described in 10.4.7.






- Sections 4.2 and 10.6. The text says that if physical security is 


provided, one can void TLS and merely used TCP. I have the feeling that 


the connection of these two concepts (physical security and lack of TLS) 


is not sufficiently justified for the security experts. One could access 


the resources of the MRCP server remotely, via a TCP connection, not 


using TLS. Or even sniff the network. So, physical security is not enough 


for voiding TLS.






- Section 11.4.2. I am missing more normative strength text here. For 


example:


  + a couple of occurrences of "is required" should be set to "MUST be 


present"



  + "may" should be "MAY"
  + "can" should be "MAY"



- Section 11.5.2.10. I am missing more normative strength text. In 


particular, a couple of occurrences of "must" in the first paragraph 


should be "MUST".






- Section 11.6, 7th paragraph. I am missing more normative strength text. 


When the text say:


  "... operations may be performed on the verifier resource". The "may" 


should be a "MAY".






- Section 12.1, bullet point 1. Although the text says that clients and 


servers MUST support digest authentication, I think they SHOULD use it 


(isn't that the intention). In particular, the server SHOULD authenticate 


the client using SIP digest authentication. Also, a reference to RFC 3261 


should be added here.






- Section 12.1, bullet point 2. I agree that clients and servers SHOULD 


employ 'sips' URIs, but I guess the 'sips' is just the format of the 


underlying TLS security. I guess you should add "including that clients 


and servers SHOULD setup up TLS [ref] connections".






- Section 12.3. The last word of this section, "recommended", should be 


normative: "RECOMMENDED".






- Section 13.1.6 describes a mechanism where vendor-specific extensions 


use the reverse DNS mechanism, for example., "com.example.foo". Then, if 


the vendor-specific extension is connected to DNS to avoid clashes in 


names, why is there a need for an expert review policy prior to its 


registration? I see a contradiction in having a self-managing registry by 


avoiding clashes due to the connection to DNS, and then having anything 


else than a volunteer registry.






- Section 13.7.2 registers new SDP "att-field" at the *session-level* (at 


least according to the title in Section 13.7.2. However, the text of both 


registers reads "media-level" in the type of attribute. So, are these two 


session-level or media-level SDP attributes? I think they are 


media-level, in which case, they both should be written in Section 


13.7.3, which is the SDP "att-field" registration at the media-level.






- I think it is not correct to have normative text (capitalized) in the 


examples, because examples are informative by nature. The text around the 


examples should describe what is happening. If there is a need for 


normative text, then it should be already written elsewhere, and if it 


isn't, it should be written in the normative sections, but not in the 


examples. So, I found the following instances of normative text in 


Section 14:



  + RECOMMENDED on page 186
  + A couple of MUST on page 187
  + MUST on page 189



- Sections 16.3 and 16.3. If these are schema definitions, then why don't 


they start with:



  <?xml version ="1.0" encoding="UTF-8"?>
  and the namespace definition, like any other XML schema document?

Nits:
====

- Expand acronyms at first occurrence. This includes: SSML, NLSML, VXML, RTSP



- General to all the examples containing SDP. In Section 4.3. I think the 


"o=" lines are not correct, in particular, the IP address in the o= line 


is set to "192.0.2.4" in both the offer and the answer. So, the first 


thing is that it should be a different value in the offer and the answer. 


The second thing is that it should typically be equal to the IP address 


that we see in the c= line, unless you have a very good reason for it. 


Since it is not the goal of the spec to deal with these rare use cases, I 


would recommend to set the IP address in the o= line to become equal to 


that in the c= line.






This  also applies to the example in Section 7 and the examples in 


Section 14.






- General to all the examples containing SDP. According to RFC 3264 you 


must include a "t=0 0" line in all the SDP offers and answers.






- General to all the examples containing SDP. There is an "s=" line that 


contains text. According to RFC 3264, it is RECOMMENDED that it contains 


a white space or a dash "-". If you want to add a subject, the SIP 


Subject header field serves the purpose.




- Section 5, add a formal reference to ISO 8859-1 when the text mentions it.

- It would be good to have numbers in those tables in Section 5.4. I
mean, there should be a caption saying "Table 1: Success 2xx response codes".



- In Sections 9.4.21 and 10.4.6, add a formal reference to RFC3629 when 


you name "UTF-8".




- Section 9.6.3.3., 1st bullet point:
  s/contains an float/contains a float



- Section 9.6.3.4, add a formal reference to ISO 8601 when the text 


mentions it.






- Section 9.9, bullet point numbered "3". Add a reference to the document 


that defines the  "3C grammar weights"




- Section 9.9, the first paragraph on page 114,
 s/The No-Input-Timer MUST BE started/The No-Input-Timer MUST be started

- Section 10, add a formal reference to RFC 2326 when mentioning RTSP.



- Around Section 10.4.7, it would be good to write a simple example 


showing a message body that includes the audio. This is to see an example 


of a 'cid' URI scheme.






- General: Names of URI schemes should be written in lowercase, for 


example: 'https' URI scheme. There is an instance in Section 10.6, page 


135. Other instances of "SIPS:" are in Section 12.1






- Starting in Section 10.6, but with many occurrences after that, there 


is a message from the Server to the Client which is written as 


"MRCP/2/0", where the second "/" should actually be a ".", such as: 


"MRCP/2.0".






- Sections 9.20 and 9.21. The third message of the example starts with 


this line:




 S->C: MRCP/2.0 ... INTERPRETATION-COMPLETE 543267 200 COMPLETE



The request-id value is 543267. However, the request-id of the client is 


543266. I guess this request-id should also be 543266.






- Section 10.6, 10.7, 10.8. The request-id of the request is set to 


543257. However, the request-id of the following responses are all set 


to 456234. I guess they should all be set to the same value.




- Section 10.8, page 137, first paragraph:
  s/audio ./audio.

- Section 11, fourth paragraph:
  s/'multi-verification.'/'multi-verification'.



- Section 11.1. The request-id of the fifth message is set to 314164, 


however, it should be set to 314162, because it is a response to the 


forth message.






- Section 11.5.1. There is a closing bracket ")" at the end of the first 


sentence, for which there isn't an opening bracket.






- Section 13.1.1. The registry of the set-cookie and set-cookie2 point to 


the wrong RFC. They should point to RFCs 2109 and 2965, respectively.






- Section 13.1.3. I would suggest to split this long registry and create 


five separate registries, one per resource type and the Generic. This 


simplifies if in the future someone wants to add a new, e.g., Synthesizer 


header. With the current registry, IANA will add the new at the end of 


the registry, after the last header of the Verifier, and most people will 


miss this new hypothetical Synthesizer header. If you create five 


different registries, then the problem will be solved.






- Sections 13.7.1 and 13.7.2. There are two tokens being registered in 


each Section. Can you add an empty line in between these two 


registrations (within the same Section) for the sake of readability?




- Section 14.1. The example at the top of page 182, the "m=" line reads:

  m=audio 49170 RTP/AVP 0 96



  However, there is no "a=rtpmap:96" line and an "a=fmtp:96" describing 


what the media type "96" is. I guess 96 represents a telephone event, in 


which case, you should add:




  a=rtpmap:96 telephone-event/8000
  a=fmtp:96 0-15



- Next to last paragraph on page 187, the text reads "... turn around and 


issued a ...".



  s/issued/issue

- First sentence on page 189:
  s/a SIP BYE/a SIP BYE request

- Idnits reveals:
 ** Obsolete normative reference: RFC 3388 (Obsoleted by RFC 5888)
 ** Obsolete normative reference: RFC 2109 (Obsoleted by RFC 2965)
 ** Obsolete normative reference: RFC 4646 (Obsoleted by RFC 5646)
 ** Downref: Normative reference to an Experimental RFC: RFC 2483


 -- Obsolete informational reference (is this intentional?): RFC 2234 


(Obsoleted by RFC 4234)





Regards,

      Miguel Garcia
--
Miguel A. Garcia
+34-91-339-3608
Ericsson Spain