Early Review of draft-ietf-trill-directory-assist-mechanisms-03
review-ietf-trill-directory-assist-mechanisms-03-rtgdir-early-halpern-2016-04-18-00

Request
Review of: draft-ietf-trill-directory-assist-mechanisms
Requested rev.: no specific revision (document currently at 12)
Type: Early Review
Team: Routing Area Directorate (rtgdir)
Deadline: 2016-04-18
Requested: 2016-04-13
Draft last updated: 2016-04-18
Completed reviews:
    Rtgdir Early review of -03 by Matthew Bocci
    Rtgdir Early review of -03 by Joel Halpern
    Secdir Last Call review of -11 by Daniel Franke
    Genart Last Call review of -10 by Francis Dupont
    Opsdir Last Call review of -10 by Tianran Zhou
Assignment
Reviewer: Joel Halpern
State: Completed
Review: review-ietf-trill-directory-assist-mechanisms-03-rtgdir-early-halpern-2016-04-18
Reviewed rev.: 03 (document currently at 12)
Review result: Not Ready
Review completed: 2016-04-18

Review

Hello,

I have been selected as the Routing Directorate reviewer for this draft.
The Routing Directorate seeks to review all routing or routing-related
drafts as they pass through IETF last call and IESG review, and
sometimes on special request. The purpose of the review is to provide
assistance to the Routing ADs. For more information about the Routing
Directorate, please see
http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it
would be helpful if you could consider them along with any other IETF
Last Call comments that you receive, and strive to resolve them through
discussion or by updating the draft.

Document: draft-ietf-trill-directory-assist-mechanisms-07.txt
Reviewer: Joel Halpern
Review Date: 13-April-2016
IETF LC End Date: N/A
Intended Status: Proposed Standard



Summary: I have significant concerns about this document and recommend
that the Routing ADs discuss these issues further with the authors.
    I do believe that the major issues are easily resolvable.  I have
tried to provide my best guess at text that would resolve each of them.
    I would like to see the minor issues discussed and preferably
addressed.

Major Issues:


    In the state machine transitions in section 2.3.3 for push servers,
it appears that if the event indicating that the server is being shut
down occurs while the server is already Going Stand-By or Uncompleting,
the transitions indicate that this "going down" event will be lost.  A
strict reading of this would seem to mean that the "go Down" event would
need to recur after the timeout condition.  This would seem to be best
addressed by a new state "Going-Down" whose timeout behavior is to move
to the Down state.
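
The lost event and the suggested fix can be sketched as follows.  This
is a simplified Python model, not the draft's transition table: only the
shut-down and timeout events are modelled, SETTLED is a placeholder for
whatever state the draft's timeout actually leads to, and the function
names are assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    DOWN = auto()
    GOING_STAND_BY = auto()
    UNCOMPLETING = auto()
    GOING_DOWN = auto()   # the state proposed in this review
    SETTLED = auto()      # placeholder: wherever the draft's timeout leads


class Event(Enum):
    SHUTDOWN = auto()
    TIMEOUT = auto()


# States that are left only when their timer expires.
TRANSIENT = {State.GOING_STAND_BY, State.UNCOMPLETING}


def next_state(state, event, with_going_down):
    """Return the next state for the two events modelled here."""
    if event is Event.SHUTDOWN and state in TRANSIENT:
        # Without the proposed state the event changes nothing and is lost.
        return State.GOING_DOWN if with_going_down else state
    if event is Event.TIMEOUT:
        if state is State.GOING_DOWN:
            return State.DOWN
        if state in TRANSIENT:
            return State.SETTLED
    return state


def run(start, events, with_going_down):
    """Feed a sequence of events through the model and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event, with_going_down)
    return state


events = [Event.SHUTDOWN, Event.TIMEOUT]
print(run(State.GOING_STAND_BY, events, with_going_down=False))  # State.SETTLED
print(run(State.GOING_STAND_BY, events, with_going_down=True))   # State.DOWN
```

In the first run the server never reaches Down unless the shut-down
event is delivered again after the timeout; in the second, the
Going-Down state remembers it.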






In section 2.3.2, the descriptions for events 3 and 5 are identical.  I
believe from the state transitions that condition 3 is supposed to
reflect the server NOT having complete data when the Activate condition
is met.

In section 3.2.1 there is provision for using a received frame as a
Query.  There are type indications as to what the type of the frame is.
I believe that the intent is that the query always contains the full
received Ethernet Frame, no matter what the type is.  But it does not
say that.  So one could also conclude that for ARP, what I should send
is the ARP message, and for ND, the ND message, etc.  I believe the text
needs to be clarified.  If my guess is correct that the full Ethernet
Frame is to be sent in all cases, then explanatory text as to why the
various type codes exist would seem helpful, since the received frame
contains enough information to support decoding.
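
The two readings put different bytes in the query.  The Python sketch
below builds a minimal, untagged and unpadded Ethernet frame carrying an
ARP request and shows both candidates.  It does not reproduce the
draft's query encoding; the addresses and helper names are made up for
illustration.

```python
import struct

ETHERTYPE_ARP = 0x0806
ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def build_arp_request(src_mac: bytes, src_ip: bytes, target_ip: bytes) -> bytes:
    """Return an untagged Ethernet frame carrying an IPv4 ARP request."""
    eth = b"\xff" * 6 + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    # Hardware type Ethernet (1), protocol IPv4, address lengths 6 and 4,
    # operation 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src_mac + src_ip + b"\x00" * 6 + target_ip
    return eth + arp


def payload_full_frame(frame: bytes) -> bytes:
    """Reading 1: the query carries the whole received Ethernet frame."""
    return frame


def payload_inner_message(frame: bytes) -> bytes:
    """Reading 2: the query carries only the ARP message."""
    return frame[ETH_HEADER_LEN:]


def ethertype(frame: bytes) -> int:
    """Read the EtherType of an untagged frame."""
    return struct.unpack("!H", frame[12:14])[0]


frame = build_arp_request(
    bytes.fromhex("020000000001"),  # locally administered example MAC
    bytes([192, 0, 2, 1]),          # documentation-range IPv4 addresses
    bytes([192, 0, 2, 2]),
)
full = payload_full_frame(frame)
inner = payload_inner_message(frame)
print(len(full), len(inner))            # 42 28
print(ethertype(full) == ETHERTYPE_ARP)  # True
```

Under reading 1 the EtherType travels with the query, so the receiver
can already tell that the frame is ARP, which is why the purpose of the
separate type codes needs explaining.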






Minor Issues:


    In section 2.3.3 describing the state transitions for push servers,
there is an event (event 1) described as "the server was Down but is now
Up."  The state transition diagram describes this as being a valid event
that does not change the server's state if the server is in any state
other than "Down."  In one sense, this is reasonable, saying that such an
event is harmless.  I would however expect some sort of logging or
administrative notification, as something in the system is quite confused.
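
One way to keep the transition harmless while still surfacing the
anomaly is sketched below in Python.  The logger name, the function
name, and the placeholder return value are assumptions; the real target
state for the Down case is whatever the draft's table specifies.

```python
import logging

logger = logging.getLogger("push_directory")  # hypothetical logger name


def handle_up_event(state: str) -> str:
    """Handle event 1 ("the server was Down but is now Up").

    Only the Down case changes state.  In any other state the event is
    ignored, but a warning is logged because it should not have occurred.
    """
    if state == "Down":
        return "NextStatePerDraft"  # placeholder, not a state name from the draft
    logger.warning(
        "event 1 (Down -> Up) received in state %r; state unchanged", state
    )
    return state
```

An implementation could equally raise an administrative alarm here; the
point is only that the no-op transition leaves a trace.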






    Should section 2.4 include a note that indicates that reliance on
information completeness does mean that there are windows, when new
entities join the space represented by a particular TRILL Data Label,
during which packets for that destination may be dropped, due to clients
not yet having received the updated information?  I believe this window
is small, and it is quite reasonable to also note that in such text.

    Text in section 3.2.2.1 on lifetimes and the information
maintenance in section 3.3 imply that the clients and servers must
maintain a connection.  Presumably, this is required already by the
RBridge Channel protocol, and I understand that we should not repeat the
entire protocol here.  It would seem to make the reader's life MUCH
simpler if the text noted that the RBridge Channel protocol requires
that there be a maintained connection between the client and the server,
and that these mechanisms leverage the presence of that connection.

    In section 3.2.2.2 on Pull directory forwarding, I expect to see
text describing to whom the Pull server will flood the received request.
Instead, the text appears to say that it is the response that will be
flooded.  More importantly, the descriptive text talks about sending the
response, which would normally be a description of sending the response
to the requestor, not sending it to someone else.
    In a related confusion, it seems very strange that a "flood"
request will result in sending an underlying packet unicast to the
destination.  This may be just terminology, but it seems likely to
confuse implementors.  Maybe the flag should be called the Forward flag,
with a note in the definition that it normally causes the response to be
sent to multiple parties, but in the case of a raw MAC frame, results in
the packet being forwarded to the destination or flooded, as the server
can manage?

    In the description in section 3.3 of Cache management, in the text
on method one in which the servers keep minimal state, it would seem
that a large health warning is needed, as this method will cause all
clients to discard all positive data whenever any positive data at the
server changes (even if no client is using the modified data).  This
makes a flapping end station an attack on the cache of all clients!
    It strikes me that the working group could help get robust
deployment by making method 3 (tracking what you told clients) a SHOULD.
(I grant that it is not a MUST, as the other choices do work.)
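
The difference in blast radius can be seen in a toy model.  The Python
sketch below is an assumption-laden simplification: it ignores the
draft's message formats and lifetimes, models only positive entries, and
contrasts "flush everything on any change" (method one as described
above) with "notify only the clients that were told the changed entry"
(method 3).  All class, function, and key names are hypothetical.

```python
from collections import defaultdict


class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # positive entries learned from the server


class Server:
    def __init__(self, data, clients, track_clients):
        self.data = dict(data)
        self.clients = clients
        self.track_clients = track_clients  # True models method 3
        self.told = defaultdict(set)        # key -> clients given that key

    def query(self, client, key):
        """Answer a pull query and let the client cache the result."""
        client.cache[key] = self.data[key]
        if self.track_clients:
            self.told[key].add(client)
        return client.cache[key]

    def change(self, key, value):
        """Update one entry; return how many cached entries were discarded."""
        self.data[key] = value
        dropped = 0
        if self.track_clients:
            # Method 3: only clients that were told this key are affected.
            for client in self.told.pop(key, set()):
                if client.cache.pop(key, None) is not None:
                    dropped += 1
        else:
            # Method one: every client discards all of its positive data.
            for client in self.clients:
                dropped += len(client.cache)
                client.cache.clear()
        return dropped


def simulate(track_clients):
    """Three clients cache 'a' and 'b'; unrelated end station 'z' flaps once."""
    clients = [Client(f"c{i}") for i in range(3)]
    server = Server({"a": 1, "b": 2, "z": 3}, clients, track_clients)
    for client in clients:
        server.query(client, "a")
        server.query(client, "b")
    return server.change("z", 4)


print(simulate(track_clients=False))  # 6
print(simulate(track_clients=True))   # 0
```

With minimal server state, one change to an entry nobody was using
empties every client cache; with tracking, the same change costs
nothing.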




Editorial Issues / Nits: