Early Review of draft-ietf-trill-directory-assist-mechanisms-03

Request Review of draft-ietf-trill-directory-assist-mechanisms
Requested revision No specific revision (document currently at 12)
Type Early Review
Team Routing Area Directorate (rtgdir)
Deadline 2016-04-18
Requested 2016-04-13
Authors Donald E. Eastlake 3rd , Linda Dunbar , Radia Perlman , Yizhou Li
I-D last updated 2016-04-18
Completed reviews Rtgdir Early review of -03 by Matthew Bocci (diff)
Rtgdir Early review of -03 by Joel M. Halpern (diff)
Secdir Last Call review of -11 by Daniel Fox Franke (diff)
Genart Last Call review of -10 by Francis Dupont (diff)
Opsdir Last Call review of -10 by Tianran Zhou (diff)
Assignment Reviewer Joel M. Halpern
State Completed
Request Early review on draft-ietf-trill-directory-assist-mechanisms by Routing Area Directorate Assigned
Reviewed revision 03 (document currently at 12)
Result Not ready
Completed 2016-04-18

I have been selected as the Routing Directorate reviewer for this draft.
The Routing Directorate seeks to review all routing or routing-related
drafts as they pass through IETF last call and IESG review, and
sometimes on special request. The purpose of the review is to provide
assistance to the Routing ADs. For more information about the Routing
Directorate, please see

Although these comments are primarily for the use of the Routing ADs, it
would be helpful if you could consider them along with any other IETF
Last Call comments that you receive, and strive to resolve them through
discussion or by updating the draft.

Document: draft-ietf-trill-directory-assist-mechanisms-07.txt
Reviewer: Joel Halpern
Review Date: 13-April-2016
IETF LC End Date: N/A
Intended Status: Proposed Standard

Summary: I have significant concerns about this document and recommend
that the Routing ADs discuss these issues further with the authors.

    I do believe that the major issues are easily resolvable.  I have
tried to provide my best guess at text to resolve each of them.

    I would like to see the minor issues discussed and preferably 


Major Issues:

    In the state machine transitions in section 2.3.3 for push servers,
it appears that if the event indicating that the server is being shut
down occurs while the server is already Going Stand-By or Uncompleting,
the transitions indicate that this "going down" event will be lost.  A
strict reading of this would seem to mean that the "go Down" event would
need to recur after the timeout condition.  This would seem to be best
addressed by a new state "Going-Down" whose timeout behavior is to move
to the Down state.
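
    To make the concern concrete, here is a minimal sketch of the
problem and of the proposed fix.  It is a toy model, not the draft's
actual transition table: the reduced set of states and events, the
assumption that a timeout in Going Stand-By leads to Stand-By, and all
identifiers are mine, for illustration only.

```python
from enum import Enum, auto


class State(Enum):
    DOWN = auto()
    STAND_BY = auto()
    GOING_STAND_BY = auto()
    GOING_DOWN = auto()  # the additional state proposed in this review


class Event(Enum):
    SHUTDOWN = auto()
    TIMEOUT = auto()


# Toy table as this review reads the draft: a shutdown event that arrives
# during a transient state leaves the state unchanged, so it is forgotten.
# (Assumption: the timeout in Going Stand-By leads to Stand-By.)
LOSSY = {
    (State.GOING_STAND_BY, Event.SHUTDOWN): State.GOING_STAND_BY,
    (State.GOING_STAND_BY, Event.TIMEOUT): State.STAND_BY,
}

# Same table with the proposed fix: the shutdown is remembered by moving
# to Going-Down, whose timeout behavior is to move to Down.
FIXED = dict(LOSSY)
FIXED[(State.GOING_STAND_BY, Event.SHUTDOWN)] = State.GOING_DOWN
FIXED[(State.GOING_DOWN, Event.TIMEOUT)] = State.DOWN


def run(table, state, events):
    """Apply events in order; pairs missing from the table are no-ops."""
    for event in events:
        state = table.get((state, event), state)
    return state
```

With the first table, a shutdown followed by the timeout leaves the
server in Stand-By; with the second, the same sequence ends in Down.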

In section 2.3.2, the descriptions for events 3 and 5 are identical.  I
believe from the state transitions that condition 3 is supposed to
reflect the server NOT having complete data when the Activate condition
is met.

In section 3.2.1 there is provision for using a received frame as a
Query.  There are type indications as to what the type of the frame is.
I believe that the intent is that the query always contains the full
received Ethernet Frame, no matter what the type is.  But it does not
say that.  So one could also conclude that for ARP, what I should send
is the ARP message, and for ND, the ND message, etc.  I believe the text
needs to be clarified.  If my guess is correct that the full Ethernet
Frame is to be sent in all cases, then explanatory text as to why the
various type codes exist would seem helpful, since the received frame
contains enough information to support decoding.
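
As an illustration of that last point, the sketch below classifies a
full Ethernet frame using only its own headers.  The function name and
the returned labels are hypothetical and are not the draft's type
codes; the sketch handles at most one 802.1Q tag and ignores IPv6
extension headers.

```python
import struct

ETHERTYPE_VLAN = 0x8100  # 802.1Q tag
ETHERTYPE_ARP = 0x0806
ETHERTYPE_RARP = 0x8035
ETHERTYPE_IPV6 = 0x86DD
ICMPV6 = 58  # IPv6 Next Header value for ICMPv6
ND_TYPES = {133, 134, 135, 136, 137}  # RS, RA, NS, NA, Redirect


def classify_frame(frame: bytes) -> str:
    """Return a hypothetical label for a full Ethernet frame.

    Raises struct.error if the frame is too short to hold an EtherType.
    """
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == ETHERTYPE_VLAN:
        offset += 4  # skip the tag, then read the inner EtherType
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    payload = offset + 2
    if ethertype == ETHERTYPE_ARP:
        return "ARP"
    if ethertype == ETHERTYPE_RARP:
        return "RARP"
    if ethertype == ETHERTYPE_IPV6 and len(frame) >= payload + 41:
        next_header = frame[payload + 6]   # Next Header field
        icmp_type = frame[payload + 40]    # first byte after the 40-byte header
        if next_header == ICMPV6 and icmp_type in ND_TYPES:
            return "ND"
    return "other"
```

If the receiver can do this much from the frame alone, the draft should
say what additional purpose the explicit type codes serve.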

Minor Issues:

    In section 2.3.3 describing the state transitions for push servers,
there is an event (event 1) described as "the server was Down but is now
Up."  The state transition diagram describes this as being a valid event
that does not change the server's state if the server is in any state
other than "Down." In one sense, this is reasonable, saying that such an
event is harmless.  I would however expect some sort of logging or
administrative notification, as something in the system is quite confused.
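
    A minimal sketch of what I have in mind follows.  The function
name, the logger name, the state strings, and the assumption that event
1 takes a Down server to Stand-By are all hypothetical; the point is
only that the no-op branch leaves a trace for the operator.

```python
import logging

logger = logging.getLogger("push_directory")  # hypothetical logger name


def on_server_up(state: str) -> str:
    """Handle event 1 ("the server was Down but is now Up")."""
    if state == "Down":
        # Assumed target state for this sketch; see the draft's table.
        return "Stand-By"
    # Harmless for the state machine, but something upstream is confused,
    # so record it instead of silently ignoring it.
    logger.warning(
        "event 1 (Down -> Up) received in state %r; state unchanged", state
    )
    return state
```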

    Should section 2.4 include a note that indicates that reliance on
information completeness does mean that there are windows when new
entities join the space represented by a particular TRILL data label
during which packets for that destination may be dropped, due to clients
not yet having received the updated information?  I believe this window
is small, and it is quite reasonable to also note that in such text.

    Text in the section on lifetimes and the information
maintenance in section 3.3 imply that the clients and servers must
maintain a connection.  Presumably, this is required already by the
RBridge Channel protocol, and I understand that we should not repeat the
entire protocol here.  It would seem to make readers' lives MUCH simpler
if the text noted that the RBridge Channel protocol requires that there
be a maintained connection between the client and the server, and that
these mechanisms leverage the presence of that connection.

    In the section on Pull directory forwarding, I expect to see
text about and to whom the Pull server will flood the received request.
Instead, the text appears to say that it is the response that will be
flooded.  More importantly, the descriptive text talks about sending the
response, which would normally be a description of sending the response
to the requestor, not sending it to someone else.

    In a related confusion, it seems very strange that a "flood"
request will result in sending an underlying packet unicast to the
destination.  This may be just terminology, but it seems likely to
confuse implementors.  Maybe the flag should be called the Forward flag,
with a note in the definition that it normally causes the response to be
sent to multiple parties, but in the case of a raw MAC frame, results in
the packet being forwarded to the destination or flooded, as the server
can manage?

    In the description in section 3.3 of Cache management, in the text
on method one in which the servers keep minimal state, it would seem
that a large health warning is needed, as this method will cause all
clients to discard all positive data whenever any positive data at the
server changes (even if no client is using the modified data).  This
makes a flapping end station an attack on the cache of all clients!

    It strikes me that the working group could help get robust
deployment by making method 3 (tracking what you told clients) a SHOULD.
(I grant that it is not a MUST, as the other choices do work.)
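
    The difference between the two methods can be sketched as follows.
This is a toy model under my reading of section 3.3, not the draft's
message formats or procedures; all class and method names are
hypothetical.

```python
from collections import defaultdict


class Client:
    """A Pull Directory client holding a cache of positive responses."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def learn(self, key, value):
        self.cache[key] = value

    def flush_all(self):
        self.cache.clear()

    def invalidate(self, key):
        self.cache.pop(key, None)


class MinimalStateServer:
    """Method one: the server does not remember what it told to whom."""

    def __init__(self, clients):
        self.clients = list(clients)

    def answer(self, client, key, value):
        client.learn(key, value)

    def on_change(self, key):
        # Any change forces every client to drop all positive data.
        for client in self.clients:
            client.flush_all()


class TrackingServer:
    """Method 3: the server tracks which client was told about which key."""

    def __init__(self):
        self.told = defaultdict(set)

    def answer(self, client, key, value):
        client.learn(key, value)
        self.told[key].add(client)

    def on_change(self, key):
        # Only clients that actually hold the changed entry are touched.
        for client in self.told.pop(key, set()):
            client.invalidate(key)
```

With two clients that each cached a different entry, one change under
method one empties both caches, while under method 3 the client that
never asked about the changed entry keeps its data.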

Editorial Issues / Nits :