Bidirectional Forwarding Detection (bfd)
IETF 98 - Chicago
1 hour session, March 27, 2017; 17:10-18:10

Chairs: WG Status. 10 mins
- draft-ietf-bfd-multipoint, ready for WGLC?
- draft-ietf-bfd-multipoint-active-tail to be made experimental.
- draft-ietf-bfd-optimizing-authentication, ready for WGLC?
- draft-tanmir-rtgwg-bfd-mc-lag-ip, draft-tanmir-rtgwg-bfd-mc-lag-mpls, WG adoption?
- BFD work in other groups.

Reshad Rahman: Status on BFD Yang Module (draft-ietf-bfd-yang). 10 mins
Sonal Agarwal: BFD secure sequence numbers (draft-sonal-bfd-secure-sequence-numbers). 10 mins
Mahesh Jethanandani: BFD Stability (draft-ashesh-bfd-stability). 10 mins
Optimizing BFD Authentication (draft-ietf-bfd-optimizing-authentication). 5 mins

=====
WG Chairs slides (Jeff)

- MPLS MIB is done but in zombie state.
- Need to address some charter stuff to remove MIBs.
- BFD generic authentication - in zombie state, possibly will be resurrected as part of work to be discussed in this session.
- BFD Multipoint, need to close out the work. Base implementation already done by Nokia/ALU. Active tail draft is unlikely to see deployment any time soon. Would like to publish the document set within the next IETF cycle. Looking for a document shepherd; Greg Mirsky is possibly interested?
- BFD over multi-chassis (Tantsura/Mirsky), was presented in IETF 96. Sense of room was that it should have been adopted. Will send out adoption call(***).

Greg Mirsky: Comment from Joe Messinger in Berlin about new extensions to 802.1ax LAG specification for distributed resilient network interfaces. Makes MC-LAG a special case. Consider bringing that work to IEEE? Greg doesn't think IEEE will be interested in using a Layer 3 protocol to manage a Layer 2 object. Would work on extending Y.1731, e.g., or CFM. Would like BFD community to consider if there's interest to generalize BFD or MC-LAG if there's something we want to do first.

Jeff: Mailing list is best place to have this discussion due to discussion.
This headache with IEEE is well known from before; this is a weird mix of Layer 3 and Layer 2, just a different perspective.

=====
BFD Yang Module Update

Reshad Rahman presenting:
-------------------------
[Slides.]

Moving to a schema mount model. rtg-cfg draft went to RFC (8022). BFD model augmenting that. Update to interactions with other Yang modules.

BFD over VCCV not there. L2VPN model going through a fair amount of churn. Have to decide whether this model should cover VCCV or whether that belongs in a different document.

No BFD over MPLS-TP - not in this draft.

RPCs not needed; Greg will talk about Echo later.

Jeff: In terms of VCCV, we're getting pushed to publish. Should we just get this out sooner than later?

Alvaro Retana: If it was me, push it out. How long would the other modules take? Do it, then augment it later.

Jeff Haas: Our job is to make sure the structure of the module is good enough to augment later.

Reshad: Base model. The various types just augment from there. Could try as an experiment on existing L2VPN model to see how it'd work.

Open Issues. Discussion on alias whether echo transmit interval should be a parameter or an RPC? For me, it comes down to whether this is on-demand or continuous. Comply with RFC 6087-bis. Comments from Jeff on mailing list and other open issues. Then Yang doctor review.

Jeff Tantsura: It takes at least a month to get a Yang doctor review.

=====
BFD Echo data model

Greg Mirsky presenting:
-----------------------
[slides]

Payload of echo packets is not determined. It's implementation specific.

Asymmetric. Node will advertise its desired receive interval. Asynchronous will send min Rx and Tx intervals. One peer may use echo while the other does not. (Echo 0.)

RFC 5880 suggests that BFD Echo can be used in conjunction with slower speed BFD Async messages.

BFD Yang model is such that:
1. Model includes Tx Echo interval as part of configuration. It's persistent configuration.
2.
Echo is not a *proactive* mechanism like Async, but is more *on-demand.*

Proactive vs. On-demand: Proactive has persistent configuration, on-demand doesn't. Discussion started on WG list and we have different interpretations of the mechanism.

If you're using Echo BFD, you can reduce Async rate. Not sure on-demand will make it possible to do this correction of the rate. If we reduce the number of messages, we also impact the state machine. The definition of "failure" becomes a bit more ambiguous.

Jeff: What we see in the real world. With regard to the BFD state machine, echo is there to say "I'm taking control over what I consider reachable or not". Payload is my choice. Verification is my choice. It's intentionally taking it out of the state machine. BFD control is there to signal that we don't want packet rate to exceed things.

Jeff: With regard to the Yang module, configuration hasn't changed. The user has expressed their intent for async and echo. Operationally, we'll always show the active state. This may be different than what's been configured. What's missing, when we've negotiated down, is whether Echo has any representation in the operational state. This is explicitly left out of the spec, which makes it sort of hard to put it into a standardized yang module. What we could do is put in operational state to say that Echo is *active*. We could let others augment the Yang module to show what they're injecting, but that'd probably be a vendor augmentation module.

Greg: Original discussion started that Tx interval, whether it's part of persistent config or part of an RPC? Initiating echo session - is it part of configuration or part of RPC?

Jeff: In terms of management, it's an inherent property of the end-user. Implementation wouldn't prod the yang module to implement it. Where you might want to expose this is if the end-user wants to do this on demand; e.g. S-BFD? On-demand behavior for the management plane, probably not. Perhaps for S-BFD.
Greg: S-BFD, there's certain dependency on the IGP protocol?

Jeff: There's a dependency, IGP is one way to do it. Provisioning is another.

Greg: Point was to see if Echo Tx/Rx parameters needed to persist.

Jeff: Probably out of scope, but good for vendor augmentation.

Greg: Would consider that not in the base model? Ok, I can agree with that.

=====
BFD secure sequence numbers

Presented by Sonal Agarwal:
---------------------------
[slides]

BFD authentication is on the entire packet. Sequence number is part of every packet.

Enhancement, we don't have to authenticate the whole packet since it's expensive. Instead, we can authenticate just part of the packet: the sequence number. This gives us sequence number unpredictability/hiding.

Currently sequence numbers increase predictably. This makes man in the middle easy.

Proposal is that we push the sequence number through an asymmetric hash function. This hides the sequence number. Provision a hash algorithm (symmetric hash) and a shared key.

Jeff: Broader discussion after next presentation about optimizing BFD. What should become apparent is this mechanism is intended to address the issue of running unauthenticated BFD packets. How do we mitigate man-in-the-middle attacks in that case?

[?] - This is assuming the null authentication draft?

Sonal: Side by side with the other. That draft could make use of this mechanism, but this is independent.

Jeff: This mechanism can be used outside of BFD. Generic problem that IETF has is when you have a piece of high value data, and you can't do an expensive cipher, this mechanism is good for medium grade security for data that has a short life span.

=====
Optimizing BFD Authentication

Presented by Mahesh Jethanandani:
---------------------------------

How do we deal with unauthenticated packets? And how Sonal's draft fits into this.

Updated IANA considerations. Asking for NULL Auth code point.

Only BFD packets that make state changes need "real" authentication.
The rest can go in the clear. However, what happens if someone tries to hijack a session when it's in the clear? Recommendation was that we should periodically authenticate an in the clear packet. But that only helps if you do it "frequently enough". This is where the suggestion came in of using the meticulous sequence number and munging it; then we would be able to prevent the man in the middle attack for the case where the packet is otherwise in the clear.

Draft has been stable for a while. Ready for last call. Questions?

Greg Mirsky: One of the possible modes when the session is up is to use authentication with periodic timer trigger?

Mahesh: That was one of the options.

Greg: That was because it's computationally challenging to do this on every packet. The idea is that authentication can be offloaded from the forwarding engine. When we receive a packet with authentication being validated. When we complete authentication? Or verifying part of it.

Mahesh: When it's completely authenticated.

Greg: That impacts the state machine. The rate of sending packets is quite aggressive.

Mahesh: When you're doing a state change you're not under obligation...

Greg: The periodic/every so often, load causes us to lose traffic? For example, we've shipped this occasional packet down to the line card for authentication, and then we lose two packets in a row. ([Jeff], I believe Greg is presuming a Detection Multiplier of 3 for his example.) This could cause sessions to drop. If authentication takes more time than receiving two packets.

Jeff: Rate on line card and auth operation causes packet loss?

Greg: Authentication is bump in the wire - shifted to control plane? Does this change the validation scheme? High rate of packets, 3.3ms. Control plane is too busy to get it back in 7ms. Next 2 packets that should be hardware processed are lost. One packet that's waiting, 2 packets lost, we're now lost.

Mahesh: True. In that scenario, have no option but to drop session.
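
The timing in Greg's example can be checked with a little arithmetic. The sketch below is a simplified illustration and was not presented in the session. It assumes a 3.3 ms interval, a Detection Multiplier of 3 (as the note above presumes), that the detection timer was last restarted one interval before the offloaded packet arrived, and that no other valid packet arrives before the timer expires. The function names are hypothetical.

```python
def detection_time_us(interval_us: int, detect_mult: int) -> int:
    """Detection time: the agreed packet interval times the Detection Multiplier."""
    return interval_us * detect_mult


def session_survives(interval_us: int, detect_mult: int, auth_latency_us: int) -> bool:
    """Simplified model (an assumption, not taken from any draft).

    t = 0            last packet validated in hardware; detection timer restarts
    t = interval     next packet arrives and is offloaded for authentication
    later packets    assumed lost
    The session stays up only if the offloaded packet is validated
    before the detection timer expires.
    """
    deadline_us = detection_time_us(interval_us, detect_mult)
    validated_at_us = interval_us + auth_latency_us
    return validated_at_us < deadline_us


if __name__ == "__main__":
    # Integer microseconds avoid floating-point rounding on 3.3 ms.
    print(detection_time_us(3300, 3))
    print(session_survives(3300, 3, 7000))
    print(session_survives(3300, 3, 5000))
```

Under these assumptions the detection time is 9.9 ms. A packet that arrives at 3.3 ms and takes 7 ms to authenticate is validated at 10.3 ms, after the session has already timed out, whereas a 5 ms validation would complete in time.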
Greg: You think that it doesn't have to change the validation scheme that this packet gets counted as a valid packet?

Mahesh: One possible implementation, before bringing the session down, check pending packets for validation?

Greg: Single unauthenticated packet causes session down? Not in document?

? - Authentication is a separate process. At some point you get around to validating the authentication. Detection of authentication issue. You don't block, but there might be a delay.

Greg: You encrypt the whole packet, the whole message.

? - You put a checksum on the packet.

Mahesh: Referring to previous option. We periodically, once a second, encrypt a packet. A timer can expire and can bring session down. I accept that argument.

Les Ginsberg: It's a practical problem that authentication can't be done within session holdtime. Is this an actual problem?

Jeff: It's a slightly different problem. It's doing that whole process every 3.3ms.

Les: That's why there's a proposal to do this on every N packets. The issue is what happens when you lose those N packets?

Greg: We may have thousands of sessions.

Mahesh: It's not just that authentication for that packet is taking more than 3.3ms; it takes 9.9ms to see if you've lost 3 packets. If you can't authenticate it in 9.9ms, you probably shouldn't run authentication in the first place. If hardware isn't capable of authenticating every N-th packet.

Greg: Multiple concurrent BFD sessions. If authentication is expensive, and you want to do it periodically, if performance isn't a problem. Why wouldn't we do it for every one?

Mahesh: We know that it is a problem. What we don't know is if authenticating every N-th packet. We wouldn't be proposing this if we couldn't do *any* packet at wire speed.

Greg: If we have a problem authenticating a packet and we authenticate 1 packet in every 1k, then it's still a problem.

Les: It's just a matter of scale. We're increasing the cost of supporting a session.

Greg: It's not obviously 3.
We're changing the flow of packets in a system. If we authenticate every packet in the system, it's sequential. If we off-load, we change that from sequential.

Ashesh in jabber: In hardware implementation, the sequence number option will be a better fit compared to a scenario where the authentication handler is not in-line. In any case, authentication of the off-loaded frame is not related to the subsequent frames.

Jeff: More than one person is making the point that if you change the flow of authentication, say to a different engine, the fundamental property you care about is whether you can do this work in a timely fashion at the appropriate scale. It's a multidimensional problem as BFD always is. If you increase the numbers of one thing, you change the scale on the others. When you make N big enough, you end up with a headache. BFD does give us jitter to help a bit. The text on periodic authentication should probably talk more about jitter to avoid self synchronization.

Mahesh: We can add that.

Reshad: Has there been any experimentation to see how this impacts scale, scale X times better?

Mahesh: Not specifically with this form of authentication. The difference between no authentication and full authentication is roughly 1:5.

Jeff: Think we're a bit premature for last call. Procedures in Sonal's draft cover helping this, but still need to discuss sequence number discovery. Probably ready after integration.

(***) Jeff: Will move to adopt Sonal's draft. Optimizing draft not ready for WGLC.

Reshad: Can optimizing draft move to WGLC without sequence number stuff?

Mahesh: Will probably get hung up in IESG.

Jeff: Both drafts will likely need to advance together.

=====
BFD Stability

Presented by Mahesh Jethanandani
--------------------------------

Decided to drop delay measurement. Keep the sequence number mechanism to help us see quality of BFD session. Now a very simple draft.

Adoption request(***)

Jeff: This does clear out one of the security issues.

Reshad: Who's read? 5-6?