Bidirectional Forwarding Detection (bfd)
IETF 98 - Chicago
1 hour session, March 27, 2017; 17:10-18:10 

Chairs: WG Status. 10 mins
 draft-ietf-bfd-multipoint, ready for WGLC?
 draft-ietf-bfd-multipoint-active-tail to be made experimental.
 draft-ietf-bfd-optimizing-authentication, ready for WGLC?
 draft-tanmir-rtgwg-bfd-mc-lag-ip, draft-tanmir-rtgwg-bfd-mc-lag-mpls, WG adoption?
 BFD work in other groups.

Reshad Rahman:
 Status on BFD Yang Module (draft-ietf-bfd-yang). 10 mins

Sonal Agarwal:
 BFD secure sequence numbers (draft-sonal-bfd-secure-sequence-numbers). 10 mins

Mahesh Jethanandani: 
 BFD Stability (draft-ashesh-bfd-stability). 10 mins
 Optimizing BFD Authentication (draft-ietf-bfd-optimizing-authentication). 5 mins

=====

WG Chairs slides (Jeff)

- MPLS MIB is done but in zombie state.
- Need to address some charter stuff to remove MIBs.
- BFD generic authentication - in zombie state, possibly will be resurrected
  as part of work to be discussed in this session.
- BFD Multipoint, need to close out the work.  Base implementation already
  done by Nokia/ALU.  Active tail draft is unlikely to see deployment any time
  soon.  Would like to publish the document set within the next IETF cycle.
  Looking for a document shepherd; Greg Mirsky is possibly interested?
- BFD over multi-chassis (Tantsura/Mirsky), was presented in IETF 96.  Sense
  of room was that it should have been adopted.  Will send out adoption
  call(***).

Greg Mirsky: Comment from Joe Messinger in Berlin about new extensions to the
802.1AX LAG specification for distributed resilient network interfaces.  Makes
MC-LAG a special case. Consider bringing that work to IEEE?  Greg doesn't
think IEEE will be interested in using a Layer 3 protocol to manage a Layer 2
object. They would work on extending Y.1731, e.g., or CFM. Would like BFD
community to consider if there's interest to generalize BFD or MC-LAG if
there's something we want to do first.

Jeff: Mailing list is the best place to have this discussion.
This headache with IEEE is well known from before; this is a weird mix of
Layer 3 and Layer 2, just a different perspective.

=====

BFD Yang Module Update
Reshad Rahman presenting:
-------------------------

[Slides.]
Moving to a schema mount model.
rtg-cfg draft went to RFC (8022).  BFD model augmenting that.
Update to interactions with other Yang modules.
BFD over VCCV not there.  L2VPN model going through a fair amount of churn.
Have to decide whether this model should cover VCCV or whether that belongs in
a different document.  No BFD over MPLS-TP - not in this draft.
RPCs not needed; Greg will talk about Echo later.

Jeff: In terms of VCCV, we're getting pushed to publish.  Should we just get
this out sooner than later?
Alvaro Retana: If it was me, push it out. How long would the other modules
take?  Do it, then augment it later.
Jeff Haas: Our job is to make sure the structure of the module is good enough
to augment later.
Reshad: Base model. The various types just augment from there.  Could try as
an experiment on existing L2VPN model to see how it'd work.

Open Issues.
Discussion on alias: should echo transmit interval be a parameter or an RPC?
For me, it comes down to whether this is on-demand or continuous.
Comply with RFC 6087-bis.
Comments from Jeff on mailing list and other open issues. Then Yang doctor
review.

Jeff Tantsura: It takes at least a month to get a Yang doctor review.

=====

BFD Echo data model
Greg Mirsky presenting:
-----------------------
[slides]

Payload of echo packets is not specified.  It's implementation specific.
Asymmetric. Node will advertise its desired receive interval.  Asynchronous
will send min Rx and Tx intervals.  One peer may use echo while the other does
not. (Echo 0.)
RFC 5880 suggests that BFD Echo can be used in conjunction with slower speed
BFD Async messages.
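
The echo interval handling described above can be sketched as follows.  This
is a minimal illustration, assuming the RFC 5880 rule that Echo packets are
not sent when the peer advertises a Required Min Echo RX Interval of zero, and
are otherwise sent no faster than the advertised value.  The function and
parameter names are hypothetical and are not taken from the Yang module.

```python
from typing import Optional


def echo_tx_interval_us(local_desired_us: int,
                        remote_min_echo_rx_us: int) -> Optional[int]:
    """Return the usable Echo transmit interval in microseconds.

    Returns None when the peer advertises 0, i.e. it will not loop
    Echo packets back ("Echo 0").  Names are illustrative only.
    """
    if remote_min_echo_rx_us == 0:
        return None
    # Never send faster than the peer says it can receive.
    return max(local_desired_us, remote_min_echo_rx_us)
```

Because each side evaluates this independently against the other's advertised
value, one peer can end up using Echo while the other does not.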

BFD Yang model is such that:
1. Model includes Tx Echo interval as part of configuration.  It's persistent
   configuration.
2. Echo is not a *proactive* mechanism like Async, but is more *on-demand.*

Proactive vs. On-demand:
Proactive has persistent configuration, on-demand doesn't.

Discussion started on WG list and we have different interpretations of the
mechanism.  If you're using Echo BFD, you can reduce Async rate.  Not sure
on-demand will make it possible to do this correction of the rate.  If we
reduce the number of messages, we also impact the state machine.  The
definition of "failure" becomes a bit more ambiguous.

Jeff: What we see in the real world.  With regard to the BFD state machine,
echo is there to say "I'm taking control over what I consider reachable or
not".  Payload is my choice. Verification is my choice.  It's intentionally
taking it out of the state machine. BFD control is there to signal that we
don't want packet rate to exceed things.  
Jeff: With regard to the Yang module, configuration hasn't changed.  The user
has expressed their intent for async and echo.  Operationally, we'll always
show the active state.  This may be different than what's been configured.
What's missing, when we've negotiated down, is whether Echo has any
representation in the operational state.  This is explicitly left out of the
spec, which makes it sort of hard to put it into a standardized yang module.
What we could do is put in operational state to say that Echo is *active*.  We
could let others augment the Yang module to show what they're injecting, but
that'd probably be a vendor augmentation module.

Greg: Original discussion started on whether Tx interval is part of
persistent config or part of an RPC. Initiating echo session - is it part of
configuration or part of RPC?

Jeff: In terms of management, it's an inherent property of the end-user.
Implementation wouldn't prod the yang module to implement it.  Where you might
want to expose this is if the end-user wants to do this on demand; e.g. S-BFD?
On-demand behavior for the management plane, probably not.  Perhaps for S-BFD.

Greg: S-BFD, there's certain dependency on the IGP protocol?

Jeff: There's a dependency, IGP is one way to do it.  Provisioning is another.

Greg: Point was to see if Echo Tx/Rx parameters needed to persist.

Jeff: Probably out of scope, but good for vendor augmentation.

Greg: Would consider that not in the base model?  Ok, I can agree with that.

=====

BFD secure sequence numbers 
Presented by Sonal Agarwal:
---------------------------
[slides]
BFD authentication is on the entire packet.
Sequence number is part of every packet.
Enhancement: we don't have to authenticate the whole packet, since that's
expensive.  Instead, we can authenticate just part of the packet: the
sequence number.  This gives us sequence number unpredictability/hiding.
Currently sequence numbers increase predictably.  This makes
man-in-the-middle attacks easy.
Proposal is that we push the sequence number through an asymmetric hash
function.  This hides the sequence number.  Provision a hash algorithm
(symmetric hash) and a shared key.
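
The idea of hiding the sequence number behind a keyed hash can be sketched as
follows.  This is only an illustration, assuming HMAC-SHA256 as the
provisioned algorithm and a 32-bit sequence number; the draft's actual
algorithm, field layout, and function names are not given in these minutes,
so everything named here is hypothetical.

```python
import hashlib
import hmac
import struct


def obscure_seq(shared_key: bytes, seq: int) -> bytes:
    """Keyed hash of a 32-bit sequence number (illustrative helper)."""
    packed = struct.pack("!I", seq & 0xFFFFFFFF)
    return hmac.new(shared_key, packed, hashlib.sha256).digest()


def accept_seq(shared_key: bytes, expected_seq: int, received: bytes) -> bool:
    """Receiver recomputes the hash for the sequence number it expects."""
    expected = obscure_seq(shared_key, expected_seq)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, received)
```

An observer without the shared key sees values that do not increase
predictably, while the receiver only has to hash a 4-byte input rather than
the whole packet.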

Jeff: Broader discussion after next presentation about optimizing BFD.  What
should become apparent is this mechanism is intended to address the issue of
running unauthenticated BFD packets.  How do we mitigate man-in-the-middle
attacks in that case?

[?] - This is assuming the null authentication draft?

Sonal: Side by side with the other. That draft could make use of this
mechanism, but this is independent.

Jeff: This mechanism can be used outside of BFD.  Generic problem that IETF
has is when you have a piece of high value data, and you can't do an expensive
cipher, this mechanism is good for medium grade security for data that has a
short life span.

=====

Optimizing BFD Authentication
Presented by Mahesh Jethanandani: 
---------------------------------
How do we deal with unauthenticated packets? And how Sonal's draft fits into this.
Updated IANA considerations.  Asking for NULL Auth code point.
Only BFD packets that make state changes need "real" authentication.  The
rest can go in the clear.  However, what happens if someone tries to hijack a
session when it's in the clear?

Recommendation was that we should periodically authenticate an in the clear
packet.  But only helps if you do it "frequently enough".
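
The combination of "authenticate state changes" and "periodically authenticate
an otherwise in-the-clear packet" can be sketched as follows.  This is a
minimal illustration, assuming the period is expressed as every N-th packet;
the function name and the counting scheme are hypothetical, not taken from the
draft.

```python
from typing import Iterable, List


def auth_schedule(state_changes: Iterable[bool], n: int) -> List[bool]:
    """For each packet, decide whether it carries real authentication.

    state_changes: one flag per packet, True if it signals a state change.
    n: authenticate at least every n-th packet (n >= 1, assumed parameter).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    decisions = []
    since_auth = 0  # in-the-clear packets sent since the last authenticated one
    for changed in state_changes:
        if changed or since_auth >= n - 1:
            decisions.append(True)
            since_auth = 0
        else:
            decisions.append(False)
            since_auth += 1
    return decisions
```

A smaller N narrows the window in which a hijacked session goes unnoticed, at
the price of more authentication work per session.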

This is where the suggestion comes in of using the meticulous sequence number
and munging it; we would then be able to prevent the man in the middle attack
for the case where the packet is otherwise in the clear.
Draft has been stable for a while. Ready for last call. Questions?

Greg Mirsky: One of the possible modes when the session is up is to use
authentication with periodic timer trigger?

Mahesh: That was one of the options.

Greg: That was because it's computationally challenging to do this on every
packet.   The idea is that authentication can be offloaded from the
forwarding engine.  When is a received packet with authentication considered
validated?  When we complete authentication? Or when verifying part of it?

Mahesh: When it's completely authenticated.

Greg: That impacts the state machine.  The rate of sending packets is quite
aggressive.

Mahesh: When you're doing a state change you're not under obligation...

Greg: The periodic/every so often, load causes us to lose traffic? For
example, we've shipped this occasional packet down to the line card for
authentication, and then we lose two packets in a row.  ([Jeff], I believe
Greg is presuming a Detection Multiplier of 3 for his example.) This could
cause sessions to drop if authentication takes more time than receiving two
packets.

Jeff: Rate on line card and auth operation causes packet loss?

Greg: Authentication is bump in the wire - shifted to control plane? Does this
change the validation scheme?  High rate of packets, 3.3ms.  Control plane is
too busy to get it back in 7ms. Next 2 packets that should be hardware
processed are lost.  With one packet waiting and 2 packets lost, the session
is now lost.

Mahesh: True. In that scenario, have no option but to drop session.
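
Greg's timing example can be worked through as follows.  This is a rough
sketch, assuming a Detection Multiplier of 3 (as noted above), that the
off-loaded packet arrives one interval after the last validated packet, that
it only counts once authentication completes, and that the following packets
are lost.  The function names and the model itself are illustrative.

```python
def detection_time_us(interval_us: int, detect_mult: int) -> int:
    """Detection time: packet interval multiplied by the Detection Multiplier."""
    return interval_us * detect_mult


def validated_before_timeout(auth_latency_us: int,
                             interval_us: int,
                             detect_mult: int) -> bool:
    """True if the off-loaded packet is validated before the session expires.

    Assumes the packet arrives one interval after the last validated
    packet and that no later packet is received in the meantime.
    """
    validated_at = interval_us + auth_latency_us
    return validated_at < detection_time_us(interval_us, detect_mult)


# 3.3 ms interval, multiplier 3, 7 ms in the control plane:
# validated at 10.3 ms, but the session expires at 9.9 ms.
session_ok = validated_before_timeout(7000, 3300, 3)
```

Integer microseconds are used so the 3 x 3.3 ms product is exact.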

Greg: You think that it doesn't have to change the validation scheme that this
packet gets counted as a valid packet?

Mahesh: One possible implementation: before bringing the session down, check
pending packets for validation?

Greg: Single unauthenticated packet causes session down? not in document?

? - Authentication is a separate process. At some point you get around to
validating the authentication.  Detection of authentication issue. You don't
block, but there might be a delay.

Greg: You encrypt the whole packet, the whole message.

? - you put a checksum on the packet.

Mahesh: Referring to previous option. We periodically, once a second encrypt a
packet.  A timer can expire and can bring session down.  I accept that
argument.

Les Ginsberg: It's a practical problem that authentication can't be done
within session holdtime.  Is this an actual problem?

Jeff: It's a slightly different problem. It's doing that whole process every 3.3ms.

Les: That's why there's a proposal to do this on every N packets. The issue is
what happens when you lose those N packets?

Greg: We may have thousands of sessions.

Mahesh: It's not just that authentication for that packet is taking more than
3.3ms; it has to take 9.9ms before you've lost 3 packets.  If you can't
authenticate it in 9.9ms, i.e. if hardware isn't capable of authenticating
every N-th packet, you probably shouldn't run authentication in the first
place.

Greg: Multiple concurrent BFD sessions.  If authentication is expensive, and
you want to do it periodically, if performance isn't a problem. Why wouldn't
we do it for every one?

Mahesh: We know that it is a problem.  What we don't know is if authenticating
every N-th packet.  We wouldn't be proposing this if we couldn't do *any*
packet at wire speed.

Greg: If we have a problem authenticating a packet and we authenticate 1
packet in every 1k, then it's still a problem.

Les: It's just a matter of scale.  We're increasing the cost of supporting a
session.

Greg: It's not obviously 3. We're changing the flow of packet in a system.  If
we authenticate every packet in the system, it's sequential.  If we off-load,
we change that from sequential.

Ashesh in jabber: In hardware implementation, the sequence number option will
be a better fit compared to a scenario where the authentication handler is not
in-line. In any case, authentication of the off-loaded frame is not related to
the subsequent frames.

Jeff: More than one person is making the point that if you change the flow of
authentication, say a different engine, the fundamental property you care
about is can you do this work in a timely fashion at the appropriate scale.
It's a multidimensional problem as BFD always is.  If you increase the numbers
of one thing, you change the scale on the others.  When you make N big enough,
you end up with headache.  BFD does give us jitter to help a bit.  The text on
periodic authentication should probably talk more about jitter to avoid self
synchronization.

Mahesh: We can add that.
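
One way jitter could be applied to periodic authentication is sketched below.
This assumes the same rule RFC 5880 uses for control packet transmission
(reduce the interval by a random 0 to 25 percent each time) is reused for the
authentication period; that reuse, and the names below, are assumptions and
not text from the draft.

```python
import random


def jittered_interval_us(base_us: int, rng: random.Random) -> int:
    """Reduce the base interval by a random 0-25%, as RFC 5880 does for Tx."""
    reduction = rng.uniform(0.0, 0.25)
    return int(base_us * (1.0 - reduction))


# Each session draws its own value, so many sessions configured with the
# same period do not all hit the authentication engine at the same instant.
rng = random.Random()
next_auth_in_us = jittered_interval_us(1_000_000, rng)
```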

Reshad: Has there been any experimentation to see how this impacts scale,
scale X times better?

Mahesh: Not specifically with this form of authentication.  The difference
between no authentication and full authentication is roughly 1:5.

Jeff: Think we're a bit premature for last call.  Procedures in Sonal's draft
cover helping this, but still need to discuss sequence number discovery.
Probably ready after integration.  (***)
Jeff: Will move to adopt Sonal's draft. Optimizing draft not ready for WGLC.

Reshad: Can optimizing draft move to WGLC without sequence number stuff?

Mahesh: Will probably get hung up in IESG.

Jeff: Both drafts will likely need to advance together.

=====

BFD Stability
Presented by Mahesh Jethanandani
-------------------------------------------------
Decided to drop delay measurement. Keep the sequence number mechanism to help
us see quality of BFD session.

Now a very simple draft.
Adoption request(***)

Jeff: This does clear out one of the security issues.
Reshad : Who's read? 5-6?