Failure Detection and Locator Pair Exploration Protocol for IPv6 Multihoming
RFC 5534

Note: This ballot was opened for revision 13 and is now closed.

(Mark Townsley) Yes

(Ron Bonica) No Objection

Comment (2007-12-19 for -)
I support Dan's comment on manageability, but will let him hold the Discuss.

(Ross Callon) No Objection

(Lisa Dusseault) No Objection

(Lars Eggert) (was Discuss) No Objection

Comment (2008-01-24)
Section 3.3., paragraph 7:
> o  Positive feedback from upper layer protocols.  For instance, TCP
>    can indicate to the IP layer that it is making progress.  This is
>    similar to how IPv6 Neighbor Unreachability Detection can in some
>    cases be avoided when upper layers provide information about
>    bidirectional connectivity [RFC2461].

  This is a pretty large architectural change. No transport protocol had
  to indicate to the network layer "that it is making progress" in order
  to sustain connectivity. It's also unclear what "progress" is - TCP
  connections can be idle for days, with no packets going back and
  forth. This is a feature.


Section 3.3., paragraph 9:
> o  Negative feedback from upper layer protocols.  It is conceivable
>    that upper layer protocols give an indication of a problem to the
>    multihoming layer.  For instance, TCP could indicate that there's
>    either congestion or lack of connectivity in the path because it
>    is not getting ACKs.

  TCP knows how to deal with congestion or transient lack of
  connectivity internally. TCP has no notion of indicating to the
  network layer that it should do something about the connectivity, and
  it's unclear when this new option should be used rather than reacting
  to connectivity events through congestion control.


Section 3.3., paragraph 10:
> o  ICMP error messages.  Given the ease of spoofing ICMP messages,
>    one should be careful to not trust these blindly, however.  Our
>    suggestion is to use ICMP error messages only as a hint to perform
>    an explicit reachability test or move an address pair to a lower
>    place in the list of address pairs to be probed, but not as a
>    reason to disrupt ongoing communications without other indications
>    of problems.

  If I understand this correctly, this proposes that SHIM6 hijack the
  ICMP messages that are generated in response to transport sessions.
  This is problematic, because it will prevent the transport protocols
  from reacting to these ICMP messages in the currently-specified way.
  If both SHIM6 and the transport protocol act on ICMP messages, i.e.,
  if they're not intercepted but snooped, there is a potential for
  problematic interactions between the SHIM6 response mechanism to an
  ICMP message and the transport protocol response mechanism.
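
  The quoted draft text treats an ICMP error only as a hint. A minimal
  sketch of the "move an address pair to a lower place" option follows.
  The function name and the representation of the probe order as a list
  of (local, remote) tuples are assumptions made for illustration; the
  draft does not define them. The sketch does not address the
  interaction with transport-layer ICMP handling raised above.

```python
def demote_on_icmp_hint(probe_order, suspect_pair):
    """Return a new probe order with suspect_pair moved to the end.

    probe_order  -- list of (local_address, remote_address) tuples
                    (hypothetical representation, not from the draft)
    suspect_pair -- the pair an ICMP error message referred to

    The ICMP error is used only to reorder later probing; nothing is
    dropped and ongoing communication is not interrupted.
    """
    if suspect_pair not in probe_order:
        # Unknown pair: ignore the hint and leave the order unchanged.
        return list(probe_order)
    reordered = [pair for pair in probe_order if pair != suspect_pair]
    reordered.append(suspect_pair)
    return reordered


order = [("A1", "B1"), ("A1", "B2"), ("A2", "B1")]
print(demote_on_icmp_hint(order, ("A1", "B1")))
```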


Section 3.5., paragraph 1:
> Efficient congestion
> control over multiple paths is a considered research at the time this
> specification is written.

  Yes, multipath congestion control is research, and because SHIM6
  prevents transports from being aware of the actual paths that are
  available or being used (because it overloads addresses), it
  effectively prevents the use of any future multipath congestion
  control transports.


Section 1., paragraph 0:
> 1.  To avoid the other side from concluding there is a reachability
>     failure, it's necessary for a host implementing the failure
>     detection mechanism to generate periodic keepalives when there is
>     no other traffic.

  Many transport protocols went to great lengths to avoid needing to
  send keepalives when there is no payload data to be exchanged.
  Introducing keepalives at the SHIM6 layer basically eliminates the
  benefits of that design choice. Why does SHIM6 exchange keepalives to
  test connectivity if the transport protocols don't have any data to
  exchange? Also, there's already the issue with battery drain caused by
  NAT keepalives (cf. the SAFE BOF); this mechanism exacerbates this
  problem.


Section 1., paragraph 2:
>     The interval after which keepalives are sent is named Keepalive
>     Interval.  This document doesn't specify a value for Keepalive
>     Interval, but recognizes that an often used approach is sending
>     keepalives at one-half to one-third of the Keepalive Timeout
>     interval, so that multiple keepalives are generated and have time
>     to reach the correspondent before it times out.  An upper bound
>     on this interval would be (Keepalive Timeout - 2) seconds, so
>     that one keepalive has time to reach the other side, assuming a
>     maximum one-way delay of 2 seconds.

  A path that includes two GSM access links can easily have a
  propagation delay > 2 seconds. Also, if this document doesn't specify
  the intervals, which document does?
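
  To make the arithmetic in the quoted paragraph concrete, here is a
  minimal sketch that computes the one-third and one-half suggestions
  and the (Keepalive Timeout - delay) upper bound. The function name,
  the dictionary keys, and the 10-second timeout used in the example
  are illustrative assumptions, not values taken from the draft.

```python
def keepalive_interval_guidance(keepalive_timeout, max_one_way_delay=2.0):
    """Return candidate Keepalive Interval values in seconds.

    keepalive_timeout -- the peer's Keepalive Timeout, in seconds
    max_one_way_delay -- assumed worst-case one-way delay; the quoted
                         text assumes 2 seconds
    """
    if keepalive_timeout <= max_one_way_delay:
        raise ValueError("timeout must exceed the assumed one-way delay")
    return {
        # "one-half to one-third of the Keepalive Timeout interval"
        "one_third": keepalive_timeout / 3,
        "one_half": keepalive_timeout / 2,
        # "(Keepalive Timeout - 2) seconds" with the delay as a parameter
        "upper_bound": keepalive_timeout - max_one_way_delay,
    }


print(keepalive_interval_guidance(10.0))
print(keepalive_interval_guidance(10.0, max_one_way_delay=3.0))
```

  With the illustrative 10-second timeout, the upper bound is 8 seconds
  under the 2-second assumption and drops to 7 seconds if the one-way
  delay is taken to be 3 seconds, which is the sensitivity the comment
  above points at.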


Section 6., paragraph 1:
> Section 7 provides some suggested defaults for these timeout values.
> Experience from the deployment of the SHIM6 protocol is needed in
> order to determine what values are most suitable.  The setting of
> these values is also related to various parameters in transport
> protocols, such as TCP keepalive interval.

  (From Bernard's tsv-dir review.) Negotiation of a static SHIM6
  Keepalive timeout is allowed, if different from the default value.
  However, this relationship is not explored.  The TCP keepalive
  interval is generally kept quite large, partly out of a desire not to
  tear down idle TCP connections due to a transient failure.  The SHIM6
  keepalive interval during idle is not defined in the Failure Detection
  document, but my impression was that it could be much shorter and
  this would seem to collide with the philosophy of TCP keepalives.  So
  I'm not clear what the above sentence means.


Section 4.2., paragraph 6:
> Upon changing to a new address pair, the network path traversed most
> likely has changed, so that the ULP SHOULD be informed.  This can be
> a signal for the ULP to adapt due to the change in path so that, for
> example, TCP could initiate a slow start procedure, although it's
> likely that the circumstances that led to the selection of a new path
> already caused enough packet loss to trigger slow start.

  There is currently no clear approach for how transport protocols
  should react to connectivity change indications. Suggest removing
  everything from "for example" onward.


Section 4.2., paragraph 7:
> Similarly, one can also envision that applications would be able to
> tell the IP or transport layer that the current connection in
> unsatisfactory and an exploration for a better one would be
> desirable.  This would require an inter-layer communication mechanism
> to be developed, however.  In any case, this is another issue that we
> treat as being outside the scope of pure address exploration.

  One can envision this, but it's completely unclear what
  "unsatisfactory" and "better" mean in this regard. Also, the SHIM6
  mechanism selects *a* path, but doesn't attempt to select one
  according to a specific set of criteria. Not sure if this paragraph is
  helpful - remove?


Section 4.3., paragraph 5:
> Section 7 suggests default values for the timers associated with the
> exploration process.  The value Initial Probe Timeout (0.5 seconds)
> specifies the interval between initial attempts to send probes;
> Number of Initial Probes (4) specifies how many initial probes can be
> sent before the exponential backoff procedure needs to be employed.
> This process increases the time between every probe if there is no
> response.  Typically, each increase doubles the time but this
> specification does not mandate a particular increase.

  (From Bernard's tsv-dir review.) Interactions of SHIM6 with congestion
  control.  Section 4.3 of the Failure Detection document talks about
  exploration timeout values.  Exploration can be kicked off if no
  inbound traffic is received within Send Timeout (default = 10
  seconds). The first observation is that the Send Timeout should
  probably depend on the RTO estimate, as it does in SCTP.  Otherwise we
  could have a network with a high RTO and SHIM6 exploration could
  commence after RTO is backed off only a few times.  This would be
  undesirable from a congestion control point of view. The suggested
  value of the Initial Probe Timeout (500ms) is less than RTOmin and 4
  probes can be sent before initiating exponential backoff.  This seems
  like it could violate "conservation of packets".  Why doesn't
  exponential backoff begin immediately? Instead of a static default
  Initial Probe Timeout value, might it make sense to base this on RTO
  estimates?  Again, in SCTP these issues are handled due to
  integration with the transport layer.
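
  The probe timing under discussion can be sketched as follows, using
  the quoted defaults (Initial Probe Timeout 0.5 seconds, Number of
  Initial Probes 4, doubling on each increase). The function name is
  hypothetical, and the point at which backoff starts is one reading of
  the quoted text: here the gap first grows after the fourth probe.

```python
def probe_send_times(total_probes, initial_timeout=0.5,
                     initial_probes=4, factor=2.0):
    """Return send times (seconds from start) for unanswered probes.

    The first `initial_probes` probes are spaced `initial_timeout`
    apart; after that, each further gap is `factor` times the previous
    one. No upper cap on the gap is modelled in this sketch.
    """
    times = []
    now = 0.0
    interval = initial_timeout
    for n in range(total_probes):
        times.append(now)
        # Once the initial probes are out, grow the gap before each
        # subsequent probe.
        if n + 1 >= initial_probes:
            interval *= factor
        now += interval
    return times


print(probe_send_times(8))
```

  Under these assumptions four probes leave within the first 1.5
  seconds, before any backoff applies, which is the behaviour the
  review question is about.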


Section 7., paragraph 4:
> Alternate values of the Send Timeout may be selected by a host and
> communicated to the peer in the Keepalive Timeout Option.

  Guidance is needed on appropriate and inappropriate values for the
  send timeout.

(Sam Hartman) No Objection

(Russ Housley) (was Discuss) No Objection

Comment (2007-12-19)
  From the SecDir review by Steve Kent:
  >
  > This discussion does a good job of exploring the DoS concerns
  > related to use of REAP with SHIM6. However, I would like to see a
  > paragraph or two explaining why the designers of REAP chose to not
  > use a protocol like IPsec to provide security for REAP. I am not
  > saying the choice made here is bad, but rather that it would be
  > useful to capture the rationale behind the choice.
  >
  I saw a response, and there seemed to be agreement that the
  requested paragraph seems like a good thing.  Where is it?

(Chris Newman) No Objection

(Jon Peterson) No Objection

(Tim Polk) No Objection

Comment (2007-12-18 for -)
Section 4.3 Exploration Order, first sentence of paragraph 2 is slightly mangled:

   Nodes first consult the RFC 3484 default address selection rules
   [RFC3484] Section 4 rules to determine what combinations of addresses
   are allowed from a local point of view, as this reduces the search
   space.

The address selection rules in RFC 3484 are specified in Sections 5 and 6; Section 4
is "Candidate Source Addresses".  I believe deleting "Section 4 rules" following the
[RFC3484] reference is a complete fix.

(Dan Romascanu) (was Discuss) No Objection

Comment (2008-01-24)
It would be appropriate to include in Section 9 a recommendation that failure detection and switch-over events to a backup path be communicated via asynchronous notifications to management systems (if available) and recorded in event logs.

(David Ward) (was Discuss) No Objection

Magnus Westerlund (was Discuss) No Objection

(Jari Arkko) Recuse

(Cullen Jennings) (was Discuss, No Objection, No Record, Discuss, No Objection) No Record