Failure Detection and Locator Pair Exploration Protocol for IPv6 Multihoming
RFC 5534
Yes
(Mark Townsley)
No Objection
(Chris Newman)
(David Ward)
(Jon Peterson)
(Lisa Dusseault)
(Magnus Westerlund)
(Ross Callon)
(Sam Hartman)
(Tim Polk)
Recuse
(Jari Arkko)
No Record
(Cullen Jennings)
Note: This ballot was opened for revision 13 and is now closed.
Lars Eggert
(was Discuss)
No Objection
Comment
(2008-01-24)
Section 3.3., paragraph 7:
> o Positive feedback from upper layer protocols. For instance, TCP
> can indicate to the IP layer that it is making progress. This is
> similar to how IPv6 Neighbor Unreachability Detection can in some
> cases be avoided when upper layers provide information about
> bidirectional connectivity [RFC2461].

This is a pretty large architectural change. Until now, no transport protocol has had to indicate to the network layer "that it is making progress" in order to sustain connectivity. It's also unclear what "progress" is: TCP connections can be idle for days, with no packets going back and forth. This is a feature.

Section 3.3., paragraph 9:
> o Negative feedback from upper layer protocols. It is conceivable
> that upper layer protocols give an indication of a problem to the
> multihoming layer. For instance, TCP could indicate that there's
> either congestion or lack of connectivity in the path because it
> is not getting ACKs.

TCP knows how to deal with congestion or transient lack of connectivity internally. TCP has no notion of indicating to the network layer that it should do something about connectivity, and it's unclear when this new option should be used rather than reacting to connectivity events through congestion control.

Section 3.3., paragraph 10:
> o ICMP error messages. Given the ease of spoofing ICMP messages,
> one should be careful to not trust these blindly, however. Our
> suggestion is to use ICMP error messages only as a hint to perform
> an explicit reachability test or move an address pair to a lower
> place in the list of address pairs to be probed, but not as a
> reason to disrupt ongoing communications without other indications
> of problems.

If I understand this correctly, this proposes that SHIM6 hijack the ICMP messages that are generated in response to transport sessions. This is problematic, because it will prevent the transport protocols from reacting to these ICMP messages in the currently specified way.
If both SHIM6 and the transport protocol act on ICMP messages, i.e., if the messages are snooped rather than intercepted, there is a potential for problematic interactions between the SHIM6 response mechanism to an ICMP message and the transport protocol response mechanism.

Section 3.5., paragraph 1:
> Efficient congestion
> control over multiple paths is a considered research at the time this
> specification is written.

Yes, multipath congestion control is research, and because SHIM6 prevents transports from being aware of the actual paths that are available or being used (because it overloads addresses), it effectively prevents the use of any future multipath congestion control transports.

Section 1., paragraph 0:
> 1. To avoid the other side from concluding there is a reachability
> failure, it's necessary for a host implementing the failure
> detection mechanism to generate periodic keepalives when there is
> no other traffic.

Many transport protocols went to great lengths to avoid sending keepalives when there is no payload data to be exchanged. Introducing keepalives at the SHIM6 layer basically eliminates the benefits of that design choice. Why does SHIM6 exchange keepalives to test connectivity if the transport protocols don't have any data to exchange? Also, there is already an issue with battery drain caused by NAT keepalives (cf. the SAFE BOF); this mechanism exacerbates that problem.

Section 1., paragraph 2:
> The interval after which keepalives are sent is named Keepalive
> Interval. This document doesn't specify a value for Keepalive
> Interval, but recognizes that an often used approach is sending
> keepalives at one-half to one-third of the Keepalive Timeout
> interval, so that multiple keepalives are generated and have time
> to reach the correspondent before it times out. An upper bound
> on this interval would be (Keepalive Timeout - 2) seconds, so
> that one keepalive has time to reach the other side, assuming a
> maximum one-way delay of 2 seconds.

A path that includes two GSM access links can easily have a one-way delay greater than 2 seconds. Also, if this document doesn't specify the intervals, which document does?

Section 6., paragraph 1:
> Section 7 provides some suggested defaults for these timeout values.
> Experience from the deployment of the SHIM6 protocol is needed in
> order to determine what values are most suitable. The setting of
> these values is also related to various parameters in transport
> protocols, such as TCP keepalive interval.

(From Bernard's tsv-dir review.) Negotiation of a static SHIM6 Keepalive Timeout is allowed if it differs from the default value. However, this relationship is not explored. The TCP keepalive interval is generally kept quite large, partly out of a desire not to tear down idle TCP connections due to a transient failure. The SHIM6 keepalive interval during idle periods is not defined in the Failure Detection document, but my impression was that it could be much shorter, which would seem to collide with the philosophy of TCP keepalives. So I'm not clear what the above sentence means.

Section 4.2., paragraph 6:
> Upon changing to a new address pair, the network path traversed most
> likely has changed, so that the ULP SHOULD be informed. This can be
> a signal for the ULP to adapt due to the change in path so that, for
> example, TCP could initiate a slow start procedure, although it's
> likely that the circumstances that led to the selection of a new path
> already caused enough packet loss to trigger slow start.

There is currently no clear approach for how transport protocols should react to connectivity change indications. Suggest removing everything from "for example" onward.
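The Keepalive Interval bounds quoted from Section 1, paragraph 2 can be checked with a few lines of Python. This is a minimal sketch under a simplified, hypothetical timing model: the peer gives up Keepalive Timeout seconds into an idle period, and a keepalive sent Keepalive Interval seconds into that period arrives after the path's one-way delay. The function names and the 10-second example timeout are illustrative assumptions; the draft itself leaves the interval unspecified.

```python
"""Sketch of the Keepalive Interval bounds quoted from Section 1, paragraph 2.

Hypothetical, simplified timing model: the peer gives up Keepalive Timeout
seconds after an idle period starts; a keepalive sent Keepalive Interval
seconds into that period reaches the peer after the one-way delay.
"""


def keepalive_interval_bounds(keepalive_timeout, max_one_way_delay=2.0):
    """Return (one_third, one_half, upper_bound) in seconds.

    one_third/one_half: the "often used" fractions of Keepalive Timeout.
    upper_bound: Keepalive Timeout minus the assumed maximum one-way delay.
    """
    return (keepalive_timeout / 3,
            keepalive_timeout / 2,
            keepalive_timeout - max_one_way_delay)


def keepalive_arrives_in_time(interval, one_way_delay, keepalive_timeout):
    """True if a keepalive sent `interval` s into the idle period beats the timeout."""
    return interval + one_way_delay <= keepalive_timeout


if __name__ == "__main__":
    timeout = 10.0  # illustrative value only; not specified by the draft
    _, _, upper = keepalive_interval_bounds(timeout)
    print(upper)                                           # 8.0
    print(keepalive_arrives_in_time(upper, 2.0, timeout))  # True: the draft's 2 s assumption holds
    print(keepalive_arrives_in_time(upper, 3.0, timeout))  # False: hypothetical 3 s delay, e.g. two GSM links
```

With the suggested upper bound there is no margin at all once the one-way delay exceeds 2 seconds, whereas the one-half and one-third fractions (5 s and about 3.3 s in this example) leave considerably more headroom.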
Section 4.2., paragraph 7:
> Similarly, one can also envision that applications would be able to
> tell the IP or transport layer that the current connection in
> unsatisfactory and an exploration for a better one would be
> desirable. This would require an inter-layer communication mechanism
> to be developed, however. In any case, this is another issue that we
> treat as being outside the scope of pure address exploration.

One can envision this, but it's completely unclear what "unsatisfactory" and "better" mean in this regard. Also, the SHIM6 mechanism selects *a* path, but doesn't attempt to select one according to a specific set of criteria. I'm not sure this paragraph is helpful; remove it?

Section 4.3., paragraph 5:
> Section 7 suggests default values for the timers associated with the
> exploration process. The value Initial Probe Timeout (0.5 seconds)
> specifies the interval between initial attempts to send probes;
> Number of Initial Probes (4) specifies how many initial probes can be
> sent before the exponential backoff procedure needs to be employed.
> This process increases the time between every probe if there is no
> response. Typically, each increase doubles the time but this
> specification does not mandate a particular increase.

(From Bernard's tsv-dir review.) Interactions of SHIM6 with congestion control: Section 4.3 of the Failure Detection document talks about exploration timeout values. Exploration can be kicked off if no inbound traffic is received within Send Timeout (default = 10 seconds). The first observation is that the Send Timeout should probably depend on the RTO estimate, as it does in SCTP. Otherwise, on a network with a high RTO, SHIM6 exploration could commence after the RTO has been backed off only a few times. This would be undesirable from a congestion control point of view. The suggested value of the Initial Probe Timeout (500 ms) is less than RTOmin, and 4 probes can be sent before initiating exponential backoff. This seems like it could violate "conservation of packets". Why doesn't exponential backoff begin immediately? Instead of a static default Initial Probe Timeout value, might it make sense to base this on RTO estimates? Again, in SCTP these issues are handled through integration with the transport layer.

Section 7., paragraph 4:
> Alternate values of the Send Timeout may be selected by a host and
> communicated to the peer in the Keepalive Timeout Option.

Guidance is needed on appropriate and inappropriate values for the Send Timeout.
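The probe-timing concern in Bernard's comment on Section 4.3 can be made concrete with a small Python sketch. It assumes one reading of the quoted text: each of the first Number of Initial Probes probes is followed by a wait of Initial Probe Timeout, and each later wait doubles (the draft says doubling is typical but not mandated). The 60-second cap, the function names, and the 1-second RTOmin used for the comparison schedule (the minimum RTO recommended for TCP) are assumptions for illustration, not values taken from the draft.

```python
"""Sketch comparing REAP-style probe timing with an RTO-style schedule.

Assumed reading of Section 4.3: each of the first `initial_probes` probes
waits `initial_timeout` before the next one; every later wait is `factor`
times the previous one, capped at `max_timeout`. The 60 s cap and all
names here are hypothetical.
"""


def probe_timeout(i, initial_timeout=0.5, initial_probes=4,
                  factor=2.0, max_timeout=60.0):
    """Wait (seconds) after the i-th probe (0-based) before sending the next."""
    exponent = max(0, i - initial_probes + 1)
    return min(initial_timeout * factor ** exponent, max_timeout)


def probe_send_times(horizon, **params):
    """Send times (seconds since exploration began) of probes sent within `horizon`."""
    times, t, i = [], 0.0, 0
    while t <= horizon:
        times.append(t)
        t += probe_timeout(i, **params)
        i += 1
    return times


if __name__ == "__main__":
    # Defaults quoted in the review: 0.5 s Initial Probe Timeout, 4 initial probes.
    reap = probe_send_times(10.0)
    # Reviewer's alternative: back off from the first probe, starting at a 1 s RTOmin.
    rto_style = probe_send_times(10.0, initial_timeout=1.0, initial_probes=1)
    print(reap)       # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 9.0]
    print(rto_style)  # [0.0, 1.0, 3.0, 7.0]
```

Under these assumptions, the quoted defaults put five probes on the path within the first 2 seconds and eight within 10 seconds, versus two and four for the RTO-style schedule; that difference is the packet-conservation concern raised in the review.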
Mark Townsley Former IESG member
Yes
Chris Newman Former IESG member
No Objection
Dan Romascanu Former IESG member
(was Discuss)
No Objection
(2008-01-24)
It would be appropriate to include in Section 9 a recommendation that failure detections and switch-over events to a backup path be communicated via asynchronous notifications to management systems (if available) and recorded in event logs.
David Ward Former IESG member
(was Discuss)
No Objection
Jon Peterson Former IESG member
No Objection
Lisa Dusseault Former IESG member
No Objection
Magnus Westerlund Former IESG member
(was Discuss)
No Objection
Ron Bonica Former IESG member
No Objection
(2007-12-19)
I support Dan's comment on manageability, but will let him hold the Discuss.
Ross Callon Former IESG member
No Objection
Russ Housley Former IESG member
(was Discuss)
No Objection
(2007-12-19)
From the SecDir review by Steve Kent:

> This discussion does a good job of exploring the DoS concerns
> related to use of REAP with SHIM6. However, I would like to see a
> paragraph or two explaining why the designers of REAP chose to not
> use a protocol like IPsec to provide security for REAP. I am not
> saying the choice made here is bad, but rather that it would be
> useful to capture the rationale behind the choice.

I saw a response, and there seemed to be agreement that the requested paragraph would be a good thing. Where is it?
Sam Hartman Former IESG member
No Objection
Tim Polk Former IESG member
No Objection
(2007-12-18)
Section 4.3 (Exploration Order), first sentence of paragraph 2 is slightly mangled:

> Nodes first consult the RFC 3484 default address selection rules
> [RFC3484] Section 4 rules to determine what combinations of addresses
> are allowed from a local point of view, as this reduces the search space.

The address selection rules in RFC 3484 are specified in Sections 5 and 6; Section 4 is "Candidate Source Addresses". I believe deleting "Section 4 rules" following the [RFC3484] reference is a complete fix.
Jari Arkko Former IESG member
Recuse
Cullen Jennings Former IESG member
(was Discuss, No Objection, No Record, Discuss, No Objection)
No Record