
Telechat Review of draft-ietf-babel-rtt-extension-05

Request Review of draft-ietf-babel-rtt-extension
Requested revision No specific revision (document currently at 07)
Type Telechat Review
Team Internet of Things Directorate (iotdir)
Deadline 2024-02-13
Requested 2024-02-05
Requested by Éric Vyncke
Authors Baptiste Jonglez, Juliusz Chroboczek
I-D last updated 2024-02-15
Completed reviews Secdir Telechat review of -05 by Shivan Kaul Sahib (diff)
Iotdir Telechat review of -05 by Pascal Thubert (diff)
Intdir Telechat review of -05 by Antoine Fressancourt (diff)
Secdir Last Call review of -04 by Shivan Kaul Sahib (diff)
Genart Last Call review of -04 by Roni Even (diff)
Opsdir Last Call review of -04 by Sheng Jiang (diff)
Rtgdir Early review of -03 by Joel M. Halpern (diff)
Assignment Reviewer Pascal Thubert
State Completed
Request Telechat review on draft-ietf-babel-rtt-extension by Internet of Things Directorate Assigned
Posted at
Reviewed revision 05 (document currently at 07)
Result Almost ready
Completed 2024-02-15

I am the assigned IoT-Directorate reviewer for draft-ietf-babel-rtt-extension
and reviewed version 05. I found the document to be almost ready, with a few
possible additions and changes that would make the reader's journey more
agreeable. Please find my comments below.


> Missing ref to updated document

1.  Introduction

   The Babel routing protocol [RFC8966] does not mandate a specific
   algorithm for computing metrics; existing implementations use a
   packet-loss based metric on wireless links and a simple hop-count
   metric on all other types of links.  While this strategy works
   reasonably well in many networks, it fails to select reasonable
   routes in some topologies involving tunnels or VPNs.

> An applicability statement covering the various possibilities would be useful
in the future. It could be a paper or an RFC. At least it would make sense to
have an applicability section here. For instance, IoT may experience large and
asymmetric delays. Current Wi-Fi (pre-deterministic, RAW) can experience short
delays but relatively large delay variation. It is unclear whether the work
here could apply to any of these; it seems more targeted at long-range, stable
and symmetrical-delay comm lines.

   the existing implementations of Babel consider both routes as having
   the same metric, and will therefore route the traffic through C in
   roughly half the cases.

   > There's art in IGPs that allows configuring metrics or deriving metrics
   from line speed. Is that available in current implementations? Note that for
   the given example the speed of light will certainly have measurable effects.
   But going to Orleans and back may be hidden inside, e.g., wireless delays.

   In this document, we specify an extension to the Babel routing
   protocol that enables precise measurement of the round-trip time
   (RTT) of a link, and allows its usage in metric computation.  Since
   this causes a negative feedback loop, special care is needed to
   ensure that the resulting network is reasonably stable (Section 4).

   > I'm effectively concerned with the effect of bufferbloat, which could
   create oscillations exactly like the early ARPANET load-based metric.

   We believe that this protocol may be useful in other situations than
   the one described above, such as when running Babel in a congested
   wireless mesh network or over a complex link layer that performs its
   own routing; the fine granularity of the timestamps used (1µs) should
   make it possible to experiment with RTT-based metrics on this kind of
   link layers.

> Not sure we want that text. Highly debatable until experimented with; see
current experiments with AR/VR on Wi-Fi, which suffer from variable lags.

3.2.  Protocol operation

   A Babel speaker periodically sends Hello messages to its neighbours
   (Section 3.4.1 of [RFC8966]).  Additionally, it occasionally sends a
   set of IHU messages, at most one per neighbour (Section 3.4.2 of

> define IHU on first use; explain what it is for vs Hello

      A          B
        |      |
     t1 +      |
        |\     |
        | \    |
        |  \   |  Hello(t1)
        |   \  |
        |    \ |
        |     \|
        |      + t1'
        |      |
        |      |               RTT = (t2 - t1) - (t2' - t1')

>  Ref IEEE 1588? there are many profiles for it; maybe this work could show as

It is important to indicate which time stamps are used (e.g., where in the
stack t1 is measured). Do we measure the latency inside the sender, meaning
that the time stamp is that of the software above, or do we measure starting
at MAC enqueue, or starting at PHY XMIT?

For short distance / high precision as claimed in the introduction, setting the
time stamps in the message above is hard and the standards often allow a second
message to carry the value of the time stamp that the hardware provides for the
first sending. Depending on the answer above, this might or might not be needed.

   In order to enable the computation of RTTs, a node A MUST include in
   every Hello that it sends a timestamp t1 (according to A's local
   clock), as illustrated in Figure 2.  When a node B receives A's
   timestamped Hello, it computes the time t1' at which the Hello was
   received (according to B's local clock).  It then MUST record the
   value t1 in the Origin Timestamp field of the Neighbour Table entry
   corresponding to A, and the value t1' in the Receive Timestamp field
   of the Neighbour Table entry.
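
The exchange quoted above reduces to the formula shown in the figure, RTT =
(t2 - t1) - (t2' - t1'). A minimal sketch of that arithmetic follows. It
assumes the timestamps are 32-bit microsecond counters that wrap around; the
function names are hypothetical and are not taken from the draft or from any
implementation.

```python
MODULUS = 1 << 32  # assumption: timestamps are 32-bit microsecond counters that wrap


def elapsed(later: int, earlier: int) -> int:
    """Microseconds from `earlier` to `later` on the same clock, tolerating one wrap."""
    return (later - earlier) % MODULUS


def rtt_us(t1: int, t1_prime: int, t2_prime: int, t2: int) -> int:
    """RTT = (t2 - t1) - (t2' - t1').

    t1 and t2 are read on A's clock, t1' and t2' on B's clock. Only
    same-clock differences are taken, so the clocks need not be synchronised.
    """
    total = elapsed(t2, t1)              # A's view: Hello sent until IHU received
    hold = elapsed(t2_prime, t1_prime)   # B's view: Hello received until IHU sent
    return total - hold


# A sends at 1_000_000; B, on an unrelated clock, receives at 77_000_500 and
# replies 3 s later; A receives the IHU at 4_012_000.
sample = rtt_us(1_000_000, 77_000_500, 80_000_500, 4_012_000)
```

Because each difference is taken on a single clock, an arbitrary offset
between A's and B's clocks cancels out; only the time B holds the timestamp
before replying is subtracted.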

> Do we need a sequence counter to filter out bloated IHU answers that are
received out of sync?

   In principle, this algorithm is inaccurate in the presence of clock
   drift (i.e., when A's and B's clocks are running at different
   frequencies).  However, t2' - t1' is usually on the order of seconds,
   and significant clock drift is unlikely to happen at that time scale.

> back to applicability of the work. I believe some expectations on the clock
drift vs RTT can be made for modern hardware. Nodes have an idea of which clock
they use and what drift they have. The draft could recommend that the clocking
error be 2 orders of magnitude less than the RTTs that the protocol measures,
else the measurement cannot be trusted.
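
To make the recommendation above concrete, here is a rough worked example of
how relative clock drift turns into RTT error. The 50 ppm drift and the 4 s
hold time between Hello and IHU are illustrative assumptions, not figures from
the draft, and the function names are hypothetical.

```python
def drift_error_us(hold_time_s: float, relative_drift_ppm: float) -> float:
    """Approximate error that clock drift adds to one RTT sample, in microseconds.

    The term (t2' - t1') is measured on B's clock, so a relative drift of
    1 ppm over 1 second of hold time contributes about 1 microsecond of error.
    """
    return hold_time_s * relative_drift_ppm


def sample_is_trustworthy(rtt_us: float, hold_time_s: float,
                          relative_drift_ppm: float, ratio: float = 100.0) -> bool:
    """Apply the 'two orders of magnitude' rule: error * ratio must not exceed the RTT."""
    return drift_error_us(hold_time_s, relative_drift_ppm) * ratio <= rtt_us


error = drift_error_us(hold_time_s=4.0, relative_drift_ppm=50.0)
```

Under these assumed numbers the error is 200 µs, so the rule would accept RTTs
of 20 ms and above but reject a 1 ms LAN measurement.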

   When a Hello TLV is buffered for transmission, we insert a PadN sub-
   TLV (Section 4.7.2 of [RFC8966]) with a length of 4 octets within the
   TLV.  When the packet is ready to be sent, we check whether it
   contains a 4-octet PadN sub-TLV; we then overwrite the PadN sub-TLV
   with a Timestamp sub-TLV with the current time, and send out the

> Hardware will not do that. Back to my earlier question of which step in the
stack is relevant for this measurement. Surely any step that depends on the
load of this system (variable but independent of the link being used), as
opposed to the load on the transmission, should be omitted.
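
For readers unfamiliar with the technique under discussion, the sketch below
shows the placeholder-and-overwrite step in software. It assumes sub-TLV type
1 for PadN (per RFC 8966) and type 3 for the Timestamp sub-TLV, and a 32-bit
microsecond timestamp; the type code 3, the helper names and the packet layout
are assumptions made for illustration only.

```python
import struct

SUBTLV_PADN = 1        # PadN sub-TLV type (RFC 8966)
SUBTLV_TIMESTAMP = 3   # assumption: Timestamp sub-TLV type code


def reserve_timestamp() -> bytes:
    """Placeholder written when the Hello is buffered: PadN with a 4-octet body."""
    return struct.pack("!BB4x", SUBTLV_PADN, 4)


def stamp(packet: bytearray, offset: int, now_us: int) -> None:
    """Overwrite the placeholder at `offset` just before the packet is sent."""
    if packet[offset] != SUBTLV_PADN or packet[offset + 1] != 4:
        return  # no 4-octet PadN here; leave the packet untouched
    struct.pack_into("!BBI", packet, offset, SUBTLV_TIMESTAMP, 4, now_us % (1 << 32))


# Two dummy octets stand in for the rest of the Hello TLV.
packet = bytearray(b"\xAA\xBB" + reserve_timestamp())
stamp(packet, 2, 123_456)
```

In a sketch like this the clock is read in user space immediately before the
send call, so any queueing below that point (socket buffer, MAC, PHY) ends up
inside the measured RTT, which is the concern raised above.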

   Second, using the RTT signal for route selection gives rise to a
   negative feedback loop: when a route has a low RTT, it is deemed to
   be more desirable, which causes it to be used for more data traffic,
   which may lead to congestion, which in turn increases the RTT.
   Without some form of hysteresis, using RTT for route selection would
   lead to oscillations between parallel routes, which might lead to
   packet reordering and negatively affect upper-layer protocols (such
   as TCP).

> I believe this discussion should be seen earlier in the text, eg in the
introduction (not the solution but at least that the issue exists and is
addressed in the protocol). See my early comment on ARPANET.
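
The feedback loop is easy to reproduce in a toy model. The sketch below is not
taken from the draft: it assumes two parallel routes with the same base RTT,
and assumes that whichever route carries the traffic sees a fixed congestion
penalty. The route names and numbers are hypothetical.

```python
def simulate(steps: int, base_rtt_ms: float = 10.0,
             load_penalty_ms: float = 20.0) -> list[str]:
    """Return the route chosen at each step when the lowest measured RTT always wins."""
    current = "via-B"
    history = []
    for _ in range(steps):
        # The route carrying traffic is congested; the idle one is not.
        rtt = {route: base_rtt_ms + (load_penalty_ms if route == current else 0.0)
               for route in ("via-B", "via-C")}
        current = min(rtt, key=rtt.get)  # no hysteresis: always pick the lowest RTT
        history.append(current)
    return history


choices = simulate(4)
```

With these assumed values the selection flips at every step, which is the
ARPANET-style oscillation that the reader should be warned about early on.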

4.3.  Hysteresis

   Even after applying a bounded mapping from smoothed RTT to a cost
   value, the cost may fluctuate when a link's RTT is between rtt-min
   and rtt-max.  This is effectively mitigated by using a robust
   hysteresis algorithm, such as the one described in Appendix A.3 of

> if this is what solves the oscillation issue please mention it, and maybe
discuss how it does so. Maybe provide references to the papers that exist on
the matter and your early trials.
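
As an illustration of the kind of explanation that would help here, the sketch
below combines a bounded linear mapping from smoothed RTT to cost with a
fixed-margin switching rule. The margin rule is a deliberately simple stand-in
and is not the algorithm of RFC 8966 Appendix A.3; the parameter values and
function names are assumptions chosen for the example.

```python
def rtt_cost(srtt_us: float, rtt_min: float = 10_000, rtt_max: float = 120_000,
             max_penalty: int = 150) -> int:
    """Bounded linear mapping from smoothed RTT (microseconds) to an additive cost."""
    if srtt_us <= rtt_min:
        return 0
    if srtt_us >= rtt_max:
        return max_penalty
    return round(max_penalty * (srtt_us - rtt_min) / (rtt_max - rtt_min))


def count_switches(costs_a, costs_b, margin: int) -> int:
    """Count route changes when switching requires the other route to win by more than `margin`."""
    current, switches = "a", 0
    for a, b in zip(costs_a, costs_b):
        mine, other = (a, b) if current == "a" else (b, a)
        if other + margin < mine:
            current = "b" if current == "a" else "a"
            switches += 1
    return switches


# Route a's smoothed RTT wobbles between 60 ms and 66 ms; route b stays at 63 ms.
costs_a = [rtt_cost(x) for x in (60_000, 66_000, 60_000, 66_000)]
costs_b = [rtt_cost(63_000)] * 4
flapping = count_switches(costs_a, costs_b, margin=0)
damped = count_switches(costs_a, costs_b, margin=10)
```

In this toy trace the route flaps three times without a margin and not at all
with a margin of 10 cost units, at the price of sometimes staying on a
slightly worse route.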

8.  Security Considerations

> Maybe discuss the consequences of a man-in-the-middle (MITM) attacker that
modifies the values, e.g., to discourage Paris to Paris and cause routing via
Tokyo?