IETF 108 TCPM Minutes - July 27, 2020
=====================================

- primary note taker session 1: Neal Cardwell
- primary note taker session 2: Wesley Eddy
- please queue to speak using Meetecho
- please state your name before speaking
- this meeting is being recorded
- two sessions today

TCPM Session 1

WG status
---------

- Michael Scharf summarized the agenda
  - there were no modifications
- Recent RFCs
- rto-consider requires an update, so it is still work in progress
- RACK has completed WGLC, but received quite a large number of comments
- 793bis getting close to WGLC

draft-ietf-tcpm-2140bis
-----------------------

- Michael S. presenting, for the authors
- about comments received in WGLC
  - current version is 05
  - ran WGLC May 30 - Jun 14
  - summarized post-WGLC comments
- Mirja Kühlewind:
  - Regarding security considerations and linkability: if you reuse state, that could reveal that the two connections are linked, even if, for example, the source address has changed. The document could add a warning about this.
  - Regarding Appendix C: a summary would be fine, but this is a full copy of an expired draft that was controversial in the past
- a smaller comment on TFO will be addressed in the next revision of the document
- if you have any opinion, now is your last chance to offer feedback
- plan is to move forward with this document

draft-ietf-tcpm-rfc793bis
-------------------------

- presented by: Wesley Eddy
- rev 17 posted a few weeks ago
  - believed to address all comments to date
- new TCP header flags registry
  - taking the existing registry, and putting the flags in it
  - current registry only has ECN-related flags
  - change is to make this a subregistry
    - flesh it out with the other bits
    - indicate reserved bits
    - add other TCP control flags in the registry
  - looking for acknowledgment/feedback on this plan
- doing a fairly long WGLC soon sounds like a great plan
- Richard Scheffenegger:
  - I believe the 12 bits of the flags field are standalone, and bits 0-3 are the version field
  - recommend making the definition/semantics of the bit offset clearer
- Wesley: in the ASCII-art header diagram, the "Data Offset" field takes up bits 0-3
  - the bit number here is an offset from the aligned point
  - maybe we could use "bit offset" instead
- Martin Duke:
  - how will IANA react when we override some of the bits with the Accurate ECN bits?
- Wesley E: won't they just update the registry?
- Michael S: yes, it is an interesting question how to deal with bits 8-9 (CWR, ECE)
  - but we don't need to sort this out in 793bis
  - assignment notes could be used
- Mirja Kühlewind:
  - in Accurate ECN we don't change the names of bits 8-9
  - it's not proposed to change the names
  - we only propose to add a new name for bit number 7
- Richard S.:
  - IANA could add a secondary reference about the secondary meaning of the bits
  - this issue doesn't need to hold up 793bis
- Wesley E, Michael S: indicate they agree
- Michael S:
  - will start WGLC in the next 2 months
  - may wait to avoid colliding with the summer holiday season
  - please prepare to review this document, and comment on it during WGLC
  - this document is one of the most important deliverables of the last few years

draft-ietf-tcpm-rack
--------------------

- presented by: Yuchung Cheng
- over 90% of the text has been rewritten, based on WGLC feedback
- reviewers suggested it would be good to have event sequences; these were added
- motivation
  - quickly detect packet drops in short flows, or at the end of application data flights
  - detect lost retransmissions
  - tolerate a low degree of reordering in time distance
    - e.g. if the last packet is delivered just a little time (a small fraction of the RTT) before the rest of the flight
    - RACK does not intend to tolerate longer-term reordering
  - changed text to not comment on how frequent/common various scenarios are, since this can vary by environment
- high-level design
  - goal: do fast recovery as much as possible, trying to avoid RTO (which resets cwnd)
  - using per-packet timestamps and ACKs
  - use ACKs to adjust RTT and reordering measurements
  - if a packet is overdue based on its transmit time, RACK.RTT, and the reordering window, consider it lost
  - the RTT used by RACK is the RTT measured on the last packet that was delivered
    - and must be an RTT for a segment sent after the possibly lost segment
  - sometimes you run out of ACKs
    - so after 2 RTTs, send a TLP
    - just to send one more packet to probe for losses, to trigger fast recovery instead of RTO
- Example: how TLP recovers faster via RACK
  - SACK for the TLP allows RACK to deduce that a packet is lost
- MUST, SHOULD, MAY changes
  - reordering window SHOULD adapt
  - reordering timer SHOULD be used
  - you can implement RACK alone; you don't have to implement TLP. Omitting TLP is not recommended (we do recommend implementing TLP), but RACK will work without TLP.
  - don't need to have many different timers at the sender
    - just one timer, with a state variable to distinguish whether the timer is a TLP, RTO, ZWP, or RACK reordering timer
  - removed the max ACK delay of 200ms
    - this is implementation-specific
    - trying to avoid outdated magic numbers
- relationship to other RFCs
  - RACK-TLP is an alternative to replace RFC6675
    - most stacks have used RFC6675 thus far, or in the past
  - if you implement RACK-TLP, you don't need to implement RFC6675 (conservative SACK recovery) and RFC5827 (Early Retransmit)
  - complementary and compatible:
    - we encourage Limited Transmit
    - RTO restart: computing the timer value in a different way
    - F-RTO: double-check whether something is really lost, by sending new data; we encourage this
    - RTO
    - Eifel: Eifel detects spurious retransmissions and undoes the cwnd reduction from congestion control; works with RACK-TLP; we do encourage this
- thankful for all the reviews of the draft
  - we hope the document has improved
  - please take another look and let us know
- Yoshifumi:
  - did you modify any logic in the algorithms of RACK?
- Yuchung: in the latest version, we did not change the algorithm (except for making the 200ms implementation-specific)
- Michael S:
  - proposal is that once everyone who commented has confirmed their issues have been dealt with, we will complete this WGLC and ship the document to the IESG

draft-ietf-tcpm-hystartplusplus
-------------------------------

- presented by Praveen Balasubramanian
- fundamental problem: traditional slow start can overshoot the ideal send rate, and can cause massive packet losses
  - can cause a latency increase
  - if packet loss is high enough, and implementations don't have RACK loss detection, this can result in an RTO
- uses the delay increase algorithm from the original HyStart paper, which is linked in the draft
  - measure the increase in delay, and use this to exit slow start
  - delay spikes on Wi-Fi links or temporary congestion can cause a spurious exit
  - so after exiting slow start, use Limited Slow Start
  - and also compute cwnd as it would be computed by the congestion control module, and use the max of the two cwnds
  - we define tuning constants based on our measurement experiments
- Algorithm Details
  - track MinRTT
    - don't want to maintain MinRTT for the whole connection, because this can go stale
  - there are heuristics about when a delay-based decision can be made
  - HyStart++ SHOULD be used on the first slow start, and MAY be used after idle
  - we don't recommend using it on subsequent slow starts, because the connection will already have an ssthresh at that point to help avoid overshoot
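A minimal Python sketch of the delay-increase exit check and the Limited Slow Start growth described above may help readers. It assumes the draft-03 constants listed in these minutes (LOW_CWND=16, N_RTT_SAMPLE=8, MIN_RTT_THRESH=4ms, MAX_RTT_THRESH=16ms, LSS_DIVISOR=0.25) and the threshold of 1/8 of the previous round's MinRTT that Praveen confirmed in the Q&A. The names `HyStartPP`, `start_round`, `on_ack`, `rtt_threshold` and `cca_cwnd` are hypothetical, cwnd is counted in segments, and the LSS rule K = cwnd / (LSS_DIVISOR * ssthresh) is an assumed RFC 3742-style reading, not text quoted from the draft.

```python
import math
from dataclasses import dataclass

# Draft-03 constants as reported in these minutes (cwnd in segments, times in seconds).
LOW_CWND = 16
N_RTT_SAMPLE = 8
MIN_RTT_THRESH = 0.004   # 4 ms floor
MAX_RTT_THRESH = 0.016   # 16 ms ceiling
LSS_DIVISOR = 0.25


def rtt_threshold(last_round_min_rtt: float) -> float:
    """1/8 of the previous round's MinRTT, clamped to [MIN_RTT_THRESH, MAX_RTT_THRESH]."""
    return min(max(last_round_min_rtt / 8, MIN_RTT_THRESH), MAX_RTT_THRESH)


@dataclass
class HyStartPP:  # hypothetical name, not an API defined by the draft
    cwnd: float = 10.0          # assumed initial window, in segments
    ssthresh: float = math.inf
    in_lss: bool = False        # True once slow start has been exited into LSS
    last_round_min_rtt: float = math.inf
    current_round_min_rtt: float = math.inf
    rtt_sample_count: int = 0

    def start_round(self) -> None:
        """Call at the start of each round trip while in slow start."""
        self.last_round_min_rtt = self.current_round_min_rtt
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0

    def on_ack(self, acked_segments: float, rtt: float, cca_cwnd: float = 0.0) -> None:
        """Process one ACK; cca_cwnd is the cwnd the congestion controller would use."""
        if self.in_lss:
            # Limited Slow Start: per-ACK growth shrinks as cwnd rises above ssthresh,
            # but cwnd never falls below the congestion controller's own cwnd.
            k = self.cwnd / (LSS_DIVISOR * self.ssthresh)
            self.cwnd = max(self.cwnd + acked_segments / k, cca_cwnd)
            return

        self.cwnd += acked_segments  # standard slow-start growth
        self.current_round_min_rtt = min(self.current_round_min_rtt, rtt)
        self.rtt_sample_count += 1

        # Only make a delay-based decision with enough cwnd and enough RTT samples.
        if (self.cwnd >= LOW_CWND
                and self.rtt_sample_count >= N_RTT_SAMPLE
                and math.isfinite(self.last_round_min_rtt)):
            thresh = rtt_threshold(self.last_round_min_rtt)
            if self.current_round_min_rtt >= self.last_round_min_rtt + thresh:
                self.ssthresh = self.cwnd   # exit slow start at the current cwnd
                self.in_lss = True
```

For example, if the previous round's MinRTT was 50 ms, the threshold is 50/8 = 6.25 ms (inside the 4-16 ms clamp), so a round whose MinRTT reaches 56.25 ms or more ends slow start. The clamp is what Stuart, Ian and Neal discussed: the threshold scales with the RTT only between the fixed floor and ceiling.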
- Changes in draft 03
  - LOW_CWND=16: prevents the algorithm from kicking in too early
    - higher values may cause overshoot
  - N_RTT_SAMPLE=8: the more samples you have, the more accurate your estimate is
    - lower values give lower accuracy
  - MIN_RTT_THRESH = 4ms, MAX_RTT_THRESH = 16ms
    - these control the sensitivity
    - these values work well in practice
  - LSS_DIVISOR=0.25
- Status and Next Steps
  - 2 implementations so far: Windows TCP, Cloudflare QUIC
  - Linux TCP: has only the original HyStart
  - Next:
    - want to look into using bandwidth and throughput estimates; would like to compare this to just using delay increase
    - would like to compare to BBR STARTUP
  - please review and provide feedback
  - please share performance data if you have an implementation of HyStart++
- Stuart Cheshire:
  - on slide 3 (Algorithm Details)
  - question about the threshold to detect when RTT is growing
  - IIUC this is a fixed time in ms, rather than a time relative to the RTT
  - wonder if it makes sense to make it a multiple of the RTT
  - at 10Gbps, 100ms is many packets
  - at modem speeds, a jitter of 100ms is only a few packets in the queue
  - in a datacenter network, you'd want a smaller threshold, rather than a fixed threshold
- Praveen: testing has been focused on wide area networks and the Internet
  - in datacenter networks, the time spent in recovery is not that large
  - whether this can help in datacenter networks can be investigated
  - LEDBAT has used fixed constants
- Martin Duke:
  - has a normative reference to ABC; presumably ABC can have its status upgraded
  - has a normative reference to RFC3742, which is currently Experimental
  - is there support for upgrading RFC3742 to Proposed Standard?
- Jana Iyengar: is there a need for RFC3742 as a normative reference?
- Praveen: yes, we have a dependency on the algorithm, but we may be able to subsume it if needed
- Jana Iyengar: subsuming the part of RFC3742 that's necessary seems like a reasonable way to move forward
- Martin Duke:
  - this could obsolete RFC3742 in principle
- Jana Iyengar:
  - how much deployment experience do you have in datacenters and the Internet?
  - Praveen: very widely deployed: enabled by default for all connections (datacenter, Internet, everywhere) for 2 years
    - we presented the data at the previous 2 TCPM meetings
  - support comparing against BBR STARTUP
  - Praveen: yes, this is on our roadmap
- Ian Swett:
  - support moving forward with this work
  - concerned about the fixed threshold
  - support comparison with BBRv1 and BBRv2 STARTUP, as a 3-way comparison
- Praveen: we will probably focus on BBRv2, based on discussion with Neal
- Bob Briscoe:
  - from chat window: "Extending Stuart's question, am I correct that the threshold /is/ relative to the previous round's minRTT (it's 1/8), but the upper and lower threshold's are constants, not relative to thresholds"
  - Praveen: that is correct
- Neal Cardwell: also expressing support for the RTT threshold being an adaptive multiple of the MinRTT; that's what Linux has done since 2008, currently with a multiple of 1.125x, which has worked well in production
- Praveen: yes, as discussed in Bob's question, the RTT threshold is adaptive, but the ceiling and floor are fixed

draft-gomez-tcpm-ack-rate-request
---------------------------------

- presented by Carles Gomez
- for receiver behavior, the text should perhaps say "MAY" use R rather than "MUST" use R
- Stuart C:
  - very happy to see this work
  - in very constrained devices, there is often only enough RAM for a single send buffer
  - if the sender sends only 1 segment, it has to wait for the delayed ACK timer
  - this is often used as an argument for why constrained devices can't use TCP
  - this would help remove that objection to using TCP on constrained devices
- Carles:
  - we have a reference to the IoT use case, and will look at updating it to make it more detailed
- Jana I:
  - we are working on something similar for QUIC; an early draft at the moment
  - suggest taking a look at that one as well
  - QUIC has a reordering tolerance field as well
    - if the sender can handle reordering, it can indicate a reordering tolerance of N packets
    - to make the transport more robust to reordering
- Carles: thanks, we will take a look at this reordering feature, and will take it into consideration for the next revision
- Gorry F:
  - you mentioned there was an ability to turn off ACK delay
  - might want a more nuanced way to turn off delayed ACKs and get immediate ACKs for the next few segments
- Carles: yes, this may be an interesting feature; we will take this into consideration

draft-amend-tcpm-mptcp-robe
---------------------------

- presented by Markus Amend
- RobE stands for "Robust Establishment"
- in these experiments, all the RobE solutions are faster than standard MPTCP
- Michael S:
  - as chair, I think there are IPR disclosures (at least 2), and when I last checked they didn't explain the licensing conditions
  - Markus: yes
  - request that companies that have IPR explain the licensing conditions
  - Markus: I think that's fair; is that the only hurdle?
  - the IPR can influence the interest of others
- Mirja K:
  - I don't see any of the MPTCP authors in this meeting
  - there were discussions about the security implications of trying to establish two connections at the same time
  - Markus: I remember, and I think we described well that there are no security implications. The approaches are very different in terms of security.
  - IIRC only the case where you try to establish two at the same time was an issue
- Follow-up discussion in second session (see below)

TCPM Session 2

draft-ietf-netconf-tcp-client-server
------------------------------------

- presented by Kent Watsen
- this is a NETCONF WG draft, but the plan is to do a joint working group last call between NETCONF and TCPM
- Kent reviewed the YANG model and parameter groupings
- this will enter WGLC shortly
- there were no comments from the rest of the WG during the meeting

draft-scharf-tcpm-yang-tcp
--------------------------

- presented by Michael Scharf
- this YANG model differs from the client/server model Kent presented
- the IDR BGP model needs a TCP model that permits TCP AO configuration
- contains parts inspired by the TCP MIB: connection list and statistics
- includes AO and MD5 configuration under the connection list parameters
- there are areas for client and server configuration
- there was a question about whether people who had objected to an earlier revision had any comments, but none either spoke or were present
- Yoshifumi asked if there were any objections, and noted that there was previously some support on the list and during previous presentations
- there were some issues getting used to the hum tool in Meetecho
- 2 hums were held:
  - hum for: piano
  - hum against: pianissimo
- Yoshifumi: neither hum was very strong. The chairs will discuss next steps.

Follow-up discussion on draft-amend-tcpm-mptcp-robe
---------------------------------------------------

- question about what needs to be done to get this adopted
- Michael S. mentioned that some MPTCP experts were not present in today's meeting, that IPR may be an issue, and that the difference between lab prototypes and full implementations should be clarified

draft-kang-tcpm-accurate-data-scheduling-by-servers
---------------------------------------------------

- presented by Jiao Kang
- three use case scenarios were described
- a message flow for MP_Navigation and the option format were described
- Yoshifumi: agreed that MP_PRIO is too simple, and that tangible use cases should be used to determine which proposal best addresses this
- continue discussion on the mailing list

draft-touch-tcpm-ao-test-vectors
--------------------------------

- presented by Juhamatti Kuusisaari
- addresses the lack of TCP AO test vectors
- also discusses known implementation issues
- intended as Informational
- WG adoption requested
- Michael S. asked if vendor implementations have been tested with these
  - hackathons may be a good venue for this, as TCP AO has been a topic recently
  - Michael would like to confirm that the document is correct
  - authors are working to get one other implementation that matches
- Tero Kivinen: has been doing hackathon work on TCP AO in tcpdump
- Michael S.: this may be something useful; I suggest continuing the work