
Minutes IETF105: tsvarea
minutes-105-tsvarea-00

Meeting Minutes: Transport Area Open Meeting (tsvarea) AG
Date and time: 2019-07-25 19:50
Last updated: 2019-08-09

TSVAREA @ IETF-105 (Montreal)

Thursday, July 25, 2019
15:50-17:20 - Thursday Afternoon session II
Room: Centre Ville

Note Takers:
    Eric Kinnear

* Administrativa - TSV ADs

- Current status of working groups: TCPINC has been closed. Thanks to everyone
  for the work in that area.
- Several documents published in the last couple of months, 4 new RFCs.
- maprg meets tomorrow; lots of other related sessions this week with videos
  to watch.
- WebTransport was in dispatch; it touches transport a lot.

Brian Trammell: WebTransport also had a side meeting; the outcome was maybe
thinking about a BoF in Singapore. We should make sure there is encouragement
for transport people to show up; it definitely needs transport eyeballs during
the BoF phase.

David Schinazi: To add to the WebTransport topic: there is a new mailing list
under ART, webtransport@ietf.org.

Colin Perkins: LOOPS also needs transport eyeballs.

* Where to do multipath work in the IETF? - Mirja
- Multiple groups in the IETF and IRTF are talking about multipath

David Black: (As chair of tsvwg) There is interest in moving work elsewhere. I
would like to see one forum for multipath topics; how about a non-working group
forming BoF in Singapore to go over all of these (kind of like the dispatch
mechanism)? An MP-interest BoF would be a good way to get people together.
Mirja: Is that cross-area? David: Not sure, not enough visibility.

Lars Eggert: MPTCP should probably be closed; maintenance goes in tcpm or
tsvwg. I don't care about DCCP/SCTP, but I do care about QUIC. The main MP
work, which is going to be done for QUIC, I think needs to be done in QUIC
because it ties in
with protocol details. TCP was already quite established, so MPTCP made sense
as a different wg, but doing MPQUIC in a different wg would be a challenge. We
absolutely want to learn from MPTCP and don't want to reinvent; a lot of
architectural thinking was done there, and we should take those principles and
apply them to QUIC, since some of MPTCP is ugly in order to fit in with
existing TCP and get through middleboxes. That could be something that ICCRG
could do, or a
joint session. Mirja: In practice, we have a problem where some of the core TCP
people don't show up in MPTCP or QUIC anymore. We need to make sure the right
people show up in the right places. Lars: I think that's an artefact of how
things evolved. MPTCP has become a proxy protocol and not really a TCP
protocol, so it's not surprising that the TCP folks aren't there anymore. In
QUIC we're not seeing a ton of participation from recovery/TCP people, but I
think that will now ramp up, especially with Jana and Marten's work on the
network simulator. I'm hoping that the engagement will correct itself.
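
For reference, the MPTCP architectural thinking Lars refers to includes
coupled congestion control. Below is a minimal sketch of the coupled-increase
idea (the Linked Increases Algorithm of RFC 6356); the names and data layout
are illustrative, not from any implementation.

    # Sketch of MPTCP's coupled increase (Linked Increases, RFC 6356).
    # Each subflow is a dict with "cwnd" (bytes) and "rtt" (seconds).

    def lia_alpha(subflows):
        # alpha = cwnd_total * max(cwnd_i / rtt_i^2) / (sum(cwnd_i / rtt_i))^2
        cwnd_total = sum(f["cwnd"] for f in subflows)
        best = max(f["cwnd"] / f["rtt"] ** 2 for f in subflows)
        denom = sum(f["cwnd"] / f["rtt"] for f in subflows) ** 2
        return cwnd_total * best / denom

    def on_ack(subflows, i, bytes_acked, mss):
        # Congestion-avoidance increase for subflow i on each ACK: take the
        # coupled increase, but never exceed what regular TCP would do.
        f = subflows[i]
        cwnd_total = sum(sf["cwnd"] for sf in subflows)
        coupled = lia_alpha(subflows) * bytes_acked * mss / cwnd_total
        uncoupled = bytes_acked * mss / f["cwnd"]
        f["cwnd"] += min(coupled, uncoupled)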

Spencer Dawkins: Thanks for bringing this forward to tsvarea. If we could come
up with principles that could be applied in different situations, that would be
worth doing. Questions like how you pick which paths to use for which packets,
and when, are not tied to a protocol. Maybe there are things like that which
could be spun off. Cross-area possibilities: we've had conversations about
applications that pop up and want to manage paths themselves; they might be
able to learn from those principles, even without being transport protocols. I
would hope that security and operations people would be interested as well.

Stu Card: Some of this is being done in the network coding research group. I've
been running MP routing and transport to aircraft for ~15 years, going to first
principles: it's the job of routing to create paths, and it's the job of
transport to use the paths. Transport could give hints about paths it would
like, routing could give hints about what paths it can construct. It's not just
that we need people to talk to each other, we need protocols to talk to each
other. I don't know where in the IETF we would be able to talk about those
hints. This
is definitely a cross-area question.

Brian Trammell: (As chair of panrg) panrg is a venue for a subset of this
problem, exactly that kind of hinting. Sounds like an MP-interest BoF is a good
idea. If only the usual people in the room, then that's one input. If routing
people show up, that's a good thing. This is definitely in the transport area
somehow, but we don't need to have the BoF in the mic line right here. On MPTCP
vs. MPQUIC: the way that TCP did multipath was constrained by the design of
TCP, but MPQUIC's design is not going to be dealing with the same set of
constraints, both in terms of existing deployment and the network requirements.
That says there need to be transport-independent design principles. I'd like to
discuss that in a BoF in Singapore.

Lars: When you say MP, do you think it is in scope for a transport session to
use path diversity that is deeper than the first hop? MPTCP was originally
trying to be multi-path and ended up mostly multi-interface. Mirja: I had
multi-interface in mind, but multi-path work is also present across the
transport area.

Jana Iyengar: I think that this problem statement is way too broad. As Lars
just said, there's multi-interface vs. multi-path/transport. MPTCP was a
separate wg for protocol work. We didn't discuss the congestion controller all
that much; the protocol document said you could replace it with whatever you
want. If we think about QUIC as the next thing, the QUIC wg is where that
should happen. Outside of that, I don't know what's left to be done; are we
talking about protocol-level work or congestion control work? It doesn't make
sense to
have multipath be separate because you want the protocol work to be done where
the protocol experts are. Mirja: I don't disagree. I do think there are things
left that may come up when people use this more, there are questions about
scheduling, which paths to set up when, interface questions. Some of them might
more clearly be in a research area. Do you think there might be space for a
multipath RG in IRTF? Jana: There's room if you create room for it, ask Colin.
I don't know how much overlap there is. Congestion control: iccrg. Scheduling,
I don't want to touch that at all; it's ultimately left to implementors, not
valuable
enough to create a research group. Robin Marx (over jabber, relayed by Mirja):
MPQUIC has the same authors as MPTCP. Mirja: The problem we have here is that
we want to make sure that the people who have the expertise come to where the
work is done. The problem is that we have too many places. Also, for people
who are interested in MPTCP/MP* outside of the IETF, we want to make sure that
they can find the right place in the IETF. Jana: In terms of MPTCP, use tcpm
and tsvwg, they are both great homes for continuing work, we've done that in
the past. In terms of how to get the right people into the work, we've
struggled with QUIC as well; most of TCPM isn't coming to QUIC even though
we're bringing the recovery draft into TCPM itself. Bob and Gorry are showing
up, but not many others. Mirja: We could have done separate QUIC core and QUIC
HTTP mapping wgs, which could have addressed that problem. I don't want to
change it now, but how do we move forward with new work like that? Jana: I
think the protocol work is more than what you do above it; the protocol work
belongs in the quic wg.
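
The scheduling question Mirja mentions (which packets go on which path, and
when) can be made concrete with a small sketch. The lowest-RTT-first policy
below is purely illustrative; the names and structure are hypothetical, not
from any draft or implementation.

    # A purely illustrative multipath scheduler: among paths with congestion
    # window space, send on the one with the lowest smoothed RTT.

    from dataclasses import dataclass

    @dataclass
    class Path:
        name: str
        srtt_ms: float   # smoothed RTT estimate
        cwnd: int        # congestion window, bytes
        in_flight: int   # bytes currently unacknowledged

    def pick_path(paths):
        usable = [p for p in paths if p.in_flight < p.cwnd]
        return min(usable, key=lambda p: p.srtt_ms) if usable else None

    paths = [Path("wifi", srtt_ms=30.0, cwnd=60_000, in_flight=10_000),
             Path("lte", srtt_ms=55.0, cwnd=90_000, in_flight=5_000)]
    print(pick_path(paths).name)  # -> "wifi"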

Florin Baboescu (Broadcom): A BoF is a bad idea. What is the overall timeline
for addressing this work? If you consider this to be used inside of SDOs,
having it be a BoF would eliminate it from 3GPP, since we'd not be able to do
work until March next year. It was already lost in Rel-16. Other groups like
BBF also wanted to see this work get done. Mirja: No matter what we do, there
will
be no MPQUIC work until next March. Florin: You cannot even *start* the work.
Mirja: You can start it, but not in the wg. Florin: You cannot start the work
there, even if it isn't going to continue there.

Colin Perkins: I don't think a BoF will make any difference in the timescale of
the QUIC work. MPQUIC needs to be in the QUIC wg. I'd be surprised if
everything is finished and ready to do this work in that timeframe. In terms of
MP-congestion control work, yes we could do that sort of work in ICCRG. In many
ways, that would make a lot of sense. There's also been discussion about moving
single-path congestion control work into ICCRG instead of doing TCP-specific
work. The downside there is that nothing can be standards track, only
informational or experimental. Maybe the IETF is okay with referring to
non-standards-track documents. Mirja: Also, IETF documents have IETF consensus;
IRTF documents don't. Colin: Yes, maybe that matters, maybe it doesn't. The
Internet
doesn't run on standard CC anyway. Colin: Can we generalize multipath
congestion control across transports? I'm not sure we know enough to answer
that. I think we need to discuss it to see how general this is or whether it's
protocol specific. +1 to a longer discussion in Singapore.

David Schinazi: The reason we form wgs in the IETF is that we have people who
don't have a home and we give them a home in which to sing kumbaya. I don't see
anyone with a draft saying "I don't know where to put this". This will go to
the right place. The reason we closed MPTCP is that there is nothing more
happening there; if things come back, then we will reopen it. We don't need a
place for
hypothetical drafts that might come one day. Mirja: That doesn't address that
the people with the expertise will need to go to multiple groups and repeat
themselves, and people coming from outside don't know where to go. David: I
don't see the multiple working groups that aren't talking to each other. Mirja:
We have MPTCP, and the TCP people don't show up. David: But creating a new wg
will
not get people to show up. I go for the drafts and the agenda, not for the
working group. Mirja: You'd go for something chartered to do MP-QUIC work.
David: We're all busy, there are always conflicts. I started this meeting in a
different room for something on their agenda and then came over here. I don't
think more working groups is the way to solve it.

Christian Huitema: I'll be brief to leave space for the next speaker (EK:
himself). There's no evidence that the people doing multipath aren't
participating in QUIC; they are. There would be something to be said for having
ICCRG look into multipath congestion control; that would be very interesting.
QUIC took some research that was happening and pulled it in; in theory that's
New Reno, but in practice it isn't.

Mirja: Thank you for the input. I think the community needs to propose
something here, please work on that if you're interested and think you can
scope it in a reasonable way.

* Congestion Defense in Depth - Christian Huitema

[Slide 1]
Until the last three years or so, transport was TCP and friends, and it was
developed mostly in the kernel. If you want to change something like the TCP
retransmission algorithm, it's very difficult: lots of process, and there's a
gatekeeper that stands between you and imposes friction that limits the amount
of crazy innovation you can do. Application-level transports solve that; the
QUIC interop spreadsheet has 19 implementations on 4 kernels: independence.
That works because it's shipped in the application, as a library.

[Slide 2]
That innovation is very good, opening new opportunities for development and
research.

[Slide 3]
But if you're working for some company and you want to win over other
companies, this can go very wrong very quickly.
- Competitive congestion control: Google QUIC has a constant for "how many
  cubic connections do you want to emulate?" (see the sketch below)
- Adversarial congestion control: if you can detect what others are doing, you
  can hit their network-based signals yourself.
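
To illustrate the "emulate N connections" point: the usual trick is to make
one flow's loss response as shallow as the aggregate of N flows. A sketch
under that assumption; the constant and structure are illustrative, not Google
QUIC's actual code.

    # If one of N emulated flows sees a loss, the aggregate window only
    # shrinks by that one flow's share: beta_N = (N - 1 + beta) / N.

    BETA = 0.7  # single-connection multiplicative decrease (Cubic-style)

    def beta_for_n_connections(n):
        return (n - 1 + BETA) / n

    def on_packet_loss(cwnd, n_connections):
        return cwnd * beta_for_n_connections(n_connections)

    print(beta_for_n_connections(1))  # 0.7   -> standard backoff
    print(beta_for_n_connections(4))  # 0.925 -> much shallower backoff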

[Slide 4]
If you have an application that keeps winning the local congestion battle,
users of other apps will be upset. At a larger scale, this could break
everything.

[Slide 5]
We need to stop agreeing to play nice and instead enforce playing nice. "In
depth" means enforcement at multiple points throughout the network.
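
One concrete form of such enforcement is a per-user policer at the access
network or cloud egress. The token bucket below is a generic sketch, not any
specific product's behavior.

    # Generic token-bucket policer: traffic above the configured rate is
    # dropped (or marked), so even a non-cooperating sender sees a
    # congestion signal.

    import time

    class TokenBucket:
        def __init__(self, rate_bps, burst_bytes):
            self.rate = rate_bps / 8.0     # refill rate in bytes/second
            self.burst = burst_bytes       # maximum bucket depth
            self.tokens = float(burst_bytes)
            self.last = time.monotonic()

        def allow(self, packet_bytes):
            now = time.monotonic()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if packet_bytes <= self.tokens:
                self.tokens -= packet_bytes
                return True
            return False  # drop or mark this packet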

Jana Iyengar: Thank you, this is a very interesting question and a very new
space where development and deployment of congestion controllers will be much
easier. That will change things, but I'm not sure how much. 15-20 years ago,
video applications started to get deployed on the internet and they were doing
their own congestion controllers, people were crying murder. We seem to have
survived that. I wanted to disagree with you about the gatekeeper: there has
been a bit of that in the Linux kernel, but most server providers/content
providers haven't been stopped from deploying whatever they wanted to deploy.
Google deploys BBR, Akamai deploys FAST. There are a bunch that aren't in
upstream Linux; the kernel hasn't been the gatekeeper in the past for this
particular thing. The people who deploy stuff are the gatekeepers: Netflix,
Google, anyone who has servers and is deploying a significant amount of
traffic. They haven't changed fundamentally. Christian: But they have. Take
Google Cloud: the guy who can change that isn't Google Cloud, but the guy who
puts their stack on a VM in the cloud somewhere. Jana: But today the guy
running his VM has the ability
fallacy to think of the kernel as a gatekeeper, the people doing that
gatekeeping aren't experts in congestion control, that's been a problem for a
while.

Lars Eggert: Worrying about these things makes you an excellent candidate for
transport AD. :) I'm not as worried: there are roughly two kinds of
applications
on the internet. Ones that ship enough bytes to do damage, and the long tail of
ones who don't. The guys who ship a lot of data are also the ones who carefully
monitor how their data is going, their revenue stream is dependent on not
causing the users trouble, that's the feedback loop that we get. Access
networks isolate users to their upstream link, at worst you're hurting
yourself. BitTorrent switching to LEDBAT is an example of that. The internet is
not really in danger here, but self-interference is a problem.
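
For context on the LEDBAT example: the point of LEDBAT (RFC 6817) is to back
off before causing loss, by keeping self-induced queuing delay near a small
target. A compressed sketch, with constants from RFC 6817 and an illustrative
structure:

    # LEDBAT-style window update: grow when measured queuing delay is below
    # TARGET, shrink when above, so bulk transfers yield to other traffic.

    TARGET = 0.100  # max queuing delay LEDBAT is willing to add, seconds
    GAIN = 1.0

    def ledbat_cwnd_update(cwnd, bytes_acked, mss, current_delay, base_delay):
        # base_delay approximates the path's fixed delay (minimum observed
        # one-way delay); anything above it is treated as queuing.
        queuing_delay = current_delay - base_delay
        off_target = (TARGET - queuing_delay) / TARGET  # negative above target
        return cwnd + GAIN * off_target * bytes_acked * mss / cwnd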

Matt Mathis: There are a bunch of mechanisms that defend the internet. Until
recently, the recovery code in TCP wasn't good enough, so if you pushed harder
on congestion control you took more losses and ran slower. Now, that's a bit
better at repairing so that you can do it at 50% loss in some cases and just
fill the holes. Access links are a small fraction of the core; you generally
hurt the people in the next office, and they'll come knocking, which is a good
feedback mechanism. BBR is interesting because, with a single metric (queue
occupancy), we're pretty sure that we're not hurting everything, but that's not
always going to be the case. You can imagine that
you've got a widely deployed game that's got bad congestion control and then
whoops, a DDoS. Of course, an explicit DDoS doesn't care what you think.

Roni Even: In the beginning nobody had congestion control; you just repaired
losses, and everyone just went as fast as possible. In the past there was a
gatekeeper, because you had to ask for bandwidth. Now you can do whatever you
want: if applications can get over losses, why should they care about the
network? It's important to have enforcement, maybe some proactive congestion
control.

Victor Vasiliev: A few of you may remember that when BitTorrent decided they
didn't want to use TCP, the media raised a panic that they were switching to an
aggressive congestion algorithm called LEDBAT. All of the congestion algorithms
published recently fill buffers much less than Cubic in pretty much all cases.
Almost all of the iterations of BBRv1, even the one with major CC bugs, were
still much better than cubic in terms of filling queues. I don't believe that
those
gatekeepers ever existed. If you look at the congestion control algorithms in
the linux kernel there are a lot of them, and I'm not sure that all of them are
rigorously tested. TCP cubic had a really bad bug that existed for 10 years; we
only found it when we found a similar bug in QUIC cubic. I believe that the
level of gatekeeping is overstated: sockets with IPPROTO_UDP have existed for
>40 years and the internet has not collapsed. I do agree that if we deploy
AQM more widely that this will be better, not to be afraid of collapse, but
that this would enable people to run congestion control algorithms that are
more robust to stochastic loss that's not induced by themselves. Christian:
That's a very important point, another reason to deploy isolation between
users, it enables freedom of innovation for those users. If you know that you
are safe, you can try what's best for you, you don't need to be compatible with
cubic.
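
Victor's AQM point can be illustrated with CoDel (RFC 8289), one of the AQMs
usually meant here: drop when queuing delay stays above a small target, so
queues stay short regardless of the endpoints' congestion controllers. A
compressed sketch; the constants follow RFC 8289, the structure is simplified.

    TARGET = 0.005    # 5 ms acceptable standing-queue delay
    INTERVAL = 0.100  # 100 ms, roughly a worst-case RTT

    class CoDelSketch:
        def __init__(self):
            self.first_above = 0.0  # when delay first exceeded TARGET
            self.drop_next = 0.0    # time of the next scheduled drop
            self.count = 0          # drops in the current dropping state
            self.dropping = False

        def should_drop(self, sojourn_time, now):
            if sojourn_time < TARGET:
                self.first_above = 0.0
                self.dropping = False
                return False
            if self.first_above == 0.0:
                self.first_above = now + INTERVAL
                return False
            if not self.dropping and now >= self.first_above:
                # Delay stayed above TARGET for a full INTERVAL: start
                # dropping, with drops accelerating as count grows.
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL / self.count ** 0.5
                return True
            if self.dropping and now >= self.drop_next:
                self.count += 1
                self.drop_next = now + INTERVAL / self.count ** 0.5
                return True
            return False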

Andrew McGregor: I've been SRE-ing a system that did everything on that slide;
it's so well documented you could implement it. Bandwidth Enforcer (?) handles
Google-to-Google traffic, and there's another one that enforces traffic facing
the internet. Google Compute Engine requires all traffic to run the gauntlet of
both of them. Any responsible cloud provider or CDN is going to have a global
congestion controller. It worked for Google when we accidentally deployed BBR
to 80% of internet traffic, and cubic didn't stop.

Colin Perkins: I'm going to be a little cynical, but I don't think RTP
congestion control matters. By the time you've hit persistent congestion, the
user has given up. All the traffic is video anyway; the on-off dynamics of
MPEG-DASH break congestion control anyway, but it's already limited and
providers have dashboards to make sure it's not causing problems. This is,
however, a problem because everything is moving to QUIC and QUIC encrypts the
transport headers.

David Schinazi: I used to be a gatekeeper. With this laptop I can check in code
to Google QUIC; last year I could check in code to xnu. Now at Google, we have
tests, whereas at Apple there weren't any. I remember a really funny bug where
the Apple kernel got into an infinite loop that sent a packet on every
iteration when using TCP Fast Open. There was a network in Switzerland that was
really unhappy for a while. There were some people (snake-oil salesmen) who
disabled congestion control and claimed to make TCP faster; that's silly. I
disagree with your premise but agree with the conclusion: deploy AQM and we
might solve bufferbloat.

Brian Trammell: I agree that there is a problem, but maybe a slightly different
problem. I want to go back to what Matt said: security and reliability are flip
sides of the same thing; they're all transmission safety problems. The work
that has been done in some networks to isolate users/processes/good vs. bad
traffic is all the same work. Putting better AQM in the internet is a great way
to make the world better anyway. As a researcher who spent a good amount of
time trying to get cloud platforms to throw traffic as fast as I could at other
servers, I found all of them had outbound bandwidth policing. The things that
we
need to enforce to keep the internet working decently well for people's
services also happen to prevent congestion collapse.

* Update on logging schema for qlog - Robin Marx
- Everybody should output the same type of log; the proposal is qlog, which is
  JSON-based for humans but also machine readable (a rough sketch of the
  format appears below).
- QUIC could be an incubator to get some more insights; we've been working on
  this since IETF 104.
- Two documents: one is the schema and one is definitions for events in HTTP/3
  and QUIC. Custom events are supported without a specific schema. (See
  https://datatracker.ietf.org/doc/draft-marx-qlog-main-schema and
  https://datatracker.ietf.org/doc/draft-marx-qlog-event-definitions-quic-h3)
- Want to do an end-to-end log: an intermediary in the middle of the network
  can log, and you can keep it all in a single log (can aggregate multiple
  traces, view from vantage points).
  - Can add a summary to the top of the file; can save off configuration
    settings to share a particular view.
  - Most people will have enough events with just 20; you don't need to
    implement all 70+ (estimated) events.
- qlog could be a way to implement tests: examine the output to verify
  behavior.
- Thank you everyone in the community for the support; there are some open
  questions.
- New mailing list: qlog@ietf.org
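
A rough illustration of the qlog shape described above (JSON, shown here as a
Python dict). The field names follow an early version of
draft-marx-qlog-main-schema from memory and may not match the current drafts
exactly.

    import json

    qlog = {
        "qlog_version": "draft-01",
        "traces": [{
            "vantage_point": {"name": "example-client", "type": "client"},
            "event_fields": ["relative_time", "category", "event", "data"],
            "events": [
                # human-readable, but regular enough for tools to aggregate
                [0, "transport", "packet_sent",
                 {"packet_type": "initial", "header": {"packet_number": 0}}],
                [15, "recovery", "metrics_updated",
                 {"latest_rtt": 15, "congestion_window": 14520}],
            ],
        }],
    }

    print(json.dumps(qlog, indent=2))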

Victor Vasiliev: Sorry, I haven't had time to make Google quictrace
qlog-compatible yet. I'd be interested in transitioning this to other
protocols; our old format was common between protocols and it made nobody
happy, because protocols were different in important ways.

Jana Iyengar: It's great to see you doing work on this, I hope to soon make
quicly integrate with qlog. In terms of a BoF, I think the work that's
happening is great and there's a lot of engagement, but I don't think there's
any clear outcome that would come from formalizing it.

Brian Trammell: Thank you very much for this, I'm shocked and amazed at how far
this has come in one meeting cycle. Given the amount of progress that's
happening, I would say the next steps are keep on making it a better toolset,
we'll come back and backfill the "how can we generalize this" later. I will
take my privacy and security question offline. I'm shocked and appalled at how
awesome this is.