SCONE-PROTOCOL BoF


Time: 2024-03-21 Thursday 09:30-11:30
Links: Notes, Meetecho (On-site, remote)
Venue: Plaza Terrace Room
Chairs: Cullen Jennings, Martin Thomson

Agenda

Introduction (Chairs) - 10m
Scribe: Alan Frindell
Backup: Martin Duke

Agenda Bash and Administrivia

PLUS++ (and other post-incremented four-letter words)

BOF Questions:

  1. Is there a problem to be solved?
  2. Do we understand that problem enough to engineer a solution?
  3. Are enough people interested enough to do the work?

Reminder to focus discussion on answering these questions.

Traffic policing in networks: goals and methods (Marcus Ihlar) - 15m

Some Access Networks throttle video for $reasons

Data caps based on subscription moving to bitrate caps: unlimited data
but limited quality

ABR Video dynamically selects resolution based on estimates/predictions
of capacity.

Shapers leverage the design of ABR to shape.
Detect video flows (can be tricky/expensive)
Select bitrate based on policy (static or dynamic)

Policing (packet loss during bursts)
Hard to predict loss patterns, messes with congestion control
Shaping (buffer packets during bursts)
Adds delay, increases RTT, also messes with congestion control
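The difference between the two enforcement styles can be sketched in a few lines. This is an illustration only, not anything shown in the presentation: the function names, the token-bucket policer and the single-queue shaper are assumptions chosen to show why one produces loss and the other produces delay.

```python
def police(arrivals, rate, burst):
    """Hypothetical token-bucket policer: out-of-profile packets are dropped.

    arrivals: list of (arrival_time_s, size_bytes), sorted by time.
    rate: token refill rate in bytes/s; burst: bucket depth in bytes.
    Returns the packets that were forwarded, at their arrival times.
    """
    tokens, last, forwarded = burst, 0.0, []
    for t, size in arrivals:
        tokens = min(burst, tokens + (t - last) * rate)  # refill since last packet
        last = t
        if size <= tokens:
            tokens -= size
            forwarded.append((t, size))
        # else: the packet is dropped, which the sender sees as loss
    return forwarded


def shape(arrivals, rate):
    """Hypothetical shaper: packets are queued and released at `rate` bytes/s.

    Returns (departure_time_s, size_bytes); nothing is dropped, but
    packets in a burst leave later than they arrived (added delay).
    """
    link_free_at, departures = 0.0, []
    for t, size in arrivals:
        start = max(t, link_free_at)          # wait behind queued packets
        link_free_at = start + size / rate    # time to serialize this packet
        departures.append((link_free_at, size))
    return departures


# A burst of four 1000-byte packets arriving at t=0
burst_of_four = [(0.0, 1000)] * 4
policed = police(burst_of_four, rate=1000, burst=2000)
shaped = shape(burst_of_four, rate=1000)
```

With these assumed numbers the policer forwards two packets and drops two, while the shaper forwards all four but the last one leaves four seconds after it arrived.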

Shapers/Policers designed to enforce policy: User experience is not the
primary objective
Tuning for QoE is difficult for operators - may not be able to measure
application QoE

Determining video flows is getting harder with encrypted protocols

Some operators and providers are attempting to address with proprietary
mechanisms

Is dry Swedish bread as good as it gets?

Solution space and trials (Matt Joras) - 15m

Results of Meta/Ericsson feasibility study using MASQUE

ABR Video: video broken into short segments, encoded in a range of
qualities. Client selects based on time it takes to fetch a given
segment.
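A minimal sketch of that selection loop, assuming a simple throughput-based ABR. The ladder values, the safety factor and the function names are hypothetical and are not taken from the Meta client.

```python
# Hypothetical quality ladder: encoded bitrates in kbps, lowest to highest
LADDER_KBPS = [300, 750, 1500, 3000, 6000]


def estimate_throughput_kbps(segment_bits, fetch_seconds):
    """Estimate capacity from how long the last segment took to fetch."""
    return segment_bits / fetch_seconds / 1000


def select_quality(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest quality that fits under a fraction of the estimate.

    `safety` is an assumed margin so the next segment is likely to
    arrive before the playback buffer drains.
    """
    budget = throughput_kbps * safety
    fits = [q for q in ladder_kbps if q <= budget]
    return max(fits) if fits else min(ladder_kbps)


# A 4 Mbit segment fetched in 2 s suggests about 2000 kbps of capacity
estimate = estimate_throughput_kbps(4_000_000, 2.0)
choice = select_quality(LADDER_KBPS, estimate)
```

Real players also weigh buffer level and past variance; this only shows the fetch-time feedback that shapers rely on.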

POC: use ABR with an agreed cap. Signal from the network to the
application to limit the number of available qualities

POC uses CONNECT-UDP/MASQUE because Meta and Ericsson had this software
already implemented and it has similar properties to the desired
solution.

Facebook App <-> MASQUE Proxy <-> FB Video CDN Server

MASQUE proxy cannot see the application traffic - the cryptographic
context is end-to-end. MASQUE proxy inserts the media rate signal. See
draft for details.

Trial caps both at the application, by removing out-of-policy qualities,
and at the server, by indicating that it should cap its send rate.
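The application-side half of this can be sketched as a filter over the quality ladder. The ladder, the function name and the returned tuple are assumptions for illustration; the draft defines the actual signal, and the POC's exact server rate is not given in these notes.

```python
def apply_media_rate_cap(ladder_kbps, cap_kbps):
    """Hypothetical client logic for a network-signalled media rate cap.

    Returns (allowed_qualities, max_send_rate_kbps):
    - allowed_qualities: ladder entries at or under the cap (the lowest
      quality is kept as a fallback if nothing fits);
    - max_send_rate_kbps: the rate the app would ask the server to respect.
    """
    allowed = [q for q in ladder_kbps if q <= cap_kbps]
    if not allowed:
        allowed = [min(ladder_kbps)]
    return allowed, cap_kbps


# Assumed ladder, with a 2.5 Mbps cap as mentioned in the discussion
allowed, max_send_rate = apply_media_rate_cap([300, 750, 1500, 3000, 6000], 2500)
```

The ABR logic then runs unchanged over `allowed`, so the cap removes choices rather than changing how the client adapts.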

Lars: Are both hops encrypted?

Matt: Yes -- client to proxy is a MASQUE connection; client to server is
QUIC.

Tommy: what is in the capsule, is it bidirectional?

Matt: We send a little bit higher rate from server for POC. It's more
complicated in practice.

Jana: The Facebook app is the client? (Yes?) What is the 'max send rate
= 2.5 Mbps'?

Matt: The app then tells the server what rate to use based on the
feedback it got from the proxy.

There are two adaptations: app and transport

Jonathan Hoyland: How does the client know the rate?

Matt: Explicit signal from the network: HTTP Capsule from the MASQUE
proxy.

Cullen: Who creates the 2.5 Mbps rate?

Matt: It comes from a device in the packet core.

Live slide review: bidirectional arrows confusing -- consider uploading a
new slide.

Eye chart for lab setup

Real packet core, real internet to Meta CDN
Red Line: shaper
Blue Line: no-shaper

Repeated Testing using fixed video playlist
Shaper NOT policer (queuing not dropping)

TL;DR: Better video experience with similar network tonnage

Video Quality: VMAF (percentage of quality) and stalls (spinners)

Higher peak quality is less important than consistency. Low quality
damaging to user experience

Outliers matter

THE DATA

VMAF

Shaper: 36% annoying
Adaptation: 6% annoying

Shaper: 62% acceptable
Adaptation: 94% acceptable

Shaper hit higher qualities in the top end, but that doesn't matter

Shaper has 2% Unacceptable
Adaptation: 0

Shaper using slightly more data

What does it mean?

Outliers matter
Applications change faster than networks - hard to tune from network
side

Stall Duration

Adaptation: some stalls, 94% have <= 1s stalls
Shaper: p90 - 5s of stalling

Alan: What is the duration of the videos that had 5 sec of stalling?

Abishek: 20 videos each played for 10 seconds = a 200 sec video session.
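For scale, the stall figures can be put against the session length. This is a rough worked calculation that assumes the quoted stall times are totals over the whole 200 s session, which the notes do not state explicitly.

```python
videos = 20
seconds_per_video = 10
session_seconds = videos * seconds_per_video  # 200 s, as stated by the presenter


def stall_fraction(stall_seconds, session_seconds):
    """Fraction of the session spent stalled (assumes stall time is a session total)."""
    return stall_seconds / session_seconds


shaper_p90 = stall_fraction(5, session_seconds)        # 5 s at p90 with the shaper
adaptation_bound = stall_fraction(1, session_seconds)  # <= 1 s for 94% with adaptation
```

Under that assumption, 5 s of stalling is 2.5% of the session and 1 s is 0.5%.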

Takeaways:

Could we do this using real packet core, app, HTTP/3 - yes.
And it's not very difficult.

App and Transport adaptations are both feasible

Chris Box: Was this conducted in the lab (faraday cage) or varying radio
conditions?

Abishek: Radio conditions were kept constant in the lab.

Chris: A drive test would be more effective.

Matt: We have production data that shows the same trends.

Mike Bishop: is the use of a proxy necessary?

Matt: Just for ease of experimentation. This presentation is not the
form of the final solution, just indicates the value.

Jana: Shapers vs Policers - depends on region.
Chairs: Is this clarifying?
Jana: experiments with multiple flows?
Matt: vary from 2-3
Jana: 2 and 3 =~ 1
Jonathan
Wes Hardaker: Did you compare different bitrates? You are optimizing for
people who are willing to not pay as much. Some people can pay more.
Does the quality curve look similar at 3 Mbps?

Matt: we did do experiments for higher bit rate.
We can move the curve more to the right, but wasn't relevant for this
application.

Lessons from IETF history (Brian Trammell) - 15m

Fiddling with slide control. Welcome to IETF.

Haven't we been here before? Not really.

What was PLUS and what did we learn?

Pushing a path layer into the stack. This BOF did not go very well.

2016 Berlin. Chose QUIC as the future wire image - explicitly minimal by
encrypting all the things. Path signalling questions unanswered.

Most important learnings:

Building a generalized approach for complete signalling replacement:
good engineering, bad policy.

Advisory side channel next to an encrypted channel reduces
trustworthiness.

Minimalism is essential

Signaling participants must align with control points
Credible threat of network violence (policer)

Does SCONE have these benefits?

Discussion on use case and scope (Everyone) - 55m

Jonathan Hoyland: This data needs to be added by the lowest bandwidth
hop.

Cullen: In this case, the policer/shaper is the bottleneck.

Jonathan: How can we be sure there is only one such box?

Martin: That sounds like an excellent question for the working group.

Matt: There's at least one box that we want to solve for.

Dan Druta: I want to clarify a few things that have popped up in the
chat:

Not about proliferating shaping, but reducing it.
Cannot identify video because of multiplexing
Shaping is a very blunt tool to identify ABR flows that can adapt
themselves.
This is a real problem and we needed a solution yesterday.

Shaping can be done on a per-application basis, with consent from the
user.

Can identify other types of traffic - background traffic (but this is
not the intent).

Jana Iyengar: To Jonathan: it's relevant if sconepro has higher
bandwidth than bottleneck. Eg: signal looks like an upgrade.

Complexity elided by presentations: network has complexity about how the
network distributes bandwidth it has.

Is there a problem: yes. But we need to agree.
Do we understand it well enough: meh?
Definitely interested in doing the work.

Brian Trammell: Two traditions - building networks and network
cooperative operations.

Radio: this traffic has properties X so I'll give it treatment Y. Sense
and react.

Other hand: encrypted-end-to-end. This inspection is no longer possible.
"We need to figure out what this traffic is"

Impedance mismatch problem.

Is there a problem: yes
Understanding: not yet. Probably has an engineerable solution
Interested: yes

Livingood: This is bizarre. Was there an edible in my scone? I can't
believe we're having this conversation again. At the center of p2p
shaping.

It's a good objective to get rid of throttling/shaping. There are other
mechanisms - regulatory actions (US to bar shaping). Adding capacity.
Don't sell what you can't deliver.

David Schinazi: Encrypting all the things enthusiast. To Jason: it would
be an ideal world to live in, but not the one we do live in.

Part of the problem here came from switching from TCP to QUIC.

Problem to be solved: multiple problems? To be solved: YouTube has
agreements with network operators. If the client can reduce streaming
quality, you can get zero-rating. Making this better, such that the
client can opt in (requirement), is a better solution than the one we have.

There is a problem and we do understand it. Needs some refinement
though. MASQUE option seems plausible but there are definitely other
options.
Definitely interested in helping

Hardie: Basic habits of good explicit signal design.

  1. You're not doing something generalized. One specific thing to the
    people who care. ECN - good signal design - no incentive to fake it,
    and goes in one direction. Can improve situation for the network and
    the app.

Spin bit: this group is good at looking at signals and identifying side
channel attacks. MASQUE based proxy is possible if you are rich enough.
Is this worth doing for an interoperable protocol? Is the signal flowing
the right direction and giving the minimal data needed?

I'm willing to help.

Tommy Pauly: Thank you for the segue Ted. Point to the IAB RFC 9419 -
good advice for when we're looking at this. To enumerate some:

intentional distribution, control of distribution, protecting and
authenticating information, minimization of information, limiting
impact, and minimizing the set of entities.

Overall direction: good piece of work to do. Like the use of QUIC based
proxies. Some MASQUE work on forwarding is relevant. L4S opportunity. To
Jason's point: we don't want more throttling, but we can use this
mechanism for L4S or something slightly richer, but in a more generic
way, not favoring proprietary agreements. Intersects work in INTAREA
about proxy discovery.

New protocol engineering needed for this is minimal: it is extensions or
describing how to put things together.

Yes, Yes, I'll help.

Ian Swett: Definitely problems to solve, like for YouTube. Current
negotiation model is not scalable. Would love to see a mechanism to avoid
SNI sniffing, so we could encrypt it. We understand enough to proceed. Is
focus QUIC or QUIC + TCP? Both is good but maybe focus on QUIC first.

MT: the charter says QUIC and only QUIC

Ian: I haven't come up with anything better than MASQUE. Happy to do the
work.

Marten Seemann: Encryption enthusiast. Not sold on encrypting this
information. Value in keeping this information in the open. If it were
unencrypted, multiple boxes could update it. Privacy: as soon as we
encrypt, there will be demand to stuff more and more things in there.

Tom Saffell: YouTube Infra: We have built something like this. What
hasn't worked well - we built a proprietary protocol, hard to adopt.
Looking forward to standard. Since 2018. Operator can communicate max
media rate (kbps) - apply that to ABR logic, similar to Meta. Operator
removes policer. A/B experiment. Improves several metrics (see transcript
or uploaded slides). Small increase in perf. Retransmits reduced 60%.
Access to a fast network is the key to video perf. Operators want tonnage
control - YouTube is supportive. Win/win for everyone.

Yes, Yes, Yes.

Wonho Park: TikTok (PM for global edge). Traffic shaping is not the
optimal solution. Does not result in optimal QoE. 3% outliers is still
significant. Self-regulating protocol standard, would be interested in
using it. Deliver media more efficiently while maintaining QoE.

Yes, Yes, Yes.

Cullen: Welcome to IETF, and thank you for providing feedback to the BOF

Martin Duke: Value prop is vaguely RSVPish -- whatever its technical
problems, that seems innocuous. My instinct is to have the same
heebie-jeebies as Jason. Concerned about impact on best-effort traffic --
can I still send video without this? But this reduces incentives to do
clumsy things with heuristics, which is good for best-effort. Also
concerned about extensibility. ABR sounds good - how do you keep it
from other sorts of metadata. If we keep that in mind - we'll be ok.

Answer positively to the three questions.

Lars: Probably: yes, yes and it looks like it, but I'm negative.

Operators are in love with putting in complexity to manage capacity.
Vendors are happy to sell them boxes that will do that.

Will it turn into something people pay for?

Sanjay: Marcus and Matt and Brian - good job.

This is a problem looking for a solution: yes.
Do we understand enough: broad brush - yes, more work needs to be done.
We don't understand yet.
Yes, will help.

Lucas: Irrespective of whether there's a problem: are there other
problems besides ABR?

This is just information that is already detectable and determinable.
It's about reducing the delay in detecting these signals from the path.
Doing this in a structured, standard way could improve things. Network
can indicate infinity (no throttling). Signalling to eg: Facebook app -
might have a lower layer access to the stack. Need to surface this
information to higher level applications.

There are dependencies we might need to take on to deliver value.

Christopher Inacio: 1 & 3: yes. 2: not so convinced. There's a
scalability problem. We've heard a lot of support from Facebook and
YouTube, but how does this work on a device like a MiFi? What does the
signalling look like and how does that work?

We don't have the model yet for wide deployment.

Cullen: Do you think a WG could do the engineering to figure that out,
or is it a research problem?

Christopher: not sure and we haven't talked about it. Massive gotcha.
Maybe but I'm not convinced.

Suhas: Yes, yes, yes. Explicit signal enthusiast. Cooperative signal
useful:

  1. removes guesswork
  2. operators using explicit signals make them more transparent, instead
    of guesswork
  3. coming from interactive video delivery experience - similar problems.
    More and more media are e2e encrypted - what kind of information can
    we put in to help middle box make decisions
  4. MoQ has use cases that can be used. Designs are applicable here. How
    do we take this work with other standards bodies (eg: how do we get
    this into packet core)

There's a clear problem, we should be working on it.

Jeff Smith: T-Mobile. Familiar with pacers and shapers and
self-regulation. Based on these conversations, there's absolute interest
but lack of standard limits adoption. Standardized approach would be
much better and get wider adoption. Interested in this work and could be
a benefit.

Yes to all three.

Julien: Nokia. Yes, maybe, maybe. Interested in scalability issues. Lots
of video flows going through a proxy here. How would this work with very
large deployment, will impact cost.

Cullen: You heard Ian, right now the shapers/policers are expensive.
This means you can remove them, may reduce costs. You are running
something like a masque proxy (is cheaper than shapers) -- He is
relaying the opinion of the proponents, not stating his own.

Julien: ok, but shapers are already there.

Stephen Farrell: Echo Lars, Chris. Yes, No, Seems like it.

No on 2 - if the charter text says a client authenticates a random box,
possibly insurmountable problem

MT: Would it change your opinion if it was not backed by a PKI (just
proof of being on path)

Stephen: Unsure how that is consistent with charter text (can't invent
new security mechanisms)

MT: That may be a problem with the charter

Leslie: (MOPS chair) There is a problem here. Unsure on 2. Yes on 3,
there are enough people. YT and FB are not all the video. Other videos
providers out there, please add mops conflict.

Tom Saffell: Didn't address policy concerns: why not make the network
faster. Practical realities. Only implement if it's consistent with
principles.

Transparency to users - restrictions must be visible
User choice - buy a plan with no restriction
Equal treatment - wish to be treated as any other provider

Marcus: To Lars: willing to sell complex stuff to operators. Selling
radios and capacity is much more profitable. Economic reality precludes
doing it that way. We do have a problem

Dan Druta: this is not a capacity problem. Giving users/subscribers
choice. Yes, yes, and yes.

Cullen: Will the WG be able to find an engineering solution? How to
discover and authenticate to proxies? Anyone can address that issue
right now -- argue it is solvable or not solvable?

MT: Tom, how have you done it in your deployments?

Tom: It's all out of band, not relevant to this discussion.

Marcus: I think discovery is going to be super important problem.
Intarea, discovery of proxies being discussed, along these lines may be
used here. Tied to your specific access network. This is solvable.

Ted: It's solvable - our lady of anycast will bless us if we have
nothing else. Using a capsule is a great demonstration that this can
work. A solution that doesn't require that relationship would be much
better. Explicit shaper notification - you can avoid it by jumping down.
Don't assume MASQUE. Signal characteristics along the path.

Jana: Proof that the network element can also drop your packets. If it
can write to your packets, it can drop. There are ways to solve this
problem.

Anoop: Clarify some of the questions. It's not going to be a random box
- will have access to subscriber policy, and capability to drop packets.
Shaping also taking place there.

Discovery: in 3GPP, this box is already authenticated. The base station
selection will also pick this box for you, as an example.

Trammell: Echoing Ted and Anoop. The scope here is we're building a new
type of ECN. Thinking about the dynamics and deployment problems with
ECN will make the engineering problem more tractable.

Bishop: Some people are getting caught up with proxy. Anything on path
that can drop your packets. ICMP - this is the packet I did not drop but
could have. I'm about to start dropping. This is tractable.

Matt Joras: About ECN - there are lot of things to learn from ECN. This
is not strictly a congestion problem. This is not ECN++. This is an
application level signal, not related to whether there's congestion
right now. The network may not be congested - subscriber policy
limitation. CAP doesn't want this information, but wants emergent network
properties.

ECN by itself will not solve all these problems.

Joerg: Regarding discovering: is it always a single proxy, how do you
know when you are done discovering?

Anoop: in a mobile network, there's only one box in the data plane.
Discovery is only going to make use of 3GPP procedures.

Conclusions (Chairs) - 10m

Show of Hands: Should the IETF form a wg on the topic we have discussed
today (note: this is not a wg forming bof, and have not reviewed the
charter)

Yes: 51
No: 20
No Opinion: 12
(121 people in meetecho)
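As a quick check of the proportions, the counts above can be turned into percentages. This is plain arithmetic on the recorded numbers; treating the 121 Meetecho participants as the pool of possible respondents is an assumption.

```python
counts = {"Yes": 51, "No": 20, "No Opinion": 12}
in_meetecho = 121

responded = sum(counts.values())

# Share of each answer among those who responded, as a percentage
shares = {answer: round(100 * n / responded, 1) for answer, n in counts.items()}

# Share of Meetecho participants who responded at all (assumed denominator)
participation = round(100 * responded / in_meetecho, 1)
```

So roughly six in ten respondents said yes and about a quarter said no.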

Cullen: Do No votes want to say why?

Mirja: Not sure if it needs to be a WG

Stephen Farrell: Question 2 is still a problem. There's a lack of
distinction if this is a piece of software the user is using (eg:
Facebook app is not a user.)

Chris Inacio: Echo Stephen. Seems unexplored, Engineering question.
Fundamental no. Didn't bring up MP QUIC.

Jonathan Hoyland: Multipath problem may be intractable. The paths might
not be independent paths. So how does shaping work for that.

Cullen: Let's defer Multipath discussion today, but we need to think
about it.

Zahed: Thinking about going to lunch. Good discussion. Good
demonstration of interest from all the stakeholders. But there are
issues with architecture, processing, security. More work is to be done,
agree with the chair summary. Let's discuss more on the mailing list.

Cullen: Will go to list to discuss issues raised today.

The show of hands shows significant discomfort. Will work on trying to
get these nailed down on the list.

Thanks for a pleasant BoF.