IPPM Meeting @ IETF 115

Monday 7 November 2022, 9:30-11:30 GMT

Working Group Documents

Welcome, Note Well, Agenda, Status

Chairs

Second WGLC ongoing for draft-ietf-ippm-explicit-flow-measurements with
no objections

draft-ietf-ippm-stamp-on-lag / draft-ietf-ippm-otwamp-on-lag

Tianran Zhou

Newly adopted documents.

Extending OWAMP/TWAMP/STAMP to add multiple micro-sessions that are
coordinated, to help measure the various member links of a LAG or ECMP
routes.

Adding new control messages to set up micro-sessions, and mark
micro-sessions in packets.

Next step is adding a stateless process for STAMP; in this mode, the
reflector can copy the micro-session ID in the reflector session ID.

Frank Brockners: How will this work if there are multiple LAG segments?

Tianran: It won't work, but we could detect it. It needs adjacent nodes.

Frank: Sounds like this should be clarified and pointed out.

Greg Mirsky: This matches the BFD on LAG requirements from RFC 7130.

draft-ietf-ippm-capacity-protocol

Al Morton

Discussing SECDIR response and encrypted mode.

In the control phase, using two types of packets: setup and information
exchange.

Now have a complete authenticated mode on control packets.

Data phase has optional authenticated mode. Authenticated measurements
at 50ms intervals. Can also include sending rate structure from server.

Not adding anything to the load PDUs -- these are too high rate. They
also don't have much information to protect.

Also still have unauthenticated mode.

DTLS was suggested by secdir review, but determined not to be
appropriate.

Also discussed an "encrypt-all-the-things" approach. Effectively, this
would be creating a tunnel. The authors believe that you could just run
this protocol over an encrypted tunnel rather than building a new tunnel
into the IPPM protocol.

Martin Duke: Why can't we use DTLS to exchange sensitive info?

Al: It adds retransmissions and ordered delivery. We need to measure the
feedback packets directly, without DTLS adding extra behaviors, etc.

Martin: Seems like some scope creep?

Al: No scope creep. We measure loss and delay variation, and we use that
information directly in our load adjustment algorithm when searching for
the maximum capacity.
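
As a rough illustration of why the feedback must be measured directly, the sketch below shows one load adjustment step driven by per-interval loss and delay variation. The field names, threshold, step size, and backoff factor are hypothetical and are not taken from the draft.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One feedback report for a measurement interval (hypothetical fields)."""
    lost_packets: int
    delay_variation_ms: float


def adjust_rate(rate_mbps: float, fb: Feedback,
                delay_threshold_ms: float = 30.0,
                step_up_mbps: float = 10.0,
                backoff: float = 0.9) -> float:
    """Return the sending rate to use for the next interval.

    Assumed policy: back off multiplicatively on any loss or on delay
    variation above the threshold, otherwise step up additively.
    """
    congested = (fb.lost_packets > 0
                 or fb.delay_variation_ms > delay_threshold_ms)
    if congested:
        return max(1.0, rate_mbps * backoff)
    return rate_mbps + step_up_mbps


# Two clean intervals followed by one congested interval.
rate = 100.0
for fb in [Feedback(0, 2.0), Feedback(0, 5.0), Feedback(3, 45.0)]:
    rate = adjust_rate(rate, fb)
```

If a transport layer retransmitted or reordered the feedback packets, the loss and delay variation fed into a step like this would no longer reflect the path being measured.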

Expecting more SEC and SECDIR discussion and review. Would like to have
an agreement on fully encrypted mode. Note that when encrypted modes are
defined (as in OWAMP and TWAMP), they are not really used in
practice.

Looking for WG last call in Jan 2023.

Also sharing measurements on the list for working load and capacity.

Martin: Isn't it true that DTLS does not retransmit non-handshake messages?

Al: So DTLS might be applicable. But if you want to encrypt everything,
use a tunnel.

Martin: Use a tunnel if you want to encrypt data traffic, yes.

Al: That's about knowing if the tunnel impacts accuracy or not.

Tommy Pauly: Wanted to clarify the next steps here around encryption.
Seems like you're proposing "put it in an encrypted tunnel if you want",
and, if DTLS does work, putting that in optionally. Are those the things
that are on the table? Does the fully encrypted mode come down to "put it
into any tunnel you want", or to a specific protocol change?

Al: Trying to avoid protocol changes at this point. Really simple, so
people can pick their own tunnel. Likely to have their own tunnel up
anyway. Don't want to get into the details of which tunnel to use.

Tommy: So comes down to one paragraph discussing encryption, how to run
it in a tunnel, etc.

draft-ietf-ippm-encrypted-pdmv2

Nalini Elkins

Implementation updated to use eBPF instead of kernel changes; this works
on more platforms now (including in user mode). Implemented for PDMv1,
working on v2. Hackathon demo available!

IAB workshop on encrypted network management included the encrypted
PDMv2 work. Looking to engage CDN providers.

Question about viability of IPv6 extension headers on the internet; will
this actually work?
Our opinion is that they probably will work. Having a side meeting on
Thursday to discuss more. Working with CDN and cloud providers on this.

One cloud provider and one CDN working closely to fix the problems -- e.g.,
in a CDN, you get to the edge in IPv6, but internal to the CDN network
you're travelling in IPv4 (and hence don't support extension headers).

Draft discussion:

Currently focusing on secondary-to-secondary. Would want other drafts
for primary-to-secondary and primary-to-primary.

Tommy: If the documents are being split, we would need to do an
adoption/consensus call to say we want to have additional documents.
Curious to know how much of the current document would be generic
between the different modes (how entangled would they be, and how much
are we gaining by separating them out?)

Nalini: Have some of the security requirements for the
secondary-secondary(?) and secondary-primary flow. Suggest we take these
out and move them into a new document. Repetition will be minimal; the
reason for the split is to have shorter documents. Perhaps we should
create a draft for what the documents would contain and include empty
sections listing considerations.

Tommy: Seems reasonable. But this approach needs group consensus.

Nalini: Let's create an outline of what the two documents would look
like split apart. Another consideration for a separate document: if
scalability is not an issue for an implementer, can go ahead and
separately implement secondary-secondary for small-scale environments.

draft-ietf-ippm-responsiveness

Stuart Cheshire

Issue #62
(Recall the motivation is to measure latency when under load, and we
create load)
We need multiple TCP connections to fill the pipe, so we keep adding 4
connections at a time and watch throughput go up. At some point, adding
connections no longer yields more throughput, and the extra traffic is
now sitting in a queue. The current test terminates when the link is
filled, but this does not measure the maximum bufferbloat. So, the
current measurement underestimates.

The solution is to add one connection per second, and keep probing
responsiveness, and measure both goodput and responsiveness, and only
stop when both stabilize. This will be in the next draft.
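
A minimal sketch of that stopping rule follows, assuming a simple "last N samples within a tolerance of their mean" definition of stability. The window, tolerance, and the measure callback are hypothetical and not taken from the draft.

```python
def is_stable(samples, window=4, tolerance=0.05):
    """True if the last `window` samples are within `tolerance` of their mean."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return False
    return all(abs(s - mean) <= tolerance * mean for s in recent)


def connections_until_stable(measure, max_connections=20):
    """Add one connection per step until both metrics stabilize.

    `measure(n)` is a hypothetical callback returning a tuple
    (goodput, responsiveness) observed with n load-generating
    connections; a real client would call it once per second.
    Returns the connection count at which the test stops, or None.
    """
    goodput, responsiveness = [], []
    for n in range(1, max_connections + 1):
        g, r = measure(n)
        goodput.append(g)
        responsiveness.append(r)
        if is_stable(goodput) and is_stable(responsiveness):
            return n
    return None


def fake_measure(n):
    """Synthetic path: goodput saturates early, responsiveness keeps falling."""
    return min(25 * n, 100), max(1000 - 100 * n, 200)


stop_at = connections_until_stable(fake_measure)  # 11 with this synthetic data
```

With this synthetic data, goodput alone looks stable after 7 connections, while responsiveness only settles after 11, which is the kind of underestimation the change is meant to avoid.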

Issue #17:
Using a well-known URI, like .well-known/responsiveness.json. Servers
could implement this generically.

Issue #55:
What if the results don't stabilize? When should the test stop if there
are fluctuations?

Issue #66:
Allowing non-TLS measurements for low-end devices that can't generate
enough TLS traffic to saturate the link (e.g., doing TLS at 1 Gbps is
beyond their capability).

Al: Have you looked at the measurements we shared on the IPPM list? They
point to the underestimation that you're talking about. The capacity
measurements saw more latency being generated. The Go responsiveness
tool measured a much lower capacity.

Stuart: Are you saying that the algorithm or the implementation is
flawed?

Al: Both.

Stuart: The software and draft are still in development. The existing
code stops measuring once throughput stops changing, but that isn't
enough.

Al: TCP with multiple connections has some instabilities with what it
measures, and it can have variations and interaction problems. Matt
Mathis can also help explain. Your changes to the algorithm may work out
very well, but the current one is too low for both throughput and
latency.

Stuart: Are you saying TCP is the wrong protocol?

Al: Just pointing out that when you have multiple TCP connections,
instability can result.

Martin: You mentioned use case of measuring buffer bloat to WAP. Isn't
that a single hop?

Stuart: Buffer bloat is in the Wi-Fi chipset in the access point. This is
the slowest hop, where the queue builds up. Can have 0.5s, 1s, 5s
buffering.

Martin: Are you directly connecting to the access point or to something
beyond it?

Stuart: Both. In the case of multiple devices, each device could run a
little web server which would allow the client to find out where the
bloat is. (E.g., we have also observed bloat in the ISP.)

Tommy: Is well-known URI the URL of the test? What happens in the
non-TLS case?

Stuart: Previously we've been thinking too much in binary: either
clear-text or TLS. It's not that these devices can't support TLS at all,
but they can't support TLS at a high rate. Configuration block will
always be fetched over TLS but the high-rate traffic can be unencrypted.
Good idea.

Tommy: Currently your test is always done explicitly over HTTP/2. When
unencrypted, that won't be an option, so it will need to move to HTTP/1,
which will have different perf characteristics.

Stuart: I know the intention is that running HTTP/2 over a secure
connection is not optional. However, for this benchmarking traffic, there
is nothing about HTTP/2 that says you can't run it over TCP.

Tommy: You can (run it over TCP), but not sure we want to encourage
implementations to build the unencrypted HTTP/2 mechanism.

Stuart: Not as simple as just moving to HTTP/1. The test works by
fetching a large chunk of data and then sending a GET for a 1-byte
object halfway through. That's how we measure. This ability isn't
present in HTTP/1. Will need to think about the right way to do this.

Tommy: Curiosity: How are you inducing the HTTP stacks to create
multiple TCP connections when they're doing HTTP/2?

Stuart: Best understanding of this is that on macOS/iOS it's using the
URLSession APIs. This code is explicitly creating multiple sessions.
Don't know about the Go implementation.

Tommy: May be worth a note to implementers along these lines.

Stuart: Good point. A client may not realise that multiple requests are
being coalesced.

Marcus (as individual): Have you considered these tests in certain
mobile scenarios? E.g., you're on a high-speed transit, you start a
measurement while you're on one cell, and then hand over. E.g., handing
over from 4G to 5G, you're moving to a cell with higher capacity and you
suddenly see the delay dropping.

Stuart: Simple answer is we haven't considered these kinds of changing
environments. For usability reasons, our goal is that the test completes
in 10 seconds. Keeping it short reduces the opportunities for
conditions to change. Can't imagine a fully general solution to the
situation you've described; the same scenario could arise if you're
sharing a connection with another client and they finish a large
download half-way through.

Stuart (cont): At a human level, hopefully if you run the test a few
times you get consistent results. Focus of this test is not to discover
bandwidth (many existing tests for that) but to measure buffering.
Buffering tends to be a configuration option of the network hardware, so
expect consistent behavior/results.

draft-ietf-ippm-connectivity-monitoring

(Skipped, no update)

draft-ietf-ippm-ioam-data-integrity

Justin Iurman

No changes since IETF 114; we were waiting on implementation feedback
(issues with students working on implementation). Implementation was
delayed, and will delay a bit further. Should have updates by IETF 116.

Received a review from IANA, need to reply.

Probably should kick off early allocation with IANA.

draft-ietf-ippm-ioam-yang

Tianran Zhou

Main issue is to add examples. Other minor issues were raised.

WGLC raised many discussion points, which largely were questions about
IOAM — is DEX included, etc. The document will be updated to have the
YANG model only match the IOAM data document. It will thus only look at
the configuration, and not the data export model.

Should the configuration of IOAM over IPv6 or NSH be included? Thinking
yes.

Frank: Two comments, the first about whether to include an example YANG
model. We could create a full example, but we also have to recognize
there's no implementation so it would be made up. How useful would this
be? Could consider moving the document to experimental? If we look at
other management documents recently, there are no examples. Is there
guidance on whether we want to include a full example?

Frank (cont): Also, would be very much in favour of focusing the scope
on the core IOAM document and avoid references to other documents that
are in flight.

Greg: I understand the desire to move this along, but I think DEX is
important for many constrained environments. If there is no effort to
cover DEX in a YANG model, this is a risk. IOAM DEX is looking useful
for MPLS work, etc. So having it covered here is important.

Tianran: We may need consensus on how to operate DEX.

Greg: I think we should have this discussion before we decide how to
progress this work.

draft-ietf-ippm-stamp-srpm

Rakesh Gandhi

Addressed comments from the list, and added IANA early allocation code
points. No open issues.

Work progressing in SPRING that relies on this, and an individual
proposal in MPLS. Please review this also.

Requesting WGLC.

Proposed Work

draft-mhmcsfh-ippm-pam

Greg Mirsky

Precision Availability Metrics (PAM)
Service Level Objectives are looking at key performance metrics; the
entire history isn't needed, but violations of the SLO are.
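
To make the idea concrete, here is a minimal sketch that keeps only the intervals in which a metric violated its SLO rather than the full measurement history. The fixed-length interval scheme, the delay threshold, and the function name are assumptions for illustration and are not the draft's metric definitions.

```python
def violated_intervals(delays_ms, slo_ms, interval_len):
    """Return indices of fixed-length intervals with any delay above slo_ms.

    delays_ms is a flat list of per-packet delay samples; consecutive
    groups of interval_len samples form one interval (hypothetical scheme).
    """
    violated = []
    for start in range(0, len(delays_ms), interval_len):
        chunk = delays_ms[start:start + interval_len]
        if any(d > slo_ms for d in chunk):
            violated.append(start // interval_len)
    return violated


samples = [10, 12, 11, 48, 13, 12, 11, 10, 55, 60, 12, 11]
bad = violated_intervals(samples, slo_ms=40, interval_len=4)
# bad == [0, 2]: only the first and third intervals breach the 40 ms SLO
```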

Got comments from Adrian Farrel, added him as author.

One use case for this is in network slices (see work in TEAS).

Open question about individual packets that violate SLOs (counts of
violations, etc., may be added in the future).

A WG adoption call was made, but it did not reach a clear conclusion.

Tommy: Got some support, but not much detail. One review from Benoit
objected, but that thread didn't progress after Greg responded.
Suggesting to update the document to cover those comments, and then
we'll do a new call for adoption.

Lightning Talks

draft-zhou-ippm-enhanced-alternate-marking

Giuseppe Fioccola

Alternate Marking bis documents are in RFC editor queue. Some pending
points about bits that could be used are covered here.

However there are some extra considerations:

The draft aims to consider all these aspects.

draft-wang-ippm-ipv6-distributed-flow-measurement / draft-wang-ippm-ipv6-flow-measurement

Haojie Wang

Propose a distributed flow performance measurement method without a
controller. Results could be used by the router to select the forwarding
path that meets the SLA.

To flexibly deploy on-path flow performance measurement based on the
alternate marking method with the participation of a controller. A node
ID and steerable measurement period are enabled.

draft-cnbf-ippm-user-devices-explicit-monitoring

F. Bulgarella, M. Nilo

Proposal to put the explicit flow measurement probes directly on the
user devices.

Advantages:

Device owner decides whether to mark traffic and whether to share
performance data.

Martin: Right to consider privacy considerations. But concerned that
this might amount to a mandatory forfeiture of privacy.

draft-mirsky-mpls-stamp

Greg Mirsky

Extend LSP ping to bootstrap a STAMP session.

Similar to how LSP ping is used to bootstrap a P2P BFD session. Also
ongoing work to bootstrap point-to-multipoint BFD.

draft-weng-ippm-srpm-path-consistency-over-srv6

Sijun Weng

Measure SRv6 policy through STAMP and TWAMP Light, and check that
transmission paths of test+reply packets are consistent.

Problem with inconsistent path: delay+loss measurement is inaccurate.

Rakesh: There is a draft in SPRING that talks about using STAMP for SR
networks and has a path-segment component. Let us know if there is
something missing.

draft-wang-ippm-alt-mark-yang

Minxue Wang

Diversity of service types and SLAs in the 5G network brings challenges.
Need a consistent YANG model for alternate marking.