ICE WG IETF 96 minutes

Note takers: Marc Petit-Huguenin, Peter Thatcher

Recording at https://www.ietf.org/audio/ietf96/ietf96-schoeneberg-20160721-1000.mp3

draft-ietf-ice-rfc5245bis

(3:03) Christer H: Review of changes since IETF 95.

(4:43) Emil Ivov: (On sending before nomination) We need to make sure that we don't send media before an ICE check response is received.

(5:37) Martin Thomson: If you don't have a check response, you don't have consent to send, which is covered in the consent check document, but I'm not sure if it's in 5245.

... some discussion ...

(6:12) Ari Keränen (from the floor): 5245 says a candidate pair needs to be validated, and it needs to receive a check response to be validated. So it's in there.

We think we are good, but:

(6:24) Action Item: Emil will double-check that 5245bis says that you can't send media until after an ICE check response.

(6:30) Christer Holmberg: Continuation of changes since IETF 95.

(7:26) Cullen Jennings: (on 5ms min Ta value) This doesn't work. Will talk about this in the next presentation.

... some discussion ... agree to come back to this

(9:15) Christer H: Continuation of changes since IETF 95.

(12:13) Cullen Jennings: (On freezing) Freezing is very important. You can skip it only if you are also doing bundle. You must do either bundle or freezing; you can't have unbundled media without freezing.

(12:49) Peter Thatcher: I used to support removing frozen candidates but am no longer strongly supportive of removing it. 

(13:25) Consensus: Keep Frozen candidates (no one raised hands to remove the frozen algorithm)

(14:00) Christer H: Brings up a problem with the unfreezing algorithm.

(15:09) Emil Ivov: describes a solution to the unfreezing algorithm. Presents a table to make it clear. The solution is to unfreeze the "first" candidate pair for every foundation (column), even if they aren't in the same checklist (row). The problem is that if we unfreeze an entire checklist (row), it kills the freezing optimization. But if we don't unfreeze an entire checklist (row), then in certain edge cases (such as when the first checklist (row) has missing candidate pairs for a given foundation (column)) some candidate pairs never get unfrozen. If we unfreeze the "first" candidate for every foundation (column) then we won't have the problem with those edge cases and we can remove the rule to unfreeze the whole checklist.
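To make the rule concrete, here is a minimal Python sketch of unfreezing the "top surface". It assumes, for illustration only, that checklists are given in declarative order (the rows of Emil's table), that each pair carries its foundation (the column), and that within a checklist the preferred pair for a foundation is listed first. `Pair` and `unfreeze_top_surface` are hypothetical names, not text from 5245bis.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    # Hypothetical minimal candidate pair: only what the rule needs.
    foundation: str
    state: str = "Frozen"


def unfreeze_top_surface(checklists):
    """Unfreeze the first pair of every foundation across all checklists.

    Rows (checklists) are scanned in declarative order, so the pair that
    gets unfrozen for a foundation is the topmost one in its column, even
    when the first checklist has no pair with that foundation.
    """
    seen = set()
    for checklist in checklists:
        for pair in checklist:
            if pair.foundation not in seen:
                seen.add(pair.foundation)
                pair.state = "Waiting"
    return seen
```

In the edge case Emil raised, the first checklist lacks foundation "C", so C's topmost pair in the second checklist is unfrozen instead of never being reached, and no whole-checklist unfreeze is needed.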

(17:57) Cullen Jennings: There is text somewhere that says that once all the unfrozen pairs are checked, unfreeze the rest. So in the classic algorithm, those pairs will still be unfrozen.

(18:37) Emil Ivov: I haven't been able to find any text like that, except to unfreeze the whole checklist.

(18:43) Cullen Jennings: We need to have some text that says once you've checked everything unfrozen, unfreeze things. Your solution may be better. 

(19:03) Jonathan Lennox: Yes, there is such text, but it's per-checklist (row), not for the whole session. Which is clearly not what we want. What we want is for it to be for the whole session. Maybe that's what it meant to say, and that's reasonable.

(20:10) Emil Ivov: This is actually a separate thing (to unfreeze when everything unfrozen is checked). Implementations should be able to choose to do that, or give up.

(20:53) Jonathan Lennox: Any implementation can give up at any time.

... some discussion ...

(21:40) Cullen Jennings: Great table, thank you for making it. On the idea of unfreezing the "top surface", I'm not sure it achieves what we want. What we want to do is check a given row and then try the same approach for the other rows. So I don't think we want to check some of one row and some of another at the same time. So I'm not sure we 

(23:00) Jonathan Lennox: I'm more worried about the distributed media case. And in that case, this is much better. I want to start checks on both at the same time. 

(23:25) Emil Ivov: This solves the distributed case and the other (weird) case.

(23:30) Martin Thomson: The solution is correct. It works for both cases. But I do not understand how you will make it work with trickle, where you have holes.

(24:20) Emil Ivov: With trickle, the idea is to keep the same rule and always unfreeze the thing in every column (foundation) as they arrive.
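A minimal sketch of applying the same rule as pairs trickle in, assuming each pair knows its checklist index (row) and foundation (column), and that a newly arrived pair is unfrozen whenever it becomes the topmost known pair in its column. `TrickleUnfreezer` and its method are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    checklist_index: int  # row: position of the checklist in declarative order
    foundation: str       # column
    state: str = "Frozen"


class TrickleUnfreezer:
    """Apply 'unfreeze the top of each column' as pairs trickle in."""

    def __init__(self):
        # foundation -> lowest checklist index seen so far for that foundation
        self.top_row = {}

    def on_new_pair(self, pair):
        top = self.top_row.get(pair.foundation)
        if top is None or pair.checklist_index < top:
            # The new pair is now the topmost in its column: unfreeze it.
            self.top_row[pair.foundation] = pair.checklist_index
            pair.state = "Waiting"
        return pair.state
```

Trickling in declarative order unfreezes only the top pair of the column. Trickling bottom-up makes every arriving pair the new top, so every pair ends up unfrozen, which is the pathological case Martin asks about next.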

(25:00) Martin Thomson: Is there a pathological case where things are trickled from the bottom up and everything is unfrozen?

(25:05) Emil Ivov: Yes, there is.

(25:10) Martin Thomson: Is there any technique to avoid that?

(25:15) Emil Ivov: Not worth the hassle.

(25:20) Martin Thomson: Jonathan says trickle in order... we should write that down.

...

(26:14) Emil Ivov: Action Item: Make sure there is a clear recommendation to try to trickle in declarative order.

(26:30) Tim Panton: What is the order?

(26:50) Emil Ivov: In SDP, it's defined in SDP.

(27:00) Jonathan Lennox: The order is defined by the use case. For SDP, it is the order in the SDP; for others, you have to define it.

(27:20) Cullen Jennings: I'm convinced that it is a good solution to the problem. 

Apparent Consensus: Unfreeze the "first" candidate in each foundation (unfreeze the top surface).

Ari Keränen: Should we put the table in 5245bis? 

(27:47) Martin Thomson: This is editorial, but having an example with the table would be good. 

(28:23) Emil Ivov: It's more than editorial. I am willing to make an example. Do we also want to rewrite the definition, to make it normative? Or are things good enough?

(29:21) Cullen Jennings: That would be a major surgery on the doc.

(29:55) Action Item: Emil to create a pull request, and provide an example (by the first week of August).

(31:00) Peter Thatcher: Do we still want to add text to unfreeze when everything unfrozen is checked?

(31:11) Jonathan Lennox: Yes, we need to change the text to unfreeze per-session not per-row. .... I think we have consensus on that and we can wordsmith.

(31:50) Cullen Jennings: We need two changes: 1. That (new text for unfreezing when everything unfrozen has been checked) and 2. Emil's idea to unfreeze the "highest thing".

(32:10) Peter Thatcher: I believe we have consensus on #2, and still need it on #1.

(32:20) Jonathan Lennox: Anyone object?

(32:30) Emil Ivov: As long as unfreezing in this way is not required, because I want to be able to not do that. I want to be able to give up instead.

(32:52) Jonathan Lennox: Giving up is fine. You can always give up.

(33:00) Action Item: Make sure there is clear text that you can quit at any time.

(33:23) Cullen Jennings: There seems to be consensus on what Jonathan proposes (unfreezing everything when everything unfrozen is checked) and that we should have clear text that we can quit at any time.

(34:04) Martin Thomson: We need to make sure we don't unfreeze things that we didn't mean to unfreeze in the first place. There is a certain case ... ... discussion off mic ...

(35:10) Emil Ivov: There's no way you'd end up in the situation.

(35:20) Jonathan Lennox: Yes, you can end up in that situation. ... Maybe add text that says once you've nominated, everything that's still unfrozen goes to Failed; that is, once something is nominated, everything that hasn't been checked is failed.

(36:11) Martin Thomson: When something comes out of waiting and you're about to start checking, see if there's something on the component (row) that's higher priority, in which case you can abandon that pair. 

(36:30) Jonathan Lennox: But priority order is not the same as row/component order. That's the whole point of regular nomination. But once something is nominated, then you can stop.

(37:09) Emil Ivov: Should we be concerned about the order?

(37:30) Jonathan Lennox: Let's not design at the mic.

Consensus: Start unfreezing everything after everything unfrozen has been checked, and make it clear that you can quit at any time.
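A minimal sketch of that consensus, assuming the agent looks at every checklist in the session (not one checklist at a time) and may give up whenever it likes. `Pair`, `next_step` and the returned strings are hypothetical, chosen only to show the decision order.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    name: str
    state: str  # "Frozen", "Waiting", "In-Progress", "Succeeded" or "Failed"


def next_step(checklists, give_up=False):
    """Decide the agent's next action by looking at the whole session."""
    if give_up:
        # An agent may stop checking at any time.
        return "give up"
    pairs = [p for checklist in checklists for p in checklist]
    waiting = [p for p in pairs if p.state == "Waiting"]
    if waiting:
        return f"check {waiting[0].name}"
    if any(p.state == "In-Progress" for p in pairs):
        return "wait for responses"
    frozen = [p for p in pairs if p.state == "Frozen"]
    if frozen:
        # Nothing unfrozen is left to check anywhere in the session,
        # so unfreeze all remaining pairs at once.
        for p in frozen:
            p.state = "Waiting"
        return f"unfroze {len(frozen)} pair(s)"
    return "done"
```

With pair x in progress in the first checklist and pair y frozen in the second, the session-wide rule waits; under a per-checklist reading, y's checklist has nothing unfrozen and would be unfrozen immediately.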

(38:30) Jonathan Lennox (about ICE restarts not changing roles): I want to make sure that if the 5245 endpoint changes role and the 5245bis endpoint doesn't, things will work.

(39:00) As long as the 5245 endpoint does conflict resolution correctly, things should work just fine.

(39:30) Christer Holmberg: There is text about that.

(42:25) Cullen Jennings: Brings up issue of many concurrent ICE agents sending many packets without a limit. We definitely need a limit.

(46:30) Peter Thatcher: (from the floor) change the retransmission timers such that non-retransmissions have higher priority over retransmissions to make the many-ICE-agents use case work better. 
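A minimal sketch of Peter's suggestion, assuming one packet is sent per Ta tick and that a never-checked pair always goes ahead of a retransmission that is already due. `CheckScheduler` and its methods are hypothetical; this illustrates the proposal, not agreed text.

```python
import heapq
from collections import deque


class CheckScheduler:
    """One packet per Ta tick; new checks go ahead of due retransmissions."""

    def __init__(self):
        self.new_checks = deque()  # pairs that have never been checked
        self.retransmissions = []  # min-heap of (due_time_ms, pair)

    def add_new_check(self, pair):
        self.new_checks.append(pair)

    def schedule_retransmission(self, due_ms, pair):
        heapq.heappush(self.retransmissions, (due_ms, pair))

    def on_tick(self, now_ms):
        """Return what to send on this Ta tick, or None to stay idle."""
        if self.new_checks:
            return ("new", self.new_checks.popleft())
        if self.retransmissions and self.retransmissions[0][0] <= now_ms:
            _, pair = heapq.heappop(self.retransmissions)
            return ("retransmit", pair)
        return None
```

The retransmission for p1 below was due first but goes out one tick late; Martin's concern is that this kind of delay could hurt hole punching.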

(47:44) Martin Thomson: Peter is suggesting that new checks push aside retransmissions, but that might cause issues with hole punching. 

(49:10) Emil Ivov: Pacing does not take retransmissions into account. This is a problem in browsers; the minimum requirements should not go above that. The default value should be higher.

(50:07) Peter Thatcher: The global bit rate limit takes retransmissions into account, doesn't it?

(50:16) Cullen Jennings: Yes, it does. 

(50:20) Ari Keränen: It's in 5245. The RTO formula takes retransmissions into account.

(51:27) Ben Schwartz: Where is the bottleneck between all these network interfaces?

(51:45) Cullen Jennings: The bottleneck can be improved by Shaved ICE (make each packet smaller) and fewer retransmissions (we don't need 7; only 3-4) ... a strawman proposal: First, a global max bitrate; we won't make it past the IESG without that. Second, document the issue, and don't resolve the issue of multiple ICE agents in parallel in 5245bis; let RTCWEB or other groups resolve that in their contexts. Third, have a way to signal Ta in SDP.
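For a sense of what 7 versus 3-4 transmissions means, this sketch lists the send times of one STUN request under the RFC 5389 doubling rule (retransmit after RTO, doubling the interval each time, Rc sends in total). The 500 ms default is the generic STUN RTO; ICE derives its own RTO from Ta and the number of pairs being checked, so treat the numbers as illustrative. `retransmission_times` is a hypothetical helper.

```python
def retransmission_times(rto_ms=500, rc=7):
    """Send times (ms after the first send) for one STUN request over UDP.

    Retransmit after RTO and double the interval after every send, for a
    total of rc sends. The final wait before declaring a timeout is not
    modeled here.
    """
    times = [0]
    interval = rto_ms
    for _ in range(rc - 1):
        times.append(times[-1] + interval)
        interval *= 2
    return times
```

With the defaults the last send is at 31500 ms; with rc=4 it is at 3500 ms, so most of the traffic from a long-failing pair disappears.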

...

(57:35) Peter Thatcher: We should add back the global pacing, and it being bitrate-based. I like the idea to have this, but I'm not convinced of putting a number in 5245bis; I'm willing to hear opinions. I'd prefer not to block 5245bis on things we don't need in it.

(58:30) Tim Panton: Is packet rate more important than bitrate?

Cullen Jennings: For ICE, at these packet sizes and bitrate limits, the packet rate is very low anyway.

...

(1:01:00) Jonathan Lennox: A VPN looks like a separate interface but shares the same bandwidth. Per-interface logic is too hard.

(1:01:30) Martin Thomson: I ran through the same thought process as Jonathan. I can't see anything other than this (a global bitrate limit). I don't think we should wait for Shaved ICE. To Tim's point: we should worry about packet rate, but not at these rates.

(1:03:00) Ari Keränen: The ICE pacing negotiation parameter is already in ice-sdp.

(1:03:30) Cullen Jennings: Great, I missed that. About Peter's question: do we need a number here?

(1:04:01) Emil Ivov: We cannot set a minimum; the recommended value should be the only one kept.

(1:05:30) Cullen Jennings: We have to have a global bitrate of some sort to prevent DOS attacks.

(1:06:10) Ben Schwartz: Why not feed this back into the Ta calculation? It may be useful.

(1:06:30) Peter Thatcher: The packet size is not static. It is controlled by the ufrag size, which is specified in SDP.

(1:07:14) Martin Thomson: Having Ta negotiated is a good thing. Some recommendations are needed. There is a problem with RTCWEB where there's a gap between when you negotiate and when you start the ICE agent. We could just say "if you have multiple ICE agents, it will hurt; don't do that". The simplest is to run only one ICE agent at a time. Others can do more advanced things. But we don't need to do anything in this draft.

(1:08:58) Cullen Jennings: I don't know why we even have two values (recommended and min). Why wouldn't you just use the min?
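A rough sketch of how ufrag length drives the size of a connectivity check. It assumes a minimal Binding request carrying only USERNAME, PRIORITY, ICE-CONTROLLING or ICE-CONTROLLED, optionally USE-CANDIDATE, MESSAGE-INTEGRITY and FINGERPRINT (no SOFTWARE or other attributes); attribute sizes follow STUN (RFC 5389) and ICE (RFC 5245), where USERNAME is the peer's ufrag, a colon, then the sender's ufrag. The function names are hypothetical.

```python
def pad4(n):
    """STUN attribute values are padded to a multiple of 4 bytes."""
    return (n + 3) // 4 * 4


def binding_request_size(local_ufrag, remote_ufrag, use_candidate=False, ip_version=4):
    """Approximate on-the-wire size in bytes of one ICE connectivity check."""
    username = f"{remote_ufrag}:{local_ufrag}"
    size = 20                                        # STUN header
    size += 4 + pad4(len(username.encode("utf-8")))  # USERNAME
    size += 4 + 4                                    # PRIORITY (32-bit)
    size += 4 + 8                                    # ICE-CONTROLLING/-CONTROLLED (64-bit tie-breaker)
    if use_candidate:
        size += 4                                    # USE-CANDIDATE (empty value)
    size += 4 + 20                                   # MESSAGE-INTEGRITY (HMAC-SHA1)
    size += 4 + 4                                    # FINGERPRINT (CRC-32)
    size += 8 + (20 if ip_version == 4 else 40)      # UDP + IP headers
    return size
```

With 4-character ufrags (the minimum) the request is 116 bytes over IPv4 and 136 bytes over IPv6; with 16-character ufrags it grows to 140 bytes over IPv4.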

(1:09:16) Peter Thatcher: Neither of the browsers use the min because the packets are too large. But with Shaved ICE we can get the packets smaller, and use a lower Ta, closer to the min.

(1:10:30) Cullen Jennings: I think we shouldn't set a min at all. Just a bitrate limit.

(1:10:45) Martin Thomson: 5ms does not work. Shaved ICE could help, but not get down to 5ms. You could only fit the IP header. We can just update the number when we do Shaved ICE.
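The arithmetic behind the 5ms discussion can be checked directly. Assuming a single agent sends exactly one packet every Ta and ignoring retransmissions, the largest packet a budget allows is budget_bps * Ta / 8. The helper names are hypothetical; 64 and 256 kbps are the ends of the range mentioned at 1:24:30, and 116 bytes is only an illustrative size for a small IPv4 check.

```python
def max_check_bytes(budget_bps, ta_ms):
    """Largest packet that fits if one packet is sent every Ta within the budget."""
    return budget_bps * ta_ms // 8000


def check_bitrate_bps(packet_bytes, ta_ms):
    """Bitrate produced by sending one packet of this size every Ta."""
    return packet_bytes * 8 * 1000 // ta_ms
```

At 64 kbps and 5ms there are 40 bytes per packet, exactly an IPv6 header with nothing left for UDP or STUN; at 256 kbps the room grows to 160 bytes. Conversely, a 116-byte check every 5ms is about 186 kbps, while at 20ms it is about 46 kbps.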

(1:12:10) Peter Thatcher: We need to get consensus on having a number, and then we can decide on a number.

(1:12:30) Cullen Jennings: The transport people won't be happy without a number.

(1:12:45) Jonathan Lennox: Transport people should be in the loop now.

(1:12:55) Cullen Jennings: I sent the Transport people a number, and have gotten zero reply so far. 

...

(1:13:30) Cullen Jennings: ... with simulations, we found that an unlimited buffer is bad. It needs to drop or have a small buffer.

(1:14:30) Bernard Aboba: Two separate things: DDOS protection and congestion control. The retransmission backoff gives us congestion control. The global limit gives us DDOS protection.

...

(1:15:00) Ari Keränen: We started a conversation with the transport guys previously (a year ago or so), and they didn't raise issues.

(1:15:30) Cullen Jennings: I think we need more input from them. Let's try and do something this week. 

(1:16:00) Varun Singh: There should be some guidance for application developers about when the limit is reached.

(1:17:30) Cullen Jennings: I think we just need to warn people that simultaneous PeerConnections isn't going to work well. Either do things sequentially, or set the Tas properly.

(1:18:15) Varun Singh: App developers who don't know what they are doing are going to need some recommendation, or they are going to run into the same problems over and over.

(1:18:45) Cullen Jennings: The problem might be in another tab, so in some cases, there's nothing the app could do.

(1:19:45) Emil Ivov: There is nothing the browser can do for the multiple tabs scenario.

(1:20:00) Cullen Jennings: There might be a way, here's a thought experiment ...

(1:20:45) Emil Ivov: It's more complicated than that ...

(1:21:30) Let's get back on topic of the bitrate limit. Chrome has a 256 kbps limit.

(1:22:00) Martin Thomson: A 256 kbps limit is a lot. We have a 128 kbps limit in Firefox. We need to pick a number, but also a window in which to enforce it.
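A minimal sketch of a global limit with an enforcement window, assuming one limiter is shared by every ICE agent in the process and that packets over budget are dropped rather than queued, in line with the simulation result reported at 1:13:30. The class name and the window length are hypothetical; no window was agreed.

```python
from collections import deque


class GlobalStunRateLimiter:
    """Sliding-window byte budget shared by all ICE agents in a process."""

    def __init__(self, limit_bps, window_ms):
        self.window_ms = window_ms
        self.budget_bytes = limit_bps * window_ms // 8000
        self.sent = deque()  # (timestamp_ms, size_bytes) of packets in the window
        self.used = 0

    def try_send(self, now_ms, size_bytes):
        """Return True if the packet may be sent now, False if it must be dropped."""
        # Forget packets that have slid out of the window.
        while self.sent and self.sent[0][0] <= now_ms - self.window_ms:
            _, old = self.sent.popleft()
            self.used -= old
        if self.used + size_bytes > self.budget_bytes:
            return False  # over budget: drop, do not buffer
        self.sent.append((now_ms, size_bytes))
        self.used += size_bytes
        return True
```

For example, GlobalStunRateLimiter(128_000, 1000) gives a 16000-byte budget per second. A shorter window reacts faster to bursts from many agents starting at once; a longer one tolerates short bursts.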

(1:24:30) Peter Thatcher: It looks like our range is about 64kbps to 256kbps. 

(1:24:45) Cullen Jennings: It's highly unlikely we pick a number outside that range.

(1:25:00) Ari Keränen: We can't spend more time on the list. We need to pick a number now.

(1:25:30) Cullen Jennings: Proposal: 1. Add a global rate limit of 250 kbps, 2. Remove the Ta min and recommended Ta, 3. Add some warning text that multiple concurrent ICE agents won't work well, but don't fix it here.

(1:26:55) Martin Thomson: Slightly different proposal: I would prefer to keep the SDP signal and keep the Ta value to keep the two sides in sync.

(1:27:30) Ari Keränen: We already have the SDP signal. What Cullen is suggesting is that we remove the Ta values.

(1:28:30) Christer H: What if the remote does not send a value? We need to have a Ta for that case.

(1:29:00) Cullen Jennings: That number has to be the same one in the original RFC.

(1:29:30) Peter Thatcher: We have three options. 1) do nothing, 2) add global bitrate, 3) add global bitrate and remove Ta.

(1:29:50) Cullen Jennings: I'm fine with #2 and #3, but not #1. Slight preference for #3.

(1:30:15) Consensus call for adding a global max bitrate. Result: 10 for, 0 opposed.

(1:30:45) Consensus for adding a global max bitrate

(1:31:00) Consensus call for removing Ta values.

(1:31:45) Martin Thomson: This document only needs to say that both sides agree on the Ta. It doesn't need to say how that number is chosen.

...

(1:31:30) Ari Keränen: The SDP doc in MMUSIC can cover the backwards compatible Ta. This doc doesn't need to cover that.

...

(1:34:20) Ben Schwartz: Will implementations need to follow the negotiated Ta, even when they know they will hit the global rate limit?

(1:35:00) Cullen Jennings: Yes, you still need to follow the Ta. You can renegotiate the Ta. 

(1:35:20) Martin Thomson: The 20ms in 8.1 that is still in the draft is a mistake.

(1:35:45) Christer H: Yes, the 20ms is in the appendix, copy-pasted from the RFC, and should be removed. It's mentioned nowhere else.

(1:36:45) Emil Ivov: Are we removing all values?

(1:37:00) Cullen Jennings: Why have a limit on Ta when we have the bitrate limit now? But I feel like this is a bikeshed thing. Maybe the simplest thing is to just not change the draft.

(1:38:00) Emil Ivov: Just dropping the values and figuring it out later scares me. It's like leaving the EU. I'm fine with the global rate limit. But I'm afraid of dumping the Ta value.

(1:38:45) Peter Thatcher: I agree with Emil to not change the draft by removing the Ta.

(1:39:30) Jonathan Lennox: Unlikely that you can renegotiate the Ta in the middle. You don't know when your peer got the new Ta limit.

(1:40:30) Peter Thatcher: We need a number. Is 256kbps OK?

(1:40:45) Cullen Jennings: We need feedback from the transport guys. We should tell them we chose this number because we want a few PeerConnections to work. And making it less would degrade user experience.

Christer H: Can you provide some text? 

(1:42:40) Cullen Jennings: Maybe this week, but if not, it will have to be after August.

(1:42:45) Consensus: Keep the min and recommended Ta values.

(1:43:10) Mo Zanaty: Browsers are already starting media at 250kbps, so starting ICE at 250kbps makes sense.

Jonathan Lennox: Data channel?

(1:43:45) Martin Thomson: We should just add the text without waiting for transport feedback.

(1:44:00) Cullen Jennings: Agreed. Let's add something here.

(1:44:15) Martin Thomson: With HTTP using 6 connections right now, the responses that come back almost hit a megabit. That's much more than we're asking for.

Consensus: 250 kbps as basis for discussion with transport people.

Action Item: Talk to the transport guys about what the limit should be

(1:45:45) Ari Keränen: Who is willing to review? 

Reviewers:

- Jonathan Lennox

- Emil Ivov

- Cullen Jennings (maybe)

- Marc Petit-Huguenin (stunbis parts)

draft-ietf-mmusic-trickle-ice

(1:49:45) Emil Ivov presenting the remaining issues. Added the signaling of ufrag to differentiate candidates between ICE restarts.

(1:50:50) Jonathan Lennox: ICE needs to make it clear which candidates pair with which other candidates. SDP offer/answer gets this for free. In Jingle, this is broken.
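A minimal sketch of using the signaled ufrag to decide which ICE generation a trickled candidate belongs to, assuming each trickled candidate arrives tagged with the ufrag it was gathered for. `RemoteCandidates` and its methods are hypothetical; for simplicity, candidates for any ufrag other than the current one are rejected, although a signaling protocol might instead buffer candidates for a restart it has not yet seen.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteCandidates:
    """Track which ICE generation (identified by ufrag) trickled candidates belong to."""
    current_ufrag: str
    candidates: list = field(default_factory=list)

    def restart(self, new_ufrag):
        # An ICE restart starts a new generation; old candidates no longer pair.
        self.current_ufrag = new_ufrag
        self.candidates.clear()

    def on_trickled(self, ufrag, candidate):
        """Accept the candidate only if it belongs to the current generation."""
        if ufrag != self.current_ufrag:
            return False  # another generation: do not pair it
        self.candidates.append(candidate)
        return True
```

A candidate from before the restart that arrives late is not paired with the new generation's candidates.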

(1:52:30) Peter Thatcher: That's like telling people to do smart things, to eat breakfast. It's so obvious. I'm not opposed to say "do smart things".

(1:52:55) Action Item: Add "you need to have that covered" (make it clear which candidates pair with which)

(1:53:00) Emil Ivov presenting again, with the unfreezing rules.

(1:53:45) Peter Thatcher: It's more clear with a table.

(1:54:15) Jonathan Lennox: It's the signaling protocol's responsibility to define the order (or JSEP in WebRTC).

(1:53:00) Emil Ivov presenting again, with duplicate candidates. We originally said "keep the higher priority one", but that causes issues when the two sides have different priorities.

(1:59:00) Jonathan Lennox: OK with this, but I want to make it clear these are peer-reflexive candidates discovered through a connectivity check, not a peer-reflexive candidate from signaling. Signaling trumps connectivity checks.

(2:00:30) Peter Thatcher: Does it matter? The priorities will still be the same on both sides.

(2:00:50) Jonathan Lennox: Yeah, probably.

(2:01:00) Tim Panton: What does replacing mean?

(2:02:00) Action item: Clarify what "replacing candidate" means. Update the priority and re-sort the checklist, as if you had removed and re-added it.
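A minimal sketch of "replacing" as described in the action item: when a signaled candidate replaces a peer-reflexive one learned from a check (signaling trumps connectivity checks), update the remote priority and re-sort the checklist as if the pair had been removed and re-added. The pair priority uses the RFC 5245 formula 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D?1:0), with G the controlling agent's candidate priority and D the controlled agent's. `Pair` and `replace_remote_candidate` are hypothetical names.

```python
from dataclasses import dataclass


def pair_priority(g, d):
    """RFC 5245 pair priority; g = controlling side's priority, d = controlled side's."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


@dataclass
class Pair:
    local_priority: int
    remote_address: str
    remote_priority: int
    controlling: bool  # True if the local agent is the controlling agent

    @property
    def priority(self):
        if self.controlling:
            return pair_priority(self.local_priority, self.remote_priority)
        return pair_priority(self.remote_priority, self.local_priority)


def replace_remote_candidate(checklist, address, new_priority):
    """Replace a remote candidate as if it had been removed and re-added."""
    for pair in checklist:
        if pair.remote_address == address:
            pair.remote_priority = new_priority
    # Re-sort so the checklist stays in descending pair-priority order.
    checklist.sort(key=lambda p: p.priority, reverse=True)
```

Because both agents compute the same pair priority from the same pair of candidate priorities, the re-sorted order agrees on both sides once the signaled priority is used.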

(2:03:30) Emil Ivov: Any more open items with trickle ice?

(2:04:00) Christer H: What about candidate removal?

(2:04:45) Peter Thatcher: I don't think there needs to be anything in trickle-ice to handle candidate removal. A future candidate removal doc can handle that.

(2:06:00) Ari Keränen: Who will review?

Volunteers to review:

- Tim Panton

- Cullen Jennings will ask the rtcweb chairs for reviewers, who ended up being:

- Magnus Westerlund

 

draft-thatcher-ice-renomination, draft-thatcher-ice-network-cost, draft-thatcher-ice-remove-candidate

(2:08:00) Peter Thatcher: Presents 3 drafts. Wants to ask "how much of this should the WG work on"?

(2:16:15) Peter Thatcher: Are people in the WG interested in working on these things in this working group?

(2:16:40) Varun Singh: I would like to have the network ID. Not really sure that the network cost would be accurate, and not convinced of where the value is.

Cullen Jennings: The assumption that the client can generate the cost is the problem ...

(2:19:30) Ben Schwartz: Available through an API on Android.

(2:20:12) Varun Singh: Seems brittle, for example in an international context. My 3G is free until I go to the US.

(2:20:30) Peter Thatcher: This draft just defines how to signal the information, not about how it's calculated.

(2:21:00) Bernard Aboba: Of the three things, the most essential are renomination and removal (and TURN mobility). With those, you can do quick switchovers. We implemented basically those things as well. One complication is that occasionally you still need an ICE restart. We also implemented something like network cost, but not exactly network cost.

(2:23:00) Emil Ivov: Probably not the exact same renomination and exactly the same candidate removal.

(2:24:00) Peter Thatcher: With this draft, you can renominate any time.

(2:24:45) Emil Ivov: All of this is twisting the original ICE design. These additions take a lot of effort and delay an ICE redesign. A redesign is due, covering all the ideas we have: removing streams and components (we now have bundle), removing freezing, removing ICE roles, keeping multiple paths, ... There's potential to simplify ICE a lot. If we don't do that, these additions will result in unjustified complexity in implementations.

(2:26:00) Varun Singh: I really love renomination and candidate removal.

(2:26:36) Göran Eriksson: I think we should work on mobility. We need to consider what's in this WG. About network cost: don't assume that anything about the network interface has anything to do with network cost.

(2:27:00) Cullen Jennings: Do not start anything new until past WGLC on icebis and trickle-ice. Also, adding hacks to the current things seems like not the right way to do this. Instead, I propose we have a multi-day workshop where we get a whole bunch of people together and brainstorm. Do we do incremental improvements, or do we start over from scratch?

(2:29:00) Jonathan Lennox: Solving "walk out the WiFi" is important, but you seem to understand ICE very differently than I do, which is a bad sign for ICEbis.

(2:29:30) Mo Zanaty: I agree with all three of these things, and the order to do them in. But I have concerns about the network costs and the use cases around it.