TCPM meeting, IETF-85, Atlanta, GA, USA,
Tuesday, November 6, 09:00 - 11:30
---------------------------------------------------------------------------

Chairs: Michael Scharf
        Yoshifumi Nishida
        Pasi Sarolahti

Note takers: Andrew McGregor
             Gorry Fairhurst

WG Status
---------

No comments / discussion

Working Group Items
-------------------

* RFC 1323bis
  draft-ietf-tcpm-1323bis
  Speaker: Richard Scheffenegger

Kevin: Is it worth updating the introductory material?
Matt Mathis: The document has a lot of historical curiosities. I thought of another bug too: nowhere does it say that if you are doing PAWS you must not fragment.
Mirja Kuehlewind: If nobody has read the document completely, maybe it is worth shortening it?
Matt: Almost half of the text is "Well, we thought about this...", maybe the whole lot could be "refer to 1323 for history".

* TCP Fast Open
  draft-ietf-tcpm-fastopen
  Speaker: Yuchung Cheng

Bob Briscoe: Idempotency material. Originally TCP was carefully designed to be completely reliable, and now we are asking application developers to make a subtle decision. So we should make it very clear when it will work and when it will not. This is from the server's point of view, but there will be consequences for the client. The server processing SYN+DATA twice might end up causing issues for the client.
Yuchung: This can happen with two different connections already.
Bob: Yes, but there's still an issue here. "Oh, it will only be a problem sometimes" isn't good enough. We don't know whether this will be a small problem occasionally.
Yuchung: I don't know how to do better than this to explain what the problem is. It doesn't seem to be a high probability event.
Jim Gettys: GET is idempotent, there's no guarantee anything will happen.
Bob: Whether the application is idempotent isn't the issue, we're talking about the transport duplicating packets, which is subtle. I'd like to help with wording here.
Gorry Fairhurst: We need a better applicability statement that describes thoroughly when an application developer might want to use this. What are we actually doing about the cached state? The wording seemed woolly; I'd like to make it a SHOULD.
Yuchung: I think SHOULD is fine, but I will discuss with the other authors.
Gorry: There seems to be a case where this is much more aggressive than normal TCP, where you can get a whole IW immediately after the SYN. IW3 I don't really care about, but IW10 is hugely more aggressive. The IW might change with time. Is that actually the right amount of data to send?
Yuchung: After an RTT, multiple browser connections send full IW. Fast Open only moves this 1 RTT earlier.
Gorry: Yes, but that RTT also moves it before you've sent any packets at all over this path.
Yuchung: I still don't see this as being all that serious; the only case is a SYN+ACK being dropped, and fast open is more aggressive than that case. Is that serious? I don't think so, but I don't have data to prove that either.
Gorry: I think it would be nice to have this discussion on the list and see if something develops.
Yuchung: I'm going to put this concern into the draft.
Matt: I've been trying to find a scenario where it matters. There's no difference in action on the part of the server, so it is hard to construct a case where it matters at all. If there are multiple connections, the bursts might be spread.
Bob: The one case where the extra time is used is when, having sent some packets, there is congestion; then you can do something different having experienced it. So, on the timestamp: I think there is still a question to be answered. We have an attack and a proposal for dealing with it; we should define what should happen when a client uses timestamps. Should the attack become common, then all the machinery for dealing with it is already in place.
If a client uses a TS and the server responds with a TS-based cookie, if the client echoes the TS next time, then the server can choose whether to use the TS or not unilaterally.
Yuchung: Sure.
Michael Scharf (from Jabber): What happens with simultaneous open in both directions?
Yuchung: I don't think we have thought about this. I'm assuming, since SYN+DATA is in 793, it has been thought of.
Jana Iyengar: I don't think you can just dismiss that...
Yuchung: No, sure, I will check this case out.

More Accurate ECN discussion
----------------------------

* Problem statement and draft progress
  draft-kuehlewind-tcpm-accecn-reqs
  draft-kuehlewind-tcpm-accurate-ecn
  draft-kuehlewind-tcpm-accurate-ecn-option
  Speaker: Mirja Kuehlewind

Bob: Regarding the motivation. DCTCP has gone for a feedback mechanism that is flaky at best, and is deployed. There is a question of interoperability here, we want something that does work to be standardised. ConEx and DCTCP are two implementations that do not interop.
Rong Pan: Instead of marking multiple bits, can you not use one bit aggregated over a certain period? Is this a design option?
Mirja: This could be a design option for how you use this option.
Andrew McGregor: Density marking isn't necessarily delayed multiple RTTs as new estimates arrive.
Bob: Yes, but it is always delayed more than direct quantity feedback.
Bob: In response to Rong. Mirja is designing a congestion feedback channel, whereas I think you're trying to do congestion control. We're trying to create a reliable feedback channel.
Richard: We seem to need 5.6 bits per packet for optimal feedback to meet all the requirements... but we have 3.
Jana: You can always send accurate ECN feedback, but can you receive it? Making acks reliable... I hesitate to go that far. We talked about SACK as giving more accurate feedback, but SACK doesn't guarantee that all the information is received... that's OK because we know what the sender will do with that.
Same may be true for accurate ECN feedback.
Mirja: There may be no proposal that satisfies all issues.
Bob: We're not trying to make ACKs reliable, but to make the information carried in them reliable (by repeating it).
Jana: I'm trying to say, we need to be careful how much reliability we specify.
Mirja: We're trying to make as much information available as possible, in advance of having all the use cases.
Jana: Having all the information available would be nice, but it may not be necessary either.
Mirja: Full reliability doesn't seem feasible.
Bob: The current scheme only repeats, assuming you don't lose a whole RTT of ACKs.
Mirja: This is also a signal. (Randall: It's called an RTO.)
Brian: You have a requirement for n bits per RTT, but if you can relax that timeframe, you can have n bits per RTT and n+i in two.
Mirja: We proposed the counter mode, but that's rather complicated.
Matt: I think we have a circular process problem here. We really need to know what we're using the signal for before we can decide where we can tolerate inaccuracy. For example, what if we expect very high marking density? That would completely change the optimisation. Although it's useful to think about the signal independent of the action, we can't choose until we know the action. There are research avenues potentially cut off here.
Mirja: Are there 2 cases?
Matt: We may like to look at what is best in an Internet context: should we do more marks in this case also? All solutions will be approximations.
Bob: I don't think there's a particular process problem, this is experimental to allow the research paths. We need to do something to let us do research.
Brian: There are other things you could get with this signalling that should probably be out of scope. I'd like to see the places where bits could go ordered by potential danger. The reason I came up was that I saw the urgent pointer, and thought... ooh, danger.
Richard: if you have unreliable bits
Brian: The worst thing is converting a congestion signal into a loss.
Mirja: So, how to proceed?
Jana: The slide looks roughly in order of decreasing danger. I want to go back to accuracy. With SACK, you get more info with more SACKs. We don't count these as separate events to the original loss. I am wondering how we could use more accurate information... not that we shouldn't, but I'm trying to understand what the sender might do.
Mirja: There is congestion information here, and so CC can compare the level to what is expected. Right now you have a binary signal.
Gorry: Maybe we just need to decide on something, and proceed.
Bob: If the urgent flag is off, we can use parts of the field. Beauty is, if the flag is on, it's interpreted normally.
Gorry: Can't we just decide?
Jana: It almost sounds like a new draft for urgent pointer reuse. With ack feedback there are two bits of work we've done: DCCP TFRC with the loss vector, and ACK congestion control, which had a list of possible complications.
Pasi: How many are interested in working on this?
Mirja: How can we get to a decision?
Pasi: Perhaps best next step is to focus on problem statement.

RTO loss recovery enhancements
------------------------------

* TCP and SCTP RTO Restart
  draft-hurtig-tcpm-rtorestart
  Speaker: Michael Welzl

Mirja: Why are you restarting the timer, rather than keeping the old timer?
Michael W: The old timer could be very old - it is done as if the timer was for each message. Can we compare with TLP... This is looking at thin streams. It uses RTO. TLP more focussed on probes.
Michael Tuexen: Why do you focus on thin streams? (can't you do all the time?)
Michael W.: Yes, we can do this all the time.
Michael Tuexen: It should be OK for SCTP to do this.
Jana: I was looking up the wording, the mechanism is supposed to work per segment, the implementation recommendation is buggy. So in some ways, this is fixing the error in the implementation recommendation.
I would have liked to have seen something more aggressive.
Michael W.: We'd have to then look at the cause of spurious RTOs, may increase risk of spurious RTOs.
Jana: How much of a gain do we get for thin streams?
Michael W: Significant, we have seen a case where it would have improved it from annoying to tolerable.
Michael T: This might be interesting for the RTCweb guys.

* TCP Loss Probe (TLP): An Algorithm for Fast Recovery of Tail Losses
  draft-dukkipati-tcpm-tcp-loss-probe
  Speaker: Yuchung Cheng
  Time: 10 minutes

Jana: I think the problem is important, and the WG should be considering it. Not so convinced that this is the solution. There is info not being taken into account. You do lose a lot of information by choosing not to respond to the ACKs you have received. This solution may not be unique.
Yuchung: I agree not unique.
Stuart Cheshire: There's two cases, short transaction and nothing more to send. The YouTube example is weak, because there's a design mistake there... if you divide the data into chunks like that, you're inviting tail drop scenarios. If there's data to be sent, send it.
Richard: It's not necessarily bad app design, you might have an empty send buffer because the application is synchronous.
Stuart: There are many cases where the delay is artificial... a redirect you can't have more data, whereas with YouTube, the data exists, it hasn't been asked for yet.
??: There are perfectly valid use cases that need this.
Stuart: If it's not being consumed, that's fine... this is only a problem when the receiver is stuck waiting for a retransmit.
Randall Stewart: You could easily make this cover SCTP as well.
Yuchung: I don't know SCTP, so not easily for me.
Randall: Only difference is, SCTP will tell you if you sent a duplicate.
Jana: DSACK does not let you know about >1 duplicate.
Yuchung: FACK is not standardised, but we're trying to get that to happen.
Matt: If there is a big enough hole in the SACK block, send immediately.
This algorithm was updated in 3517bis, it's a 1 paragraph description, so it's already there.
Markku Kojo: This is not quite the same as FACK - we need to check this.
Keith: Not exactly. 3517bis is still robust to reordering; it is not exactly forward acknowledgements.

* Discussion
  Time: 10 minutes

Pasi: Do we take up both solutions, or do we want to unify them?
Yuchung: The mechanisms are quite different. Regardless of scheme, we need the RTO.
Michael T: The RTO doc describes a way to implement the timer as it was originally defined, and fixes a bug in the deployed approximation. It's a simple idea, and should be in its own document.
Michael Welzl: It is a separate story.
Mirja: +1. If we need the bugfix, can we still use TLP as well?
Yuchung: We just use the RTO as it is, so no matter how it was calculated.
Mirja: Do we still need TLP? Can you compare the two?
Yuchung: We're working toward that comparison.
Matt: They're really orthogonal. The RTO is about calculating the timers right, whereas TLP shortcuts the timer to fix a particular corner case, without stopping the regular RTO process as the fallback. That TLP has a timer in it is a coincidence, the two are separate issues.
Gorry: The timer update I like. I think both are orthogonal, and both have been used at link layer for ages, so we can move fairly fast.
Jana: +1, they're separate. The RTO is simpler, and it fits within existing standards. TLP is different. There may be other ways of dealing with thin streams. One suggestion to RTO folks would be to not characterise it as a thin stream fix; it helps, but should not be the focus.
Yuchung: Since TLP uses another timer, you can use a similar idea in other cases.
Pasi: Therefore, two separate documents. No opposition to adoption of either.
Jana: TLP is interesting, but it is too early to adopt because that implies adopting the solution.
Michael T: I would like to see both SCTP and TCP handled the same way.
Pasi: The charter would imply that.
Also the doc is in Linux-specific language, needs to be edited into RFC language.
Yuchung: I don't see other proposals out there.
Yoshifumi: I am somewhat concerned about the other corner cases mentioned, so what do we do there?
Mirja: Before adoption, if you have Michael's RTO fix, is the problem still there?
Yuchung: I think so, but we will do that test.
Pasi: I think the TLP document needs to be revised. People seem to support the activity, but the document should be updated and discussed again next time.
Pasi: Who is available to support RTO restart work by reviewing and contributing?

Hum for adoption of RTO restart, no opposition at all.

Michael: Experimental, or?
Pasi: Yes, experimental.
Michael: Can this be changed later after publication?
Gorry: For the timer change, what do we have to do to change status out of experimental? What is the experiment?
Jana: We need to define what the experiment would be. I'm happy to discuss status later as it progresses.
Pasi: PS might be too hard.
Jana: Does this update 6298?
Keith: A number of people think an experimental document should define what is trying to be learned.
Michael W: The experiment is quite simple: how big is the risk of spurious RTO?
Andrew: If the experiments are already done, that might change the picture.
Jana: Google experiments are nice, but we should have more.
Michael W: A variant of RTO restart has apparently been used by Cisco for a long time. Trying to get that confirmed.
Pasi: Status to be discussed further on the list.

Other Items
-----------

* Updating TCP to support Rate-Limited Traffic
  draft-fairhurst-tcpm-newcwv
  Speaker: Gorry Fairhurst

Yuchung: I think it's tackling an important problem, and apps have the choice of disabling slow start after idle, mostly. Because the performance loss is so large. PipeACK is also crucial. I would like to see if this can fix the bug on entering fast recovery: the ssthresh should be set based on flight size, not cwnd.
Gorry: Not sure how we do that, but yes.
Whether it is part of this draft, I don't know.
Yuchung: I think it should be.
Jana: My concern is that there is this issue of burstiness that still exists. This is like a new connection with a higher cwnd. One bit of information this could use is RTT from the last time it sent anything, it could be used for rough pacing. Something better than line rate should be in there.
Gorry: That echoes something Mirja said, Mark Allman's jump start also had some form of pacing. At least we still know what window worked on this path... however, not if traffic is in place, pacing kind of helps this.
Yuchung: 5 minutes could be replaced if using pacing.
Gorry: The 5 minutes is because things can change dramatically over long periods of time, even RTT. 5 minutes is arbitrary, but reasonable.
Yuchung: After idling, you really have no idea, and history is your best guess. Trying to simplify that part.
Mirja: Maybe a max burst size here is a reasonable thing. "Maybe you should do something" isn't quite enough.
Jana: Maybe we'd be happier with pacing than with bursting. This is about temporal sharing of TCP values, there's an old RFC about control block interdependence. From a CC POV we're restarting.
Gorry: But we're probably not actually idle, but thin. Draft says "maybe pacing is a good idea". The problem is how to standardize pacing.
Pasi: Adoption seems a bit early.
Matt: This is a step forward from the current state of the art, and I think it needs the formality of WG process, and strongly advocate adopting it. Clearly better than the current choices. Needs the attention.
Randall: I'd love to see maxburst put in here. It's been in some stacks for ages, it really does prevent some taildrop issues.
Gorry: What do I need to bring it back?
Pasi: Adoption needs people as well. Are we ready to commit to the solution, or should we postpone until next time?
Gorry: We need to do something, I'm OK with altering it.

Hum for adoption, lots of support, no objections.
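
Illustration of the max burst and rough pacing ideas raised in the newcwv discussion. This is a minimal sketch, not text from draft-fairhurst-tcpm-newcwv and not what any particular stack does. The function names, the default burst limit of 4 segments and the example MSS are assumptions chosen for the example. The burst cap follows the general shape of the SCTP Max.Burst rule (limit the usable window to what is in flight plus a few segments); the pacing interval simply spreads one window over the last measured RTT, as Jana suggested.

```python
def apply_max_burst(cwnd: int, flight_size: int, mss: int, max_burst: int = 4) -> int:
    """Return the window usable right now, in bytes.

    Caps cwnd so that at most `max_burst` segments can leave back to back
    beyond what is already in flight. `max_burst=4` is an assumed default.
    """
    return min(cwnd, flight_size + max_burst * mss)


def pacing_interval(last_rtt: float, cwnd: int, mss: int) -> float:
    """Return the gap in seconds between segments when one cwnd is spread
    evenly over `last_rtt` (the RTT remembered from before the idle period).
    """
    segments = max(1, cwnd // mss)  # never divide by zero for tiny windows
    return last_rtt / segments


if __name__ == "__main__":
    mss = 1460           # assumed segment size in bytes
    cwnd = 10 * mss      # window retained across an application-limited period
    # Nothing in flight after the quiet period: only 4 segments may go at once.
    print(apply_max_burst(cwnd, flight_size=0, mss=mss))
    # Alternative to bursting: pace the 10 segments over a 100 ms RTT.
    print(pacing_interval(0.100, cwnd, mss))
```

The two functions are alternatives rather than a combined design: the first bounds the size of a burst, the second removes the burst by spacing segments in time.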