* WEDNESDAY, August 1, 2012, 1510-1710 (Room Regency C)

15:10 Chair Intro
Blue sheets
Note Takers
Thanks Dominique Barthel, Andrew McGregor, and Cullen Jennings for the notes already in here.
Additional notes to be integrated from Carles and Carsten
Jabber Scribes

Brief WG and Document status (chairs)
in RFC ed queue: draft-ietf-core-link-format-14.txt
Carsten goes through the deliverable list.
RFC 6690-to-be is in the queue (link-format-14); expected to be out in a few weeks [actual: published on 2012-08-07]
Work on deliverables this meeting:
-- coap-11, some work discussed today and Friday, some work remaining
On Friday, we will have Transport people with us and will try to resolve a couple of issues.
-- observe-05
-- groupcomm-02 to be discussed Friday
Quick run through individual drafts on Friday, but no hope of real technical work on them.
-- Also see LWIG WG meeting for Jari's draft (and also consider over SMS)

** Group 1: "Finishing"

15:20 CoAP core protocol (Zach Shelby)
draft-ietf-core-coap-11.txt
draft-bormann-coap-misc-19.txt (re #214)
Objectives:
-- Confirm consensus about recently closed tickets
-- Reach closure on open tickets, in particular: #214, #241
(Ticket #215 and congestion control to be discussed on Friday)

Zach: review on CoAP core spec since WGLC, two revisions and one interim phone conf. Still 7 tickets to be closed. While this is going on, look at the diffs and tell us if a change concerns you.
Question (Akbar): groupcomm, Cullen: ???

#201: solution was introduced in -10, go and look at the new parameters. Do the values make sense, are there cases where they don't work? Remember these are default values, one can still have different values in specific networks.

#214 Vendor-defined options: we wanted to make it easy, no need to go to IANA.
Zach comments on the pros and cons of various potential solutions. An important need was to be able to experiment, then make an option become a real one.
Proposed solution is to add a jump to a high number.
Keep lower number for reviewed options. Does this fulfill the needs of those who wanted the vendor-specific options?
Question (Bert??): is this usable together with fence posting or replacing?
Zach: see in a few slides, not decided yet.
Question (Cullen): [fill in]
Question (??): matching between HTTP and CoAP ... given the new options?
Zach: no static binding. See next (critical/elective option, carried in option number); a proxy will still be able to propagate or discard an option independent of whether it understands it or not.
Question (??): why Standards Action, and not expert review?
Comment (?? green cap): speaks way too fast. [must have been EKR]

#241 avoiding having to upgrade proxies and clients for each new option. "Safe to Forward" bit in the option number.
Bad news: we'll have to renumber every option defined so far. Carsten displays intended remapping.
Question: could we declare that the high numbers are never critical?
Designated Experts are good to investigate unsafe critical options that people would want to define.
Question: why not just a bit as a flag?
Zach: risk of somebody setting the flag incongruent with the actual option. With a bit in the option number, nobody can accidentally do this wrong.

Re fenceposts (Carsten): time to drop the fencepost, and have a uniform way of jumping in option number?
Zach: who is in love with fencepost? No hand shown in the audience.
Cullen (asks again): anybody opposed to getting rid of fencepost? -> Nobody.

CoAP version number: should we carry current number 1 forward for the RFC, or increment to 2? But then only two usable CoAP version numbers left?
Question: is there normative text that says what to do when version does not match? Yes, drop! You could also support multiple versions. Copper does support multiple version numbers.
Zach suggests we stick to 1.
Carsten: changing the version number is appropriate for a radical change such as renumbering the options, but then there is a small version number space.
Akbar: don't change because of so few numbers.
Roberto: don't change. What's the market situation?
Zach: 20 implementations at the Interop event. Don't believe any implementer will miss the fact that the draft has changed.
Kerry Lynn: don't increment. Maybe even roll back to 0.
Klaus: 0 is used by DTLS, but next byte allows to distinguish.
Carsten: can reuse 1 after 3.
Matthias: we didn't change version number after -03. This change is more obvious, won't work at all. Propose we stick with 1.
Cullen: nobody against? Let's go forward with that.

16:28 Observe (Klaus Hartke)
draft-ietf-core-observe-05.txt
draft-bormann-coap-misc-19.txt (section 5)
Objectives:
-- Check that section 5 has the answers for #204
-- Work on #217, #227/#235

Klaus re Observe:
#227 aborting previous transaction when state is updated internally. Difficult to implement. Make this behavior optional.
Cullen: why not have parallel transactions, and transmit the new state alongside the previous transmission?
Akbar: problem with the text proposal.
??: why not start a new transaction and stop any effort at transmitting previous state?
Carsten: [fill in]

#217 observe clock. (24-bit sequence)
Carsten: who understands the problem? thinks it's a good solution? wants to propose something else? 10/10/0

Interest in a resource: server sends notifications as confirmable every so often. Client can reply with period of 0 to say no longer interested. Server sends a notification every so often even if there is no state change, so that the client can detect when the server no longer assumes that the client is interested.
Cullen: what are the right values for these periods? Negotiated or well known?
Matthias: only one value for the server side because...
Zach: [fill in]
Cullen: the client needs to discover that the server has lost its mind. With the lightswitch use case, after an outage, the switch would not work for 24h!
Matthias: if no notification received, the client can start doing something else, like polling.
Zach: this overrides server trying to send...
Cullen: why don't we have a negotiation up front, and have server and client agree on these values as opposed to trying to guess?

Freshness model:
Cullen: what is Max-Age in normal CoAP? In -observe?
Klaus: it means nothing in -observe.
Cullen: Max-Age is not a promise that the value will change within Max-Age, but means that clients are not guaranteed to be told if the value changes in that time.
Klaus, Zach discuss the interpretation of Max-Age and its use.
Carsten: needs more discussion, we'll take it offline.

16:58 Block (Carsten Bormann)
draft-ietf-core-block-08.txt
Objectives:
-- Postpone to next (interim?) meeting
(out of time, no discussion)

17:00 Security
draft-garcia-core-security -04 2012-03-26 (not discussed again)
draft-sarikaya-core-sbootstrapping -05 2012-07-10 (Behcet Sarikaya)
Objectives:
-- seek WG opinion on this solution
-- is this something we want to continue to work on?
-- if no, what might be its right place?
(Note that ROLL will have a security segment later on Friday, too, see below.)

Carsten: Originally, the security work was intended as a separate milestone; currently we have some text in the core spec, some text about how to set up DTLS.
Behcet re Security bootstrapping
Behcet: In the last two versions, the draft has actual solutions proposed. Using EAP-TLS, and raw public key as a certificate.
??: assume EAP running on a server?
Zach: why is this document in Core? Seems you are trying to solve a network authentication problem.
Behcet: this is an overflow of 6LoWPAN.
Rob Moskovitz:
Carsten: we need to stop bouncing this work around.
[there was additional discussion in ROLL. -> SOLACE]

(Note that we may want to pull up items from Friday 10:15 onward in case we manage to finish early. Or we may have to be brutal and finish stuff from here on Friday instead.)
[that was a bit too optimistic]

17:10 break

* FRIDAY, August 3, 2012, 0900-1100 (Room Regency F)
*** NEW TIME SLOT: 0900–1100 (was 1120–1330)

09:00 * Chair Intro
Blue sheets
Note Takers
Jabber Scribes
quick report from IAB workshop on congestion control (NN)

Friday Aug 3rd; 9:02 am

(quick ad about COMAN)
Mehmet briefly describes the COMAN (COnstrained MANagement) activity. A mailing list at this time. Informal discussion at this IETF meeting. Networks of constrained nodes, vs. constrained networks of regular nodes.

** Group 1: "Finishing"

09:10 ** Congestion control discussion (chairs)
draft-ietf-core-coap-11.txt
draft-ietf-core-observe-05.txt
draft-bormann-core-congestion-control-01.txt (➔ -02)
(We plan to invite people from the TSV communities, e.g., TCPM, TSVWG, ICCRG, or TSVDIR for this discussion. That's why it is scheduled for Friday and at the start of this meeting.)

Carsten re Congestion Control: we have invited Transport layer experts in the room.
Self-fair: fair between applications running CoAP; also fairness/friendliness with other protocols such as TCP.
Carsten shows and comments on slides borrowed from IAB CC IRTC workshop 2012-07-28 by Mark Handley:
Fragmentation is a common cause for congestion collapse. Undelivered packets are another one.
In CoAP, barring firmware update, traffic is not bulk flows. But we do not want too much state, which is needed for flow control loops.
Padhye/Firoiu (TCP 1998 perf study): two linear responses: left portion is caused by dup-ACK retransmissions, right by timeout.
CoAP does not want to know how to exploit good links, because window is only 1.
David Black: this comparison is irrelevant, don't compare to TCP because you are not doing enough packets to cover this; CoRE is not TCP-like, therefore treat it like UDP. Compare CoAP to UDP; see RFC 5405 for a start.
Salvatore Loreto: We have to decide if we want to talk about CC in the public Internet, or in constrained domains.
Secondly, usually a CoAP sensor is not going to send a lot of traffic. It's not that a single sensor will cause congestion, but that many of them together will. Key problem is lots of nodes sending little, not one node sending lots; congestion likely caused by multiplicity of sensors, not bulk flow of one sensor. On the topic of congestion collapse: for example, there might be an earthquake and many sensors react at the same time.
David Black: There may not be much you can do about that in the protocol, if they all get synchronised. Randomised exponential backoff recovers, but it doesn't stop it happening in the first place.
Carsten: No RTT estimation in CoAP, a little too aggressive for high RTT and high losses, compared to TCP. In the case nodes sense correlated events, there are risks of new congestion conditions.
David Black: 5405 very strongly recommends that you do something about RTT estimation. You need to think about an RTT estimator. Look hard and see what you can do with RTT. A fixed RTT estimate is going to cause issues.
Lars: Obviously, there are going to be congested scenarios (open a million connections per second). Density of deployment, for example, is an issue. CoAP applications are new stuff. The question is: does it work in the operating region we want to be in? Applications will have some assumptions built in to them, and if the actual network is outside that, it won't work. Drafts need to say something about what region they can work in -- like bandwidth and number of nodes. Needs someone to be interested, because there is room to improve on TCP by rather a lot.
Lars: There are no strict limits in 5405. We really need some experience, rather than picking magic numbers out of the air. 5405 was written for the big Internet. Don't pick your magic number 4 by analogy to the big Internet. If you can't do RTT, you should not do more than 1 outstanding transaction.
David: Packet size matters, except where it doesn't.
Some resources congest on packet count, not on number of bytes, and when that happens loads of small packets are awful.
Ekr: CoAP doesn't really have the same issues as HTTP, in that the app writer does not have opposed interests to the user or network operator. Web pages have an interest in opening lots of connections, and the limits in the browser are there to limit the impact of that.
Cullen: The spec has a number of features, like new observations cancelling old.
Ekr: I think I'd throttle somewhere.
Michael Scharf: One parallel transaction should work, should it not?
Carsten: Non-confirmable vs confirmable behaviour might need to be different.
Lars pushed back on no room for minimal congestion state: I'm not an implementer, but if you can have four outstanding transactions, it seems odd you can't have a few state variables and a timer.
Carsten: in an actual implementation, responses are stateless.
Lars: But if you can have packet buffers, there's probably enough space. If you're ready to handle 4 connections, that means you have memory for 4 packets. Then you have memory for congestion control.
Cullen: we even decided we have enough memory for DTLS...
EKR: I don't see how there's not room for a crude RTT estimator.
Carsten: DTLS is where the RTT estimator could be.
Jari Arkko: Having some implementations without DTLS, it's not too bad to add an RTT estimate. If there are clients that send to multiple destinations, blocking is an issue, and I'd rather pay for RTT estimates than deal with not being allowed to make parallel connections.
David: That will help with these concerns. As soon as there is a response, you can do something with RTT. If you have DTLS, that should provide a decent RTT estimate.
Ekr: A request pending for 3 seconds is very different than 50 per second, so maybe it is better to limit transaction attempts per second.
David Black: Another thing to keep in mind, a lot of HTTP is users behind browsers, which provides randomisation.
This is sensors, not humans browsing the Web. Be careful of the flashmob effect, where something happens and every node decides to talk about it... boom.
Jari: Observe, servers should send not more than one observation every 3s to an endpoint. This is odd, because the real problem is overall traffic in the network, aggregated. So endpoint may not be the right answer.
CoAP "endpoints"? Jari - not sure endpoint is right answer.
David: think of the various useful cases, not theoretical performance. What answer is going to have the best do-no-harm properties? It might be useful to think about the separation between what is appropriate in the flashmob case and what is appropriate in routine observation situations.
NSTART adjustment proposal (decaying blockage by 7 B/s) -- deemed from the audience to be as complex as an RTT estimator.
David Black: if you don't want an RTT estimator, then you are stuck with 1 outstanding packet. Advanced mechanisms might include a TFRC type loss rate estimator (for a sensor sending frequent observe data, e.g.).
(On the topic of multicast response dithering: something like this might be useful for some applications of observe, too.)
Ekr: What is a quality of implementation issue?
Carsten: People can always build something that doesn't work, we can only help with specifications making it harder to do that by accident.
Michael: I'm wondering about how to implement this (nstart decay) without timers.
Michael: You probably need a separate specification for a minimum sensor.
David: This looks like what is required for an RTT estimator.
Carsten: Maybe NSTART is 1 for no-RTT nodes, and if you have one you can do more.
Lars: If you pick a constant, you need to have a mechanism to back off as well.
Carsten: 93 seconds of binary exponential backoff.
Lars: You can't cycle at that point.
Carsten: We don't say anything about that.
Lars: I'm worried that if you simply deploy too many devices, you'll end up immediately congested.
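[Editor's illustration] The 93-second figure quoted for the binary exponential backoff can be reproduced with a short calculation. This is only a sketch: the parameter names and values (ACK_TIMEOUT, ACK_RANDOM_FACTOR, MAX_RETRANSMIT) are assumed from the defaults later published in RFC 7252 and may not match the draft text discussed in the room exactly; the helper function names are hypothetical.

```python
# Assumed transmission parameters (defaults as later published in RFC 7252).
ACK_TIMEOUT = 2.0        # seconds, lower bound of the initial timeout
ACK_RANDOM_FACTOR = 1.5  # initial timeout is drawn from [2.0 s, 3.0 s]
MAX_RETRANSMIT = 4       # retransmissions after the first transmission


def timeouts(initial):
    """Timeout after the first transmission and after each retransmission.

    The timeout doubles on every retransmission (binary exponential backoff).
    """
    return [initial * 2 ** i for i in range(MAX_RETRANSMIT + 1)]


def max_transmit_wait():
    """Worst-case time from first transmission until the sender gives up."""
    return sum(timeouts(ACK_TIMEOUT * ACK_RANDOM_FACTOR))


def mean_initial_timeout():
    """Mean of the uniformly randomised initial timeout."""
    return ACK_TIMEOUT * (1 + ACK_RANDOM_FACTOR) / 2


if __name__ == "__main__":
    print(timeouts(ACK_TIMEOUT * ACK_RANDOM_FACTOR))
    print(max_transmit_wait())
    print(mean_initial_timeout())
```

Under these assumed values the worst-case timeouts are 3, 6, 12, 24 and 48 seconds, which add up to the 93 seconds mentioned, and the mean initial timeout is 2.5 seconds.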
Carsten: If we want the baseline to require state, the state needs to be tied to something. There is already state that manages the message-id, and we could tie congestion control to that.
Lars: TCP 3s, 6, 12 if loss detected. In CoAP?
Carsten: cycle is 93 s.
Lars: you use a 2.5 second estimate to start, but you need a way to slow this down. Starting in again at 2.5 seconds after a transaction has failed would not be OK.
Lars: in the Internet, hard enough to gauge network. Here, little traffic, flaky links. Don't want to be in the state of needing to peel batteries out of all installed devices to get a million sensors to stop the congestion collapse.
Cullen: We have lots of congestion expertise in here, so what might we write in the spec? Why I'm chairing both areas where Lars thinks you could get a PhD in transport I don't know... We'd like to make it probable that this will work if deployed as specified.
Cullen: On to what can we do. Restructuring the conversation.
1) Sensors sending non highly correlated data with confirmable.
2) Highly correlated data
3) multicast, non-confirmable.
Lars: if you can overhear the channel, or other link layer info, could infer some knowledge about congestion.
Lars: There's a certain amount of available resources, and if there are a lot of nodes we'll have very low limits. In the Internet we can't really use link-layer information, and if there's radio information available that can be used we can probably do something useful.
Ed Beroset: Unlike in wired nets, we don't want to assume packet loss is due to congestion. Also, it is probably useful to think of classes of devices, e.g. minimalistic and complete, so what mix of those causes things to fail.
David Black: related to restructuring the conversation. Flashmob may be a separable concern. Dithering can help, with app knowledge. Also there are debates about TCP friendly, so do not take TFRC as gospel. Step back, take a look at traffic mix.
If you're basically all the traffic, don't worry about it, think about self-fairness -- more important than how it behaves in the presence of TCP.
Andrew McGregor: Read Tim Shepard's thesis, it explains how to make mesh scale (http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-670.pdf)
Michael: The important thing is congestion avoidance. And if we have congestion, we have to react. Have to make sure there are reactions to problems, so have to detect AND respond.
Lars: this is not just congestion control, it's also energy conservation. You can also view this as an energy management problem, so you don't drop packets unnecessarily.
Cullen: ok, it's broken. Throw in some suggestion!
Lars: could design some simple reactive mechanism. A handful of bytes of state for a few connections. Use confirmable ping-pong for some probing. What do we do to completely avoid state? Very hard. But with very little state -- if we are talking tens of bytes of total state and timers, we can come up with something reasonable. I think something worthwhile can be done in ~100 bytes or less, and using only a few timers. There's no existing transport standard that can help.
Richard Kelsey: if you do DTLS, much of this mechanism is already available. I'm convinced RTT estimation is the right thing to do. For the rest of application, we hear that RTT estimator is really important. But every time someone wants something, it nibbles away at the available space. 'It's only a little bit' adds up; our RAM and flash get used up by lots of additions of "just a little bit". Sort of buy the argument here but scared of the consequences.
David:
-- Start with the confirmables, when you expect a reply. This gets you an RTT estimate, and you get ack-clocking, where you don't send the next request until the last one comes back. This is a starting point.
-- Also, dithering is good.
One of the ways you get in bad trouble is synchronisation, and dithering in many places is a good way to avoid synchronous collapse behavior.
-- Do something fairly simple and tightly constrained in the baseline, and don't allow more traffic without more congestion control.
Matthias Kovatsch: These nodes are on lossy links that fluctuate lots, so saving state does not help much because the network has changed. If links are changing, what can you do?
Carsten: right, we might be sending packets less frequently than the links are changing.
???: 1) unlike wired network, don't want to assume every loss is due to congestion. 2) if we have 2 or more classes of devices, what mix of these causes things to fail. If all the automation systems in a building, and if you had a massive power outage, could be bad. Congestion nightmare.
Lars: limited bandwidth. If few nodes, can send lots; if lots of nodes, very hard...
-- can you overhear packets sent by others? Can you tell how busy the channel is? Is there link layer information?
-- do your link layers deliver corrupted packets? Might be able to use information that a corrupted packet was received to infer link behavior. Delivery of corrupted packets lets you distinguish between loss and congestion.
Carsten: That's a really good research topic.
Cullen: I'm sure the range is huge, but sometimes the radio is a serial line with no information at all about the link layer.
Carsten: Of course, the RTT estimator has a shelf life.
David: The network may change much faster than you send packets, but the estimator may still help.
Andrew: Keeping estimates over network fluctuations is a better estimator than anything else you might do. Experience with Wifi is that the most recent estimate is always a better estimate than a default constant when restarting.
Salvatore: We can keep discussing for more time than is available. Suggest to set up a design team, work on draft.
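[Editor's illustration] To make the "crude RTT estimator" argument concrete, here is a minimal sketch of how little state one needs. It is not a mechanism agreed in the meeting: it swaps in the smoothed RTT and variance computation known from TCP (RFC 6298), omitting that RFC's clock-granularity term and minimum RTO. The class name, the fallback of 2.5 s, and the choice of 4 parallel transactions once an estimate exists are assumptions taken loosely from numbers mentioned in the discussion.

```python
class RttEstimator:
    """Two state variables per destination: smoothed RTT and RTT variance."""

    ALPHA = 1 / 8  # gain for the smoothed RTT (as in RFC 6298)
    BETA = 1 / 4   # gain for the RTT variance (as in RFC 6298)
    K = 4          # variance multiplier in the timeout

    def __init__(self, default_rto=2.5, nstart_with_rtt=4):
        self.srtt = None    # no estimate until the first response arrives
        self.rttvar = None
        self.default_rto = default_rto          # assumed fallback, seconds
        self.nstart_with_rtt = nstart_with_rtt  # assumed, not a WG decision

    def update(self, sample):
        """Feed one RTT sample (seconds) from a confirmable exchange."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Variance is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample

    def rto(self):
        """Retransmission timeout; falls back to the constant without samples."""
        if self.srtt is None:
            return self.default_rto
        return self.srtt + self.K * self.rttvar

    def nstart(self):
        """Lockstep (1 outstanding transaction) until an estimate exists."""
        return 1 if self.srtt is None else self.nstart_with_rtt
```

For example, a fresh estimator reports a timeout of 2.5 s and allows one outstanding transaction; after a single 0.5 s sample the timeout becomes 0.5 + 4 * 0.25 = 1.5 s. A real design would also need to age out stale estimates, given the remark that the estimator has a shelf life.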
Cullen: We have a proposed algorithm:
-- If you have no RTT estimator, N=1 (lockstep).
-- If you do, N=something bigger (perhaps 4)
What would be a reasonable number larger than 1? 4?
Cullen: suggestion for non-confirmables?
Lars: If you have mostly confirmables, you could allow non-confirmables as a small fraction of the confirmables.
Cullen: Most of the use cases suggest the bulk of messages would be non-confirmable.
Lars: That's very hard, then, given the lack of feedback.
Andrew: A few percent of randomly timed confirmables to probe for loss rates and RTT would be sufficient (suggested by Minstrel's probing... this case needs much less info than Minstrel, which probes with 10% of packets).
Andrew: require that a certain small fraction of non-confirmables are sent as confirmables - between 1 and 12 % of the time?
Zach: That's completely compatible with the application cases. Would be fine with requiring a percentage of confirmables. The server generally is concerned that some of its data is getting there.
David Black: On the confirmable:
-- We get an RTT. A coarse RTT estimator can help a lot. The difference of dealing with 1/2 second vs 5 second RTT is huge.
-- And we get an ack clocking.
We should think about expanding the dither. If we have a way to extend the dither, this can help deal with accidental synchronization. Do something fairly simple and tightly constrained, and allow people to do more if they do more. Possibly the flash mob problem can be separated out and dealt with much like multicast dithering. If you can be on a network that is all CoAP traffic, self-fairness is more important than TFRC.
Cullen: since we want to get the draft through the IESG, what about separating the non-confirmables out of the main draft?
Lars: [fill in]
Andrew: Wifi uses 10% of probe packets, a smaller rate should be fine here. 1%?
Zach: ok with small percentage of confirmables when sending non-confirmables.
Cullen: calling out for contributors. A few.
-> Folks working on congestion plan: Carsten, Lars, David, Andrew, Bert, Michael
Lars: Are there testbeds or simulators to validate this?
Michael: Can help.
Cullen thanks the transport experts who took the time to come and help us.

10:10 observe (continued)
Finish the discussion of freshness, subscription lifetime models (stubbornness), and its influence on the options transported.
(Time did not suffice to prepare this segment; time ran out anyway...)

** Group 2: Working-Group Documents

10:15 Groupcomm (Akbar Rahman):
draft-ietf-core-groupcomm -02 2012-07-10 (15 min)
(Background reading: draft-dijk-core-groupcomm-misc -01 2012-07-10 (0 min))
Objective:
- Review updates to address tickets #185, #188, and summary of other changes

Akbar Rahman Groupcomm
Slides are self-explanatory.
Discussion about whether a node can tell if the packet was received on the unicast or on a multicast address of a socket.
Jorg - brought up the issue that you often can't tell if a received packet came in on multicast or not.
Lars - Multicast issues. Responses have to fall under the other congestion budget. Multicast response MUST NOT be sent if there's no CC budget left right now (after the leisure wait?)
Zach - Change request... server MUST NOT respond with errors or empty responses. "A server MAY choose not to respond...". Lars and Zach recommend to change this text to "MUST NOT". Need to deal with not getting 404 back from half the group.
Common problem is being able to configure multicast listening, especially for building automation style work, where the group is not all-nodes.
Zach Shelby: would be nice to have a separate REST interface to configure groups.

** Group 3: Pre-/Non-WGLC work

10:30 We won't have much time for new work, but if you want to lead a discussion of any of these, potentially talk to the authors in the same grouping, and then send objectives!

10:35 HTTP and CoAP:
draft-castellani-core-http-mapping -05 2012-07-04
draft-castellani-core-advanced-http-mapping -00 2012-07-04
Objectives:
-- define next steps for these documents

Salvatore Loreto re Best practices for HTTP-CoAP mapping implementation:
Since Paris, split the draft in two.
Cullen: the working group is so far behind on WG milestones, probably can't adopt any new WG drafts at this time. This draft needs more review anyway.
Zach: Encourage people to keep looking at this, and I'll review it.
Peter Van der Stok and Zach commit to do a review of this draft in August.

10:45 Ran out of time.
Carsten: Please give feedback on the overview of all these drafts.
Barry Leiba (AD): I would like to see the drafts prioritised on the list, not in meetings, not at length. But too many presentations in the meeting is not the right balance. A working group meeting with a presentation of 30 drafts doesn't make sense. Would like to see the 30 drafts prioritized on the mailing list and then perhaps talk about top few.

Interfaces, profiles, resource discovery:
draft-shelby-core-interfaces -03 2012-07-11
draft-shelby-core-resource-directory -04 new 2012-07-16
draft-vanderstok-core-dna -02 2012-07-10
draft-cao-core-pd -02 new 2012-07-16
draft-vial-core-mirror-proxy -01 2012-07-13
draft-wang-core-profile-secflag-options -01 new 2012-07-16 (not discussed)
draft-greevenbosch-core-profile-description -00 2012-06-12 (Slot requested, 5 min, Bert Greevenbosch)

10:50 Misc:
draft-fossati-core-fp-link-format-attribute -00 2012-07-09
draft-fossati-core-monitor-option -00 2012-07-09
draft-fossati-core-multipart-ct -00 2012-04-08
draft-fossati-core-publish-option -00 2012-07-09
draft-he-core-energy-aware-pd -01 new 2012-07-16 (not discussed)
draft-li-core-conditional-observe -02 2012-06-08
draft-greevenbosch-core-block-minimum-time -00 2012-07-09
draft-li-core-coap-payload-length-option -00 2012-05-26 (Slot requested, 5 min, Bert Greevenbosch)

10:55 Sleeping, identifying:
draft-arkko-core-cellular -00 2012-07-09 (to be discussed at LWIG meeting.)
draft-arkko-core-dev-urn -03 2012-07-09 (not on agenda this time.)
draft-rahman-core-sleepy -00 ipr 2012-07-06 (Slot requested, 5 min, Akbar Rahman)

11:00 break

----------------------

Over coffee, in the hallway, we can then discuss:
draft-bormann-core-links-json -01 new 2012-07-14
draft-bormann-core-roadmap -01 2012-04-06

Instead discussed in LWIG:
CoAP over X:
draft-becker-core-coap-sms-gprs -02 new 2012-07-15 (10 min, Klaus Hartke)

Not discussed:
Security:
draft-hartke-core-codtls -02 new 2012-07-16 (Slot requested, Klaus Hartke)