DIME WG
IETF 87; 130 min (including the break)
FRIDAY, August 2, 2013, 1120-1330 CEST
---------------------------------------------------------------------------
Chairs:
11:20  o Agenda and WG update  10 min

Note taker: Jean Mahoney
Jabber Scribes: Stefan Winter (first half), Jean Mahoney (second half)
Jabber log: http://www.ietf.org/jabber/logs/dime/2013-08-02.html
Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-3.pptx

New RFCs since IETF 86:
  - RFC 6942 (draft-ietf-dime-erp)
    Jouni - Congratulations to the authors.

In WG:
  - draft-ietf-dime-app-design-guide (Proto Write-Up; pending on Jouni ;)
    Jouni - Writeup will happen any day soon
  - draft-ietf-dime-group-signaling

In IESG:
  - draft-ietf-dime-overload-reqs (WAD GoA::ADF)
    Jouni - Last Call has been announced. Benoit has submitted his AD review
  - draft-ietf-dime-realm-based-redirect (WADEva)
    Jouni - On AD's table
  - draft-ietf-dime-rfc4005bis (WAD GoA::ADF)
    Jouni - Needs an update

In RFC Editor's Queue:
  - draft-ietf-dime-pmip6-lr (MISSREF to RFC4005bis)

Agenda bashing - no changes

Milestones update - Milestones were shown

---------------------------------------------------------------------------
WG Documents:
11:30  o draft-ietf-dime-group-signaling (Marco)  15 min

Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-4.pptx

Marco Liebsch presented.

Slide 15 - Open Items
Marco - May get into trouble if group-specific AVPs are optional. Issue with terminating these sessions.

Slide 16 - Open Items
Marco - Group masquerading puts a mask on the group identifier.

Slide 17 - Next Steps
Marco - Could set up a phone conference in September to discuss.
Ben Campbell - What are the mid-session-change use cases? It seems to be expensive. You may have hit on it on the multi-proxy stuff. Do we have a solid use case for mid-session? Simplify by making it invariant.
Hannes Tschofenig - Ben is asking for the use case.
Marco - it's the dynamic re-assignment of groups mid-session.
[...]
Ben - The decision process should be - If we know we need it, let's put it in.
Eric McMurry - What are the use cases for having the client change the group list? There may be [...]
Marco - [...]
Lionel - When you create the session, you don't know who will create the group. The server can decide to accept or overrule - use case - asynchronous [...]
Eric - What's a specific example?
Lionel - GW and PCRF. There's no assumption on grouping. Always a client assigns to the group. And there's a restart. The server doesn't know. The grouping would be impacted. Otherwise, we would need to define that only one is responsible. I don't know if we can argue that.
Eric - I agree. We should go through the use cases.
Marco - If we don't find mechanisms, we can drop.
Jouni - Supported features open item - are you trying to replicate the 3GPP one?
Lionel - no. An implicit mechanism if it's there, [...] A way to standardize a feature. We could use an explicit AVP to advertise.
Hannes - We use the supported feature AVP in a few documents. Mobile IP stuff.
Jouni - Pick a different name - it collides with an existing AVP in SDO space.
Lionel - the meaning is the same.
Ben, jokingly - put an X- in front of it
Marco - It's not standardized in the IETF but in other SDOs.
Hannes - We're talking about defining a generic super AVP, but it's been bastardized. IETF RFCs have feature capabilities negotiation.
Jouni - My confusion was over the name, not the function.

---------------------------------------------------------------------------
Diameter overload control:
11:45  o draft-campbell-dime-overload-issues (Ben)

Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-0.pdf

Slide 3 - Non-Adjacent Overload control
number 34 requirement -
  adjacent - sharing a connection
  non-adjacent - assuming an agent in between.

Slide 4 - Non-Adjacent Use Cases
Martin Dolly - we don't think load or overload would be shared between carriers.
Kalyani Bogineni - we won't share load or overload information with other carriers.
Jouni - We need to be prepared or we'll have to patch it. Or if they have a huge network.
Ben - I agree in general. Need to balance against adding complexity.
Hannes - Passing topology info may not make sense. With overload - you want to pass to the client, because only they have connections to the front-end protocol. If you just drop messages, instead of returning an error, you cause more messages. The Diameter client doesn't know what went wrong and retransmits.
Ben - 2 possible ways. An agent can take overload abatement action like reroute. If agent drops, it needs to send an error message. You want to throttle close to the client, and reroute close to the server.
Hannes - the issue isn't on load balancing rerouting. If supporting clients and servers are in different realms, you need to send something to the client.
Ben - I agree if you are going to drop on the floor. If it is possible to solve without dropping, then you don't have to tell the client.
Hannes - If we exclude the option of sending info
Ben - this use case is that the 3rd party doesn't support overload control.
Hannes - we won't have to care
Ben - the carriers don't plan to exchange info with other carriers, but may want to share info with affiliates in other realms.
Lionel, from floor - Comment from Martin - interconnect - you may rely on Diameter for network sharing. If you want both endpoints to follow rules. In global solution - you could rely on TOO BUSY or something specific [...].
Ben - In the other use case, within a carrier, the realms don't have to align - there can still be a non-supporting agent in the network.

Slide 5 - Operator Interconnect Use Case
Ben - interconnect may only work at the IP layer. The edge agents talk to each other, etc. We are talking about the agents that don't support or are not trusted. We have carriers saying that they don't need this right now.
By the time we have a use case, the IPX carrier must support the scenario.

Slide 7 - Non Supporting Agent
Ben - Nodes are not necessarily clients or servers, they could be agents.

Slide 8 - Non-Supporting Agent Characteristics
Lionel - On the first question here ("Single administrative domain?"), is that based on the assumption of peers sharing a connection?
Ben - yes
Lionel - It's an issue if only relying on connection info.
Ben - go back to slide 7. There are two assumptions: node doesn't support, the islands can't send to a non-supporting agent.
Hannes - what are you suggesting?
Ben - I'm talking about tradeoffs on supporting. Are the tradeoffs worth it? I have an opinion, but not pushing it. I have more slides on difficulties.
Hannes - It doesn't come up in other solutions.
Ben - let's look at the issues and whether they apply to other solutions.

Slide 9 - Non-Adjacent Topology Issues
Hannes - There's a doc that allows you to control an intermediary.
Ben - in common use?
Hannes - I don't know. Routing topologies are not that complicated.
Ben - there's often an invisible topology. Each box could have a lot of blades with load balancers in it.
Lionel - In my experience, if you need to take care of the path - you make all the agents in the middle act as one agent. Whatever the load. The client will only see 1 logical agent.
Ben - topology hiding.
Hannes - There is no expectation to control the routing of messages within the domain. Overload in an agent is not what we're trying to solve.
Ben - we have been working on that. We're making deployment assumptions. If we're going to make assumptions, we need a BCP on designing networks for overload control that are then less flexible than what could be done otherwise.
Eric - The clients not seeing the agents is the root of the issue. You're throttling for what they can't see. You want to minimize throttling, you don't want to cascade. If we throttle a lot of things, then we haven't improved the problem much.
Ben - Say that something impacts the upper-left agent. The client can't see it. The client only knows that the realm is overloaded. Client reduces all traffic. But the good agent can route. He'll reduce far more.
Hannes - I wasn't talking about that. The agent attached to it can do another path.
Ben - we're trying to push back on load before we have problems.
Hannes - my problem is with an overloaded server, not an overloaded load balancer. If the failover occurs with agent, I'm ok with switching over rather than reducing load.
Ben - We were talking to carriers pushing these requirements. They are concerned about cascade failures and network oscillations, over-responding to overload.
Hannes - The carriers that we talked to said that they were concerned about the server. If you can't load-balance, the client has to reject some messages.
Ben - these are not necessarily load balancers, they could be edge agents.
Hannes - we have to figure out tradeoffs.
Ben - in the real world, chains of relays occur, we have them in 3GPP topologies.
Eric - I agree on keeping it simple. These use cases came out of what actually happened. We've had these discussions. These complicated topologies exist.
Ben - cut the scopes
Jouni - we have used up the time for both your presentations, and are now into Hannes' time.
Hannes - there are topologies that are complicated. Which do we want to address? It's a scoping issue.
Ben - The people with the problems are the wireless carriers, and they have these topologies.
Hannes - We've talked to them. The server is the problem. Routing is not the problem.
Ben - I can show it.
Janet Gunn (remote, relayed by Stefan Winter) - if you try to control the overload solely from the server, you get congestion collapse. The resources needed to "reject" the messages overwhelm the server.
Jouni - This is a good discussion if you have something that you want to show.
Ben - this discussion is the crux of the disagreement. The scope discussion is tied to this.
Jouni - If Hannes will give time, you can continue.
Hannes - yes
Ben - [...]
Ben - crossing multiple connections may need some sequencing.
Hannes - The ordering of messages is not an issue unless you have multiple connections between two peers. Not in the spec.
Ben - it shouldn't occur, but it could.
Hannes - [...]
Ben - if you can guarantee the same path, but not sure of that. It's a discussion to have. I'm done.

-------------------------------------------------------------------------
       o draft-roach-dime-overload-ctrl (Ben)  30 min

Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-1.pptx

--------------------------------------------------------------------------
12:05  o draft-tschofenig-dime-overload-arch
       o draft-tschofenig-dime-dlba
       o draft-tschofenig-dime-overload-piggybacking (Hannes)  30 min
12:25  Time for discussion on all above  10 min

Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-5.ppt

Hannes presented.

Slide 5 - Principles
Ben - We have an overload requirements document that has been in WGLC and is in AD review. It's in LC. We've done a fair amount of work.
Hannes - I don't disagree with the requirements doc. There's a second order of problems: the solution itself causes problems. Non-adjacent topologies are 2nd order. Security causes overhead. etc.
Ben - I don't disagree. The draft sounded like it was going back to first principles. If we find problems, we need to say why. May need to make refinements. We have a basis.
Eric - the requirements draft came from actual deployments, things haven't gotten simpler since then. The fact that we have non-adjacent solutions - that's the point - here are the problems if you allow it to happen. There are a couple of ways to handle it. It's not advocacy.
Kalyani - focus on the real world problems - Overload events are not rare. It's caused by the use of new technologies. It's caused by not-known issues. We don't know the causes. We need solutions.
The networks will be different when VMs are deployed. It's not realistic to solve for both today. Have deployed in pairs [...].
Hannes - you've summarized my talk
Martin - load balancing is different than overload control. In a solution - you need load balancing to mitigate overload. In PCC realm, you need group ids for protecting the network.
Hannes - both of you agree that load balancing and overload are different. Your server farms look different.
Ben - There's interdependence on load and overload. Overload control depends on load reporting.
Hannes - we agree on something.
Ben - [...]
Eric - That's a different conversation. Widely different topologies. Load and overload are orthogonal and interrelated. Separation is ok - need to solve proactively and reactively. Proactive overload control is a fundamental requirement.
Hannes - There's a relationship. If server A is overloading and then server B is overloaded. Mechanism won't work any more. You need to tell client.
Kalyani - [...]
Hannes - [...]
Lionel - A global comment on existing topologies - you need to take care of what's in base protocol - it needs to be flexible. First issue - we are not able to rely on TOO BUSY, we need to enrich the mechanism. We have different implementations [...]
Hannes - looking at the base features makes sense.
Eric - load balancing is a simplistic proactive overload control. We need to think about proactive abatement.
Hannes - definitely. On advancements of technology - load-balancing handling from overload-control communication, you can advance them independently.

Slide 7 - Load Balancing
Hannes - passing load-balancing info out of the operator doesn't make sense.

Slide 8 - Information Model
Jouni - We have a non-trivial problem with how to go forward. We have been talking with the ADs. Proposal - tool set for the overload control use cases. Hannes' use case for load balancing. Have multiple docs on the area. One defines the AVPs. A solution doc that can use the toolset.
Can make a solution that doesn't care about the case with no interconnect. Can do it if we need it. Or create load balancing with the tool set. Having the data model. We need a document that describes how to use the existing stuff - TOO BUSY. Get this done - form a design team on an expedited schedule. Before Vancouver.
Eric - load balancing shouldn't be conflated with overload control. This will take much longer. This is not a fast way forward. We have already gone to LC with requirements with complicated use cases. There are not so many competing proposals that cover requirements. We don't have anything else that comes as close as Adam's draft. We can update that doc. We will miss external deadlines.
Lionel - We had discussion on data model. In the existing Tekelec document, there are disagreements. We may have some agreement on that. Working on what is really needed to convey this info. This has been a ping-pong game. No agreement on assumptions. We want to have a design team create something soon. We don't have a baseline working group document.
Jouni - Take Adam's draft as a template.
Glen Zorn - We've had a design team in everything but name. A new group came to the working group to design this. Then Hannes showed up. It's the same 3-4 people - they are the design team. I thought I heard someone say - overload is constant - it's frightening that it's brought up in the IETF. It's a network problem, not a Diameter problem.
Hannes - we've had discussions off-list. They didn't converge, they don't have a specific goal. Eric has said - It would be so much faster to take my document - that's the problem. It's slower if you don't agree with him.
Keith Drage - On behalf of my colleague, Jean Jacques, ALU - We have an overload control draft but haven't submitted it because we didn't want to confuse things. The Roach draft has some good ideas, but some aren't. Partitioning is a good idea. The grouping stuff could be deferred. We support a design team.
Ben - What's the 3GPP schedule? 12 - completion by June next year.
Lionel - Protocol completed in June.
Ben - Kalyani didn't say that the network was always in overload, but it's more frequent than rare.
Kalyani - that's right
Ben - I would like to hear comments from carriers on a way forward. There are disagreements between me, Hannes, Eric, and Jean Jacques. I would like to hear from other people.
Lionel - the adoption of WG doc - we need to agree on basic tech stuff in draft. The main point [...] connection, scope, link between overload and load. Not saying [...]. We should agree on some parts. And go on with this one. We are missing a lot of things in a single doc.
Martin - I've been in all the discussions. There's agreement that the overall solution needs to deal with load balancing, overload control, and group ID. There's consensus that they can be in separate drafts. If we modify the Roach draft - remove parts dealing with load and group id, which I proposed to the list - remove host scope and session scope - we're down to a smaller set of scopes to be discussed. For each of the scopes - there's a misunderstanding of use cases. The elephant is: do we relay in application responses or in a connection message? One use case - in mxn message - pool of HSSes, Diameter agent, and MMEs. And a DoS attack on MMEs. Not in overload yet. The DRA is in the best place to throttle or shut down the MME - put it in the connection message. The connection message is only impacted between agent and client. Can't think to server to agent and server and client. Maybe look at it in the light.
Lionel - don't forget - connection is message at connection level. ??
Eric - A lot of good ideas have been thrown out. I want to clarify our position. If I was just concerned about who was paying our checks, I would be scuttling the effort. If we look at next June. There won't be time if not soon. Don't have [...]
Keith - in my view - we can't adopt the Roach draft.
We need the design team, we haven't modified the Roach draft. Take the discussion out of the Roach draft. Split the drafts up.
Kalyani - Work backwards from the deadline - once the protocol work is done - do we need to finish in Nov timeframe. Martin? Thanks,
Ben - we do have to have problems to be solved, it's always urgent.
Lionel - Have something to discuss or publish
Hannes - I agree to have a design team as soon as possible. When you optimize something - you always think about the overhead to add messages. This signaling doesn't happen with all the messages - it doesn't happen that often.
Ben - if we are not in good shape in Vancouver, we've failed. If our various customer SDOs need 6 months to work on it. Not necessarily an RFC, but stable. Are data model and semantics enough? Will 3GPP do their own transport?
Jouni - don't know. We should have something concrete and complete by next meeting - In last call.
Martin - address load balancing, and overload control, and group id in separate drafts. With respect to 3GPP - in our Nov meeting - there can be meaningful CRs into CT4 into Nov meeting. When you receive overload info, what will you do procedurally? New requests, existing request? Just knowing AVPs can start work.
Ben - semantics have to come before encoding.
Martin - ALU draft - Roach draft but scrub out all the scopes. If we take out middle scopes - session, load, host - only leaves a few scopes. I could go either way. Could take them out - focus. Design team to argue to put them back in. If we agree on that.
Jouni - yes
Hannes - dislike not piggybacking the info.
Ben - I didn't get to the slide...
Jouni - once we get back home, we'll set up the design team.
Jouni - Design team - send the chairs email today, or proxy for someone else.
Keith - Limit the number of design team members. They should justify why they should be on it. Send to the list.
Jouni - thank you for the discussion.
Lionel - I think we've progressed.
Jouni - We are rearranging the agenda - end-to-end security has been presented before. Cathy?

---------------------------------------------------------------------------
Diameter e2e security:
12:35  o draft-tschofenig-dime-e2e-sec-req
       o draft-korhonen-dime-e2e-security (Hannes/Jouni)  15 min

Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-6.ppt

---------------------------------------------------------------------------
New topics:
12:50  o draft-zhou-dime-4over6-provisioning  10 min (Cathy)

Presentation: http://www.ietf.org/proceedings/87/slides/slides-87-dime-2.pptx

Cathy Zhou presented.

Lionel from the floor - the definition in AVP should be done in this group. Instead of creating separate AVPs - use group AVPs for 6to4, 4to6 AVPs. Have info on one side. I have to review the draft again. Group required info. Don't have to create AVP that [...]
Cathy - It's for the CPE to know which tech.
Jouni - Lionel is saying to group - inside another group AVP. You already know what the stuff is about. Just another way of doing. A Diameter way of doing things.
Lionel - create a stack AVP. Don't have to have separate AVP. Put on mailing list.
Jouni - happy to see that the softwire working group isn't doing AVP work. Now this is not on the charter. We'll figure out a way to sneak it in. Can do everything on mailing list. We'll get back to this and work on the doc.
Jouni - about to end
Eric - on end2end - I'm seeing progress there.
Martin - in China - interested, will put in GSA list
Lionel - seems stable, need sec review on the doc.
Jouni - It will be a busy next time in Vancouver.

End of Meeting