Minutes for NVO3 at interim-2015-nvo3-9
Network Virtualization Overlays
IETF NVO3 WG
Virtual Interim Meeting Agenda
2015-05-22 10:00-11:30 EDT
Chairs: Benson Schliesser, Matthew Bocci
Secretary: Sam Aldrin
Note-takers: Sue Hares, Matthew, Sam, Ignas
1. Meeting Introduction and Agenda Bashing
Chairs (5 min)
The chair reviewed the Note Well and pointed to the etherpad and virtual blue
sheets. The chair indicated that the topic was OAM, and coming to consensus on
the OAM mechanisms. Agenda item #6 was presented in Dallas. Need to agree on
how to mark the bits. Get an understanding of how to proceed and reach consensus on
Erik Nordmark (15 min)
Erik Nordmark: This was covered in the interim before Dallas and is now being
refreshed. There is an update to the document; -02 has some minor changes based
on feedback. One operational consideration concerns fragmentation and MTU. The
way some of these technologies are deployed assumes that the underlay MTU is
well managed. This shows in the VXLAN RFC too, which says you should not do
fragmentation. Something similar is seen in SP networks. We can also expect
other new encapsulations to continue along the lines of requiring a large MTU.
However, it still seems practical to have a mechanism to detect and report
misconfigurations and to provide diagnostics. The easy way is to generate an
ICMP message back and include as much of the payload as possible, but not
everyone implements that. Another aspect: if you define another encapsulation
that is not constrained to a well-designed datacenter, it may need
fragmentation and reassembly. In terms of OAM itself, the design team looked at
OAM and realized that we do not want to redo all of the OAM discussion, but
rather see what we could reuse for an OAM packet format. In-band OAM is for
measurements, and out-of-band OAM is for maintenance. A fundamental property is
that OAM packets should follow the same path as data packets, which places some
constraints on entropy and on the forwarding operations of devices. The other
thing that matters is having some way of marking the OAM messages in the data
plane. Measurements would require adding counters, timestamps, and other bits.
An error reporting mechanism is also required, and this is largely not done in
NVO3. Relying on ICMP is not always dependable, since it may get filtered out
for good operational reasons. There is also a question of coordination between
the different groups working on OAM.
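As an illustration of the MTU point raised in the presentation, the following is a minimal sketch of the kind of check a misconfiguration detector could perform. It assumes VXLAN over an IPv4 underlay with an untagged inner Ethernet frame; the function names are hypothetical and are not taken from the draft.

```python
# Assumed per-packet overhead for VXLAN over IPv4 (no VLAN tags, no IP options).
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4  # 50 bytes


def encapsulated_ip_size(tenant_ip_packet_len: int) -> int:
    """Size of the outer IP packet that carries one tenant IP packet."""
    return tenant_ip_packet_len + OVERHEAD


def fits_underlay(tenant_ip_packet_len: int, underlay_ip_mtu: int) -> bool:
    """True if the encapsulated packet crosses the underlay unfragmented."""
    return encapsulated_ip_size(tenant_ip_packet_len) <= underlay_ip_mtu


def max_tenant_ip_mtu(underlay_ip_mtu: int) -> int:
    """Largest tenant IP MTU that avoids fragmentation in the underlay."""
    return underlay_ip_mtu - OVERHEAD


if __name__ == "__main__":
    for underlay_mtu in (1500, 1600, 9000):
        ok = fits_underlay(1500, underlay_mtu)
        print(underlay_mtu, "fits 1500-byte tenant packets:", ok,
              "| max tenant MTU:", max_tenant_ip_mtu(underlay_mtu))
```

Under these assumptions a tenant MTU of 1500 needs an underlay IP MTU of at least 1550, which is the sort of mismatch the proposed reporting mechanism would surface.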
Matthew Bocci: Are you talking about RDI (functionally)?
Erik: It may be in the underlay, which may use ICMP, or it could be in the overlay.
In order to avoid sending OAM to end stations, better to drop after
decapsulation rather than overload a bit in the header.
Alia Atlas: The goal of the design team was to research the options. It is
important to have similar mechanisms and not invent everything anew.
Benson: You said this was reflected in the draft that was posted last night.
Erik: This is an updated draft based on feedback in Dallas. The next step is to
request this to become a WG Draft.
Benson: We should talk further at the end of the call about whether this is a
complete set of OAM requirements.
Chenhao Philips (15 min)
Erik: Section 5.4, on the number of available paths, states "Must provide a
mechanism to exercise/trace all data paths that result due to the ECMP/LAG
hops in the underlay." We ran into several pieces of this problem during TRILL
discussions. The problem is that NVO3 does not know how wide the ECMP/LAG is in
the underlay. If you do not have insight into what the underlay protocol is,
you cannot really find out. Say node 1 has 4 paths and node 2 has 4 paths. You
would assume that you have 16 paths, but the packet hash may be identical at
both nodes. You attempt to avoid this, but it can occur. Therefore, you might
only have 4 paths (as the hash calculation is the same). It would be good to
understand that this is a limitation.
Deepak: Is the requirement to discover all the paths, or is it just to travel
all the paths?
Erik: Whether all data paths are being exercised - is this your question?
Sam Aldrin: Seems like a reasonable requirement.
Deepak: You should have the mechanism to trace the paths used.
Erik: I did not understand what you indicated.
Deepak: It is difficult to get all the paths in MPLS and to discover the hash
path. Discovering all paths is a difficult requirement. We should be able to
handle all paths to query. The mechanism should be there. The protocol may not
automatically do this query, but
Erik: You can finesse this by stating that if you know the paths, then you can
use the OAM to exercise/trace the paths.
Benson: This seems like good advice. Suggests authors talk to Erik offline to
revise phrasing. Any other comments?
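Erik's 4 x 4 example can be illustrated with a small simulation. This is a sketch only: it assumes each node picks a next hop by hashing a flow key together with a per-node seed, and the flow key format, seeds, and function names are hypothetical. SHA-256 stands in for whatever hash real hardware uses.

```python
import hashlib

FANOUT = 4  # next hops per node, as in the example from the discussion


def next_hop(flow_key: bytes, seed: bytes) -> int:
    """Pick one of FANOUT next hops from a hash of the seed and flow key."""
    digest = hashlib.sha256(seed + flow_key).digest()
    return int.from_bytes(digest[:4], "big") % FANOUT


def distinct_paths(flow_keys, seed_node1: bytes, seed_node2: bytes) -> set:
    """Set of (hop at node 1, hop at node 2) pairs actually used by the flows."""
    return {(next_hop(k, seed_node1), next_hop(k, seed_node2)) for k in flow_keys}


# Vary only the outer UDP source port, the usual entropy field for VXLAN.
flows = [f"10.0.0.1>10.0.0.2:{sport}>4789".encode()
         for sport in range(49152, 50152)]

# Both nodes hash identically: the choice at node 2 repeats the choice at node 1.
correlated = distinct_paths(flows, b"same", b"same")
# Nodes hash with different seeds: the two choices are independent.
independent = distinct_paths(flows, b"node1", b"node2")

if __name__ == "__main__":
    print("identical hash:", len(correlated), "paths exercised")
    print("different seeds:", len(independent), "paths exercised")
```

With identical hashing, every flow takes the same index at both nodes, so at most 4 of the 16 nominal paths can ever be exercised no matter how many probes the overlay sends. The overlay cannot tell which case applies without knowledge of the underlay.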
Deepak Kumar (15 min)
Shows the OAM frame, which was presented some time ago: an NVO3 shim layer,
followed by an optional payload fragment. Uses TRILL OAM: Ethertype followed by
the OAM message channel. Explains the bits in the OAM frame. Various OAM frames
can be sent on the OAM message channel: CCM, Loopback, PathTrace, LinkTrace,
LM/DM, etc. All are based on TRILL and 802.1ag. Shows details in slides. Claims
existing deployments.
Note: Link trace will be added in the updated draft.
Erik: Question - you said punt and forward. This assumes a TRILL-like
topology. In NVO3 we do not have intermediate NVEs.
Deepak: This is based on a real deployment with a large customer. It is
optional. It is not for all topologies, true. You have to have hardware that
understands underlay OAM.
Erik: The place where you think it will be used is where the red boxes are
routers running NVO3. You want to augment the OAM to do more than IP ping or
traceroute. It would help to explain the picture.
Deepak: I will explain this prior to uploading the draft. It is useful for
hardware to look deeper in the draft. I will upload the draft and look for
comments.
Benson: I am still confused. There is a question of how coupled the underlay
and overlay are. The requirement is to trace paths where intermediate nodes may
have an interaction. There are two methods: a) where there are no interactions
with the underlay and interactions
Deepak: Think about the VTEP termination happening on the gateway and then
going to the ToR. The customers want end-to-end OAM. The only input the
customer will give is the VMs at the endpoints. The customer asked for the full
path. The rest of the OAM occurs in the forwarding. The packet is injected as
if it comes from the VM. It gets injected as if it comes from the VXLAN header.
All the routers in the path must understand the "o" bit and will provide an OAM
response.
Benson: I understand what you just said. For the WG, does this violate the
overlay and underlay separation?
Erik: I'm not sure we are understanding this picture yet. We have three things:
a) decap, b) route, c) encap. When these red boxes are routers, then for the
traceroute from VM to VM these are regular routers. I can see having the red
routers participate in the OAM (if they are IP routers). If they are not OAM
aware in the underlay, this looks different.
Benson: If the orange boxes are NVO3 encapsulation and the red boxes are
underlay, then the red boxes participating in the OAM is different.
Deepak: If the red boxes are in the overlay layer, then the responses are
normal. The customer wants the underlay boxes to optionally respond to OAM. I
am assuming that the MIP level is configured throughout the network - in both
Chandra: Are you describing the scenarios described in the EVPN drafts where
there are L3 gateways and L2 NVE devices that default route to the L3
gateways?
Deepak: Yes.
Chandra: Even though there might be devices in the path that are not NVEs, they
are still functioning in the overlay and seem to be relaying overlay packets
passing through back to an overlay endpoint - right?
Deepak: The messages can be sent back to the controller. The request is to give
the exact path that goes through overlay and underlay.
Chandra: I understand that not all devices might be capable; I am just trying
to understand your model. One possibility is that this model maps to the DCI
function within EVPN. I need you to explain the other cases that are not in the
overlay, perhaps in your draft.
Deepak: I will create a new section based on 802.1ag using the link trace. A
single packet gets sent in and at every point gets punted up to the OAM
function.
Lucy Yong: I agree with Benson's point that the overlay and underlay are
separate. I echo Erik's point that if the red boxes are underlay boxes, the
boxes should not be available to OAM. If the red boxes are special gateways
(e.g. tunnel stitching), then these can be served by OAM, as this is an
overlay. If you feedback the path that links to the underlay path. If the
datacenter wants to trace through the gateway, this is possible.
Deepak: I will describe the situation in the draft from the customer
requirement.
Lucy: If you are using this for tunnel stitches, this is reasonable. If it is
another draft, I do not see where this is valuable.
Deepak: I will try to document both cases, i.e. the DC-GW scenario and the
underlay scenario.
Erik: It would be useful to show the 3 layers (overlay, underlay, stitching
layer). It may be that stitching occurs in the routed overlay, L2 devices, or
specific stitching devices. Having a picture will make this easy to talk about.
Benson: We need to move on, and we have
Diego Garcia del Rio (15 min)
Skipped as presenter and slides are missing.
Erik Nordmark (15 min)
Presenting this again to see if there are any additional requirements.
Traceroute shows the path, even though the user can’t control the path. Overlay
networks might hide the underlay path, which is a useful policy, but we should
separate policy from mechanism. This is easy to rectify, and details are
specified for VXLAN. Draws comparison with the Internet traceroute case, which
is useful for a trouble ticket. Policy could be used to filter (or not) ICMPs.
Next steps: Is there a requirement for a mechanism to see underlay errors? Path
MTU mismatch between overlay and underlay. Is it useful to require a Uniform
TTL bit in the NVO3 header?
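The Uniform TTL question can be illustrated with a small model. This sketch assumes the uniform and pipe tunnel models as defined for MPLS in RFC 3443, applied to a single overlay tunnel. The function names, the pipe-mode outer TTL of 64, and the router names are hypothetical and are not taken from the presentation or from any NVO3 draft.

```python
from typing import List, Optional

PIPE_OUTER_TTL = 64  # assumed fixed outer TTL in pipe mode (hypothetical value)


def expiry_point(probe_ttl: int, underlay_routers: List[str],
                 uniform: bool) -> Optional[str]:
    """Return the underlay router where the outer TTL hits zero, else None.

    Simplification: the ingress NVE copies the tenant TTL to the outer header
    without decrementing it (uniform), or sets a fixed outer TTL (pipe).
    """
    outer_ttl = probe_ttl if uniform else PIPE_OUTER_TTL
    for router in underlay_routers:
        outer_ttl -= 1
        if outer_ttl == 0:
            return router
    return None  # probe reached the egress NVE


def traceroute_view(underlay_routers: List[str], uniform: bool) -> List[str]:
    """Underlay routers that would expire successive tenant TTL probes."""
    hops = []
    for probe_ttl in range(1, len(underlay_routers) + 2):
        router = expiry_point(probe_ttl, underlay_routers, uniform)
        if router is not None:
            hops.append(router)
    return hops


routers = ["r1", "r2", "r3"]
uniform_hops = traceroute_view(routers, uniform=True)
pipe_hops = traceroute_view(routers, uniform=False)

if __name__ == "__main__":
    print("uniform:", uniform_hops)
    print("pipe:", pipe_hops)
```

In this model the uniform setting is only the mechanism that lets underlay hops expire tenant probes. Whether the resulting ICMP errors are relayed to the tenant remains the separate policy decision noted in the presentation.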
Lucy: Is this an NVO3 requirement or is this an OAM requirement?
Erik: It depends on what the scope of OAM is. If we see breakage in the
underlay, what is the impact in the overlay? From a practical perspective, if
the underlay provides indications of breakage.
Lucy: This depends on having a correlation between overlay and underlay to do
this OAM properly.
Erik: If you have a gateway stitching device that gets OAM errors, should the
gateway stitching device send back the OAM indications to the gateway?
Lucy: This is something to research and decide on. Should we use OAM or
something else? We need to look at the pros and cons.
Deepak Kumar: This is something the customer is interested in.
Benson: On the OAM reaching into the underlay: in the NVA architecture with a
well-managed data center network, there may be a controller that can usefully
reach into the network, so this OAM has functionality. In a less controlled
network, managing the underlay nodes may be difficult if the underlay nodes are
invisible.
Erik: (missed portion of the discussion). If I am running some OAM pathtrace
that looks at the devices visible in the overlay, and then I get some underlay
information on error, this is one place to report the underlay information. The
second case is when you generically see underlay errors reported in the
response.
Benson: This is a good response to my topic. My question earlier was how much
overlay/underlay do we use.
Erik: I think we should integrate what the underlay
Lucy: We should not require the underlay to support things, but should leverage
what the underlay supports.
Chandra: There is no additional support in the underlay; in fact this uses what
the underlay already supports. It is just bundling and connecting that
information.
Lucy: The architecture makes the underlay invisible. I can see that the
underlay information may be valuable.
Chandra: Yes, in theory the underlay should be invisible, but when there is an
issue, most organizations would like to get some information about what
happened when the customer had the problem. What we observed is that being able
to provide information that is already generated when the endpoint issued the
traceroute makes the entire process very efficient.
Lucy: I agree that this is a problem that operations must address. It is not
clear what the right architecture is to address this problem. Typically in this
overlay, it is separate from the L3VPN. I do not think we should tie the
underlay and overlay together except at the stitching point. We should separate
the tunnel trace (stitching) and the end-to-end path in the overlay. An
operator troubleshooting the underlay path is a different thing.
Benson: Lucy, you are capturing the point we need to discuss. Perhaps you could
continue this discussion on the mailing list.
Lucy: I would be glad to do this.
Benson: Deepak's text regarding his customer use case would be useful. This
discussion is important context that prepares us to talk about the bits for the
solution. We should discuss the context and potential bits on the mailing
list.
All (10 min)
Benson: Any final comments before we close the meeting today?
Benson: Thank you.