Thanks to Yuehua Wei (ZTE) for taking the minutes.
Video recording: https://www.youtube.com/watch?v=dOclSQ9wJu0
------------------------------------------------------------
- Chairs' remarks - Jeff & Jeffrey, 5 minutes
- Updates from Jordan Head, 20 minutes
- Base spec: https://datatracker.ietf.org/doc/draft-ietf-rift-rift/

Jim [AD]: Hey Jordan, thanks for that, and thanks for taking care of all of that for me, very much appreciated. The only comment I had was the TTL thing: you're going to put some text in the applicability document on that, right? Is that something you're doing? Because I'd like to try to get that moved at the same time, so if we can kill two birds with one stone, that would be great.
Jordan: Yeah, I've been driving the discussion a bit and gotten some feedback, but I think we're at the point where I'll just write some text and propose it; ask for forgiveness rather than permission, so to speak, just to make sure we're covered. There were a couple of other points on the applicability side, but nothing that relates to something normative like this.
Jim: So that one's pretty much done as well, right?
Jordan: Minor stuff.
Jim: Okay, perfect, thanks.

- KV registry: https://datatracker.ietf.org/doc/draft-ietf-rift-kv-registry/

Jeffrey [Chair]: I have a question. I had thought this document was about the registry itself, but it seems we have now added this mechanism for the key targets and it's handling all those things. Should the draft be renamed and the title changed?
Jordan: That's probably not an awful idea. I think Tony is going to mention this, but we've had other mechanisms described there as well besides key targets, like the southbound tie-breaking and so forth, but go ahead...
Tony: The only thing that's changing in the RIFT document is this: first we thought that we basically have the key, which was always outside, and the content was just a blob, and we wanted to throw the target into this blob. But we decided to split it into the key, the target, and then the blob, which is really the value, right? So that's a schema change, and we have to register this code point for the content of the Key-Value TIE. But in terms of what this thing does, the RIFT spec doesn't say anything; that's all still farmed out to the key-value spec.
Jeffrey [Chair]: Right, so the KV registry spec is really not only about the registry but also about the behavior.
Tony: Right, it also specifies the behavior of this field. Since we defined the tie-breaking of the key-value store in the RIFT spec, you could argue that we should put the target text into the RIFT spec as well.
Jeffrey [Chair]: No, that's not my point. This KV registry document also defines the target behavior, so I think the document should be renamed.
Tony: Fair enough.
Jordan: Yes, I'll take that for next time, Jeffrey, that's fine. Since it's just a title change, it's not a big deal.
Sandy Zhang: In some scenarios, may I understand the key target as the route target in MP-BGP? Can it be used like that?
Tony: Yes, no, maybe. People are probably confused because I'm not sure everybody knows how a Bloom filter works. The idea is fairly simple: you take something fairly big, you generate multiple hashes of it, say three hash functions giving you three bits, and you flip those three bits on. That way you can put 100,000 targets into 64 bits, okay? Of course you will get false positives, but you don't get false negatives. So you may address more people than you intend, but you have a very small filter. Whereas with the route target in BGP you have, by policy, a perfect match, here you don't have a perfect match; you have something that statistically works incredibly well, but it delivers false positives and you have to deal with that. It's a well-known, often-used technique, research papers and all, but the equivalence with the route target breaks down here because it's not a perfect match, right?
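(For reference: a minimal sketch of the Bloom-filter mechanism Tony describes above. The hash construction, sizes, and target names are illustrative assumptions, not text from the KV registry draft.)

    # Bloom-filter sketch: k hashes set k bits in a small field; membership tests
    # can return false positives but never false negatives.
    import hashlib

    FILTER_BITS = 64   # matches the "64 bit" figure mentioned above (illustrative)
    NUM_HASHES = 3     # "three hash functions, three bits"

    def bit_positions(target: bytes) -> list[int]:
        # Derive NUM_HASHES bit positions from salted hashes of the target.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + target).digest()[:8], "big") % FILTER_BITS
            for i in range(NUM_HASHES)
        ]

    def add_target(bloom: int, target: bytes) -> int:
        # Flip the target's bits on; many targets share one 64-bit filter.
        for pos in bit_positions(target):
            bloom |= 1 << pos
        return bloom

    def may_match(bloom: int, target: bytes) -> bool:
        # True means the target *may* be addressed (false positives possible);
        # False means it definitely is not (no false negatives).
        return all(bloom & (1 << pos) for pos in bit_positions(target))

    bloom = add_target(0, b"hypothetical-target-17")
    assert may_match(bloom, b"hypothetical-target-17")   # never a false negative
    print(may_match(bloom, b"some-other-target"))        # occasionally True: a false positive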
- Update on the interop testing in the Hackathon, Tony P., 10 minutes
- RIFT meets Dragonfly: https://datatracker.ietf.org/doc/draft-przygienda-rift-dragonfly/, Tony P., 25 minutes

Jeffrey [Chair]: Tony, do you want to take the questions now or later?
Rod Van Meter (Keio University): It's a clarification about this diagram.
Tony: Sure, yeah, I know it's tons of information and it gets worse.
Rod Van Meter (Keio University): So this diagram, you said eight links?
Tony: Eight edges. There's no node in the middle; it's an octagon, so it's a regular structure. The original Dragonfly was just a full mesh, plus these little wings which are all full meshes, and if you have four of them and align them correctly, it looks like a dragonfly. Think about it this way: those are the routers, and those are the two planes and how they are connected. Yes, I should probably have drawn big blocks, but I was too lazy, and there would be two more of these on the top, on the left, or rather, think of them behind. It doesn't matter, because those are Clos planes; those are the Clos fabrics in Dragonfly+, as far as I could figure out, because there's really no clean paper that explains in research terms what it actually is. That was as much as I could reconstruct from all kinds of ideas flying around.
Linda: I'm a little confused by this picture too. You're saying the red nodes...
Tony: There are no red nodes, it's just a red plane.
Linda: Red plane.
Tony: Yeah, you could say so; those would be the red nodes.
Linda: But you have a box connecting the green and the red; does that mean it's just one node?
Tony: That's an important concept: you can see it in different ways. You can see two completely disconnected planes, you can see half a full mesh, or you can see two planes that you can somehow connect together if you keep those links.
Linda: So I see the red plane is only one hop away from each node. Why do you say there are two hops?
Tony: What do we have in the middle, this cross? Nothing, nothing; it's just that if you start to draw things like that, the lines intersect. My bad. Only those things are nodes; it's an octagon.
Linda: Okay, so the middle one is not really a connection. There's nothing there.
Tony: Sorry, there are six red links in fact. There is no node in the middle, there's nothing there. So you've got shortest paths and non-shortest paths.
Linda: So you have one-hop and you have some two-hop paths.
Tony: Two hops and one hop, yes. Sorry, that's implicit. Okay, cool.
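(For reference: a toy model of the shortest vs. non-shortest path point in the exchange above. The four-group full mesh below is an illustrative stand-in, not the exact figure from the slides.)

    # Groups connected by inter-group links; the shortest path is the direct
    # one-hop link, the non-shortest paths detour through exactly one
    # intermediate group (two hops).
    INTER_GROUP_LINKS = {          # illustrative adjacency: 4 groups, full mesh
        "A": {"B", "C", "D"},
        "B": {"A", "C", "D"},
        "C": {"A", "B", "D"},
        "D": {"A", "B", "C"},
    }

    def paths(src: str, dst: str):
        # Return the direct path (if any) plus all two-hop detours.
        direct = [[src, dst]] if dst in INTER_GROUP_LINKS[src] else []
        detours = [
            [src, via, dst]
            for via in INTER_GROUP_LINKS[src]
            if via != dst and dst in INTER_GROUP_LINKS[via]
        ]
        return direct, sorted(detours)

    one_hop, two_hop = paths("A", "C")
    print(one_hop)   # [['A', 'C']]                        -> shortest path
    print(two_hop)   # [['A', 'B', 'C'], ['A', 'D', 'C']]  -> non-shortest alternatives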
Jeff [Chair]: I'll bring up a couple of points and we can start the discussion.
1) Why is this important? People have been trying Dragonfly-like topologies in the data center to save on interfaces; practically, the complexity doesn't justify the deployment. Where it becomes really interesting is when you use the inter-links to interconnect data centers, as Tony said, and there's a very important point: today it's pretty much impossible to get a data center over 50 megawatts; there's just no power to cool it and power it. So a lot of people in the US started building data centers in pairs of 50-60 megawatt data centers, and this naturally maps onto this kind of topology. I've got two 50-megawatt data centers, and within each data center you run whatever you like, most probably MP-BGP. That's number one; this is why this is so important.
2) Number two: this provides you loop-free routing. It doesn't explain how to get traffic onto the links, but practically the cheapest way is to go on the shortest link, which also gives you low latency. You also want to be able to use the longer links, but you need to understand that, again looking at the target, this is really a machine learning cluster. In collective operations you cannot afford to have parts of the collective experiencing different latency, because it's all about job completion time, so you need to make sure that whatever your GPUs are running follows the same path. How do you get traffic onto another link in case of congestion? Again, that is another problem to solve, not here. Practically you need to know when to switch from the shortest path to a non-shortest path, and that's not in the routing protocol, at least as of now. Adaptive routing has applicability here; again, if you try to do more granular load balancing than just per-flow, you end up in a case where some of your packets go on the shortest link and some don't, and performance goes down to 3%. So it's really important to understand, from an applicability perspective, how to deploy it and how to signal potential congestion, available bandwidth, or failure on the inter-fabric links. All of this will need to be worked out, at least some of it. This is where I think we should start the discussion.
Tony: Yeah, but the nice thing, if you start to look at this, at the direct path and the one alternative hop: this is an incredibly resilient structure. You have to kill tons of connectivity before this thing literally starts to become unreachable. When I was looking at the stuff, if you build, say, three Clos planes and then this thing in between, you have to nuke it before the stuff actually fails to have any path to get anywhere. I kind of hated Dragonfly; I thought it was too dense and nobody could figure out the routing. Now I'm starting to like them, of course, because I think I figured them out. All right, so I think that's it.
Jeff [Chair]: One more comment. There's a draft going through the routing working group that focuses on BGP in Dragonfly+. If you want it in terms you're more familiar with, VRFs and BGP policies, it explains how this can be done with BGP policies: rather than understanding whether a link comes from the fabric or is an inter-link, you just use different VRFs, and you can use the AS path to figure out where you are. So it will help you better understand the applicability of a regular routing protocol to this.
Dima: Thanks Tony, it's really impressive what you did with RIFT. I just want to comment on the computation scalability problem, because essentially, if we are trying to use the silicon to the maximum, then the number of groups will probably be half the radix of the top-of-fabric switches plus one, because we have half the interfaces going south and half the interfaces going north to other groups, and the plus one is our local group. So it could be 33 or 65 for the current generation of silicon, something like that. But I think there's no need to do full computations for every group, because the reason to do a full computation is if you're going to go through an intermediate group and reach the leaves in that group.
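(For reference: a back-of-the-envelope check of Dima's group-count estimate. The radix values are representative port counts for current switch silicon, not figures from the draft.)

    # Half the ToF ports face south into the local fabric, half face north with
    # one port per remote group, so the number of groups is radix/2 plus the
    # local group itself.
    def max_groups(tof_radix: int) -> int:
        return tof_radix // 2 + 1

    for radix in (64, 128):
        print(radix, "ports ->", max_groups(radix), "groups")
    # 64 ports -> 33 groups
    # 128 ports -> 65 groups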
Tony: No. The only reason to do the full computations, as I mentioned, is if you really want negative and positive disaggregation to tackle the cases where you have to start in the fabric on the direct plane, because you can only get to the leaf in the other fabric on this plane. So it's the cases where the other fabric breaks in a way that forces your fabric to disaggregate all the way down. You see my point; I don't even know how many links I would have to break, and how, to get there. That's the only reason to run the computations.
Dima: Yeah, my point is that it's probably possible to do fewer computations than a full computation for every member fabric.
Tony: Yeah, that's what I wrote. I said: leave out the inter-fabric links when you do positive and negative disaggregation; it's good enough, most likely.
Dima: Yeah, because negative disaggregation is needed if you cannot go through a particular top-of-fabric switch.
Tony: Yeah, totally right.
Dima: And I wanted to second what Jeff said: there is a power scalability problem for how much you can put in one particular data center, but this topology looks like a good fit for data center campuses, or any aggregation of data centers which are not too far from each other, where you want more or less uniform connectivity and a lot of bandwidth between them, because it scales better than trying to add yet another level to the Clos.
Tony: Right. So if anybody feels like a little mental exercise, especially the professors here: now imagine running this thing on an optical ring, counter-rotating. What happens if the ring gets cut in one place? What will this topology look like and what will happen? Because that's the next layer of problems in the network, right? Because you run this whole thing on lambdas over a ring.
Dima: Yeah, that's it for me, thanks.
Jeff [Chair]: Next question.
Jingyou (Fiberhome): I don't have any comments, just a minor suggestion. The figures look nice but are a little difficult for me to understand, so I suggest maybe we could add some formulas, or give some examples or use cases.
Tony: I used to be in academia; I don't do formulas anymore. I could write it beautifully in three formulas and talk to you about Banyan trees and Banyan tree formulas, and nobody would grok anything whatsoever. I reserve that for the journal paper.
Linda (Futurewei): I'm just curious: you have multiple planes and each plane has its own topology. Can you use different IS-IS areas to solve the problem? The plane could be area two and...
Tony: Look, you could run IS-IS in the core, right? I mean, we wouldn't have had to extend RIFT, but you could only do shortest path, one hop. So you don't get the bisectional bandwidth with IS-IS unless you hack IS-IS to the point where it's not IS-IS anymore. So...
Linda: But you can use some kind of policy on the side so that you can...
Tony: IS-IS doesn't have policies.
Jeff [Chair]: That's why we use BGP.
Linda: How about using BGP? We have a draft on that: basically some kind of metric to influence the path selection, so instead of choosing the shortest path we add some other weight, and with that other weight added, maybe the longer path will be chosen.
Tony: My comment would be that once your policy grows complicated enough, you may as well start carrying packets by hand; that may be more efficient.
Linda: Yeah, of course. But here we're talking about multiple paths, where the shortest path may not be the best path, and how we balance that.
Tony: Dima has a draft where he has shown, basically with a lot of VPNs, how you can solve that stuff. The horizon idea is actually Dima's idea, not mine; I was standing in front of it sucking my teeth: how do you properly know the shortest path here? It was Dima's idea that we can actually build a horizon, because he built the horizon using VPNs in BGP, because that's how you use them; they basically reflect the horizon. That's the BGP mechanism.
Linda: Okay, so do we have some ideas on how to do this?
Jeff [Chair]: Oh, we know exactly how to do it with BGP. That was presented at the last RIFT working group meeting.
Tony: Yeah, we talked about the BGP stuff, modulo little details like where the couple of hundred lines of BGP policy are and how you stitch that stuff properly so it doesn't break. Plus of course BGP will stitch with the VPNs, and you have to start thinking: okay, where are your tunnels? What happens there, because the tunnels start to develop their own logic, right, how to go from one place to another, and you have to control them so they take the path that you want. But it's all doable. Like I say, ultimately you can get enough people to carry packets by hand, and if you beat them enough, you will get what you want.
Jeff [Chair]: There's another level of complication when you start doing overlay, which is mandatory if you do multi-tenancy, right? If you do it on the switch, think about VXLAN and VPN, which is the common way to do it today: you're going to build a structure where there's an underlay VPN and another VPN that is the tenant, right? It becomes really complex from a management perspective.
Linda: It may not be VPN per se, but anyway, I'll just throw some ideas out here.
Tony: Yeah, it's solvable. I mean, this is trying to solve it in a very ZTP way with a very cheap forwarding plane; that was always RIFT, right?
Sandy Zhang (ZTE): I'd like to make sure I understand this right: how do the ToF nodes know if a flow is intra-fabric or inter-fabric?
Tony: That's a very justified question. That's where RIFT solves the problem and where BGP will have a hard time, right? We know the direction of the fabric, so we know who is south and who is north, and now we can differentiate whether it's inter-fabric or whether it's a horizontal link. So for the inter-fabric link, the adjacency will clearly tell you which horizon it is on.
Sandy Zhang: Yes, I think the FIB, the forwarding table in the ToF, will show whether the route is inter-fabric or intra-fabric, so when the ToF receives the flow it will know how to forward it.
Tony: Correct: which FIB to throw it to, precisely. And thanks to ZTE, because we spent a lot of time at the hackathon starting to ask these questions. I had only drawn a very simple figure, like a three-thingy and a four-thingy, and they asked about five, and I wasn't sure, so I actually had to draw the figure to work out this presentation; when I oversimplified with three, everything worked. It's kind of trivial, but this is exactly how it works: the incoming interface will tell you which FIB to go to. I was slightly skeptical whether you can demand that from the hardware, and I looked, and yes, even the cheapest silicon can do that these days, because it's actually a very common problem if you run any kind of VRF: you have to know that this is a VRF link, so it's a completely different FIB, otherwise it won't work.
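(For reference: a schematic of the per-interface FIB selection Tony and Sandy discuss above, analogous to how a VRF-attached interface selects its own table. The interface names, prefixes, and next hops are hypothetical; this is not draft text.)

    # The incoming interface alone decides which FIB is consulted; the lookup
    # is shown as exact-match for brevity (real FIBs do longest-prefix match).
    INTERFACE_CLASS = {                   # hypothetical interface classification
        "if_south_1": "intra_fabric",     # southbound link into the local Clos
        "if_ring_1": "inter_fabric",      # horizontal link toward another fabric
    }

    FIBS = {                              # hypothetical per-class FIBs
        "intra_fabric": {"10.1.0.0/16": "if_south_2"},
        "inter_fabric": {"10.2.0.0/16": "if_south_3"},
    }

    def forward(in_interface: str, prefix: str):
        # Pick the FIB from the incoming interface's class, then look up the prefix.
        fib = FIBS[INTERFACE_CLASS[in_interface]]
        return fib.get(prefix)

    print(forward("if_ring_1", "10.2.0.0/16"))   # resolved via the inter-fabric FIB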
Sandy Zhang: So maybe some flag could be added in the forwarding table to distinguish it?
Tony: How you solve it is over-specification. This thing tells you: look, this is the computation that you used to build this FIB...
Jeff [Chair]: So in BGP it's configuration logic: you have two different virtual routers to treat fabric and inter-fabric routes. Here, based on the fabric ID, you see whether it's you or it's not you. It's built into the protocol; you don't need an additional management task to identify a particular interface.
Tony: Okay, so please look over the stuff; maybe you'll find the whole thing is just made up, I don't know. I'm pretty confident this stuff holds up, but who knows; it's never been done before. I never saw any kind of dynamic routing for dragonflies where anybody explained how it's supposed to work. All this fancy stuff like dragonfly, hypercubes, or toroidal meshes was used in supercomputers where links never fail, so it's simple there; dynamic routing is overvalued. This is the first time I see something cooked up, except Dima's stuff, which is basically stitching BGP magic. So it's not really routing, it's more like carrying packets the right way by hand with a lot of policy magic, which is fine; a lot of people seem to consider that pretty good job security these days.
Jeff [Chair]: Okay, thanks Tony, great presentation. Academia has been trying to solve non-shortest-path routing probably for as long as routing has existed, and this is a very good example of how it can be done with the right protocol, simply and elegantly. We are exactly on time, so 30 minutes from now we are going to have an AIDC side meeting, which will talk in more detail about the workloads these kinds of topologies are dedicated to; it's really the machine learning application. Hopefully we can figure out how to record it and put it somewhere in the cloud. Thanks everyone, and we'll see you in Australia.