
Minutes interim-2020-lsr-02: Wed 10:00
minutes-interim-2020-lsr-02-202004291000-01

Meeting Minutes Link State Routing (lsr) WG
Title Minutes interim-2020-lsr-02: Wed 10:00
State Active
Last updated 2020-05-07

LSR Interim Agenda 
Chairs: Acee Lindem
        Chris Hopps
Secretary: Yingzhen Qu
 
WG Status Web Page: http://tools.ietf.org/wg/lsr/

Session #2
Date: Wednesday, 29 April 2020 
Start Time: 10:00 America/New_York

Bruno Decraene - IS-IS Flooding Speed Parameters Advertisement
https://datatracker.ietf.org/doc/draft-decraene-lsr-isis-flooding-speed/


Acee:     Can you disclose the implementations used in your tests?
Bruno:    I don’t want to disclose the implementations. If we have two senders, 
          that’s something we can do but there are synchronization issues.
Chris H:  We can talk offline.
Acee:     That’s interesting. If you have somebody who has a naive assumption 
          that faster is better, they’re actually going to get three times 
          slower flooding than the optimal. 
Chris H:  Does TCP support dynamic receiver window resizing?
Bruno:    I don’t know. But from a protocol standpoint, you can change it but
          you’re not supposed to reduce it.
Chris H:  There is lots of information, flow control, congestion control and 
          modifications of the protocol. I have different ideas of flow control 
          and congestion control, not necessarily correct here. Really 
          congestion control is about loss, it can happen in the network and in 
          the router. I want to bring up that we have flow control of saying 
          stop sending, it’s more active. The point is we should explore the 
          idea, it might be a simple solution. The dynamic receive window 
          size is 2nd order, it’s almost modifying the congestion control 
          algorithm. It’s worth investigating, but seems complex.
Bruno:    It could be simple. It’s already available on some implementations.
Chris H:  You’re talking about burst rate. It’s more like a buffer size. It 
          won’t help speed up.
Bruno:    You might send back to back.
Chris H:  That’s a bit different.
Tony Li:  We’ve been flexible in the draft. Not all knobs are useful, and this 
          allows flexible implementations. People can play with the knobs and 
          see what happens.

Les Ginsberg - IS-IS Flooding Scale Considerations
https://datatracker.ietf.org/doc/draft-ginsberg-lsr-isis-flooding-scale/


Chris H:  I’d be surprised to see that ISIS traffic is not separated from data 
          traffic on the line card, other than all control plane traffic.
Les:      So be shocked. I talked to some implementations, and in some cases, 
          this is the case. There are implementations with the separation, but 
          there are some without.
Chris H:  These might be barriers. It doesn’t mean we need to engineer to it. 
          It’s not hard to do, considering IS-IS is coming in as ISO, it’s 
          separated early generally. Let’s not throw out a solution yet. 
Tony P:   From a practical implementation standpoint, the transmitter-based
          approach is very useful and works well, but the receiver-based one
          needs high-frequency feedback, which we don’t have. You have to go
          through TCP, etc. If you start running faster timers, you can’t get
          decent feedback. You can put it in, and it may fly in the future.
Chris H:  What do you say to my idea of having an ISIS queue in the line card?
          If it’s not draining, I send a pause. 
Tony P:   Now you need the line card to generate ISIS packets. It’s pretty 
          challenging.
Chris H:  Just add an ISIS queue, if it’s 80% full, send a pause.
Tony P:   But your line card starts to generate a packet and your
          implementation knows nothing about it. Or the line card needs to
          report back. Those are not easy to build.
Les:      Your proposal is only taking one point from the chain. Even if we
          agree to do what you suggested, it doesn’t cover the case where
          the PDU might be dropped.
Tony P:   For ISIS, everything is in LSP. It’s more difficult, not like OSPF.
Chris H:  I just don’t want to throw this out so quickly. It might be an
          add-on solution. I’m not convinced it’s too hard to put it in the
          line card.
Tony P:   This is a decent idea. Something to think about.
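
Illustration (not part of the discussion): a minimal sketch of the
threshold logic in Chris H's "80% full, send a pause" suggestion. The class
name, the percentages as parameters, and the low-water "resume" signal are
assumptions made for this example; nothing here comes from either draft.
It models only when a pause would be triggered and says nothing about who
generates the PDU, which is the difficulty Tony P raises.

```python
from collections import deque


class PduQueue:
    """Bounded queue that signals 'pause' above a high-water mark (hypothetical)."""

    def __init__(self, capacity, high_pct=80, low_pct=50):
        self.capacity = capacity
        self.high = capacity * high_pct // 100  # pause at or above this depth
        self.low = capacity * low_pct // 100    # resume at or below this depth (assumed)
        self.items = deque()
        self.paused = False

    def enqueue(self, pdu):
        """Return 'drop', 'pause', or 'ok' for the arriving PDU."""
        if len(self.items) >= self.capacity:
            return "drop"
        self.items.append(pdu)
        if not self.paused and len(self.items) >= self.high:
            self.paused = True
            return "pause"  # the point where a pause would be sent upstream
        return "ok"

    def dequeue(self):
        """Return (pdu, signal); signal is 'resume' once the queue has drained."""
        pdu = self.items.popleft()
        if self.paused and len(self.items) <= self.low:
            self.paused = False
            return pdu, "resume"
        return pdu, None
```

With a capacity of 10, the eighth enqueued PDU is the one that triggers the
pause, and PDUs arriving at a full queue are still dropped, which is the
case Les notes the proposal does not cover.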
Chris H:  If we can flood without loss, why should we care whether one
          interface floods faster than the other?
Les:      For any event, the left side of the network can converge faster than
          the right side.
Chris H:  We want to converge as fast as possible. Why should I wait for the 
          slowest router in the network?
Les:      If you look at your network, if there are one or two routers that 
          are slower then it’s a problem of your network. The analogy is
          some routers running SPF at a 5s interval while others run at 15s.
Chris H:  I want my network to converge as fast as possible.
John Scudder:  Typically the answer to why you don’t want one part of your 
          network to converge faster is because you create forwarding loops.
          I agree in general faster convergence is better, but it’s not
          always.
Chris H:  We talked about good and bad information before. If your network is 
          broken, you want to reconverge as fast as possible because it’s black
          holing. If you’re bringing up alternate routes, and you’re creating 
          loops, then that good information becomes bad when it propagates at
          different rates.
John Scudder: Black holing traffic consumes one transmission on the link,
          micro-looping consumes many transmissions on that link.
Chris H:  That’s a good point.  
Bruno:    Agree with micro loops, you have LSDB inconsistency. 
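
Illustration (not part of the discussion): John Scudder's comparison can be
made concrete with a small count. The sketch assumes a two-router loop that
lasts for the packet's whole lifetime, and a TTL rule where each router
decrements the TTL and forwards only while it stays above zero. Both are
simplifying assumptions; a short-lived micro-loop would produce fewer
crossings.

```python
def link_transmissions(ttl, looping):
    """Count how many times one packet crosses the link between two routers."""
    if not looping:
        return 1  # sent once toward the black hole, then dropped
    count = 0
    while ttl > 1:  # decrement, and forward only if the TTL is still above zero
        ttl -= 1
        count += 1
    return count
```

Under these assumptions a packet arriving with TTL 64 crosses the looping
link 63 times, against a single crossing when it is black-holed.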
Les:      We want to decide whether we want to do transmit-side or
          receive-side flow control. Let’s not get lost in other issues.
Bruno:    I think you’re mixing flow control and congestion control. If you 
          want to do flow control, it’s end to end. 
Tony Li:  Question to Les, have you done any experiments?
Les:      The data Bruno presented reinforces the point. We published an 
          algorithm, but you haven’t.
Tony Li:  We’re not trying to publish an algorithm, we’re trying to set up some 
          experiments. How about we stop trying to decide without evidence? 
          Let’s do some coding and put the evidence on the table.
Les:      If we were to try to implement the receiver-side algorithm, what
          would we have to do? I would need to make changes on the platform
          side.
Tony Li:  I’m asking you to implement your proposal.
Les:      That’s fine. We can do that.
Tony P:   The algorithm is a simplified version of what we run in RIFT, the 
          receiver-side also provides some sort of back pressure, something 
          along those lines.
Tony Li:  Can we see some numbers?
Tony P:   What numbers? It’s platform dependent.
Tony Li:  I’d like to see two implementations, flooding 10K LSPs.
Tony P:   So give me the other one.
Tony Li:  I agree. Let’s get some numbers.
Peter:    It looks to me both drafts are trying to do flow control, and it 
          needs to be done from the sending side. The only difference is where
          to get the parameters, buffer size, number of LSPs that can be
          flooded, etc. For the rest it looks to me the same thing.
Tony Li:  I agree. 
Peter:    If we’re going to use a static value, I can configure it on the sender 
          side as well. If we’re to tie the value to a platform-dependent
          receiver and send it in Hello packets, it’s problematic.
Chris H:  As a WG member: why don’t we talk more about why a solution works 
          better, instead of why the other one is not working?
Les:      It’s a mis-statement. I’m not saying it’s bad, but it's difficult
          to implement.
Tony Li:  If we can get to 99% with transmit-side, there is no reason not to.
Les:      I agree.
Peter:    The results will be the same with a static value, whether it’s from
          the sender or receiver.
Chris H:  Why don’t we get some numbers from the transmitter-side solution?
Peter:    What Bruno presented already proved that. We can debate what’s the 
          best algorithm from the sender side to get the best result. 
Chris H:  We can do that. 
Tony Li:  If we have to configure static numbers everywhere, the users won’t be 
          happy.
Peter:    We don’t want that, we want it to behave the same.
Tony Li:  You have an interoperability issue, different implementations,
          different router capabilities.
Peter:    For a less capable router, I will slow down immediately.
Tony Li:  You don’t know how to slow down, you don’t have feedback.
Peter:    I have unacked LSPs.
Tony Li:  That means some implementations have to speed up, that means we’re 
          changing the protocol, and we have to agree on that.
Les:      We’re not changing the protocol here. 
Tony Li:  You are, you’re changing behavior.
Les:      We’re changing behavior, but not the protocol. If the algorithm is 
          sound, and the receiver is capable of processing the packets, there is 
          no interoperability issue.
Chris H:  Whether we’re changing the behavior, or the protocol on the wire, 
          it’s not the point. We’re trying to solve a problem. 
Les:      The concern we raised is about the dynamic algorithm, not the static
          one. If we agree that it doesn’t require significant changes to
          platform and protocol, that’s a plus.
Chris H:  That’s a selling point.
Ketan Talaulikar: In Bruno’s presentation, are all those numbers from a
          receiver-based implementation?
Acee:     No.
Ketan:    If so, how can it say one algorithm is better than the other? 
Tony Li:  That serves as a baseline to show the default parameters are not good. 
Ketan:    So we need proofs from both proposals.
Tony Li:  Agree.
Chris H:  I don’t think these two drafts are in conflict. I can do both. No 
          reason to pick one against the other.
Ketan:    I just got the impression that we have numbers from one approach. To 
          be fair, we need numbers from both. From a simplicity point of view,
          the 2nd proposal seems easy to roll out and deploy in a mixed
          environment, while the other proposal needs a change on the wire.
Chris H:  If it works.
Peter:    If the 2nd one doesn’t work, the first doesn’t work either.
Chris H:  Let’s get some numbers. Not changing something on the wire is 
          preferable. 
Les:      The point we’re making is that in order to make the receiver-based 
          algorithm work, it requires changes on the platform.
Tony Li:  If that hard work is necessary then we should do it. No question 
          about that. 
Acee:     Speaking as chair, we will need further implementation and data.
          Possibly another meeting on this.
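
Illustration (not part of the discussion): a minimal sketch of what a
transmit-side scheme driven by unacknowledged LSPs might look like, in the
spirit of Peter's "I have unacked LSPs". This is a generic grow-on-ack,
halve-on-retransmit window chosen for the example; it is not the algorithm
from either draft, and the class name, method names, and default values are
all assumptions.

```python
class TxWindow:
    """Sender-side window over unacknowledged LSPs (hypothetical sketch)."""

    def __init__(self, initial=10, minimum=1, maximum=1000):
        self.window = initial
        self.minimum = minimum
        self.maximum = maximum
        self.unacked = set()

    def can_send(self):
        """True while fewer LSPs are outstanding than the window allows."""
        return len(self.unacked) < self.window

    def on_send(self, lsp_id):
        self.unacked.add(lsp_id)

    def on_ack(self, lsp_id):
        """An ack (for example in a PSNP) frees a slot and grows the window by one."""
        if lsp_id in self.unacked:
            self.unacked.remove(lsp_id)
            self.window = min(self.maximum, self.window + 1)

    def on_retransmit(self, lsp_id):
        """A retransmission is treated as a sign of loss: halve the window."""
        self.window = max(self.minimum, self.window // 2)
```

The sender needs no new information on the wire here, but its only feedback
is the timing of acks and retransmissions, so its behaviour depends on how
the receiver schedules its acknowledgements.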

Comments from Chat: 
From Henk to Everyone:    8:51  AM
        The problem is that a "sender-only" algorithm isn't really sender-only.
        The sender makes guesses, based on what it sees. Part of what it sees are
        the PSNPs. And the behaviour of the receiver when and how to send the PSNPs
        is not 100% the same in all implementations. If you want a "sender-only"
        algorithm, you need to specify PSNPs in more detail. How long to wait, what
        intervals, how many acks to pack in one PSNP, etc.
	
From Les Ginsberg (Cisco) to Everyone:    8:53  AM
        Henk - I do not agree. Tx based works on what was actually sent and what
        was actually acknowledged. We do not care whether an LSP was dropped at
        ingress or in some queue on its way to IS-IS.
	
From Henk to Everyone:    8:54  AM
        Not only what was acknowledged, also the speed at which the acks come in.
        And how acks are grouped, and when they are sent, are implementation
        specific. I'll send an email to the list later today or tomorrow.
	 
from Bruno Decraene (Guest) to Everyone:    8:54  AM
         +1 thanks Henk

Peter:    Read the comments, I agree. It’s not completely from the transmitter 
          because we’re based on the acks. This is just terminology.
Chris H:  That’s a great point. The proposal only covers p2p, and it needs to 
          cover LAN. If it proves to be working, then we can look at the ACK 
          mechanism.
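
Illustration (not part of the discussion): Henk's point about PSNP timing
can be sketched with a toy receiver that batches acknowledgements. The
function name and its two parameters (a maximum delay before a PSNP is sent
and a maximum number of acks per PSNP) are assumptions made for this
example and do not describe any particular implementation.

```python
def psnp_send_times(arrivals, max_delay, max_acks):
    """Return the times at which PSNPs go out for LSPs arriving at `arrivals`."""
    times = []
    pending = 0
    deadline = None
    for t in sorted(arrivals):
        if pending and t >= deadline:
            times.append(deadline)  # timer expired before this arrival
            pending = 0
        if pending == 0:
            deadline = t + max_delay  # first unacked LSP starts the timer
        pending += 1
        if pending == max_acks:
            times.append(t)  # PSNP is full, send immediately
            pending = 0
    if pending:
        times.append(deadline)
    return times
```

For the same four LSPs arriving 10 ms apart, a receiver packing two acks
per PSNP acknowledges within the burst, while one that waits for a
one-second timer acknowledges everything at once a second later. A sender
inferring receiver speed from ack timing would see two very different
pictures.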

Xuesong Geng: If we want to do the tests, how do we do the tests? 
Tony Li:  Bruno showed a straightforward test, taking 5000 LSPs and sending
          them point to point. I believe he’s using tcpdump and looking at it
          from the wire, observing retransmissions and whether the transmitter
          has sync'ed the entire LSDB.
Xuesong:  So we can see the retransmission times and judge which one is better?
Chris H:  You can also look at the sequence number.
Tony Li:  You can also watch PSNPs coming from the other router.
Xuesong:  I just want to figure out what parameters to capture to determine
          which is better?  What numbers to look at?
Chris H:  You can send an email on the list and ask.
Les:      I’m not against testing, but it’s not going to be so easy to compare 
          the results. There are many variables.
Chris H:  You’re right, we have to believe the results. 
Xuesong:  Agree with Les, the answers might not be straightforward. In TCP 
          congestion control, we can only say one is more aggressive than the 
          other.
Chris H:  It’s a deterministic result: I send one LSP from one side, and see
          when I receive it on the other side.
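
Illustration (not part of the discussion): one possible way to turn the
observations mentioned above (retransmissions, PSNPs from the other router,
whether the LSDB is fully synced) into numbers. The input format, a list of
(timestamp, LSP ID) pairs already extracted from a capture, and the
function name are assumptions; how to parse the capture is left out.

```python
def flooding_metrics(sent, acked):
    """Summarise a flooding run.

    sent, acked: lists of (timestamp_seconds, lsp_id) tuples, assumed to have
    been extracted from a packet capture beforehand.
    """
    first_sent = {}
    retransmissions = 0
    for ts, lsp_id in sorted(sent):
        if lsp_id in first_sent:
            retransmissions += 1  # same LSP seen again on the wire
        else:
            first_sent[lsp_id] = ts
    first_acked = {}
    for ts, lsp_id in sorted(acked):
        first_acked.setdefault(lsp_id, ts)
    # Synced when every LSP that was sent has been acknowledged at least once.
    synced = set(first_sent) <= set(first_acked)
    duration = None
    if synced and first_sent:
        duration = max(first_acked[i] for i in first_sent) - min(first_sent.values())
    return {"retransmissions": retransmissions, "synced": synced, "duration": duration}
```

As Les cautions, numbers like these depend on many variables, so they are
comparable only between runs made on the same setup.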

Acee:     Let’s keep this discussion going. 
Chris H:  I don’t see a conflict here. We don’t have to pick one.
 

Discussion from the chat window:
From Tony P to Everyone:    8:56  AM
        The only thing that matters is really, do you get ACKs in a timely
        manner ... If you RTX it means _something_ went wrong, PSNP didn't get
        produced, got lost on the wire etc. all the same to the transmitter,
        it means
From Tony P to Everyone:    8:56  AM
        Basically, back-off
From Tony P (BIER Working Group) to Everyone:    8:57  AM
        That's why the RX trying to do something is hairy. Now what? We'll
        start to send empty PSNPs to indicate that we can process more now?
        Every 200msecs like TCP?
From Peter Psenak to Everyone:    8:58  AM
        Who is "BIER Working Group"?
From John Scudder to Everyone:    9:00  AM
        It would be nice if somebody had a graduate student group who wanted
	to drop both options into their simulator, instead of two groups of
	partisans running a beauty contest.
From Ketan Talaulikar to Everyone:    9:00  AM
        +1 to John 
From Tony Li to Everyone:    9:00  AM
        I don’t trust simulators.
From John Scudder to Everyone:    9:00  AM
        Somebody could get a nice publication out of it, I’m totally serious.
From Tony Li to Everyone:    9:01  AM
        But implementations, yes.
From Bruno Decraene (Guest) to Everyone:    9:01  AM
       +1 to John
From John Scudder to Everyone:    9:01  AM
      Tony Li, fair enough. It does provide an apples-to-apples comparison,
      though.
From John Scudder to Everyone:    9:01  AM
      Also, what’s the difference between a simulator and someone’s
      virtualized control plane software?
From Bruno Decraene (Guest) to Everyone:    9:01  AM
      Does anyone know such a team?
From Les Ginsberg (Cisco) to Everyone:    9:01  AM
      IS-IS on the RX side is not guaranteed to know what has been
      dropped - so you cannot depend on IS-IS Rx side to know when to
      send "send me more/faster".
From Tony Li to Everyone:    9:02  AM
      John, no diff.
From John Scudder to Everyone:    9:02  AM
      But yet we field those things.
From Tony P to Everyone:    9:02  AM
      "If" you assume that you can get the RX size on the receiver side
      and assume you can send that information very fast to the transmitter,
      the RX will perform better. But you know what assumptions make of
      you and me ;-)
From John Scudder to Everyone:    9:03  AM
      Bruno, Cisco used to fund research groups, I don’t know if they still
      do, but it might be that people there have access.