Minutes interim-2019-lsr-01: Thu 07:00
Link State Routing
LSR WG Interim Meeting
May 30th, 2019
Chair: Please be aware of IPR and the Note Well.
Starting from Tony Li's presentation.
Tony: Arista has filed an IPR claim on dynamic flooding.
Tony: FT bit from Huaimo's draft is added and it's straightforward. There's
a minor question I have here for the chairs, about how to deal with
IANA. I'm not seeing a registry for the link attributes bit.
Les: You're talking about the ISIS link attribute bits? There is a
registry and it's referenced in the draft.
Tony: Okay, my apologies.
Acee: Did you say you only wait for 10 ms? So, actually when you receive a
new flooding topology, you only flood it on the new and old for a
certain amount of time.
Tony: Yes. Let me be clear that it's 10 milliseconds between adding
Huaimo: If there are hundreds of links, are you going to do temporary
flooding on those links?
Tony: If we have hundreds of links that somehow have fallen off the
flooding topology and where we have disconnected nodes. Yes, we're
going to end up adding all hundred links. That's the truly bizarre
occurrence because that should not happen unless those nodes
are truly disconnected.
Acee: Slide 5 please. I read the draft yesterday, this level of detail is
not in the draft.
Huaimo: This algorithm is not in the draft?
Tony: Yes, not in the draft.
Acee: Hopefully it will converge unless there are some other problems in
the IGP domain. We don't even need to standardize it, do we?
Tony: We don't because as far as I can tell, there's no negotiation going on.
Acee: It’s a local repair. It's good to have it for information but this
could be modified based on experimentation.
Tony: We’ve worked on it. Any other questions?
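The transition Acee and Tony discuss above (flooding on both the old and new flooding topologies for a brief window before dropping the old one) can be sketched roughly as below. This is a minimal illustration, not the draft's procedure: the function name, the set-of-links representation, and the exact window are assumptions.

```python
import time

OVERLAP_SECONDS = 0.010  # the ~10 ms window mentioned in the discussion

def transition_flooding_topology(old_ft, new_ft, set_flood_links):
    """old_ft/new_ft: sets of link IDs enabled for flooding (hypothetical)."""
    # Phase 1: flood on the union of both topologies so no LSP/LSA
    # is missed while the switch is in progress.
    set_flood_links(old_ft | new_ft)
    time.sleep(OVERLAP_SECONDS)
    # Phase 2: converge onto the new flooding topology only.
    set_flood_links(new_ft)
    return new_ft
```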
* FT bit discussion
Acee: Let's do the discussion now.
Tony: There is another reason why we don’t want to do it, consider the case
where you've got parallel links between A and B. They may be flooding
to each other on opposite links. If you use this mechanism, you're
going to warn about an error, and there is no error.
Huaimo: In this case we can send a warning, then people or tools can analyze it.
Tony: This generates lots of noise. Parallel links are extremely common
these days, especially in data center topologies.
Les: I did respond to your email on this on the list with some reasons
why this was not a good idea. When you get a chance, you know, please
        reply to that. Second, Tony, I agree with you. The way the bit is
        defined in the draft, and I would hope the way it would be used even
        if we agreed to do what Huaimo is suggesting, is that it would be edge
oriented. In other words, you have to advertise the bit on all the
parallel links. But how you evaluate the bit depends upon whether the
edges are in the flooding topology or not. I think that's the only
way it could work reasonably.
Huaimo: In cases where there are no parallel links, should we do something?
Tony: No need. If somebody wants to build a tool to look for this they can.
You're advertising the information.
Peter: It’s up to implementation, but no need to standardize anything.
Huaimo: So even if there might be problems, we're not going to take action to
have a temporary fix?
Tony: If there are real issues in the flooding topology, then partition
repair would have acted to actually repair the flooding topology.
This adds another level of worry about things that we already have
a mechanism for.
* FN bit discussion
Les: This issue has nothing to do with dynamic flooding. If the WG decides
to take it on, it should be in a separate draft. Having said that, I'm
not encouraging you to, for example, issue another draft to propose
this because I don't think this is a good idea. As you've mentioned,
        this problem has been discussed on the list and has already been
addressed by implementation, without any protocol extension. I think
there's some very significant issues associated with your solution.
You're changing the state machine. You're trying to set up a
negotiation based on partial information. I think there's a lot of
problems with this solution, but I really would like to divorce it
from the dynamic flooding discussion.
Huaimo: I think this is in the scope of flood reduction. When there are
thousands of links, we only need to flood over one or two links.
Peter: There are other issues for this problem, not just flooding. I agree
with Les, I can confirm there are implementations that solve this
        problem. And there are other problems associated with bringing up
        a large number of adjacencies. It is completely unrelated to
Huaimo: We’re talking about bringing thousands of adjacencies up.
Peter: If you solve this problem, then you wouldn't have that problem.
Tony: If you solve it as a generic problem, then you bring up links, a
few at a time. And as you do that, partition repair within the
dynamic flooding domain would add some of those links to the
flooding topology temporarily, creating flooding. Then you don’t
have a problem.
Huaimo: This problem is from you.
Tony: That was a better way to fix it.
Robin: I’m confused. Tony mentioned that it was included in the draft?
Tony: We haven’t included FN bit, just FT bit.
Acee: The scenario is for thousands of links?
Robin: I agree that this can be solved by implementation, not protocol extension.
* Transition between Flooding Reduction and Normal Flooding
Les: The draft has always had the ability to quickly and easily transition
between enabling and disabling dynamic flooding. I think what we
didn't have in the draft was a very clear explanation of how this is
done. We added some language in the latest version v2. I'd encourage
you to review that. But the draft already has the necessary mechanisms
so I don't think any of this is necessary.
Huaimo: In the draft, you mentioned centralized mode. After the flooding
topology is flushed, every node transitions to normal flooding.
Les: Apologies for interrupting you but that's actually a point on your
slides, which is incorrect. The flooding topology is carried in LSPs
or LSAs, and if those LSPs or LSAs get updated, then everybody's link
state database gets updated in a relatively short period of time.
We're talking a matter of seconds; the flooding topology is then gone
or updated. There is no separate aging of the flooding topology
        independent of the link state database. So point five is just not correct.
Huaimo: So the flooding topology is advertised by the leaders, so the leader
needs to flush the LSAs when switching back to normal flooding.
Les: That’s what the leader will do. The leader will withdraw the
        advertising, whether that's purging an LSA or, in the case of ISIS,
it's simply removing the appropriate TLVs from the LSP in which
they were advertised, and they're gone.
Huaimo: So the leader will flush the flooding topology, and this is not wrong.
Les: The leader will update the flooding topology as necessary. Again,
this is described in the updated language in the draft. I'm just,
again to repeat, that none of this is necessary. The existing TLV or
        sub-TLVs that we have are sufficient. And we tried to make the
        procedures a bit clearer in the latest version of the draft. If the
        language needs to be improved, we're certainly open to improving
        it, but none of this is needed.
Huaimo: For the centralized mode, the draft is OK. For distributed mode,
        we don’t have a way to tell each node to transition to normal flooding.
Les: Actually we do. It’s in version 2.
Huaimo: How did you do that? Because you need to transition to centralized
        mode. And then you withdraw the flooding topology.
Les: If I'm a leader, and I want to operate in distributed mode, but I want
to disable the optimized dynamic flooding, at this point
        I simply advertise algorithm 0 and I don't advertise any flooding topology.
Huaimo: You only have two states. Central or distributed? You can use 0 for
Les: Let’s take it offline. Again, I encourage you to read the updated
draft. It does explain how this is done.
Huaimo: We need to fix it. Robert also mentioned it in the list.
Les: I will try to clarify on the list, but we do have it.
Robin: I think both parties agree this needs to be fixed, and Les agreed to
clarify on the list. From my experience, we may have to do protocol
        extension. Considering time, my suggestion is that it's better to
        clarify this on the list.
Acee: Les, please put it up again on the list. There are different ways to
        do it. Is there any algorithm number devoted to normal flooding?
Tony: We didn’t think it’s needed, because disabling dynamic flooding is
Acee: We may want to disable it. So we need separate algorithm for central
and distributed, then 0 for disable. Either way works.
Tony: We have it covered.
Acee: Let's move on to the next one.
* Area Leader Sub-TLV
Tony: I'm sorry you're having trouble understanding it. But the point of
        the area leader sub-TLV is very clear. This was to carry the
priority, and also to carry the algorithm for distributed mode. We
should point out that the dynamic flooding sub-TLV was, as intended,
so that nodes can indicate that they are capable of operating with
dynamic flooding. And also we carry around the potential algorithms
for distributed mode. All nodes in the flooding topology, assuming
        the nodes support it, are advertising the dynamic flooding sub-TLV.
Huaimo: This is also in line with broadcast network operation.
Tony: That is exactly what we're modeling this after. Yes, every area leader
candidate needs to advertise a priority. Some nodes, if they are short
on RAM or short on CPU, may choose not to be the area leader. They
would not advertise an area leader priority. It's important that they
be able to do that. The right way to do that is to not advertise the
area leader sub-TLV.
Huaimo: You mean you have a way to advertise priority without the area
        leader sub-TLV?
Tony: The priority is in the area leader sub-TLV. That is where it belongs.
That is how a node indicates that it is willing to become area leader.
        The priority belongs in that sub-TLV. If a node chooses to be an
        area leader candidate, it has to advertise a priority.
Huaimo: That means every node will send the area leader sub-TLV.
Tony: Selecting a distributed mode algorithm, and having them all not be
elected except one is not a problem. That is the whole point.
Huaimo: Either way can work, right?
Robin: Maybe for simplicity, maybe we should introduce the enhancement now.
Les: This has been discussed on the list, you may want to have multiple
        area leader advertisements, so that if the current area leader fails,
you don't have to go through a reconvergence cycle in order to elect
a new area leader and get the flooding topology from the new area
        leader in centralized mode. So the idea that we only want one area
        leader sub-TLV advertised leaves us very vulnerable.
Huaimo: The leader will be elected even though we have multiple area leader
        sub-TLVs in the system. When the leader dies, a new leader will be elected.
Les: That presents a significant convergence problem. And again, if you
        look at the latest version of the draft, we've tried to clarify how
that can be avoided. But it requires that there are multiple area
leader sub-TLVs that are always advertised. So apart from the fact
that architecturally we have concerns about what you're proposing,
operationally, it leaves us very vulnerable to a single point of
failure, which we definitely don't want.
Huaimo: Like DR operation in ISIS, we don't have any problem. Leader is
elected dynamically. Let’s take it offline.
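The priority-based election described above (candidates advertise a priority in the area leader sub-TLV; unwilling nodes simply omit it) can be sketched as follows. This is an assumed illustration, analogous to DIS/DR election; the tie-break on system ID is a guess, not taken from the draft.

```python
def elect_area_leader(candidates):
    """candidates: dict mapping system_id -> advertised priority.

    Nodes that do not want to be area leader never advertise the
    area leader sub-TLV, so they simply don't appear here.
    """
    if not candidates:
        return None  # no candidates: fall back to normal flooding
    # Highest priority wins; ties broken deterministically by system ID.
    return max(candidates, key=lambda sid: (candidates[sid], sid))
```

If the elected leader fails, the remaining advertisements let the next-best candidate take over without waiting for new sub-TLVs to be flooded, which is the resilience point Les raises.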
* Enhancements on FT Encoding
Tony: In this example, have you considered the fact that the link
between RN11 and RN31 is also part of the flooding topology?
Huaimo: Good question. In this case, RN11 is the local node, and the link
        between RN11 and RN31 is included in the topology.
Tony: I'm still not clear. It seems to me that you added an index for
Huaimo: No index for link, we only have nodes. The link is represented by
local node and remote node.
Tony: If I understand how to encode things here, let's suppose that the
links are RN2 to RN31, and RN11 and RN31 are both part of the
flooding topology. This is the way that I see you encoding this. You
have RN1 as an adjacent node, it's going to mark it as external, and
it's going to list RN31 as part of its adjacent nodes. Similarly
        RN11 is going to have RN31 as one of its adjacent nodes.
Huaimo: Everything here is implied by the node, local node and the remote node.
Tony: This implies to me that if you are bi-connected, that the index for
the node has to appear twice in the block encoding.
Tony: If you don't do that, then how are you indicating which links to use?
Huaimo: Yes, there are some duplications in some cases.
Acee: It seems this encoding will be larger, as nodes will be listed as both
        local nodes and remote nodes. What’s the advantage?
Huaimo: Slides 6-4: it's more efficient.
Donald: There's a problem with the nodes having to be listed multiple times
because the links are all implicitly bi-directional.
Tony: It's a space efficiency issue.
Donald: I think this is more compact.
Tony: I disagree.
Acee: You still use the indices?
Acee: How can this be more compact if nodes are listed multiple times? I
        didn't think much about ISIS, but in OSPF you could break it up into
        multiple LSAs and would only need to flood the LSAs that change. The
        other thing,
let's get this into perspective compared to an IS-IS LSP or a
router LSA, where every node in the domain floods this, so it makes
a bigger difference. As far as the flooding topology, it's only the
area leader that's flooding it, so there's only one instance of it.
        So it's not a matter of compactness unless there's an order of
        magnitude difference. I don't see that it's the most important;
        correctness is the most important consideration. I don't see that
        mattering for something for which there's only one instance. Let me
        ask this: do backup area leaders compute and flood the FT so it's
        ready to use right away?
Les: That’s what we recommended in the latest version. Because in the event
that the area leader fails, this allows you to transition to the new
area leader much more quickly.
Acee: That’s what we considered for the Network-LSA in OSPF. We ended up
        with only the DR flooding the Network-LSA.
Huaimo: This is more efficient because of blocks.
Acee: I don’t see why. This is actually more. The total size is more.
Les: Acee, I’d like to reinforce your point. I think the primary concern
here is correctness. And because there's only going to be a small
        number of copies of the flooding topology. What we've recommended in
        the draft is that the area leader advertises it and the second-best
        candidate advertises it. Even if the final conclusion is that this
        encoding saves some number of bytes, the total value added, when you
        look at the full size of the link state database, is very modest. So
        to me, correctness is the dominant concern here.
Huaimo: The correctness is equal, and so is the complexity.
Aijun: Based on the block information, we can easily recover the flooding
topology, but not with path info.
Tony: Paths are links in the topology.
Aijun: Block encoding is more structured.
Robin: If we don’t use this enhancement, is there any critical issue?
Huaimo: No critical issue. This is an improvement; it's to reduce flooding.
Tony: It's true that we're trying to be reasonably space-efficient. But as
we've said many times, we are trying not to make things so complicated
that things become fragile. If we were really trying to ultimately
make everything efficient, we could actually use compression
        algorithms and run them on top of our LSPs before we flood them.
        Setting aside the patent issues, there's a question: has everybody
        got the compression algorithm compressing correctly? We try not to do
that. Again, correctness is more important than efficiency.
Huaimo: Regarding correctness, the methods are equal.
Robin: To simplify the discussions, we may not want to have too many options.
Second, if there is no critical issue, this can be for future discussion.
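To make the encoding comparison above concrete, here is a rough sketch of a "block"-style encoding under assumed simplifications: each block pairs a local node index with a list of remote node indices, and because links are bidirectional, an endpoint can show up in more than one block, which is the duplication Tony and Donald debate. None of this is the draft's actual wire format; the names and shapes are hypothetical.

```python
def block_encode(links):
    """links: iterable of (local_index, remote_index) pairs (hypothetical)."""
    blocks = {}
    for local, remote in links:
        blocks.setdefault(local, []).append(remote)
    # Each block: (local node index, sorted remote node indices).
    return sorted((local, sorted(remotes)) for local, remotes in blocks.items())

def index_count(blocks):
    # Total node indices carried: one local index plus its remotes per block.
    return sum(1 + len(remotes) for _, remotes in blocks)
```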
* Backup Paths for FT Partitions
Acee: Is this local repair?
Huaimo: The iteration is local, but computing the path is global. Because
there is a split, the database may be out of sync among some nodes,
we add some links to make database resync, then we converge one step
further. But for rate-limiting, those flooding topologies are
calculated by the leaders, and it may take a long time.
Tony: That’s incorrect. Both mechanisms need full topology information.
Huaimo: For the backup path, we don't depend on the flooding topology computed
by the leader. As soon as we calculate backup paths, we enable them.
For rate-limiting, each node needs to check whether this is a link to
the remote site through flooding topology computed by the leader.
Tony: I disagree. The correct thing to do here, regardless of which
mechanism you use, is to determine which temporary links to flood on
        and to notice that, as soon as you have repaired the partition,
        we are going to get new LSP information. As soon as that happens,
        assuming centralized mode so that we're all on the same page, the
        area leader is going to have to re-compute the flooding topology
        in both situations.
Huaimo: No. The area leader will compute the flooding topology.
Tony: The rate-limit checks change, then a node decides that it has to
        reevaluate. At that point, it is going to see the new flooding topology,
and proceed differently. We could conceivably add more links while it's
waiting for the topology. But that's largely irrelevant because as
soon as it gets the flooding topology, it's going to disable it.
Huaimo: We need to check based on flooding topology.
Tony: Rate-limiting acts on topology changes, not flooding topology changes.
Huaimo: So you will need to check whether a link is part of the flooding
        topology?
Tony: After you have done a successful repair, the flooding topology is
going to change. We're discussing the arrival of the flooding
topology information, and some other events.
Huaimo: So depending on the flooding topology change, you iterate further?
Tony: If necessary.
Huaimo: Yes, that's the difference. The backup path is not depending on the
flooding topology change.
Tony: It still has to look at the flooding topology to determine if there is
a partition, it’s completely dependent.
Robin: Is there a case that there is no backup path in some topology?
Huaimo: As soon as the topology is connected, we will have a unique backup
        path. If the topology is physically split, there's no way we have a
        backup path.
Robin: From my experience, partition is a real problem. And it’s better to
        use a backup path to fix the problem.
Acee: How does a node know there is a partition before the area leader
        computes? You don’t know where it is going to be partitioned. How do
        you calculate repair paths? Are you saying you compute every node you're
Huaimo: This is on demand. As soon as there is a failure, we assume there
        is a partition and compute a backup path.
Acee: Any reason this enhancement couldn't go in a second draft?
Tony: We’re trying to have one draft.
Acee: But this is something extra.
Tony: We need one consistent algorithm for the domain to act on for
Acee: Independent of this, centralized or distributed, you will have a new
flooding topology whether or not you try to do this backup. So the
question is does this do anything faster? How does a guy in the middle
of the flooding topology know that there is a partition? I'm saying
let's just say you're doing distributed because it's easier to see the
        analogy. If you're doing this on-demand backup path, you might as
well just compute a new flooding topology. Because everybody's going
to converge to a new topology sooner rather than trying to do a repair
with the existing one.
Les: Acee, I think that's the catch-22 here. If the flooding
        topology is partitioned, you don’t know what you don’t know. You can
        only detect locally, like your neighbor is not on the flooding topology.
Acee: Then why is this better than temporary flooding?
Huaimo: With backup paths, it converges faster, uses a minimum number of
        links, and the algorithm is simple.
Acee: Are you tunneling?
Huaimo: No tunnel. Because every node using the same algorithm will come up
with the same backup path.
Sarah: Everyone has the same algorithm but not the same DB, so the backup
paths might be different.
Huaimo: We will come up with the same unique backup path regardless of the
        database and the partition. It's guaranteed.
Acee: I don’t think it’s simple. Nodes on the backup path will need to
        know. It's an N-squared computation.
Huaimo: No. Every node computes the backup path from A to B. ...
Acee: Let’s take it offline. I will think more about it.
Sarah: You seem to enable more links for temporary flooding.
Huaimo: No. Minimum number of links are used.
Sarah: In some cases, it may enable fewer links, but not in all cases.
Tony: You don’t know that yet, because you don't have critical information
        about everything north of the partition. You only have the database
        from before the partition. Your calculation is not correct.
Huaimo: Iteration is different. This one is more local.
Acee: This is where we don’t converge. I don’t see the advantage
of this because you don’t really know the whole topology. If you have
multiple failures, you don't know where the failures are. Let’s take
        it offline. The rate-limiting is already in the draft. So here is a
        summary:
FT: there is no good reason for it.
FN: needs more discussion.
Flooding Mode Transition: may need more clarification.
Area leader: Didn’t really understand. What was it solving?
Encoding: Not sure about the blocks vs path - need more discussion.
Backup paths: This also needs to be discussed.
I’ll send an email. Thanks to everyone for attending. Special thanks
to Tony and Huaimo for leading this discussion.