Note: This ballot was opened for revision 08 and is now closed.
Summary: Needs a YES.
Comment (2010-06-16) 
I have a huge raft of issues and concerns about this document. In the
end I raised a COMMENT not a DISCUSS because I don't believe that the
publication of this document as an Informational RFC without making
the changes will be harmful. However, attending to these comments would,
I believe, significantly improve the value of your work.
The responsible AD may decide that the comments need to be addressed.
---
idnits (http://tools.ietf.org/tools/idnits/) throws up a number of
issues. Although many of these are minor or editorial, it would have
made the document easier to read had you fixed them.
I think that some of the boilerplate issues are more significant and
that the I-D cannot be accepted unless a new revision with the
correct boilerplate is submitted.
The document writeup says:
(1.g) Has the Document Shepherd personally verified that the
document satisfies all ID nits? (See
http://www.ietf.org/ID-Checklist.html and
http://tools.ietf.org/tools/idnits/). Boilerplate checks are
not enough; this check needs to be thorough. Has the document
met all formal review criteria it needs to, such as the MIB
Doctor, media type and URI type reviews?
There are a few errors in the nits:
http://tools.ietf.org/idnits?url=http://tools.ietf.org/id/draft-ietf-bmwg-protection-term-08.txt
These can be corrected after AD-review, but there will be some work
necessary to adopt the new boilerplate, update references, fix
page lengths, etc.
Clearly this did not happen, and I really think we need to push back on
document shepherds to understand that the only acceptable answer to
the question is "Yes, idnits passes cleanly".
---
There is a significant difference between the document title
("Benchmarking Terminology for Protection Performance") and the content
of the document as explained in the Abstract and Introduction.
Viz.
This document provides common terminology and metrics for
benchmarking the performance of sub-IP layer protection
mechanisms.
1. The title should mention metrics
2. The title should indicate the scope is sub-IP
---
The Introduction starts off with...
Technologies
that function at sub-IP layers can be enabled to provide further
protection of IP traffic by providing the failure recovery at the
sub-IP layers so that the outage is not observed at the IP-layer.
This seems a reasonable statement, but isn't it in contradiction with
the whole premise of this document as stated in the Abstract...
The performance
benchmarks are measured at the IP-Layer, avoiding dependence on
specific sub-IP protection mechanisms.
In other words, if the outage is not observed at the IP-layer, how
will you measure the benchmarks at the IP-Layer?
---
I would really have liked this work to be aligned with RFC 4427.
That document sets out terminology for protection and restoration
in GMPLS networks (i.e. sub-IP) and attempts to achieve alignment
with ITU-T Recommendations G.808.1 and G.841.
It is important to use identical rather than similar terms for
protection and restoration techniques across the IETF when we
are referring to sub-IP layers.
Actually, I find the terminology rather woolly. For example, in
Section 1
The Working Path is the Primary Path prior to the Failover Event and
the Backup Path after the Failover Event.
Yet in Section 3.1.3
The Primary Path is the Path that traffic traverses
prior to a Failover Event.
And it is difficult to read a document where the terms are used well
in advance of their definitions.
It might help if Section 3 was split with sections 3.1 through 3.4
presented as terminology, and sections 3.5 onwards as Test
Considerations.
The definition of "Restoration" in section 3.3.5 is unlike anything
I have seen before and is very much at odds with the way the term is
used in sub-IP networks.
---
Figure 4
There is a spurious arrow head on the left-hand end of the arrow
marked "IP-Layer Forwarding"
---
Section 3.1.1
a. R1 is the ingress node and forwards IP packets, which input
into DUT/SUT, to R2 as sub-IP frames over link L12.
b. Ri is a node which forwards data frames to R[i+1] over Link
Li[i+1] for all i, 1<i<n, based on information in the sub-IP
layer.
I don't think you should assume anything about the encapsulation method
or transport mechanisms in the sub-IP technology. What is a sub-IP frame
in packet over WDM?
In many technologies, node Ri does not forward sub-IP frames. It
forwards a signal.
---
Section 3.1.1
"A bidirectional path", which transmits traffic in both
directions along the same nodes, consists of two unidirectional
paths. Therefore, the two unidirectional paths belonging to
"one bidirectional path" will be treated independently when
benchmarking for "a bidirectional path".
Doesn't that mean that there will be different observed performance in
the IP layer according to whether bidirectional or unidirectional
protection is enabled in the sub-IP layer? But you are saying that you
will not distinguish the two cases and so you will report benchmarks
incorrectly.
---
Section 3.1.3
Definition:
The preferred path for forwarding traffic between two or
more nodes.
This is (possibly) the only mention of p2mp sub-IP paths. Is this
really in scope for your draft?
---
Section 3.1.5
The Backup Path
MUST be created prior to the Failover Event.
This seems to rule out the use of restoration (in the standard
transport network sense of the word) in the sub-IP network. Why
do you rule this out?
In fact, section 3.1.7 seems to contradict this statement.
---
Section 3.1.5
Is the list intended to be exhaustive? This seems to have forgotten
1:1 protection.
---
Section 3.1.5
The backup path
generally originates at the point of failure, and terminates at
a node along a primary path.
I don't find the term "point of failure" in the rest of the document,
but it seems to me this is wrong. If the backup path starts at the
point of failure (for example, a failed node), what use will it be?
---
Section 3.1.8
3.1.8. Disjoint Paths
Definition:
A pair of paths that do not share a common link.
Discussion:
Two paths are disjoint if they do not share a common node other
than the ingress and egress.
While these two paragraphs are compatible, it is not clear that the
Discussion is simply observing that node-disjoint paths are necessarily
link-disjoint.
Usually, one talks about link-disjoint paths and node-disjoint paths.
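The distinction can be made concrete with a small sketch. The Python below is
illustrative only and is not taken from the draft: the function names and the
example topologies are hypothetical, and it assumes simple paths given as node
lists with a common ingress and egress, undirected links identified by their
endpoint pair (so no parallel links), and at least one intermediate node on
each path.

```python
def links(path):
    """Return the set of undirected links between consecutive nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def link_disjoint(p, q):
    """True if the two paths share no common link."""
    return links(p).isdisjoint(links(q))


def node_disjoint(p, q):
    """True if the paths share no node other than ingress and egress."""
    return set(p[1:-1]).isdisjoint(q[1:-1])


# Hypothetical topologies, all from ingress A to egress E.
primary = ["A", "B", "C", "D", "E"]
crosses_at_c = ["A", "F", "C", "G", "E"]  # shares node C, but no link
fully_diverse = ["A", "F", "G", "E"]      # shares only A and E

for name, backup in [("crosses_at_c", crosses_at_c),
                     ("fully_diverse", fully_diverse)]:
    print(name,
          "link-disjoint:", link_disjoint(primary, backup),
          "node-disjoint:", node_disjoint(primary, backup))
```

Under these assumptions, a backup that crosses the primary at an intermediate
node is link-disjoint but not node-disjoint, which is why the two terms are
usually kept separate.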
---
Section 3.1.9
You use the term "penultimate egress node" which may be a bit confusing.
After all, there is only one egress node.
---
Section 3.1.10
Definition:
SRLG is a set of links which share a physical resource.
This is a debatable definition! What is a physical resource? Does it
include a power supply? What about separate ducts in the same bridge?
Or transit of the same politically unstable country?
The last line of the Discussion seems to be the really key bit.
Discussion:
SRLG is considered the set of links to be avoided when
the primary and secondary paths are considered disjoint.
The SRLG will fail as a group if the shared resource fails.
---
Section 3.2
I am not really comfortable with the split of "link protection" and
"local link protection", but I see how it may be interesting for
benchmarking.
I don't understand why there isn't a concept of "local node protection"
where the failure of node C on the path ABCDE is protected by a direct
link BD.
---
Section 3.2.3
Path Protection provides Node Protection and Link Protection
for every node and link along the Primary Path. A Backup
Path providing Path Protection MUST have the same ingress
node as the Primary Path.
I don't see why you rule out a backup path that is not fully link and
node diverse.
---
Section 3.2.7
The State Control Interface MAY be used for Redundant Node
Protection. The State Control Interface MUST be out-of-band.
It is possible to have Redundant Node Protection in which
there is no state control or state control is provided
in-band.
This is hard to parse. It appears to say that state control must be
provided out of band. And yet it also says it is possible for it to
be in band.
---
Section 3.5
The following Benchmarks MAY be assessed on a per-flow basis
using at least 16 flows spread over the routing table (more
flows is better).
Isn't this SHOULD?
Comment (2010-06-16) 
I am not sure that RFC2119 language is appropriate here:
Figures 1 through 5 show models that MAY be used when benchmarking
Sub-IP Protection mechanisms, which MUST use a Protection Switching
System that consists of a minimum of two Protection-Switching Nodes,
an Ingress Node known as the Headend Node and an Egress Node known
as the Merge Node.
Ideally, the RFC2119 definition should appear before first use in the document.
The Protection Switching System MUST include
either a Primary Path and Backup Path, as shown in Figures 1 through
4, or a Primary Node and Standby Node, as shown in Figure 5.
I am not a mathematician, can an equation have a plurality of sub-equations, or
is a different mathematical construct needed?
TBLM as shown in Equation 2:
(Equation 2)
(Equation 2a)
TBLM Failover Time = Time(Failover) - Time(Failover Event)
(Equation 2b)
TBLM Reversion Time = Time(Reversion) - Time(Restoration)
Same with Equation 3.
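For illustration only: labelling related equations with a shared number and a
letter suffix is a common typographic convention. A sketch of how the two
quoted TBLM equations might be typeset that way, assuming LaTeX with the
amsmath package (an assumption for this example; the draft itself is plain
text):

```latex
\begin{align}
\text{TBLM Failover Time}  &= \text{Time(Failover)} - \text{Time(Failover Event)} \tag{2a} \\
\text{TBLM Reversion Time} &= \text{Time(Reversion)} - \text{Time(Restoration)}  \tag{2b}
\end{align}
```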
Comment (2010-06-17) 
It looks to me that 'Benchmarking Terminology for Performance of sub-IP layer
Protection Mechanisms' would be a more appropriate name for this document.
Comment (2010-06-14) 
> Benchmarking Terminology
> for Protection Performance
As stated in the abstract, Section 3.5 actually defines some tests.
The title should reflect that this document is not only defining
terminology.
Section 3.1., paragraph 4:
> b. Ri is a node which forwards data frames to R[i+1] over Link
> Li[i+1] for all i, 1<i<n, based on information in the sub-IP
> layer.
Since R1/L12 are defined in (a) and (c) defines Rn, you probably want
to change 1<i<n to 1<i<n-1, so that Rn isn't defined twice. (And then
you need to define L(n-1)n in (c)).
(I also wonder why you define these sets of links and nodes when they
aren't used anymore in the remainder of the document.)
Comment (2010-06-15) 
I suspect this is clear enough for its intended audience, but a more detailed
description of the calculations in 3.6.1 and 3.6.3 would help novice readers.
I couldn't quite sort out the equations. For example in 3.6.3 (TBM), are the
following interpretations correct?
(1) Time(Failover) = Timestamp on first unimpaired packet received at egress
node after the backup path became the working path
(2) Time(Failover Event) = Timestamp on the last unimpaired packet received at
egress node on the primary path before failure
I was unable to construct similar statements for the TBLM in 3.6.1 at all.
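To make the question concrete, here is a minimal sketch of what
interpretations (1) and (2) would compute for the TBM, assuming they are
correct. It is not taken from the draft: the `Packet` record, its field
names, the path labels, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp_ms: int  # arrival time at the egress node, in milliseconds
    path: str          # "primary" or "backup" (hypothetical labels)
    impaired: bool     # True if the packet arrived impaired


def tbm_failover_time(packets):
    """Time(Failover) - Time(Failover Event), per interpretations (1), (2)."""
    primary = [p.timestamp_ms for p in packets
               if p.path == "primary" and not p.impaired]
    if not primary:
        raise ValueError("no unimpaired packet seen on the primary path")
    # (2) last unimpaired packet received on the primary path
    time_failover_event = max(primary)

    backup = [p.timestamp_ms for p in packets
              if p.path == "backup" and not p.impaired
              and p.timestamp_ms > time_failover_event]
    if not backup:
        raise ValueError("no unimpaired packet seen on the backup path")
    # (1) first unimpaired packet received once the backup path is in use
    time_failover = min(backup)

    return time_failover - time_failover_event


# Made-up sample capture at the egress node.
capture = [
    Packet(0, "primary", False),
    Packet(10, "primary", False),
    Packet(20, "primary", False),
    Packet(70, "backup", True),    # impaired, so ignored
    Packet(75, "backup", False),
    Packet(85, "backup", False),
]
print(tbm_failover_time(capture))
```

If the draft intends different reference points for either timestamp, only
the two commented selection steps would change.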
Nits:
In section 3.1.1, there is a confusing mixture of notation, using parentheses
and brackets interchangeably. The definition of path uses parentheses, as in
"L(n-1)n", but the description of node in subitem b. uses brackets, as in
"Li[i+1]".
In 3.6.3 under discussion, the comma following "observation of unimpaired
packets" should be a period.