Basic Telephony SIP End-to-End Performance Metrics
draft-ietf-pmol-sip-perf-metrics-07

Document history

Date Rev. By Action
2012-08-22
07 (System) post-migration administrative database adjustment to the No Objection position for Cullen Jennings
2012-08-22
07 (System) post-migration administrative database adjustment to the Abstain position for Lisa Dusseault
2012-08-22
07 (System) post-migration administrative database adjustment to the No Objection position for Robert Sparks
2012-08-22
07 (System) post-migration administrative database adjustment to the No Objection position for Lars Eggert
2010-09-21
07 Cindy Morgan State changed to RFC Ed Queue from Approved-announcement sent by Cindy Morgan
2010-09-21
07 (System) IANA Action state changed to No IC from In Progress
2010-09-21
07 (System) IANA Action state changed to In Progress
2010-09-21
07 Amy Vezza IESG state changed to Approved-announcement sent
2010-09-21
07 Amy Vezza IESG has approved the document
2010-09-21
07 Amy Vezza Closed "Approve" ballot
2010-09-21
07 Amy Vezza State Changes to Approved-announcement to be sent from IESG Evaluation::AD Followup by Amy Vezza
2010-09-21
07 Amy Vezza [Note]: 'Vijay Gurbani is the PROTO-shepherd' added by Amy Vezza
2010-09-20
07 (System) New version available: draft-ietf-pmol-sip-perf-metrics-07.txt
2010-09-17
07 Gonzalo Camarillo [Ballot Position Update] New position, No Objection, has been recorded by Gonzalo Camarillo
2010-09-16
07 Robert Sparks
[Ballot comment]
I suggest removing the 2119 keywords from the security considerations section and using normal prose.

There's no discussion of what to do if BYEs get a failure response (particularly a recoverable failure) in
section 4.4, and there's no discussion of what to do if there is no BYE in section 4.5.
2010-09-16
07 Robert Sparks [Ballot Position Update] Position for Robert Sparks has been changed to No Objection from Discuss by Robert Sparks
2010-08-17
07 Lars Eggert [Ballot Position Update] Position for Lars Eggert has been changed to No Objection from Discuss by Lars Eggert
2010-07-30
06 (System) New version available: draft-ietf-pmol-sip-perf-metrics-06.txt
2010-06-02
07 Robert Sparks
[Ballot discuss]
1 cleared

2 cleared

3 cleared

4 cleared

5 Was: "These metrics are intentionally designed to not measure (or be
  perturbed by) the hop-by-hop retransmission mechanisms. This should be
  made explicit. There should also be some discussion of the effect of
  the end-to-end retransmission of 200OK/ACK on the metrics based on
  those messages."

  The additional text in section 4 helps (but you should delete
  "in a dialog" from "the first associated SIP message in a dialog"
  since at least one of the metrics defined operates on messages that
  are never in dialogs.)

  There still is no discussion on how to handle measurements when
  there is retransmission of 200s to INVITEs and their associated ACKs.

6 cleared

7 cleared

8 cleared

9 cleared

10 Was: "The 3rd to last paragraph of section 4 should be expanded. I think
  it's unlikely that implementers, especially those with other language
  backgrounds,  will understand the subtlety of the quotes around
  "final".  Enumerating the cases where you want the measurement to
  span from the request of one transaction to the final response of
  some other transaction will help. (I'm guessing you were primarily
  considering redirection, but I suspect you also wanted to capture the
  additional delay due to Requires-based negotiation or 488
  not-acceptable-here style re-attempts?). You may also want to
  consider the effect of the negotiation phase of extensions like
  session-timer on these metrics."

  The text in -05 is better, but it is still not clear that you are
  trying to measure across transactions. The trick will be in modifying
  the language to express the notion of measuring between messages that
  are parts of separate transactions and avoiding conflating the
  "message that counts as the end of what we're measuring"
  with "final response" since that phrase means something else in SIP.

  For example, you have a metric that wants to measure this interval:

  -  ---INVITE-->
  ^  <--302--- This 302 is a final response.
  |  ---ACK-->
  v  ---INVITE-->
  -  <--200--- This 200 is another final response.
              (It's the final response to the second transaction).
    ---ACK-->

11 cleared

12 cleared

13 Was "The SRD metric definition in 4.3.1 ignores the effect of forking.
        (remainder snipped)"

  You added good text to section 5.4 pointing out how to handle measurements
  when forking occurs. Please add a forward pointer to that text from this
  section.

14 The Failed Session Setup SRD claims to be useful in detecting
  problems in downstream signaling functions. Please provide some text
  or a reference supporting that claim. As written, this metric could
  be dominated by how long the called user lets his phone ring. Is that
  what was intended? You might consider separate treatment for 408s and
  for explicit decline response codes.

15 cleared

16 Was "In section 4.4, what does it mean to measure the delay in the
  disconnect of a failed session completion? Without a successful
  session completion, there can be no BYE. This section also begs the
  very hard to answer question about what to do when BYEs receive
  failure responses. It would be better to note that edge-case exists
  and what, if anything, the metric is going to say about it if it
  happens."

  The new text is much better, but it is still not clear what an
  implementor should do for BYEs that receive failure responses
  (especially failure responses that can be recovered from).
  For instance, consider a BYE/603 or a BYE/503 Retry-After/BYE/200.

17 Was "Section 4.5 is a particularly strong example of these metrics
  focusing on the simple telephony application. It may even be falling
  into the same traps that lead to trying to build fraud-resistant
  billing based on the time difference between an INVITE and a BYE.
  Some additional discussion noting that the metric doesn't capture
  early media and recommendation on when to give up on seeing a BYE
  would be useful. (Sometimes BYEs don't happen even when there is no
  malicious intent.)"

  The new text is better, but the missing BYE case is still not addressed.
  You indicated 4.5.2 should handle that, but it doesn't - it seems to be
  dealing with BYEs that don't get a response, not flows that don't have
  a BYE. BTW - I was confused for a while when re-reviewing this section
  as to what the scenario it was trying to cover really was. It would help
  others avoid that confusion to show the ACK for the 200 OK to the INVITE.
  Also, is it the intent to only allow the endpoint that sends a BYE to
  report CHT? If so, the document does not say that. If not, should the
  document say something about getting inconsistent values from the two
  UAs involved? (They could be off by almost all of Timer F if it's the
  200s to the BYEs being lost in the flows in 4.5.2).

18 cleared

19 Was "The ratio metrics don't define (or convey) the interval that totals
  are taken over. Are these supposed to be "# requests received since
  this instance was manufactured" or "since last reboot" or "since last
  reset of statistics" or something else? What is the implementation
  supposed to report when the denominator of a ratio is 0?"

  The additional text helps, but doesn't answer my last question, especially
  for metrics like NER/SCR (was SER/SEER).

  I think you're trying to say "Apply this equation to all the messages in
  this bucket" and "different service providers may have different bucket
  sizes". What's not answered is whether the bucket is a sliding window
  (All the messages in the last hour), or a fixed window (All the messages
  since the top of the hour), or if it doesn't matter. Is "Since the beginning
  of time" a valid bucket? What are the consequences for the operator if they
  change bucket sizes? Is this metric calculated and reported by the endpoint?
  If so, what are the consequences if an operator has a network full
  of endpoints that use different size buckets?

  In any case I can have a bucket with a million things in it and still end up
  with a zero in this denominator. (The ratio will be 0/0 if this happens for NER
  or SCR). So, what's the thing that's generating the measurement supposed
  to do when this happens? Not report anything? Report a 100? Whatever it is,
  the document should say what to do.
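
  As a purely illustrative sketch of the empty-bucket problem above (nothing
  here comes from the draft; the guard and the decision to report nothing
  are assumptions of the example), a ratio metric needs an explicit rule
  for a zero denominator, e.g. in Python:

    # Hypothetical sketch; the draft does not define this behavior.
    def ratio_metric_percent(numerator, denominator):
        """Generic ratio-style metric (e.g. NER or SCR) over one bucket."""
        if denominator == 0:
            # The undefined 0/0 case this item asks about: report nothing
            # here rather than guessing 0 or 100.
            return None
        return 100.0 * numerator / denominator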

20 Was "Please add some discussion motivating why all 300s, 401, 402, and 407
  are treated specially (vs. several other candidate 4xx and 6xx
  responses) in sections like section 4.8. Were other codes considered?
  If so, why were they rejected?"

  The new text added is problematic. The condition it states (they indicate an
  acceptable UA effect without the interaction of an individual user of the UA)
  applies to several existing response codes that are not in the set, such as 420.

21 cleared

22 cleared

23 cleared

24 cleared

25 Was "I'm a little surprised there is no discussion on privacy,
  particularly on profiling the usage patterns of individuals or
  organizations, in the security considerations section."

  The section is here to point out to users of the specification
  that there are concerns they need to be sure to have considered.
  Right now, it's left unspecified how these measurements get from
  the measurement point to the operator console.
  It would be prudent to remind potential SSPs to worry about exposing
  information about customer A to customer B while moving these measurements
  around, or when deciding to expose them on a website, especially if they
  are correlated in the example dimensions you call out in section 5.
  If you think it's remotely possible that some endpoint vendor will
  start making a subset of these metrics user-visible (many phones have
  a rich "statistics" screen already) reminding them to think about privacy
  is probably a good idea.

26 cleared

======= NEW ISSUES with -05 =======

27 I think relabeling SRD as PDD is a mistake. At the very least it is
  a change of such magnitude that it should be re-reviewed by the working
  group. SRD as defined by -04 is similar, but not an exact reflection of
  PDD. Saying it is "like" Post-Dial-Delay as defined in the PSTN is risky
  enough. Using the same name makes it even more likely that someone will
  come to the conclusion that it is measuring exactly the same thing. It
  does not. For example, it fails to capture any delay from when
  the user "finishes dialing" to when the INVITE is generated due to
  DNS processing. It would be possible to engineer highly constrained
  networks (that didn't use DNS or allow redirection or forking for example)
  where the metric might behave very much like PDD in the PSTN, but that
  will not be true in the general case.
2010-06-02
07 Robert Sparks
[Ballot discuss]
This is a copy of the discuss on -04 which was previously a pointer to
.

This update is to capture the discuss in the tracker by value instead
of by reference.

----

1 The document should more carefully describe its scope (and consider
  changing its title). This document focuses on the use of SIP for
  simple telephony and relies on measurements in earlier telephony
  networks for guidance.  But telephony is only one use of SIP. These
  aren't the same metrics that would be most useful for observing a
  network that was involved primarily in setting up MSRP sessions for
  file transfer, for instance. An eventual set of generic SIP
  performance metrics will need to focus on the primitives rather than
  artifacts from any particular application.

2 That said, I'm skeptical of the utility of many of these metrics even
  for monitoring systems that are focusing only on delivering basic
  telephony. Has the group surveyed operators to see what they're
  measuring, what they're finding useful, and what they're just
  throwing away? Some additional text motivating why this particular
  set of metrics was chosen should be provided to help
  operators/implementers choose which ones they are going to try to use.

3 "Each session is identified by a unique Call-ID" is incorrect. You
  need at least Call-ID, to-tag, and from-tag here. And to be pedantic,
  you're describing the SIP dialog, not one of the sessions it manages.
  The session is what is described by the Session Description Protocol.
  The metrics in this draft are derived from signaling events, not
  session events, and make assumptions about how those correlate
  for a simple voice call that may not be true for more advanced uses.

4 The document is inconsistent about whether the metrics will describe
  any part of an early-dialog/early session. The introduction indicates
  it won't and focuses on the delivery of a 200 OK, but there are
  metrics that measure the arrival time of 180s. This should be
  reconciled. Do take note that early sessions are pervasive in real
  deployments at this point in time.

5 These metrics are intentionally designed to not measure (or be
  perturbed by) the hop-by-hop retransmission mechanisms. This should be
  made explicit. There should also be some discussion of the effect of
  the end-to-end retransmission of 200OK/ACK on the metrics based on
  those messages.

6 The document should consider the effects of the presence or absence
  of the reliable-provisional extension on its metrics (some of the
  metrics will be perturbed by a lost 18x that isn't sent reliably).

7 Using T1 and T4 as the timing interval measurement tokens is
  unfortunate. SIP uses those symbols already to mean something
  completely different. Is there a reason not to change these and avoid
  the confusion that the collision will cause?

8 The document uses the terms UAC and UAS incorrectly. It is trying to
  use them to mean the initiator and recipient of a simple phone call.
  But the terms are roles scoped to a particular transaction, not to a
  dialog. When an endpoint sends a BYE request, it is by definition
  acting as a UAC.

9 The document uses the word "dialog" in a way that's not the same as
  the formal term with the same name defined in RFC3261 and that will
  lead to confusion. (A sequence of register requests and responses,
  for example, are never part of any dialog. The INVITE/302/ACK
  messages shown in the call setup flows are not part of any dialog.)
  Please choose another word or phrase for this draft. I suggest
  "message exchange".

10 The 3rd to last paragraph of section 4 should be expanded. I think
  it's unlikely that implementers, especially those with other language
  backgrounds,  will understand the subtlety of the quotes around
  "final".  Enumerating the cases where you want the measurement to
  span from the request of one transaction to the final response of
  some other transaction will help. (I'm guessing you were primarily
  considering redirection, but I suspect you also wanted to capture the
  additional delay due to Requires-based negotiation or 488
  not-acceptable-here style re-attempts?). You may also want to
  consider the effect of the negotiation phase of extensions like
  session-timer on these metrics.

11 The document assumes that a registration will be DIGEST challenged.
  That's a common deployment model, but it is not required. If other
  authentication mechanics are used (such as SIP Identity), the RRD
  metric, for example, becomes muddied.

12 In section 4.2, "Subsequent REGISTER retries are identified by the
  same Call-ID" should say "identified by the same transaction
  identifier (same topmost Via header field branch parameter value)".
  Completely different REGISTER transactions from a given registrant
  are likely to have the same Call-ID.
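
  As a rough illustration of matching retries on the transaction identifier
  rather than the Call-ID (a naive Python sketch; header folding and
  multi-value Via lines are ignored, and the helper name is made up, not
  from the draft):

    import re

    def topmost_via_branch(sip_message):
        """Return the branch parameter of the topmost Via header field,
        i.e. the transaction identifier suggested above."""
        for line in sip_message.splitlines():
            if line == "":                      # blank line ends the headers
                break
            name = line.split(":", 1)[0].strip().lower()
            if name in ("via", "v"):            # "v" is the compact form of Via
                match = re.search(r"branch=([^;,\s]+)", line, re.IGNORECASE)
                return match.group(1) if match else None
        return None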

13 The SRD metric definition in 4.3.1 ignores the effect of forking.
  Unlike 200 OKs, where receiving multiple 200s in response to a single
  INVITE only happens if a race is won, it is the _normal_ state of
  affairs for a UAC to receive provisional responses from multiple
  branches when a request forks. Deployed systems are increasingly
  sending 18x responses reliably with an answer, establishing early
  sessions, so when forking is present it is _highly_ likely that there
  will be multiple 18x's from different branches arriving at the UA.
  This section should provide guidance on what to report when this
  happens.

14 The Failed Session Setup SRD claims to be useful in detecting
  problems in downstream signaling functions. Please provide some text
  or a reference supporting that claim. As written, this metric could
  be dominated by how long the called user lets his phone ring. Is that
  what was intended? You might consider separate treatment for 408s and
  for explicit decline response codes.

15 What was the motivation for making MESSAGE special in section 4.3.3?
  Why didn't the group instead extend the concept to measuring _any_
  non-INVITE transaction (with the possible exception of CANCEL)?

16 In section 4.4, what does it mean to measure the delay in the
  disconnect of a failed session completion? Without a successful
  session completion, there can be no BYE. This section also begs the
  very hard to answer question about what to do when BYEs receive
  failure responses. It would be better to note that edge-case exists
  and what, if anything, the metric is going to say about it if it
  happens.

17 Section 4.5 is a particularly strong example of these metrics
  focusing on the simple telephony application. It may even be falling
  into the same traps that lead to trying to build fraud-resistant
  billing based on the time difference between an INVITE and a BYE.
  Some additional discussion noting that the metric doesn't capture
  early media and recommendation on when to give up on seeing a BYE
  would be useful. (Sometimes BYEs don't happen even when there is no
  malicious intent.)

18 Trying to use Max-Forwards to determine how many hops a request took
  is going to produce incorrect results in any but the most simple of
  network deployments (I would have expected this to be based on
  counting Vias with a note pointing to the discussion on the problems
  B2BUAs introduce). Proxies  can reduce Max-Forwards by more than one.
  There are many implementations in the wild that cap Max-Forwards. If
  this metric remains as defined, you should also point out that
  neither endpoint can calculate it. Some third entity will have to
  collect information from each end to make this calculation.
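
  As an illustration of the Via-counting alternative mentioned above (a
  rough Python sketch only; it ignores header folding and does not solve
  the B2BUA problem this item points out), a hop count taken from the
  received message might look like:

    def count_via_hops(sip_message):
        """Count Via header field values in the headers of a SIP message."""
        hops = 0
        for line in sip_message.splitlines():
            if line == "":                      # blank line ends the headers
                break
            name = line.split(":", 1)[0].strip().lower()
            if name in ("via", "v"):            # "v" is the compact form of Via
                hops += line.count(",") + 1     # values may be comma-separated
        return hops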

19 The ratio metrics don't define (or convey) the interval that totals
  are taken over. Are these supposed to be "# requests received since
  this instance was manufactured" or "since last reboot" or "since last
  reset of statistics" or something else? What is the implementation
  supposed to report when the denominator of a ratio is 0?

20 Please add some discussion motivating why all 300s, 401, 402, and 407
  are treated specially (vs. several other candidate 4xx and 6xx
  responses) in sections like section 4.8. Were other codes considered?
  If so, why were they rejected?

21 Section 4.9 seems to be implying that you can't receive a 500 class
  response to a reINVITE, which is not true. If you want this metric to
  only reflect the results of initial INVITEs, more definition will be
  needed.

22 ISA in section 4.10 claims that 408s indicate an overloaded state in
  a downstream element. Overload may induce 408s, but 408s do _not_
  indicate overload. It's possible to receive them just because someone
  is not answering a phone.

23 In section 5, why were these correlation dimensions chosen? Was the
  Request-URI considered? If so, why was it rejected?

24 The treatment of forking in section 6.3 is insufficient. As noted
  earlier, provisional messages establishing early sessions is becoming
  common, and there will be multiple early sessions for a given INVITE
  when there is forking. The recommendation to latch onto the "first"
  200 (or 18x) and ignore the others only marginally works for playing
  media for simple telephony applications - we're seeing phones that
  mix or present multiple lines, and applications that go beyond basic
  phone calls (like file transfer) that make use of all the responses.
  Trying to dodge the complexity as the current section does will lead
  to metrics that don't reflect what the network is doing.

25 I'm a little surprised there is no discussion on privacy,
  particularly on profiling the usage patterns of individuals or
  organizations, in the security considerations section.

26 Nits:
    26.1 What does it mean in section 4.3.1 for the "user" to send the
      first bit of a message? Suggest deleting "or user" from the
      sentence.
    26.2 Section 4.11 has a stale internal pointer to a non-existent
      section 3.5. I suspect it's trying to point back into 4 somewhere.
2010-05-06
07 (System) Sub state has been changed to AD Follow up from New Id Needed
2010-05-06
05 (System) New version available: draft-ietf-pmol-sip-perf-metrics-05.txt
2010-01-22
07 Cullen Jennings [Ballot Position Update] Position for Cullen Jennings has been changed to No Objection from Discuss by Cullen Jennings
2009-10-09
07 Lisa Dusseault [Ballot Position Update] Position for Lisa Dusseault has been changed to Abstain from Discuss by Lisa Dusseault
2009-10-09
07 Lisa Dusseault
[Ballot discuss]
My fundamental objection to this approach is that metrics for protocols that offer user features are hard to objectively define.  Even worse, a protocol like SIP, which has a rich set of features and architectures, has many different application use cases for which different metrics are appropriate.

When we move something along the Standards Track, the message the IETF gives is "This is the way we think it should be done".  That doesn't mean there can be only one protocol solving a given problem, but it does give a stamp of approval.  In the case of metrics for these types of protocols, particularly when we're driving the metrics rather than documenting something that already exists and is interoperable, that is inappropriate.  There is insufficient cause for this document, in particular, to go on the Standards Track.

I also have trouble with the specific language in the draft.  In the Abstract the draft says:
  "The purpose of
  this document is to combine a standard set of common metrics,
  allowing interoperable performance measurements, easing the
  comparison of industry implementations."

Later on the draft also says

    "These metrics will likely be utilized in
  production SIP environments for providing input regarding Key
  Performance Indicators (KPI) and Service Level Agreement (SLA)
  indications; however, they may also be used for testing end-to-end
  SIP-based service environments."

First, note that this draft by itself does not allow interoperability unless there's a standard performance monitoring protocol to transport the standardized metrics.  By itself, this draft allows comparison. 

Second, comparing industry implementations on this basis is not necessarily the best way to compare.  We don't even know if it's a good way to compare.  Which metrics are most important to user experience?  Aren't there some thresholds on some measures which are "good enough" and improving beyond those thresholds offers no noticeable improvement? 
What is the likely harm of implementations optimizing for these metrics instead of for a more holistic user experience?

Informational status would make much more sense to me -- I would read that as "here is a set of metrics defined a certain way, and the definitions are for your information if you choose to use the same metrics."

We look for justification for all documents to be on the Standards Track -- metrics documents or protocol specifications.  This document does not, in my opinion, meet the bar. 

I am moving my vote to ABSTAIN.
2009-10-09
07 Lisa Dusseault [Ballot Position Update] Position for Lisa Dusseault has been changed to Discuss from Abstain by Lisa Dusseault
2009-10-09
07 Lisa Dusseault
[Ballot discuss]
My fundamental objection to this approach is that metrics for protocols that offer user features are hard to objectively define.  Even worse, a protocol like SIP, which has a rich set of features and architectures, has many different application use cases for which different metrics are appropriate.

When we move something along the Standards Track, the message the IETF gives is "This is the way we think it should be done".  That doesn't mean there can be only one protocol solving a given problem, but it does give a stamp of approval.  In the case of metrics for these types of protocols, particularly when we're driving the metrics rather than documenting something that already exists and is interoperable, that is inappropriate.  There is insufficient cause for this document, in particular, to go on the Standards Track.

I also have trouble with the specific language in the draft.  In the Abstract the draft says:
  "The purpose of
  this document is to combine a standard set of common metrics,
  allowing interoperable performance measurements, easing the
  comparison of industry implementations."

Later on the draft also says

    "These metrics will likely be utilized in
  production SIP environments for providing input regarding Key
  Performance Indicators (KPI) and Service Level Agreement (SLA)
  indications; however, they may also be used for testing end-to-end
  SIP-based service environments."

First, note that this draft by itself does not allow interoperability unless there's a standard performance monitoring protocol to transport the standardized metrics.  By itself, this draft allows comparison. 

Second, comparing industry implementations on this basis is not necessarily the best way to compare.  We don't even know if it's a good way to compare.  Which metrics are most important to user experience?  Aren't there some thresholds on some measures which are "good enough" and improving beyond those thresholds offers no noticeable improvement? 
What is the likely harm of implementations optimizing for these metrics instead of for a more holistic user experience?

Informational status would make much more sense to me -- I would read that as "here is a set of metrics defined a certain way, and the definitions are for your information if you choose to use the same metrics."

We look for justification for all documents to be on the Standards Track -- metrics documents or protocol specifications.  This document does not, in my opinion, meet the bar. 

I am moving my vote to ABSTAIN.
2009-10-09
07 Lisa Dusseault
[Ballot discuss]
My fundamental objection to this approach is that metrics for protocols that offer user features are hard to objectively define.  Even worse, a protocol like SIP, which has a rich set of features and architectures, has many different application use cases for which different metrics are appropriate.

When we move something along the Standards Track, the message the IETF gives is "This is the way we think it should be done".  That doesn't mean there can be only one protocol solving a given problem, but it does give a stamp of approval.  In the case of metrics for these types of protocols, particularly when we're driving the metrics rather than documenting something that already exists and is interoperable, that is inappropriate.  There is insufficient cause for this document, in particular, to go on the Standards Track.

I also have trouble with the specific language in the draft.  In the Abstract the draft says:
  "The purpose of
  this document is to combine a standard set of common metrics,
  allowing interoperable performance measurements, easing the
  comparison of industry implementations."

Later on the draft also says

    "These metrics will likely be utilized in
  production SIP environments for providing input regarding Key
  Performance Indicators (KPI) and Service Level Agreement (SLA)
  indications; however, they may also be used for testing end-to-end
  SIP-based service environments."

First, note that this draft by itself does not allow interoperability unless there's a standard performance monitoring protocol to transport the standardized metrics.  By itself, this draft allows comparison. 

Second, comparing industry implementations on this basis is not necessarily the best way to compare.  We don't even know if it's a good way to compare.  Which metrics are most important to user experience?  Aren't there some thresholds on some measures which are "good enough" and improving beyond those thresholds offers no noticeable improvement? 
What is the likely harm of implementations optimizing for these metrics instead of for a more holistic user experience?

Informational status would make much more sense to me -- I would read that as "here is a set of metrics defined a certain way, and the definitions are for your information if you choose to use the same metrics."

We look for justification for all documents to be on the Standards Track -- metrics documents or protocol specifications.  This document does not, in my opinion, meet the bar. 

I am moving my vote to ABSTAIN.
2009-10-09
07 Lisa Dusseault [Ballot Position Update] Position for Lisa Dusseault has been changed to Abstain from Discuss by Lisa Dusseault
2009-10-08
07 Samuel Weiler Request for Last Call review by SECDIR Completed. Reviewer: Phillip Hallam-Baker.
2009-10-08
07 Cindy Morgan State Changes to IESG Evaluation::Revised ID Needed from IESG Evaluation by Cindy Morgan
2009-10-08
07 Jari Arkko [Ballot Position Update] New position, No Objection, has been recorded by Jari Arkko
2009-10-07
07 Ross Callon [Ballot Position Update] New position, No Objection, has been recorded by Ross Callon
2009-10-07
07 Cullen Jennings
[Ballot comment]
Meta Comment: We are still experimenting with how to bring together the operational and metrics experience of the OPS area with the SIP expertise of the RAI area. I think the ADs have some work to do to see what improvements can be made.
2009-10-07
07 Cullen Jennings
[Ballot discuss]
On some of these, it seems like there were multiple metrics that ended up with same name but were actually different - for example, SRD in the failure case and SRD in the success case.

Several metrics seem like they would be messed up by HERFP and forking. By messed up I mean not useful for anything. In general, for all the metrics, I would have liked a better idea of how they can be used, not just collected.

As far as I understand 4.4.2, it measures Timer F, which is compiled into the code, so it seems pretty useless to measure.

In the second example in 4.4.2, when UA2 sends a BYE, UA2 is the UAC, not the UAS.

SDT seems undefined when the answering party sends the BYE. Need to consider things like BYE/Also, re-INVITEs, REFERs, and more.

HpR. I don't think this approach has worked out well on operational networks. I see more people doing Via counting at the UAS. One of the many problems with the approach is, well, you have to be at both ends, and if the UAC starts with, say, 30 (fairly common) and then some B2BUA in the middle resets to 70 (insane but common and what a strict read of 3261 might say you SHOULD do) you are probably going to end up with a negative hop count.
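
For a concrete, made-up illustration of that failure mode: the UAC sends Max-Forwards: 30, two proxies decrement it to 28, a B2BUA then re-originates the request with Max-Forwards: 70, and one further hop delivers it to the UAS at 69; differencing the values seen at the two ends gives 30 - 69 = -39 "hops".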

SER. This looks very wrong. On an interface where all INVITEs are challenged (very common for anything offering long distance), this is going to be under 50%.

SEER - the way this is defined, you end up with a divide by zero on interfaces that are redirecting.

SDF - I really have no idea how to implement this in a way that gets consistent numbers. It is not clear what to look for in the Reason to decide whether to increment the SDF counter or not.

SCR - I did not understand why this one needed a proxy. I suspect I don't understand what is being counted.

The SSR does not seem right. If you had lots of 503s, it seems like the SSR could go negative.

Section 6.3: totally agree forking SHOULD be considered. In fact I think forking MUST be considered, but the spec needs to help the implementors know what to do. Take for example a case where an INVITE forks to 3 phones that all start ringing, then one of them is answered. From a dialog point of view, 1/3 of the dialogs worked and the others failed to have a call. From a user point of view, the call was a success. Treating this like a transit ISDN network is not going to get the metrics that are useful for an SLA.

Overall, I think that if we backed up to the 10,000 foot level and asked what problem are we going to solve with metrics and what metrics do we need, it would be much clearer how to evaluate if these metrics worked or not.
2009-10-07
07 Cullen Jennings [Ballot Position Update] New position, Discuss, has been recorded by Cullen Jennings
2009-10-07
07 Amy Vezza State Changes to IESG Evaluation from IESG Evaluation - Defer by Amy Vezza
2009-10-07
07 Tim Polk
[Ballot comment]
I support Lisa's and Robert's discusses...

I believe that publication as Informational would be a reasonable path forward, especially
if supported by additional text describing the scope and origins of the metrics.
2009-10-07
07 Tim Polk [Ballot Position Update] New position, No Objection, has been recorded by Tim Polk
2009-10-07
07 Lars Eggert
[Ballot discuss]
Section 12.1., paragraph 3:
>    [GR-512]  Telcordia, "LSSGR: Reliability, Section 12", GR-512-
>              CORE Issue 2, January 1998.

  DISCUSS: Is this a standard by another SDO? (In any event, I believe
  it can be made an Informative reference, because it simply points to
  the source from where a metric/test was borrowed.)
2009-10-07
07 Lars Eggert [Ballot Position Update] New position, Discuss, has been recorded by Lars Eggert
2009-10-06
07 Ron Bonica [Ballot Position Update] New position, No Objection, has been recorded by Ron Bonica
2009-10-06
07 Alexey Melnikov [Ballot Position Update] New position, No Objection, has been recorded by Alexey Melnikov
2009-10-06
07 Robert Sparks [Ballot discuss]
I have several concerns with this document that are captured in the message that went to the PMOL list at
2009-10-06
07 Russ Housley [Ballot Position Update] New position, No Objection, has been recorded by Russ Housley
2009-10-06
07 Robert Sparks [Ballot Position Update] New position, Discuss, has been recorded by Robert Sparks
2009-10-06
07 Ralph Droms [Ballot Position Update] New position, No Objection, has been recorded by Ralph Droms
2009-10-05
07 Lisa Dusseault
[Ballot discuss]
I would like to talk about the competitive or anti-competitive nature of metrics as well as the meaning of putting these on the Standards Track.

In the Abstract the draft says:
  "The purpose of
  this document is to combine a standard set of common metrics,
  allowing interoperable performance measurements, easing the
  comparison of industry implementations."

Later on the draft also says

    "These metrics will likely be utilized in
  production SIP environments for providing input regarding Key
  Performance Indicators (KPI) and Service Level Agreement (SLA)
  indications; however, they may also be used for testing end-to-end
  SIP-based service environments."

First, note that this draft by itself does not allow interoperability unless there's a standard performance monitoring protocol to transport the standardized metrics.  By itself, this draft allows comparison. 

Second, comparing industry implementations on this basis is not necessarily the best way to compare.  We don't even know if it's a good way to compare.  Which metrics are most important to user experience?  Aren't there some thresholds on some measures which are "good enough" and improving beyond those thresholds offers no noticeable improvement?

I don't see how we have enough confidence to call this a Proposed Standard at this point; Informational would make much more sense to me -- I would read that as "here is a set of metrics defined a certain way, and the definitions are for your information if you choose to use the same metrics."

What is the likely harm of implementations optimizing for these metrics instead of for a more holistic user experience?
2009-10-05
07 Lisa Dusseault [Ballot Position Update] New position, Discuss, has been recorded by Lisa Dusseault
2009-09-28
07 Pasi Eronen [Ballot Position Update] New position, No Objection, has been recorded by Pasi Eronen
2009-09-21
07 Robert Sparks Telechat date was changed to 2009-09-24 from 2009-10-08 by Robert Sparks
2009-09-21
07 Robert Sparks Telechat date was changed to 2009-10-08 from 2009-09-24 by Robert Sparks
2009-09-21
07 Robert Sparks State Changes to IESG Evaluation - Defer from IESG Evaluation by Robert Sparks
2009-09-21
07 Robert Sparks [Note]: 'Vijay Gurbani is the PROTO-shepherd' added by Robert Sparks
2009-09-15
07 Dan Romascanu [Ballot Position Update] New position, Yes, has been recorded for Dan Romascanu
2009-09-15
07 Dan Romascanu Ballot has been issued by Dan Romascanu
2009-09-15
07 Dan Romascanu Created "Approve" ballot
2009-09-15
07 Dan Romascanu State Changes to IESG Evaluation from Waiting for AD Go-Ahead by Dan Romascanu
2009-09-15
07 Dan Romascanu Placed on agenda for telechat - 2009-09-24 by Dan Romascanu
2009-09-09
04 (System) New version available: draft-ietf-pmol-sip-perf-metrics-04.txt
2009-09-01
07 Dan Romascanu waiting for the editor to address the comments in the Gen-ART review by Suresh Krishnan [suresh.krishnan@ericsson.com]
2009-08-18
07 (System) State has been changed to Waiting for AD Go-Ahead from In Last Call by system
2009-08-14
07 Amanda Baber IANA comments:

As described in the IANA Considerations section, we understand this
document to have NO IANA Actions.
2009-08-06
07 Samuel Weiler Request for Last Call review by SECDIR is assigned to Phillip Hallam-Baker
2009-08-04
07 Amy Vezza Last call sent
2009-08-04
07 Amy Vezza State Changes to In Last Call from Last Call Requested by Amy Vezza
2009-08-04
07 Dan Romascanu State Changes to Last Call Requested from AD Evaluation by Dan Romascanu
2009-08-04
07 Dan Romascanu Last Call was requested by Dan Romascanu
2009-08-04
07 (System) Ballot writeup text was added
2009-08-04
07 (System) Last call text was added
2009-08-04
07 (System) Ballot approval text was added
2009-07-22
07 Dan Romascanu
Document write-up by Vijay Gurbani:

This is a publication request for SIP End-to-End Performance Metrics
http://tools.ietf.org/html/draft-ietf-pmol-sip-perf-metrics-03
as a STANDARDS TRACK RFC.

    (1.a) Who is the Document Shepherd for this document? Has the
          Document Shepherd personally reviewed this version of the
          document and, in particular, does he or she believe this
          version is ready for forwarding to the IESG for publication?
Vijay K. Gurbani is the Document Shepherd.

    (1.b) Has the document had adequate review both from key WG members
          and from key non-WG members? Does the Document Shepherd have
          any concerns about the depth or breadth of the reviews that
          have been performed?
This document has been reviewed by many participants of the SIP,
SIPPING, PMOL and BMWG working groups over the last 3 years.

    (1.c) Does the Document Shepherd have concerns that the document
          needs more review from a particular or broader perspective,
          e.g., security, operational complexity, someone familiar with
          AAA, internationalization or XML?
There is no basis for concerns regarding more
review.  The draft does not introduce a new protocol or any new headers
that may be amenable to a security review, nor does it introduce
any machinations that may cause operational complexity.

    (1.d) Does the Document Shepherd have any specific concerns or
          issues with this document that the Responsible Area Director
          and/or the IESG should be aware of? For example, perhaps he
          or she is uncomfortable with certain parts of the document, or
          has concerns whether there really is a need for it. In any
          event, if the WG has discussed those issues and has indicated
          that it still wishes to advance the document, detail those
          concerns here. Has an IPR disclosure related to this document
          been filed? If so, please include a reference to the
          disclosure and summarize the WG discussion and conclusion on
          this issue.
There are no specific issues or concerns with this document.
There are no IPR filings, and no one has mentioned IPR related
to this draft during the life of the draft.

    (1.e) How solid is the WG consensus behind this document? Does it
          represent the strong concurrence of a few individuals, with
          others being silent, or does the WG as a whole understand and
          agree with it?
After one WGLC with many comments and several reviews that followed,
the recent WGLC ended quietly.  The PMOL WGLC request for this draft
was cross-posted to the SIPPING WG as well.

    (1.f) Has anyone threatened an appeal or otherwise indicated extreme
          discontent? If so, please summarise the areas of conflict in
          separate email messages to the Responsible Area Director. (It
          should be in a separate email because this questionnaire is
          entered into the ID Tracker.)
No.

    (1.g) Has the Document Shepherd personally verified that the
          document satisfies all ID nits? (See
          http://www.ietf.org/ID-Checklist.html and
          http://tools.ietf.org/tools/idnits/). Boilerplate checks are
          not enough; this check needs to be thorough. Has the document
          met all formal review criteria it needs to, such as the MIB
          Doctor, media type and URI type reviews?
The nits check indicates one false alarm.
http://tools.ietf.org/idnits?url=http://tools.ietf.org/id/draft-ietf-pmol-sip-perf-metrics-03.txt

    (1.h) Has the document split its references into normative and
          informative? Are there normative references to documents that
          are not ready for advancement or are otherwise in an unclear
          state? If such normative references exist, what is the
          strategy for their completion? Are there normative references
          that are downward references, as described in [RFC3967]? If
          so, list these downward references to support the Area
          Director in the Last Call procedure for them [RFC3967].
The references are split, and there are no down-references.

    (1.i) Has the Document Shepherd verified that the document IANA
          consideration section exists and is consistent with the body
          of the document? If the document specifies protocol
          extensions, are reservations requested in appropriate IANA
          registries? Are the IANA registries clearly identified? If
          the document creates a new registry, does it define the
          proposed initial contents of the registry and an allocation
          procedure for future registrations? Does it suggest a
          reasonable name for the new registry? See [RFC5226]. If the
          document describes an Expert Review process has Shepherd
          conferred with the Responsible Area Director so that the IESG
          can appoint the needed Expert during the IESG Evaluation?
There are no IANA considerations needed, and this is indicated.

    (1.j) Has the Document Shepherd verified that sections of the
          document that are written in a formal language, such as XML
          code, BNF rules, MIB definitions, etc., validate correctly in
          an automated checker?
Not applicable.

    (1.k) The IESG approval announcement includes a Document
          Announcement Write-Up. Please provide such a Document
          Announcement Write-Up? Recent examples can be found in the
          "Action" announcements for approved documents. The approval
          announcement contains the following sections:

          Technical Summary
    SIP has become a widely-used standard among many service providers,
    vendors, and end users.  Although there are many different standards
    for measuring the performance of signaling protocols, none of them
    specifically address SIP.

    The scope of this document is limited to the definitions of a
    standard set of metrics for measuring and reporting SIP performance
    from an end-to-end perspective.  The metrics introduce a common
    foundation for understanding and quantifying performance expectations
    between service providers, vendors, and the users of services based
    on SIP.  The intended audience for this document can be found among
    network operators, who often collect information on the
    responsiveness of the network to customer requests for services.

          Working Group Summary
    Working Group Consensus was smoothly achieved.

          Document Quality
              Are there existing implementations of the protocol? Have a
              significant number of vendors indicated their plan to
              implement the specification? Are there any reviewers that
              merit special mention as having done a thorough review,
              e.g., one that resulted in important changes or a
              conclusion that the document had no substantive issues? If
              there was a MIB Doctor, Media Type or other expert review,
              what was its course (briefly)? In the case of a Media Type
              review, on what date was the request posted?
There are several implementations of earlier versions of the I-D,
based on contacts with the authors.  For example, Sipana, a
distributed SIP analyzer that monitors SIP signaling behavior, uses
many of the SIP metrics: http://code.google.com/p/sipana/

The Contributors and Acknowledgements sections list
many of the reviewers who deserve mention:
Carol Davids, Marian Delkinov, Adam Uzelac, Jean-Francois Mule,
Rich Terpstra, John Hearty and Dean Bayless.
2009-07-22
07 Dan Romascanu [Note]: 'Vijay Gurbani is the PROTO-shepherd' added by Dan Romascanu
2009-07-22
07 Dan Romascanu Draft Added by Dan Romascanu in state AD Evaluation
2009-03-06
03 (System) New version available: draft-ietf-pmol-sip-perf-metrics-03.txt
2008-11-01
02 (System) New version available: draft-ietf-pmol-sip-perf-metrics-02.txt
2008-06-26
01 (System) New version available: draft-ietf-pmol-sip-perf-metrics-01.txt
2008-02-29
00 (System) New version available: draft-ietf-pmol-sip-perf-metrics-00.txt