Basic Telephony SIP End-to-End Performance Metrics
draft-ietf-pmol-sip-perf-metrics-07
Document history
Date | Rev. | By | Action |
---|---|---|---|
2012-08-22 | 07 | (System) | post-migration administrative database adjustment to the No Objection position for Cullen Jennings |
2012-08-22 | 07 | (System) | post-migration administrative database adjustment to the Abstain position for Lisa Dusseault |
2012-08-22 | 07 | (System) | post-migration administrative database adjustment to the No Objection position for Robert Sparks |
2012-08-22 | 07 | (System) | post-migration administrative database adjustment to the No Objection position for Lars Eggert |
2010-09-21 | 07 | Cindy Morgan | State changed to RFC Ed Queue from Approved-announcement sent by Cindy Morgan |
2010-09-21 | 07 | (System) | IANA Action state changed to No IC from In Progress |
2010-09-21 | 07 | (System) | IANA Action state changed to In Progress |
2010-09-21 | 07 | Amy Vezza | IESG state changed to Approved-announcement sent |
2010-09-21 | 07 | Amy Vezza | IESG has approved the document |
2010-09-21 | 07 | Amy Vezza | Closed "Approve" ballot |
2010-09-21 | 07 | Amy Vezza | State Changes to Approved-announcement to be sent from IESG Evaluation::AD Followup by Amy Vezza |
2010-09-21 | 07 | Amy Vezza | [Note]: 'Vijay Gurbani is the PROTO-shepherd' added by Amy Vezza |
2010-09-20 | 07 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-07.txt |
2010-09-17 | 07 | Gonzalo Camarillo | [Ballot Position Update] New position, No Objection, has been recorded by Gonzalo Camarillo |
2010-09-16 | 07 | Robert Sparks | [Ballot comment] I suggest removing the 2119 keywords from the security considerations section and using normal prose. There's no discussion of what to do if BYEs get a failure response (particularly a recoverable failure) in section 4.4, and there's no discussion of what to do if there is no BYE in section 4.5. |
2010-09-16 | 07 | Robert Sparks | [Ballot Position Update] Position for Robert Sparks has been changed to No Objection from Discuss by Robert Sparks |
2010-08-17 | 07 | Lars Eggert | [Ballot Position Update] Position for Lars Eggert has been changed to No Objection from Discuss by Lars Eggert |
2010-07-30 | 06 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-06.txt |
2010-06-02 | 07 | Robert Sparks | [Ballot discuss] 1 cleared 2 cleared 3 cleared 4 cleared 5 Was: "These metrics are intentionally designed to not measure (or be perturbed by) the hop-hop retransmission mechanisms. This should be made explicit. There should also be some discussion of the effect of the end-to-end retransmission of 200 OK/ACK on the metrics based on those messages." The additional text in section 4 helps (but you should delete "in a dialog" from "the first associated SIP message in a dialog", since at least one of the metrics defined operates on messages that are never in dialogs). There still is no discussion of how to handle measurements when there is retransmission of 200s to INVITEs and their associated ACKs. 6 cleared 7 cleared 8 cleared 9 cleared 10 Was: "The 3rd to last paragraph of section 4 should be expanded. I think it's unlikely that implementers, especially those with other language backgrounds, will understand the subtlety of the quotes around "final". Enumerating the cases where you want the measurement to span from the request of one transaction to the final response of some other transaction will help. (I'm guessing you were primarily considering redirection, but I suspect you also wanted to capture the additional delay due to Requires-based negotiation or 488 not-acceptable-here style re-attempts?) You may also want to consider the effect of the negotiation phase of extensions like session-timer on these metrics." The text in -05 is better, but it is still not clear that you are trying to measure across transactions. The trick will be in modifying the language to express the notion of measuring between messages that are parts of separate transactions and avoiding conflating the "message that counts as the end of what we're measuring" with "final response", since that phrase means something else in SIP.
For example, you have a metric that wants to measure this interval:
 -   ---INVITE-->
 ^   <--302---      This 302 is a final response.
 |   ---ACK-->
 v   ---INVITE-->
 -   <--200---      This 200 is another final response. (It's the final response to the second transaction.)
     ---ACK-->
11 cleared 12 cleared 13 Was: "The SRD metric definition in 4.3.1 ignores the effect of forking. (remainder snipped)" You added good text to section 5.4 pointing out how to handle measurements when forking occurs. Please add a forward pointer to that text from this section. 14 The Failed Session Setup SRD claims to be useful in detecting problems in downstream signaling functions. Please provide some text or a reference supporting that claim. As written, this metric could be dominated by how long the called user lets his phone ring. Is that what was intended? You might consider separate treatment for 408s and for explicit decline response codes. 15 cleared 16 Was: "In section 4.4, what does it mean to measure the delay in the disconnect of a failed session completion? Without a successful session completion, there can be no BYE. This section also begs the very hard to answer question about what to do when BYEs receive failure responses. It would be better to note that edge case exists and what, if anything, the metric is going to say about it if it happens." The new text is much better, but it is still not clear what an implementor should do for BYEs that receive failure responses (especially failure responses that can be recovered from). For instance, consider a BYE/603 or a BYE/503 Retry-After/BYE/200. 17 Was: "Section 4.5 is a particularly strong example of these metrics focusing on the simple telephony application. It may even be falling into the same traps that lead to trying to build fraud-resistant billing based on the time difference between an INVITE and a BYE.
Some additional discussion noting that the metric doesn't capture early media and recommendation on when to give up on seeing a BYE would be useful. (Sometimes BYEs don't happen even when there is no malicious intent.)" The new text is better, but the missing BYE case is still not addressed. You indicated 4.5.2 should handle that, but it doesn't - it seems to be dealing with BYEs that don't get a response, not flows that don't have a BYE. BTW - I was confused for a while when re-reviewing this section as to what the scenario it was trying to cover really was. It would help others avoid that confusion to show the ACK for the 200 OK to the INVITE. Also, is it the intent to only allow the endpoint that sends a BYE to report CHT? If so, the document does not say that. If not, should the document say something about getting inconsistent values from the two UAs involved? (They could be off by almost all of Timer F if it's the 200s to the BYEs being lost in the flows in 4.5.2). 18 cleared 19 Was: "The ratio metrics don't define (or convey) the interval that totals are taken over. Are these supposed to be "# requests received since this instance was manufactured" or "since last reboot" or "since last reset of statistics" or something else? What is the implementation supposed to report when the denominator of a ratio is 0?" The additional text helps, but doesn't answer my last question, especially for metrics like NER/SCR (was SER/SEER). I think you're trying to say "Apply this equation to all the messages in this bucket" and "different service providers may have different bucket sizes". What's not answered is whether the bucket is a sliding window (all the messages in the last hour), or a fixed window (all the messages since the top of the hour), or if it doesn't matter. Is "since the beginning of time" a valid bucket? What are the consequences for the operator if they change bucket sizes? Is this metric calculated and reported by the endpoint?
If so, what are the consequences if an operator has a network full of endpoints that use different size buckets? In any case I can have a bucket with a million things in it and still end up with a zero in this denominator. (The ratio will be 0/0 if this happens for NER or SCR). So, what's the thing that's generating the measurement supposed to do when this happens? Not report anything? Report a 100? Whatever it is, the document should say what to do. 20 Was: "Please add some discussion motivating why all 300s, 401, 402, and 407 are treated specially (vs. several other candidate 4xx and 6xx responses) in sections like section 4.8. Were other codes considered? If so, why were they rejected?" The new text added is problematic. The condition it states (they indicate an acceptable UA effect without the interaction of an individual user of the UA) applies to several existing response codes that are not in the set, such as 420. 21 cleared 22 cleared 23 cleared 24 cleared 25 Was: "I'm a little surprised there is no discussion on privacy, particularly on profiling the usage patterns of individuals or organizations, in the security considerations section." The section is here to point out to users of the specification that there are concerns they need to be sure to have considered. Right now, it's left unspecified how these measurements are getting from the measurement point to the operator console. It would be prudent to remind potential SSPs to worry about exposing information about customer A to customer B while moving these measurements around, or when deciding to expose them on a website, especially if they are correlated in the example dimensions you call out in section 5. If you think it's remotely possible that some endpoint vendor will start making a subset of these metrics user-visible (many phones have a rich "statistics" screen already) reminding them to think about privacy is probably a good idea.
26 cleared ======= NEW ISSUES with -05 ======= 27 I think relabeling SRD as PDD is a mistake. At the very least it is a change of such magnitude that it should be re-reviewed by the working group. SRD as defined by -04 is similar, but not an exact reflection of PDD. Saying it is "like" Post-Dial-Delay as defined in the PSTN is risky enough. Using the same name makes it even more likely that someone will come to the conclusion that it is measuring exactly the same thing. It does not. For example, it fails to capture any delay from when the user "finishes dialing" to when the INVITE is generated due to DNS processing. It would be possible to engineer highly constrained networks (that didn't use DNS or allow redirection or forking for example) where the metric might behave very much like PDD in the PSTN, but that will not be true in the general case. |
2010-06-02 | 07 | Robert Sparks | [Ballot discuss] This is a copy of the discuss on -04 which was previously a pointer to . This update is to capture the discuss in the tracker by value instead of by reference. ---- 1 The document should more carefully describe its scope (and consider changing its title). This document focuses on the use of SIP for simple telephony and relies on measurements in earlier telephony networks for guidance. But telephony is only one use of SIP. These aren't the same metrics that would be most useful for observing a network that was involved primarily in setting up MSRP sessions for file transfer, for instance. An eventual set of generic SIP performance metrics will need to focus on the primitives rather than artifacts from any particular application. 2 That said, I'm skeptical of the utility of many of these metrics even for monitoring systems that are focusing only on delivering basic telephony. Has the group surveyed operators to see what they're measuring, what they're finding useful, and what they're just throwing away? Some additional text motivating why this particular set of metrics was chosen should be provided to help operators/implementers choose which ones they are going to try to use. 3 "Each session is identified by a unique Call-ID" is incorrect. You need at least Call-ID, to-tag, and from-tag here. And to be pedantic, you're describing the SIP dialog, not one of the sessions it manages. The session is what is described by the Session Description Protocol. The metrics in this draft are derived from signaling events, not session events, and make assumptions about how those correlate for a simple voice call that may not be true for more advanced uses. 4 The document is inconsistent about whether the metrics will describe any part of an early-dialog/early session.
The introduction indicates it won't and focuses on the delivery of a 200 OK, but there are metrics that measure the arrival time of 180s. This should be reconciled. Do take note that early sessions are pervasive in real deployments at this point in time. 5 These metrics are intentionally designed to not measure (or be perturbed by) the hop-hop retransmission mechanisms. This should be made explicit. There should also be some discussion of the effect of the end-to-end retransmission of 200 OK/ACK on the metrics based on those messages. 6 The document should consider the effects of the presence or absence of the reliable-provisional extension on its metrics (some of the metrics will be perturbed by a lost 18x that isn't sent reliably). 7 Using T1 and T4 as the timing interval measurement tokens is unfortunate. SIP uses those symbols already to mean something completely different. Is there a reason not to change these and avoid the confusion that the collision will cause? 8 The document uses the terms UAC and UAS incorrectly. It is trying to use them to mean the initiator and recipient of a simple phone call. But the terms are roles scoped to a particular transaction, not to a dialog. When an endpoint sends a BYE request, it is by definition acting as a UAC. 9 The document uses the word "dialog" in a way that's not the same as the formal term with the same name defined in RFC3261 and that will lead to confusion. (A sequence of REGISTER requests and responses, for example, is never part of any dialog. The INVITE/302/ACK messages shown in the call setup flows are not part of any dialog.) Please choose another word or phrase for this draft. I suggest "message exchange". 10 The 3rd to last paragraph of section 4 should be expanded. I think it's unlikely that implementers, especially those with other language backgrounds, will understand the subtlety of the quotes around "final".
Enumerating the cases where you want the measurement to span from the request of one transaction to the final response of some other transaction will help. (I'm guessing you were primarily considering redirection, but I suspect you also wanted to capture the additional delay due to Requires-based negotiation or 488 not-acceptable-here style re-attempts?) You may also want to consider the effect of the negotiation phase of extensions like session-timer on these metrics. 11 The document assumes that a registration will be DIGEST challenged. That's a common deployment model, but it is not required. If other authentication mechanics are used (such as SIP Identity), the RRD metric, for example, becomes muddied. 12 In section 4.2, "Subsequent REGISTER retries are identified by the same Call-ID" should say "identified by the same transaction identifier (same topmost Via header field branch parameter value)". Completely different REGISTER transactions from a given registrant are likely to have the same Call-ID. 13 The SRD metric definition in 4.3.1 ignores the effect of forking. Unlike 200 OKs, where receiving multiple 200s in response to a single INVITE only happens if a race is won, it is the _normal_ state of affairs for a UAC to receive provisional responses from multiple branches when a request forks. Deployed systems are increasingly sending 18x responses reliably with an answer, establishing early sessions, so when forking is present it is _highly_ likely that there will be multiple 18x's from different branches arriving at the UA. This section should provide guidance on what to report when this happens. 14 The Failed Session Setup SRD claims to be useful in detecting problems in downstream signaling functions. Please provide some text or a reference supporting that claim. As written, this metric could be dominated by how long the called user lets his phone ring. Is that what was intended?
You might consider separate treatment for 408s and for explicit decline response codes. 15 What was the motivation for making MESSAGE special in section 4.3.3? Why didn't the group instead extend the concept to measuring _any_ non-INVITE transaction (with the possible exception of CANCEL)? 16 In section 4.4, what does it mean to measure the delay in the disconnect of a failed session completion? Without a successful session completion, there can be no BYE. This section also begs the very hard to answer question about what to do when BYEs receive failure responses. It would be better to note that edge case exists and what, if anything, the metric is going to say about it if it happens. 17 Section 4.5 is a particularly strong example of these metrics focusing on the simple telephony application. It may even be falling into the same traps that lead to trying to build fraud-resistant billing based on the time difference between an INVITE and a BYE. Some additional discussion noting that the metric doesn't capture early media and recommendation on when to give up on seeing a BYE would be useful. (Sometimes BYEs don't happen even when there is no malicious intent.) 18 Trying to use Max-Forwards to determine how many hops a request took is going to produce incorrect results in any but the most simple of network deployments (I would have expected this to be based on counting Vias, with a note pointing to the discussion on the problems B2BUAs introduce). Proxies can reduce Max-Forwards by more than one. There are many implementations in the wild that cap Max-Forwards. If this metric remains as defined, you should also point out that neither endpoint can calculate it. Some third entity will have to collect information from each end to make this calculation. 19 The ratio metrics don't define (or convey) the interval that totals are taken over.
Are these supposed to be "# requests received since this instance was manufactured" or "since last reboot" or "since last reset of statistics" or something else? What is the implementation supposed to report when the denominator of a ratio is 0? 20 Please add some discussion motivating why all 300s, 401, 402, and 407 are treated specially (vs. several other candidate 4xx and 6xx responses) in sections like section 4.8. Were other codes considered? If so, why were they rejected? 21 Section 4.9 seems to be implying that you can't receive a 500 class response to a reINVITE, which is not true. If you want this metric to only reflect the results of initial INVITEs, more definition will be needed. 22 ISA in section 4.10 claims that 408s indicate an overloaded state in a downstream element. Overload may induce 408s, but 408s do _not_ indicate overload. It's possible to receive them just because someone is not answering a phone. 23 In section 5, why were these correlation dimensions chosen? Was the Request-URI considered? If so, why was it rejected? 24 The treatment of forking in section 6.3 is insufficient. As noted earlier, provisional messages establishing early sessions are becoming common, and there will be multiple early sessions for a given INVITE when there is forking. The recommendation to latch onto the "first" 200 (or 18x) and ignore the others only marginally works for playing media for simple telephony applications - we're seeing phones that mix or present multiple lines, and applications that go beyond basic phone calls (like file transfer) that make use of all the responses. Trying to dodge the complexity as the current section does will lead to metrics that don't reflect what the network is doing. 25 I'm a little surprised there is no discussion on privacy, particularly on profiling the usage patterns of individuals or organizations, in the security considerations section.
26 Nits: 26.1 What does it mean in section 4.3.1 for the "user" to send the first bit of a message? Suggest deleting "or user" from the sentence. 26.2 Section 4.11 has a stale internal pointer to a non-existent section 3.5. I suspect it's trying to point back into 4 somewhere. |
2010-05-06 | 07 | (System) | Sub state has been changed to AD Follow up from New Id Needed |
2010-05-06 | 05 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-05.txt |
2010-01-22 | 07 | Cullen Jennings | [Ballot Position Update] Position for Cullen Jennings has been changed to No Objection from Discuss by Cullen Jennings |
2009-10-09 | 07 | Lisa Dusseault | [Ballot Position Update] Position for Lisa Dusseault has been changed to Abstain from Discuss by Lisa Dusseault |
2009-10-09 | 07 | Lisa Dusseault | [Ballot discuss] My fundamental objection to this approach is that metrics for protocols that offer user features are hard to objectively define. Even worse, a protocol like SIP, which has a rich set of features and architectures, has many different application use cases for which different metrics are appropriate. When we move something along the Standards Track, the message the IETF gives is "This is the way we think it should be done". That doesn't mean there can be only one protocol solving a given problem, but it does give a stamp of approval. In the case of metrics for these types of protocols, particularly when we're driving the metrics rather than documenting something that already exists and is interoperable, that is inappropriate. There is insufficient cause for this document, in particular, to go on the Standards Track. I also have trouble with the specific language in the draft. In the Abstract the draft says: "The purpose of this document is to combine a standard set of common metrics, allowing interoperable performance measurements, easing the comparison of industry implementations." Later on the draft also says "These metrics will likely be utilized in production SIP environments for providing input regarding Key Performance Indicators (KPI) and Service Level Agreement (SLA) indications; however, they may also be used for testing end-to-end SIP-based service environments." First, note that this draft by itself does not allow interoperability unless there's a standard performance monitoring protocol to transport the standardized metrics. By itself, this draft allows comparison. Second, comparing industry implementations on this basis is not necessarily the best way to compare. We don't even know if it's a good way to compare.
Which metrics are most important to user experience? Aren't there some thresholds on some measures which are "good enough" and improving beyond those thresholds offers no noticeable improvement? What is the likely harm of implementations optimizing for these metrics instead of for a more holistic user experience? Informational status would make much more sense to me -- I would read that as "here is a set of metrics defined a certain way, and the definitions are for your information if you choose to use the same metrics." We look for justification for all documents to be on the Standards Track -- metrics documents or protocol specifications. This document does not, in my opinion, meet the bar. I am moving my vote to ABSTAIN. |
2009-10-09 | 07 | Lisa Dusseault | [Ballot Position Update] Position for Lisa Dusseault has been changed to Discuss from Abstain by Lisa Dusseault |
2009-10-09 | 07 | Lisa Dusseault | [Ballot Position Update] Position for Lisa Dusseault has been changed to Abstain from Discuss by Lisa Dusseault |
2009-10-08 | 07 | Samuel Weiler | Request for Last Call review by SECDIR Completed. Reviewer: Phillip Hallam-Baker. |
2009-10-08 | 07 | Cindy Morgan | State Changes to IESG Evaluation::Revised ID Needed from IESG Evaluation by Cindy Morgan |
2009-10-08 | 07 | Jari Arkko | [Ballot Position Update] New position, No Objection, has been recorded by Jari Arkko |
2009-10-07 | 07 | Ross Callon | [Ballot Position Update] New position, No Objection, has been recorded by Ross Callon |
2009-10-07 | 07 | Cullen Jennings | [Ballot comment] Meta Comment: We are still in the experimentation mode of how to bring together the operational and metrics experience of the OPS area with the SIP expertise of the RAI area. I think the ADs have some work to do to see what improvements can be made. |
2009-10-07 | 07 | Cullen Jennings | [Ballot discuss] On some of these, it seems like there were multiple metrics that ended up with the same name but were actually different - for example, SRD in the failure case and SRD in the success case. Several metrics seem like they would be messed up by HERFP and forking. By messed up I mean not useful for anything. In general I think on all the metrics I would have liked to have a better idea of how they can be used, not just collected. As far as I understand 4.4.2, it measures timer F, which is compiled into the code, so it seems pretty useless to measure. In the second example in 4.4.2, when UA2 sends a BYE, UA2 is the UAC, not the UAS. SDT seems undefined when the answerer sends the BYE. Need to consider things like BYE Also, re-INVITEs, REFERs and more. HpR. I don't think this approach has worked out well on operational networks. I see more people doing Via counting at the UAS. One of the many problems with the approach is, well, you have to be at both ends, and if the UAC starts with say 30 (fairly common) and then some B2BUA in the middle resets to 70 (insane but common, and what a strict read of 3261 might say you SHOULD do) you are going to probably end up with a negative hop count. SER. This looks very wrong. On an interface where all INVITEs are challenged (very common for anything offering long distance) this is going to be under 50%. SEER - the way this is defined, you end up with a divide by zero on interfaces that are redirecting. SDF - I really have no idea how to implement this in a way that gets consistent numbers. It's not clear what I look for in the Reason to decide whether I increment the SDF counter or not. SCR - I did not understand why this one needed a proxy. I suspect I don't understand what is being counted. The SSR does not seem right.
If you had lots of 503s, it seems like the the SSR could go negative. Section 6.3. totally agree forking SHOULD be considered. In fact I think forking MUST be considered but we the spec needs to help the implementors know what to do. Take for example a case where an invite forks to 3 phones that all start ringing then one of the is answered. From a dialog point of view, 1/3 of the dialogs worked and the others failed to have a call. From a user point of view the call was a success. Treating this like a transit ISDN network is not going to get the metrics that are useful for a SLA. Overall, I think that if we backed up to the 10,000 foot level and asked what problem are we going to solve with metrics and what metrics do we need, it would be much clearer how to evaluate if these metrics worked or not. |
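The SER and hop-count concerns in the DISCUSS above can be sketched numerically. This is a rough illustration only: the formulas below are simplified assumptions inferred from the discussion (success count over non-redirected INVITEs; hop count as the Max-Forwards difference between the two ends), not the draft's exact definitions.

```python
# Rough numerical sketch of two concerns from the DISCUSS above.
# Both formulas are simplified assumptions, not the draft's exact definitions.

def session_establishment_ratio(invites_sent, invites_200, invites_3xx):
    """SER as a percentage: successful INVITEs over non-redirected attempts.
    Raises ZeroDivisionError when every INVITE is redirected (the SEER concern)."""
    return 100.0 * invites_200 / (invites_sent - invites_3xx)

# On a digest-challenged interface every call takes two INVITEs (the first is
# rejected with a 401), so 100 successful calls out of 200 INVITEs reads 50%.
print(session_establishment_ratio(invites_sent=200, invites_200=100, invites_3xx=0))  # 50.0

def hops_per_request(max_forwards_at_uac, max_forwards_at_uas):
    """Hop counting via the Max-Forwards difference between the two ends."""
    return max_forwards_at_uac - max_forwards_at_uas

# UAC starts Max-Forwards at 30; a B2BUA mid-path resets it to 70, and two
# more hops later the UAS sees 68 -- the computed hop count goes negative.
print(hops_per_request(max_forwards_at_uac=30, max_forwards_at_uas=68))  # -38
```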
2009-10-07
|
07 | Cullen Jennings | [Ballot Position Update] New position, Discuss, has been recorded by Cullen Jennings |
2009-10-07
|
07 | Amy Vezza | State Changes to IESG Evaluation from IESG Evaluation - Defer by Amy Vezza |
2009-10-07
|
07 | Tim Polk | [Ballot comment] I support Lisa's and Robert's discusses... I believe that publication as Informational would be a reasonable path forward, especially if supported by additional text describing the scope and origins of the metrics. |
2009-10-07
|
07 | Tim Polk | [Ballot Position Update] New position, No Objection, has been recorded by Tim Polk |
2009-10-07
|
07 | Lars Eggert | [Ballot discuss] Section 12.1., paragraph 3: > [GR-512] Telcordia, "LSSGR: Reliability, Section 12", GR-512-CORE Issue 2, January 1998. DISCUSS: Is this a standard by another SDO? (In any event, I believe it can be made an Informative reference, because it simply points to the source from which a metric/test was borrowed.) |
2009-10-07
|
07 | Lars Eggert | [Ballot Position Update] New position, Discuss, has been recorded by Lars Eggert |
2009-10-06
|
07 | Ron Bonica | [Ballot Position Update] New position, No Objection, has been recorded by Ron Bonica |
2009-10-06
|
07 | Alexey Melnikov | [Ballot Position Update] New position, No Objection, has been recorded by Alexey Melnikov |
2009-10-06
|
07 | Robert Sparks | [Ballot discuss] I have several concerns with this document that are captured in the message that went to the PMOL list at |
2009-10-06
|
07 | Russ Housley | [Ballot Position Update] New position, No Objection, has been recorded by Russ Housley |
2009-10-06
|
07 | Robert Sparks | [Ballot Position Update] New position, Discuss, has been recorded by Robert Sparks |
2009-10-06
|
07 | Ralph Droms | [Ballot Position Update] New position, No Objection, has been recorded by Ralph Droms |
2009-10-05
|
07 | Lisa Dusseault | [Ballot discuss] I would like to talk about the competitive or anti-competitive nature of metrics, as well as the meaning of putting these on the Standards Track. In the Abstract the draft says: "The purpose of this document is to combine a standard set of common metrics, allowing interoperable performance measurements, easing the comparison of industry implementations." Later on the draft also says: "These metrics will likely be utilized in production SIP environments for providing input regarding Key Performance Indicators (KPI) and Service Level Agreement (SLA) indications; however, they may also be used for testing end-to-end SIP-based service environments." First, note that this draft by itself does not allow interoperability unless there's a standard performance monitoring protocol to transport the standardized metrics. By itself, this draft allows comparison. Second, comparing industry implementations on this basis is not necessarily the best way to compare. We don't even know if it's a good way to compare. Which metrics are most important to user experience? Aren't there some thresholds on some measures which are "good enough", beyond which improvement offers no noticeable benefit? I don't see how we have enough confidence to call this a Proposed Standard at this point; Informational would make much more sense to me -- I would read that as "here is a set of metrics defined a certain way, and the definitions are for your information if you choose to use the same metrics." What is the likely harm of implementations optimizing for these metrics instead of for a more holistic user experience? |
2009-10-05
|
07 | Lisa Dusseault | [Ballot Position Update] New position, Discuss, has been recorded by Lisa Dusseault |
2009-09-28
|
07 | Pasi Eronen | [Ballot Position Update] New position, No Objection, has been recorded by Pasi Eronen |
2009-09-21
|
07 | Robert Sparks | Telechat date was changed to 2009-09-24 from 2009-10-08 by Robert Sparks |
2009-09-21
|
07 | Robert Sparks | Telechat date was changed to 2009-10-08 from 2009-09-24 by Robert Sparks |
2009-09-21
|
07 | Robert Sparks | State Changes to IESG Evaluation - Defer from IESG Evaluation by Robert Sparks |
2009-09-21
|
07 | Robert Sparks | [Note]: 'Vijay Gurbani is the PROTO-shepherd' added by Robert Sparks |
2009-09-15
|
07 | Dan Romascanu | [Ballot Position Update] New position, Yes, has been recorded for Dan Romascanu |
2009-09-15
|
07 | Dan Romascanu | Ballot has been issued by Dan Romascanu |
2009-09-15
|
07 | Dan Romascanu | Created "Approve" ballot |
2009-09-15
|
07 | Dan Romascanu | State Changes to IESG Evaluation from Waiting for AD Go-Ahead by Dan Romascanu |
2009-09-15
|
07 | Dan Romascanu | Placed on agenda for telechat - 2009-09-24 by Dan Romascanu |
2009-09-09
|
04 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-04.txt |
2009-09-01
|
07 | Dan Romascanu | Waiting for the editor to address the comments in the Gen-ART review by Suresh Krishnan [suresh.krishnan@ericsson.com]
2009-08-18
|
07 | (System) | State has been changed to Waiting for AD Go-Ahead from In Last Call by system |
2009-08-14
|
07 | Amanda Baber | IANA comments: As described in the IANA Considerations section, we understand this document to have NO IANA Actions. |
2009-08-06
|
07 | Samuel Weiler | Request for Last Call review by SECDIR is assigned to Phillip Hallam-Baker |
2009-08-04
|
07 | Amy Vezza | Last call sent |
2009-08-04
|
07 | Amy Vezza | State Changes to In Last Call from Last Call Requested by Amy Vezza |
2009-08-04
|
07 | Dan Romascanu | State Changes to Last Call Requested from AD Evaluation by Dan Romascanu |
2009-08-04
|
07 | Dan Romascanu | Last Call was requested by Dan Romascanu |
2009-08-04
|
07 | (System) | Ballot writeup text was added |
2009-08-04
|
07 | (System) | Last call text was added |
2009-08-04
|
07 | (System) | Ballot approval text was added |
2009-07-22
|
07 | Dan Romascanu | Document write-up by Vijay Gurbani: This is a publication request for SIP End-to-End Performance Metrics http://tools.ietf.org/html/draft-ietf-pmol-sip-perf-metrics-03 as a STANDARDS TRACK RFC. (1.a) Who is the Document Shepherd for this document? Has the Document Shepherd personally reviewed this version of the document and, in particular, does he or she believe this version is ready for forwarding to the IESG for publication? Vijay K. Gurbani is the Document Shepherd. (1.b) Has the document had adequate review both from key WG members and from key non-WG members? Does the Document Shepherd have any concerns about the depth or breadth of the reviews that have been performed? This document has been reviewed by many participants of the SIP, SIPPING, PMOL and BMWG working groups over the last 3 years. (1.c) Does the Document Shepherd have concerns that the document needs more review from a particular or broader perspective, e.g., security, operational complexity, someone familiar with AAA, internationalization or XML? There is no basis for concern regarding the need for more review. The draft does not introduce a new protocol or any new headers that may be amenable to a security review, nor does it introduce any machinations that may cause operational complexity. (1.d) Does the Document Shepherd have any specific concerns or issues with this document that the Responsible Area Director and/or the IESG should be aware of? For example, perhaps he or she is uncomfortable with certain parts of the document, or has concerns whether there really is a need for it. In any event, if the WG has discussed those issues and has indicated that it still wishes to advance the document, detail those concerns here.
Has an IPR disclosure related to this document been filed? If so, please include a reference to the disclosure and summarize the WG discussion and conclusion on this issue. There are no specific issues or concerns with this document. There are no IPR filings, and no one has mentioned IPR related to this draft during the life of the draft. (1.e) How solid is the WG consensus behind this document? Does it represent the strong concurrence of a few individuals, with others being silent, or does the WG as a whole understand and agree with it? After one WGLC with many comments and several reviews that followed, the recent WGLC ended quietly. The PMOL WGLC request for this draft was cross-posted to the SIPPING WG as well. (1.f) Has anyone threatened an appeal or otherwise indicated extreme discontent? If so, please summarise the areas of conflict in separate email messages to the Responsible Area Director. (It should be in a separate email because this questionnaire is entered into the ID Tracker.) No. (1.g) Has the Document Shepherd personally verified that the document satisfies all ID nits? (See http://www.ietf.org/ID-Checklist.html and http://tools.ietf.org/tools/idnits/). Boilerplate checks are not enough; this check needs to be thorough. Has the document met all formal review criteria it needs to, such as the MIB Doctor, media type and URI type reviews? The nits check indicates one false alarm. http://tools.ietf.org/idnits?url=http://tools.ietf.org/id/draft-ietf-pmol-sip-perf-metrics-03.txt (1.h) Has the document split its references into normative and informative? Are there normative references to documents that are not ready for advancement or are otherwise in an unclear state? If such normative references exist, what is the strategy for their completion? Are there normative references that are downward references, as described in [RFC3967]? If so, list these downward references to support the Area Director in the Last Call procedure for them [RFC3967]. 
The references are split, and there are no down-references. (1.i) Has the Document Shepherd verified that the document IANA consideration section exists and is consistent with the body of the document? If the document specifies protocol extensions, are reservations requested in appropriate IANA registries? Are the IANA registries clearly identified? If the document creates a new registry, does it define the proposed initial contents of the registry and an allocation procedure for future registrations? Does it suggest a reasonable name for the new registry? See [RFC5226]. If the document describes an Expert Review process has Shepherd conferred with the Responsible Area Director so that the IESG can appoint the needed Expert during the IESG Evaluation? There are no IANA considerations needed, and this is indicated. (1.j) Has the Document Shepherd verified that sections of the document that are written in a formal language, such as XML code, BNF rules, MIB definitions, etc., validate correctly in an automated checker? Not applicable. (1.k) The IESG approval announcement includes a Document Announcement Write-Up. Please provide such a Document Announcement Write-Up? Recent examples can be found in the "Action" announcements for approved documents. The approval announcement contains the following sections: Technical Summary SIP has become a widely-used standard among many service providers, vendors, and end users. Although there are many different standards for measuring the performance of signaling protocols, none of them specifically address SIP. The scope of this document is limited to the definitions of a standard set of metrics for measuring and reporting SIP performance from an end-to-end perspective. The metrics introduce a common foundation for understanding and quantifying performance expectations between service providers, vendors, and the users of services based on SIP. 
The intended audience for this document can be found among network operators, who often collect information on the responsiveness of the network to customer requests for services. Working Group Summary Working Group Consensus was smoothly achieved. Document Quality Are there existing implementations of the protocol? Have a significant number of vendors indicated their plan to implement the specification? Are there any reviewers that merit special mention as having done a thorough review, e.g., one that resulted in important changes or a conclusion that the document had no substantive issues? If there was a MIB Doctor, Media Type or other expert review, what was its course (briefly)? In the case of a Media Type review, on what date was the request posted? There are several implementations of earlier versions of the I-D, based on contacts with the authors. For example, Sipana, a distributed SIP analyzer that monitors SIP signaling behavior, uses many of the SIP metrics: http://code.google.com/p/sipana/ The Contributors and Acknowledgements sections list many of the reviewers who deserve mention: Carol Davids, Marian Delkinov, Adam Uzelac, Jean-Francois Mule, Rich Terpstra, John Hearty and Dean Bayless. |
2009-07-22
|
07 | Dan Romascanu | [Note]: 'Vijay Gurbani is the PROTO-shepherd' added by Dan Romascanu |
2009-07-22
|
07 | Dan Romascanu | Draft Added by Dan Romascanu in state AD Evaluation |
2009-03-06
|
03 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-03.txt |
2008-11-01
|
02 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-02.txt |
2008-06-26
|
01 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-01.txt |
2008-02-29
|
00 | (System) | New version available: draft-ietf-pmol-sip-perf-metrics-00.txt |