Problems Identified Associated with the Session Initiation Protocol's (SIP) Non-INVITE Transaction
draft-sparks-sip-nit-problems-02
The information below is for an old version of the document that is already published as an RFC.
| Field | Value |
|---|---|
| Document type | This is an older version of an Internet-Draft that was ultimately published as RFC 4321. |
| Author | Robert Sparks |
| Last updated | 2015-10-14 (Latest revision 2005-01-04) |
| RFC stream | Internet Engineering Task Force (IETF) |
| Intended RFC status | Informational |
| Additional resources | Mailing list discussion |
| WG state | (None) |
| Document shepherd | (None) |
| IESG state | Became RFC 4321 (Informational) |
| Action Holders | (None) |
| Consensus boilerplate | Unknown |
| Telechat date | (None) |
| Responsible AD | Allison J. Mankin |
| Send notices to | rohan@ekabal.com, rsparks@nostrum.com |
Network Working Group R. Sparks
Internet-Draft Xten
Expires: July 2, 2005 Jan 2005
Problems identified associated with the Session Initiation Protocol's
non-INVITE Transaction
draft-sparks-sip-nit-problems-02
Status of this Memo
This document is an Internet-Draft and is subject to all provisions
of section 3 of RFC 3667. By submitting this Internet-Draft, each
author represents that any applicable patent or other IPR claims of
which he or she is aware have been or will be disclosed, and any of
which he or she become aware will be disclosed, in accordance with
RFC 3668.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 2, 2005.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This draft describes several problems that have been identified with
the Session Initiation Protocol's non-INVITE transaction.
Sparks Expires July 2, 2005 [Page 1]
Internet-Draft SIP non-INVITE Problems Jan 2005
Table of Contents
1. Problems under the current specifications
   1.1 NITs must complete immediately or risk losing a race
   1.2 Provisional responses can delay recovery from lost final responses
   1.3 Delayed responses will temporarily blacklist an element
   1.4 408 for non-INVITE is not useful
   1.5 Non-INVITE timeouts doom forking proxies
   1.6 Mismatched timer values make winning the race harder
2. Security Considerations
3. IANA Considerations
4. Acknowledgments
5. References
Author's Address
Intellectual Property and Copyright Statements
1. Problems under the current specifications
There are a number of unpleasant edge conditions created by the SIP
non-INVITE transaction (NIT) model's fixed duration. The negative
aspects of some of these are exacerbated by the effect provisional
responses have on the non-INVITE transaction state machines as
currently defined.
1.1 NITs must complete immediately or risk losing a race
The non-INVITE transaction defined in RFC 3261 [1] is designed to
have a fixed and finite duration (dependent on T1). A consequence of
this design is that participants must strive to complete the
transaction as quickly as possible. Consider the race condition
shown in Figure 1.
                   UAC           UAS
                    |   request   |
               ---  |---.         |
                ^   |    `---.    |
                |   |         `-->|  ---
                |   |             |   ^
                |   |             |   |
             64*T1  |             |   |
                |   |             |   |
                |   |             | 64*T1
                |   |             |   |
                |   |             |   |
                v   |             |   |
  timeout <=== ---  |    200 OK   |   |
                    |         .---|   v
                    |    .---'    |  ---
                    |<--'         |
Figure 1: Non-INVITE race condition
The User Agent Server (UAS) in this figure believes it has responded
to the request in time, and that the request succeeded. The User
Agent Client (UAC), on the other hand, believes the request has
timed-out, hence failed. No longer having a matching client
transaction, the UAC core will ignore what it believes to be a
spurious response. As far as the UAC is concerned, it received no
response at all to its request. The ultimate result is the UAS and
UAC have conflicting views of the outcome of the transaction.
Therefore, a UAS cannot wait until the last possible moment to send a
final response within a NIT. It must, instead, send its response so
that it will arrive at the UAC before that UAC times out.
Unfortunately, the UAS has no way to accurately measure the
propagation time of the request or predict the propagation time of
the response. The uncertainty it faces is compounded by each proxy
that participates in the transaction. Thus, the UAS's only choice is
to send its final response as soon as it possibly can and hope for
the best.
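The race in Figure 1 can be illustrated with a short sketch. The function name, the latency parameters, and the sample values below are assumptions made for illustration; only the 64*T1 window and the 500 ms default for T1 come from RFC 3261.

```python
T1 = 0.5  # seconds; RFC 3261 default for T1


def uac_sees_response(request_latency, uas_delay, response_latency, t1=T1):
    """Return True if the final response reaches the UAC before it times out.

    All arguments are in seconds. The UAC's clock starts when it sends the
    request, and its client transaction times out 64*T1 later.
    """
    arrival = request_latency + uas_delay + response_latency
    return arrival < 64 * t1


# A UAS that answers at once wins the race comfortably.
prompt = uac_sees_response(0.05, 0.0, 0.05)

# A UAS that waits until just before its own 64*T1 window closes loses,
# because the UAC's window opened earlier and the response must still travel.
last_moment = uac_sees_response(0.05, 64 * T1 - 0.01, 0.05)
```

The UAS cannot evaluate this function in practice, since it knows neither latency; that is the point of the paragraph above.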
This result constrains the set of problems that can be solved with a
single NIT. Any delay introduced during processing of a request
increases the probability of losing the race. If the timing
characteristics of that processing are not predictable and
controllable, a single NIT is an inappropriate model for handling the
request. One viable alternative is to accept the request with a 202
and send the ultimate results in a new request in the reciprocal
direction.
In specialized networks, a UAS might have some reliable knowledge of
inter-hop latency and could use that knowledge to determine if it has
time to delay its final response in order to perform some processing
such as a database lookup while mitigating its risk of losing the
race in Figure 1. Establishing this knowledge across arbitrary
networks (perhaps using resource reservation techniques and
deterministic transports) is not currently feasible.
1.2 Provisional responses can delay recovery from lost final responses
The non-INVITE client transaction state machine provides reliability
for NITs over unreliable transports (UDP) through retransmission of
the request message. Timer E is set to T1 when a request is
initially transmitted. As long as the machine remains in the Trying
state, each time Timer E fires, it will be reset to twice its
previous value (capping at T2) and the request is retransmitted.
If the non-INVITE client transaction state machine sees a provisional
response, it transitions to the Proceeding state, where
retransmission continues, but the algorithm for resetting Timer E is
simply to use T2 instead of doubling at each firing. (Note that
Timer E is not altered during the transition to Proceeding).
Making the transition to the Proceeding state before Timer E is reset
to T2 can cause recovery from a lost final response to take extra
time. Figure 2 shows recovery from a lost final response with and
without a provisional message during this window. Recovery occurs
within 2*T1 in the case without the provisional. With the
provisional, recovery is delayed until T2, which by default is 8*T1.
In practical terms, a provisional response to a NIT in currently
deployed networks can delay transaction completion by up to 3.5
seconds.
               UAC      UAS                        UAC       UAS
                |        |                          |         |
         ---    |----.   |                   ---    |----.    |
          ^     |    `-->|                    ^     |    `--->|
       E = T1   |        |                 E = T1   |   .-----|(provisional)
          v     |        |                    v     |<--'     |
         ---    |----.   |                   ---    |----.    |
          ^     |    `-->|                    ^     |    `--->|
          |     |  X<----|(lost final)        |     |  X<-----|(lost final)
          |     |        |                    |     |         |
      E = 2*T1  |        |                    |     |         |
          |     |        |                    |     |         |
          |     |        |                    |     |         |
          v     |        |                    |     |         |
         ---    |----.   |                    |     |         |
                |    `-->|                    |     |         |
                |  .-----|(final)             |     |         |
                |<-'     |                    |     |         |
                |        |                    |     |         |
               \/\      /\/                  /\/   /\/       /\/
                                           E = T2
               \/\      /\/                  /\/   /\/       /\/
                |        |                    |     |         |
                |        |                    v     |         |
                |        |                   ---    |----.    |
                |        |                          |    `--->|
                |        |                          |   .-----|(final)
                |        |                          |<--'     |
                |        |                          |         |
Figure 2: Provisionals can harm recovery
No additional delay is introduced if the first provisional response
is received after Timer E has reached its maximum reset interval of
T2.
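The Timer E behavior described in this section can be sketched as follows. This is a simplified model, not the RFC 3261 state machine: the function name and parameters are hypothetical, and the values T1 = 500 ms and T2 = 4 s are the RFC 3261 defaults.

```python
T1 = 0.5  # seconds; RFC 3261 default
T2 = 4.0  # seconds; RFC 3261 default (8*T1)


def timer_e_fire_times(provisional_at=None, count=4, t1=T1, t2=T2):
    """Return the times (seconds after the first transmission) at which
    Timer E fires, i.e. when the request is retransmitted.

    provisional_at is the time a provisional response arrives, or None if
    none arrives. Its arrival does not alter the running timer; it only
    changes the value used the next time Timer E is reset.
    """
    now, interval, fires = 0.0, t1, []
    for _ in range(count):
        now += interval
        fires.append(now)
        proceeding = provisional_at is not None and provisional_at <= now
        # Trying: double the interval, capped at T2. Proceeding: always T2.
        interval = t2 if proceeding else min(2 * interval, t2)
    return fires


without_provisional = timer_e_fire_times()
with_provisional = timer_e_fire_times(provisional_at=0.2)
```

With these defaults, the retransmission that follows the lost final response in Figure 2 comes 2*T1 (1 second) after the previous one when no provisional was seen, and T2 (4 seconds) after it when one was.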
1.3 Delayed responses will temporarily blacklist an element
A SIP element's use of DNS SRV Resource Records [3] is specified in
RFC 3263 [2]. That specification discusses how SIP assures high
availability by having upstream elements detect failure of downstream
elements. It proceeds to define several types of failure detection
and instructions for failover. Two of the behaviors it describes are
important to this document:
o Within a transaction, transport failure is detected either through
an explicit report from the transport layer or through timeout.
Note specifically that a timeout will indicate transport failure
regardless of the transport in use. When transport failure is
detected, the request is retried at the next element from the
sorted results of the SRV query.
o Between transactions, locations reporting temporary failure
(through 503/Retry-After for example) are not used until their
requested black-out period expires.
The specification notes the benefit of caching locations that are
successfully contacted, but does not discuss how such a cache is
maintained. It is unclear whether an element should stop using
(temporarily blacklist) a location returned in the SRV query that
results in a transport error. If it does, when should such a
location be removed from the blacklist?
Without such a blacklist (or equivalent mechanism), the intended
availability mechanism fails miserably. Consider traffic between two
domains. Proxy pA in domain A needs to forward a sequence of
non-INVITE requests to domain B. Through DNS SRV, pA discovers pB1
and pB2, and the ordering rules of [2] and [3] indicate it should use
pB1 first. The first request to pB1 times out. Since pA is a proxy
and a NIT has a fixed duration, pA has no opportunity to retry the
request at pB2. If pA does not remember pB1's failure, the second
request (and all subsequent non-INVITE requests until pB1 recovers)
are doomed to the same failure. Caching would allow the subsequent
requests to be tried at pB2.
Since miserable failure is not acceptable in deployed networks, we
should anticipate that elements will, in fact, cache timeout failures
between transactions. Then the race in Figure 1 becomes important.
If an element fails to respond "soon enough", it has effectively not
responded at all, and will be blacklisted at its peer for some period
of time.
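A minimal sketch of such a cache follows. RFC 3263 does not specify how the cache is maintained, so everything here is an assumption: the class and method names are hypothetical, and the 30-second blackout is an arbitrary value chosen only to make the example concrete.

```python
import time


class FailureCache:
    """Remember targets that timed out so later transactions skip them."""

    def __init__(self, blackout=30.0, clock=time.monotonic):
        self.blackout = blackout  # hypothetical blacklist duration, seconds
        self.clock = clock
        self._until = {}          # target -> time its blacklisting ends

    def record_timeout(self, target):
        self._until[target] = self.clock() + self.blackout

    def usable(self, ordered_targets):
        """Return the SRV-ordered targets that are not currently blacklisted."""
        now = self.clock()
        return [t for t in ordered_targets
                if t not in self._until or self._until[t] <= now]


# pA's view of domain B, already sorted by the SRV ordering rules.
fake_now = [0.0]
cache = FailureCache(blackout=30.0, clock=lambda: fake_now[0])
targets = ["pB1", "pB2"]

first_choice = cache.usable(targets)[0]   # no failures recorded yet
cache.record_timeout("pB1")               # the first NIT to pB1 timed out
second_choice = cache.usable(targets)[0]  # the next request goes to pB2
fake_now[0] = 31.0                        # the blackout has expired
third_choice = cache.usable(targets)[0]   # pB1 is tried again
```

The same mechanism that lets later requests reach pB2 is what blacklists an element that merely responded too late.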
(Note that even with caching, the first request timeout results in a
timeout failure all the way back to the original submitter. The
failover mechanisms in [2] work well to increase the resiliency of a
given INVITE transaction, but do nothing for a given non-INVITE
transaction.)
1.4 408 for non-INVITE is not useful
Consider the race condition in Figure 1 when the final response is
408 instead of 200. Under the current specification, the race is
guaranteed to be lost. Most existing endpoints will emit a 408 for a
non-INVITE request 64*T1 after receiving the request if they haven't
emitted an earlier final response. Such a 408 is guaranteed to
arrive at the next upstream element too late to be useful. In fact,
in the presence of proxies, these messages are even harmful. When
the 408 arrives, each proxy will have already terminated its
associated client transaction due to timeout. So, each proxy must
forward the 408 upstream statelessly. This, in turn, is guaranteed
to arrive too late. As Figure 3 shows, this can ultimately result
in bombarding the original requester with spurious 408s. (Note that
the proxy's client transaction state machine never enters the
Completed state, so Timer K does not enter into play).
                   UAC        P1         P2         P3         UAS
                    |          |          |          |          |
             ---   ===---.     |          |          |          |
              ^     |     `-->===---.     |          |          |
              |     |          |     `-->===---.     |          |
              |     |          |          |     `-->===---.     |
            64*T1   |          |          |          |     `-->===
              |     |          |          |          |          |
              |     |          |          |          |          |
              v     |          |          |          |          |
   (timeout) ---   ===         |          |          |          |
                    |    .-408===         |          |          |
                    |<--'      |    .-408===         |          |
                    |    .-408-|<--'      |    .-408===         |
                    |<--'      |    .-408-|<--'      |    .-408===
                    |    .-408-|<--'      |    .-408-|<--'      |
                    |<--'      |    .-408-|<--'      |          |
                    |    .-408-|<--'      |          |          |
                    |<--'      |          |          |          |
                    |          |          |          |          |
Figure 3: late 408s to non-INVITEs
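The timing behind Figure 3 can be checked with a small sketch. It follows only the 408 emitted by the UAS (the 408s that the proxies generate themselves are not modeled), assumes zero processing time at each proxy, and uses hypothetical names and an arbitrary 20 ms per-hop latency.

```python
T1 = 0.5  # seconds; RFC 3261 default


def late_408_arrivals(hop_latencies, t1=T1):
    """For a request relayed along a chain (UAC, proxies..., UAS), return a
    list of (timeout_time, arrival_time_of_the_UAS_408) pairs, one per
    upstream element, ordered from the element next to the UAS to the UAC.

    hop_latencies[i] is the one-way latency in seconds of hop i, counted
    from the UAC. Times are measured from when the UAC sends the request.
    """
    # Time at which each element sends the request (the last entry is the
    # time the UAS receives it).
    sent = [0.0]
    for latency in hop_latencies:
        sent.append(sent[-1] + latency)

    arrival = sent[-1] + 64 * t1     # the UAS emits 408 64*T1 after receipt
    results = []
    for i in range(len(hop_latencies) - 1, -1, -1):
        arrival += hop_latencies[i]  # the 408 travels one hop upstream
        timeout = sent[i] + 64 * t1  # that element's client transaction
        results.append((timeout, arrival))
    return results


# UAC -> P1 -> P2 -> P3 -> UAS with 20 ms per hop.
arrivals = late_408_arrivals([0.02, 0.02, 0.02, 0.02])
all_late = all(arrival > timeout for timeout, arrival in arrivals)
```

For any positive latency the 408 reaches each element after that element's client transaction has already timed out, so `all_late` holds regardless of the values chosen.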
This response bombardment is not limited to the 408 response, though
it only exists when participating client transaction state machines
are timing out. Figure 4 generalizes Figure 1 to include multiple
hops. Note that even though the UAS responds "in time" to P3, the
response is too late for P2, P1 and the UAC.
                   UAC        P1         P2         P3         UAS
                    |          |          |          |          |
             ---   ===---.     |          |          |          |
              ^     |     `-->===---.     |          |          |
              |     |          |     `-->===---.     |          |
              |     |          |          |     `-->===---.     |
            64*T1   |          |          |          |     `-->===
              |     |          |          |          |          |
              |     |          |          |          |          |
              v     |          |          |          |          |
   (timeout) ---   ===         |          |          |          |
                    |    .-408===         |          |    .-200-|
                    |<--'      |    .-408===   .-200-|<--'      |
                    |    .-408-|<--'.-200-|<--'     ===         |
                    |<--'.-200-|<--'      |          |         ===
                    |<--'      |          |          |          |
                    |          |          |          |          |
Figure 4: Additional timeout related error
1.5 Non-INVITE timeouts doom forking proxies
A single branch with a delayed or missing final response will
dominate the processing at a proxy that receives no 2xx responses to a
forked non-INVITE request. This proxy is required to allow all
of its client transactions to terminate before choosing a "best
response", which forces the proxy's server transaction to lose the
race in Figure 1. Any response it ultimately forwards (a 401 for
example) will arrive at the upstream elements too late to be used.
Thus, if no element among the branches would return a 2xx response,
failure of a single element (or its transport) dooms the proxy to
failure.
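A minimal sketch of this effect, assuming that no branch returns a 2xx and that a silent branch ends only when its client transaction times out at 64*T1. The function name and the sample response times are hypothetical.

```python
T1 = 0.5  # seconds; RFC 3261 default


def best_response_time(branch_final_times, t1=T1):
    """Return when a forking proxy can choose a best non-2xx response.

    branch_final_times holds, per branch, the time in seconds (measured
    from when the proxy forwarded the request) at which a final response
    arrived, or None if none arrived. The proxy must wait for every branch.
    """
    window = 64 * t1
    return max(window if t is None else min(t, window)
               for t in branch_final_times)


# Two branches answer 401 quickly; the third never answers.
decision_time = best_response_time([0.1, 0.3, None])

# The upstream client transaction started before the proxy forwarded the
# request, so it has timed out by the time the proxy can forward the 401.
too_late = decision_time >= 64 * T1
```

When every branch answers promptly the proxy can decide at once; a single silent branch pushes the decision out to the full 64*T1.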
1.6 Mismatched timer values make winning the race harder
There are many failure scenarios due to misconfiguration or
misbehavior that the SIP specification does not discuss. One is
placing two elements with different configured values for T1 and T2
on the same network. Review of Figure 1 illustrates that the race
failure is only made more likely in this misconfigured state (it may
appear that shortening T1 at the element behaving as a UAS improves
this particular situation, but remember that these elements may trade
roles on the next request). Since the protocol provides no mechanism
for discovering or negotiating a peer's timer values, exceptional care
must be taken when deploying systems with non-default timer values to
ensure they will _never_ communicate directly with elements using the
default values.
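The effect of mismatched T1 values on the race in Figure 1 can be sketched as below. The function name, the T1 values, and the 100 ms round-trip time are illustrative assumptions; the sketch considers only a UAS that waits the full 64*T1 by its own clock before responding.

```python
def uas_margin(uac_t1, uas_t1, round_trip):
    """Seconds to spare (negative means the race is lost) when a UAS waits
    until 64*T1 by its own clock before sending a final response.

    round_trip is the request latency plus the response latency, in seconds.
    """
    uac_window = 64 * uac_t1
    uas_deadline = 64 * uas_t1
    return uac_window - (uas_deadline + round_trip)


matched = uas_margin(0.5, 0.5, 0.1)        # both elements use the default
uac_shorter = uas_margin(0.25, 0.5, 0.1)   # the UAC has the shorter T1
roles_swapped = uas_margin(0.5, 0.25, 0.1) # same pair, roles exchanged
```

The element with the shorter T1 does well as a UAS and badly as a UAC, which is why shortening T1 on one element does not fix the problem once the two trade roles.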
2. Security Considerations
This document describes problems with the SIP non-INVITE transaction,
including mentioning potential security vulnerabilities. It does not
make any changes to the SIP protocol.
3. IANA Considerations
This document requires no action by IANA.
4. Acknowledgments
This document captures many conversations about non-INVITE issues.
Significant contributors include Ben Campbell, Gonzalo Camarillo,
Steve Donovan, Rohan Mahy, Dan Petrie, Adam Roach, Jonathan
Rosenberg, and Dean Willis.
5. References
[1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP:
Session Initiation Protocol", RFC 3261, June 2002.
[2] Rosenberg, J. and H. Schulzrinne, "Session Initiation Protocol
(SIP): Locating SIP Servers", RFC 3263, June 2002.
[3] Gulbrandsen, A., Vixie, P. and L. Esibov, "A DNS RR for
specifying the location of services (DNS SRV)", RFC 2782,
February 2000.
Author's Address
Robert J. Sparks
Xten
5100 Tennyson Parkway
Suite 1000
Plano, TX 75024
EMail: rsparks@xten.com
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2005). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.