Internet Engineering Task Force SIPPING WG
Internet Draft J.Rosenberg
dynamicsoft
draft-rosenberg-sipping-session-policy-00.txt
May 2, 2002
Expires: November 2002
Supporting Intermediary Session Policies in SIP
STATUS OF THIS MEMO
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
To view the list of Internet-Draft Shadow Directories, see
http://www.ietf.org/shadow.html.
Abstract
The Session Initiation Protocol (SIP) was designed to support
establishment and maintenance of end-to-end sessions. Proxy servers
provide call routing, authentication and authorization, mobility, and
other signaling services that are independent of the session.
Effectively, proxies provide signaling policy enforcement. However,
numerous scenarios have arisen which require the involvement of
proxies in some aspect of the session policy. SIP has no support for
such capabilities, as the community has generally considered
involvement of proxies in session details "evil". Practical
implementations have therefore resorted to non-standard manipulation
of SDP messages in order to enforce session policy. These
implementations are fragile and fraught with problems. In this
document, we discuss a middle-ground approach which permits proxies
limited involvement in session policy, but retains the robustness
that derives from the current prohibition on SDP manipulation.
J.Rosenberg [Page 1]
Internet Draft Session Policy May 2, 2002
Table of Contents
1 Introduction
2 Problems with Existing Situation
3 Requirements
4 Solution Framework
5 Supporting Media Intermediaries
5.1 Media-Stream Header
5.2 Media-Middlebox Header
5.3 Reverse-MM-Policy
5.4 UAC Behavior
5.4.1 Generating the Request
5.4.2 Processing the Response
5.5 UAS Behavior
5.5.1 Receiving the INVITE or UPDATE
5.5.2 Receiving the ACK
5.6 Proxy Behavior
5.6.1 Receiving a Request
5.6.2 Receiving a Response
6 Example Call Flows
6.1 Example I: IP-in-IP NAT
6.2 Example II: Traditional MIDCOM
6.3 Example III: SIP Message Sessions
7 Author's Addresses
8 Normative References
9 Informative References
1 Introduction
The Session Initiation Protocol (SIP) [1] was designed to support
establishment and maintenance of end-to-end sessions. Proxy servers
provide call routing, authentication and authorization, mobility, and
other signaling services that are independent of the session.
Effectively, proxies provide signaling policy enforcement. However,
numerous scenarios have arisen which require the involvement of
proxies in some aspect of the session policy. One scenario is in the
traversal of a firewall or NAT. The midcom group has defined a
framework for control of firewalls and NATs (generically,
middleboxes) [4]. In this model, a midcom agent, typically a proxy
server, interacts with the middlebox to open and close media
pinholes, obtain NAT bindings, and so on. In this role as a midcom
agent, the proxy will need to examine and possibly modify the session
description in the body of the SIP message. This modification is to
achieve a specific policy objective: to force the media to route
through an intermediary.
In another application, SIP is used in a wireless network. The
network provider has limited resources for media traffic. During
periods of high activity, the provider would like to restrict codec
usage on the network to lower rate codecs. In existing approaches
used in 3GPP, this is accomplished by having the proxies edit the SDP
in the body, removing the higher rate codecs.
In yet a third application, SIP is used in a network that has
gateways which support a single codec type (say, G.729). When
communicating with a partner network that uses gateways with a
different codec (say, G.723), the network modifies the SDP to route
the session through a converter that changes the G.729 to G.723.
All three applications require the proxies to examine and/or
manipulate the content of the session description in the body of the
SIP message. However, SIP forbids proxies from performing such
manipulation. It does not work when end-to-end encryption is applied.
It introduces additional failure modes and fate sharing, and it
creates potential performance bottlenecks. Section 2 discusses these
and other problems in detail.
Our solution is to introduce into SIP a framework that allows proxy
servers to request media-level policy operations from user agents. In
Section 2, we discuss the problems associated with the manipulation
of bodies by proxies, which led the revised SIP specification (bis)
to prohibit it. In Section 3 we introduce requirements for a
solution. In Section 4 we present our proposed framework. In Section
5 we present a SIP extension based on this framework, which allows
for the insertion of intermediaries on the media path.
2 Problems with Existing Situation
The bis specification explicitly disallows proxy servers from
manipulating the content of bodies. This prohibition conflicts with
the common industry practice of extensive manipulation of bodies by
proxies. However common, that practice is at odds with the SIP
specification for many reasons:
End-to-End Encryption: SIP uses S/MIME to support end-to-end
security features. Authentication, message
integrity, and encryption are provided. The encryption
capabilities are important for end-to-end privacy services,
for example. The end-to-end message integrity and
authentication are important for preventing numerous
attacks, including theft of calls, eavesdropping attacks,
and so on. If end-to-end authentication is used, any
manipulation of the body will cause the message integrity
check to fail. If end-to-end encryption is used, the proxy
won't even be able to look at the SDP to modify it. In this
case, media may not function, and the call will fail.
Require Processing: A UA may require that an extension be
applied to the SDP body. This is accomplished by including
a Require header in the SIP message. Proxies do not look at
such headers. If the proxy processes the SDP without
understanding the extension, it may improperly modify the
SDP, resulting in a call failure.
Consent: Ultimately, end users need to be in control of the
media they send. If a user makes a call through a SIP
network, they have the expectation that their media is
delivered to the recipient. Proxies that modify the SDP
cause the system to behave in ways the user does not
expect.
Future Proofing: One of the benefits of the SIP architecture is
that only the endpoints need to understand sessions,
session descriptions, bodies, and so on. This facilitates
the use of proxy networks to provide communications
services for future session types, such as games and
messaging. However, if proxies require an understanding of
session types and session descriptions, the SIP network
becomes locked in to providing features for a particular
set of session types. If a new session description
protocol, such as SDPng [5], were introduced, calls would
not function even though the endpoints support SDPng.
Furthermore, it would be hard to determine why it did not
function, since the failure would occur transparently in
some proxy in the middle of the network.
Robustness: Having a proxy manipulate the body introduces a host
of new failure modes into the network. Firstly, the proxy
itself will need to have state in some form in order to
properly manipulate the SDP. This means that, should the
proxy fail, the call may not be able to continue. Secondly,
proxies typically won't enforce the media policy. Rather,
they leave that to some media middlebox somewhere on the
media path. This media middlebox may fail as well. Since
the user does not know of its existence, they may not be
able to detect this failure or retry the media path around
it.
Scalability: One of the reasons SIP scales so well is that
proxies don't have to be aware of the details of the
sessions being established through them. If a proxy needs
to examine and/or manipulate session descriptions, this
could require many additional processing steps. The proxy
may need to traverse a multi-part body to find the SDP, in
the case of SIP-T [6]. The proxy will need to parse,
modify, and possibly re-serialize the session description.
All of this requires additional processing that worsens the
performance of the proxies.
Many of these problems are similar to those pointed out by the IAB
regarding Open Pluggable Edge Services (OPES) [7]. Both have to do
with the involvement of intermediaries in the manipulation of
end-to-end content. Here, however, the content is not the body
itself, but the session described by the body.
We believe a better solution is needed.
3 Requirements
In this section, we provide a set of requirements for solving this
problem.
1. The solution should allow proxies to request specific media
policies. At a minimum, these policies include insertion of
intermediaries for firewall and NAT traversal, and
modification of the codec set.
2. The solution should work even with end-to-end encryption
and end-to-end authentication enabled.
3. The solution should not force a proxy to violate the SIP
specification.
4. The solution should not require substantial processing
burden on the proxies.
5. The solution should support an explicit consent model, so
that end users are aware of, and explicitly authorize, the
media policies requested by proxies.
6. The solution should not require proxies to understand a
specific type of session description (e.g., SDP or SDPng).
7. The solution should allow end systems to detect, and route
around, failures of media enforcement points.
8. The solution must not require that the SIP elements be in
the same administrative domain as the media processing
elements.
9. The solution should support the addition of new media
policy functions in the future.
4 Solution Framework
Our solution is based on extending the existing Record-Route/Route
methodology to media processing. Effectively, record routing is an
expression of a proxy's desire for signaling policy, namely, the
inclusion of a signaling intermediary. Each proxy makes an
independent policy request. These requests are added to the message
and passed to the end system, which is therefore explicitly aware of
the set of intermediate proxies on the call path. The proxy elements
need not store this route as state. It is stored in the end systems,
and pushed back into the network in Route headers.
We want exactly the same thing to happen, but for session attributes.
The basic model for the framework is shown in Figure 1. In this
model, the caller (UA 1) sends an INVITE request. This request
contains a set of Media Interface Objects (MIO). Each MIO is a
description of a media aspect of the session being set up by the
caller. For example, there might be an MIO for the IP address and
port of each media stream, and an MIO for the set of codecs in each
stream. The caller only inserts MIOs for those aspects of the session
it wishes to permit the network to modify. For example, if the caller
only wants the network to modify the codecs in the streams, it would
only insert MIOs representing the codecs.
+------+ INVITE + MIO1   +--------+ INVITE + MIO1 + MFO1  +------+
|      |---------------->|        |---------------------->|      |
|      |                 | proxy  |                       |      |
|      |200 + MIO2 + MFO2|        |      200 + MIO2       |  UA  |
|  UA  |<----------------|        |<----------------------|      |
|      |                 +--------+                       |      |
|   1  |                                                  |   2  |
|      |                                                  |      |
|      |                                                  |      |
|      |                                                  |      |
|      |       RTP       +--------+                       |      |
|      |---------------->| media  |---------------------->|      |
|      |                 |enforce |                       |      |
|      |                 | point  |          RTP          |      |
|      |<----------------|        |<----------------------|      |
+------+                 +--------+                       +------+
Figure 1: Session policy framework
Since the MIOs are meant for manipulation by proxies, and since they
are provided to enable a SIP feature (proxy insertion of session
policy), the MIOs are carried as SIP headers in the INVITE request.
The caller would also insert a SIP Supported header, indicating its
ability to understand session policies.
As the request traverses proxies, the proxies insert Media Filter
Objects (MFOs). An MFO represents a "diff" that the proxy wants to
apply to an MIO. The MFOs in the request express session policy for
media streams in the callee-to-caller direction. For example, if an
MIO contains an IP address and port for receiving an audio stream, a
proxy can insert an MFO which changes that address and port to those
of a media intermediary. It is fundamental that the proxy does not
modify the MIO itself. Indeed, the MIO could, and should, be
protected by end-to-end security measures. By specifying diffs to the
MIO rather than modifying it directly, we enable an explicit consent
and knowledge model: the UA can know exactly which policies were
requested against the session.
If a proxy inserts an MFO, it can also insert a Require header into
the request. This ensures that the request fails if the UAS does not
understand session policies. Not all session policies need a Require
header: policies could be optional, in which case the Require header
would not be needed. If the request fails for this reason, the proxy
would retry the request using mechanisms that are backwards
compatible with older endpoints (such as modification of the SDP).
Like the MIO, the MFO will be represented in a SIP header. Each proxy
can insert its own MFO. In that case, it "pushes" its MFO on top of
the set of existing MFOs, much like Record-Route headers are pushed
into a request. Each MFO also contains the identity of the domain
which requested the policy. The MFO could also contain a signature,
generated by the domain which inserted the MFO. This would allow the
UA to verify the identities of the domains which have requested
session policy, and to verify the integrity of those policies.
Perhaps most interestingly, the MFO can specify loose routing
mechanisms that should be used to deliver the media to the media
intermediary. Just as Route headers allow the UA to specify the set
of hops for signaling, tunneling protocols such as IP-in-IP, or IP
loose source routing, would allow the UA to specify the hops for
media delivery. This would have the important benefit of relieving
the network from maintaining any state.
It is also very important that the MFO not be an actual diff, in the
unix sense. This is because the UA must understand the semantics of
the requested policy, not just the syntactic change that is needed to
effect that policy.
When the request reaches the UAS, the UAS examines the MIOs and MFOs
in the request. It will know exactly what the UAC indicated, and know
exactly which policies have been requested by intermediate domains.
If those policies are unacceptable, it can generate an error response
with an indication of which policies were not acceptable. Proxies
receiving this error response could attempt to retry with a different
policy, or just pass the error response upstream. The error response
would arrive at the UAC, with a full list of the set of requested
policies. This would allow the UAC to know what happened to their
request, and why it failed.
If, however, the policies are acceptable to the UAS, and it accepts
the call, it generates a 200 OK. That 200 OK contains two things.
First, it contains its own set of MIOs for its side of the session.
It also contains the set of MFOs from the request, copied into the
response. These are purely informational, for the benefit of the UAC.
They are end-to-end, and not meant for modification by proxies. In
fact, they could (and should) be protected by end-to-end integrity
mechanisms. This would ensure that proxies cannot request policies
without having the UAC become aware of those policies.
As the response travels back to the UAC, proxies can insert MFOs that
request modification of the session in the caller-to-callee
direction. Just like the MFOs in the forward direction, these are
pushed into the response, and are formatted and interpreted
identically to those in the request.
When the UAC receives this response, it can either reject or accept
the policies. If it accepts, the ACK contains a copy of the MFOs from
the response. If it rejects, the UAC ACKs, but it also sends a BYE.
The BYE contains a reason code indicating that the call was
terminated because of unacceptable MFOs. The BYE could also contain
the list of MFOs from the 200 OK response.
Both endpoints then apply the media policies to the media streams
they generate. This may involve, for example, sending media to an
intermediary indicated in an MFO. Since the endpoints know about the
full set of intermediaries, they have many options in the event of a
failure (detected through an ICMP error, for example). The UA can try
to send the media to the next intermediary on the path. Or, if the
MFO specifies the intermediaries as an FQDN instead of an IP address,
the UA can attempt to use DNS to find an alternative, and begin
routing media through that.
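The failover options described above can be sketched as follows. This is a minimal illustration only, not part of the proposal: the function name `choose_media_target` and the injectable `resolve` callback are hypothetical (a real UA might wrap `socket.getaddrinfo` there), and intermediaries are assumed to be listed in the order the media should visit them.

```python
from typing import Callable, Iterable, Optional

def choose_media_target(
    visit_order: list[str],
    final_dst: str,
    failed: set[str],
    resolve: Optional[Callable[[str], Iterable[str]]] = None,
) -> Optional[str]:
    """Pick where to send media after some intermediaries have failed.

    visit_order: intermediaries from the MFOs, in the order media visits them.
    final_dst: the peer's media address, used only when no MFO applies.
    failed: addresses known to have failed (e.g. after an ICMP error).
    resolve: hypothetical hook returning alternate addresses for an FQDN.
    """
    if not visit_order:
        return final_dst  # no media policy: send directly to the peer
    for hop in visit_order:
        if hop not in failed:
            return hop  # first intermediary still believed to be alive
        if resolve is not None:
            # The MFO named an FQDN: try other addresses DNS returns for it.
            for alt in resolve(hop):
                if alt not in failed:
                    return alt
    return None  # nothing usable left; the UA must decide (e.g. BYE)
```

Trying DNS alternates for a failed intermediary before moving on keeps media flowing through the domain that requested the policy; skipping to the next intermediary is the fallback.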
The same mechanism could be repeated in a re-INVITE, allowing for
mid-session modification of policies.
This framework meets the requirements outlined in Section 3:
1. The solution allows proxies to request specific media
policies. This is accomplished through the insertion of
MFOs into the requests and the responses.
2. Since the solution does not require modification of the
bodies or the headers of the request, it works with end-
to-end encryption and authentication.
3. The solution is well within the scope of the SIP
specification. There is no modification of bodies, or even
modification of headers inserted by the UA.
4. Since the solution does not require proxies to do anything
but insert a header (no inspection or processing of the
body), it requires much less processing than existing
solutions.
5. An explicit consent model is supported. The UAS can reject
the policies requested for the media it generates, and it
can learn about the policies requested for the media
generated by the UAC. The UAC can reject the policies
requested for the media it generates, and it can learn
about the policies requested for the media generated by the
UAS.
6. The solution does not depend on interpretation of the
session description in the body.
7. Since the endpoints have complete knowledge of the media
policies requested by the network, they can route around
any failures by using an alternate intermediary (found using
DNS), or by sending the media to the next media intermediary
on the path.
8. The solution does not require the SIP elements to be in the
same domain as the media processing elements.
9. The framework supports a wide variety of media policies.
5 Supporting Media Intermediaries
In this section, we describe an initial protocol that instantiates
the framework of Section 4 for insertion of media intermediaries.
Media intermediaries are used for firewall and NAT traversal,
enforcement of bandwidth usage, and so on. This protocol is not
complete. It is meant to convey the basic idea on the usage of the
framework to instantiate a particular protocol.
5.1 Media-Stream Header
In this usage, the MIO is the IP address, port, and transport where
the media stream is to be sent. This information is present in a new
SIP header, the Media-Stream header. The header also contains an ID,
which is a unique identifier for the stream.
Media-Stream = stream-info *(COMMA stream-info)
stream-info = discrete-type *(SEMI stream-params)
stream-params = address-param / port-param /
transport-param / id-param
address-param = "host" EQUAL (hostname / IPv4address
/ IPv6reference)
port-param = "port" EQUAL port
id-param = "id" EQUAL token
The Media-Stream header is inserted by the UAC in an outgoing INVITE,
and by the UAS in a 200 OK.
An example Media-Stream header:
Media-Stream: audio;id=7736ai;host=192.2.0.3;port=8876,
video;id=hha9s8sd0;host=192.2.0.3;port=8878
This specifies two media streams, an audio stream and a video
stream. Both streams are sent to 192.2.0.3, but the audio is sent to
port 8876 and the video to port 8878. These parameters would match
the values in the SDP in the body.
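A minimal parsing sketch for the Media-Stream header, assuming the simplified grammar above: it splits on commas and semicolons and does not handle quoted strings or header folding. The `StreamInfo` class and `parse_media_stream` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamInfo:
    media: str                        # discrete-type, e.g. "audio" or "video"
    id: Optional[str] = None          # id-param: unique per session description
    host: Optional[str] = None        # address-param
    port: Optional[int] = None        # port-param
    transport: Optional[str] = None   # transport-param

def parse_media_stream(value: str) -> list[StreamInfo]:
    """Parse a Media-Stream header field value into one record per stream."""
    streams = []
    for item in value.split(","):              # stream-info *(COMMA stream-info)
        media, *params = [p.strip() for p in item.split(";")]
        fields = {}
        for param in params:                   # *(SEMI stream-params)
            name, sep, val = param.partition("=")
            if not sep:
                raise ValueError(f"parameter without value: {param!r}")
            fields[name.strip().lower()] = val.strip()
        port = fields.get("port")
        streams.append(StreamInfo(
            media=media.lower(),
            id=fields.get("id"),
            host=fields.get("host"),
            port=int(port) if port is not None else None,
            transport=fields.get("transport"),
        ))
    return streams

example = ("audio;id=7736ai;host=192.2.0.3;port=8876, "
           "video;id=hha9s8sd0;host=192.2.0.3;port=8878")
for s in parse_media_stream(example):
    print(s.media, s.id, s.host, s.port)
```

Run on the example header, this prints one line per stream: `audio 7736ai 192.2.0.3 8876` and `video hha9s8sd0 192.2.0.3 8878`.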
5.2 Media-Middlebox Header
In this usage of the framework, the MFO is the address, port, and
transport of a media intermediary to be used for a particular stream.
It is conveyed in a new SIP header, the Media-Middlebox header. This
header contains, for a particular media stream (identified by the ID
from the Media-Stream header), the address and port of the middlebox,
the domain that has requested insertion of the middlebox, and a loose
source routing protocol to reach that middlebox.
Media-Middlebox = intermediary *(COMMA intermediary)
intermediary = stream-id *(SEMI intermediary-params)
stream-id = token
intermediary-params = address-param / port-param /
transport-param / lroute-param /
domain-param
lroute-param = "route" EQUAL route-protocols
route-protocols = "ip-in-ip" / "ip-loose" /
"media-specific"
domain-param = "domain" EQUAL host
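The following sketch serializes Media-Middlebox values according to the grammar above. It is illustrative only: the `Intermediary` class is hypothetical, parameters are emitted in a fixed order chosen for readability, and unknown route protocols are rejected because the grammar lists only three.

```python
from dataclasses import dataclass
from typing import Optional

ROUTE_PROTOCOLS = {"ip-in-ip", "ip-loose", "media-specific"}

@dataclass
class Intermediary:
    stream_id: str                   # id from the Media-Stream header
    host: str                        # address of the media intermediary
    port: Optional[int] = None
    transport: Optional[str] = None
    route: Optional[str] = None      # loose routing protocol, if any
    domain: Optional[str] = None     # domain that requested the middlebox

    def to_header_value(self) -> str:
        if self.route is not None and self.route not in ROUTE_PROTOCOLS:
            raise ValueError(f"unknown route protocol: {self.route!r}")
        parts = [self.stream_id, f"host={self.host}"]
        optional = [("port", self.port), ("transport", self.transport),
                    ("route", self.route), ("domain", self.domain)]
        parts += [f"{name}={val}" for name, val in optional if val is not None]
        return ";".join(parts)

def format_media_middlebox(values: list[Intermediary]) -> str:
    """Join values into one Media-Middlebox header field value."""
    return ", ".join(v.to_header_value() for v in values)

mb = Intermediary("fxx9", "1.2.3.4", route="ip-in-ip", domain="foo.com")
print(format_media_middlebox([mb]))
```

This prints `fxx9;host=1.2.3.4;route=ip-in-ip;domain=foo.com`.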
The loose routing parameter requires some further discussion. The
purpose of the Media-Middlebox header is for a proxy to tell the UA
to send the media for a particular stream through an IP address and
port of the intermediary. Rather than merely sending the media there,
the UA can specify a source route that visits that intermediary, any
other intermediaries, and finally the recipient. Thus, if there are N
hops, including the final recipient, there needs to be a way for the
media stream to specify N destinations. This can be done in several
ways:
ip-in-ip: IP-in-IP tunneling [8] can be used to specify N hops
of media traversal. The ultimate destination is specified in
the destination IP address of the innermost packet. Each
additional hop adds another layer of encapsulation, whose
destination IP address is that hop, so the outermost packet
is addressed to the first intermediary.
ip-loose: IP provides a loose routing mechanism that allows the
sender of an IP datagram to specify a set of IP addresses
that are to be visited on the way before reaching the final
destination.
media-specific: Media protocols can provide their own loose
routing mechanism. If that is the case, the loose routing
mechanism of that protocol is used. As an example, the IM
Transport Protocol (IMTP) [9] uses SIP MESSAGE requests for
sending instant messages. SIP provides its own loose routing mechanisms
with the Route header. These can be used to direct the
MESSAGE through the set of intermediaries.
In the absence of a loose-routing mechanism, the media is instead
just sent to the first media intermediary listed in the header.
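To make the ip-in-ip ordering concrete, here is a small model of nested encapsulation. It is only a data-structure sketch, assuming a hypothetical `Packet` type that records just the destination address of each layer; real IP-in-IP [8] builds full IP headers, and ports live in the inner UDP header. The first hop in the list ends up as the outermost destination, and the final recipient as the innermost one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

@dataclass
class Packet:
    dst: str                        # destination IP address of this layer
    payload: Union[Packet, bytes]   # encapsulated packet or media bytes

def encapsulate(media: bytes, hops: list[str], final_dst: str) -> Packet:
    """Wrap media so it visits each hop in order, then reaches final_dst."""
    packet = Packet(dst=final_dst, payload=media)  # innermost layer
    for hop in reversed(hops):                     # last hop wraps first
        packet = Packet(dst=hop, payload=packet)
    return packet

def destinations(packet: Packet) -> list[str]:
    """Destination addresses from the outermost layer inwards."""
    out = []
    layer: Union[Packet, bytes] = packet
    while isinstance(layer, Packet):
        out.append(layer.dst)
        layer = layer.payload
    return out

# Message 8 of Example I: outer header to the router, inner header to UA 2.
print(destinations(encapsulate(b"rtp", ["1.2.3.4"], "10.0.1.1")))
```

This prints `['1.2.3.4', '10.0.1.1']`. With an empty hop list, the packet is addressed directly to the final destination.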
5.3 Reverse-MM-Policy
The Reverse-MM-Policy header conveys the middleboxes used in the path
of media towards the recipient of the message carrying it. This
header is informational only. It is reflected in the 200 OK response
and the ACK request. Its syntax is identical to the Media-Middlebox
header.
5.4 UAC Behavior
5.4.1 Generating the Request
A UAC that supports this extension MUST insert a Supported header
into an INVITE or UPDATE request with the option tag "middlebox".
This indicates support for this extension, and willingness to let the
network specify media intermediaries.
For each media stream being set up or modified by the request, there
SHOULD be a Media-Stream header. The media type, address, port, and
transport for the header SHOULD be copied from the media type,
connection address, port, and transport in the session description
in the request. The UAC MUST include an id attribute for each media
stream. This attribute MUST have a value that is unique within the
session description. As a result, the session identifier (from the
o line in SDP), together with the stream id attribute, specifies a
globally unique identifier for a media stream.
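A sketch of how a UAC might derive Media-Stream values from its SDP offer. Several details are assumptions rather than part of this draft: the function name `media_stream_from_sdp` is hypothetical, ids are generated as `s0`, `s1`, and so on (unique within the session description, as required), the RTP/AVP profile is mapped to `transport=udp` as in Example I, other profiles get no transport parameter, and any `/ttl` suffix on a c= address is stripped.

```python
def media_stream_from_sdp(sdp: str) -> str:
    """Build a Media-Stream header field value from an SDP offer."""
    session_addr = None
    streams = []  # [media, address, port, proto] for each m= line
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>[/ttl]
            addr = line[2:].split()[2].split("/")[0]
            if streams:
                streams[-1][1] = addr   # media-level c= overrides session-level
            else:
                session_addr = addr
        elif line.startswith("m="):
            # m=<media> <port>[/<count>] <proto> <fmt> ...
            media, port, proto = line[2:].split()[:3]
            streams.append([media, session_addr, int(port.split("/")[0]), proto])
    values = []
    for index, (media, addr, port, proto) in enumerate(streams):
        value = f"{media};id=s{index};host={addr};port={port}"
        if proto.upper() == "RTP/AVP":
            value += ";transport=udp"   # assumption: RTP/AVP runs over UDP
        values.append(value)
    return ", ".join(values)

offer = """v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 9.8.7.6
t=0 0
m=audio 1288 RTP/AVP 0
"""
print(media_stream_from_sdp(offer))
```

For the offer used in Example I, this prints `audio;id=s0;host=9.8.7.6;port=1288;transport=udp`.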
5.4.2 Processing the Response
If the response is a 200 OK, it may contain a Require header with the
value of "middlebox". In this case, the UAC is requested to use a
media intermediary. There will be a Media-Stream header for each
media stream in use for the session. The UAC SHOULD verify that these
match the media streams from the session description. If they do not,
the response may have been tampered with, and the UA SHOULD terminate
the session with BYE (after ACKing, of course). If they do match, the
UA checks for a Media-Middlebox header. It MUST traverse the list of
Media-Middlebox header field values in reverse order. For each header
field value, it looks for a matching id amongst the values of the
Media-Stream header field. If there is a match, the identity of the
intermediary is "pushed" into a stack associated with that media
stream. When this process completes, the UAC will have a set of
intermediaries to visit for each media stream.
If this set of intermediaries is not acceptable, the UAC SHOULD ACK
and then BYE the call. The BYE MAY contain a Reason header [10]
indicating that the call was terminated because of unacceptable
intermediaries.
TBD: Specify the code, phrases, and a way to convey the
specific objection.
The 200 OK response will also contain the set of intermediaries that
will be used on the media path from the callee to the UAC. This will
be present in the Reverse-MM-Policy header in the 200 OK. If this is
not acceptable, the UAC SHOULD ACK and then BYE the call. The BYE MAY
contain a Reason header [11] indicating that the call was terminated
because of unacceptable intermediaries.
If the set of intermediaries is acceptable, when the UAC sends media
on a stream, it sends it to the top intermediary in the stack. The
media is sent using the transport protocol and loose routing
mechanism (if any) specified.
The ACK generated by the UAC SHOULD contain a Reverse-MM-Policy
header field. This header field contains the same value as the
Media-Middlebox header field from the 200 OK.
5.5 UAS Behavior
5.5.1 Receiving the INVITE or UPDATE
When the UAS receives an INVITE or UPDATE request, it may have a
Require header indicating that the UAS must understand the media
intermediary extension in order to process the request. In that
case, the request will contain a Media-Stream header and a
Media-Middlebox header.
For each value in the Media-Stream header field, the UAS matches the
stream with its counterpart in the session description in the body.
Assuming it will otherwise generate an answer to the offer in the
INVITE, the UAS discards any Media-Stream header field values
corresponding to media streams disabled (by setting the port to zero)
in the SDP in the answer. The resulting set of Media-Stream header
field values is called the working set.
The UAS then begins processing the values of the Media-Middlebox
header in reverse order. For each value, the UAS finds the matching
stream in the working set (the match is based on the id attribute in
the Media-Middlebox value). The Media-Middlebox value is then pushed
into a stack associated with the matching value from the working set.
When the process is complete, there is a stack of intermediaries
specified for each media stream accepted by the UAS.
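The working-set step can be sketched as below. This draft does not say how Media-Stream values are matched to their SDP counterparts. The sketch assumes, hypothetically, that they appear in the same order as the m= lines, and it treats a port of zero in the answer as a disabled stream. The helper names are illustrative. The surviving ids would then receive per-stream stacks through the same reverse-order walk described for the UAC.

```python
def answer_ports(sdp: str) -> list[int]:
    """Port of each m= line in an SDP answer, in order."""
    ports = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            port = line[2:].split()[1]          # may be "<port>/<count>"
            ports.append(int(port.split("/")[0]))
    return ports

def working_set(stream_ids: list[str], answer_sdp: str) -> list[str]:
    """Keep Media-Stream ids whose streams the answer did not disable."""
    ports = answer_ports(answer_sdp)
    if len(ports) != len(stream_ids):
        raise ValueError("Media-Stream does not match the session description")
    return [sid for sid, port in zip(stream_ids, ports) if port != 0]

# SDP answer fragment: only the lines the sketch reads.
answer = """v=0
c=IN IP4 10.0.1.1
m=audio 7788 RTP/AVP 0
m=video 0 RTP/AVP 31
"""
print(working_set(["7736ai", "hha9s8sd0"], answer))
```

Here the video stream is disabled in the answer, so this prints `['7736ai']`.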
If the set of middleboxes is not acceptable to the UAS, it MAY reject
the request with a TBD response code. This response can contain
Warning headers indicating the specific reasons for rejection.
If the set of middleboxes is acceptable, the UAS generates an answer
(in the 2xx, or a reliable provisional response [12]). This response
contains a Reverse-MM-Policy header that mirrors the value of the
Media-Middlebox header from the request. The response also contains a
Media-Stream header, containing a value for each stream used in the
answer. The response MUST contain a Require header with the value
"middlebox" in order to indicate that media policies were applied to
the request.
When the UAS sends media, it sends it to the top middlebox in the
stack, using the address, port, transport, and optionally loose route
specified by that policy.
5.5.2 Receiving the ACK
The ACK request will contain a Reverse-MM-Policy header that informs
the UAS of the media policies used to route media from the caller
to itself. If this set is not acceptable, the UAS MAY generate a BYE
to terminate the session.
5.6 Proxy Behavior
5.6.1 Receiving a Request
When a proxy receives an INVITE or UPDATE request with a Supported
header with the value middlebox, it knows it can attempt to use media
policies on this request. To do so, it inserts a value into the
Media-Middlebox header (adding the header field if not present) at
the top for each stream it wishes to apply media processing for. The
streams are identified with the Media-Stream header in the request.
The proxy MAY insert multiple media policies for the same stream. The
proxy MAY insert a Require header into the request, with the value
"middlebox", if it insists that the UAS understand the extension in
order to continue with the session. If the result is a 420 response,
the proxy SHOULD retry the request without the media policy.
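A sketch of the proxy side of Section 5.6.1, assuming a hypothetical representation of the message headers as a dict mapping each header name to a list of field values, top first. It checks for the middlebox option tag, confirms that the stream id appears in Media-Stream, pushes the new value at the top of Media-Middlebox, and optionally adds Require. The helper names and the example addresses are illustrative only.

```python
def option_tags(values):
    """Split comma-separated option tags from Supported or Require values."""
    return {tag.strip().lower() for v in values for tag in v.split(",")}

def stream_ids(media_stream_values):
    """Collect id parameters from all Media-Stream field values."""
    ids = set()
    for value in media_stream_values:
        for item in value.split(","):
            for param in item.split(";")[1:]:
                name, _, val = param.strip().partition("=")
                if name.lower() == "id":
                    ids.add(val.strip())
    return ids

def insert_media_policy(headers, stream_id, params, require=False):
    """Push one Media-Middlebox value for stream_id; return True on success."""
    if "middlebox" not in option_tags(headers.get("Supported", [])):
        return False   # the UAC did not advertise support
    if stream_id not in stream_ids(headers.get("Media-Stream", [])):
        return False   # no such stream in the request
    value = ";".join([stream_id, *params])
    headers.setdefault("Media-Middlebox", []).insert(0, value)  # at the top
    if require and "middlebox" not in option_tags(headers.get("Require", [])):
        headers.setdefault("Require", []).append("middlebox")
    return True

request = {
    "Supported": ["middlebox"],
    "Media-Stream": ["audio;host=9.8.7.6;port=1288;id=fxx9;transport=udp"],
}
insert_media_policy(request, "fxx9",
                    ["host=192.0.2.10", "route=ip-in-ip", "domain=example.com"])
print(request["Media-Middlebox"])
```

This prints `['fxx9;host=192.0.2.10;route=ip-in-ip;domain=example.com']`. A later call for the same stream would insert its value ahead of this one, matching the push-at-the-top rule.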
5.6.2 Receiving a Response
When a proxy receives a response to an INVITE or UPDATE request that
contained a Supported header with the value middlebox, and the
response contains a Require header with the value middlebox, the
proxy MAY insert values into the Media-Middlebox header (adding the
header field if not present) at the top, for each stream it wishes to
apply processing for. The streams are identified with the Media-
Stream header in the response. The proxy MAY insert multiple media
policies for the same stream.
6 Example Call Flows
The framework and the protocol are best explained through some
examples. We provide three example flows here.
6.1 Example I: IP-in-IP NAT
This configuration is shown in Figure 2. The caller, UA1, is on the
public Internet. It wishes to call a user, UA2, sip:user2@foo.com.
The foo.com domain is running on a net-10 network. The network has a
single multi-homed proxy server, and it has a multi-homed router for
media processing. The router has a public interface of 1.2.3.4.
                    ................................
                    .                              .
                    .                              .
               +--------+                          .
               |        |                          .
               | Proxy  |    net10                 .
               |        |    Network               .
               |        |                          .
               +--------+                          .
                    .                              .
                    .                              .
               +--------+                          .
               | Multi  |                          .
               | Homed  |                          .
               | Router |                          .
               |        |                          .
               +--------+                          .
   +--------+       .                +--------+    .
   |        |       .                |        |    .
   |   UA   |       .                |   UA   |    .
   |   1    |       .                |   2    |    .
   |        |       .                |        |    .
   +--------+       .foo.com         +--------+    .
                    ................................
Figure 2: IP-in-IP NAT Configuration
The flow for the call is shown in Figure 3. In message 1, the caller
sends an INVITE. This INVITE looks like, in part:
INVITE sip:user2@foo.com SIP/2.0
Supported: middlebox
Media-Stream: audio;address=9.8.7.6;port=1288;id=fxx9;transport=udp
Content-Type: application/sdp
Content-Length: ...
v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 9.8.7.6
J.Rosenberg [Page 16]
Internet Draft Session Policy May 2, 2002
UA 1 Proxy router UA 2
| | | |
|(1) INVITE | | |
|MS:audio 9.8.7.6:1288 | |
|------------------>| | |
| | | |
| | | |
|(2) 100 | | |
|<------------------| | |
| | | |
| |(3) INVITE | |
| |MS:audio 9.8.7.6:1288 |
| |-------------------------------------->|
| | | |
| |(4) 200 OK | |
| |MS:audio 10.0.1.1:7788 |
| |<--------------------------------------|
|(5) 200 OK | | |
|MS:audio 10.0.1.1:7788 | |
|MM:audio 1.2.3.4;ipinip | |
|<------------------| | |
| | | |
| | | |
|(6) ACK | | |
|------------------>| | |
| | | |
| | | |
| |(7) ACK | |
| |-------------------------------------->|
| | | |
|(8) IPinIP | | |
|inner 10.0.1.1:7788| | |
|-------------------------------------->| |
| | | |
| | | |
| | |(9) RTP |
| | |------------------>|
| | | |
| | | |
| | |(10) RTP |
| | |<------------------|
| | | |
| | | |
|(11) RTP | | |
|<--------------------------------------| |
| | | |
| | | |
| | | |
| | | |
Figure 3: IP-in-IP Flow
J.Rosenberg [Page 17]
Internet Draft Session Policy May 2, 2002
t=0 0
m=audio 1288 RTP/AVP 0
This is passed to the foo.com proxy, which responds with a 100
(Trying) (2). The proxy does not require media from the callee (who is
within foo.com) to the caller to pass through any particular
intermediary. Therefore, it simply proxies the request after a
registration lookup. This request (3) arrives at the UAS, which
decides to accept the session. It generates a 200 OK (4) with its own
Media-Stream header, which looks like, in part:
SIP/2.0 200 OK
Supported: middlebox
Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
Content-Type: application/sdp
Content-Length: ...
v=0
o=bob 2890887s 2890686626 IN IP4 10.0.1.1
s=
c=IN IP4 10.0.1.1
t=0 0
m=audio 7788 RTP/AVP 0
This is received by the proxy. The proxy knows that media destined
for this UA must pass through the multi-homed router. To achieve that,
it asks the caller to use IP-in-IP encapsulation by adding a
Media-Middlebox header to the response (5):
SIP/2.0 200 OK
Require: middlebox
Media-Middlebox: jhh7;address=1.2.3.4;route="ip-in-ip"
Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
Content-Type: application/sdp
Content-Length: ...
v=0
o=bob 2890887s 2890686626 IN IP4 10.0.1.1
s=
c=IN IP4 10.0.1.1
t=0 0
m=audio 7788 RTP/AVP 0
This arrives at the UAC. The UAC generates an ACK (6), which contains
a Reverse-MM-Policy header which mirrors the Media-Middlebox header
from the 200 OK:
ACK sip:ua2@10.0.1.1 SIP/2.0
Route: sip:1.2.3.3
Reverse-MM-Policy: jhh7;address=1.2.3.4;route="ip-in-ip"
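A minimal Python sketch of the UAC's processing at this step, assuming the simple name=value parameter syntax seen in these examples: it interprets a Media-Middlebox value (which here tells the UAC to use IP-in-IP toward 1.2.3.4) and builds the ACK by mirroring each value into a Reverse-MM-Policy header. MediaPolicy, parse_media_middlebox and build_ack are hypothetical helpers, not part of any SIP library, and the ACK omits Via, From, To, Call-ID and CSeq just as the example does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaPolicy:
    stream_id: str
    address: str
    port: Optional[int] = None
    route: Optional[str] = None      # e.g. "ip-in-ip" or "media-specific"
    transport: Optional[str] = None


def parse_media_middlebox(value: str) -> MediaPolicy:
    """Parse one value such as 'jhh7;address=1.2.3.4;route="ip-in-ip"'.

    Only the simple name=value syntax used in this section is handled;
    quoted values are unquoted and unknown parameters are ignored.
    """
    stream_id, *params = [p.strip() for p in value.split(";")]
    fields = {}
    for param in params:
        name, _, val = param.partition("=")
        fields[name.lower()] = val.strip('"')
    port = int(fields["port"]) if "port" in fields else None
    return MediaPolicy(stream_id, fields["address"], port,
                       fields.get("route"), fields.get("transport"))


def build_ack(request_uri: str, route: str, mm_values: list[str]) -> str:
    """Start line, Route and Reverse-MM-Policy headers of the ACK.

    Each Media-Middlebox value from the 200 OK is mirrored unchanged,
    in order.
    """
    lines = [f"ACK {request_uri} SIP/2.0", f"Route: {route}"]
    lines += [f"Reverse-MM-Policy: {value}" for value in mm_values]
    return "\r\n".join(lines) + "\r\n"


mm = 'jhh7;address=1.2.3.4;route="ip-in-ip"'
policy = parse_media_middlebox(mm)
if policy.route == "ip-in-ip":
    print(f"tunnel stream {policy.stream_id} via {policy.address}")
print(build_ack("sip:ua2@10.0.1.1", "sip:1.2.3.3", [mm]))
```

The ACK carries the raw header values rather than the parsed form, so the mirror is exact even for parameters the UAC does not understand.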
The UAC then sends media. To do so, it generates an IP datagram with
the destination IP address 1.2.3.4 and IP protocol number 4 (IP-in-IP
encapsulation [8]). The inner datagram is a UDP packet with
destination 10.0.1.1, port 7788. This packet is sent to 1.2.3.4 (8),
where it arrives at the router. The router decapsulates the packet
and forwards the inner packet. This packet is destined for 10.0.1.1,
which is reachable through the router's internal interface, so the
router sends it there (9), and the media arrives at UA 2. In the
reverse direction, the callee sends packets to 9.8.7.6 (10). These
pass through the router, which NATs the source address and forwards
them on to the caller (11).
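To make step (8) concrete, the Python sketch below assembles the tunnelled packet byte by byte: a UDP datagram for 10.0.1.1:7788 wrapped in an outer IPv4 header addressed to 1.2.3.4 with IP protocol number 4, as RFC 2003 [8] defines. The helper names, the source port 1288, the TTL of 64, the zero UDP checksum and the placeholder payload are assumptions for illustration; actually transmitting the result would need a raw socket, which this sketch does not open. The length difference shows the per-packet cost of the outer header.

```python
import socket
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement sum of 16-bit words (the RFC 791 header checksum)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, protocol: int, payload_len: int,
                ttl: int = 64) -> bytes:
    """Minimal 20-byte IPv4 header: no options, ID 0, no fragmentation."""
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45,               # version 4, IHL of 5 words
                      0,                  # type of service
                      20 + payload_len,   # total length
                      0, 0,               # identification, flags/offset
                      ttl, protocol,
                      0,                  # checksum placeholder
                      socket.inet_aton(src), socket.inet_aton(dst))
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]


def udp_packet(src: str, dst: str, sport: int, dport: int,
               payload: bytes) -> bytes:
    # A UDP checksum of 0 means "not computed", which IPv4 permits.
    udp = struct.pack("!HHHH", sport, dport, 8 + len(payload), 0) + payload
    return ipv4_header(src, dst, 17, len(udp)) + udp  # 17 = UDP


def ip_in_ip(outer_src: str, outer_dst: str, inner: bytes) -> bytes:
    """Wrap a complete IPv4 packet in an outer header (protocol 4)."""
    return ipv4_header(outer_src, outer_dst, 4, len(inner)) + inner


# Step (8): RTP for 10.0.1.1:7788, tunnelled to the router at 1.2.3.4.
inner = udp_packet("9.8.7.6", "10.0.1.1", 1288, 7788, b"<RTP packet>")
outer = ip_in_ip("9.8.7.6", "1.2.3.4", inner)
print(len(outer) - len(inner))  # 20
```

The router's job in (9) is the inverse: strip the first 20 bytes and route what remains, with no per-call lookup.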
The most interesting aspect of this flow is that no MIDCOM protocol
was needed at all! No state is stored in either the proxy or the
router. This is because the "state", in this case the binding between
a public address and a private one, has been pushed to the end
systems, and is sent back into the network through the IP-in-IP
encapsulation. This mechanism can be considered a cross between RSIP
[13] (which also uses tunneling) and MIDCOM (which has proxies
modifying messages).
The drawbacks of the use of IP-in-IP tunneling here are clear. First,
there is an additional 20 byte overhead per packet for the outer IPv4
header. The second drawback is the slow-path processing that is
likely to be needed at the router for decapsulation and forwarding.
This may limit the volume of traffic that any single router can
support. Interestingly, this problem is easily resolved through load
balancing. Instead of including an IP address in the Media-Middlebox
header, the proxy can include a domain name that has multiple SRV
records, one for each router being used. The clients can perform a
randomized selection amongst the records, distributing the load
across routers with very little additional overhead. Failover is
provided in the same way. If the IP-in-IP packet generates an ICMP
error, the caller knows that the intermediary failed. It can then try
the router listed in another DNS record. This results in highly
robust and scalable operation.
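One way a client might order and fail over between routers is the selection procedure of RFC 2782: lower priority values first and, within a priority, a weighted random order. The Python sketch below uses hypothetical names (SrvRecord, order_srv, choose_router) and a hypothetical is_reachable callback standing in for "no ICMP error came back"; the DNS lookup itself is not shown, and the port field is ignored because IP-in-IP has no ports.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int      # unused for IP-in-IP, which has no ports
    target: str


def order_srv(records: list[SrvRecord], rng: random.Random) -> list[SrvRecord]:
    """Order records per RFC 2782: lowest priority first; within a
    priority, a weighted random order."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records are placed first, as RFC 2782 specifies.
        pool = sorted((r for r in records if r.priority == prio),
                      key=lambda r: r.weight != 0)
        while pool:
            pick = rng.randint(0, sum(r.weight for r in pool))
            running = 0
            for i, r in enumerate(pool):
                running += r.weight
                if running >= pick:
                    ordered.append(pool.pop(i))
                    break
    return ordered


def choose_router(records: list[SrvRecord], rng: random.Random,
                  is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Try routers in SRV order, moving on when one appears to have
    failed; returns None if every router fails."""
    for r in order_srv(records, rng):
        if is_reachable(r.target):
            return r.target
    return None
```

Because each client draws its own random order, load spreads across equal-priority routers in proportion to their weights without any coordination.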
Another drawback of this approach, however, is that it doesn't
provide any media policy enforcement, per se. That is, it is useful
strictly for NAT traversal; no firewall or policy enforcement is
provided. Indeed, an attacker can send packets into the private
network without any call setup. The attacker merely sends an IP-in-IP
packet with the outer destination address equal to the router's
public interface, and the inner destination address equal to that of
the target host. To provide firewall mechanisms while retaining the
stateless nature of this approach, it is necessary to use different
encapsulation protocols. Such protocols would provide encapsulation,
and would also allow the UAs to present authorization tokens, handed
out to them by the proxy, that permit specific packet processing in
the router. This would effectively be a generalization of the call
authorization tokens described in [14].
It is no coincidence that the routing of media operates in a similar
fashion to the SIP routing of the ACK. The ACK has a destination of
sip:ua2@10.0.1.1, carried in the request-URI, but an intermediate hop
(carried in the Route header) of sip:1.2.3.3. The proxy can remain
stateless because the ultimate destination is encapsulated within the
ACK message it receives from the caller. The same is true for the
router, which can also remain stateless - no storage of bindings.
6.2 Example II: Traditional MIDCOM
This example is similar to the first one, but no IP-in-IP
encapsulation is done. Rather, the proxy obtains bindings through
MIDCOM. The configuration is shown in Figure 4. The caller, UA1, is
on the public Internet. It wishes to call a user, UA2, situated
behind a NAT in the foo.com domain.
                   ................................
                   .                              .
                   .                              .
               +--------+                         .
               |        |                         .
               | Proxy  |       net10             .
               |        |       Network           .
               |        |                         .
               +--------+                         .
                   .                              .
                   .                              .
               +--------+                         .
               |        |                         .
               |  NAT   |                         .
               |        |                         .
               |        |                         .
               +--------+                         .
 +--------+        .               +--------+     .
 |        |        .               |        |     .
 |   UA   |        .               |   UA   |     .
 |   1    |        .               |   2    |     .
 |        |        .               |        |     .
 +--------+        .foo.com        +--------+     .
                   ................................
Figure 4: Traditional Midcom Configuration
The call flow is shown in Figure 5.
UA 1                 Proxy                   NAT                  UA 2
|                      |                      |                      |
|(1) INVITE            |                      |                      |
|MS:audio 9.8.7.6:1288 |                      |                      |
|--------------------->|                      |                      |
|                      |                      |                      |
|(2) 100               |                      |                      |
|<---------------------|                      |                      |
|                      |                      |                      |
|                      |(3) INVITE            |                      |
|                      |MS:audio 9.8.7.6:1288 |                      |
|                      |--------------------------------------------->|
|                      |                      |                      |
|                      |(4) 200 OK            |                      |
|                      |MS:audio 10.0.1.1:7788|                      |
|                      |<---------------------------------------------|
|                      |                      |                      |
|                      |(5) Allocate          |                      |
|                      |10.0.1.1:7788         |                      |
|                      |--------------------->|                      |
|                      |                      |                      |
|                      |(6) Binding=          |                      |
|                      |1.2.3.4:8876          |                      |
|                      |<---------------------|                      |
|(7) 200 OK            |                      |                      |
|MS:audio 10.0.1.1:7788|                      |                      |
|MM:audio 1.2.3.4:8876 |                      |                      |
|<---------------------|                      |                      |
|                      |                      |                      |
|(8) ACK               |                      |                      |
|--------------------->|                      |                      |
|                      |                      |                      |
|                      |(9) ACK               |                      |
|                      |--------------------------------------------->|
|                      |                      |                      |
|(10) RTP              |                      |                      |
|destIP=               |                      |                      |
|1.2.3.4:8876          |                      |                      |
|--------------------------------------------->|                      |
|                      |                      |(11) RTP              |
|                      |                      |destIP=               |
|                      |                      |10.0.1.1:7788         |
|                      |                      |--------------------->|
|                      |                      |                      |
Figure 5: Traditional Midcom Flow
In message 1, the caller sends an INVITE. This INVITE looks like, in
part:
INVITE sip:user2@foo.com SIP/2.0
Supported: middlebox
Media-Stream: audio;address=9.8.7.6;port=1288;id=fxx9;transport=udp
Content-Type: application/sdp
Content-Length: ...
v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 9.8.7.6
t=0 0
m=audio 1288 RTP/AVP 0
This is passed to the foo.com proxy, which responds with a 100
(Trying) (2). The proxy does not require media from the callee (who is
within foo.com) to the caller to pass through any particular
intermediary. Therefore, it simply proxies the request after a
registration lookup. This request (3) arrives at the UAS, which
decides to accept the session. It generates a 200 OK (4) with its own
Media-Stream header, which looks like, in part:
SIP/2.0 200 OK
Supported: middlebox
Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
Content-Type: application/sdp
Content-Length: ...
v=0
o=bob 2890887s 2890686626 IN IP4 10.0.1.1
s=
c=IN IP4 10.0.1.1
t=0 0
m=audio 7788 RTP/AVP 0
This is received by the proxy. The proxy knows that media destined
for this UA must pass through the NAT. To arrange that, it uses a
MIDCOM-type protocol [4] to request a NAT binding for 10.0.1.1:7788
(5). The NAT returns a binding (6), which is 1.2.3.4:8876. The proxy
inserts a Media-Middlebox header into the 200 OK (7), containing this
address as a media intermediary.
SIP/2.0 200 OK
Require: middlebox
Media-Middlebox: jhh7;address=1.2.3.4;port=8876
Media-Stream: audio;address=10.0.1.1;port=7788;id=jhh7;transport=udp
Content-Type: application/sdp
Content-Length: ...
v=0
o=bob 2890887s 2890686626 IN IP4 10.0.1.1
s=
c=IN IP4 10.0.1.1
t=0 0
m=audio 7788 RTP/AVP 0
This arrives at the UAC. The UAC generates an ACK (8), which contains
a Reverse-MM-Policy header which mirrors the Media-Middlebox header
from the 200 OK:
ACK sip:ua2@10.0.1.1 SIP/2.0
Route: sip:1.2.3.3
Reverse-MM-Policy: jhh7;address=1.2.3.4;port=8876
The UAC then sends media. Since the Media-Middlebox header specifies
no loose routing mechanism, the UAC assumes that the network can
properly route the media from the first intermediary to the final
recipient. So, it sends its RTP packets to 1.2.3.4:8876 (10). These
packets arrive at the NAT, which translates the destination address
to 10.0.1.1:7788 and sends the media to the called party (11).
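Unlike the router of Example I, the NAT here must hold per-call state. The toy Python model below, with hypothetical names throughout, shows that state: allocate() stands in for the MIDCOM-type exchange in (5) and (6), and translate() for the rewrite applied between (10) and (11). Allocating ports sequentially from 8876 is an assumption chosen only to reproduce the binding in the example; a real NAT applies its own allocation policy. A packet that matches no binding is dropped, which gives the firewall behavior that the IP-in-IP approach lacks.

```python
import itertools
from typing import Optional

Addr = tuple[str, int]


class NatBindings:
    """Toy model of the per-call state held by the NAT in Example II."""

    def __init__(self, public_ip: str, first_port: int = 8876):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)  # assumed policy
        self._public_of: dict[Addr, Addr] = {}
        self._private_of: dict[Addr, Addr] = {}

    def allocate(self, private: Addr) -> Addr:
        """Steps (5)/(6): return the public binding for a private address,
        creating it on first use."""
        if private not in self._public_of:
            public = (self.public_ip, next(self._next_port))
            self._public_of[private] = public
            self._private_of[public] = private
        return self._public_of[private]

    def translate(self, dst: Addr) -> Optional[Addr]:
        """Steps (10)/(11): rewrite an inbound destination; None = drop."""
        return self._private_of.get(dst)


nat = NatBindings("1.2.3.4")
public = nat.allocate(("10.0.1.1", 7788))
print(public)                  # ('1.2.3.4', 8876)
print(nat.translate(public))   # ('10.0.1.1', 7788)
```

The two dictionaries are exactly the bindings that Example I avoided storing by letting the caller carry the private address inside each tunnelled packet.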
There is an interesting benefit in this case. One of the problems
with a traditional MIDCOM flow such as that of Figure 5 is that it
might not work if the caller and callee are in the same domain. In
that case, the media would go from the caller to the NAT, and would
then have to turn back around and go to the called party. This is
referred to as the intra-realm case [15]. Many NATs will not properly
turn the packet around. With the mechanism described here, though,
each party knows the private IP address of its peer (present in both
the SDP and the Media-Stream header). In the event the media fails
when routed through the intermediary, both parties can try to send
the media directly, since they have enough information to do so.
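The fallback described above amounts to trying an ordered list of destinations. In this Python sketch the names are hypothetical, and media_flows() stands in for whatever check a UA uses to decide that media is not getting through (for example, no RTP received within some timeout); the draft does not specify that check.

```python
from typing import Callable, Optional

Addr = tuple[str, int]


def media_destinations(intermediary: Optional[Addr], peer: Addr) -> list[Addr]:
    """Destinations to try, in order: the Media-Middlebox intermediary
    (if any), then the peer's own address from its Media-Stream header."""
    return ([intermediary] if intermediary else []) + [peer]


def pick_destination(candidates: list[Addr],
                     media_flows: Callable[[Addr], bool]) -> Optional[Addr]:
    """Return the first destination over which media is observed to flow."""
    for dst in candidates:
        if media_flows(dst):
            return dst
    return None


def hairpin_broken(dst: Addr) -> bool:
    # Intra-realm call through a NAT that will not turn packets around.
    return dst[0] != "1.2.3.4"


choice = pick_destination(
    media_destinations(("1.2.3.4", 8876), ("10.0.1.1", 7788)), hairpin_broken)
print(choice)  # ('10.0.1.1', 7788)
```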
6.3 Example III: SIP Message Sessions
This example is similar to the first example. However, the INVITE is
used to set up an IM session [2]. The messages within the IM session
are sent using the SIP MESSAGE request [3]. That draft describes an
approach to handling intermediaries that is similar to the one
described here, but it uses media-specific parameters within the SDP.
Here, the MESSAGE requests in the session are routed through a SIP
proxy using "media-specific" source routing specified by the
Media-Middlebox header. In this case, the media is a SIP request, and
therefore it uses SIP's loose routing capabilities.
The call flow is shown in Figure 6.
UA 1                 Proxy                   NAT                  UA 2
|                      |                      |                      |
|(1) INVITE            |                      |                      |
|MS:message 9.8.7.6    |                      |                      |
|user=alice            |                      |                      |
|--------------------->|                      |                      |
|                      |                      |                      |
|(2) 100               |                      |                      |
|<---------------------|                      |                      |
|                      |                      |                      |
|                      |(3) INVITE            |                      |
|                      |MS:message 9.8.7.6    |                      |
|                      |user=alice            |                      |
|                      |--------------------------------------------->|
|                      |                      |                      |
|                      |(4) 200 OK            |                      |
|                      |MS:message 10.0.1.1   |                      |
|                      |user=bob              |                      |
|                      |<---------------------------------------------|
|                      |                      |                      |
|(5) 200 OK            |                      |                      |
|MS:message 10.0.1.1   |                      |                      |
|user=bob              |                      |                      |
|MM:1.2.3.4            |                      |                      |
|<---------------------|                      |                      |
|                      |                      |                      |
|(6) ACK               |                      |                      |
|--------------------->|                      |                      |
|                      |                      |                      |
|                      |(7) ACK               |                      |
|                      |--------------------------------------------->|
|                      |                      |                      |
|(8) MESSAGE           |                      |                      |
|bob@10.0.1.1          |                      |                      |
|Route=1.2.3.4         |                      |                      |
|--------------------->|                      |                      |
|                      |                      |                      |
|                      |(9) MESSAGE           |                      |
|                      |bob@10.0.1.1          |                      |
|                      |--------------------------------------------->|
|                      |                      |                      |
Figure 6: Call Flow for MESSAGE Sessions
In message 1, the caller sends an INVITE. This INVITE looks like, in
part:
INVITE sip:user2@foo.com SIP/2.0
Supported: middlebox
Media-Stream: message;address=9.8.7.6;id=fxx9;transport=tcp
Content-Type: application/sdp
Content-Length: ...
v=0
o=alice 2890844526 2890844526 IN IP4 host.anywhere.com
s=
c=IN IP4 9.8.7.6
t=0 0
m=message 5060 SIP
a=user:alice
This is passed to the foo.com proxy, which responds with a 100
(Trying) (2). The proxy does not require messages from the callee (who
is within foo.com) to the caller to pass through any particular
intermediary. Therefore, it simply proxies the request after a
registration lookup. This request (3) arrives at the UAS, which
decides to accept the session. It generates a 200 OK (4) with its own
Media-Stream header, which looks like, in part:
SIP/2.0 200 OK
Supported: middlebox
Media-Stream: message;address=10.0.1.1;id=jhh7;transport=tcp
Content-Type: application/sdp
Content-Length: ...
v=0
o=bob 2890887s 2890686626 IN IP4 10.0.1.1
s=
c=IN IP4 10.0.1.1
t=0 0
m=message 5060 SIP
a=user:bob
This is received by the proxy. The proxy knows that the MESSAGE
requests cannot go directly to Bob; they need to pass through an
intermediary. In this case, it's the proxy itself. So, the proxy
inserts a Media-Middlebox header, indicating itself as the
intermediary, using a media-specific loose routing mechanism (5):
SIP/2.0 200 OK
Require: middlebox
Media-Middlebox: jhh7;address=1.2.3.4;route="media-specific";transport=tcp
Media-Stream: message;address=10.0.1.1;id=jhh7;transport=tcp
Content-Type: application/sdp
Content-Length: ...
v=0
o=bob 2890887s 2890686626 IN IP4 10.0.1.1
s=
c=IN IP4 10.0.1.1
t=0 0
m=message 5060 SIP
a=user:bob
This arrives at the UAC. The UAC generates an ACK (6), which contains
a Reverse-MM-Policy header which mirrors the Media-Middlebox header
from the 200 OK:
ACK sip:ua2@10.0.1.1 SIP/2.0
Route: sip:1.2.3.4
Reverse-MM-Policy: jhh7;address=1.2.3.4;route="media-specific";transport=tcp
The UAC then sends an IM. To do so, it constructs a SIP MESSAGE
request. The request URI is constructed from the SDP in the 200 OK
(which matches the Media-Stream header in the 200 OK), and is equal
to sip:bob@10.0.1.1. It then constructs a loose route using the SIP
Route headers. There is a single intermediary, a proxy at
sip:1.2.3.4. The MESSAGE sent by the caller looks like (8):
MESSAGE sip:bob@10.0.1.1;transport=tcp SIP/2.0
Route: sip:1.2.3.4;transport=tcp;lr
This is received by the proxy, which removes the Route header value
identifying itself and forwards the request to the recipient, Bob (9).
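The two halves of (8) and (9) can be sketched in Python as follows. On the UAC side, build_message() derives the Request-URI from the a=user and c= lines of the peer's SDP and a loose Route from a Media-Middlebox value carrying route="media-specific". On the proxy side, next_hop() removes the Route entry that names the proxy itself and forwards to the next entry, or to the Request-URI if none remains, a simplification of the rules in RFC 3261 (section 16.4). Both function names are hypothetical; URIs are treated as plain strings (ports and escaping are ignored), and the sketch assumes, as in the example, that the intermediary and the peer use the same transport.

```python
def build_message(sdp: str, middlebox_value: str) -> list[str]:
    """UAC side, message (8): start line and Route header of the IM."""
    user = host = None
    for line in sdp.splitlines():
        if line.startswith("a=user:"):
            user = line[len("a=user:"):]
        elif line.startswith("c=IN IP4 "):
            host = line[len("c=IN IP4 "):]
    if user is None or host is None:
        raise ValueError("SDP lacks an a=user or c= line")
    # "jhh7;address=...;route=...": skip the stream id, keep name=value pairs.
    params = dict(p.partition("=")[::2] for p in middlebox_value.split(";")[1:])
    if params.get("route", "").strip('"') != "media-specific":
        raise ValueError("Media-Middlebox value does not request loose routing")
    transport = params.get("transport", "udp")  # assumed default
    return [
        f"MESSAGE sip:{user}@{host};transport={transport} SIP/2.0",
        f"Route: sip:{params['address']};transport={transport};lr",
    ]


def next_hop(request_uri: str, routes: list[str], my_hosts: set[str]):
    """Proxy side, message (9): drop our own Route entry, then forward to
    the next Route entry if any, else to the Request-URI."""
    def host_of(uri: str) -> str:  # "sip:1.2.3.4;lr" -> "1.2.3.4"
        return uri.split(":", 1)[1].split(";", 1)[0]
    remaining = list(routes)
    if remaining and host_of(remaining[0]) in my_hosts:
        remaining.pop(0)
    return remaining, (remaining[0] if remaining else request_uri)
```

Because the final destination travels in the Request-URI, the proxy in (9) needs no session state to forward the MESSAGE, mirroring the stateless media routing of Example I.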
Of course, the intermediary for the MESSAGE request need not be the
same as the proxy handling the SIP signaling; it is the same only in
this example.
7 Author's Addresses
Jonathan Rosenberg
dynamicsoft
72 Eagle Rock Avenue
First Floor
East Hanover, NJ 07936
email: jdrosen@dynamicsoft.com
8 Normative References
[1] J. Rosenberg, H. Schulzrinne, et al., "SIP: Session initiation
protocol," Internet Draft, Internet Engineering Task Force, Feb.
2002. Work in progress.
[2] B. Campbell and J. Rosenberg, "SIP instant message sessions,"
Internet Draft, Internet Engineering Task Force, July 2001. Work in
progress.
[3] J. Rosenberg, "Using MESSAGE for IM sessions," Internet Draft,
Internet Engineering Task Force, May 2002. Work in progress.
9 Informative References
[4] P. Srisuresh, J. Kuthan, J. Rosenberg, A. Molitor, and A. Rayhan,
"Middlebox communication architecture and framework," Internet Draft,
Internet Engineering Task Force, Mar. 2002. Work in progress.
[5] D. Kutscher, J. Ott, and C. Bormann, "Session description and
capability negotiation," Internet Draft, Internet Engineering Task
Force, Mar. 2002. Work in progress.
[6] A. Vemuri and J. Peterson, "SIP for telephones (SIP-t): Context
and architectures," Internet Draft, Internet Engineering Task Force,
Mar. 2002. Work in progress.
[7] S. Floyd and L. Daigle, "IAB architectural and policy
considerations for open pluggable edge services," RFC 3238, Internet
Engineering Task Force, Jan. 2002.
[8] C. Perkins, "IP encapsulation within IP," RFC 2003, Internet
Engineering Task Force, Oct. 1996.
[9] J. Rosenberg et al., "A proposal for IM transport," Internet
Draft, Internet Engineering Task Force, Nov. 2001. Work in progress.
[10] H. Schulzrinne, D. Oran, and G. Camarillo, "The reason header
field for the session initiation protocol," Internet Draft, Internet
Engineering Task Force, Apr. 2002. Work in progress.
[11] H. Schulzrinne, D. Oran, and G. Camarillo, "The reason header
field for the session initiation protocol," Internet Draft, Internet
Engineering Task Force, Mar. 2002. Work in progress.
[12] J. Rosenberg and H. Schulzrinne, "Reliability of provisional
responses in SIP," Internet Draft, Internet Engineering Task Force,
Feb. 2002. Work in progress.
[13] M. Borella, J. Lo, D. Grabelsky, and G. Montenegro, "Realm
specific IP: framework," RFC 3102, Internet Engineering Task Force,
Oct. 2001.
[14] W. Marshall et al., "SIP extensions for media authorization,"
Internet Draft, Internet Engineering Task Force, Mar. 2002. Work in
progress.
[15] C. Aoun and S. Sen, "Identifying intra-realm calls and avoiding
media tromboning," Internet Draft, Internet Engineering Task Force,
Feb. 2002. Work in progress.
Full Copyright Statement
Copyright (c) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF has been notified of intellectual property rights claimed in
regard to some or all of the specification contained in this
document. For more information consult the online list of claimed
rights.