Internet Engineering Task Force                                   SIP WG
Internet Draft                                 J.Rosenberg,H.Schulzrinne
draft-rosenberg-sip-entfw-01.txt                 dynamicsoft,Columbia U.
March 2, 2001
Expires: September, 2001


  SIP Traversal through Residential and Enterprise NATs and Firewalls

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as work in progress.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Abstract

   In this draft, we discuss how SIP can traverse enterprise and
   residential firewalls and NATs. This environment is challenging
   because we assume here that the end user has little or no control
   over the firewall or NAT, and that the firewall or NAT is completely
   ignorant of SIP. Despite this, our solutions for the NAT case are
   very workable and suffer few disadvantages.


1 Introduction

   The problem of getting applications through firewalls and NATs has
   received a lot of attention [1]. Getting SIP through firewalls and
   NATs is particularly troublesome. In a previous draft [2] we
   discussed some of the general issues regarding traversal of
   firewalls, and discussed some solutions for it. Our solutions were



J.Rosenberg,H.Schulzrinne                                     [Page 1]


Internet Draft                   entfw                     March 2, 2001


   based on having a proxy server control the firewall/NAT with a
   control protocol of some sort [3]. This protocol can open and close
   pinholes in the firewall, and/or obtain NAT address bindings to use
   in rewriting the SDP in a SIP message.

   The use of a control protocol in the midcom architecture is ideal for
   carriers, but it does not work when the SIP service provider is not
   the same as the ISP and transport provider of the end user. This is
   frequently the case for users behind enterprise firewalls and NATs
   who are trying to access SIP services outside of their networks. The
   same happens for residential NATs and firewalls. These devices are
   often used by consumers who have cable modem and DSL connections, and
   wish to connect multiple computers using the single address provided
   by the cable company or DSL company. [1] Residential firewalls and
   NATs are often referred to as cable/DSL routers, and are manufactured
   by companies like Linksys, Netopia, and Netgear.

   Ultimately, it is our belief and hope that NATs will disappear with
   the deployment of IPv6. However, that is not likely to happen for
   some time.

   Given the existence of NATs, one way to handle SIP is to embed a SIP
   ALG within enterprise NATs and firewalls. However, this has not
   happened. The top commercial firewall and NAT products continue to be
   SIP-unaware. Even if SIP ALG support were added tomorrow, there is
   still a huge installed base of firewalls and NATs that do not
   understand SIP. As a result, there is going to be a long period of
   time during which users will be behind firewalls or NATs that are
   ignorant of SIP, probably at least two to three years. The SIP
   community cannot wait for ubiquitous deployment of SIP-aware
   firewalls and NATs. Interim solutions are needed NOW to enable SIP
   services to be delivered to users behind these devices.

   In this draft, we propose solutions for getting SIP through
   enterprise and residential NATs and firewalls that do not require
   changes to these devices or to their configurations. NATs and
   firewalls are a reality, and SIP deployment is being hampered by the
   lack of support for SIP ALGs in these boxes. A solution MUST be
   found, and we provide one here.

2 Architecture

_________________________

  [1] The author of this draft is amongst those who have such a
residential NAT, and thus feels highly motivated to solve this
particular problem.





   We assume that the network architecture we are dealing with looks
   like Figure 1. The caller is a UA in enterprise or residence A, and
   the called party is a UA in enterprise or residence B. The caller
   uses proxy X as its local outbound proxy, which forwards the call to
   the proxy of the called party, Y, also outside of the firewall or
   NAT. The call is then forwarded to the called party within enterprise
   or residence B.


   The firewall and/or NAT (FW/NAT) boxes are off-the-shelf boxes with
   no support for SIP ALG. We consider NAT and firewall separately. For
   NATs, we consider specifically a class of devices referred to as
   residential NATs.

   Residential NATs are typically placed in the home, and allow multiple
   devices to make use of a single IP address provided by a cable or DSL
   provider. The devices generally disallow incoming traffic, but allow
   outbound TCP and UDP connections. Based on the terminology defined in
   RFC 2663 [4], residential NATs are Network Address Port Translators
   (NAPT). Once a connection is established outwards, data on the same
   connection is allowed inwards from the remote peer. This is true for
   UDP as well. Specifically, if a user sends UDP packets from local IP
   address and port pair A,B to remote IP address and port pair C,D,
   they are natted to have a source address of X,Y. Packets sent from
   C,D to X,Y have their destination address natted to A,B, and are
   delivered back to the host behind the NAT. The ability to NAT UDP
   packets in this way is critical to our solutions. We have verified
   this feature on the leading residential NAT products.
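
   The mapping behaviour just described can be sketched as a small
   translation table. This is an illustrative model only, not the
   algorithm of any particular product: the class name Napt, the
   starting public port of 20000, and the choice to admit return
   traffic only from the same remote peer are assumptions made for the
   example.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class Napt:
    """Toy model of the UDP port translation described in the text."""

    def __init__(self, public_ip: str, first_port: int = 20000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port  # hypothetical allocation policy
        self.out: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}
        self.back: Dict[Tuple[Endpoint, Endpoint], Endpoint] = {}

    def outbound(self, src: Endpoint, dst: Endpoint) -> Endpoint:
        """Translate local A,B to public X,Y for a packet sent to C,D."""
        key = (src, dst)
        if key not in self.out:
            public = (self.public_ip, self.next_port)
            self.next_port += 1
            self.out[key] = public
            self.back[(public, dst)] = src
        return self.out[key]

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Endpoint]:
        """Translate X,Y back to A,B; None means no binding, so drop."""
        return self.back.get((dst, src))
```

   In this sketch outbound() plays the role of the A,B to X,Y
   translation and inbound() the reverse one. A packet arriving before
   any outbound packet has been sent finds no binding, which is why the
   host behind the NAT has to send first.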

   Many small offices and home offices (SOHO) also use these devices to
   allow their business to connect to the Internet over cable or DSL.
   Because the device is configured identically in this case, we lump it
   with the residential NAT.

   Enterprise firewalls are used in larger enterprises. They are
   typically configured with much tighter security. We assume the worst
   case scenario, which is that these boxes will allow users inside
   their enterprises to browse the web, and specifically, to browse
   secure web sites. UDP, both inbound and outbound, is disallowed. TCP
   inbound is disallowed. Outbound TCP from any host within the
   enterprise is allowed out only to ports 80 and 443. Our assumption is
   that these devices are not running NAT.
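
   The worst-case firewall policy assumed here is small enough to state
   as a predicate. The function name and the string values used for
   direction and transport are assumptions of this sketch; real
   firewalls express such rules in their own configuration languages.

```python
ALLOWED_TCP_PORTS = (80, 443)  # the two ports named in the text


def firewall_permits(direction: str, transport: str, dest_port: int) -> bool:
    """Return True if the assumed worst-case enterprise policy allows it."""
    if direction != "outbound":
        return False  # all inbound traffic is disallowed
    if transport != "tcp":
        return False  # UDP is disallowed in both directions
    return dest_port in ALLOWED_TCP_PORTS
```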

   Handling enterprise devices that are both firewalls and NAPT involves
   combining the solutions for both cases. Wherever appropriate, we
   discuss any issues specific to combining the two.

   In general, getting SIP services to function behind these devices

                      +-------+       +-------+
                      | SIP   |       | SIP   |
                      | Proxy |       | Proxy |
                      |  X    |       |   Y   |
                      |       |       |       |
                      +-------+       +-------+







           +-------+                           +-------+
   ........|FW/NAT |............       ........|FW/NAT |............
   .       |       |           .       .       |       |           .
   .       +-------+           .       .       +-------+           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .       +-------+           .       .       +-------+           .
   .       | SIP UA|           .       .       | SIP UA|           .
   .       |  Joe  |           .       .       |  Bob  |           .
   .       +-------+           .       .       +-------+           .
   .............................       .............................

        Enterprise or                           Enterprise or
          Residence A                            Residence B


   Figure 1: Network Architecture

   requires resolution of several problems:

        Originating Requests: Getting SIP requests from the caller, Joe,
             to proxy X, and responses from proxy X back to Joe.

        Receiving Requests: Getting SIP requests from proxy Y to the
             called party, Bob, and responses from Bob back to proxy Y.

        Handling RTP: Getting media to go from Joe to Bob and Bob to
             Joe.

   We discuss solutions for each in turn.

3 Originating requests

   The first problem is originating requests from the caller through a
   firewall/NAT, out to a proxy, and getting the responses from this
   proxy back to the caller.

3.1 NAT

   The residential NAT will allow both outgoing UDP and TCP traffic to
   port 5060. This means that there are no problems in generating an
   outbound INVITE. However, there are issues with the response.

   SIP specifies that for UDP, the response is sent to the port number
   in the Via header and the IP address the request came from. However,
   due to NAT, the port number in the Via header will be wrong. This
   means that the response will not be sent to the proper location.
   However, with TCP, responses are sent over the connection the INVITE
   arrived on. This means that a response sent over the TCP connection
   will be received properly by a caller behind a NAT.

   The simplest solution, therefore, is for the caller to use a TCP
   connection to send the INVITE, and receive the response. We recommend
   that this connection be kept open permanently, to avoid the need to
   establish it for new calls. A persistent connection is also needed
   for incoming calls in any case (see Section 4). For devices which do
   not support TCP, UDP may be used. However, the proxy needs to be able
   to send the UDP response to the address *and* port the request
   arrived on. This is not standardized behavior, but could potentially
   be configured for requests from users that are known to be behind
   residential NATs.
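
   The difference between the standard UDP rule and the behaviour
   needed for callers behind a NAT can be sketched as follows. The
   function name and the behind_nat flag are hypothetical; how a proxy
   learns that a user is behind a residential NAT is a configuration
   matter, as noted above.

```python
from typing import Tuple


def udp_response_target(
    via_port: int, source_ip: str, source_port: int, behind_nat: bool
) -> Tuple[str, int]:
    """Pick the (address, port) a proxy sends a UDP response to."""
    if behind_nat:
        # Non-standard: reply to the address *and* port the request came
        # from, so that the response matches the existing NAT binding.
        return (source_ip, source_port)
    # Standard rule from the text: source address, port from the Via header.
    return (source_ip, via_port)
```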

   In order for this connection to be used for re-INVITEs or BYEs, the
   proxy needs to record route.

3.2 Firewall

   We assume the firewall (FW) blocks all outgoing UDP, but will allow
   some outgoing TCP. In the worst case, it will only allow outgoing
   HTTP traffic on 80, and HTTPS on 443. HTTPS is nothing more than HTTP
   over TLS/SSL [5]. What's interesting about HTTPS is that the
   connection starts out with TLS, negotiates a secure channel, and then
   runs HTTP over this channel. All HTTP messages are encrypted. The FW
   never sees any HTTP messages in the clear, only TLS/SSL messages. The
   important implication is that there is no way for a FW to have
   application layer intelligence that depends on the existence of HTTP
   on port 443. In fact, any protocol can be run over TLS on port 443,
   and it will look the same to the FW. Since we assume that the FW lets
   HTTPS through, it should allow SIP over TLS through, running on port
   443.

   Thus, our proposal is to have the caller, Joe, initiate a TLS
   connection on port 443 to the proxy server X. Once the TLS connection
   is secured, the client can send SIP messages over this connection.
   Handling of SIP over TLS/SSL is identical to TCP. Responses from the
   proxy are sent over this connection as well [6]. We recommend that
   the client keep the TLS connection open (more on this in
   Section 4). This avoids the need to re-initiate the TLS connection
   for every outgoing call.

   Fooling the FW into believing the traffic is HTTPS by running it over
   port 443 is not nice. We would strongly recommend that clients first
   try the IANA registered port for SIP over TLS, port 5061. If no
   response is received over this connection, the client should then try
   443.
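
   The recommended order of attempts (port 5061 first, then 443) can be
   sketched with the Python standard library. The names open_tls and
   connect_with_fallback, the five-second timeout, and the injectable
   connect argument are assumptions of this example. The sketch treats
   any connection or handshake failure as "no response".

```python
import socket
import ssl
from typing import Callable, Sequence, Tuple


def open_tls(host: str, port: int, timeout: float = 5.0) -> ssl.SSLSocket:
    """Open a TCP connection and complete a TLS handshake on it."""
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise


def connect_with_fallback(
    host: str,
    ports: Sequence[int] = (5061, 443),
    connect: Callable[[str, int], object] = open_tls,
) -> Tuple[int, object]:
    """Try each port in order; return (port, connection) on first success."""
    last_error = None
    for port in ports:
        try:
            return port, connect(host, port)
        except OSError as exc:  # refused, timed out, or TLS failure
            last_error = exc
    raise ConnectionError(f"no TLS port reachable on {host}") from last_error
```

   The connection returned would then be kept open and used for all SIP
   messages to and from the proxy.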

   Note that outgoing requests may work with just vanilla TCP. However,
   we have observed that some firewalls examine TCP connections to look
   for specific protocols. Thus, SIP over TCP on 5060 may not work. SIP
   over TCP on port 80 may also not work, as some firewalls check for
   HTTP messages. This is why we prefer TLS; we believe that it is most
   likely to work.

   In order for this connection to be used for re-INVITEs or BYEs, the
   proxy needs to record route.

4 Receiving requests

   Unfortunately, receiving requests is not as simple as sending them.
   We consider first the NAT case, and then the firewall case.

4.1 NAT

   The problem has to do with registrations. In Figure 1, the callee,
   Bob, will receive requests at their UA because they had previously
   sent a REGISTER request to their registrar, which is co-located with
   proxy Y. This registration contains a Contact header which lists the
   address where the incoming requests should be sent to. However, in
   the case of NAT, this address will be wrong. It will contain a domain
   name or IP address that is within the private space of enterprise B.
   Thus, the REGISTER might look like:


   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@10.0.1.100



   This address is not reachable by the proxy.

   To solve this problem, we need two things. First, we need a
   persistent connection to be established from Bob to Y. Secondly, we
   need a way for incoming requests destined for B to be routed over
   this connection.

   To address this first problem, we recommend that clients that send
   REGISTER requests do so over a TCP or TLS connection, as described in
   Section 3. Furthermore, they keep this connection open permanently.
   REGISTER refreshes are sent over this connection. We further
   recommend that the proxy/registrar hold this connection in a table,
   where the table is indexed by the remote side of the transport
   connection. When the proxy wishes to send a packet to some server at
   IP address M, port N, transport O, it looks up the tuple (M,N,O) in
   the table to see if a connection already exists, and then uses it.
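
   A minimal sketch of such a connection table is given below. The
   class and method names are illustrative assumptions; the connection
   objects stored in it can be anything the proxy uses to represent an
   open TCP or TLS connection.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int, str]  # (remote IP M, remote port N, transport O)


class ConnectionTable:
    """Open connections indexed by the remote side of the transport."""

    def __init__(self) -> None:
        self._by_remote: Dict[Key, object] = {}

    def store(self, ip: str, port: int, transport: str, conn: object) -> None:
        self._by_remote[(ip, port, transport.upper())] = conn

    def lookup(self, ip: str, port: int, transport: str) -> Optional[object]:
        """Return the existing connection for (M,N,O), or None."""
        return self._by_remote.get((ip, port, transport.upper()))

    def remove(self, ip: str, port: int, transport: str) -> None:
        """Forget a connection, for example after the peer closes it."""
        self._by_remote.pop((ip, port, transport.upper()), None)
```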

   Now, a connection is available for contacting the user. However, this
   connection must be associated with sip:bob@Y.com. Unfortunately, it
   is not. Calls for sip:bob@Y.com are translated to sip:bob@10.0.1.100,
   which does not correspond to the remote side connection used to send
   the register, as seen by the proxy. That's because of NAT, which will
   make the remote side appear to be a publicly routable address.

   To handle this problem, the proxy could, in principle, record the IP
   address and port from the remote side of the connection used to send
   a REGISTER. Then, it can create a Contact entry of the form
   sip:bob@[ip-addr]:[port], where [ip-addr] and [port] are the IP
   address and port of the remote side of the connection. However, this
   is assuming that the registration is for the purposes of connecting
   the address in the To field with the machine the connection is coming
   from. That may not be the intent of the registration. The
   registration may be used to set up a call forwarding service, for
   example.

   As a result, it is our proposal that clients be allowed to explicitly
   ask a proxy to create a Contact entry corresponding to the machine a
   REGISTER is sent from. We propose that a specific contact hostname
   value be reserved to have the meaning "I don't know what my address
   is, please use the IP address, port and transport from the connection
   over which this REGISTER was delivered". We propose that this host
   name be "jibufobutbmpu". This name is "I hate NATS a lot" with each
   letter incremented by one. This name is unlikely to be used in real
   systems (as opposed to something like "default", which could be a
   real host name).
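
   The derivation of the reserved name can be checked with a few lines
   of code. The function name shift_letters is an assumption of this
   sketch; it drops spaces, lowercases the phrase, and advances each
   letter by one position.

```python
def shift_letters(text: str, offset: int = 1) -> str:
    """Lowercase text, keep only a-z, and shift each letter by offset."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    return "".join(
        chr((ord(c) - ord("a") + offset) % 26 + ord("a")) for c in letters
    )


COOKIE_HOST = shift_letters("I hate NATS a lot")
print(COOKIE_HOST)  # prints "jibufobutbmpu"
```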

   Consider once more the architecture of Figure 1. The callee has an IP
   address of 10.0.1.100. It initiates a TCP connection to port 5060 on
   the proxy. This connection goes through the NAT, and the source
   address is rewritten to 77.2.3.88, and the port to 2937. The
   registration looks like:


   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@jibufobutbmpu



   The proxy Y then stores the incoming TCP connection into a table:


   (77.2.3.88,2937,TCP) -> [reference to TCP connection]



   It also updates the contact list for sip:bob@Y.com to include the URL
   sip:bob@77.2.3.88:2937;transport=tcp.

   Now, when an INVITE arrives for sip:bob@Y.com, it is looked up in the
   registration database. The contact is extracted, and the proxy tries
   to send the request to that address. To do so, it checks its
   connection table for an open connection to the IP address, port and
   transport where the request is destined. In this case, such a
   connection is available, and the request is forwarded over it. The
   response from the callee is also routed over the same connection.
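
   The registrar side of this procedure can be sketched as a single
   function that decides which Contact to store. The function name is
   hypothetical, and the URI handling is deliberately naive: it assumes
   a Contact of the form scheme:user@host with no URI parameters.

```python
COOKIE_HOST = "jibufobutbmpu"  # reserved host name proposed in the text


def resolve_contact(
    contact_uri: str, remote_ip: str, remote_port: int, transport: str
) -> str:
    """Return the Contact URL the registrar should store for a REGISTER."""
    scheme, _, rest = contact_uri.partition(":")
    user, at, host = rest.partition("@")
    if at and host.lower() == COOKIE_HOST:
        # The client asked for the address, port and transport of the
        # connection over which the REGISTER was delivered.
        return (
            f"{scheme}:{user}@{remote_ip}:{remote_port}"
            f";transport={transport.lower()}"
        )
    return contact_uri  # ordinary registration, store as given
```

   With the values from the example above, the Contact
   sip:bob@jibufobutbmpu arriving from 77.2.3.88 port 2937 over TCP is
   stored as sip:bob@77.2.3.88:2937;transport=tcp, which is also the
   key under which the connection is held in the connection table.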

   In order for this connection to be used for re-INVITEs or BYEs, the
   proxy needs to record route.

4.2 Firewalls

   The situation is somewhat simpler for the case of firewalls. We still
   need to have a persistent connection established from Bob out to the
   proxy, possibly using TLS over port 443. A registration is then sent
   over this connection, which will look like:


   REGISTER sip:Y.com SIP/2.0
   From: sip:bob@Y.com
   To: sip:bob@Y.com
   Contact: sip:bob@44.2.4.1;transport=tcp



   For this to work, incoming calls for sip:bob@Y.com must be routed
   over the connection established by Bob to proxy Y. We assume the
   proxy maintains persistent connections in a table, indexed by remote
   address, port, and transport (as described above for NAT). In order
   for this connection to be used when contacting Bob, Bob's contact
   address must be the same as the connection address. This means that
   the remote connection address, as seen by Y, has to be 44.2.4.1:5060.
   However, there are several cases where it might not be.

   In what cases would it not be? First off, the client might be multi-
   homed. Multi-homed hosts are increasingly common as VPNs become more
   pervasive. VPNs show up as virtual interfaces, making hosts
   multihomed. The client may not be able to correctly guess which
   interface the REGISTER will be sent on. If the client guesses
   incorrectly, the IP address in the Contact header may be on a
   different interface than the one used to send the registration. The
   second case when the connection address and contact address don't
   match is when the client incorrectly discovers its own IP address,
   even when singly homed. We have observed this to frequently be the
   case. In fact, we have seen some systems report back 127.0.0.1 (the
   loopback address) as their IP address.

   Thus, even without NAT, the Contact address may not match the source
   address of the TLS or TCP connection used to register. In fact, this
   problem has nothing to do with NATs or firewalls. We have observed it
   happening in many real world scenarios.

   As a result, it is our recommendation that, as a general rule,
   clients use the "Contact cookie" and a persistent connection in order
   to ensure that they are reachable. This solution works for firewalls,
   NATs, multi-homed hosts, singly homed hosts, and a variety of other
   cases.

   Storing incoming connections in a table for later reuse is useful
   even between proxies. If TCP or TLS is used between proxies X and Y,
   that connection can be stored by both X and Y, and thus reused for
   messaging in either direction. It is for this reason that we separate
   the connection table management from the registration processing.
   Such table management is needed if one of the proxies was on the
   inside of the firewall, for example. In that case, responses and
   requests in the reverse direction would need to be forwarded over the
   connection initiated by the proxy.

5 Handling RTP

   Dealing with SIP was the easy part. Getting the media through a NAT
   or firewall is more complex. RTP is on dynamic ports, peer-to-peer,
   and UDP, all of which are problematic for NATs, firewalls, or both.

   Our solution is to use connection-oriented media, either UDP, TCP, or
   TLS, with the entities behind NATs or firewalls initiating the
   connection. This is discussed in more detail below.

5.1 NATs

   The trick to getting RTP through a NAT is to make sure it exhibits
   two characteristics. First, any users behind a NAT have to send the
   first packet to establish a NAT binding. Secondly, media sent back to
   that user must be to the source port where the media came from. In
   other words, if Joe calls Bob, and only Joe is behind a NAT, Joe must
   send the first UDP packet to Bob. Let's say Joe sends from IP address
   and port pair A,B to Bob at public address and port C,D. The NAT will
   translate port pair A,B to X,Y. Bob receives the media. To talk to
   Joe, it is essential that Bob send his media with source port C,D to
   destination port X,Y. This will be received by the NAT, and have the
   destination translated to A,B, where it is sent to Joe.

   Unfortunately, RTP does not work this way. When used with SIP, a
   conversation between Joe and Bob will result in two RTP sessions, one
   from Joe to the address Bob provided in his SDP, and one from Bob to
   the address provided by Joe in his SDP. This will not work with NAT.

5.1.1 Bi-Directional RTP

   Our solution is simple: we define bi-directional RTP. Bi-directional
   RTP runs over UDP. Like TCP, one side initiates a connection to the
   other side. As a result, one side is active (initiates the
   connection), and the other side is passive (waits for the
   connection). Like TCP, data in the reverse direction is sent to the
   port where the connection came from. Unlike TCP, a bi-directional RTP
   connection is created when the first packet arrives; there is no
   explicit handshake or setup. There are no retransmissions or changes
   to the RTP protocol operation. The only difference is that
   bidirectional RTP involves sending media on the same socket used to
   receive it.


   An example flow using bidirectional media is shown in Figure 2. Joe
   calls Bob. Assume for this flow that Joe is behind a NAT, and Bob is
   not. For simplicity's sake, we don't show proxies, and don't show
   much of the SIP detail. Joe indicates, in his SDP in the INVITE, that
   he is capable of bi-directional RTP, and wishes to be the active side
   of the connection (more on this later). Bob receives the INVITE, and
   responds with a 200 OK. His SDP indicates that he can be the passive
   side, and he provides the IP address and port to connect to. When Joe
   receives the 200 OK, an ACK is sent. Then, Joe sends an RTP packet to
   the IP address and port provided by Bob. The RTP packet passes
   through the NAT, and has its source address rewritten. When Bob
   receives this packet, the connection is established. Bob now has the
   IP address and port to send media back to. This address/port is the
   one from the source address of the RTP packet Bob just received
   (which has been natted). Bob sends media to this address. Those
   packets have their destination address natted, translated back to the
   address Joe used to send the first packet.
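
   The passive side of this flow can be sketched with an ordinary UDP
   socket. The class name PassiveLeg, the loopback bind address, and
   the two-second timeout are assumptions of the example; RTP packet
   formatting is omitted and payloads are treated as opaque bytes.

```python
import socket
from typing import Optional, Tuple


class PassiveLeg:
    """Passive side of bidirectional RTP: learn the peer from packet one."""

    def __init__(self, bind_ip: str = "127.0.0.1", timeout: float = 2.0) -> None:
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind((bind_ip, 0))  # port 0: let the OS pick a free port
        self.sock.settimeout(timeout)
        self.peer: Optional[Tuple[str, int]] = None

    @property
    def address(self) -> Tuple[str, int]:
        """The address and port that would be advertised in the SDP."""
        return self.sock.getsockname()

    def receive(self) -> bytes:
        data, source = self.sock.recvfrom(2048)
        if self.peer is None:
            self.peer = source  # the (possibly natted) source address
        return data

    def send(self, payload: bytes) -> None:
        if self.peer is None:
            raise RuntimeError("no packet received yet, peer unknown")
        # Same socket as for receiving, so the source port matches the
        # destination port of the active side's packets.
        self.sock.sendto(payload, self.peer)

    def close(self) -> None:
        self.sock.close()
```

   The active side needs no special logic beyond sending first and
   receiving on the socket it sends from.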

   In traditional unidirectional RTP, Joe would have included an IP
   address and port in the INVITE, and Bob would have sent media to this
   address, rather than the one in the RTP packet received from Joe.
   This does not work through NAT, since this address is wrong, and
   since no NAT binding has been established. Bidirectional RTP does not
   suffer this problem; note how Joe does not actually need to provide
   an IP address in the SDP in his INVITE.


   The call flow when Bob is behind the NAT is very similar, and is
   shown in Figure 3. Instead of Joe being the active side of the
   connection, Bob is the active side. It is important to note that the
   role of active or passive for the RTP connection is not tied to who
   makes the call.

   As a result, when only one of the participants is behind a NAT, a
   direct UDP connection can be used between them. When both are behind
   NATs, an RTP translator is needed. This is described in Section 5.1.3.
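
   The choice of roles summarized above can be written as a small
   decision function. The function name and the dictionary layout are
   assumptions of this sketch. The case where neither party is behind a
   NAT is not discussed in the text, so the sketch simply assigns no
   roles there.

```python
from typing import Dict, Optional, Union

Plan = Dict[str, Union[Optional[str], bool]]


def plan_media(caller_behind_nat: bool, callee_behind_nat: bool) -> Plan:
    """Decide who is active and whether an RTP translator is needed."""
    if caller_behind_nat and callee_behind_nat:
        # Both sides connect outwards to a translator.
        return {"caller": "active", "callee": "active", "translator": True}
    if caller_behind_nat:
        return {"caller": "active", "callee": "passive", "translator": False}
    if callee_behind_nat:
        return {"caller": "passive", "callee": "active", "translator": False}
    # Assumption: with no NAT involved, nothing needs to be decided here.
    return {"caller": None, "callee": None, "translator": False}
```

   Note that the role depends only on who is behind a NAT, not on who
   placed the call.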

5.1.2 Signaling Support

   SDP extensions are needed to allow the signaling discussed above to
   take place. Specifically, extensions are needed to indicate that a
   media stream is bidirectional RTP, and to allow each side to indicate
   that they are active, passive, or can play either role.

   As it turns out, this is exactly the kind of signaling provided in
   the SDP extensions for TCP media [7]. That draft only handles TCP and
   TLS, but the semantics for TCP are identical to bidirectional UDP.
   Therefore, we propose that a new keyword, BAVP, be used to signal
   that the RTP is bidirectional. The direction attribute and the
   exchange procedures defined in [7] work as described for BAVP.

   Revisiting the flow in Figure 2, the SDP in the INVITE would actually
   appear as:


   c=IN IP4 10.0.1.1/127
   m=audio 9 RTP/BAVP 0
   a=direction:active



   and in the 200 OK as:


   c=IN IP4 4.5.11.3/127
   m=audio 4444 RTP/BAVP 0
   a=direction:passive
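
   A receiver of such SDP needs to pull out the address, port, profile
   and direction. The following is a minimal sketch that handles only
   the three line types shown above; the function name and the
   returned dictionary keys are assumptions, and it is not a general
   SDP parser.

```python
from typing import Dict, Optional, Union

Value = Union[str, int, bool, None]


def parse_media(sdp: str) -> Dict[str, Value]:
    """Extract address, port, profile and direction from an SDP fragment."""
    info: Dict[str, Value] = {"direction": None}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("c="):
            # c=IN IP4 <address>[/<ttl>]
            info["address"] = line.split()[2].split("/")[0]
        elif line.startswith("m="):
            # m=<media> <port> <proto> <formats...>
            media, port, proto = line[2:].split()[:3]
            info.update(media=media, port=int(port), proto=proto)
            info["bidirectional"] = proto == "RTP/BAVP"
        elif line.startswith("a=direction:"):
            info["direction"] = line.split(":", 1)[1]
    return info


offer = """c=IN IP4 10.0.1.1/127
m=audio 9 RTP/BAVP 0
a=direction:active"""
print(parse_media(offer))
```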



5.1.3 Both parties behind NAT

   The approach described above works only if (1) only one of the two
   parties is behind a NAT, and (2) the party behind a NAT knows that it
   is behind a NAT. To handle the cases where these conditions do not
   hold, we introduce functionality
   into the proxies. The proxies can detect, by inspecting components of
   the messages, which parties are behind NATs. They can rewrite SDP in
   order to ensure that those parties behind NATs are active.
   Furthermore, when both are behind a NAT, the proxies can bring an RTP
   translator into the call. RTP translators can be thought of as RTP
   routers; they receive RTP packets on a particular incoming port, and
   send them out on a different port/address. When both parties are
   behind a NAT, the proxies will rewrite the SDP so that both sides
   initiate outward connections to the RTP translator. The RTP
   translator then hands packets back and forth between the connections.
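
   The forwarding logic of such a translator can be sketched without
   any sockets. The class name and the leg labels "a" and "b" are
   assumptions; the sketch further assumes that the translator gives
   each party its own port, so that the arrival port identifies the
   leg. Packets that arrive before the other side has connected are
   dropped.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class RtpTranslator:
    """Pairs two inbound legs and forwards packets between them."""

    def __init__(self) -> None:
        self.legs: Dict[str, Optional[Endpoint]] = {"a": None, "b": None}

    def on_packet(
        self, leg: str, source: Endpoint, payload: bytes
    ) -> Optional[Tuple[Endpoint, bytes]]:
        """Record the sender of this leg; return (target, payload) or None."""
        self.legs[leg] = source  # natted source address of this party
        other = "b" if leg == "a" else "a"
        target = self.legs[other]
        if target is None:
            return None  # other party has not sent its first packet yet
        return (target, payload)
```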


   We show these boxes incorporated into the architecture in Figure 4.
   Only one translator is needed per call. Our architecture will only
   result in usage of the box when both parties are behind NATs, which

     |                |                              |
     |                |                              |
     |---------------------------------------------> |
     |                | INV sip:bob@Y.com            |
     |                | active                       |
     |                |                              |
     |                |                              |
     |                |                              |
     |<--------------------------------------------- |
     |                | 200 OK                       |
     |                | passive                      |
     |                | 4.5.11.3:4444                |
     |                |                              |
     |                |                              |
     |---------------------------------------------> |
     |                | ACK                          |
     |                |                              |
     |                |                              |
     |                | RTP from Joe to Bob          |
     |----------------->---------------------------> |
     |S:10.0.1.1:12   |S:7.1.1.1:227                 |
     |D:4.5.11.3:4444 |D:4.5.11.3:4444               |
     |                |                              |
     |                | RTP from Bob to Joe          |
     |<--------------<-------------------------------|
     |S:4.5.11.3:4444 |              S:4.5.11.3:4444 |
     |D:10.0.1.1:12   |              D:7.1.1.1:227   |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |
     |                |                              |

    Joe               NAT                           Bob


   Figure 2: Bi-directional RTP Flow

     |                          |                    |
     |                          |                    |
     |---------------------------------------------> |
     |                          |  INV sip:bob@Y.com |
     |                          |  either            |
     |                          |  7.1.1.1:88        |
     |                          |                    |
     |                          |                    |
     |<--------------------------------------------- |
     |                          |             200 OK |
     |                          |             active |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |---------------------------------------------> |
     |                          |               ACK  |
     |                          |                    |
     |                          |                    |
     |                          |RTP from Bob to Joe |
     |<----------------<---------------------------< |
     |S:4.5.11.3:654            |    S:10.0.1.1:44   |
     |D:7.1.1.1:88              |    D:7.1.1.1:88    |
     |                          |                    |
     |                          |RTP from Joe to Bob |
     |>-------------->------------------------------>|
     |                          |                    |
     |S:7.1.1.1:88              |      S:7.1.1.1:88  |
     |D:4.5.11.3:654            |      D:10.0.1.1:44 |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |
     |                          |                    |

   Joe                         NAT                  Bob





   Figure 3: Bi-directional RTP Flow, NAT role reversed


   is the only case when one is needed. Our solution will result in the
   invocation of RTP forwarding services by the domain of the called
   party.

   The basic idea behind the solution is this. User agents must be able
   to initiate or terminate bidirectional RTP connections. The calling
   side always indicates support for both. When a proxy for a user in
   some domain receives a call (either to or from that user), that proxy
   accepts the responsibility for setting the direction attribute in the
   SDP in such a way that the client will be able to successfully handle
   media.

   Consider first proxy X, representing Joe. When Joe makes an outgoing
   call, Joe's UA will set the direction attribute in the SDP to "both"
   and include the IP address and port Joe is prepared to receive media
   on. This INVITE is sent to proxy X. Proxy X determines if Joe is
   behind a NAT. This can be done either through configuration (when the
   user signs up, they indicate whether they are behind a NAT or not),
   or through packet inspection: if the source address and port of the
   packet carrying the INVITE do not match the address and port in the
   Via header (especially if the ports don't match), Joe is behind a
   NAT.
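
   The packet-inspection check can be sketched as follows. This is a
   minimal illustration, not a definitive implementation: the names
   Endpoint, parse_sent_by and is_behind_nat are hypothetical, the Via
   sent-by value is assumed to hold a literal IPv4 address (a host name
   would need resolving first), and a missing port is assumed to mean
   5060.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    """An IP address and port pair (hypothetical helper type)."""
    host: str
    port: int


def parse_sent_by(sent_by: str, default_port: int = 5060) -> Endpoint:
    """Split a Via sent-by value such as '10.0.1.1:5060' into host and port.

    Assumption: IPv4 literals only; a missing port defaults to 5060.
    """
    host, sep, port = sent_by.partition(":")
    return Endpoint(host, int(port) if sep else default_port)


def is_behind_nat(packet_source: Endpoint, via_sent_by: str) -> bool:
    """Return True when the observed packet source differs from the Via sent-by."""
    return packet_source != parse_sent_by(via_sent_by)
```

   With the addresses of Figure 2, a request observed from 7.1.1.1:227
   whose Via header names 10.0.1.1 would be classified as coming from
   behind a NAT.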

   If Joe is behind a NAT, proxy X knows that Joe cannot accept
   incoming connections. Thus, Joe cannot actually support a direction
   of "both"; he must be active. Proxy X therefore rewrites the SDP to
   indicate a direction of active. If, for some reason, Joe's UA had set
   the SDP to indicate either active or passive, this can be taken as an
   indicator that Joe knows he is (active) or is not (passive) behind a
   NAT, in which case no action is needed by the proxy.

                      +-------+       +-------+
                      | SIP   |       | SIP   |
                      | Proxy |       | Proxy |
                      |  X    |       |   Y   |
                      |       |       |       |
                      +-------+       +-------+

                         ----
                        /RTP \
                       | Forw.|
                        \    /
                         ----

           +-------+                           +-------+
   ........|FW/NAT |............       ........|FW/NAT |............
   .       |       |           .       .       |       |           .
   .       +-------+           .       .       +-------+           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .                           .       .                           .
   .       +-------+           .       .       +-------+           .
   .       |  Joe  |           .       .       |  Bob  |           .
   .       | SIP UA|           .       .       | SIP UA|           .
   .       +-------+           .       .       +-------+           .
   .............................       .............................

        Enterprise A                          Enterprise B


   Figure 4: RTP Translators


   When the call arrives at proxy Y, proxy Y first determines the call
   routing. If it discovers that the call is to be routed to the called
   party's machine (which it knows based on whether the user registered
   with the Contact cookie), and it determines that the called party is
   behind a NAT (based on the source address of the REGISTER compared to
   the address in the top Via header of the REGISTER), the proxy may
   need to modify the SDP. If the SDP in the incoming INVITE indicates a
   direction of both, it is changed to passive (this way, the called
   party initiates the connection). If the direction is passive, nothing
   is done. If the SDP in the incoming INVITE indicates a direction of
   active, there is a problem: both parties are only capable of
   initiating connections. To handle this, proxy Y needs to involve an
   RTP translator. It allocates two address/port pairs, A and B, from
   the translator. It rewrites the SDP in the INVITE to indicate a
   direction of passive, and sets the IP address and port to A. This
   ensures that the called party initiates an RTP connection out to the
   translator. Similarly, in the SDP in the response, the direction
   (which will be active) is rewritten to passive, and the IP address
   and port are set to B. This ensures that the calling party initiates
   an RTP connection out to the translator. The proxy then tells the
   translator that packets received on A should be relayed to the
   connection on B, and vice versa.
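
   The translator's part in this procedure, handing out A and B and
   remembering that they are bound to each other, can be sketched as
   below. This is a toy model under stated assumptions: the class
   RtpTranslator and its methods are hypothetical, no sockets are
   opened, and ports are simply handed out in steps of two starting at
   an arbitrary number.

```python
import itertools
from typing import Dict, Tuple

Address = Tuple[str, int]


class RtpTranslator:
    """Toy binding table for an RTP translator; it performs no network I/O."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self._public_ip = public_ip
        # Assumption: hand out every second port, starting at first_port.
        self._ports = itertools.count(first_port, 2)
        self._peer: Dict[Address, Address] = {}

    def allocate_pair(self) -> Tuple[Address, Address]:
        """Allocate address/port pairs A and B and bind them to each other."""
        a = (self._public_ip, next(self._ports))
        b = (self._public_ip, next(self._ports))
        self._peer[a] = b
        self._peer[b] = a
        return a, b

    def relay_target(self, received_on: Address) -> Address:
        """Return the local pair whose connection carries packets received here."""
        return self._peer[received_on]
```

   The proxy would place A in the INVITE sent to the called party and B
   in the response returned to the caller; relay_target then tells the
   translator which connection to forward each packet on.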

   The actions at the proxies for incoming and outgoing calls are
   summarized in Table 1.


    Call Direction  SDP direction  rewrite to  note
    Incoming        both           passive
    Incoming        active         passive     introduce RTP translator
    Incoming        passive        -
    Outgoing        both           active
    Outgoing        active         -
    Outgoing        passive        -


   Table 1: Rules for SDP Rewriting
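
   Table 1 can be expressed as a small lookup, sketched here. The names
   REWRITE_RULES and rewrite_direction are hypothetical. The sketch
   assumes, as in the preceding text, that the rules apply only when the
   proxy has determined that its own user is behind a NAT; otherwise the
   SDP is left alone.

```python
from typing import Dict, Optional, Tuple

# (call direction, SDP direction) -> (rewritten direction or None, translator needed)
REWRITE_RULES: Dict[Tuple[str, str], Tuple[Optional[str], bool]] = {
    ("incoming", "both"): ("passive", False),
    ("incoming", "active"): ("passive", True),
    ("incoming", "passive"): (None, False),
    ("outgoing", "both"): ("active", False),
    ("outgoing", "active"): (None, False),
    ("outgoing", "passive"): (None, False),
}


def rewrite_direction(call: str, sdp_direction: str,
                      user_behind_nat: bool) -> Tuple[str, bool]:
    """Return the direction to forward and whether an RTP translator is needed."""
    if not user_behind_nat:
        # The proxy's user can accept connections; nothing to rewrite.
        return sdp_direction, False
    new_direction, needs_translator = REWRITE_RULES[(call, sdp_direction)]
    return (new_direction or sdp_direction), needs_translator
```

   For example, an incoming call carrying "active" toward a user behind
   a NAT yields ("passive", True), the one row that brings in a
   translator.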


   Based on these rules, we can analyze the four cases.

   In case one, neither party is behind a NAT. The caller indicates a
   direction of "both" in the SDP. The local outbound proxy does not
   change that, since it detects that the caller is not behind a NAT.
   The call is forwarded to the proxy for the called party. It doesn't
   modify the SDP either, and forwards the call to the called party. In
   its response, the called party indicates that it can support a
   direction of "both". When the response is delivered to the calling
   party, both sides initiate bidirectional RTP connections to each
   other. One of them is chosen, and is used for media.

   In the second case, the caller is behind a NAT, but the called party
   is not. The caller indicates a direction of "both" in the SDP. The
   local outbound proxy detects that the caller is behind a NAT. It
   therefore modifies the SDP to indicate a direction of "active". The
   call is forwarded to the proxy for the called party. It determines
   that the called party is not behind a NAT. So, it leaves the SDP
   alone. The called party sees that the caller requested the active
   side of the connection. So, in the 200 OK response, the called party
   indicates passive. This 200 OK is forwarded back to the caller. The
   caller initiates a bidirectional RTP connection to the called party,
   which succeeds. The media is sent over that connection.




   In the third case, the caller is not behind a NAT, but the called
   party is. The caller indicates a direction of "both" in the SDP. The
   local outbound proxy does not change that, since it detects that the
   caller is not behind a NAT. The call is forwarded to the proxy for
   the called party. This proxy determines that the called party is
   behind a NAT. It rewrites the direction tag in the SDP in the INVITE
   from "both" to "passive". This is received at the called party. It
   has no choice but to respond with a direction of "active" in its 200
   OK. This is forwarded to the calling party. The called party then
   initiates a bidirectional RTP connection to the caller, which
   succeeds. The media is sent over that connection.

   In the fourth, and worst-case, scenario, both are behind NATs. The
   caller indicates a direction of "both" in the SDP. The local outbound
   proxy detects that the caller is behind a NAT. It therefore modifies
   the SDP to indicate a direction of "active". The call is forwarded to
   the proxy for the called party. This proxy also detects that the
   called party is behind a NAT. However, the SDP indicates a direction
   of "active", which means that neither party can accept an incoming
   connection. The proxy then brings in an RTP translator, and rewrites
   the direction to be "passive". It also sets the c line and m line to
   contain address/port pair A of the translator. This INVITE is
   received at the called party. It has no choice but to respond with a
   direction of "active" in its 200 OK. The 200 OK is received at the
   proxy, which rewrites the direction tag from "active" to "passive".
   It also sets the c line and m line to contain address/port pair B of
   the translator. This 200 OK is received at the calling party. Both
   sides then initiate outbound connections. The caller sends RTP to
   address/port B, and the callee sends RTP to address/port A. The
   translator exchanges media between these two connections.
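
   The four cases can be walked through with a short simulation. This
   is a sketch only: the function negotiate and the keys of its result
   are hypothetical, and the called party is assumed to always answer
   with the complement of the direction it sees in the INVITE.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, bool, List[str]]]


def negotiate(caller_behind_nat: bool, callee_behind_nat: bool) -> Result:
    """Trace the direction attribute through proxy X, proxy Y and the callee."""
    offer = "both"                  # the calling UA always offers "both"
    if caller_behind_nat:           # proxy X, outgoing call
        offer = "active"
    translator = False
    if callee_behind_nat:           # proxy Y, incoming call
        if offer == "active":
            translator = True       # neither side can accept a connection
        offer = "passive"           # the called party must initiate
    # Direction the called party places in its 200 OK (before any rewriting).
    answer = {"both": "both", "active": "passive", "passive": "active"}[offer]
    if translator or answer == "both":
        initiators = ["caller", "callee"]
    elif answer == "active":
        initiators = ["callee"]
    else:
        initiators = ["caller"]
    return {"offer_seen_by_callee": offer, "answer": answer,
            "translator": translator, "initiators": initiators}
```

   Only negotiate(True, True) reports a translator; in that case both
   parties initiate connections, each toward its own side of the
   translator.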

   Either the proxy or the RTP translator can manage the lifecycle of
   the connection binding. If the proxy does it, the proxy must
   record-route. When the call is over (known through the BYE), the
   proxy destroys the connections and connection bindings in the
   translator. If the RTP translator manages the lifecycle, the proxy
   need not ever record-route or maintain call state. When the call is
   over, the caller and callee both disconnect their RTP connections to
   the translator (this is done with an RTCP BYE). When both connections
   disconnect, the translator can destroy the bindings.
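
   The translator-managed variant can be sketched as a table that drops
   a binding once both of its connections have disconnected. The class
   BindingTable, the leg labels "A" and "B", and the binding identifier
   are hypothetical, and both legs are assumed to count as open from the
   moment the binding is created.

```python
from typing import Dict, Set


class BindingTable:
    """Tracks which legs of each translator binding are still open."""

    def __init__(self) -> None:
        self._open_legs: Dict[str, Set[str]] = {}

    def create(self, binding_id: str) -> None:
        """Register a new binding with both legs considered open."""
        self._open_legs[binding_id] = {"A", "B"}

    def leg_closed(self, binding_id: str, leg: str) -> bool:
        """Record a disconnect (e.g. an RTCP BYE) on one leg.

        Returns True once both legs are gone and the binding was destroyed.
        """
        legs = self._open_legs.get(binding_id)
        if legs is None:
            return False            # unknown or already destroyed binding
        legs.discard(leg)
        if legs:
            return False            # the other leg is still connected
        del self._open_legs[binding_id]
        return True

    def active(self) -> int:
        """Number of bindings that still exist."""
        return len(self._open_legs)
```

   Because the table alone decides when to release a binding, the proxy
   in this variant does not need to see the BYE.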

   In cases where there is no RTP translator available, and both parties
   are behind NATs, media cannot flow. In some cases, this will be
   detectable by the called party or their proxy (if the incoming SDP
   has bidirectional media with a direction of active, and the called
   party is behind a NAT, and no translator is available). In this case,
   the called party or proxy responds with a 488 Not Acceptable Here,
   and includes a Warning header with code 308 - NAT Traversal Failure.
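
   The failure check can be sketched as below. The function
   nat_failure_response is hypothetical, and only the status line and
   the Warning header are produced; a real response would also carry
   the Via, To, From, Call-ID and CSeq headers copied from the request.
   The Warning value is assumed to follow the usual layout of code,
   agent, and quoted text.

```python
from typing import Optional


def nat_failure_response(sdp_direction: str, callee_behind_nat: bool,
                         translator_available: bool,
                         warn_agent: str) -> Optional[str]:
    """Return a partial 488 response when media cannot flow, else None."""
    if (sdp_direction == "active" and callee_behind_nat
            and not translator_available):
        return ("SIP/2.0 488 Not Acceptable Here\r\n"
                f'Warning: 308 {warn_agent} "NAT Traversal Failure"\r\n')
    return None
```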

5.2 Firewalls

   Because firewalls restrict connections to outbound only, the same
   problem that plagues NATs also plagues firewalls. The same solution
   as described above can also solve it, with a few minor tweaks. The
   solution in Section 5.1 is defined for UDP. UDP will not work through
   firewalls. Therefore, RTP over TCP or TLS is used instead. In the
   worst case, the RTP would need to be carried over a TLS connection on
   port 443. Besides this difference, the solution for firewalls is the
   same as described for NATs. Note that since SIP may run over TLS to
   port 443 as well, the proxy and the RTP translator should not be on
   the same IP address.

6 Caveats

   There are many caveats with our proposed solutions, especially for
   firewalls.

6.1 NAT Solutions

        o RTP translators are horrible. The author spent much time
          arguing against such devices, on the grounds that the
          underlying IP network already provides routing capabilities,
          and that these do not need to be replicated at the voice
          transport layer. They will increase overall voice latency,
          introduce another point of failure, and incur additional costs
          to providers. However, they are unavoidable given that the
          fundamental semantic of the IP address, that it is a globally
          reachable point for communications, has been violated by NATs.
          Perhaps this argument can be rephrased as, "unreliable and
          delayed communication beats no communication."

        o If the RTP translator is not co-resident with the proxy, some
          kind of control protocol is needed to allocate addresses and
          to establish bindings. No such protocol exists right now. The
          midcom protocol [3] or MGCP [8] might be used for this
          purpose. We expect these translators to be bundled with
          proxies, and thus make use of proprietary protocols initially.

        o It is possible that both caller and called party are behind a
          NAT, but are behind *the same* NAT. In this case, no RTP
          translator is needed. In theory, this case can be hard to
          detect, but in practice, can frequently be determined
          administratively. As an example, a SIP provider might be
          providing centrex types of services to users in a network
          behind a NAT. The proxy providing these services will know
          which users belong to the same enterprise, and it can modify
          its behavior accordingly. Even if the proxy is wrong, the
          worst case is that an RTP translator is involved, increasing
          voice latency.

        o If the calling party is behind a NAT, an RTP connection cannot
          be established until the 200 OK is returned to the caller.
          This means that the post-pickup delay increases by an RTT,
          which introduces additional clipping. This can be solved
          through early media. The SDP is returned in a 183, allowing
          the media connection to be established before the 200 OK.

        o The use of persistent TCP or TLS connections for SIP between
          the user agents and their proxies makes clustering more
          complex. With traditional UDP, a call for some user could
          arrive at any proxy that has access to the location service
          which can route the call to Bob. Not so any longer. With
          persistent connections, the users are partitioned across the
          proxies in a cluster.

6.2 Firewall Solutions

        o Riding on top of port 443 for SIP over TLS goes against the
          principles of the guidelines established by the IESG [9].

        o TLS or TCP will result in very bad voice delays as soon as the
          packet loss is nonzero. Interestingly, with zero packet loss,
          the delays for voice over TCP will be equal to those of voice
          over UDP. Clients will need adaptive voice buffer algorithms
          that can tolerate wide swings in latencies.

        o Current SIP client implementations do not require a TCP stack.
          The firewall solution will require TCP and/or TLS.

        o For firewalls, our approach requires a TLS server process (to
          receive RTP) embedded within a SIP enabled communications
          client. This will require a public/private key and its
          associated certificate, available to the client, issued from a
          Certification Authority (CA) that is known to the other party.
          Similarly, use of a TLS client will require that the client be
          configured with the keys of a set of well known CAs.

   The need for TCP and/or TLS support in softphones can be mitigated by
   deploying UDP to TCP/TLS translation proxies inside of the firewall.

7 Security Considerations

   RTP translators are effectively man-in-the-middle systems. As a
   result, a rogue proxy and RTP translator can listen in on the media
   of all users initiating calls through it. To prevent this, clients
   initiating TLS connections to a server should verify that the server
   name in the SDP is a subdomain of the name presented in the
   certificate. Furthermore, the client should only connect to servers
   whose domains are subdomains of their service provider, or the
   provider of the other party in the call.
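
   The two name checks can be sketched as follows. The functions
   is_subdomain and may_connect are hypothetical, a name is treated as
   a subdomain of itself, and the sketch covers only the name
   comparison, not certificate validation.

```python
from typing import Iterable


def is_subdomain(name: str, parent: str) -> bool:
    """True if name equals parent or lies beneath it, compared label by label."""
    name_labels = name.lower().rstrip(".").split(".")
    parent_labels = parent.lower().rstrip(".").split(".")
    return (len(name_labels) >= len(parent_labels)
            and name_labels[-len(parent_labels):] == parent_labels)


def may_connect(sdp_server: str, certificate_name: str,
                provider_domains: Iterable[str]) -> bool:
    """Apply both checks: the certificate name, then the provider domains."""
    if not is_subdomain(sdp_server, certificate_name):
        return False
    return any(is_subdomain(sdp_server, d) for d in provider_domains)
```

   Comparing whole labels rather than string suffixes avoids accepting
   a name such as badx.example as a subdomain of x.example.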

8 Conclusion

   In this draft, we have proposed some modifications to SIP operation
   which allow it to successfully pass through NATs and firewalls. We
   believe our NAT solution is very workable. It has minimal impact on
   clients, allows voice to run over UDP, and uses direct UDP transport
   in all but the worst case. Our solutions for firewalls are less
   palatable. The ideal solution is for firewall administrators to allow
   SIP (over TCP on 5060 or TLS on 5061) out through the firewall, and
   to eventually deploy ALGs, preferably using the midcom architecture.

   We believe that solving the firewall and NAT problems is critical
   for deployment of SIP.

9 Acknowledgements

   We would like to thank Jeffrey Citron and John Butz from Vonage for
   their efforts at verifying UDP NAT capabilities in existing
   commercial products.

10 Authors' Addresses


   Jonathan Rosenberg
   dynamicsoft
   72 Eagle Rock Avenue
   First Floor
   East Hanover, NJ 07936
   email: jdrosen@dynamicsoft.com

   Henning Schulzrinne
   Columbia University
   M/S 0401
   1214 Amsterdam Ave.
   New York, NY 10027-7003
   email: schulzrinne@cs.columbia.edu







11 Bibliography

   [1] M. Holdrege and P. Srisuresh, "Protocol complications with the IP
   network address translator (NAT)," Internet Draft, Internet
   Engineering Task Force, Oct. 2000.  Work in progress.

   [2] J. Rosenberg, D. Drew, and H. Schulzrinne, "Getting SIP through
   firewalls and NATs," Internet Draft, Internet Engineering Task Force,
   Feb. 2000.  Work in progress.

   [3] P. Srisuresh, J. Kuthan, and J. Rosenberg, "Middlebox
   communication architecture and framework," Internet Draft, Internet
   Engineering Task Force, Feb. 2001.  Work in progress.

   [4] P. Srisuresh and M. Holdrege, "IP network address translator
   (NAT) terminology and considerations," Request for Comments 2663,
   Internet Engineering Task Force, Aug. 1999.

   [5] E. Rescorla, "HTTP over TLS," Request for Comments 2818, Internet
   Engineering Task Force, May 2000.

   [6] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, "SIP:
   session initiation protocol," Request for Comments 2543, Internet
   Engineering Task Force, Mar. 1999.

   [7] D. Yon, "TCP-Based media transport in SDP," Internet Draft,
   Internet Engineering Task Force, Nov. 2000.  Work in progress.

   [8] M. Arango, A. Dugan, I. Elliott, C. Huitema, and S. Pickett,
   "Media gateway control protocol (MGCP) version 1.0," Request for
   Comments 2705, Internet Engineering Task Force, Oct. 1999.

   [9] K. Moore, "On the use of HTTP as a substrate for other
   protocols," Internet Draft, Internet Engineering Task Force, Oct.
   2000.  Work in progress.















