Network Working Group O. Maennel
Internet-Draft T-Labs/TU-Berlin
Intended status: Standards Track R. Bush
Expires: April 30, 2009 Internet Initiative Japan
L. Cittadini
Universita' Roma Tre
S. Bellovin
Columbia University
October 27, 2008
The A+P Approach to the Broadband Provider IPv4 Address Shortage
draft-ymbk-aplusp-00
Status of this Memo
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
This document may not be modified, and derivative works of it may not
be created.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 30, 2009.
Abstract
We are facing the exhaustion of the IANA IPv4 free IP address pool.
Unfortunately, IPv6 is not yet deployed widely enough to fully
replace IPv4, and it is unrealistic to expect that this is going to
change before we run out of IPv4 addresses. Letting hosts seamlessly
communicate in an IPv4-world without assigning a unique globally
Maennel, et al. Expires April 30, 2009 [Page 1]
Internet-Draft A+P Addressing Extension October 2008
routable IPv4 address to each of them is a challenging problem, for
which many solutions have been proposed. Some prominent ones involve
carrier-grade NATs (CGNs), which have been shown to provide an
inadequate experience to IPv4 users and enshrine a walled garden in
the core of the provider. Instead, we propose using specialized NATs
at the customer premises equipment (CPE) edge which treat some of the
port number bits as part of an extended IPv4 address.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Table of Contents
1. Introduction
1.1. Why Carrier-Grade-NATs are Harmful
1.2. Security of CGNs
2. Proposed Solution
2.1. Changes Required to the Network
2.1.1. Changes Required to CPE
2.1.2. Changes to Customer-Provided NAT
2.1.3. Changes to Provider-Edge Routers
2.1.4. Changes to Provider Border Routers
2.1.5. Changes to Network Core Routers
3. Implementation
3.1. A+P dual-stack
3.2. Design of the A+P NAT Device
3.3. IPv6 and mixed V4-V6 traffic
3.4. Handling ICMP
3.5. Handling IP fragments
3.6. The incremental path to A+P
4. Benefits and limitations of A+P
5. IANA Considerations
6. Security Considerations
7. Acknowledgements
8. References
8.1. Normative References
8.2. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements
1. Introduction
Many large Internet Service Providers (ISPs) face the problem that
their networks' customer edges are so large that, even giving the
'front' of each customer premises equipment (CPE) only a single
IPv4 address, they need two to five /8s of IPv4 space. The looming
exhaustion of the free IANA IPv4 pool makes it highly unlikely that
they would be allocated that much public IPv4 address space.
Therefore ISPs have to devise something more ingenious. Deploying
NATs is a direct consequence of the design of a new protocol (IPv6)
which is incompatible with IPv4 on the wire; there is not the
slightest compatibility mode. Although undesirable, NATs are
inevitable.
An approach which some broadband providers are testing is called
Carrier Grade NAT (CGN). It essentially consists of a number of IPv4
NATs in the core of their networks, combined with various tunneling
and translation techniques. If the CPE is dual-stack, traffic where
both source and destination are IPv6 would not have to be NATted, but
IPv4 would be heavily NATted. We can contrast this to, for example,
NAT-PT [RFC2766] [RFC4966] on the CPE, which would probably scale to
the needs of even a large non-consumer backbone. But, as we noted
above, very large broadband consumer providers would need far too
much IPv4 space for the NAT-PT front ends for their large consumer
networks.
Our main concern is that the imminent IPv4 address exhaustion is
tempting operators to deploy technology which is damaging to the
Internet as a whole.
1.1. Why Carrier-Grade-NATs are Harmful
We have taken up a desperate search for alternatives. The reasons
are simple:
"Carrier grade" is a euphemism for centralized. More semantics move
to the core of the network. This is bad in and of itself. Net-heads
call it "telco-think" because it is the telco model of smarts in the
core as opposed to the Internet model of a simple,
just-forward-packets core and smart edges. It also places the
provider in the position of a walled garden, where the user is
trapped behind unchangeable applications and policies, the opposite
of the "end to end" model of the Internet.
With the smarts at the edges, e.g. NAT-PT, one can easily field new
protocols between consenting end-points by just tweaking the NATs at
the corresponding CPE, even adding application layer gateways (ALGs)
if they are needed. CGNs, however, do not build an Internet walled
garden at the edges; they build it by restricting the core.
With NAT in the core, if a customer wants a new application protocol
which requires cooperation from the NAT, he gets to beg help from the
broadband providers' engineers and lawyers, and all other users of
carrier grade NATs. This is the ultimate horror the NAT-haters fear,
and, in this case, they are not all that wrong.
One broadband provider has recently received a lot of bad press for
just this, though we know that the engineers are very far from those
responsible. This shows that all new application protocols have to
go through the carrier loving lawyers to be allowed to be handled by
the NATs in their core. The impact of today's NATs is typically
mitigated by ALGs over which the customer has some degree of control,
e.g. port forwarding or UPnP. However, this is not expected to work
anymore with CGNs. CGN proposals admit that it is not expected that
applications that require specific port assignment or port mapping
from the NAT box will keep working
[I-D.durand-softwire-dual-stack-lite]. We believe this is not an
option and that the end-user must have the ability to control their
own ALGs. So, if someone wants to deploy a new application, they can
talk to the broadband providers' lawyers or run new disruptive
technology over HTTP; we pick our poison. And if the NAT is not
where the customer can directly control it, i.e. it is anywhere back
in the provider's network, then the provider controls what the user
can control, i.e. it is not really under user control. We do not
wish to deal with the case where the provider has to decide whether
to allow Skype v42 when they themselves provide a competing VoIP
product.
And remember that, as IPv6 deploys and we want to have one Internet,
i.e. IPv4 nodes talking freely with IPv6 nodes, translation must be
done somewhere. The challenge is whether someone can figure out a
scheme where it is done for these large networks. We believe it
should be at the customer edge, not in the core.
Another issue with CGNs is scalability. ISPs face a tension between
placing CGNs deep within their network, to aggregate as much traffic
as possible, and the massive state problem that too much aggregation
creates. To reduce the state, the placement ends up somewhere closer
to the edge, where the benefits of aggregation are somewhat limited.
It is not clear how a CGN should maintain per-session state in a
scalable manner. This is particularly relevant given that each
customer is very likely to open many TCP connections in parallel.
State for improperly terminated sessions could remain stale for some
time. The CGN hence trades scalability for the amount of state that
needs to be kept, and this makes optimally placing a CGN a hard
engineering problem.
With CGNs, tracing hackers, spammers and other criminals will be
impossible unless all the connection-based mapping information is
recorded and stored. This would cause concern not only for law
enforcement services, but also for privacy advocates. This brings
us to the other security-related problems with CGNs, discussed in the
next section.
1.2. Security of CGNs
NATs frequently need to initiate translation for secondary port
numbers. This may be a decision based on packet inspection (e.g.,
looking for PORT commands in FTP [RFC0959] sessions), or it may rely
on explicit signaling from the end host via protocols such as UPnP.
Either way, CGNs pose a security threat and/or an administrative
nightmare.
The issue is proper authentication of such requests. Most UPnP
devices do not implement appropriate security features. Even if they
did, there would be no way to administer the security mechanism.
Every end-user device would have to have a secret corresponding to
some authentication field in the CGN. End users will not set these
up properly; providers do not want to maintain such a database.
Decisions made based on packet inspection are just as problematic. A
packet from one customer could easily request opening a port for
another customer's address, similar to the Java-based attack
described by Martin et al. in [Martin-Java].
2. Proposed Solution
The specific problem we are facing is that available IPv4 address
space is insufficient to number the IPv4-speaking customers, while
IPv6 is not widely enough deployed to migrate to an IPv6-only world.
Therefore, we propose to extend the IPv4 address space by assigning
to each customer a single IPv4 address which is extended by
"stealing" bits from the port number in the TCP/UDP header, leaving
the applications a reduced range of ports. In the face of IPv4
address exhaustion, the need for addresses is stronger than the need
to be able to address thousands of applications on a single host
[SP-NAT], and broadband consumers are not anticipated to deploy a
massive number of applications over IPv4 (if they did, CGN would be
even more damaging than this "bit-stealing" proposal). Assuming we
could limit the applications' port addressing to 8 (or 12) bits, we
can increase the effective size of an IPv4 address by 8 (or 4)
additional bits. In this scenario, 256 (or 16) customers could be
multiplexed on the same IPv4 address, while allowing them a fixed
range of 256 (or 4096) ports. We call this "extended addressing" or
"A+P" (Address Plus Port) addressing.
2.1. Changes Required to the Network
The devices involved in this approach are as follows:
1. Customer Premises Equipment (CPE), i.e. cable/DSL modem
2. Customer-Provided-NAT (CN), (optional)
3. Provider Edge Router (PE), AKA customer aggregation router
4. Provider Border Router (BR), provider's edge to other providers
5. Network Core Routers (Core), provider routers not PE or BR
2.1.1. Changes Required to CPE
As the customer's hosts should be unaware of the restricted range of
ports and the extended A+P addressing scheme, translation would be
done at the border between the customer and the provider. In the
most common case, this is the provider provisioned cable or DSL modem
on the customer's premises into which the customer plugs their single
computer or a LAN. This CPE would be aware of the A+P extended
addressing. It could learn this, for example, via a vendor or other
extension to DHCP. The CPE would also provide the A+P NAT function
between the customer's LAN and the provider.
This would require modification of current CPE. However, current CGN
approaches require modifications to the CPE as well, for example
[I-D.durand-softwire-dual-stack-lite] says, "It is expected that the
home gateway is either software upgradable, replaceable or provided
by the service provider as part of a new contract."
The customer premises equipment would be configured, hopefully
automatically, with
o IPv4 and/or IPv6 addressing for the customer's LAN
o The IPv4 A+P extended address for the WAN side to connect to the
provider,
o An IPv6 address for the WAN side to connect to the provider, and
o The range of port numbers to use on the WAN side.
2.1.2. Changes to Customer-Provided NAT
Alternatively, as occasionally happens today, the customer could
provide its own A+P NAT and the CPE would then be configured as a
simple cable/DSL modem. This customer A+P NAT would be configured
with the IPv4 address and port-range allocated to the customer (e.g.,
via extended DHCP).
The customer NAT is entirely optional. The customer does not have to
operate such a device. If they do not, then the provider installed
CPE handles the mappings. A mixture of CPE and CN device is also
possible, where the customer gets full control over the CPE via an
administrative login. In this draft, we write CPE/CN to denote the
device the customer has control over (i.e., either a CPE with
administrative control, or an "A+P-aware CN").
2.1.3. Changes to Provider-Edge Routers
Ultimately, we expect that all CPE/CNs take on the functionality of
the A+P gateway. Then the provider's customer aggregation router (aka
PE) might only perform some security-related functions, e.g., ensure
that a CPE/CN does not send packets from ports outside its allocated
port range, as the replies would in turn go back to some other host.
This threat is comparable to IP source address spoofing.
During a transition phase, however, customers with legacy CPE could
have the A+P gateway-functionality provided by the PE. If we assume
only layer 2 devices which connect directly to an interface of the
PE, there should be no problems for the customer to be unaware of the
restricted port range. Unfortunately, this comes very close to the
walled garden effect that a CGN would cause.
However, one important difference applies: customers who wish to
"escape" from the walled garden can run their own upgraded CN. This
way customers become aware of which ports will be A+P NATted and
which will not, so they have control over their own applications with
no need to interact with the ISP (e.g., there's no need for UPnP
equivalents).
2.1.4. Changes to Provider Border Routers
Routers at the provider's edge which face other providers need to be
aware of the extended A+P IPv4 addresses. They must have the ability
to forward packets to the PE based on IPv4 address and port.
We suggest that the provider network use IPv6 as the tunneling
mechanism. The CPE/CN or PE routers would encapsulate the A+P pseudo
address within an IPv6 address using a well-known IPv6 prefix. Then
the core would route on the IPv6 address. The border routers would
recognize the well-known IPv6 prefix, decapsulate the inner IPv4
packet, and route normally on the IPv4 address. Thus the provider's
network could run IPv6 only, or any other layer 3/2.5 protocol.
2.1.5. Changes to Network Core Routers
If transport through the provider is chosen appropriately, e.g.
IPv4-in-IPv6-encapsulation, the network's core routers need not
understand A+P extended IPv4 addressing at all. Routing through the
core without some form of tunneling would require the deployment of
IPv4-A+P all the way to the PE routers. As the original problem was
insufficient IPv4 space, we assume that IPv6 or other non-IPv4
tunneling will be used.
However, while we recommend IPv6, we acknowledge that A+P is the
natural extension of IPv4, and should work seamlessly. In an
IPv4-only (or dual-stacked) network, we propose to host only
unsplit (full) IPv4 addresses on the PE. In this case no
modifications are needed to allow routing of /32-or-longer
prefixes, and forwarding will work with legacy equipment. Only the PE
would have to be upgraded to A+P-awareness.
3. Implementation
3.1. A+P dual-stack
There's wide consensus that the only long term solution to the IPv4
address shortage is speeding the deployment of IPv6. Hence, we argue
that the main design requirement for any short term solution is to
ease, or at least not hamper, ISP-wide IPv6 deployment. A+P
addressing enables ISPs to run an IPv6-only core with dual-stack
devices at the edge. In fact, the A+P CPE/CN and the BR are the only
devices that need to support dual-stack. A+P addressing requires
those devices to be assigned IPv6 addresses belonging to an ISP-wide
well-known prefix (WKP), which only needs to be routable within the
ISP. The CPE/CN learns both WKP and its A+P address and port range
(e.g., via DHCP), and configures its WAN interface accordingly.
Figure 1 shows an example of how WKP and A+P are combined to obtain
an IPv6 address at the CPE/CN.
Configuration (e.g., from DHCP):
--------------------------------
WKP = 4999::/64            (64 bits)
A   = 12.0.0.1             (32 bits)
P   = ports 4096 to 8191
Port bits usage:
--------------------------------
P  = Pa + Pp   (16 bit port field in TCP header)
Pa = address extension       (4 bits)
Pp = restricted port number  (12 bits)
from 0001000000000000 (4096) to 0001111111111111 (8191)
     \__/\__________/
    /                \
   /                  \
+------------+  +---------------+
| part of A+P|  | spare bits for|
| address    |  | port number   |
| (4 bits)   |  | (12 bits)     |
+------------+  +---------------+
IPv6 prefix:
--------------------------------
4999:0:0:0 : 0c00:0001 : 1000 :: /100
\________/   \___________/ \__/
   WKP        A+P address        (64+32+4 bits)
Building an IPv6 prefix from Well Known Prefix and A+P address
Figure 1
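The construction in Figure 1 can be sketched in a few lines of Python.
This is an illustration only, not a normative encoding: the function
name aplusp_prefix is ours, and we assume that the port range is a
power-of-two block aligned on its own size, so that it corresponds to
a fixed set of high-order port bits.

```python
import ipaddress


def aplusp_prefix(wkp: str, a: str, port_lo: int, port_hi: int) -> ipaddress.IPv6Network:
    """Combine a /64 well-known prefix, an IPv4 address A, and a port
    range into the IPv6 prefix announced for one customer."""
    wkp_net = ipaddress.IPv6Network(wkp)
    if wkp_net.prefixlen != 64:
        raise ValueError("this sketch assumes a 64-bit well-known prefix")
    size = port_hi - port_lo + 1
    # Assumption: the range is an aligned power-of-two block of ports.
    if (port_lo < 0 or port_hi > 0xFFFF or size <= 0
            or size & (size - 1) or port_lo % size):
        raise ValueError("port range must be an aligned power-of-two block")
    port_bits = size.bit_length() - 1   # bits left to the customer (Pp)
    ext_bits = 16 - port_bits           # bits used as address extension (Pa)
    v4 = int(ipaddress.IPv4Address(a))
    # Layout: 64 bits WKP | 32 bits A | 16 bits port | 16 bits zero.
    value = int(wkp_net.network_address) | (v4 << 32) | (port_lo << 16)
    return ipaddress.IPv6Network((value, 64 + 32 + ext_bits))


# The configuration of Figure 1 yields a /100 prefix.
print(aplusp_prefix("4999::/64", "12.0.0.1", 4096, 8191))
```

With the values of Figure 1 the prefix length is 64+32+4 = 100 bits,
and the printed network is the compressed form of
4999:0:0:0:0c00:0001:1000::/100.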
This prefix is announced by the PE in the internal routing of the
provider, either IGP or iBGP depending on the provider's routing
philosophy. Those prefixes are expected to be highly aggregatable,
so they can be announced with very little impact on the routing
table size in the ISP core network.
Packet delivery works as follows. We first describe how a packet is
transmitted from an A+P end-user device behind a CPE/CN towards
the legacy Internet, and then the opposite direction. In the
following examples, we assume that the end-user host is not A+P-
aware. Hence, port numbers are A+P NATted at the CPE/CN. The CPE/CN
receives an IPv4 packet from the customer to a destination address
V4D, ensures that the source port falls into the configured port
range, and then encapsulates the packet in an IPv6 packet where the
source address is WKP+A+P, and the destination address is WKP+V4D.
The packet is then routed using standard routing in the ISP core, up
to the provider's BR. Note that there is no preconfigured tunnel
between the CPE/CN and the BR, and the packet is routed based on the
destination address, rather than a predetermined endpoint. When the
BR receives the packet, it decapsulates the IPv4 packet, in which the
source is A and the destination is V4D. Figure 2 illustrates routing
of outgoing packets. Observe that the source port does not initially
fall in the configured range (datagram 1), so it is translated at the
CPE/CN (datagram 2).
                  +-----------+
                  |   Host    |
                  +-----+-----+
                      | |12.0.0.1 (ports 4096 to 8191)
      IPv4 datagram 1 | |
                      | |
                      v |
              +---------|---------+
              |CPE/CN   |         |
              +--------|||--------+
                     | |||4999:0:0:0:0c00:0001:1000::/100
      IPv6 datagram 2| |||
                     | |||<-IPv4-in-IPv6
                     | |||
                -----|-|||-------
              /      | |||        \
             |     ISP network     |
              \      | |||        /
                -----|-|||-------
                     | |||
                     v |||
              +--------|||--------+
              |BR      |||        |
              +---------|---------+
                     |  |
     IPv4 datagram 3 |  |
                -----|--|--------
              /      |  |         \
             |      Internet       |
              \      |  |         /
                -----|--|--------
                     |  |
                     v  |128.0.0.1
                  +-----+-----+
                  | IPv4 Host |
                  +-----------+
Figure 2: Routing of Outgoing Packets
+-----------------+--------------+-----------------------------+
| Datagram        | Header field | Contents                    |
+-----------------+--------------+-----------------------------+
| IPv4 datagram 1 | IPv4 Dst     | 128.0.0.1                   |
|                 | IPv4 Src     | 12.0.0.1                    |
|                 | TCP Dst      | 80                          |
|                 | TCP Src      | 32000                       |
| --------------- | ------------ | --------------------------- |
| IPv6 datagram 2 | IPv6 Dst     | 4999:0:0:0:128.0.0.1::      |
|                 | IPv6 Src     | 4999:0:0:0:0c00:0001:1001:: |
|                 | IPv4 Dst     | 128.0.0.1                   |
|                 | IPv4 Src     | 12.0.0.1                    |
|                 | TCP Dst      | 80                          |
|                 | TCP Src      | 4097                        |
| --------------- | ------------ | --------------------------- |
| IPv4 datagram 3 | IPv4 Dst     | 128.0.0.1                   |
|                 | IPv4 Src     | 12.0.0.1                    |
|                 | TCP Dst      | 80                          |
|                 | TCP Src      | 4097                        |
+-----------------+--------------+-----------------------------+
Datagram header contents
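The outgoing path can be sketched as follows. This is a toy model
under stated assumptions, not a description of any real CPE: the
names CpeNat, wkp_address and encapsulate are hypothetical, the
well-known prefix is the example value 4999::/64, and the NAT simply
hands out the lowest free port of the assigned range (the table above
happens to show port 4097; any free port in the range would do).

```python
import ipaddress

# Example well-known prefix from Figure 1 (an assumption of this sketch).
WKP = int(ipaddress.IPv6Address("4999::"))


def wkp_address(v4: str, port: int = 0) -> ipaddress.IPv6Address:
    """Embed an IPv4 address, and optionally a 16-bit port, after the WKP."""
    return ipaddress.IPv6Address(
        WKP | (int(ipaddress.IPv4Address(v4)) << 32) | (port << 16)
    )


class CpeNat:
    """Toy port translator mapping host source ports into the assigned range."""

    def __init__(self, port_lo: int, port_hi: int):
        # Reverse order so that pop() yields the lowest free port first.
        self.free = list(range(port_hi, port_lo - 1, -1))
        self.out = {}   # (host_ip, host_port) -> external port
        self.back = {}  # external port -> (host_ip, host_port)

    def translate(self, host_ip: str, host_port: int) -> int:
        key = (host_ip, host_port)
        if key not in self.out:
            if not self.free:
                raise RuntimeError("assigned port range exhausted")
            ext = self.free.pop()
            self.out[key] = ext
            self.back[ext] = key
        return self.out[key]


def encapsulate(nat: CpeNat, a: str, host_ip: str, src_port: int, v4d: str):
    """Return (IPv6 source, IPv6 destination, translated source port)."""
    ext_port = nat.translate(host_ip, src_port)
    # Source is WKP+A+P, destination is WKP+V4D.
    return wkp_address(a, ext_port), wkp_address(v4d), ext_port


nat = CpeNat(4096, 8191)
print(encapsulate(nat, "12.0.0.1", "12.0.0.1", 32000, "128.0.0.1"))
```

Note that the destination address is derived from V4D alone, which is
why no preconfigured tunnel endpoint is needed.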
An incoming packet undergoes the reverse process. When a BR receives
an IPv4 packet on an external interface, it extracts the address and
port and then uses that information to build a WKP+A+P IPv6
destination address. The packet is then routed in the ISP core to
the user's CPE/CN, which is then able to decapsulate the IPv4 packet
where the destination is simply A. Note that the packet processing at
the BR is completely stateless, since there's no need to know how
many bits of the port are "stolen" by the address. The longest
prefix rule will just deliver the packet to the corresponding CPE.
All the state is kept at the CPE/CN, i.e. at the edge. Figure 3 shows
how an incoming packet is routed. Observe that the port translation
at the CPE/CN (datagram 3) only happens if the CPE/CN has a
preexistent mapping. Otherwise, the port number is left untouched.
Overall, this approach brings two major advantages over CGNs: (i)
there are no scalability issues, and (ii) it allows a customer to be
contacted on the restricted port range with no extra signaling.
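A minimal sketch of the stateless processing at the BR follows. The
routing table is hypothetical: it assumes two customers sharing
12.0.0.1 with a different number of address extension bits, to show
that the BR never needs to know how many port bits are used; the
longest prefix match in the core resolves this. The function names
are ours.

```python
import ipaddress

# Example well-known prefix from Figure 1 (an assumption of this sketch).
WKP = int(ipaddress.IPv6Address("4999::"))


def br_destination(dst_v4: str, dst_port: int) -> ipaddress.IPv6Address:
    """Stateless mapping at the BR: WKP + IPv4 address + full 16-bit port."""
    return ipaddress.IPv6Address(
        WKP | (int(ipaddress.IPv4Address(dst_v4)) << 32) | (dst_port << 16)
    )


def longest_prefix_match(dst, routes):
    """Return the next hop of the most specific prefix containing dst."""
    matches = [net for net in routes if dst in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical prefixes announced by the PEs for two customers.
routes = {
    # 4 extension bits: ports 4096 to 8191
    ipaddress.IPv6Network("4999:0:0:0:c00:1:1000:0/100"): "CPE-1",
    # 8 extension bits: ports 8192 to 8447
    ipaddress.IPv6Network("4999:0:0:0:c00:1:2000:0/104"): "CPE-2",
}

for port in (4097, 8200, 80):
    print(port, longest_prefix_match(br_destination("12.0.0.1", port), routes))
```

A packet to a port that no customer holds (port 80 in this example)
matches no prefix and is not delivered to any CPE/CN.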
                  +-----------+
                  |   Host    |
                  +-----+-----+
                      ^ |12.0.0.1 (ports 4096 to 8191)
      IPv4 datagram 3 | |
                      | |
                      | |
              +---------|---------+
              |CPE/CN   |         |
              +--------|||--------+
                     ^ |||4999:0:0:0:0c00:0001:1000::/100
      IPv6 datagram 2| |||
                     | |||<-IPv4-in-IPv6
                     | |||
                -----|-|||-------
              /      | |||        \
             |     ISP network     |
              \      | |||        /
                -----|-|||-------
                     | |||
                     | |||
              +--------|||--------+
              |BR      |||        |
              +---------|---------+
                     ^  |
     IPv4 datagram 1 |  |
                -----|--|--------
              /      |  |         \
             |      Internet       |
              \      |  |         /
                -----|--|--------
                     |  |
                     |  |128.0.0.1
                  +-----+-----+
                  | IPv4 Host |
                  +-----------+
Figure 3: Routing of Incoming Packets
+-----------------+--------------+-----------------------------+
| Datagram        | Header field | Contents                    |
+-----------------+--------------+-----------------------------+
| IPv4 datagram 1 | IPv4 Dst     | 12.0.0.1                    |
|                 | IPv4 Src     | 128.0.0.1                   |
|                 | TCP Dst      | 4097                        |
|                 | TCP Src      | 80                          |
| --------------- | ------------ | --------------------------- |
| IPv6 datagram 2 | IPv6 Dst     | 4999:0:0:0:0c00:0001:1001:: |
|                 | IPv6 Src     | 4999:0:0:0:128.0.0.1::      |
|                 | IPv4 Dst     | 12.0.0.1                    |
|                 | IPv4 Src     | 128.0.0.1                   |
|                 | TCP Dst      | 4097                        |
|                 | TCP Src      | 80                          |
| --------------- | ------------ | --------------------------- |
| IPv4 datagram 3 | IPv4 Dst     | 12.0.0.1                    |
|                 | IPv4 Src     | 128.0.0.1                   |
|                 | TCP Dst      | 32000                       |
|                 | TCP Src      | 80                          |
+-----------------+--------------+-----------------------------+
Datagram header contents
3.2. Design of the A+P NAT Device
There are a number of delicate design choices for the A+P NAT device.
We present our preferred solution here.
Legacy hosts would send IPv4 packets from any port(s). We are not
expecting to change end-hosts; therefore we require some kind of NAT.
However, one of our basic assumptions is that the customer wants to
be able to run their own servers and NATs. This leads to several
constraints:
1) We want to enforce the analog of BCP 38 [BCP38]. This means
that no packets outside of the assigned address and port
number range should leave the PE for the network.
2) We want minimal configuration. There should be no need for
the customer to tell the ISP that they have purchased an A+P-
grade home NAT.
3) We must support unmodified computers and NATs.
4) We want the A+P gateway (i.e., CPE) to be as accommodating as
possible to strange protocols it knows nothing about. It may
do its own packet snooping and/or ALGs for things it knows
about (e.g., FTP, SIP, Skype), but should leave it to the CN
to handle obscure/unknown protocols (e.g., gaming).
5) Conversely, if the customer's CN has done some translation,
those packets should not be re-translated.
These principles lead us to the following design:
1) The PE should discard any outbound packet that does not
originate from the proper A+P address. (Constraint 1)
2) An A+P gateway (i.e., CPE, CN, or both) should include some
option in the DHCP request message, to inform the PE router
of its abilities. (Constraint 2)
3) If no A+P signaling was done (i.e., neither CPE nor CN
support A+P), the PE router should perform NATting, including
whatever ALG functions it can, or an unrestricted IPv4
address has to be provided. (Constraints 3 and 4)
4) The PE router should not modify any A+P packets from the
proper address and port range. (Constraints 4 and 5)
Note that a customer with no CN or with a non-A+P CN may emit packets
within the proper port range by accident, thus accidentally violating
part of point 4 above. We solve that by DHCP-based signaling from
the A+P gateway: the A+P option in the DHCP request tells the PE that
a customer-provided CN will do all NATting according to this design.
In that case, the primary function of the PE router is to enforce
restrictions on port numbers in outbound packets.
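The decision the PE takes for an outbound packet, following the
design points and the DHCP-based signaling described above, could be
sketched like this. The Assignment structure and the returned action
strings are hypothetical names introduced for illustration; the
alternative of handing an unrestricted IPv4 address to a legacy
customer is not modeled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """A+P allocation for one customer line (hypothetical structure)."""
    address: ipaddress.IPv4Address
    port_lo: int
    port_hi: int
    aplusp_signaled: bool  # True if the CPE/CN sent the A+P DHCP option


def pe_outbound_action(assign: Assignment, src_ip: str, src_port: int) -> str:
    """Return "forward", "discard" or "nat" for an outbound packet."""
    in_range = (
        ipaddress.IPv4Address(src_ip) == assign.address
        and assign.port_lo <= src_port <= assign.port_hi
    )
    if assign.aplusp_signaled:
        # Design points 1 and 4: never modify packets from the proper
        # address and port range, and drop everything else.
        return "forward" if in_range else "discard"
    # Design point 3: no A+P signaling was done, so the PE has to NAT,
    # even if the source port falls into the range by accident.
    return "nat"


customer = Assignment(ipaddress.IPv4Address("12.0.0.1"), 4096, 8191, True)
print(pe_outbound_action(customer, "12.0.0.1", 4097))
print(pe_outbound_action(customer, "12.0.0.1", 32000))
```

Keying the behavior on the DHCP option rather than on the observed
port is what prevents a legacy device that emits an in-range port by
accident from bypassing translation.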
We leave unspecified for now the question of how large a port number
range is allocated to each customer. We anticipate that the
allocation available to a customer will be determined by ISP-specific
policy, perhaps as a function of the fee charged to the customer. If
variable allocations are to be supported, i.e., the ability for a
customer to request more port numbers (and hence more possible
simultaneous connections) at one time and fewer at another, the
natural way to signal this is in the DHCP A+P request option.
However, there is a tradeoff between the advantages of efficiently
managing the extended address space via dynamic and/or variable
allocation, and the cost it brings in terms of additional complexity.
A simple DHCP release/request cycle could be used, but if the proper
adjacent block of port numbers was not available, this would entail
tearing down existing connections or reNATting them. The
disadvantages of the former are obvious; adopting the latter approach
would bring back all of the disadvantages this scheme is intended to
avoid. One possible answer is to allocate ranges of IPs with a
static assigned port-range. For example the ISP could offer "classes
of service", e.g., the first block of IPs offer 4096 ports, the
second class offers 512 ports, the third class offers 16 ports. If
the customer wants more ports, the address needs to be moved into a
different class. Obviously, this entails a service
interruption for this particular customer (i.e., the customer has to
get a new IP). However, it solves the problem of dynamic
allocation for the ISP. We leave details of this issue for future
work.
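The relation between the size of a customer's port range and the
number of customers that can share one IPv4 address is simple
arithmetic on the 16-bit port field. A small sketch, using the
example classes above (the function name is ours):

```python
def sharing_ratio(ports_per_customer: int) -> tuple:
    """Return (customers per IPv4 address, address extension bits)
    for a given fixed port-range size."""
    total = 1 << 16  # size of the TCP/UDP port space
    if ports_per_customer <= 0 or total % ports_per_customer:
        # Only sizes that divide 65536, i.e. powers of two, correspond
        # to a whole number of "stolen" port bits.
        raise ValueError("port-range size must be a power of two <= 65536")
    customers = total // ports_per_customer
    return customers, customers.bit_length() - 1


for ports in (4096, 512, 16):
    customers, bits = sharing_ratio(ports)
    print(f"{ports} ports per customer: {customers} customers per address, "
          f"{bits} extension bits")
```

The smaller the port range of a class, the more customers fit behind
each address of that class.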
3.3. IPv6 and mixed V4-V6 traffic
Note that if IPv4/IPv6 dual stack is provided on the customer's LAN,
IPv6 to IPv6 destinations would be transported untranslated from
the customer's host to the provider's border with other providers.
If the customer has an IPv6-only LAN, then the device providing A+P
translation should also provide NAT-PT service so that the customer
could communicate with the IPv4 Internet.
3.4. Handling ICMP
ICMP is problematic for all NATs, because it lacks port numbers. A+P
routing exacerbates the problem.
Most ICMP messages fall into one of two categories: error reports, or
ECHO/ECHO reply (commonly known as "ping"). For error reports, the
offending packet header is embedded within the ICMP packet; NATs can
then rewrite that portion and route the packet to the actual
destination host. This functionality will remain the same with A+P;
however, the provider's BR will need to examine the embedded header
to learn which A+P NAT is handling it, while that box will do the
necessary rewriting.
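For ICMP error reports, the examination of the embedded header at the
BR could look like the following sketch. It assumes an IPv4 ICMP
error message that embeds the offending IPv4 header followed by at
least the first bytes of the TCP or UDP header; the function name is
hypothetical. Since the embedded packet was originally sent by the
A+P customer, its source address and source port identify the A+P
NAT.

```python
import ipaddress
import struct

# ICMP types that carry the header of the offending packet.
ICMP_ERROR_TYPES = (3, 4, 5, 11, 12)


def aplusp_from_icmp_error(icmp: bytes):
    """Return (address A, port) taken from the packet embedded in an
    ICMP error message (icmp starts at the ICMP header)."""
    if len(icmp) < 8 + 20 + 4:
        raise ValueError("ICMP message too short to carry an embedded header")
    if icmp[0] not in ICMP_ERROR_TYPES:
        raise ValueError("not an ICMP error message")
    inner = icmp[8:]                 # skip the 8-byte ICMP header
    ihl = (inner[0] & 0x0F) * 4      # embedded IPv4 header length in bytes
    if inner[9] not in (6, 17):      # protocol field: TCP or UDP
        raise ValueError("embedded packet is neither TCP nor UDP")
    if ihl < 20 or len(inner) < ihl + 4:
        raise ValueError("embedded transport header truncated")
    src_ip = ipaddress.IPv4Address(inner[12:16])
    # TCP and UDP both start with the 16-bit source port.
    (src_port,) = struct.unpack("!H", inner[ihl:ihl + 2])
    return src_ip, src_port
```

The BR can then build the WKP+A+P destination from the returned pair
exactly as it does for ordinary incoming packets.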
ECHO and ECHO reply are more problematic. For ECHO, the border
router must rewrite the "Identifier" and perhaps "Sequence Number"
fields in the ICMP request, so that returning ECHO REPLY packets may
be routed correctly. We suggest rewriting the information in the
sequence number so that the BR can route returning ECHO replies back
to the appropriate host.
3.5. Handling IP fragments
Much like ICMP packets, fragmented IP packets are known to be hard
to handle in any address translation mechanism [RFC3022]. In fact,
only the first IP fragment contains the TCP (UDP) header. This issue
is commonly dealt with by keeping additional state at the NAT device
which allows fragments to be mapped to the correct TCP (UDP) session.
In the A+P NAT solution, fragments coming from the internal domain
can be avoided if the core network runs IPv6 only and the PE ensures
that no layer-3 fragmentation is performed by the customer equipment.
Fragments coming from the external domain are harder to handle.
Commercial NATs extract the port number out of the first fragment and
keep that information to map subsequent fragments. Moreover, when
the first fragment is not the first one to be received at the NAT,
the fragments that arrive earlier need to be stored until the port
number is known
[CCIE-Pro]. Note that a deployment scenario which intends to handle
fragments must ensure that all of the fragments arrive at the same
fragment handling host.
We propose to route fragments to special boxes by exploiting the
prefix combination in a similar way to Figure 1. The BR is able to
detect that a packet is fragmented when it receives it, so in that
case it uses a different well-known prefix which is intended for
fragments only (we call it WKPF). Hence, the BR builds an IPv6
packet where the destination address is WKPF+A and then uses normal
routing. Fragments are then routed to a special box which we call
"fragment handler" (FH). The FH is in charge of keeping track of the
port numbers used by each fragment. Namely, upon receiving the first
fragment, the FH stores a mapping <src_ip, ip_id> --> <dst_port> (8
bytes in total), which it uses to build the correct WKP+A+P address
for all the fragments of the same IP packet (identified by the pair
<src_ip, ip_id>). After storing such a mapping, all subsequent
fragments can be forwarded to the correct A+P destination address.
This way, fragment storage is only required for out-of-order
fragments, until the fragment carrying the port number is received.
Since out-of-order packets are rare, the FH is not expected to
buffer a large number of fragments. Observe that a CGN also needs to
remember the dst_ip information, since it cannot trust the dst_ip in
the packet itself. In this case, each entry in the mapping takes 12
bytes instead of 8.
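The FH behavior described above can be sketched in Python as follows.
This is a toy model under stated assumptions, not a definitive
implementation: the class name FragmentHandler and its receive method
are hypothetical, the destination port is passed in by the caller
rather than parsed from the transport header, and entries are never
expired, which a real FH would need to do.

```python
from collections import defaultdict


class FragmentHandler:
    """Toy FH keeping the mapping <src_ip, ip_id> --> <dst_port>."""

    def __init__(self):
        self.port_of = {}                  # (src_ip, ip_id) -> dst_port
        self.pending = defaultdict(list)   # out-of-order fragments

    def receive(self, src_ip, ip_id, frag_offset, payload, dst_port=None):
        """Return the (dst_port, payload) pairs that can be forwarded now.

        dst_port is only known for the fragment with offset 0, since
        only that fragment carries the TCP (UDP) header.
        """
        key = (src_ip, ip_id)
        if frag_offset == 0:
            # First fragment: store the mapping and release the buffer.
            self.port_of[key] = dst_port
            buffered = self.pending.pop(key, [])
            return [(dst_port, payload)] + [(dst_port, p) for p in buffered]
        if key in self.port_of:
            # Mapping already known: forward without buffering.
            return [(self.port_of[key], payload)]
        # Port not yet known: hold the fragment.
        self.pending[key].append(payload)
        return []
```

For example, a fragment with a non-zero offset that arrives before
the first fragment is held in pending and is released as soon as the
fragment with offset 0 for the same <src_ip, ip_id> pair is received.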
Finally, handling fragments via a specific prefix gives the network
operator the flexibility to deploy multiple FHs. There are two
limiting cases: on the one hand, a single FH that handles all the
fragments in the network (the FH then announces WKPF); on the other
hand, an FH for
each destination IP (the FH then announces WKPF+A). Again, the
longest prefix matching rule gives the ISP the autonomy to choose any
intermediate point in between.
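The effect of longest prefix matching on the choice of FH can be
sketched as below. The address layout is an assumption made only for
illustration: WKPF is taken to be the documentation prefix
2001:db8::/32 and the 32 bits of A are assumed to follow it directly,
so that WKPF+A is a /64. The function names and FH labels are
hypothetical.

```python
import ipaddress

# Hypothetical well-known fragment prefix (documentation range).
WKPF = ipaddress.IPv6Network("2001:db8::/32")


def wkpf_plus_a(ipv4: str) -> ipaddress.IPv6Network:
    """Build WKPF+A, assuming A occupies the 32 bits after WKPF."""
    a = int(ipaddress.IPv4Address(ipv4))
    base = int(WKPF.network_address) | (a << 64)
    return ipaddress.IPv6Network((base, 64))


def next_hop(dst: str, routes: dict):
    """Return the FH announcing the longest prefix that contains dst."""
    addr = ipaddress.IPv6Address(dst)
    matches = [(net.prefixlen, fh) for net, fh in routes.items()
               if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


# One network-wide FH plus a dedicated FH for a single address A.
routes = {
    WKPF: "FH-default",
    wkpf_plus_a("192.0.2.1"): "FH-192.0.2.1",
}
```

With these routes, fragments destined to 192.0.2.1 reach the
dedicated FH, while fragments for any other address fall back to the
FH that announces WKPF.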
3.6. The incremental path to A+P
In this section we will discuss one possibility for large networks to
incrementally deploy A+P. As discussed above, the A+P scheme requires
changes to the CPE, the BR, and (optionally) the PE. Changes to the
routing system include the addition of the WKP and WKPF. The upgrade
of the BR, as well as the routing of the WKP/WKPF, has to be done before the
first customers transition to A+P. In addition, it is possible to
provide the A+P NAT function at the PE routers while gradually
upgrading the CPEs. (We stress once again that, as soon as the PE is
upgraded and A+P is activated, customers must be able to operate
their own CN if they so desire.) One important
consideration has not been discussed so far: the BR mentioned in this
document is essentially the BR of the A+P part of the network, and
does not necessarily have to be the border router of the ISP. In
this sense it might be possible to upgrade a smaller, but contiguous
part of a larger network, as long as it supports dual-stack.
However, care must be taken that all routers (BRs) that might form
the boundary of the "upgraded cloud" are upgraded to A+P. In this
case, those routers translate "A+P packets" into "legacy IPv4
packets" and vice versa.
A+P clouds can be independently deployed within the ISP network: the
only constraint that needs to be satisfied is that the A+P address
space does not overlap with the IPv4 address space which still serves
legacy CPEs. As the A+P deployment speeds up, small clouds can be
easily merged into bigger ones, leading the way to the ultimate goal
of a single, ISP-wide A+P cloud. For instance, a deployment plan
could be to install A+P clouds at some neighboring PoPs, then merge
them at the state level, and so on.
4. Benefits and limitations of A+P
A+P addressing leverages internal routing in the ISP to route packets
on extended addresses in a stateless manner. This allows customers
to be assigned globally routable addresses and to accept incoming
connections on their A+P port range. Observe that the statefulness
of NATs hampers this desirable feature, and forces users to use out-
of-band signaling (e.g., UPnP). From the perspective of the ISP, on
the other hand, A+P statelessness usually means lower deployment
costs and fewer scalability issues than stateful approaches
like NAT. Moreover, A+P allows ISPs to fine-tune their network via
standard internal routing management, without adding an extra layer
of complexity (e.g., point-to-point tunnels).
We now discuss the limitations of the A+P approach. Recall that a
transport session is identified by a 5-tuple
<src_ip, src_port, dst_ip, dst_port, protocol>
Hence, any mechanism that shares the same IP address among multiple
hosts intrinsically poses limitations on the number of active
transport sessions that a single host can maintain. Observe that
connections with different hosts (or even different applications on
the same host) are only minimally impacted, because they can be
differentiated by means of the dst_ip (dst_port) field. Therefore,
the only case in which address sharing causes trouble is multiple
outbound transport sessions with the same remote host and the same
port. In this case only the src_port field can be used to
differentiate the sessions; however, that field cannot be fully
exploited, since it is also used to multiplex multiple users on the
same IP address.
While multiple sessions with the same remote application are not a
widespread practice, some very popular websites (e.g., GoogleMaps
and iTunes) have been reported to make heavy use of multiple TCP/IP
connections to maximize parallelism. The current estimate of the
number of parallel sessions used by those websites is about 70
[I-D.durand-softwire-dual-stack-lite]. In this respect, A+P with 8
port bits would allow every host to maintain up to 256 parallel
connections with the same remote process, while still providing 256
times more addresses for end hosts.
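The trade-off behind these figures can be worked out with a few lines
of Python. The sketch assumes that "port bits" denotes the number of
bits of the 16-bit port field used to distinguish customers sharing
one IPv4 address, and it ignores any ports an operator might reserve.
The function name sharing_tradeoff is hypothetical.

```python
PORT_FIELD_BITS = 16  # width of the TCP/UDP port field


def sharing_tradeoff(port_bits: int):
    """Return (customers per IPv4 address, ports per customer)."""
    if not 0 <= port_bits <= PORT_FIELD_BITS:
        raise ValueError("port_bits must be between 0 and 16")
    customers = 2 ** port_bits
    ports_per_customer = 2 ** (PORT_FIELD_BITS - port_bits)
    return customers, ports_per_customer
```

With 8 port bits the function yields 256 customers per address and
256 ports per customer, which is above the estimate of about 70
parallel sessions. With 10 port bits it yields 1024 customers but
only 64 ports each, which falls below that estimate.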
Another limitation that A+P shares with any other IP address sharing
mechanism is the availability of well known ports. In fact, services
run by customers that share the same IP address will be distinguished
by the port number. As a consequence, it will be impossible for two
customers who share the same IP address to run services on the same
port (e.g., port 80). Unfortunately, working around this limitation
implies application-specific hacks (e.g., HTTP and HTTPS virtual
hosting), whose discussion is outside the scope of this document.
Observe that some popular applications (e.g., BitTorrent) require the
availability of well known ports. However, those applications can
easily adapt to work with different ports, and users of such tools
update them frequently (e.g., to exploit new features).
5. IANA Considerations
This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an
RFC.
6. Security Considerations
7. Acknowledgements
The authors wish to thank David Ward for review, endless constructive
criticism, and interminable questions, and Cullen Jennings for
discussion and review of fragmentation. We would also like to thank the
following persons for their valuable feedback on earlier versions of
this work: Bernhard Ager, Alain Durand, Dino Farinacci, Hamed
Haddadi, Russ Housley, Wolfgang Muehlbauer and Ruediger Volk.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
8.2. Informative References
[BCP38] Ferguson, P. and D. Senie, "Network Ingress Filtering:
Defeating Denial of Service Attacks which employ IP Source
Address Spoofing", BCP 38, May 2000.
[CCIE-Pro]
Doyle, J., "Routing TCP/IP Volume I (CCIE Professional
Development)", 1998.
[I-D.durand-softwire-dual-stack-lite]
Durand, A., Droms, R., Haberman, B., and J. Woodyatt,
"Dual-stack lite broadband deployments post IPv4
exhaustion", draft-durand-softwire-dual-stack-lite-00
(work in progress), September 2008.
[Martin-Java]
Martin, D., Rajagopalan, S., and A. Rubin, "Blocking Java
Applets at the Firewall", Proceedings of the Internet
Society Symposium on Network and Distributed System
Security, pp. 16-26, 1997.
[RFC0959] Postel, J. and J. Reynolds, "File Transfer Protocol",
STD 9, RFC 959, October 1985.
[RFC2766] Tsirtsis, G. and P. Srisuresh, "Network Address
Translation - Protocol Translation (NAT-PT)", RFC 2766,
February 2000.
[RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network
Address Translator (Traditional NAT)", RFC 3022,
January 2001.
[RFC4966] Aoun, C. and E. Davies, "Reasons to Move the Network
Address Translator - Protocol Translator (NAT-PT) to
Historic Status", RFC 4966, July 2007.
[SP-NAT] Alcock, S., Nelson, R., and D. Miles, "Characterizing the
Network Connection Behavior of Residential Broadband
Subscribers", draft, under submission, 2009.
Authors' Addresses
Olaf Maennel
T-Labs/TU-Berlin
Ernst-Reuter-Platz 7
Berlin 10587
Germany
Phone: +491607199931
Email: olaf@maennel.net
Randy Bush
Internet Initiative Japan
5147 Crystal Springs
Bainbridge Island, Washington 98110
US
Phone: +1 206 780 0431 x1
Email: randy@psg.com
Luca Cittadini
Universita' Roma Tre
via della Vasca Navale, 79
Rome, 00146
Italy
Phone: +39 06 5733 3215
Email: luca.cittadini@gmail.com
Steven M. Bellovin
Columbia University
1214 Amsterdam Avenue
MC 0401
New York, NY 10027
US
Phone: +1 212 939 7149
Email: bellovin@acm.org
Full Copyright Statement
Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.