draft-katabi-global-anycast-00

Network Working Group                           Dina Katabi
INTERNET DRAFT                                  John Wroclawski
                                                MIT LCS
Expires: 1,2000                                 June, 1999

        A Framework for Global IP-Anycast (GIA)
        <draft-katabi-global-anycast-00.txt>

Status of this Memo

This document is an Internet-Draft and is in full conformance
with allprovisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups.  Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by
other documents at any time.  It is inappropriate to use
Internet-Drafts as reference material or to cite them other
than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed
athttp://www.ietf.org/shadow.html.

Abstract

This document describes GIA, an architecture for a scalable
Global IP-Anycast service. In contrast to previous
approaches, which route IP-anycast through the unicast
routing system, GIA provides IP-anycast with its own
routing protocol.  To scale, GIA pushes the overhead of
anycast routing to the edge of the network and off-load the
middle routers of the burden of storing anycast routes.

GIA's main contribution is its interdomain routing
protocol, which is based on promoting the quality of an
anycast route according to its level of usage. The protocol
generates two types of routes; default low-cost anycast
routes, which consume no bandwidth or storage space; and
high-quality shortest-path anycast routes that are
customized according to the beneficiary domain's interests.

1.0   Introduction

IP-anycast is a network service whereby receivers that
share the same characteristics are assigned the same
anycast address. A sender interested in contacting a
receiver with those characteristics sends its packet to the
anycast address and the routers conspire to deliver the
packet to the nearest receiver to the sender, where nearest
is defined according to the routing system measure of
distance.

Anycast' s early applications are service oriented. In
particular, RFC 1546 [19] proposes anycast as a means for
service location and host auto-configuration. As an example
of using anycast for service location, a set of replicated
ftp servers is assigned the same anycast address. By
contacting the anycast address a user in France who wants
to retrieve a file is directed to the European server while
a user in the States is directed to the American server. As
an example for using anycast for host auto-configuration,
one can imagine assigning the same anycast address to all
Domain Name Servers. In that case, a host that is moved to
a new network needs not be configured with a new DNS
address since it can use the global anycast address to
access the local DNS server anywhere.

Recently, new anycast applications that do not directly
provide a service to an end user have been developed. For
example, [5] uses a global anycast service as an
infrastructure to develop an inter-domain multicast routing
scheme, while [16] uses anycast as a method to develop more
efficient intradomain multicast trees.

Currently, there is no scalable design for providing a
global ip anycast service. The major obstacle that faces
research in this area is anycast's defiance of hierarchical
aggregation. Actually, an anycast address, similarly to a
multicast address, represents a group of nodes that share a
particular characteristic. As such, there is no reason to
expect anycast group-topology to be hierarchical or to
comply with the underlying unicast topology. Unfortunately,
this means that to provide a global anycast service we need
to advertise each anycast group to the entire Internet.
Thus, routing tables will have a new component - anycast
host routes - which grows proportionally to the total number
of anycast groups in the Internet. Given that the routers
in the backbones already suffer from the large size of
their routing tables, this approach does not scale.

The problem can be partially alleviated by use of scoped
anycast addresses for groups whose members are confined to
a particular network region. However, many anycast
applications such as the one in [5] require a global
anycast service for which no scoped anycast address works.
Moreover, sometimes even when the anycast group is
currently confined to a particular region there is an
incentive to use a global anycast address. For example, a
company that provides an online service might use an
anycast address to have its customers access the nearest
online office. Although the company online offices might
currently cover only the east cost, the company would like
to use a global address, since a scoped anycast address
prevents future expansion to the west coast or to Europe.

2.0   Background and Related Work

Anycast was introduced to the Internet literature by RFC
1546 [19]. This document motivated the need for an anycast
service as a means for service discovery and host auto-
configuration. The document also pointed out the major
difficulties associated with IP-anycast, which are the
stateless nature of anycast and its defiance of
hierarchical aggregation. The authors suggested approaches
for building TCP connections on top of anycast addresses,
and discussed possible addressing schemes. They considered
it "wiser to use a separate class" to assign anycast
addresses than to carve them from the existing unicast
address space.

When the Internet Engineering Task Force considered finding
a replacement for IPv4, all major candidates adopted
anycast as an addressing mode (Pip [8], SIPP [14], IPv6
[13]). In particular, IPv6 allocates anycast addresses from
the unicast address space making them indistinguishable
from their unicast counterparts. Each anycast group is
confined to a particular topological region with which it
shares the prefix. Within the region identified by the
shared prefix, each member of the anycast group must be
advertised as a separate entry in the routing system
(commonly referred to as a "host route"); outside the
region, the anycast address may be aggregated into the
routing advertisement for the shared prefix. Note that
global anycast groups have a prefix of null and must be
advertised as separate routing entries throughout the
entire Internet.

3.0   Requirements

Our objectives are to implement a scalable IP-anycast
service that complies with the semantics defined in RFC
1546. Thus, we want a service that satisfies the following
characteristics:

-  Global:  In a global anycast service, group members can
potentially exist anywhere in the Internet and be
accessible to senders in their neighborhood. Although
confining each anycast group to a configured region
satisfies some range of applications it leaves a space that
can't be filled unless there is a global anycast service.
For example, providing a host with the ability to configure
itself wherever it is plugged into the Internet is infeasible
unless we assign the same anycast address to all of the
dynamic-host-configuration servers.

-  Scalable:  A service is scalable when the overhead in
terms of storage, traffic and processing is manageable in a
large heterogeneous inter-network. Thus, scalability is not
achieved by readily using fewer resources, it also depends
on having a good topological alignment between resource
consumption and resource availability.

-  Efficient: The anycast service is efficient when there
is a high probability that an anycast packet is received by
the nearest group member to the sender.

4.0   Design Rationale

The traditional belief that IP-Anycast should be routed
similarly to its unicast counterpart has hampered the
service, and a routing scheme that recognizes the
characteristics of anycast and benefits from them to scale
the service is needed.

Actually, given that most anycast groups represent
services, it seems inefficient that a domain has to spend
equal amount of network resources on learning storing and
maintaining different anycast routes. For example, why does
an MIT router spend equal resources on the route to the
replicated CNN web server and the route to the replicated
Algerian news agency? The first route is used every day by
some MIT user, while the second one is rarely used if ever.
In fact, anycast's primary usage (to date) is service
discovery (host auto-configuration is a form of service
discovery).  Thus, one can extrapolate information about
other networked services' access pattern to anycast. One
well-studied example is the web's access pattern, which
reveals a reasonable amount of locality of interests that
justifies the use of proxies. Therefore, it seems that
although anycast is not amenable to hierarchical
aggregation, the service's potential applications make
it amenable to caching. In particular, the authors think
that at any given time there is a predictable set of anycast
groups that hosts in a domain access with high probability
and that this set is much smaller than the total number of
anycast groups in the Internet. This means that at a
particular domain anycast routes are not equally valuable,
and that a good anycast routing protocol devotes more
resources to build good routes to repeatedly accessed
anycast groups.

GIA's design allows a domain to discover, store and
maintain efficient routes to anycast groups repeatedly
accessed by users in this domain, while supporting a cheap
fallback mechanism to send packets to unpopular groups. In
fact, the fallback mechanism does not consume any additional
network resources because it is based on mapping the anycast
topology to the underlying unicast topology by which all
routers in the Internet know how to forward packets.

5.0   Design Details

This section describes the details of the architecture.
Note that GIA is concerned only with routing and addressing
issues and does not address the data-link layer or the
transport layer issues.

5. 1   Address Architecture

Our architecture assigns anycast its own address space but
it allocates anycast addresses to domains according to the
unicast hierarchy. More precisely, an anycast address is a
concatenation of an anycast indicator, the unicast prefix
of the home domain, and the group ID (see Figure 1). The
anycast indicator is a fixed length prefix that
differentiates anycast addresses from their unicast and
multicast counterparts, such that anycast packets can be
recognized andforwarded by the anycast forwarding protocol.
One possible anycast indicator is the bit-pattern '11110'.
On the other hand, the home domain's prefix identifies the
Internet domain that owns the unicast space from which this
anycast address is derived. It is the same as the unicast
prefix carried by the routers inside the home domain. In
contrast to the anycast indicator, the home domain field
has a variable length that depends on the size of the
domain's unicast address space. GIA requires each anycast
address to be associated with an Internet home domain, and
it requires that the home domain contain at least one
machine configured with the anycast address. Note that the
anycast address is still global and can be assigned to
machines anywhere in the Internet (as opposed to IPv6
architecture where all group members reside inside the
region or the domain with which they share the prefix).
Finally, the group ID identifies a particular anycast address
among the anycast addresses associated with the same home
domain. Note that when shifting off the anycast indicator the
anycast address becomes similar to a unicast address from the
home domain (see Figure 2).


+---------------------------------------------------------+
|Anycast Indicator| Home Domain's Unicast Prefix| Group ID|
+---------------------------------------------------------+

      Figure 1: The syntax of an IP-anycast address


+---------------------------------------------------------+
|Home Domain's Unicast Prefix| Group_ID| a Number of zeros|
+---------------------------------------------------------+

        Figure 2: Shifting the anycast indicator off,
        an anycast address becomes a unicast address.
        Shifting is not performed on the packet but on
        the lookup variable

The above architecture means that every Internet domain is
allocated an anycast address space that is proportional to
its unicast address space (In fact, for the case of IPv4,
if the anycast indicator is 11110 then domains whose unicast
prefix is smaller than x.x.x.x/27 are not allocated any
anycast address space. However, we don't know of any
Internet domain whose prefix is smaller than x.x.x.x/24.)

The domain might use the anycast addresses to provide
global services or it might lease them to end users providing
an on-line service.  For example, assume X is a company that
provides an online service and that wants its customers to
use an anycast address to locate the nearest on-line office
to them. It is likely that X has a main office connected to
the Internet somewhere and consequently can use one of the
anycast addresses associated with its network for its on-
line service, in which case the home domain for the anycast
group is X's domain. If X does not have its own domain, X
can lease an anycast address from its service provider. In
this case the group's home domain is the provider domain.
Anyway, if company X grows in the future and opens a new
online office, then the new office can use the same anycast
address and be accessible to customers in its neighborhood.

On the other hand, for well-known anycast addresses used
for host auto-configuration (such as the group of all DNS)
the home domain should be either in one of the backbones or
a virtual domain advertised by the backbones.

5.2   Anycast Address Assignment

The process according to which a domain D assigns an
anycast address out of its allocated address space to an
end user is domain dependent and can be the same as the one
used for assigning unicast addresses. For example, the
anycast address might be assigned manually by the
administrator, or the domain might have special address
assignment servers that lease anycast and unicast addresses
with certain lifetime to end-users. The design requires the
user to setup at least one machine with the anycast address
in the home domain. However, the user can assign the
address to other machines anywhere in the Internet.

In addition to the aforementioned address assignment
procedure, some anycast addresses will be 'well-known
addresses', assigned by the Internet Assigned Numbers
Authority (IANA) to particular auto-configuration groups
such as all Dynamic Host Configuration Servers or all DNS.

5.3   Anycast Address Advertisement

Anycast reachability information is generated and
propagated only by routers. A host that wants to join an
anycast group has to ask its next hop router to advertise
the address on its behalf, which can be achieved by adding
a new message type to either IGMP [26] or the Neighbor
Discovery protocol [25]. A router that receives such a
request processes it through some security checking procedure
(to be designed as a future work), and if compliant it marks
the address to be advertised according to the anycast routing
protocol adopted by the domain. The router uses a keepalive
mechanism to ascertain the availability of the anycast member.
It never advertises an address after the member becomes
inaccessible.

5.4   Anycast Routing

>From an edge domain perspective, an anycast group is
classified according to the following classifications:

-  Internal anycast group: An internal anycast group to
domain D is a group for which D has internally at least one
member. Note that, all groups are internal to their home
domain. However, groups might be internal to domains other
than their home domains.
-  External Anycast group: An external anycast group to
domain D is a group for which D has no members.
-  Popular anycast group: A popular anycast group in Domain
D is an external group that clients in domain D repeatedly
access.

                        Anycast Group
                        /           \
        Internal Group             External Group
                                    /           \
                              Popular         Unpopular

        Figure 3: Anycast group classification from an
        edge domain perspective

Since GIA generates routes whose quality differs according
to their anticipated usage, it distinguishes between
routing internal anycast groups, routing external-popular
anycast groups, and routing external-unpopular anycast
groups.

5.4.1 Routing Internal Anycast Groups

Internally, each domain routes its internal groups using
its own unicast routing protocol, which means that each
member is advertised as a separate entry in the routing
system, and each router knows the nearest anycast member.
This holds the implied assumption that the number of
internal anycast groups in a domain stays manageable. Since
this number can be administratively controlled, we expect
each domain to control this number to stay within the
limits of locally available bandwidth and storage space.

Intradomain routing protocols based on the Distance-Vector
algorithm such as RIP work without any modification [19].
For protocols based on the Link-State algorithm to work
correctly, routers should abstain from routing through an
anycast address. For example, assuming A is an anycast
group, router R1 in Figure 4-a should not mistake the
topology as that in Figure 4-b and should not try to route
packets sent to R5 through A. To solve the problem, a large
cost is assigned to virtual links connecting anycast nodes
to their local networks, such that they are not used in
building routes unless the anycast node is the destination.

                 +----+        +---+
                 | R2 |--------| A |
                 +----+        +---+
                   |
                 +----+
                 | R1 |
                 +----+
                   |
                 +----+     +----+     +----+
                 | R3 |-----| R4 |-----| R5 |
                 +----+     +----+     +----+
                                         |
                                       +---+
                                       | A |
                                       +---+
                        Figure 4-a



                 +----+        +---+
                 | R2 |--------| A |-----+
                 +----+        +---+     |
                   |                     |
                 +----+                  |
                 | R1 |                  |
                 +----+                  |
                   |                     |
                 +----+     +----+     +----+
                 | R3 |-----| R4 |-----| R5 |
                 +----+     +----+     +----+

                        Figure 4-b

        Figure 4: Applying the link state algorithm
        directly on the topology in 4-a may introduce
        false topologies


Internal groups are not advertised to other domains in the
Internet. Sections 5.4.2 and 5.4.3 describe how users in
other domains access those groups (For those users the
groups are external.)

5.4.2 Routing External Unpopular Anycast Groups

In GIA, unpopular anycast groups need not be routed.  Thus,
if the number of unpopular anycast groups is considerably
larger than the number of popular groups, the system,
without degrading the service, can make a large saving by
using cheap default sub-optimal routes to forward packets
destined to unpopular anycast groups.

The basic characteristic of a default route is that it
doesn't consume any bandwidth to generate, and doesn't need
any storage space in the routing tables. To understand how
such a route exists recall that an anycast address is a
concatenation of the anycast indicator, the unicast prefix
of the home domain and the group ID. Also, recall that the
architecture requires the user who leased the anycast
address to set up at least one machine with that address in
the home domain. Thus, in the worst case any router that
can distinguish anycast addresses can forward any anycast
packet to its home domain. To do so the router shifts the
anycast indicator off and forwards the packet according to
its unicast routing table. Note that the router leaves the
destination address in the packet intact; it only shifts
the variable according to which it is looking up the
address in its routing table.

Thus, a packet destined to an unpopular group is forwarded
toward its home domain. However, depending on the
popularity distribution of its corresponding group the
packet follows one of three possible routes. First, if the
packet crosses a domain that has a member of the anycast
group then the packet is delivered to that member. Second,
if the packet crosses a domain that has this group as a
popular group and consequently knows a shorter route to a
group's member then the packet continues its journey along
the enhanced route. Finally, if neither of the
aforementioned cases is encountered, then the packet
eventually hits the home domain and is delivered to the
nearest member there.

5.4.3 Routing External Popular Anycast Groups

At the core of GIA's architecture is generating routes to
popular anycast groups. This task is performed by border
routers of a domain and it is implemented as an integrated
part of BGP (the Border Gateway Routing protocol). It is
decomposed into 3 protocol-building blocks; monitoring
popular anycast groups; learning an anycast route; and
finally maintaining a learned route. We address each of
these three components separately.

-  Monitoring popular anycast groups

To monitor popularity, each border router keeps track of
the number of times an anycast packet is forwarded along a
default route. The border router keeps this information in
a list of pending addresses. Periodically, the border
router checks its list and decides on the most popular
addresses to search for. This number should not exceed the
maximum number of addresses that can be included in one
search message. (This number is around 1000 and it is
limited by the maximum size of a BGP message). Addresses
included in the search are deleted from the list. Other
addresses in the list have their popularity multiplied by
an aging factor and are kept for consideration at the time
of the next search. When the popularity of an address in
the list falls below certain threshold the address is
discarded.  The time between two searches, which we call
the search interval (SI), should be jittered to prevent
synchronized search messages.

-  Learning an Anycast Route

In contrast to unicast interdomain routing, which is based
on advertising unicast prefixes to all Internet domains,
GIA adopts an on-demand reactive inter-domain routing
approach. The incentive for choosing a reactive approach is
the need for a design in which routers in the backbones do
not store any external anycast routes. This objective makes
it hard to design a proactive routing protocol. Consider
unicast routing as an example. In unicast the routing
information propagates from children domains to parent
domains (up the hierarchy), then from parent domains to
other children domains (down the hierarchy). Thus, for
a downstream router to forward a packet along a certain
route the upstream routers in the parent domain have to
store the route. One can imagine a scheme in which upstream
routers propagate reachability information without storing
it, and downstream routers tunnel the packets to the router
that generated the routing advertisement. However, such a
design means that any routing advertisement is broadcast to
all domains in the Internet because the upstream border
routers can not tell whether they have already advertised
the same or a better route for this address. A second less
important factor for choosing a reactive approach is that
the fact that an anycast group is replicated in multiple
domains in the Internet increases the probability of
finding the nearest group member by exploring a small
neighborhood around the interested domain.

The route learning process makes use of the TCP connections
a BGP router has with its peers [21]. It involves adding two
new message types to BGP; the search message, and the reply
message.

A search for a set of popular anycast groups is triggered
by the exit BR (border router) towards the groups' home
domains, which receives the anycast packets in the absence
of a learned route. This BR, which we call the originator
BR (OBR), monitors the popularity of the groups according
to the scheme described in the previous section. At the
beginning of a search interval (SI) the OBR generates a
search message for all of the popular groups for which
there is no learned route and broadcasts it to all of its
peers. (Here 'peers' refers only to border routers whether
they are internal or external peers. Some domains ran iBGP
to disseminate external routes to internal routers. In this
case, those peers should be excluded from the search.)
Once the search is generated the OBR sets a timer and waits
for replies.

The search itself is a scoped domain-by-domain broadcast
that explores the neighborhood of a domain looking for
members of the popular anycast groups. The search message
whose format is shown in Figure 4 contains the IP-address
of the OBR, a sequence number, a path vector which, at
first, is set to the OBR's AS number (AS_Path field), and
the set of anycast groups the OBR is searching for (Network
Layer Reachability Information field). In addition the
message contains a TTL field, which is initialized to the
maximum number of domain-hops the message is allowed to
travel. Based on the study of the inter-domain topology
provided in [10], the authors think that the TTL should be
set to 3 or 4 domain-hops. This number was chosen because
most edge domains are less than 3 or 4 domain hops from the
backbone.  The AS_PATH field is used to collect a vector of
Autonomous Systems that separate the OBR from a domain that
contains a member of a popular anycast group. It is also
useful to prevent the search message from looping. On the
other hand the TTL scopes the search to a certain
neighborhood around the searching domain.

+---------------------//----------+
|   BGP Header (2 octets)         |
+---------------------------------+
|  Sequence Number (2-octets)     |
+---------------------------------+
|   TTL (1 octet)                 |
+---------------------------------+
|   Total Path Attribute Length   |
+---------------------------------+         +------------+
|                                 |  ---->  | AS PATH    |
|   Path Attributes               |         +------------+
|                                 |         |OriginatorID|
+---------------------------------+         +------------+
|                                 |
|   Network Layer Reachability    |         +------------+
|   Information (variable)        |  ---->  | Address 1  |
|                                 |         +------------+
|                                 |         | Address 2  |
+--------------------//-----------+         +------------+
                                            |            |
                                           //            //
                                            |            |
                                            +------------+

        Figure 5: The Format of the Search Message

+---------------------//----------+
|   BGP Header                    |
+---------------------------------+
|  Sequence Number                |
+---------------------------------+         +------------+
|   Total Path Attribute Length   |    - -> | AS Path    |
+---------------------------------+   /     +------------+
|                                 |  /      |OriginatorID|
|   Path Attributes               | /       +------------+
|                                 |         | ReplierID  |
+---------------------------------+         +------------+
|                                 |
|   Network Layer Reachability    |         +------------+
|   Information (variable)        |  ---->  | Address 1  |
|                                 |         +------------+
|                                 |         | Address 2  |
+--------------------//-----------+         +------------+
                                            |            |
                                           //            //
                                            |            |
                                            +------------+

        Figure 6: The Format of the Reply Message

A BR that receives a search message from an internal peer
propagates the message to all of its external peers with no
further processing. A BR that receives a search message
from an external peer looks up the anycast addresses in the
search message in its routing table.  For all groups that
are internal to the replying BR's domain, the BR sends a
reply message, which relays the path vector in the original
search message after appending the receiving BR's AS
number. In addition, the reply includes the original search
sequence number and the receiving BR's IP-address. The
reply is sent directly to the OBR. Groups for which no
internal member is found, are looked up in the set of
learned anycast routes. For each anycast group for which
the receiving BR has a learned route, it adds its AS to the
vector-path in the search and concatenates the resulting
AS-sequence with the vector-path associated with the
learned anycast route, and sends this vector-path to the
OBR in a reply message. The replier field in the reply
message is set to the router from which the existing route
is learned.  All addresses for which the receiving BR is
able to send a reply are removed from the search message.
If there are still anycast groups to search for, the
receiving BR decrements the TTL of the search, checks that
the TTL did not reach zero, and propagates the search to
all of its peers.

Finally, to prevent a search message from looping, GIA
requires a BR that receives from an external peer a search
whose vector-path include its own AS to ignore the search
message. Moreover, to reduce the number of messages
emanating from a search, we require each BR to maintain a
table of all (OBR, sequence number, shortest vector-path up
to present) heard of in the last two search intervals. For
each search it receives, the BR propagates the search only
if it contains a vector-path whose size is smaller than all
the vector-paths with the same (OBR, sequence number) pair
seen so far.  Although, storing a table of (OBR, sequence
number, shortest vector-path) consumes some memory at a
border router, the size of the memory needed is around
2*(the number of BRs in a neighborhood), which is much
smaller than all anycast groups in the Internet. In
addition, the lookups in this table are not on the critical
path of unicast data packets.

After sending a search message an OBR sets a timer and
waits for replies. When the timer expires, the OBR checks
all the received replies and chooses the one that has the
smallest vector-path. The OBR checks its list of pending
addresses and deletes any address for which it found a
route. If the list contains an address for which no route
was found, the OBR multiplies the popularity of that
address by a decaying factor to decrease it chance in being
included in a new search.

The learned routes are kept in a cache of popular anycast
routes. Also, the routes are advertised to all internal
peers as if they were learned from a BGP update message.
Depending on the domain's policy the routes might be
injected into internal routers' routing table or kept only
at border routers.

A stored external anycast route contains the path-vector,
which describes the set of domains the route traverses, and
the unicast address of the destination BR. Both are
extracted from the reply message. The vector is kept to
answer search messages issued by neighboring domains
looking for a route to this anycast group. The unicast
address of the destination border router (ReplierID Field)
is used to tunnel all subsequent anycast packets to the
neighboring domain that has a group member.

Figure 7 illustrates a possible scenario for a search
message. The border router OBR in domain 1 sends to its
peers BR2 and BR3 a search message solicitating routes for
groups A and B. Since BR2 has a route for group B, it sends
a reply back to OBR informing it of the availability of the
route.  However, since BR2 has no route for group A it
propagates to its peer the search message, which now
contains only a query for A. Eventually the search for A
hits BR5 in domain 4, which has internally a group member.
Thus, BR5 sends directly a reply to the OBR. When OBR's timer
fires it examines the replies it received and decides that
the nearest member for group B is through BR2 and that the
nearest member of group A is through BR5.  Having learned the
routes, OBR tunnels subsequent packets addressed to groups A
and B to BR5 and BR2 respectively, which decapsulate them
and deliver them to the local members. It is also possible
(though unnecessary) for OBR to inform its intradomain
routing component (RIP for example) to inject the routes into
domain 1.

  +--------+          +--------+
  |Domain4 |          |Domain2 |
  |        | Search A |        |   Search A,B
  |        BR5 <---- BR4  <-- BR2 <---+
  |  (A)   |  \       | (B)    | \     \    +---------+
  +--------+   \      +--------+  \     \   | Domain1 |
                \             Reply B    \  |         |
                 \                  \-->  OBR         |
                  \------Reply A------->  / |         |
                                         /  +---------+
                                        /
                      +--------+   Search A,B
                      |Domain3 |      /
                      |       BR3  <-+
                      |        |
                      +--------+

Figure 7: Learning routes to popular anycast groups

-  Maintaining a Learned Route

A learned route becomes invalid in the following cases.
First, when the originating domain loses connectivity to
the destination domain. In this case, the OBR discovers the
loss of connectivity from its unicast routing table and
initiates a new search. The second case happens when the
nearest anycast member crashes or leaves the group. In this
case the originating domain can't discover the invalidity
of the route directly and keeps tunneling the packets to
the learned BR. However, when those packets get to the
destination domain, the receiving BR discovers that there
is no local anycast entry. Thus, it forwards the packets
according to its best knowledge of the route. (Most likely
the BR will forward the packets to their home domain.
However, it might be the case that after the local anycast
member crashed, the domain has learned a route to some
other nearby member.) Also, it sends an ICMP message to the
BR that tunneled the packet informing it of the invalidity
of the learned route. A BR that receives such an ICMP
treats the message similarly to a route withdrawal received
via BGP. Finally, as a consequence of the route being
withdrawn, the OBR schedules the group to be considered in
the next search.

On the other hand, a learned route might be withdrawn even
when it is still valid. This is necessary to allow caching
of new popular anycast routes while maintaining an upper
bound on the size of cached popular anycast routes. Thus,
we require the exit border router toward the destination
domain to check a route level of usage and discards learned
routes that are no longer popular. The algorithm for
discarding learned routes that are no longer popular is an
area for further research.

6.0   Discussion and Evaluation

-  How does GIA affect the routing tables?

In contrast to the traditional approach for global anycast,
where the routing tables grow proportionally to the total
number of anycast groups, the growth in the routing tables
in GIA is manageable. In fact, routers in an edge domain
store routes to internal anycast groups and popular ones.
Both numbers are controllable by the domain's administrator
and should be much smaller than the total number of anycast
groups. On the other hand, routers in the backbones, which
usually maintain a large routing table only maintain
anycast routes for their internal groups (if there are
any).

In addition, the fact that anycast addresses are
distinguishable from unicast addresses means that anycast
routes can be maintained in their own table separated from
unicast addresses. As a result, the existence of an anycast
service does not slow down the unicast forwarding process.
Moreover, the anycast routing table can be much simpler and
allow faster search and insertion than the unicast routing
table because it does not need to account for the longest
match.

-  What is the overhead of using GIA?

The control traffic overhead is dominated by the number of
search messages, which is a function of the following
parameters:
- The maximum number of domain-hops the search message
explores (the TTL field in the search message).
- The network inter-domain topology specially the average
edge degree.
- The number of popular anycast groups.
- The correlation of popularity in neighboring domains. (or
in other words, the similarity in popular groups between
neighbor domains)
- The number of members in an anycast groups and the
groups' topology.

A thorough evaluation of the search overhead needs to
investigate the effect of all of these parameters on the
number of search messages, which is beyond the scope of
this document.  Instead, we try to provide some intuition
about why a search might consume less bandwidth than
advertising the group (the proactive approach). One reason
is that we search only for popular groups. A second reason is
that we search only the neighborhood of a domain. The third
and most important reason is that a search is more stable
than an anycast advertisement through BGP updates. In other
words, once we find a route, we don't need to perform a
new search as long as the forwarding path does not change.
Usually, the forwarding path stays the same for days [10,20].
Unfortunately, this is not the case for advertisement. The
study of BGP routes in [15] shows that on average each
prefix generates 100 updates a day at a BGP router. Thus,
the frequency of search is multiple order of magnitude
less than that of the equivalent advertisement.

The second source of overhead in GIA is the processing time
of the control messages. The major concern here is that
processing the search and reply messages might affect the
BGP router performance and slow down its processing of the
unicast updates. to prevent that, BGP routers might assign
higher priority to processing unicast updates. On the other
hand, note that processing search and reply messages is
logically independent from processing the update messages
and can be performed by a separate CPU. In addition,
searches are allowed to explore only a limited neighborhood
around an edge domain. Thus, only few of them reach the
backbone and most of them are processed by routers at the
edges of the network where the traffic is not as intense.

The last source of overhead is storage space. In addition
to the anycast routing tables discussed above GIA requires
a BR to store a list of (OBR, sequence number, shortest
vector-path) seen in the past 2*Search_Interval seconds.
However, this list is on the order of the number of border
routers in a neighborhood, which is much smaller than a
routing table.

- What is the average path length of an anycast packet in
GIA to the shortest path, where the shortest path is the
one found by unicast routing?

The routes to internal anycast groups are similar to those
generated by unicast routing. Thus, for internal anycast
groups, the average path length in GIA to the shortest path
is equal to 1.

On the other hand, the path to external anycast groups, on
average, is longer than the shortest path. The difference
is due to the existence of packets destined to unpopular
groups and to the possibility of a search failure.

To estimate 'Average[external path in GIA / shortest path]'
we ran simulation on 100 graphs generated using GT-ITM
network graph generator [24]. All graphs are generated
using Transit-Stub edge connection method and have a size
of 208-node, an average degree around 3, and a diameter of
11. These quantities (except the number of nodes) are
chosen according to [10], which studies the interdomain
topology in the Internet. Members in an anycast group are
assigned randomly to domains as long as no two members of
the same group are assigned to adjacent domains. The home
domain of an anycast group is also chosen randomly. Note
that each node in this simulation represents a domain.

Our simulation shows that if the fraction of anycast
traffic sent to popular group is 80% of all anycast traffic
generated by the domain, and the search has a TTL of 3,
then, as the number of the members in the anycast group
vary between (0.005*number of domains) and all the domains
in the simulation the below inequality stays valid:
1.05 < Average[external path in GIA/ shortest path] <1.15
Thus, the average performance of GIA is significantly good.
(for further information about the simulation please
contact the authors.)

Note that the simulation does not assume any correlation
between where the service is popular and where the members
exist. In practice, providers of an online service try
to establish servers in network regions where the service
is popular. This enhance GIAÆs performance further, but it
is not a necessary assumption for the design to perform
well.

-   What is a typical working environment for GIA?

In practice, the authors think that there are going to
be three major types of anycast groups: First, well-known
groups that are used for auto-configuration purpose such
as the one representing the set of DNS servers. These
groups will be widely represented and internal to most
domains and will cause hardly any searches. For the few
domains that have to search for a DNS server the search
will be satisfied in one domain-hop or at most in 2
domain-hops and the wide spread of the group will cause
the messages emanating from a search to die close to
their origin. Second, groups that represent an
internationally replicated service such as the CNN web
server. These groups will have a much smaller number of
members distributed with reasonable spread in the
Internet. Because these groups are popular in the
majority of domains, a search usually succeeds in one
or 2 domain-hops and generates few messages. Third, groups
that are regionally popular such as a local TV broadcaster.
Most of the searches soliciting these groups will spring
in their neighborhood and will locate them without
involving the whole Internet in the search. Some search
might spring from random regions in the network soliciting
groups with random topology in which case the simulation
results provided above give a rough estimate of the
possible number of messages. Definitely, the above is
not an enumeration of all possible group types. It is
rather an attempt to understand GIAÆs working environment
based on the currently proposed anycast applications.

7.0   Deployment issues

This section addresses the following two deployment issues:

- Changes to routers

To deploy GIA in a transit domain we need to change the
border routers to participate in route learning and to
change the internal routers to shift the anycast indicator
off when they have no route to the anycast group. However,
changing the internal routers is not crucial for the
design. In fact, the same effect can be achieved by having
the border routers inject the unicast interdomain
routing information internally after shifting the anycast
indicator in. We suggest this solution as an intermediate
step until the domain upgrades the internal routers to
understand the anycast address syntax.

On the other hand, deploying GIA in an edge domain requires
integrating popularity monitoring, route learning, and
route maintenance in the border routers. It also requires
changing the internal routers to understand the syntax of
an anycast address. However, most edge domains have only
one border router, which makes it unnecessary to change the
internal routers. When the internal routers receive a
packet destined to an external unpopular anycast group they
treat it as a unicast packet and realize that they have no
route for it; thus, they forwarded to the border router.
The border router, which is GIA-enabled, shifts the anycast
indicator off and forwards the packet according to its
unicast routing table. For the case of edge domains that
have more than one border router, an intermediate stage
similar to the one described for the transit domain case
can be adopted.

If GIA is deployed in an IPv6 environment; therefore, the
aforementioned changes can be incorporated to the routers
while upgrading them to be IPv6 enabled.

- Crossing Regions that are not GIA-enabled

During the deployment phase, the Internet will contain both
GIA-enabled and non-GIA-enabled regions. We would like a
domain in a GIA-enabled region to forward packets addressed
to an unpopular anycast group towards their home domain
even if the home domain is separated from this domain by a
non-GIA-enabled region. One possible solution is to
configure the border routers at the periphery of a GIA-
enabled region to encapsulate anycast packets leaving the
region in unicast packets destined to the unicast address
resulting from shifting the anycast indicator off. In
addition, the border routers set the transport protocol
field in the IP packet to a special protocol number that
identifies these encapsulated anycast packets. The packets
cross the non-GIA-enabled region safely heading toward the
home domain. Once they cross the border of a second GIA-
enabled region the border router recognizes them as
encapsulated anycast packets. The BR decapsulates the
packets, which now complete their path according to the
scheme described in the previous sections.

8.0   Other Issues

- Scoped Anycast Addresses

The design in the above sections addresses only the case of
global anycast groups. One can imagine providing a class of
scoped anycast addresses by changing the anycast address
syntax to the following:

+-----------+-----------------------+------------+----------+
| Anycast   | 1 bit differentiating |            |          |
| Indicator | scoped addresses from | Home Prefix| Group ID |
|           | global ones           |            |          |
+-----------+-----------------------+------------+----------+

        Figure 8: A possible change in the Syntax of the
        anycast address to provide scoped anycast groups

In this case the scope of the address is the home domain
and the routers outside the home domain will never initiate
a search for a scoped anycast group.
Another possibility is to use a sub-space of the domain's
unicast address space for anycast groups whose scope is
confined to the domain (in an approach similar to that
proposed in IPv6).

- Long domains

The design assumes that the distance between any border
routers in a domain is roughly equivalent. However,
this is not always the case. For example, a domain that
connects Europe with North America would have BRs in both
continents. It makes sense for a BR in Europe that received
a search message to propagate the search only to its
European peers. This issue can be addresses by providing
the border routers in such domains with some proximity
information to guide them in propagating search messages.
The size of this information is on the order of the number
of BRs in a domain. Thus, the authors think it is scalable.
Besides, providing the BRs with this information is an easy
task since the providers know these facts about their
networks.

- A different measure of distance

As described above, the architecture computes the length of
an external route in terms of the number of domain-hops.
This is the measure of distance used by the unicast inter-
domain routing. However, the architecture is flexible enough
to handle different types of distance measures. For example,
an ISP can occasionally run Ping between all pairs of
border routers and store the results in a table at the
border routers. A search message can collect this
information and measure a path length using latency.

-  The Transit Domain Case

The design as described in the previous sections carries
the assumption that anycast packets addressed to external
anycast groups are generated in edge domains. This
assumption is needed because it is hard for Transit domains
to monitor popularity. More precisely, in a transit domain,
a BR forwarding an anycast packet along a default route can
not decide whether the packet has been generated in the
domain and consequently should increase the popularity of
its group, or the packet is a transit one.
To address this issue we distinguish between two cases. In
the first case, the transit domain contains hosts that
access some external anycast groups. In this case the above
assumption stays valid because the hosts are usually
grouped on a local network that is connected to the core of
the domain through one router. Therefore, the host network
can be regarded as an edge domain given that we ran iBGP on
the router connecting it to the core of the transit domain.
In the second case routers inside the transit domain send
packets to popular external anycast groups. Here, we need a
mechanism to distinguish anycast packets generated
internally from transit anycast packets. One possible
solution is to have border routers tag transit anycast
packets when they enter the domain such that they don't
interfere with the domain's popularity-monitoring task (yet
this issue needs further study).

-  Variations on the route learning protocol

One possible variation on the route learning protocol is to
never discard previously learned anycast routes. These
routes are deleted form the anycast routing table but
stored in some secondary memory, which is not on the
forwarding path. Later, if the group becomes popular again
its route is reinserted into the routing table after
checking that it is still valid.
A second variation is to do an expanding ring search
instead of sending the search immediately three or four
domain-hops.

9.0   Security Considerations

Security considerations are not discussed in this document
(yet).

10.0 Acknowledgments

This work benefited from discussion with D. Clark and T.
Shepard. In addition, B. Priyantha and R. Hariharan provided
comments on the draft and M. Kasbekar and S. Mneimneh helped
with the simulation. Finally, the authors would like to thank
C. Partridge, T. Mendez, and W. Milliken for writing RFC 1546.

11.0   Authors' Addresses:

Dina Katabi
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139
nora@lcs.mit.edu
617-253-3147
617-253-2673 (FAX)

John Wroclawski
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139
jtw@lcs.mit.edu
617-253-7885
617-253-2673 (FAX)

12.0   References

1. E. Basturk, R. Haas, R. Engel, D. Kandlur, V. Peris, and
D. Saha, "Using Network Layer Anycast for Load Distribution
in the Internet". Global Internet, Dec. 1998.

2. S. Bhattacharjee, M. H. Ammar, E. W. Zegura, N. Shah,
and Z. Fei, "Application Layer Anycasting". In Proc of
INFOCOM'97, Apr. 1997.

3. J. Bound, P. Roque, "IPv6 Anycasting Service: Minimum
Requirements for End Nodes". Work in progress.

4. M. Doar, "A Better Model for Generating Test Networks".
IEEE Global Telecommunications Conference/GLOBECOM'96,
London, Nov 1996.

5. D. Farinacci, L. Wei, and J. Meylor, "Use of Anycast
Clusters for Inter-Domain Multicast Routing". Internet-
Draft, Mar. 1998.

6. Z. Fei, S. Bhattacharjee, M. H. Ammar, and E. W. Zegura,
"A Novel Server Technique for Improving the Response Time
of a Replicated Service". In Proc. of INFOCOM'89, Apr.
1998.

7. P. Francis, "A Call for An Internet Wide Host Proximity
Service (HOPS)," Aug. 1998.

8. P. Francis, "Pip Near-term Architecture" May 1994.

9. P. Francis, S. Jamin, V. Paxon, L. Zhang, D. F.
Gryniewicz and Y. Jin, "An Architecture for a Global Host
Distance Estimation Service". In Proc of INFOCOM'98, Apr.
1998.

10. S. V. Fuller, T. Li, J. Yu, and K. Varadhan "Classless
Inter-Domain Routing (CIDR): an Address Assignment and
Aggregation," RFC1519, Sep. 1993.

11. R. Govindan and A. Reddy, "An Analysis of Internet
Inter-Domain Topology and Route Stability". Technical
report USC-CS-96-642, department of computer science,
University of Southern California, In Proc of INFOCOM'97.

12. J. D. Guyton and M. S. Schwartz, "Locating nearby
Copies of Internet Servers". In Proc. of SIGCOMM'95,
Aug,1995.

13. R. Hinden, S. Deering, "IP version 6 Addressing
Architecture". RFC 2373, July 1998.

14. R. Hinden, "Simple Internet Protocol Plus," White
Paper, RFC1710, Oct. 199

15. J. Itoh, "Disconnecting TCP connection toward IPv6
anycast address". Work in progress, Oct. 1998.

16. D. Katabi, " The use of IP-Anycast to Construct
Efficient Multicast Trees," Master Thesis, Sep. 1998.

17. C. Labovitz, G. R. Malan, and F. Jahanian, "Internet
Routing Instability," In Proc of SIGCOMM'97, Sep. 1997.

18. K. Moore, J. Cox, and S. Green, " Sonar - a Network
proximity Service," Internet-Draft, Feb 1996.

19. C. Partridge, T. Mendez, and W. Milliken, "Host
Anycasting Service," RFC1546, Nov. 1993.

20. V. Paxon, " End-to-End Routing Behavior in the
Internet," In Proc. of SIGCOMM'96, Aug. 1996.

21. Y. Rekhter and T. Li, "A Border Gateway Protocol 4
(BGP-4)," RFC 1771, March 1995.

22. M. Stemm, R. Katz, and s. Seshan, "Shared Passive
Network Performance Discovery."

23.  J. Veizades, E. Guttman, C. Perkins, and S. Kaplan,
"The Service Location Protocol," RFC 2165, June 1997.

24. W. Zegura, K. Calvet, and S. Bahattacharjee, "How to
Model an Inter-network," In Proc. of INFOCOM'96, Apr. 1996.

25. T. Narten, E. Nordmark, and W. Simpson, " Neighbor
Discovery for IP Version 6 (IPv6)," RFC 2461, Dec. 1998.

26. W. Fenner, "Internet Group Management Protocol, Version
2," RFC 2461, Nov. 1997.

<draft-katabi-global-anycast-00.txt> Expires 1, 2000.
Document	Document type	Expired Internet-Draft (individual) Expired & archived
	Select version	00
	Author	John T. Wroclawski Email authors
	RFC stream	(None)
	Intended RFC status	(None)
	Other formats	txt pdf bibtex bibxml