Network Working Group                                     J. Mansigian
INTERNET-DRAFT                                              Consultant
Expires in six months                                       March 1997

Draft: Version 01

            Clearing the Traffic Jam at Internet Servers
         A Network Layer View Of Network Traffic Consolidation
                <draft-mansigian-ntc-intro-01.txt>

Status of this Memo

     This document is an Internet-Draft.  Internet-Drafts are working
     documents of the Internet Engineering Task Force (IETF), its
     areas, and its working groups.  Note that other groups may also
     distribute working documents as Internet-Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-
     Drafts as reference material or to cite them other than as
     ``work in progress.''

     To learn the current status of any Internet-Draft, please check
     the ``1id-abstracts.txt'' listing contained in the Internet-
     Drafts Shadow Directories on ftp.is.co.za (Africa),
     nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
     ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).


Abstract

   The cause of the typically glacial response from popular Internet
World Wide Web servers is seldom lack of network bandwidth or
any deficits in the client's equipment. The reason for the abysmal
performance is that the accessed server is spending an inordinate
amount of time managing two problems: an unnecessarily large number
of transport connections and the unoptimized transmission of masses of
redundant data. This work addresses both problems.

   This document presents an introduction to the concepts and
architecture of network traffic consolidation. It is not intended to
describe a complete protocol with every ancillary feature but rather
to focus on performance driven core ideas that could become a part of
emerging commercially featured protocols.

   The scope of network traffic consolidation is confined to file level
interactions between Internet World Wide Web servers and their clients.
Data is delivered to clients without client specific change.

   The goal of network traffic consolidation is to make an overburdened
file server's behavior become very much as if it were servicing a light
flow of file requests. The methods of network traffic consolidation can
be summarized by saying that they achieve their goal by actually making
the server's file request flow light.

   Network traffic consolidation acts on both the input and output flows
of client server data. The input processing of network traffic
consolidation is called request reduction. The output processing is
called multicast response. Input and output processing operate
asynchronously.

   Request reduction is implemented by a multi-threaded process resident
on the server platform. This process views a busy server's request
flow for each file through a series of time windows of small uniform
interval. Request reduction divides all input requests into two
classes: those requests that can be consolidated and those that
cannot be. Requests that cannot be consolidated are passed without
delay to the server. Requests that can be consolidated are treated
differently. Conceptually, a thread of the request reduction process
gathers into one group all file requests that arrive in the same time
window, request the same file, and originate from different clients. A
common example would be multiple HTTP requests from different clients
requesting the same HTML document file occurring in the same time
window. What is actually constructed are two data structures. One is a
copy of the common request data with a system generated key that is
placed in the originator's address field. The other data structure is a
list of all of the requesting clients, called the multicast distribution
list. This list is keyed by the system generated key for later
retrieval. The thread then places the keyed multicast distribution list
in a queue that resides in shared memory and passes the request to the
server.

   A consolidated request is indistinguishable from an individual
request to the server, so no server request processing logic needs to
change.

   A direct advantage of request reduction is a dramatic decrease in the
frequency of server interruption with attendant improvement in server
performance.

   Multicast response is implemented as a thin layer which invokes a
qualified third party implementation of a reliable multicast protocol
(RMP) of the customer's choice. The multicast thin layer is a simple
single threaded driver that is responsible for invoking the RMP to
service consolidated requests.

   The multicast response process receives a reply from the server and
tries to locate a multicast distribution list that matches the reply's
destination address. If it cannot then the reply is from a
non-consolidated request and the unicast service of the RMP is used to
send the reply. If a match is found then the multicast response process
takes the server's response as one input and consumes a member of the
queue of multicast distribution lists as the other. These two inputs,
along with configuration data, are used by the driver in its invocation
of the RMP. Once the multicast response driver initiates the RMP it has
no further responsibility with respect to the request except to invoke
the RMP primitive to delete and disestablish all RMP data structures and
remote RMP processes associated with the multicast group after a
configured amount of time. All flow control, error retry and error
messages come from the RMP.

   An important advantage of multicast response is that the server needs
to put only one copy of the file on the wire no matter how many client
requests compose the consolidated request.

Introduction

   The Internet is used by millions of people every day for a variety
of purposes. There is no sign that growth of interest in using the
Internet is abating. The demands being placed on the Internet today
could not have been anticipated decades ago when its predecessor,
Arpanet, was designed. However the effects of these early design
decisions are still very much with the Internet.

The Paradigm Shift and its Effects

   The 1990s brought to the Internet a crush of new users from diverse
backgrounds. This influx was mostly the result of the meteoric
rise of the World Wide Web precipitated by the widespread
availability of easy to use graphically interfaced clients such as
Mosaic and Netscape. There were two important changes that resulted
from this burgeoning movement. The first was explosive growth
in activity both over the network and at the host interface. The second
was that the predominant form of communication shifted from a
peer-to-peer model to a mixture of peer-to-peer and client server
modalities. In the early
days of internetworking the peer-to-peer model of network use was
exemplified by collaborating researchers sending email and experiment
data to each other. The peer-to-peer traffic remains very important
today but is no longer unchallenged as the predominant form of
communication on the Internet. The ascendancy of the World Wide Web has
created a massive client server traffic on the Internet that differs
qualitatively from the previous network traffic in important ways.

   The first difference is that in today's internetworked client server
model the data is no longer necessarily unique. Colleagues sending each
other email or collaborating by exchanging files of work related data
almost always make progress from one communication to the next, so the
data being transmitted is essentially unique. Intimate
use of the Internet by a small number of communicants sending unique
data was the predominant style before the 1990s. This was the culture
of the Internet before the public embraced it. This state of affairs
contrasts sharply with the current rage for accessing HTML pages from
the World Wide Web.

   Popular Web pages change very slowly relative to their access rate
and therefore closely approximate constant data. The number of clients
that access popular Web pages is
spectacular. On another front, the emergence of commercial and public
databases accessible from the Internet has brought about the
commoditization of online information. The commodity data of these
databases tends to change slowly relative to its access rate and is
therefore another major source of nearly constant data which did not
exist before. Like the Web pages, many of these information files also
experience heavy demand from a growing public audience.

   Another important way in which the new Internet traffic differs from
its precursor has to do with the temporal clustering of requests.
With the phenomenal growth in client activity in recent years the
percentage of requests that arrive almost simultaneously at servers has
also increased dramatically.

   The confluence of data redundancy, temporal clustering of requests,
and heavy traffic in the new Internet are crucial factors that affect
the client's perception of overall performance when accessing Web pages.
They also provide the basis for optimization.

The Problems and Their Causes

Frequent Host Interruption

   Network based hosts on which server processes execute are controlled
by general purpose operating systems. The host system does not perform
efficiently when interrupts occur too frequently. Protocols based on
individual request and response in conjunction with an environment of
hundreds or thousands of clients a minute accessing the host produce
such a dense pattern of interrupts that the host's performance is
seriously degraded.

LAN Saturation

   The LANs that Internet based hosts are connected to are adversely
and unnecessarily affected by the passage of large numbers of
individual requests onto the LAN when data redundancy of the requests
is high. Every packet that arrives at the LAN must have its Internet
address resolved to a physical address. Carrying request packets that
are to be processed individually keeps the LAN unnecessarily loaded.
The degrading effects of LAN saturation go beyond shackling performance
delivered to remote clients. Local clients running transient
applications ( e.g. word processors ) on hosts connected to the LAN
also experience a loss in quality of service.

Host Interface Burdened by Redundant Output

   The current state of the art for the internetworked client server
model has the server or a proxy copying data onto the wire as many
times as it is requested regardless of conditions. Conditions may
include many clients making requests for the same data within a brief
time interval. However current protocols used to distribute the
server's output cannot optimize transfer of data from a memory buffer
to the network media using the conditions cited. As a server becomes
more popular and develops tighter temporal clustering of same file
requests, the time it takes to output the data increases at a faster
than linear rate, because the system degrades as the interrupt pattern
becomes more dense. The use of on server caches
and mirror servers cannot address the fact that the data is transferred
to the network substrate as many times as it is requested.

Conclusion To The Introduction

   The individually focused request and response paradigm at the core of
the current client server model fails under massive public use because
of inefficiency bred of treating every request and every response as an
individual piece of work regardless of the presence of conditions that
allow optimization. The solution lies in the direction of revised input and
output processing that exploits patterns of data redundancy, temporal
clustering, and the efficiencies of multicast delivery.

   This approach cannot be wholly transparent below the application
layer. Transport and network layer protocols different from those
commonly used today must be employed in the new client server model.

Client To Server

Basis for Request Reduction

   The basis for advantageous request reduction is high frequency
arrival of the same request semantics from different clients.
The busiest Web sites today receive HTTP hits at a sustained rate of
300 per second. Given that most clients will use the same
entry point to the site and the same few layers of the site's HTML
document hierarchy, there exist, within a small time window such as
two seconds, scores of requests for the same HTML file. Even if we
scale down from the busiest Web sites by an order of magnitude, the
sixty or so HTTP hits inside the time window provide sufficient basis
for successful request reduction.
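The arithmetic here can be checked with a short sketch; the 300 hit per
second rate and the two second window are the draft's own figures, and
the function name is purely illustrative.

```python
# Requests for one popular file that accumulate inside a single time
# window, using the draft's example figures.
def requests_per_window(hits_per_second, window_seconds):
    return hits_per_second * window_seconds

busiest = requests_per_window(300, 2)   # busiest sites: 600 candidate requests
scaled = requests_per_window(30, 2)     # one order of magnitude less: 60
```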

Distribution of Request Reduction Responsibilities

   The request reduction process runs on the server's platform.
The request reduction process is implemented as a multi-threaded daemon
that receives incoming client requests from an RMP connection that has
endpoint code running at the client's host and at the host of the
request reduction process.

The request reduction process consists of the following threads:

  One manager thread.
     Classifies input requests.
     Starts and stops service threads.
     Provides overall process control.

  One listener thread.
     Receives client requests and places them in the manager
     thread's input queue.

  Many unicast service threads.
     Each implements the server side unicast RMP endpoint for a request
     that is not consolidated.

  Many time window service threads.
     Each implements the server side multicast RMP endpoint for a
     consolidated request.

   The request reduction process acts as a filter that removes and
processes the file requests it is responsible for and passes directly
through to the server the rest of the request traffic.
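The thread roles above can be sketched as follows. This is a minimal
illustration, not the draft's implementation: every name is
hypothetical, and real service threads would run concurrently rather
than as plain function calls.

```python
import queue

# Skeleton of the request reduction daemon's thread roles. The draft
# specifies responsibilities, not an API; all names are hypothetical.
inbox = queue.Queue()   # the manager thread's input queue

def listener(recv):
    """Listener thread: receive client requests and queue them for the
    manager. recv() returning None ends the loop."""
    for request in iter(recv, None):
        inbox.put(request)

def manager(consolidatable, start_unicast, start_window):
    """Manager thread: classify each queued request and hand it to a
    unicast service thread or a time window service thread."""
    for request in iter(inbox.get, None):   # None acts as a stop signal
        if consolidatable(request):
            start_window(request)           # consolidation candidate
        else:
            start_unicast(request)          # passed straight through
```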

How Request Reduction Sees Time Flow

   A request reduction time window service thread divides time into
small windows of configured interval. All time windows associated with
the same file use the same configured interval. Time windows of
different files are free to be configured with different time intervals.
There can and almost certainly will be more than one time window for the
same file when measured over a longer time. The temporal flow of time
windows for the same file may or may not be continuous. New time windows
for the same file start at the release time of the previous time window
for that file if there is one or more pending request(s) for the file.
If not the continuity is broken and the time window for that file will
reappear when the next request for that file is received. All time
windows for the same file are non-overlapping with each other. Time
windows of different files may and almost certainly will overlap each
other's temporal boundaries.
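Setting aside the back-to-back reopening refinement, the rules above
reduce to a simple one: a request joins the current window for its file
if that window has not yet been released, and otherwise opens a new
window at its own arrival time. A sketch, with illustrative names:

```python
def window_starts(arrivals, interval):
    """Map each arrival time (sorted, for one file) to the start of the
    time window it joins. A request arriving at or past the current
    window's release opens a new window; windows for one file never
    overlap. (The draft also lets a new window open exactly at the
    previous release when requests are pending; omitted here.)"""
    starts, current = [], None
    for t in arrivals:
        if current is None or t >= current + interval:
            current = t      # continuity broken: a window reopens here
        starts.append(current)
    return starts
```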

Request Reduction Operating Cycle

   When an input request arrives at the listener thread it is
immediately put into the manager thread's queue of input requests.
The manager thread will go through this queue, classify each request,
and take appropriate action.

Case 1: Request Is Not A Consolidation Candidate

   The manager thread has examined the request to see if it is a file
request with no client specific processing. The request did not pass
this test, so the manager thread starts a unicast thread to service the
request's reply and passes the request to the server.

Case 2: Request Is A Consolidation Candidate For A New Time Window

   The request examined by the manager thread is a consolidation
candidate. Its semantics have been examined to see if they match the
semantics of the request held by any existing time window. No match
was found. The manager thread starts a new time window service thread
to service this request. This involves creating a new time window by
allocating a time window structure, starting a new timer, setting the
reduction count variable to zero, allocating a memory buffer for the
new request, moving the new request into this buffer, and incrementing
the time window's reduction count variable by one. Another buffer
associated one-to-one with the newly allocated time window is allocated.
This buffer contains the list of addresses of clients that share the
same time window membership. Another way to look at it is to say that
this is a list of addresses of clients that have made the same request
at nearly the same time. This list is called the multicast distribution
list. This list is keyed by a system generated unique key that binds it
to the consolidated request.

Case 3: Request Is A Consolidation Candidate For An Existing Time Window

   The request examined by the manager thread is a consolidation
candidate. Its semantics have been examined to see if they match the
semantics of the request held by any existing time window. A match was
found. The client address of the newly arrived request is inserted into
the matching time window's multicast distribution list and the time
window's reduction count variable is incremented by one.
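Cases 2 and 3 can be sketched together as one lookup on request
semantics. This is an assumption-laden illustration (the names, the
dictionary keyed by request semantics, and the field layout are all
hypothetical), not the draft's data layout:

```python
import time

# Open time windows, looked up by the request's semantics (here, the
# raw request string stands in for "same request semantics").
windows = {}

def consolidate(request, client_addr, now=time.monotonic):
    """Case 2: open a new time window for an unmatched candidate.
    Case 3: fold a matching request into the existing window."""
    win = windows.get(request)
    if win is None:                                  # case 2: new window
        win = windows[request] = {
            "request": request,          # common request data buffer
            "opened": now(),             # the window's timer starts
            "reduction_count": 0,
            "distribution_list": [],     # multicast distribution list
        }
    win["distribution_list"].append(client_addr)     # case 3 (and first member)
    win["reduction_count"] += 1
    return win
```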

   When a time window's release comes due, either because of elapsed
time or because the reduction count variable has exceeded a configured
maximum, the following happens.

   1) The time window service thread generates, for reference by the
co-resident multicast response process, a unique request key that
identifies the consolidated request and its multicast distribution
list.

   2) The time window service thread inserts the multicast distribution
list into a queue kept in shared memory. ( The other co-resident
process, multicast response, is this queue's consumer. )

   3) The time window service thread creates a consolidated request.
This consists of the common request data held in the time window's
request buffer with the client's address field filled in with the
system generated unique key that references the list of clients that
should receive a reply to the consolidated request.

   4) The consolidated request is passed to the server.
( Note that this request looks exactly like an ordinary request
to the server. )

Key Generation Issues

   It is important that the key generator does not select a key value
to identify a multicast distribution list which collides with the
originator address of a request that was bypassed as a candidate
for consolidation. To ensure that this does not occur, Network
Traffic Consolidation will have assigned to it one Class B Internet id
from which the key generator will make the key values that uniquely
identify each multicast distribution list. Since these key values are
pseudo-addresses that are never seen outside of the NTC processes that
reside on one platform, it is possible for every installation of NTC to
re-use the same Internet id as the root from which all key values are
derived. The only real issue is to ensure that no client can ever have
the same address as an NTC key value.
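A minimal sketch of such a key generator, assuming keys are drawn
sequentially from one reserved /16 (Class B) network. The draft does
not name the network, so 172.16.0.0/16 below is purely a placeholder:

```python
import ipaddress
import itertools

# Keys are pseudo-addresses drawn from one reserved Class B (/16)
# network, so a key can never equal a real client address.
RESERVED = ipaddress.ip_network("172.16.0.0/16")   # placeholder network
_counter = itertools.count(1)

def next_key():
    """Return the next unique pseudo-address key."""
    return RESERVED.network_address + next(_counter)
```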

Server To Client

IP Multicast in a Nutshell

   Multicast communication involves the sending of packets from one
source to many destinations. Network routers that run the multicast
router daemon copy received packets onto those interfaces that are part
of a shortest path distribution tree pruned of superfluous links.
This pruned distribution tree provides just one path from the packet's
source to each destination. Destinations are referenced by a special
type of IP address known as a group address or Class D Internet address.
Recipients of multicast packets have a standard command interface that
allows them to join and leave a group address thus controlling what
transmissions they will receive. The architecture of IP multicast is
defined by RFC 1112. A representative implementation is MOSPF defined
by RFC 1584 and further discussed in RFC 1585.
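The join and leave interface mentioned above is, on most hosts, the
pair of socket options introduced with the RFC 1112 host extensions. A
sketch (the group address 239.1.2.3 is an arbitrary example, and the
helpers are not part of any standard API):

```python
import socket
import struct

def join(sock, group, iface="0.0.0.0"):
    """Join an IP multicast group (Class D address) on one interface."""
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

def leave(sock, group, iface="0.0.0.0"):
    """Leave a previously joined IP multicast group."""
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)

def is_class_d(group):
    """Class D (multicast) addresses occupy 224.0.0.0 - 239.255.255.255."""
    return 224 <= socket.inet_aton(group)[0] <= 239
```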

Reliable Multicast Protocols

   Reliable Multicast Protocols, or RMPs, are built upon the network
layer IP Multicast service that has become widely deployed on the
Internet.

Qualifying A Reliable Multicast Protocol

   Network Traffic Consolidation allows customers to select the RMP
service of their choice from a list of supported RMPs that meet
the following qualification criteria.

Receiver initiated design to achieve:
   Distributed state management that scales to many receivers without
   the sender becoming a bottleneck.

IP multicast is the network layer service used by the RMP.

Supports dynamic join and leave of members from a multicast group.

Completely and transparently manages flow control, error retransmission
and error messages.

Native support for unicast transmission instead of reliance on TCP
as a second transport.

Distributed modular organization that allows error retransmission
from local data store instead of the sender to achieve:
   Reduced wait by the client on a retransmission.
   Better use of network bandwidth.
   A server freed from having to buffer sent data.

Multicast Response

   The multicast response driver is a very simple single threaded
process. It receives a server reply through a call from the NTC
service API and tries to find a multicast distribution list that
matches the reply's destination address. If it cannot match the address
then it simply calls the RMP's unicast service to handle delivery of the
server's reply. This is the case of a non-consolidated request being
fulfilled. If there is a match then a consolidated request is being
processed. If this is the case then the multicast response driver
makes a multicast invocation of the RMP using the server provided reply
data, the consolidated request's multicast distribution list, and any
pertinent configuration data. The only other thing that the multicast
response driver does is to invoke an RMP primitive to release all process
and data resources associated with the request after a configured amount
of time. The timeout is calculated to be long enough to allow nearly 100%
of transmissions to complete in their entirety under adverse conditions.
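The driver's dispatch logic can be sketched in a few lines. Here `rmp`
stands for whichever qualified reliable multicast protocol the customer
selected; every name is hypothetical:

```python
def respond(reply_dest, reply_data, distribution_lists, rmp):
    """Dispatch one server reply: multicast it if the destination
    matches a queued multicast distribution list, unicast otherwise."""
    members = distribution_lists.pop(reply_dest, None)
    if members is None:
        rmp.unicast(reply_dest, reply_data)      # non-consolidated request
    else:
        rmp.multicast(members, reply_data)       # consolidated request
    # After a configured timeout the driver would also invoke the RMP
    # primitive that tears down the group's state (not modeled here).
```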

Advantages Of Network Traffic Consolidation

   Network Traffic Consolidation moves in the direction of bounding
the number of requests that a server will receive for a given file
in a given span of time. This makes server load more predictable and
more importantly protects the server from being overwhelmed by too
many requests in a given span of time.

   Although HTTP data can be served very well by Network Traffic
Consolidation this technology is future safe in the sense that it is
general enough to process all highly redundant data records regardless
of format.

   Network Traffic Consolidation addresses the bottleneck at the point
where busy servers, be they primary or proxy servers, transfer data from
host memory to network media. This involves significant CPU resource on
popular servers that regularly have dozens of clients simultaneously
requesting the same few high level HTML files.

   The multicast mode of transmission used by the server to client
processing of Network Traffic Consolidation preserves network bandwidth
when compared to the current unicast method of serving clients.

   Network Traffic Consolidation reduces the number of software
interrupts received by network hosts for a given rate of client
requests.

   Network Traffic Consolidation scales exceptionally well. The worst
area of Web site overload involves accessing the first few levels of
HTML document files. There is more redundant data access here than
anywhere else. Because of the hierarchical structure of a Web site nearly
everyone enters from a common top page and there is a slow moving
concentration of traffic at levels near the top page that gradually
works downward. In Network Traffic Consolidation, because every like
intentioned request in the same small time window is consolidated into
one request, the greatest improvement over the conventional one request
one response mode of service is seen during heavy load.


Security Considerations

   Since the identities of the clients involved in a consolidated
   request are masked by a pseudo-address key the server is not able to
   enforce any client specific restrictions to data access. In this
   sense, a consolidated request is a Trojan Horse.  This is an area
   that needs to be addressed.


References

   S. Deering,   "Host Extensions for IP Multicasting", STD 5, RFC 1112,
                  Stanford University, August 1989

   Rajendra Yavatkar, James Griffioen, Madhu Sudan,
                 "A Reliable Dissemination Protocol for Interactive
                  Collaborative Applications", University of Kentucky,
                  December 1996
                  http://www.dcs.uky.edu/~griff/papers/
                  tmtp-mm95/main.html


   Nils Seifert, "Multicast Transport Protocol Version 2",
                  Berlin, October 1995
                  http://www.cs.tu-berlin.de/~nilss/mtp/protocol.html


   Alex Koifman, Steve Zabele,
                 "A Reliable Adaptive Multicast Protocol", RFC 1458,
                  TASC, May 1993

   S. Armstrong, A. Freier, K. Marzullo
                 "Multicast Transport Protocol", RFC 1301,
                  Xerox, Apple, Cornell,
                  Feb 1993

   J. Moy        "Multicast Extensions to OSPF", RFC 1584,
                  Proteon Inc., March 1994

   J. Moy        "MOSPF: Analysis and Experience", RFC 1585,
                  Proteon Inc., March 1994

   T. Berners-Lee, R. Fielding, H. Frystyk,
                 "Hypertext Transfer Protocol - HTTP/1.0", RFC 1945,
                  MIT/LCS, UC Irvine, DEC, May 1996

   R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee
                 "Hypertext Transfer Protocol - HTTP/1.1",
                   RFC 2068, January 1997

   S.E. Spero,   "Analysis of HTTP Performance Problems",
                  http://sunsite.unc.edu/mdma-release/http-prob.html


Author's Address

   Joseph Mansigian
   155 Marlin Rd.
   New Britain, CT 06053

   Phone: (860) 223-5869
   EMail: jman@connix.com