Internet Engineering Task Force Yunhong Gu
Internet Draft Robert L. Grossman
Document: draft-gg-udt-01.txt University of Illinois at Chicago
Expires: February 2005 August 2004
UDT: A Transport Protocol for Data Intensive Applications
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Abstract
This document proposes a new UDP based Data Transfer protocol, also
known as UDT. UDT can be used in high bandwidth-delay product (BDP)
networks to utilize the abundant network resource efficiently and
fairly. It is expected to be used in distributed data intensive
applications over high-speed wide area networks, such as scientific
computing on computational grids.
The current most widely used TCP version (TCP NewReno/SACK) does not
work well in such environments due to its slow discovery and recovery
to the available bandwidth as the network BDP increases. In addition,
it exhibits "unfairness" when concurrent TCP flows have different
round trip time (RTT).
UDT is an application layer solution by introducing new congestion
control and reliability control. The protocol is defined upon UDP.
The congestion control combines both rate-based and window-based
methods and uses bandwidth estimation techniques to dynamically
configure the control parameters.
Gu, Grossman Expires - February 2005 [Page 1]
UDT August 2004
Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
Table of Contents
1. Introduction...................................................2
2. Target Use Scenarios and Design Goals..........................3
3. Protocol Specification.........................................4
3.1 Overview...................................................4
3.2 Packet Structures..........................................4
3.3 Timing.....................................................6
3.4 The Sender's Algorithm.....................................7
3.5 The Receiver's Algorithm...................................8
3.6 Rate Control Algorithm....................................11
3.7 Flow Control Algorithm....................................12
3.8 Connection Setup and shutdown.............................12
3.9 Loss Information Compression Scheme.......................13
4. Efficiency and Fairness Characters............................14
Security Considerations..........................................14
Normative References.............................................15
Informative References...........................................15
Acknowledgments..................................................15
Author's Addresses...............................................16
1. Introduction
As the network bandwidth-delay product (BDP) increases, conventional
TCP implementations become inefficient. This is because its AIMD
(additive increase multiplicative decrease) algorithm reduces the TCP
congestion window drastically but fails to recover it to the
available bandwidth quickly. Theoretical flow level analysis has
shown that TCP becomes more vulnerable to packet loss as the BDP
increases higher [LM97].
Additionally, the unfairness of RTT bias inherent in TCP congestion
control also becomes more serious in distributed data intensive
applications. Concurrent TCP flows with different RTT will tend to
unfairly share the available bandwidth. Although small BDP networks
will share bandwidth relatively fairly using conventional TCP
implementations, networks with large BDP tend to suffer severe
unfairness issues under conventional TCP. This RTT bias severely
limits the effectiveness of distributed computations based on high-
speed wide-area networks as, for example, Grid-based computations in
Gu, Grossman Expires - February 2005 [Page 2]
UDT August 2004
an Internet2-based context.
As of today, improvements on the standard TCP still fails to satisfy
the efficiency and fairness (especially the RTT bias problem)
objectives in high BDP environments. Such TCP modifications as
presented in RFC 1423 (high-performance extensions), RFC 2018 (SACK),
RFC 2582 (New Reno), RFC 2883 (D-SACK), and RFC 2988 (RTO
calculation) do improve the efficiency more or less, but the
fundamental problem of the TCP AIMD algorithm is unsolved. HS TCP
(RFC 3649) achieved high utilization of bandwidth in high BDP
networks by radically changing the TCP congestion control algorithm,
but the fairness problem still remains.
Considering the background above, new transport protocol is needed to
support the high performance bulk data transfer in high BDP networks.
We propose a new application level transport protocol, named UDT, or
UDP based Data Transfer Protocol, together with a new congestion
control algorithm.
This document presents the two orthogonal parts of the UDT protocol
and the UDT congestion control algorithm. The application level
protocol, layering over UDP, can use other congestion control
algorithms; whereas the algorithm defined in this document can be
implemented in other protocols, such as TCP.
An example implementation of the protocol can be found at [UDT].
Detailed performance analysis of the congestion control algorithm can
be found in [GHG04].
2. Target Use Scenarios and Design Goals
UDT is expected to be used in the situations where a small number of
bulk sources share abundant bandwidth. Typical use scenario is the
computational grids [FKT01] built on wide area optical networks. A
few institutes run their data intensive distributed applications on
the networks, such as remote access to scientific instruments,
distributed data mining, and high-resolution media streams.
The main objectives of UDT are efficiency, fairness, and stability.
Single, or small number of, UDT flows should utilize all the
available bandwidth provided by the high-speed links, even if the
bandwidth changes drastically. Meanwhile, all concurrent flows must
share the bandwidth fairly, independent of the different bottleneck
bandwidth, start time, or RTT. Stability requires that the packet
sending rate should always converge to the available bandwidth very
quickly, and congestion collapse must be avoided.
UDT is NOT used to replace TCP in the Internet where the bottleneck
Gu, Grossman Expires - February 2005 [Page 3]
UDT August 2004
bandwidth is relatively small and there are large amounts of
multiplexed short life flows.
However, UDT aims to be TCP friendly - when coexisting with TCP, the
bandwidth that UDT allocates should not exceed its fair share
according to the max-min rule. (Note that the max-min rule allows UDT
to allocate the available bandwidth that TCP fails to utilize in high
BDP lossy links.) We feel that TCP friendliness is critical since TCP
will always be in use in these high BDP networks, and UDT
applications may sometimes run in the Internet.
3. Protocol Specification
3.1 Overview
UDT is duplex. Each UDT entity has two parts: the sender and the
receiver. The sender sends (and retransmits) application data
according to flow control and rate control. The receiver receives
both data packets and control packets, and sends out control packets
according to the received packets as well. Both the sender and the
receiver share one UDP port for packet sending/receiving.
The receiver is also responsible for triggering and processing all
control events, including congestion control and reliability control,
and their related mechanisms such as RTT estimation, bandwidth
estimation, acknowledging and retransmission.
UDT always tries to pack application data into fixed size packets,
unless there is not enough data to be sent. Similar to TCP, this
fixed packet size is named MSS (maximum segment size). Since UDT is
supposed to be used to transfer bulk data stream, we assume that
there is only a very small portion of irregular sized packets in a
UDT session. MSS can be set up by applications (see Section 3.8) and
the optimal value is the path MTU (including all packet headers).
The UDT congestion control algorithm combines rate control and window
control (flow control), where the former tunes the packet-sending
period and the latter limits the number of unacknowledged packets.
The parameters used in rate control are updated by bandwidth
estimation technique, which is derived from the receiver based packet
pair method [P99], in real time. Meanwhile, the rate control period
is constant in order to eliminate RTT bias. The flow control
parameters depend on the data arrival speed at the peer side, in
addition to the receiver's free buffer size.
3.2 Packet Structures
UDT has two kinds of packets: the data packets and the control
Gu, Grossman Expires - February 2005 [Page 4]
UDT August 2004
packets. They are distinguished by the 1st bit (flag bit) of the
packet header. The flag bit of a data packet is 0, whereas it is 1
for a control packet.
The data packet structure is shown as below.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| Packet Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
~ Application Data Payload ~
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Packet sequence number is the only content in the header of a UDT
data packet. It is an unsigned integer using the following 31 bits
after the flag bit. UDT uses packet based sequencing, i.e., the
sequence number is increased by 1 for each non-retransmitted data
packet. The sequence number is wrapped after it is increased to the
maximum number (2^31 - 1). Following this 32-bit packet header is the
payload for application data.
The control packet has the following structure.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|type | Reserved | ACK Seq. No. |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
~ Control Information Field ~
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
There are 6 types of control packets in UDT and the type information
is put in bit field 1 - 3 of the header. The first 32 bits must exist
in the packet header. The control Information Field contains zero
(i.e., it does not exist; in this case it is noted as "none" in the
specification) or multiple 32-bit unsigned integers, depending on the
packet type.
UDT uses ACK sub-sequencing. Each ACK/ACK2 packet has a unique 16-bit
sequence number, which is independent of the data packet sequence
number. It uses bits 16 - 31 in the control packet header. The ACK
sequence number ranges from 0 to (2^16 - 1). Bits 16 - 31 are
undefined in other control packets.
Gu, Grossman Expires - February 2005 [Page 5]
UDT August 2004
TYPE 000: Protocol Connection Handshake
Control Info:
1) 32 bits: UDT version
2) 32 bits: initial sequence number
3) 32 bits: MSS (bytes)
4) 32 bits: maximum flow window size (bytes)
TYPE 001: Keep-alive
Control Info: None
TYPE 010: Acknowledgement (ACK)
bits 16-31: ACK sequence number
Control Info:
1) 32 bits: The packet sequence number to which (excluding)
all the previous packets have been received
2) 32 bits: RTT (microseconds)
3) 32 bits: RTT variance, or RTTVar (microseconds)
4) 32 bits: Flow window size (number of packets)
5) 32 bits: Estimated link capacity (number of packets per
second)
TYPE 011: Negative Acknowledgement (NAK)
Control Info:
32-bit integer array of compressed loss information
(see Section 3.9).
TYPE 100: Reserved.
This type of control message is reserved for congestion
warning that can be sent from the receiver to the sender. A
congestion warning can be triggered by ECN, or a
measurement of increasing trend in the packet delay.
TYPE 101: Shutdown
Control Info: None
TYPE 110: Acknowledgement of Acknowledgement (ACK2)
bits 16-31: ACK sequence number
Control Info: None
TYPE 111: Explained by bits 4 - 15, reserved for future use.
Note that for both data and control packets, the actual packet sizes
can be ascertained from UDP header [UDP]. The packet size information
can be used to derive the size of the payload in a data packet and
the size of control information field in an NAK control packet.
3.3 Timing
UDT uses four timers in the receiver to trigger different periodical
events, including rate control, acknowledgement, loss report
(negative acknowledgement), and retransmission / connection
maintenance.
Timers in UDT use the system time as origins and can process wrapping
if the system time wraps. The UDT receiver actively queries the
system time to check whether a timer is expired or not. For a certain
Gu, Grossman Expires - February 2005 [Page 6]
UDT August 2004
timer T in UDT with a period of TP, suppose variable t is used to
record the most recent time when T is set or reset. If T is set or
reset at system time t0 (t = t0), then at any time t1, (t1 - t >= TP)
is the condition to check if T is expired.
The four timers are 1) RC timer, 2) ACK timer, 3) NAK timer, and 4)
EXP timer. Their periods (or timeout values) are RCTP, ATP, NTP, and
ETP, respectively.
RC timer is used to trigger periodical rate control. ACK timer is
used to trigger periodical selective acknowledgements (ACK packets).
Both RCTP and ATP are constant values: RCTP = ATP = 0.01 seconds.
NAK is used to trigger negative acknowledgements (NAK packets).
Retransmission timer is used to trigger a data packet's
retransmission and maintain connection status. Their periods depend
on the estimated RTT value. The ETP value also depends on the number
of continuous EXP timeouts. The recommended initial value of RTT in
UDT is 0.1 seconds and the values of NTP and ETP are initialized as:
NTP = 3 * RTT, ETP = 3 * RTT + ATP.
The system time is queried after each time bounded UDP receiving
(there will be additional necessary data processing time if a UDP
packet is received) to check if any of the four timers is expired.
The recommended granularity of their periods is microseconds. The
timeout value of UDP receiving is a choice of implementation,
depending on the tradeoff between the system burden caused by the
looped queries on system time and the accuracy of event periods.
The rate control event updates the packet-sending period, STP, which
is used to schedule data packet sending in the UDT sender. Suppose a
packet is to be sent at time t0, then the next packet sending is
scheduled to (t0 + STP). In the other word, if the previous packet
sending takes t' time, the sender will wait for (STP - t') for the
next sending (if STP - t' < 0, no waiting is needed). This waiting
interval needs a high precision implementation. It is recommended
that CPU clock cycle granularity be used for this timer.
3.4 The Sender's Algorithm
Data Structures and Variables:
1) SND PKT History Window: A circular array that records the
departure time of each data packet.
2) Sender's Loss List: The sender's loss list is a linked list used
to store the sequence numbers of the lost packets fed back by the
receiver in NAK packets. The numbers are stored in increasing
order.
Data Sending Algorithm:
Gu, Grossman Expires - February 2005 [Page 7]
UDT August 2004
1) If the sender's loss list is not empty, retransmit the first
packet in the list and remove it from the list. Go to 5).
2) Wait until there is application data to be sent.
3)
a. If the number of unacknowledged packets exceeds the flow
window size, go to 1).
b. Pack a new data packet and send it out.
4) If the sequence number of the current packet is 16n, where n is an
integer, go to 2).
5) Record the packet sending time in SND PKT History Window.
6) If this is the first packet sent after the last time when the
sending rate is decreased, wait SYN time.
7) Wait (STP - t) time, where t is the total time used by step 1 to
step 4. Go to 1).
3.5 The Receiver's Algorithm
Data Structures and Variables:
1) Receiver's Loss List: It is a linked list of tuple whose values
include: the sequence number of a lost data packet, the latest
feedback time of the lost packet, and the number of times the
packet has been fed back. Values are stored in the increasing
order of packet sequence numbers.
2) ACK History Window: A circular array of each sent ACK and the time
it is sent out. By "circular" it means that the most recent value
will overwrite the oldest one if there is no more free space in
the array.
3) RCV PKT History Window: A circular array that records the arrival
time of each data packet.
4) Packet Pair Window: A circular array that records the time
interval between each probing packet pair.
5) LRSN: A variable to record the largest received data packet
sequence number. LRSN is initialized to the initial sequence
number (to be sent) minus 1.
6) exp-count: A variable to record number of continuous EXP timeout
events and its initial value is 1.
Data Receiving Algorithm:
1) Query the system time to check if RC, ACK, NAK, or EXP timer is
expired. If there is any, process the event accordingly (as
described below in this section) and reset the expired timers.
2) Start time bounded UDP receiving. If no packet arrives, go to 1).
3) Set exp-count to 1 and update ETP: ETP = RTT + 4 * RTTVar + ATP.
4) If all sent data packets have been acknowledged, reset the EXP
time variable.
5) Check the flag bit of the packet header. If it is a control
packet, process it according to its type and go to 1).
6) If the sequence number of the current data packet is 16n + 1,
Gu, Grossman Expires - February 2005 [Page 8]
UDT August 2004
where n is an integer, record the time interval between this
packet and the last data packet in Packet Pair Window.
7) Record the packet arrival time in PKT History Window.
8)
a. If the sequence number of the current data packet is greater
than LRSN + 1, put all the sequence numbers between (but
excluding) these two values into the receiver's loss list and
send them to the sender in an NAK packet.
b. If the sequence number is less than LRSN, remove it from the
receiver's loss list.
9) Update LRSN. Go to 1).
On RC timer expired:
Update STP by rate control algorithm (see Section 3.6).
On ACK timer expired:
1) Find the sequence number prior to (excluding) which all the
packets have been received by the receiver (ACK number) according
to the following rule: if the receiver's loss list is empty, the
ACK number is LRSN + 1; otherwise it is the smallest sequence
number in the receiver's loss list.
2) If (a) the ACK number is not greater than the largest ACK number
ever acknowledged by ACK2, or (b) it is equal to the ACK number in
the last ACK and the time interval between this two ACK packets is
less than (RTT + 4 * RTTVar), stop (do not send this ACK).
3) Assign this ACK a unique increasing ACK sequence number. It is
recommended that the ACK sequence number should be increased by 1
for each sent ACK in the sending order (start with 0) and wrapped
after reaching the maximum value.
4) Calculate the packet arrival speed according to the following
algorithm:
Calculate the median value of the last 16 packet arrival
intervals (AI) using the values stored in PKT History Window.
In these 16 values, remove those either greater than AI*8 or
less than AI/8. If more than 8 values are left, calculate the
average of the left values AI', and the packet arrival speed is
1/AI' (number of packets per second); Otherwise, return 0.
5) Calculate flow control window for the peer side (W) according to
Section 3.7. Then calculate effective flow window size as:
max(min(W, available receiver buffer size), 2).
6) Calculate the estimated link capacity according to the following
algorithm:
If the flow control quick start phase (see Section 3.7) is
still ongoing, return 0. Otherwise,
Calculate the median value of the last 16 packet pair
intervals (PI) using the values in Packet Pair Window, and the
link capacity is 1/PI (number of packets per second).
7) Pack the ACK sequence number, ACK number, RTT, RTT variance,
effective flow window size, and estimated link capacity into the
Gu, Grossman Expires - February 2005 [Page 9]
UDT August 2004
ACK packet and send it out.
8) Record the ACK sequence number, ACK number and the departure time
of this ACK in the ACK History Window.
On NAK timer expired:
1) Search the receiver's loss list, find out all those sequence
numbers whose last feedback time is (k * (RTT + 4 * RTTVar))
before, where k is the number of feedback times of this packet
plus 1. If there is no loss to be fed back, stop.
2) Compress the sequence numbers obtained in step 1 (according to
Section 3.9) and send them back to the sender in an NAK packet.
3) Stop flow control quick start phase if it not.
On EXP timer expired:
4) If the sender's loss list is not empty, stop.
5) Put all the unacknowledged packets into the sender's loss list.
6) If (exp-count > 16) and the total time since last time a packet
from the peer side is received has exceeded 3 seconds, or this
time is greater than 3 minutes, which is supposed to be an
indication that the connection is broken, close the UDT
connection.
7) If there is no data packet that is not acknowledged, send a keep-
alive packet to the peer side; otherwise, put all the sequence
numbers of the unacknowledged packets into the sender's loss list.
8) Update exp-count: exp-count = exp-count + 1.
9) Update ETP: ETP = exp-count * (RTT + 4 * RTTVar) + ATP.
On ACK packet received:
1) Update the largest acknowledged sequence number.
2) Update RTT and RTTVar by:
RTT = rtt
RTTVar = rv
where rtt and rv are the RTT and RTTVar values carried in the ACK.
3) Update NTP and ETP:
NTP = RTT + 4 * RTTVar
ETP = exp-count * (RTT + 4 * RTTVar) + ATP
4) Update estimated link capacity: B = (B * 7 + b) / 8, where b is
the value carried in the ACK.
5) Update flow window size to the value carried in this ACK.
6) Send back an ACK2 with the same ACK sequence number in this ACK.
7) Reset the EXP timer.
On NAK packet received:
1) Add all sequence numbers carried in the NAK into the sender's loss
list.
2) Update STP by rate control (see Section 3.6).
3) Reset the EXP timer.
On ACK2 packet received:
Gu, Grossman Expires - February 2005 [Page 10]
UDT August 2004
1) Locate the related ACK in the ACK History Window according to the
ACK sequence number in this ACK2.
2) Update the largest ACK number ever been acknowledged.
3) Calculate new rtt according to the ACK2 arrival time and the ACK
departure time, and update the RTT and RTTVar values as:
RTTVar = (RTTVar * 3 + abs(rtt - RTT)) / 4
RTT = (RTT * 7 + rtt) / 8.
The initial values of RTT and RTTVar are 0.1 seconds and 0.05
seconds, respectively.
4) Update NTP and ETP: NTP = RTT, ETP = (exp-count + 1) * RTT + ATP.
On Keep-alive packet received:
Do nothing.
On Handshake/Shutdown packet received:
See Section 3.8.
3.6 Rate Control Algorithm
Rate Control Quick Start:
The STP is initially set to the minimum time precision (e.g., 1
CPU clock cycle or 1 microsecond). This is the quick start phase
of UDT and it stops as soon as an ACK is received the estimated
bandwidth carried by it is greater than zero. The packet sending
period is then set as 1/W, where W is the flow window size
carried by this ACK.
The quick start occurs only at the beginning of a UDT connection, and
it never executed again in the UDT connection life. After quick
start, the following algorithms start to work.
On RC timer expired:
1) If there is no ACK received during last RCTP, stop.
2) Calculate the loss rate during the last RCTP according to the
number of packets sent and the number of total lost packets fed
back in NAK. If the loss rate is greater than 0.1%, stop.
3) The number of sent packets to be increased in the next RCTP (inc)
is calculated as:
if (B <= C)
inc = 1/MSS
else
inc = max(10^(ceil(log10((B-C)*MSS*8))) * Beta/MSS, 1/MSS)
where B is the estimated link capacity and C is the current
sending speed. Both are counted as packets per second. MSS is
counted in bytes. Beta is a constant value of 0.0000015.
4) Update STP: STP = (STP * RCTP) / (STP * inc + RCTP).
5) Compute the real data sending period (rsp) from SND PKT History
Window, If (STP < 0.5 * rsp), set STP to (0.5 * rsp).
6) If (STP < 1.0), set STP to 1.0.
Gu, Grossman Expires - February 2005 [Page 11]
UDT August 2004
On NAK packet received:
Data Structures and Variables:
1) LSD: The largest sequence number that has been sent when last
rate decrease occurred.
2) NumNAK: Number of NAKs since last time the LSD is updated.
3) AvgNAK: The moving average number of NAKs between two events when
the largest sequence number is greater than LSD.
4) DR: A random number that satisfies the uniform distribution on
[1, AvgNAK].
The rate control algorithm on NAK received:
1) If the largest lost sequence number in the NAK is greater than
LSD:
Increase the STP by 1/8: STP = STP * (1 + 1/8).
Update AvgNAK: AvgNAK = (AvgNAK * 7 + NumNAK) / 8.
Update DR.
Reset NumNAK = 0;
Record LSD.
2) Otherwise, increase NumNAK by 1, and if NumNAK % DR = 0:
Increase the STP by 1/8: STP = STP * (1 + 1/8).
Record LSD.
3.7 Flow Control Algorithm
The flow control window size (W) is 16 initially.
On ACK timer expired:
1) Flow Control Quick Start: If no NAK has been ever generated, or, W
has not reached or exceeded 16 packets and AS > 0, the flow window
size is updated to the total number of acknowledged packets.
2) Otherwise, if (AS > 0), where AS is the packet arrival speed, W is
updated as:
W = ceil(W * 0.875 + AS * (RTT + ATP) * 0.125)
3) Limit W to the maximum flow window size of the peer side.
3.8 Connection Setup and shutdown
One UDT entity starts first as the server, and its peer side (the
client) will send a handshake packet when it wishes to connect. The
client should keep on sending the handshake packet every constant
interval, (implementations should decide this interval according to
the balance between response time and system overhead), until it
receives a response handshake from the server or timeout occurs.
The handshake packet has the following information:
1) UDT version: this value is for compatibility purpose. The
current version is 2.
Gu, Grossman Expires - February 2005 [Page 12]
UDT August 2004
2) Initial Sequence Number: this is the initial data packet
sequence number that the UDT entity that sends this handshake
will use to send out data packets. This must be a random value
between 1 and (2^31 - 1). In addition, it is recommended that
the value should not repeat within a reasonable time history
window.
3) MSS: the size of a data packet (measured by IP payload).
4) Maximum Flow Window Size: This is the maximum flow window size
the UDT entity that receives this handshake is allowed to have.
The window size is generally limited by the sizes of data
structures in the receiver.
The server, when receiving a handshake packet, compares the MSS value
with its own value and set its own values as the smaller one. The
resulting value is also sent back to the client by a response
handshake packet, together with the server's version, initial
sequence number, and maximum flow window size.
The version value is used to check the compatibility of the two
peers. The initial sequence number and maximum flow window size are
used to initialize the parameters of the UDT entity that receives
this handshake packet.
The server is ready for sending/receiving data right after this step
is completed. However, it must send back response packets as long as
it receives any further handshakes from the same client.
The client can start sending/receiving data once it gets a response
handshake packet from the server, sets its own MSS as the value
carried in the handshake packet, and initialize its parameters
related to initial sequence number and maximum flow window size.
Further response handshake messages, if received any, should be
omitted.
If one of the connected UDT entities is being closed, it will send a
shutdown message to the peer side. The peer side, after receiving
this message, will also be closed. This shutdown message, delivered
using UDP, is only sent once and not guaranteed to be received. If
the message is not received, the peer side will be closed according
to the timeout mechanism (see Section 3.5).
3.9 Loss Information Compression Scheme
The loss information carried in an NAK packet is an array of 32-bit
integers. If a number in the array is a normal sequence number (1st
bit is 0), it means that the packet with this sequence number is
lost; if the 1st bit is 1, it means all the packets starting from
(including) this number to (including) the next number in the array
(whose 1st bit must be 0) are lost.
Gu, Grossman Expires - February 2005 [Page 13]
UDT August 2004
For example, the following information carried in an NAK:
0x00000002, 0x80000006, 0x0000000B, 0x0000000E
means packets with sequence number 2, 6, 7, 8, 9, 10, 11, and 14 are
lost.
4. Efficiency and Fairness Characters
UDT can utilize close to the full available bandwidth independent of
link capacity, RTT, and background coexisting flows, given the link
bit error rate (BER) of the current wired networks. There is a
constant time needed for UDT to increase from 0bits/s to 90 percent
of the available bandwidth when there is no loss, which is 7.5
seconds. UDT is not suitable for wireless networks.
UDT strictly satisfies max-min fairness in single bottleneck network
topologies. In multi-bottleneck scenarios, it guarantees that flows
over smaller bottleneck links obtain at least half of their fair
share according to max-min rule. RTT has little impact on the
fairness.
When coexisting with concurrent bulk TCP flow, TCP can occupy more
bandwidth than UDT does, except in one of the following 3 situations:
1) The network BDP is very large that TCP fails to utilize their fair
share bandwidth. In this situation, UDT will occupy the bandwidth
that TCP cannot utilize.
2) The link capacity is so small that UDT's bandwidth estimation
techniques do not work optimally. Simulation shows that this
threshold link capacity is about 100 kb/s.
3) In networks where FIFO queues are used along the paths, if the
queue size is greater than BDP, TCP's bandwidth share decreases as
the queue size increases. However, to reach about equal sharing
with UDT, the queue size needed generally exceeds the amount that
a practical router/switch will provide.
When short (timewise) web-like TCP flows coexist with a small number
of concurrent UDT flows, the effect of UDT on TCP flows is very
small.
More analysis can be found in [GHG03].
Security Considerations
UDT has not its specific security mechanism, whereas it depends on
the application to provide authentication and the lower layer to
provide security mechanisms, if necessary.
Gu, Grossman Expires - February 2005 [Page 14]
UDT August 2004
However, UDT implementations should check that all arrived packets
are from the expected source, since UDP is connectionless. This is
inherently accomplished because in the socket API a "connected" UDP
socket only accepts packets from a certain source the UDP socket is
"connected" to.
Normative References
[UDP] J. Postel, RFC 768: User Datagram Protocol, Aug. 1980.
Informative References
[FKT01] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the
Grid: Enabling Scalable Virtual Organizations", International J.
Supercomputer Applications, 15(3), 2001.
[GHG04] Y. Gu, X. Hong, and R. L. Grossman, "Experiences in Design
and Implementation of a High Performance Transport Protocol". SC
2004, Pittsburgh, PA, Nov. 6 -12, 2004.
[LM97] T. V. Lakshman and U. Madhow, "The Performance of TCP/IP for
Networks with High Bandwidth-Delay Products and Random Loss",
IEEE/ACM Trans. on Networking, vol. 5 no 3, July 1997, pp. 336-
350.
[P99] V. Paxson, "End-to-End Internet Packet Dynamics", IEEE/ACM
Transactions on Networking, Vol.7, No.3, June 1999, pp. 277-292.
[UDT] UDT Source Code, URL http://sourceforge.net/projects/dataspace.
Acknowledgments
The authors thank Dr. Xinwei Hong for his cooperated work in the
SABUL protocol, an early prototype of UDT protocol. Dr. Marco
Mazzucco and Dr. Sivakumar Harinath also contribute to the earlier
idea of SABUL.
We appreciate those researchers, developers, and system
administrators inside or outside LAC/NCDM for their helpful feedbacks
on using UDT. We are grateful to StarLight, SARA, and Canarie for
providing us the high-speed networking infrastructures.
We are also grateful for the feedback and comments from the following
individuals: Guy Almes, Cees de Laat, Freek Dijkstra, Rajkumar
Kettimuthu, and Johan Blom.
Gu, Grossman Expires - February 2005 [Page 15]
UDT August 2004
Author's Addresses
Yunhong Gu
Laboratory for Advanced Computing
University of Illinois at Chicago
700 SEO, M/C 249, 851 S Morgan St
Chicago, IL 60607, USA
Phone: +1 (312) 996-0305
Email: yunhong@lac.uic.edu
Robert Grossman
Laboratory for Advanced Computing
University of Illinois at Chicago
727 SEO, M/C 249, 851 S Morgan St
Chicago, IL 60607, USA
Phone: +1 (312) 413-2176
Email: grossman@uic.edu
Gu, Grossman Expires - February 2005 [Page 16]