Network Working Group C. Jennings
Internet-Draft Cisco
Intended status: Informational July 8, 2016
Expires: January 9, 2017
ICE and STUN Timing Experiments for WebRTC
draft-jennings-ice-rtcweb-timing-01
Abstract
This draft summarizes the results of some experiments looking at the
impact of proposed changes to ICE on the latest consumer NATs. It
also looks at the amount of non-congestion-controlled bandwidth a
browser can use and the impact of that on ICE timing.
This draft is not meant to become an RFC. It is purely informational,
intended to help guide the development of other specifications.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 9, 2017.
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
Jennings Expires January 9, 2017 [Page 1]
Internet-Draft ICE Timing Experiments July 2016
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction
2. Background
2.1. A multi PC use case
3. NAT Connection Rate Results
4. ICE Bandwidth Usage
4.1. History of RFC 5389
4.2. Bandwidth Usage
4.3. What should the global rate limit be
4.4. Rate Limits
4.5. ICE Synchronization
5. Recommendations
6. Future work
7. Conclusions
8. Acknowledgments
9. References
9.1. Normative References
9.2. Informative References
Appendix A. Bandwidth testing
Author's Address
1. Introduction
The ICE WG at the IETF has been considering speeding up the rate at
which new STUN and TURN connections are started when doing ICE. The
two primary questions that have been raised about this are: 1) can
NATs create new connections fast enough to avoid causing problems,
and 2) what will the impact on bandwidth usage be?
2. Background
A web page using WebRTC can form multiple PeerConnections. Each one
of these starts an ICE process that initiates STUN transactions
toward various IP addresses and ports that are specified by the
JavaScript of the web page.
Browsers do not limit the number of PeerConnections, but they do
limit the total amount of STUN traffic that is sent without
congestion control. This draft assumes that browsers will limit this
traffic to 250 kbps, though current implementations appear to exceed
that when measured over a 100 ms window.
Each PeerConnection starts a new STUN transaction periodically until
all the ICE connectivity checks are done. [RFC5245] requires this
pacing interval (Ta) to be 20 ms or more, while
[I-D.ietf-ice-rfc5245bis] proposes moving the minimum to 5 ms.
Retransmissions of previous STUN transactions can happen in parallel
with this.
The STUN specification [RFC5389] specifies that a request be sent up
to 7 times (the original plus 6 retransmissions), with the
retransmission timeout starting at 500 ms and doubling after each
transmission, unless certain conditions are met. This was put in the
RFC long ago to meet the requirements of the IESG and is largely
ignored by existing implementations. Instead, implementations do
several retransmissions (6 for Firefox, 8 for Chrome) with a
retransmission time starting at 100 ms and doubling with every
retransmission. Often there is a cap on the retransmission time:
Chrome, for example, doubles the retransmission time only up to
1600 ms, after which the time between retransmissions stops
increasing.
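The two retransmission schedules described above can be compared
with a short sketch, written in JavaScript to match the test page in
Appendix A. The function sendTimes is a hypothetical helper, not code
from any browser, and the Chrome-like parameters are simply the
figures quoted above.

```javascript
// Send times (ms after the first transmission) of one STUN request that
// never receives a response. The timeout doubles after every
// transmission and is optionally capped at maxRtoMs.
function sendTimes(transmissions, initialRtoMs, maxRtoMs = Infinity) {
  const times = [];
  let t = 0;
  let rto = initialRtoMs;
  for (let n = 0; n < transmissions; n++) {
    times.push(t);
    t += rto;
    rto = Math.min(rto * 2, maxRtoMs);
  }
  return times;
}

// RFC 5389 defaults: initial RTO of 500 ms, at most 7 requests (Rc = 7).
console.log(sendTimes(7, 500).join(", "));
// -> 0, 500, 1500, 3500, 7500, 15500, 31500

// Chrome-like schedule from the text: the original plus 8
// retransmissions, starting at 100 ms and capped at 1600 ms.
console.log(sendTimes(9, 100, 1600).join(", "));
// -> 0, 100, 300, 700, 1500, 3100, 4700, 6300, 7900
```

A failing transaction under the RFC 5389 defaults is spread over more
than 30 seconds, while the capped 100 ms schedule sends nine packets
in under 8 seconds, so each failing check generates its traffic much
more densely in time.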
The size of STUN packets can vary based on the options selected, but
the STUN requests browsers send today over IPv4 are about 70 bytes.
(Note: some other drafts use significantly higher numbers for this
size, so some investigation is likely needed to determine the correct
number.)
Reducing the pacing interval to 5 ms increases the rate at which the
NAT needs to create new mappings, as well as the amount of
non-congestion-controlled bandwidth used by the browser. The rest of
this draft looks at what issues may or may not result from this.
Additional information about ICE in WebRTC can be found in
[I-D.thomson-mmusic-ice-webrtc].
2.1. A multi PC use case
A common design for small conferences is to have a full mesh of media
formed between all participants, where each participant sends their
audio and video to all other participants, who mix and render the
results. If there are 9 people on a conference call and a 10th one
joins, one design might be for the new person to form 9 new
PeerConnections in parallel, one to each existing participant.
This might result in 9 ICE agents each starting a new STUN
transaction every 5 ms. Assuming no retransmissions, that is a new
NAT mapping about every 5 ms / 9 ICE agents = 0.56 ms, and a
bandwidth of about 9 ICE agents * (1 packet / 5 ms) * 70
bytes/packet * 8 bits/byte, which comes to about 1 Mbps. As many of
the ICE candidates are expected not to work, they will go through
the full series of retransmissions, which will increase the
bandwidth usage significantly. The browser would rate-limit this
traffic by dropping some of it.
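The arithmetic above can be checked with a small sketch.
parallelLoad is a hypothetical helper; like the text, it ignores
retransmissions and uses the 70-byte IPv4 request size from
Section 2.

```javascript
// Back-of-the-envelope load from `agents` ICE agents that each start
// one new STUN transaction every pacingMs, ignoring retransmissions.
function parallelLoad(agents, pacingMs, packetBytes = 70) {
  const checksPerSec = agents * (1000 / pacingMs);
  return {
    // Average time between new NAT mappings across all agents.
    msBetweenNewMappings: pacingMs / agents,
    // Aggregate non-congestion-controlled bit rate.
    bitsPerSec: checksPerSec * packetBytes * 8,
  };
}

const mesh = parallelLoad(9, 5);
console.log(mesh.msBetweenNewMappings.toFixed(2)); // -> 0.56
console.log(mesh.bitsPerSec); // -> 1008000 (about 1 Mbps)
```

For comparison, a single agent at the current 20 ms pacing uses
parallelLoad(1, 20).bitsPerSec, or 28 kbps, before retransmissions.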
An alternative design would be to form these connections to the 9
people in the conference sequentially. Given the bandwidth
limitations and other issues, later parts of this draft propose that
if we move the pacing to 5 ms, the WebRTC drafts probably need to
caution developers that parallel implementations with this many
peers are likely to see failures.
With the current timings, doing this in parallel often works, and
there are applications that do it in parallel that will likely need
to change if the timing changes.
3. NAT Connection Rate Results
The first set of tests is concerned with how many new mappings the
NAT can create. The 20 ms limit in [RFC5245] was based on the
observation that going faster than that exceeded the rate at which
the NATs widely deployed at the time could create new mappings.
The tests for this draft were run on the very latest NAT models from
Asus, D-Link, Netgear, and Linksys. These four vendors were selected
because of their large market share. This sample is not at all
representative of what is actually deployed in the field today, but
it represents what we will see widely deployed in the next 3 to 7
years as this generation of NATs moves into the marketplace and into
the lower-end models in these product lines. It is also clear that
in some geographies a national broadband provider may use a NAT that
is less common globally, making that vendor's NAT prevalent in a
given country even if it is not common worldwide.
Tests were run only over wired interfaces. Both sides of the NAT were
connected to two different interfaces on the same computer, and a
single program sent packets in various directions and measured their
exact arrival times. Key results were verified with Wireshark by
examining wire captures made on a separate computer. The first test
was the normal set of tests used to classify the NAT type, for the
cases where 1, 2, and 3 internal clients all use the same source
port. The second test created many new mappings to measure the
maximum rate at which mappings could reliably be made.
The conclusion of the first test was that all of the NATs tested were
BEHAVE-compliant (for UDP) with [RFC4787] with regard to mapping and
filtering behavior. This is great news as well as a strong
endorsement of the success of the BEHAVE WG. The fact that we see a
non-trivial percentage of non-BEHAVE-compliant NATs deployed in the
field highlights that the set of NATs tested is not a representative
sample of what is deployed. It does suggest that we should see
reduced use of TURN servers over time.
In the second test, all the NATs tested could reliably create a new
mapping in under 1 ms, often in more like several hundred
microseconds. The NATs do drop packets if the rate of new mappings
gets too high, but for all the NATs tested this rate was above 1000
mappings per second. Judging from the code of one NAT, this largely
seems to be due to the large increase in the clock speed of the CPUs
in the NATs tested here versus those in the NATs tested in 2005 in
[I-D.jennings-behave-test-results].
This implies that as long as fewer than 5 or 10 PeerConnections are
doing ICE in parallel in a given browser, we do not anticipate
problems with the tested NATs if the ICE pacing moves to 5 ms.
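The "5 or 10" figure follows from the measured mapping rate. At 5 ms
pacing each agent starts 200 checks per second, so a NAT that
manages 1000 new mappings per second supports about 5 parallel
agents. The sketch below makes this explicit under the text's
worst-case assumption that every new check needs new NAT state;
maxParallelAgents is a hypothetical helper, and the 2000 mappings
per second case is illustrative only, since the measurements above
give 1000 per second as a lower bound.

```javascript
// Rough capacity check: how many ICE agents can run in parallel before
// their combined rate of new checks exceeds the NAT's mapping rate.
// Worst-case assumption: every new check needs new NAT state.
function maxParallelAgents(natMappingsPerSec, pacingMs) {
  const checksPerAgentPerSec = 1000 / pacingMs;
  return Math.floor(natMappingsPerSec / checksPerAgentPerSec);
}

console.log(maxParallelAgents(1000, 20)); // -> 20
console.log(maxParallelAgents(1000, 5)); // -> 5
console.log(maxParallelAgents(2000, 5)); // -> 10
```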
4. ICE Bandwidth Usage
4.1. History of RFC 5389
At the time [RFC5389] was written, the argument made was that it was
OK for STUN to use as much non-congestion-controlled bandwidth as RTP
audio was likely to use, since STUN was merely setting up a
connection for an RTP phone call. The premise was that the networks
IP phones were used on were designed to have enough bandwidth to
work reasonably with the audio codecs being used, and that RTP audio
was not elastic and not congestion controlled in most
implementations. There was a form of "user congestion control": if
your phone call sounded like crap because it was having 10% packet
loss, you ended the call, tried again, and if it was still bad, gave
up and stopped causing congestion.
Since that time, the number of candidates used in ICE has increased
significantly, the range of networks ICE is used over has expanded,
and its uses have multiplied. We have also seen much more widespread
use of FEC, which allows high packet loss rates with no impact on
the end user's perception of media quality. In WebRTC there are
applications, such as file sharing and background P2P backup, that
form data channel connections using ICE with no human interaction to
stop them if the packet loss rate is high. In practical usage, ICE
has expanded beyond a tool for IP phones to become the preferred
tool on the Internet for setting up end-to-end connections.
4.2. Bandwidth Usage
To prevent things like DDoS attacks on DNS servers, WebRTC browsers
limit the non-congestion-controlled bandwidth of STUN transactions
to an unspecified value, though it seems that browsers currently
plan to set this to 250 kbps. An advertisement running on a popular
web page can create as many PeerConnections as it wants and specify
the IP address and port to which all the STUN transactions are sent.
Each PeerConnection object sends UDP traffic to an IP address and
port specified in the JavaScript, and the browser limits this
traffic by dropping packets that exceed the browser's global limit.
It seems that the current plans for major browsers would allow a
browser to send 250 kbps of UDP traffic when there is 100% packet
loss. Currently they send more than this. As far as I can tell,
there is no specification defining what this limit should be in this
case.
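This draft does not describe how browsers implement the drop-based
limit. One plausible mechanism is a token bucket, sketched below as
an assumption rather than as any browser's actual code. The
TokenBucket class is hypothetical; the 250 kbps rate comes from the
text, while the 25,000-bit burst (100 ms worth of tokens) is an
arbitrary choice for illustration.

```javascript
// Hypothetical browser-wide limiter for non-congestion-controlled STUN
// traffic. Tokens (bits) accumulate at rateBps up to burstBits; a packet
// is sent only if enough tokens are available, otherwise it is dropped
// (never queued), matching the drop-based behavior described above.
class TokenBucket {
  constructor(rateBps, burstBits) {
    this.rateBps = rateBps;
    this.burstBits = burstBits;
    this.tokens = burstBits; // start full
    this.lastMs = 0;
  }

  trySend(bytes, nowMs) {
    // Refill for the time elapsed since the previous call.
    const refill = ((nowMs - this.lastMs) * this.rateBps) / 1000;
    this.tokens = Math.min(this.burstBits, this.tokens + refill);
    this.lastMs = nowMs;
    const bits = bytes * 8;
    if (this.tokens < bits) return false; // over the limit: drop
    this.tokens -= bits;
    return true;
  }
}

// 250 kbps limit with a 25,000-bit burst; offer one 70-byte STUN
// request every millisecond (560 kbps) for one second.
const bucket = new TokenBucket(250000, 25000);
let sent = 0;
let dropped = 0;
for (let t = 0; t < 1000; t++) {
  if (bucket.trySend(70, t)) sent++;
  else dropped++;
}
console.log(sent, dropped); // -> 490 510
```

Whatever the mechanism, the sending JavaScript sees only the effect:
a share of its STUN packets silently never leaves the browser.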
4.3. What should the global rate limit be
It is clear that sending 250 kbps on an 80 kbps EDGE cellular
connection severely impacts other applications on that connection
and is not even remotely close to TCP-friendly. In the age of Wi-Fi
hotspots with cellular backhaul and highly variable backhaul in
general, the browser has very little idea of what the available
bandwidth is.
This draft is not in any way suggesting what the bandwidth limit
should be; it is looking at the implications of that number for ICE
timing. The limit has security implications, in that JavaScript
loaded by browsers through paid advertisements on popular web sites
could use this traffic to DDoS a server. The limit has transport
implications in how it interacts with other traffic on networks
whose capacity is close to or less than this limit.
More information on this topic can be found in
[I-D.ietf-tsvwg-rfc5405bis] and
[I-D.ietf-avtcore-rtp-circuit-breakers].
4.4. Rate Limits
Having a global bandwidth limit for the browser, above which packets
are dropped, means that applications need to stay under this rate
limit; otherwise the loss of STUN packets will cause ICE to
mistakenly conclude that there is no connectivity on flows that
actually work. Consider the case of two NICs (cellular and Wi-Fi),
each with an IPv4 and an IPv6 address, and a reachable TURN server
on each. This gives 12 candidates. If the other side is the same,
each of the six local IPv6 candidates pairs with the six IPv6
candidates on the other side, giving 36 IPv6 pairs, and the same for
IPv4, resulting in 72 pairs for ICE to check (assuming full BUNDLE,
RTCP mux, etc.). The number of pairs we will see in practice in the
future is a somewhat controversial topic, and the 72 here was a
number pulled out of a hat, not based on any real tests. There is
probably a better number to use.
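The pair count above can be reproduced with a small sketch. It
assumes, since the text does not spell this out, that each of the
four local addresses yields three candidates (host,
server-reflexive, and relayed), that both peers are configured
identically, and that only same-family pairs are formed. countPairs
is a hypothetical helper.

```javascript
// Count local candidates and candidate pairs for the scenario in the
// text, assuming both peers are configured identically and only
// same-family (IPv4-IPv4, IPv6-IPv6) pairs are formed.
function countPairs(interfaces, addressFamilies, candidatesPerAddress) {
  // Local candidates of one address family (e.g. all IPv6 candidates).
  const perFamily = interfaces * candidatesPerAddress;
  return {
    candidates: perFamily * addressFamilies,
    // Each family pairs perFamily local with perFamily remote candidates.
    pairs: addressFamilies * perFamily * perFamily,
  };
}

// 2 NICs, IPv4 + IPv6, host + server-reflexive + relayed per address.
console.log(countPairs(2, 2, 3)); // -> { candidates: 12, pairs: 72 }
```

The pair count grows with the square of the per-family candidate
count, so under the same assumptions a third interface would raise it
to 2 * 9 * 9 = 162 pairs.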
A simple simulation of this case, where none of the pairs works,
suggests that the peak bandwidth in 100 ms windows is about 112 kbps
if the pacing is 20 ms, while it rises to about 290 kbps if the
pacing is 5 ms. This is the bandwidth used by a single ICE agent, and
there could easily be multiple ICE agents running at the same time
in the same tab or across different tabs in the browser.
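The simulation mentioned above is not included in this draft. The
sketch below is one hypothetical reconstruction under stated
assumptions: a single ICE agent, 72 pairs that never receive a
response, one new check per pacing interval, the Chrome-like
retransmission schedule from Section 2 (100 ms initial timeout,
doubling, capped at 1600 ms, 8 retransmissions), 70-byte requests,
and peak bandwidth measured over sliding 100 ms windows.

```javascript
const PACKET_BITS = 70 * 8; // one ~70-byte IPv4 STUN request
const WINDOW_MS = 100; // measurement window used in the text

// Send times of one check that never gets a response: the original
// request plus 8 retransmissions, starting at 100 ms, doubling, capped
// at 1600 ms.
function checkSendTimes(startMs) {
  const times = [];
  let t = startMs;
  let rto = 100;
  for (let n = 0; n < 9; n++) {
    times.push(t);
    t += rto;
    rto = Math.min(rto * 2, 1600);
  }
  return times;
}

// Peak bits per second over any WINDOW_MS window when `pairs` checks
// are started `pacingMs` apart by one ICE agent and all of them fail.
function peakBandwidth(pairs, pacingMs) {
  const sends = [];
  for (let p = 0; p < pairs; p++) {
    sends.push(...checkSendTimes(p * pacingMs));
  }
  sends.sort((a, b) => a - b);
  let peak = 0;
  let lo = 0;
  // Two-pointer sweep: count packets in (sends[hi] - WINDOW_MS, sends[hi]].
  for (let hi = 0; hi < sends.length; hi++) {
    while (sends[lo] <= sends[hi] - WINDOW_MS) lo++;
    peak = Math.max(peak, hi - lo + 1);
  }
  return (peak * PACKET_BITS * 1000) / WINDOW_MS;
}

console.log(peakBandwidth(72, 20)); // -> 112000
console.log(peakBandwidth(72, 5)); // -> 291200
```

Under these assumptions the sketch gives 112 kbps at 20 ms pacing and
about 291 kbps at 5 ms pacing, in line with the figures above. A
250 kbps budget over a 100 ms window allows only about 44 such
packets, while the 5 ms case needs 52. Other retransmission
schedules or packet sizes will shift the numbers.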
The point I am trying to make with this simulation is that the
global rate limit would need to be much higher than 250 kbps to move
to 5 ms pacing and have it work reliably with multiple things
happening at the same time in the browser.
4.5. ICE Synchronization
NATs and firewalls create very short windows of time during which a
response to an outbound request is allowed in and allowed to create
a new flow. (We do not have current measurements of this but suspect
the time is on the order of 500 ms to 5 seconds.) Though this draft
did not test these timings on major firewalls, some information
indicates that these windows are being reduced over time, possibly
to provide a shorter window for certain types of attacks. ICE takes
advantage of one side sending a "suicide" packet that will be lost
but that opens a short window of time; if the other side sends a
packet within that window, it gets through and allows a full
connection to form. To make this work, the timing of the packets
from the two sides needs to be closely coordinated. Most of the
complexity of the ICE algorithm comes from trying to coordinate both
sides so that they send the related packets at similar times.
A key implication of this is that if several ICE agents are running
in a single browser, what is happening in other ICE agents can't
change the timing of what a given ICE agent is sending, or at least
the amount of skew introduced can't cause the packets to fall outside
the timing windows of the NATs and firewalls. So any solution that
slowed down transmission in one PeerConnection because there were
many other simultaneous PeerConnections may have issues unless the
far side also knows to slow down.
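To see why a one-sided slowdown is risky, consider a deliberately
crude model: both agents start at the same moment and walk the same
ordered list of pairs, but one side has been throttled. The skew for
the n-th pair then grows linearly with n. The function name is
hypothetical, and the 500 ms window is the low end of the range
guessed above, not a measurement.

```javascript
// Highly simplified model of the two sides' checks lining up.
// Both agents start at time 0 and walk the same ordered list of pairs,
// starting one check every paceAMs (side A) or paceBMs (side B). The
// first packet for a pair opens a NAT/firewall window of windowMs; the
// other side's packet only gets through if it is sent before that
// window closes. Retransmissions, triggered checks, and network delay
// are ignored.
function pairsInsideWindow(pairs, paceAMs, paceBMs, windowMs) {
  let ok = 0;
  for (let n = 0; n < pairs; n++) {
    const skewMs = Math.abs(n * paceAMs - n * paceBMs);
    if (skewMs < windowMs) ok++;
  }
  return ok;
}

// Both sides pace at 5 ms: every pair lines up.
console.log(pairsInsideWindow(72, 5, 5, 500)); // -> 72
// One side throttled to 20 ms because of other PeerConnections.
console.log(pairsInsideWindow(72, 5, 20, 500)); // -> 34
```

In real ICE, retransmissions and triggered checks would recover some
of these pairs, so the model exaggerates the effect; it only shows
that the skew grows with a pair's position in the check list.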
Figuring out how much the timing can be skewed between the two sides
requires measuring how long the window is open on the NATs and
firewalls. Currently we do not have good measurements of this timing
and it is not possible to evaluate how much this is an issue without
that information.
5. Recommendations
The ICE and RTCWeb Transport documents should specify a clear upper
bound on the amount of non-congestion-controlled traffic a browser
or application should be limited to. The Transport Area, and perhaps
the Security Area, should provide advice on what that number should
be. WebRTC applications basically work better the larger that number
is, at the expense of other applications running on the same
congested links.
There is no way for a JavaScript application to know how many other
web pages or tabs in the browser are also doing STUN, yet all of
these count against the global rate limit in the browser. If the
browser discards STUN packets because the global rate limit is
exceeded, the result is application failures that look like network
problems but are in fact just an artifact of other applications
running in the browser at the same time. This is critical
information for understanding why applications are failing. The
recommendation here is that the WebRTC API be extended to provide a
way for the browser to inform the application using a given
PeerConnection object if STUN packets that PeerConnection is sending
are being discarded by the browser.
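The bookkeeping this recommendation needs inside the browser is
small. The sketch below is hypothetical throughout: SharedStunLimiter
and droppedFor are illustrative names, not part of any existing or
proposed WebRTC API, and the fixed 100 ms window is an assumption. It
shows one budget shared by all PeerConnections, with per-connection
drop counters that such an API could expose.

```javascript
// Hypothetical browser-side bookkeeping, not an existing WebRTC API:
// one budget shared by all PeerConnections (bits per fixed window),
// plus per-PeerConnection drop counters that an API could report.
class SharedStunLimiter {
  constructor(limitBps, windowMs = 100) {
    this.windowMs = windowMs;
    this.budgetBits = (limitBps * windowMs) / 1000; // bits per window
    this.windowStart = 0;
    this.usedBits = 0;
    this.drops = new Map(); // PeerConnection id -> dropped packets
  }

  // Returns true if pcId may send a packet of `bytes` at nowMs.
  trySend(pcId, bytes, nowMs) {
    if (nowMs - this.windowStart >= this.windowMs) {
      // Start a new window aligned to a multiple of windowMs.
      this.windowStart = nowMs - (nowMs % this.windowMs);
      this.usedBits = 0;
    }
    const bits = bytes * 8;
    if (this.usedBits + bits <= this.budgetBits) {
      this.usedBits += bits;
      return true;
    }
    this.drops.set(pcId, (this.drops.get(pcId) || 0) + 1);
    return false;
  }

  droppedFor(pcId) {
    return this.drops.get(pcId) || 0;
  }
}

// "a" paces at 5 ms; "b" (say, another tab) sends every millisecond.
const limiter = new SharedStunLimiter(250000);
for (let t = 0; t < 100; t++) {
  if (t % 5 === 0) limiter.trySend("a", 70, t);
  limiter.trySend("b", 70, t);
}
console.log(limiter.droppedFor("a"), limiter.droppedFor("b")); // -> 12 64
```

Here "a" stays well within any reasonable budget on its own, yet it
loses 12 packets because "b" exhausts the shared budget; without a
signal like droppedFor, the application behind "a" would see only
failing checks.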
6. Future work
It would be nice to collect measurements of how long NATs and
firewalls keep a mapping with no response open. It would also be
nice to simulate how much skew global pacing would introduce into
the timing of ICE packets and whether that would reduce non-relayed
connectivity success rates.
7. Conclusions
The combination of a short ICE pacing interval, many PeerConnections,
and many candidates will cause problems. The optimal way to balance
this depends on factors such as how much non-congestion-controlled
bandwidth we should assume is available.
Going forward, the speed at which NATs create mappings is likely
adequate to move the pacing to 5 ms. However, applications that
create PeerConnections in parallel, or situations where more than a
handful of PeerConnections are forming in parallel in the same
browser (possibly in different tabs or web pages), need to be
avoided.
From a bandwidth-limit point of view, if the bandwidth is limited to
250 kbps, 5 ms pacing will work for a single PeerConnection but not
much more than that. The specification should make developers aware
of this limitation. If the non-congestion-controlled bandwidth limit
is less than 250 kbps, a 5 ms pacing interval is likely too short to
work reliably, particularly with multiple ICE agents running in the
browser.
8. Acknowledgments
Many thanks to Eric Rescorla for his review and a simple simulator,
and to Ari Keraenen and Harald Alvestrand.
9. References
9.1. Normative References
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245, DOI
10.17487/RFC5245, April 2010,
<http://www.rfc-editor.org/info/rfc5245>.
[RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
"Session Traversal Utilities for NAT (STUN)", RFC 5389,
DOI 10.17487/RFC5389, October 2008,
<http://www.rfc-editor.org/info/rfc5389>.
9.2. Informative References
[I-D.ietf-avtcore-rtp-circuit-breakers]
Perkins, C. and V. Singh, "Multimedia Congestion Control:
Circuit Breakers for Unicast RTP Sessions", draft-ietf-
avtcore-rtp-circuit-breakers-16 (work in progress), June
2016.
[I-D.ietf-ice-rfc5245bis]
Keraenen, A., Holmberg, C., and J. Rosenberg, "Interactive
Connectivity Establishment (ICE): A Protocol for Network
Address Translator (NAT) Traversal", draft-ietf-ice-
rfc5245bis-04 (work in progress), June 2016.
[I-D.ietf-tsvwg-rfc5405bis]
Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
Guidelines", draft-ietf-tsvwg-rfc5405bis-15 (work in
progress), June 2016.
[I-D.jennings-behave-test-results]
Jennings, C., "NAT Classification Test Results", draft-
jennings-behave-test-results-04 (work in progress), July
2007.
[I-D.thomson-mmusic-ice-webrtc]
Thomson, M., "Using Interactive Connectivity Establishment
(ICE) in Web Real-Time Communications (WebRTC)", draft-
thomson-mmusic-ice-webrtc-01 (work in progress), October
2013.
[RFC4787] Audet, F., Ed. and C. Jennings, "Network Address
Translation (NAT) Behavioral Requirements for Unicast
UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January
2007, <http://www.rfc-editor.org/info/rfc4787>.
Appendix A. Bandwidth testing
The following example web page was used to measure how much bandwidth
a browser will send to an arbitrary IP address and port when getting
100% packet loss to that destination. It creates 100 PeerConnections
that all send STUN traffic to port 10053 at 10.1.2.3. It then
creates a single data channel for each one and starts the ICE state
machine by creating an offer and setting it as the local
description.
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta charset="utf-8">
<title> STUN traffic demo </title>
</head>
<body>
<h2> This is STUN traffic demo </h2>
<p>
grab traffic with:
sudo tcpdump -i en1 -s 0 -w /tmp/dump.pcap 'port 10053'
</p>
<script>
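var pc = new Array(100);
// Use the standard constructor where the browser provides it and fall
// back to the prefixed one used by older versions of Chrome.
var PeerConnection = window.RTCPeerConnection ||
    window.webkitRTCPeerConnection;
// All STUN requests go to this address and port, which is assumed not
// to answer, so the browser sees 100% packet loss.
var configuration = {
  iceServers: [{
    urls: 'stun:10.1.2.3:10053'
  }]
};
// A data channel gives the offer something to negotiate; setting the
// offer as the local description starts ICE candidate gathering.
function setupPC(lpc) {
  lpc.createDataChannel("myData");
  lpc.createOffer().then(function(offer) {
    return lpc.setLocalDescription(offer);
  }).catch(function(err) {
    console.log("offer failed: " + err);
  });
}
for (var i = 0; i < pc.length; i += 1) {
  pc[i] = new PeerConnection(configuration);
  setupPC(pc[i]);
}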
</script>
</body>
</html>
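The test page only generates traffic; the capture still has to be
reduced to a peak rate. A sketch of that step follows, assuming the
capture has already been exported as (time in seconds, frame length
in bytes) pairs, for example from Wireshark. The input format and the
peakKbps name are assumptions of this sketch. Captured frame lengths
include link-layer and IP/UDP headers, so the result will be higher
than one computed from STUN message sizes alone.

```javascript
// Peak rate over sliding windows for packets captured during the test.
// `packets` is an array of [timeSeconds, lengthBytes] pairs; the input
// format is an assumption of this sketch.
function peakKbps(packets, windowMs = 100) {
  const pkts = packets
    .map(([sec, len]) => [sec * 1000, len])
    .sort((a, b) => a[0] - b[0]);
  let lo = 0;
  let bytesInWindow = 0;
  let peakBytes = 0;
  for (let hi = 0; hi < pkts.length; hi++) {
    bytesInWindow += pkts[hi][1];
    // Drop packets that fall outside (t - windowMs, t].
    while (pkts[lo][0] <= pkts[hi][0] - windowMs) {
      bytesInWindow -= pkts[lo][1];
      lo++;
    }
    peakBytes = Math.max(peakBytes, bytesInWindow);
  }
  return (peakBytes * 8) / windowMs; // bits per ms equals kbit/s
}

// Three 70-byte packets at 0, 50, and 200 ms: the first two share a
// 100 ms window, so the peak is 140 bytes per 100 ms.
console.log(peakKbps([[0, 70], [0.05, 70], [0.2, 70]]));
```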
Author's Address
Cullen Jennings
Cisco
Email: fluffy@iii.ca