QUIC WG E. Stephan
Internet Draft M. Cayla
Intended status: Informational A. Braud
Expires: January 2018 F. Fieau
Orange
July 3, 2017
QUIC Interdomain Troubleshooting
draft-stephan-quic-interdomain-troubleshooting-00.txt
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
This Internet-Draft will expire on January 3, 2018.
Copyright Notice
Copyright (c) 2017 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Stephan Expires January 3, 2018 [Page 1]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
Abstract
On-path network performance measurements methods currently deployed
contribute to the ossification of the Internet because they are
expensive to deploy and to maintain. This draft motivates the
exposure of QUIC header fields for on-path network measurements and
their specification in the QUIC core protocol as a solution to avoid
on-path network performance measurements to ossify the IP stack in
the future.
Table of Contents
1. Introduction ................................................ 2
2. Conventions used in this document............................ 3
3. Interdomain UX troubleshooting............................... 3
4. Reference of Network Performance............................. 4
5. QUIC Fallback ............................................... 5
5.1. Flapping .................................................. 6
6. Versioning and Implementations............................... 6
7. Security Considerations...................................... 6
8. IANA Considerations ......................................... 7
9. Discussions ................................................. 7
9.1. Fallback .................................................. 7
9.2. On-path Measurement........................................ 7
10. References ................................................. 8
10.1. Normative References...................................... 8
10.2. Informative References.................................... 8
11. Acknowledgments ............................................ 8
1. Introduction
The IP layer does not include the material for measuring the delay
and packets losses of segments of a path. The network performance is
currently measured by points of presence of the path [SPATIAL],
[COMPO] using transport fields of the upper layers: TCP transport
layer, RTP application layer...
The evolution of the Internet stack toward end-to-end integrity
protection is unavoidable [IABSEC]. This document presents the
benefits of preserving the same on-path network performance
measurement capabilities in the evolution of TCP (TCP/TLS, TCPinc...)
and UDP (QUIC/UDP...) currently specified at the IETF.
On-path network performance measurements methods currently deployed
contribute to the ossification of the Internet because they are
Stephan, et al Expires January 3, 2018 [Page 2]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
expensive to deploy and complex to maintain. This is due to the use
of protocols' fields not primarily designed for this purpose. This
draft motivates the exposure of the fields for on-path network
performance measurements in the QUIC core protocol to avoid network
performance measurements to ossify the IP stack in the future.
The memo recalls the UX interdomain troubleshooting complexity
[Issue166] introduced by QUIC deployment. Then it describes
operational concerns QUIC fallbacks and discusses the potential
impacts on the security. Finally it discusses the benefits of
exposing durably the fields needed for measuring packet delay and
losses.
2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
3. Interdomain UX troubleshooting
Fast troubleshooting of networks performance is mandatory to maintain
end-users high Quality of Experience.
Troubleshooting is the act of identifying the origin of a problem. A
major case is the localization of troubles impacting a large number
of customers. This case becomes critical when it appears suddenly and
represents a noticeable part of the traffic between the entity that
connects customers (ISP) and the entity that provides the data (APP).
It becomes critical because network operation center (NOC) teams of
the two entities are expected to immediately identify the causes in
order to restore UX as quickly as possible. Each team checks that the
point of failure is either in their entity or outside. When they
located the point of failure in their entity they investigate their
own chains of components (network, routers, reverse proxies,...) and
quickly fix the issue.
It becomes extremely critical when an entity locates the point of
failure outside of their entity. In this case the time needed to fix
the problem is much longer and unpredictable because it expects other
entities on the path to perform the same actions on their segments.
There are many cases of troubleshooting. A typical example starts by
signaling to the ISP that its end-users are experiencing a
significant decrease of QoE when using an Internet application.
Stephan, et al Expires January 3, 2018 [Page 3]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
Typical point of failures can be line cards memory errors or
overloaded routers located somewhere on the path, either in the ISP
domain or outside.
The ISP NOC has to localize asap the point of failure. Currently it
proceeds by dichotomy (is the point of failure inside ISP domain or
outside?) using passive monitoring of packet loss and congestion.
Following is the description of the parameters in use:
Packet lost downstream (vice versa for upstream):
o Measure of packet loss before the point of measure needs TCP
sequence number;
o Measure of loss located after the point of measure needs TCP ACK +
SACK.
Congestion:
o Detects that the congestion is located either before or after the
point of measure;
o The analysis is based on TCP ACK+SACK observation.
4. Reference of Network Performance
A reference of the real performance of the network is not always
provided by counters of network equipment. Counters may not be
implemented, values are not always stable, their values could be
compromised in case of software bugs or equipment congestion... In
addition, to face the increasing network architecture complexity
involved by the evolution of access networks, security and hosting
infrastructures, NOCs need reliable network performance measurement
in near real time.
In practice, these measurement tools shall be able to monitor
numerous points and interfaces within the network to provide near
real time network performances indicators taking into account the
global network state.
These measurements systems can also allow detecting issues or
unexplained behaviors on equipment, links, peers: for instance, in
mobile networks, operators shall be able to identify in real time
bottleneck links responsible for customer experience degradation and
take necessary actions to avoid further snowball effect.
Stephan, et al Expires January 3, 2018 [Page 4]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
Charging of data usage is another key feature for Mobile Network
Operators where per flow accurate information is collected. It is
important to reconcile values between amount of charged data and
amount of data seen by the network (cf discussion on goodput in
section 6)). This could allow detecting fraud attempts or
dysfunctions within the network. In case of significant gap,
operators must be able to react quickly to isolate this traffic.
Additionally, charging may require the differentiation of the goodput
from the throughput.
Continuous network performance monitoring requires packet losses and
delay measurements to allow operators to manage properly their
networks and to provides them with a reference of performance of
their network for interdomain troubleshooting.
5. QUIC Fallback
Fallback is necessary to address cases where a QUIC connection
establishment fails [QUICAPP] (A device of the path blocks UDP, the
stack blocks 0-RTT...).
Fallback may occur additionally when an active QUIC connection drops
and tries to reconnect. As an example, the steps of the fallback
could be:
o The QUIC connection drops accidently;
o The UA fallback and connects in TCP/TLS to the origin server;
o The UA receives from the origin server an indication for an
alternate service [ALTSVC] supporting QUIC;
o The UA ends gracefully the TCP connection;
o The UA tries to establish a QUIC connection to the server and port
described in the alternate service indication;
o The QUIC 1-RTT connection is established;
Stephan, et al Expires January 3, 2018 [Page 5]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
5.1. Flapping
A fallback may suddenly occur when one or more elements (links,
nodes, reverse proxy, switch, server ...) of the path fail or are
reconfigured.
There are cases where the fallback loops and triggers flapping
between the origin server and the alternate server. As an example,
this might happen when an alternate service indication is outdated
and points to a server which does not support QUIC anymore.
This becomes critical for UX when numerous fallbacks occur suddenly
on the same path between a set of customers of an ISP and another
entity which provides the application data. The time to troubleshoot
can be very long. The origin server and the alternate servers can be
hosted by different entities.
6. Versioning and Implementations
Versioning is an important part of the QUIC protocol framework
[QUICCORE]. Multiple versions of the protocol are expected to be
deployed and used concurrently. In order to encourage networks to
rapidly support the QUIC protocol and to support any versions of QUIC
in the future, the exposure of the fields for on-path network
performance measurement must not depend on the version.
There might be numerous implementations of the QUIC protocol in the
future. An important part of them will implement the congestion
control at application level. There will be unfair behaviors like
abnormal retransmission rate which will impact the fairness of the
repartition of the bandwidth amongst the customers of the network. By
consequence the network needs to be able to detect connections which
have abnormal throughput/goodput.
7. Security Considerations
The integrity of the transport parameters exposed for measuring on-
path delay and losses can be end-to-end protected to increase the
security of the connection. Additionally it helps end-points and on-
path points of presence to compute metrics based on the same raw
values.
Flapping from QUIC to a fallback protocol might overload on-path
devices and end-points and by consequence affect the stability of the
connections and introduces weaknesses.
Stephan, et al Expires January 3, 2018 [Page 6]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
The fallback from encrypted headers to clear headers transport
protocols might open the door to new types of active attacks.
It is not clear yet whether a network can distinguish numerous QUIC
fallback flappings from an active attack:
o What is the expected behavior from the network?
o Will networks detect QUIC flapping as an active attack?
8. IANA Considerations
This draft does not request any IANA actions.
9. Discussions
9.1. Fallback
Troubleshooting QUIC traffic and its fallbacks requires measuring
similar metrics. One suggestion is to use the integrity mechanism of
the TCPinc WG [TCPINC] to protect and keep visible the fields used
for on-path measurement.
Fallback must be precisely specified in the core specification of
QUIC [QUICCORE].
To avoid unnecessary flapping [QUICCORE] might clarify the usage of
the advertisement of QUIC support in HTTP protocols [ALTSVC].
[QUICMAN] should propose guidance for the management of QUIC fallback
flapping situations.
QUIC packets numbering should be continuous to allow packet loss
monitoring. Flows control fields (ACK, SACK) are available in QUIC
but encrypted.
9.2. On-path Measurement
QUIC is designed to carry other traffic than HTTP such as DNS and
Web. End-to-end encryption of the transport headers prevents the use
of models [E-MODEL] and heuristics to estimate UX on a path segment.
To maintain a high level of UX, QUIC capabilities should support the
measurement of the delay and the losses of a segment of a source to
destination path.
Stephan, et al Expires January 3, 2018 [Page 7]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
On-path measurement techniques are currently ad hoc. Adding the
exposure of the fields for on-path packet delay and losses in the
core specification of the QUIC protocol creates a stable network
performance measurement framework. It will be a real incentive for
networks to support QUIC rapidly and to support the numerous QUIC
versions in the future. This will reduce network impacts on the
ossification of the IP stack in the future.
10. References
10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[ALTSVC] https://tools.ietf.org/html/rfc7838
[QUICCORE] https://tools.ietf.org/html/draft-ietf-quic-transport
[QUICAPP] https://tools.ietf.org/wg/quic/draft-kuehlewind-quic-
applicability/
[QUICMAN] https://tools.ietf.org/wg/quic/draft-kuehlewind-quic-
manageability/
[TCPINC] https://tools.ietf.org/wg/tcpinc/
[Issue166] https://github.com/quicwg/base-drafts/issues/166
[IABSEC] https://www.iab.org/2014/11/14/iab-statement-on-internet-
confidentiality/
10.2. Informative References
[E-MODEL] https://www.itu.int/rec/T-REC-G.107-199812-S/en
[SPATIAL] https://tools.ietf.org/html/rfc5644
[COMPO] https://tools.ietf.org/html/rfc6049
11. Acknowledgments
This document was prepared using 2-Word-v2.0.template.dot.
Stephan, et al Expires January 3, 2018 [Page 8]
Internet-Draft QUIC Interdomain Troubleshooting July 2017
Authors' Addresses
Emile Stephan
Orange
2, avenue Pierre Marzin
Lannion 22300
France
Email: emile.stephan@orange.com
Mathilde Cayla
Orange
6, avenue Albert Durand
Blagnac 31700
France
Email: mathilde.cayla@orange.com
Arnaud Braud
Orange
2, avenue Pierre Marzin
Lannion 22300
France
Email: arnaud.braud@orange.com
Fred Fieau
Orange
40-48, avenue de la Republique
Chatillon 92320
France
Email: frederic.fieau@orange.com
Stephan, et al Expires January 3, 2018 [Page 9]