Last Call Review of draft-ietf-alto-path-vector-17
review-ietf-alto-path-vector-17-opsdir-lc-chown-2021-09-09-00

Request Review of draft-ietf-alto-path-vector
Requested revision No specific revision (document currently at 25)
Type Last Call Review
Team Ops Directorate (opsdir)
Deadline 2021-08-25
Requested 2021-08-11
Authors Kai Gao , Young Lee , Sabine Randriamasy , Y. Richard Yang , Jingxuan Zhang
I-D last updated 2021-09-09
Completed reviews Secdir Last Call review of -19 by Samuel Weiler (diff)
Genart Last Call review of -17 by Suresh Krishnan (diff)
Opsdir Last Call review of -17 by Tim Chown (diff)
Artart Last Call review of -16 by Paul Kyzivat (diff)
Opsdir Telechat review of -19 by Tim Chown (diff)
Secdir Telechat review of -22 by Samuel Weiler (diff)
Assignment Reviewer Tim Chown
State Completed
Request Last Call review on draft-ietf-alto-path-vector by Ops Directorate Assigned
Posted at https://mailarchive.ietf.org/arch/msg/ops-dir/v1viuzqkt334W6M-BlgLQHK7vko
Reviewed revision 17 (document currently at 25)
Result Not ready
Completed 2021-09-09
Hi,

I have reviewed this document (draft-ietf-alto-path-vector-17) as part of the
Operational directorate's ongoing effort to review all IETF documents being
processed by the IESG.  These comments were written with the intent of
improving the operational aspects of the IETF drafts. Comments that are not
addressed in last call may be included in AD reviews during the IESG review. 
Document editors and WG chairs should treat these comments just like any other
last call comments.

This draft proposes an extension to the ALTO protocol to allow the definition
of Abstract Network Elements (ANEs) on a path between two endpoints that can be
considered when orchestrating connectivity between those endpoints, rather than
just computing based on the abstract cost of a path.  A Path Vector allows a
set of such ANEs to be defined for a path.

Caveat:

I am generally familiar with the work of the ALTO group.  My work at Jisc, a
national research and education network, includes helping universities and
research organisations optimise large-scale data transfers (up to petabytes of
data).

Overall:

I believe the document is generally well written, and the problem space it is
addressing is one for which there is value in defining a solution, but I feel
the document suffers from being too abstract and vague about what it is
defining, and its consideration of practical use cases could be improved.  Thus
I feel at this stage it is Not Ready for publication.

General comments:

The use cases defined are quite varied: large-scale analytics, mobile and
CDNs.  SENSE and LHC are not specifically data analytics use cases in the usual
sense of the word; rather, SENSE is a model for orchestrating network links
(and capacity) between sites, and the LHC provides large-scale data sets for
four major experiments that are distributed and computed upon via the WLCG
(Worldwide LHC Computing Grid).

For LHC, QoE is not so much about time to complete; the important point is not
to have data backlogging if performance drops.

For the WLCG, two networks have evolved over many years to carry the traffic
from the four main experiments; LHCOPN, the optical network, and LHCONE, the
overlay network, both of which are ‘manually’ configured, and with enough
capacity for the traffic thanks to regular network forward look exercises. 
While a little complex to administer, other emerging disciplines have expressed
interest in using LHCONE to move data, and some have established agreements
(e.g. SKA, I believe).  While a means to provision capacity on demand would be
attractive, the R&E networks typically have capacity, LHCOPN/LHCONE carry the
LHC traffic, and bottlenecks are in the end sites (hence the evolution of the
Science DMZ principles).

Some specific examples of ANEs would be very helpful.  While the document does
contain examples, they are not grounded around a use case I can readily relate
to, such as the orchestration of a large data flow between two sites in
different R&E networks.  Can the doc show some real examples?

Section 3 talks of definitions of ANEs being “similar to” Network Elements in
RFC 2216, but this is vague.  The topology in Figure 5 is quite simple as an
example; something more realistic would be interesting.  Ultimately, even if
ALTO clients have the full network topology, they may not know about the
routing that occurs by default, so implicitly there's an assumption of a
capability to steer traffic to meet a request.  What is the “request” referred
to in 5.1.2, for example?

It seems that the document argues that ‘bottlenecks’ are typically
capacity-based; are ANEs limited to specific links, or can they also be
routers, firewalls, etc?  A stateful firewall can be a significant bottleneck
on throughput, for example.

Section 4.2.1 talks of an ALTO client identifying bottlenecks; a little more
discussion and some examples of that would be useful, for practical use cases
such as an international R&E data transfer.
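
To illustrate the kind of worked example meant here, the following is a
minimal sketch of how a client might spot a shared bottleneck from a set of
path vectors.  Everything in it is an assumption made for illustration: the
site names, ANE names, capacities and helper functions are hypothetical and
are not taken from the draft, and the equal-share estimate is a deliberate
simplification.

```python
from collections import defaultdict

# Hypothetical path vectors: (source, destination) -> ANE names on the path.
PATH_VECTORS = {
    ("site-a", "site-b"): ["fw-a", "backbone", "link-b"],
    ("site-a", "site-c"): ["fw-a", "backbone", "link-c"],
}

# Hypothetical usable capacity per ANE, in Gbit/s.
ANE_CAPACITY_GBPS = {
    "fw-a": 10.0,      # e.g. a stateful campus firewall
    "backbone": 100.0,
    "link-b": 40.0,
    "link-c": 40.0,
}


def flows_per_ane(path_vectors):
    """Map each ANE to the list of flows whose path vector contains it."""
    users = defaultdict(list)
    for flow, anes in path_vectors.items():
        for ane in anes:
            users[ane].append(flow)
    return dict(users)


def shared_anes(path_vectors):
    """Return only the ANEs that are traversed by more than one flow."""
    return {
        ane: flows
        for ane, flows in flows_per_ane(path_vectors).items()
        if len(flows) > 1
    }


def likely_bottleneck(path_vectors, capacity_gbps):
    """Return (ane, share) for the ANE with the smallest equal per-flow share."""
    users = flows_per_ane(path_vectors)
    shares = {ane: capacity_gbps[ane] / len(flows) for ane, flows in users.items()}
    ane = min(shares, key=shares.get)
    return ane, shares[ane]


if __name__ == "__main__":
    print(sorted(shared_anes(PATH_VECTORS)))
    print(likely_bottleneck(PATH_VECTORS, ANE_CAPACITY_GBPS))
```

In this made-up topology the two transfers share the firewall and the
backbone, and the firewall gives the smallest per-flow share, which is the
sort of conclusion it would be helpful to see the document walk through.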

The discussion on p.9 about multiple flows is a little odd; in practice, large
transfers in R&E networks use tools like GridFTP, which uses multiple parallel
TCP flows, such that loss on individual flows does not severely impact
throughput.  Of course, BBR also reduces this concern.
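
As a rough illustration of the parallel-flow point, the sketch below works
through the arithmetic under two stated assumptions: each flow uses a
loss-based congestion control that halves its rate on a loss event
(Reno-style multiplicative decrease), and the flows share the transfer
equally beforehand.  The function names are hypothetical and the model ignores
recovery time, so it is indicative only.

```python
def aggregate_after_single_loss(n_flows, per_flow_rate):
    """Aggregate rate just after exactly one of n equal flows halves its rate."""
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    # n - 1 flows are unaffected; one flow drops to half its previous rate.
    return (n_flows - 1) * per_flow_rate + per_flow_rate / 2


def fractional_drop(n_flows):
    """Fraction of aggregate throughput lost when one of n equal flows halves."""
    if n_flows < 1:
        raise ValueError("n_flows must be at least 1")
    return 1 / (2 * n_flows)


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, fractional_drop(n))
```

Under these assumptions a single flow loses half its throughput on a loss
event, whereas with eight parallel flows the aggregate drops by 1/16.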

Is the use of ALTO designed for a single domain, or can it span multiple
domains?  It seems the latter, given the definition of ANE domains, but in that
case there is no specific model for the common definition of ANEs.

Given the definition of ANEs and PVs, how is traffic then orchestrated or
optimised?  Some pointers here would be useful.  SENSE may be one example.
From my own discussions with people involved with SENSE (and AutoGOLE, which
uses it), there is as yet no use of ALTO (rather, SENSE uses its own methods to
orchestrate based on intent-based descriptors), but it is something that may be
considered in the future.

What of non-ALTO traffic on the same links; is the approach to reserve x%
capacity of a link for ALTO orchestrated traffic (the SENSE approach, I
believe)?

Tim