Early Review of draft-ietf-rtgwg-net2cloud-problem-statement-41
review-ietf-rtgwg-net2cloud-problem-statement-41-secdir-early-ounsworth-2024-09-10-00

Request Review of draft-ietf-rtgwg-net2cloud-problem-statement
Requested revision No specific revision (document currently at 41)
Type Early Review
Team Security Area Directorate (secdir)
Deadline 2024-10-11
Requested 2024-08-20
Requested by Jim Guichard
Authors Linda Dunbar, Andrew G. Malis, Christian Jacquenet, Mehmet Toy, Kausik Majumdar
I-D last updated 2024-09-10
Completed reviews Secdir Last Call review of -36 by Deb Cooley (diff)
Tsvart Last Call review of -32 by Magnus Westerlund (diff)
Intdir Early review of -26 by Benson Muite (diff)
Secdir Early review of -22 by Deb Cooley (diff)
Genart Early review of -21 by Paul Kyzivat (diff)
Opsdir Early review of -22 by Susan Hares (diff)
Rtgdir Early review of -22 by Ines Robles (diff)
Tsvart Early review of -22 by David L. Black (diff)
Dnsdir Early review of -22 by Florian Obser (diff)
Dnsdir Last Call review of -41 by David C Lawrence
Rtgdir Early review of -41 by Shuping Peng
Secdir Early review of -41 by Mike Ounsworth
Artart Last Call review of -41 by Rich Salz
Assignment Reviewer Mike Ounsworth
State Completed
Request Early review on draft-ietf-rtgwg-net2cloud-problem-statement by Security Area Directorate Assigned
Posted at https://mailarchive.ietf.org/arch/msg/secdir/SQ9Ba_5thXkFfuYeaQdFUjRpqw4
Reviewed revision 41
Result Has issues
Completed 2024-09-10
I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. These comments were written primarily for the benefit of the
security area directors. Document editors and WG chairs should treat
these comments just like any other last call comments.

The summary of the review is that while the Security Considerations section
lists a few specific good points, it does not address the more fundamental
issue: when you bridge a network you own to a network that you don't, you
should consider at every level how much and how deep you intend that
access into your network to be. For example: I may want O365 to see my
on-prem Exchange server, but not to be able to build a network map of all
corporate laptops and workstations.

This review is broken up into "Security Comments" first, and then "Editorial
Comments".

Security Comments

For some levity, I will start my security review with a comic that I think
brilliantly summarizes the situation: https://xkcd.com/2044/. This draft is
hitting the "I wish these parts could communicate more easily", "Integrate
Everything!" boxes, and it's addressing some of the "Oh-oh, there are so many
connections. It's creating Bugs and Security Holes" box, but it's not
addressing the final box: that often (especially for security) segmentation and
sandboxing are good things, and you need to be mindful of keeping that
segmentation where it matters.

Section 3
This document would benefit from some discussion of thinking about where your
"sensitive" applications and data are, and what needs to be protected from what.
Absent is discussion that sometimes the drive to multi-cloud or hybrid-cloud is
actually motivated by security. Ex1: sometimes you have to run super risky
workloads (like user-submitted code or virus samples), and you know that cloud
providers can build a better sandbox than you can, so the motivation is to get
the risky stuff off your on-prem network. Ex2: You may put a highly sensitive
server in the cloud for its own protection so that it is isolated from a
potential compromise of your on-prem network. Ex3: sometimes you want to keep
your sensitive data (e.g., financial records, contracts, pre-patent products,
etc.) on-prem so that you can have extremely tight control of it. Depending on
whether you're trying to protect on-prem from cloud or protect cloud from
on-prem, this leads to different design considerations across pretty much every
section of this document.

Section 3.1:
TL;DR: I am not an expert on BGP, but it very much feels like there should be
at least some relevant security considerations in this section.

I am far from an expert in BGP, but my quick google shows that BGP Peering is
traditionally symmetric; i.e., the goal is to fully bridge the two networks. In
an On-Prem -> Cloud setup, you probably want to consider separately whether you
want to grant on-prem things access to the cloud network vs granting cloud
things access to the on-prem network. You also want to consider whether you're
intending to grant access to *the entire* on-prem network, or only certain
subnets of it. This becomes a security issue pretty quickly if you
unintentionally grant workloads in a public cloud broad access to your
on-prem network. This falls in line with the sentence that is already in the
draft: "As such, there is pressure to peer more widely with more customers,
including those who may lack the expertise and experience in running complex
BGP peering relationships. This can contribute to ..." because doing partial
bridging / network exposure is certainly more complex. It also raises the
question of whether the public cloud operators are properly isolating
BGP-peered customers from each other.
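
To make the "only certain subnets" point concrete, here is a minimal Python
sketch of the kind of prefix allowlist that a route filter would enforce before
on-prem prefixes are advertised to a cloud peer. The function name and the
example prefixes are hypothetical and are not taken from the draft; a real
deployment would express this as route-filter policy on the router or cloud
gateway rather than in Python.

```python
import ipaddress

def filter_advertised(candidates, allowed):
    """Keep only candidate prefixes that sit entirely inside an allowed prefix.

    Both arguments are lists of CIDR strings (hypothetical inputs).
    Anything not covered by the allowlist is dropped (default deny).
    """
    allowed_nets = [ipaddress.ip_network(a) for a in allowed]
    kept = []
    for cidr in candidates:
        net = ipaddress.ip_network(cidr)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so compare versions first.
        if any(net.version == a.version and net.subnet_of(a) for a in allowed_nets):
            kept.append(cidr)
    return kept

# Only the Exchange subnet is meant to be visible to the cloud side.
exposed = filter_advertised(
    ["10.1.20.0/25", "10.1.0.0/16", "10.9.0.0/24"],
    allowed=["10.1.20.0/24"],
)
```

Note that the broader 10.1.0.0/16 is rejected even though it contains the
allowed subnet: advertising it would expose more of the on-prem network than
intended.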

Section 3.5
Similar to my comment for 3.1: the described scenario is about using your
on-prem DNS servers to bridge two cloud DNS domains, and sorta implied is that
cloud-based workloads may need to resolve on-prem resources. Absent is a
discussion that you may not want cloud-based workloads to be able to resolve
(from a security perspective, "map") your entire on-prem network; for example,
even if sensitive on-prem resources are not reachable due to TCP/UDP
firewalling, the ability for an attacker to build a network map of hostnames
and addresses may already be compromising in some situations. For instance, the
addition of certain hostnames / domains to a local network may indicate a
not-yet-publicly-announced merger, acquisition, department, partnership, or new
product feature. That makes DNS Practices for Hybrid Workloads even more
complex if you only want to expose *part* of your on-prem DNS space to the
cloud workloads.
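
As a rough illustration of exposing only *part* of the on-prem DNS space, the
following Python sketch decides whether a query name from a cloud workload
falls under an explicitly exposed zone. The function name and the zone names
(under the reserved ".example" TLD) are hypothetical; in practice this policy
would live in the resolver's split-horizon or view configuration.

```python
def is_exposed(qname, exposed_zones):
    """Return True if qname equals, or is a subdomain of, an exposed zone.

    Names are compared case-insensitively and without the trailing root dot.
    Any name outside the exposed zones is treated as not resolvable.
    """
    name = qname.rstrip(".").lower()
    for zone in exposed_zones:
        z = zone.rstrip(".").lower()
        # Match on a label boundary so "notmail.corp.example" does not
        # accidentally match the zone "mail.corp.example".
        if name == z or name.endswith("." + z):
            return True
    return False

zones = ["mail.corp.example"]
visible = is_exposed("mx1.mail.corp.example.", zones)
hidden = is_exposed("laptop42.corp.example", zones)
```

Here the cloud side can resolve names under the mail zone, while laptop and
workstation names stay invisible to it.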

There should be some discussion of confidentiality: whether it's an on-prem
component resolving a cloud component, or vice-versa, these are private
internal components, and the DNS queries should be considered sensitive
metadata. So cross-network DNS queries should be paired with an encryption
layer such as DNS-over-HTTPS, or should be routed over a site-to-cloud VPN.

Section 3.6
Surely we could think up at least half a dozen security considerations relating
to overly-broad NATs punching unintentional holes in firewalls. This is the
section where the XKCD comic really applies.

Are you sure you want your cloud workloads to have outbound access to the
internet? That enables compromised workloads to "phone home" to the attacker's
command-and-control server or reverse-shell server, or to exfiltrate data to
the attacker's server, and so on. Typically on-prem enterprises have Data
Loss Prevention (DLP) tools to detect this sort of behaviour leaving their
network, but public clouds may or may not have enterprise-grade DLP solutions
that monitor outbound connections from workloads for suspicious traffic
patterns.
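
One way to picture the alternative to open outbound access is a default-deny
egress allowlist. The Python sketch below is only an illustration: the
function name, the rule format, and the addresses (taken from the reserved
documentation ranges) are hypothetical, and real clouds express this through
their own security-group or firewall-rule mechanisms.

```python
import ipaddress

def egress_allowed(dest_ip, dest_port, allowlist):
    """Default-deny egress check.

    allowlist is a list of (cidr, port) pairs (hypothetical rule format).
    A connection is permitted only if it matches at least one pair.
    """
    addr = ipaddress.ip_address(dest_ip)
    for cidr, port in allowlist:
        net = ipaddress.ip_network(cidr)
        if addr.version == net.version and addr in net and dest_port == port:
            return True
    return False

# The workload may only reach a known update service over HTTPS.
rules = [("203.0.113.0/24", 443)]
update_ok = egress_allowed("203.0.113.10", 443, rules)
c2_blocked = not egress_allowed("198.51.100.7", 4444, rules)
```

With rules like these, a compromised workload trying to reach an arbitrary
command-and-control address is refused unless that address happens to be on
the allowlist.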

We could also ask about whether compromised cloud workloads should have access
to resources outside their local subnet. In security modelling, we typically
consider subnets of workloads to be nice security containers ... at least a
compromise can't spread beyond a subnet boundary; but if you start using NAT to
punch holes in your firewalls, then you don't have firewalls anymore and all
bets are off.

We could also ask about whether it's a good thing for things outside a VPC to
be able to reach things inside. Typically a multi-tiered cloud application will
bury sensitive things like databases, HSMs, secrets vaults, legacy applications
that don't meet modern security requirements, etc deep in a backend VPC for
their own protection. Again, if you start using NAT to punch holes in your
firewalls, then you don't have firewalls anymore.

... speaking of the word "firewall", it only appears once in the entire
document (in the Introduction). I would think that this word would feature
prominently throughout pretty much every section of a document that is
fundamentally about bridging networks of things you own and things you don't.

Section 3.7
This also has security concerns in that inbound tunnels to on-prem networks
often want to have IP-based firewalling, or increasingly AI-based anomaly
detection, but constantly-shifting cloud sides of the tunnel make this sort of
thing hard.

Sections 4 & 5
Nothing new that isn't already mentioned above.

Editorial Comments
(I am a complete outsider to the routing space, so feel free to disregard
editorial comments if these points would be obvious to the target audience of
this document).

1. Introduction
Is it correct to interpret from the last sentence that the target audience of
this document is network admins for existing enterprise networks who are
considering adopting some form of hybrid cloud? If so, this point becomes
muddier further on as some of the proposed mitigations seem like they need to
be implemented by the public cloud infrastructure. Possibly the document
could benefit from more clearly separating "on-prem components", "cloud
components under the control of the customer" and "cloud infrastructure under
the control of the cloud operator", and being clearer about where each
suggested mitigation applies? As a concrete example of this, Section 3.3 has
the sentence:

> One method to mitigate the problems listed above is to use anycast
   [RFC4786] for the services so that network proximity and conditions
   can be automatically considered in optimal path selection.

which, at least to my layman's reading, is not clear whether this is something
the customer can do, or needs to be done by the cloud operator. And then the
other two suggested mitigations in that section really sound like things the
cloud operator needs to implement.

Section 2:
Is it possible to expand the acronym for "SD-WAN"? It's used several times
throughout the document but never defined. I assume it's "Software-Defined Wide
Area Network"?

The term "BGP peering" is first used in 3.1, but I am unfamiliar with it, and
it is not defined in Section 2; it probably should be.

Section 3.2
Term "IGP" used without any explanation or reference.

Term "EVPN" used without any explanation or reference.

Section 3.3
This section uses the terms "DNS Server", "client", "DNS resolver", and "Local
DNS resolver". It might enhance readability if the text was explicit about which
things are part of the on-prem network (and therefore can be mitigated by
on-prem solutions), and which are part of the cloud.

Section 3.6
I had to read this sentence 3 times to parse it:

> By
   configuration, some private subnets can have NAT functionality to
   reach out to external networks, and some private subnets are
   internal to a Cloud DC only.

I think a simple word change would help:

> By
   configuration, some private subnets can have NAT functionality to
   reach out to external networks, *WHILE* some private subnets are
   internal to a Cloud DC only.

Section 5.1

> in [Section 3] (Security Considerations)
The link "[Section 3]" does not appear to go to the Security Considerations?