Last Call Review of draft-ietf-bess-evpn-proxy-arp-nd-04
I have been assigned to do an OPS DIR review of draft-ietf-bess-evpn-proxy-arp-nd. This document describes how the proxy ARP/ND functions of EVPN can be used to mitigate the impact of address resolution in large broadcast domains. While the text does support the abstract and describes how such proxy functions can help prevent broadcast traffic from being flooded into the EVPNs, I found something noticeably missing from an operational point of view. This was especially apparent because the concept of an ARP-Sponge is also discussed. One of the big operational headaches of a large broadcast domain is negative ARP/ND (especially ARP), that is, resolution requests for addresses that nobody answers. We actually see this in the IETF conference networks due to Internet backscatter (we might be considered a DC in this regard). While the proxy functions can mitigate positive address resolution, they do not help with negative caching. I feel this should be discussed, at least in the security recommendations, unless there is a proposal to add capabilities to the EVPN proxy functions to further mitigate it.
An overall nit: the capitalization of terms such as Ethernet, Proxy-ARP, and Layer-2 is inconsistent. It would be good to normalize these for easier reading.
Other comments and nits on a per-section basis are found below.
In general, I found this section lacking when compared with the IXP section below it. You describe how large DCs may have a problem with broadcasts, but that is already a well-understood problem. What I expected was more on how this document applies to that scenario, as you do in the subsequent IXP section.
"When CE3 sends an ARP Request asking for IP1..."
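Technically, CE3 is asking for the MAC address associated with IP1, so I suggest rewording this to "When CE3 sends an ARP Request asking for the MAC address of IP1...".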
s/potential Layer-2 switches seating/potential Layer-2 switches sitting/
You say that a dynamic Proxy-ARP/ND entry SHOULD be flushed when its age-time expires. I think it MUST be flushed in that case. I suggest restating this so that the SHOULD applies to whether or not an age-time is implemented at all. IMHO, if an age-time is implemented, keeping an entry in the table after it has aged out is incorrect behavior, and an implementation MUST NOT do that.
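To make the requested semantics concrete, here is a minimal Python sketch. It assumes a hypothetical table keyed by IP address; the names ProxyTable, age_time, learn, lookup, and flush_expired are my own illustration, not anything defined in the draft, and exempting static entries from aging is also my assumption. Implementing an age-time at all is optional (the SHOULD), but once one exists, an aged-out dynamic entry is never used to answer a request (the MUST NOT).

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProxyEntry:
    mac: str
    learned_at: float
    dynamic: bool = True  # assumption: static entries are never aged out


class ProxyTable:
    """Hypothetical Proxy-ARP/ND table with an optional age-time."""

    def __init__(self, age_time: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.age_time = age_time  # None models "no age-time implemented"
        self.clock = clock
        self.entries: Dict[str, ProxyEntry] = {}

    def learn(self, ip: str, mac: str, dynamic: bool = True) -> None:
        # Learning (or re-learning) an entry restarts its age timer.
        self.entries[ip] = ProxyEntry(mac, self.clock(), dynamic)

    def _expired(self, entry: ProxyEntry, now: float) -> bool:
        return (self.age_time is not None and entry.dynamic
                and now - entry.learned_at >= self.age_time)

    def flush_expired(self) -> int:
        """Remove every aged-out dynamic entry; return how many were removed."""
        now = self.clock()
        stale = [ip for ip, e in self.entries.items() if self._expired(e, now)]
        for ip in stale:
            del self.entries[ip]
        return len(stale)

    def lookup(self, ip: str) -> Optional[str]:
        """Return the MAC to proxy-reply with, or None if there is no valid entry."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        if self._expired(entry, self.clock()):
            # Even if the periodic flush has not run yet, an aged-out entry
            # is removed here rather than used to answer a request.
            del self.entries[ip]
            return None
        return entry.mac
```

For example, with a 300-second age-time, an entry learned at time 0 still answers requests at time 299, but at time 300 lookup() returns None and the entry is gone from the table, whether or not flush_expired() has run.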
LAG is not listed in your glossary of abbreviations.
Typically, the Conventions section is located near the top of a document, after the Abstract.