Botnet Identification by Coordination-Coherence and Coherence-Driven Remediation Signaling: The MVPS Botnet Profile
draft-melegassi-mvps-botnet-coherence-00
This document is an Internet-Draft (I-D).
Anyone may submit an I-D to the IETF.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
| Field | Value |
|---|---|
| Document type | Active Internet-Draft (individual) |
| Author | Leonardo Melegassi Costa |
| Last updated | 2026-06-03 |
| RFC stream | (None) |
| Intended RFC status | (None) |
| Stream state | (No stream defined) |
| Consensus boilerplate | Unknown |
| RFC Editor Note | (None) |
| IESG state | I-D Exists |
| Telechat date | (None) |
| Responsible AD | (None) |
| Send notices to | (None) |
Internet Engineering Task Force L. Melegassi
Internet-Draft Catellix
Intended status: Experimental 3 June 2026
Expires: 5 December 2026
Botnet Identification by Coordination-Coherence and Coherence-Driven
Remediation Signaling: The MVPS Botnet Profile
draft-melegassi-mvps-botnet-coherence-00
Abstract
This document specifies how the Multi-Vantage Path Synchrony
(MVPS) framework [I-D.melegassi-ippm-mvps-bundle] and its DDoS
profile [I-D.melegassi-mvps-ddos-resilience] are extended to
IDENTIFY the participating sources of a botnet by their
coordination-coherence signature, and to EMIT corroborated,
signed evidence that DRIVES existing, standardised remediation
("sanitization") machinery.
The central design constraint is honesty about scope. MVPS does
NOT itself clean, quarantine, sinkhole, or take down infected
hosts. Remediation is performed by the mechanisms already
defined by the IETF:
o RFC 6561 (Recommendations for the Remediation of Bots in
ISP Networks) -- the notification/remediation workflow;
o RFC 9132 / RFC 8783 / RFC 8811 (DOTS) -- mitigation
request signaling;
o RFC 8520 (Manufacturer Usage Description, MUD) --
containment of compromised constrained/IoT devices;
o BCP 38 / BCP 84 (RFC 2827 / RFC 3704) -- source-address
validation against spoofed botnet traffic;
o RFC 7970 / RFC 8727 (IODEF) and RFC 6545 (RID) -- the
exchange and inter-domain coordination formats;
o RFC 9424 -- the Indicator-of-Compromise (IoC) framing for
what MVPS exports.
What MVPS contributes is precisely the gap RFC 6561 Section 4
names: it asks operators to "confirm a bot infection through the
use of a combination of multiple bot detection data points ... to
corroborate information of varying dependability ... [and] avoid or
minimize the possibility of false-positive identification of
hosts." MVPS is exactly such a corroboration engine, with the
addition of a provable false-positive bound (Theorem B2) and a
coordination-coherence test (Theorem B1) that distinguishes a
genuinely coordinated population (a botnet) from an equal number of
independently misbehaving hosts.
We state six results:
Theorem B1 (Coordination Signature). A population of S sources
driven by a common controller produces a low-rank deformation
of the cross-source coherence covariance; independent legitimate
sources do not. The leading eigenvalue ratio is therefore a
detector of coordination, not of volume.
Theorem B2 (Corroboration / False-Positive Bound). If a single
vantage flags a candidate source with per-vantage false-positive
rate p, then requiring agreement across V independent vantages
drives the host-level false-positive probability to at most p^V
under vantage independence, and to a stated mixture bound under
partial correlation.
Theorem B3 (No Unilateral Action / Remediation Soundness). MVPS
emits evidence only. Every enforcement step is taken by an
existing standardised control point (RFC 6561 / DOTS / MUD /
BCP 38). No host is quarantined on single-vantage evidence.
Theorem B4 (Falsifiability / coherence-collapse axis). The
corroboration bound of B2 COLLAPSES on a correlated benign
population: a legitimate flash crowd is coordinated-but-benign,
the botnet analogue of the COHERENT_BUT_FALSE failure mode of the
MVPS AI-Coherence extension [I-D.melegassi-mvps-ai-coherence].
When the coherence environment so collapses, that extension's
falsifiability axis enters: re-test the apparent coordination on
the machine-regularity subspace -- features a human crowd cannot
fake. A flash crowd collapses to the independent floor there; a
real bot fleet does not.
Theorem B5 (No Free Decorrelation). Spreading the botnet's
coordination across many sources to drop each per-vantage signal
cannot lower what the multi-vantage aggregate sees: the coherent
statistic is spread-INVARIANT (T_agg = sqrt(E)) with NO compute
term, so the multi-vantage advantage GROWS with the spread and the
silent-coordination cap is E < tau^2. This is the exact form of
the B1 evasion corollary.
Theorem B6 (Non-Blinding of the corroboration set). Silently
hiding the coordination by corrupting the vantages is impossible
while the redundancy rho = V - d_eff >= 1 with diverse vantages:
any such blinding needs k > rho corruptions and is FLAGGED by the
vantage-integrity monitor (a non-zero stealth-gap), and the only
un-flagged corruption -- forging vantage reports -- is gated by a
post-quantum signature (ML-DSA, FIPS 204). "Blind" implies
"known-blind".
THE THESIS IN ONE LINE. Cross-vantage agreement is necessary but
not sufficient: the coherence environment can collapse (correlated
benign crowds, or Byzantine vantages), and where it collapses the
AI-coherence axes -- falsifiability (B4) and Byzantine-robust
geometric-median aggregation [I-D.melegassi-mvps-ai-coherence] --
are what keep the identification sound.
NOTE ON DATA PROVENANCE. Section 7 reports two kinds of result,
each tagged. Section 7.1 is a LABELLED SYNTHETIC ground-truth
experiment (script scripts/simulate_botnet_coherence.py).
Sections 7.2 and 7.3 are measured on REAL labelled botnet traffic:
the CTU-13 dataset of the Stratosphere IPS Laboratory (bidirectional
NetFlow [RFC5103] / IPFIX [RFC7011] records labelled Botnet / Normal
/ Background), across three malware families (Neris, Rbot, Virut).
On that real data the detector separates botnet from normal traffic
with held-out AUC 0.85-0.999, and the multi-vantage advantage
(Theorem B5) is instantiated with the MEASURED per-flow effect size.
What remains REQUIRED future work (Section 10) is corroboration
across THREE OR MORE INDEPENDENT REAL VANTAGES observing the same
event (the real-data form of Theorem B2): CTU-13 is a single capture
point. No claim of operational botnet takedown is made.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
This Internet-Draft will expire on 5 December 2026.
Copyright Notice
Copyright (c) 2026 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Revised BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Revised BSD License.
Table of Contents
1. Introduction
1.1. The honest distinction: identify vs sanitize
1.2. Relationship to the MVPS DDoS profile
1.3. Conventions used in this document
2. The RFC Landscape This Profile Plugs Into
2.1. RFC 6561 -- the remediation workflow
2.2. DOTS -- mitigation request signaling
2.3. MUD -- constrained-device containment
2.4. BCP 38 / BCP 84 -- anti-spoofing
2.5. IoC / IODEF / RID -- evidence exchange
2.6. Vantage integrity: RPKI/ROV, SAVI, and PQC identity
2.7. Real-world grounding (documented incidents / CVEs)
3. Scope and Threat Model
4. The Coordination-Coherence Signature
5. Theorems and Proofs
5.1. Theorem B1: Coordination Signature
5.2. Theorem B2: Corroboration / False-Positive Bound
5.3. Theorem B3: No Unilateral Action
5.4. Theorem B4: Falsifiability / coherence-collapse axis
5.5. When vantages collapse: Byzantine-robust aggregation
5.6. Theorem B5: No Free Decorrelation (multi-vantage)
5.7. Theorem B6: Non-Blinding of the corroboration set
6. From Evidence to Sanitization (the pipeline)
6.1. Evidence object (IoC, RFC 9424 framing)
6.2. Hand-off to RFC 6561 remediation
6.3. Hand-off to DOTS / MUD / BCP 38
6.4. Canonical export: YANG and JSON
7. Results: detection on labelled ground truth (synthetic+real)
7.1. Synthetic labelled ground truth
7.2. Real labelled botnet traffic (CTU-13) -- detection
7.3. The multi-vantage advantage on real effect sizes
8. Security Considerations
9. Privacy Considerations
10. Operational and Validation Considerations
11. IANA Considerations
12. References
Appendix A. Reproducibility (validators, simulations, receipts)
Appendix B. Implementation and Deployment Guidance
Acknowledgements
Author's Address
1. Introduction
The MVPS DDoS profile [I-D.melegassi-mvps-ddos-resilience] proves
that a volumetric or distributed attack is DETECTED and the hit
region ATTRIBUTED in time (M-1)*T_tick, independent of attack
volume. That profile answers "is there an attack, and where is it
landing?" It does not answer "WHICH sources are participating, are
they a coordinated botnet, and what corroborated evidence can be
handed to a remediation process?"
This document answers the second question. It treats the botnet
problem as two strictly separated phases:
(a) IDENTIFICATION -- recognising, with a bounded false-positive
rate, that a set of sources is behaving as one coordinated
population; and
(b) SANITIZATION -- the operational remediation of those sources,
which this document deliberately delegates, in full, to
existing IETF mechanisms.
The contribution is confined to phase (a) plus the clean hand-off
to phase (b).
1.1. The honest distinction: identify vs sanitize
It is tempting to claim that a detector "sanitizes" a botnet. This
document does not make that claim and actively guards against it.
"Sanitization" -- notification of the subscriber, walled-garden
quarantine, sinkholing of command-and-control (C2), device
containment, or upstream scrubbing -- changes the state of a third
party's host or traffic. Such action carries legal, privacy, and
collateral-damage risk and is, by long-standing IETF consensus
(RFC 6561), the province of the network operator under defined
process, not of a monitoring instrument.
What a monitoring instrument can legitimately do is reduce the
uncertainty that makes remediation risky. RFC 6561 Section 4 is
explicit that the hard part of bot remediation is corroboration:
confirming infection from multiple independent data points to
"avoid or minimize the possibility of false-positive identification
of hosts." MVPS is designed to be exactly that corroborating data
source, with the property -- not present in single-sensor pipelines
-- that its false-positive rate at the host level is bounded in
closed form by the number of independent vantages that agree
(Theorem B2).
1.2. Relationship to the MVPS DDoS profile
This profile REUSES, without modification, the canonical machinery:
the per-vantage coherence vector x_v(t) in R^d and the Mahalanobis
D^2 statistic with chi-square phase thresholds chi^2_{d,0.95} /
chi^2_{d,0.99} [I-D.melegassi-mvps-incremental-be], the cell
partition and cell-aware minimax aggregation D^2_minimax over k cells
with Byzantine bound floor((k-1)/2)
[I-D.melegassi-mvps-ddos-resilience], the M-multiplier / T_tick
detection cadence, and the sub-tick transport of
[I-D.melegassi-coherence-bfd]. In particular, per-source coherence
data rides the existing Coherence-BFD TLVs -- the Vantage-Sketch TLV
(type 0xE0) and the AuthHMAC-SHA256 TLV (type 0xE9) -- with no new
wire format. The only new machinery is:
o a per-source (rather than per-cell) coherence projection;
o the leading-eigenvalue coordination test (Theorem B1);
o the V-vantage corroboration rule (Theorem B2);
o the evidence-export and hand-off pipeline of Section 6,
including the YANG module and JSON schema of Section 6.4; and
o a no-actuation constraint: this profile emits evidence only
(Theorem B3).
No new wire format and no new cryptographic primitive are
introduced; authentication (the 0xE9 AuthHMAC-SHA256 TLV), replay
protection (monotonic BFD sequence numbers), and control-plane
isolation are inherited from the referenced documents.
1.3. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
"Vantage" and "broker" are as defined in
[I-D.melegassi-mvps-ddos-resilience]. "Source" denotes an external
IP address (or, where SAVI/BCP 38 applies, a validated source) seen
by one or more vantages. "Coordination" denotes statistically
shared timing/behaviour across sources that exceeds what
independent legitimate sources produce. "Sanitization" denotes any
remediation action defined by RFC 6561 and is OUT OF SCOPE for the
detector itself.
2. The RFC Landscape This Profile Plugs Into
This profile is intentionally a thin layer of NEW analysis on top
of a mature set of EXISTING standards. Each existing mechanism owns
a phase that MVPS must not duplicate. The overarching framing of the
threat is the one set out in the IAB's Internet Denial-of-Service
Considerations [RFC4732]; this profile adds corroborated detection
evidence within that frame and defers every mitigation phase to the
standards below.
2.1. RFC 6561 -- the remediation workflow
RFC 6561 defines the ISP-side bot remediation lifecycle: detection,
notification, remediation, and failure handling, with strong
privacy and non-disruption requirements. Its Section 4 calls for
multi-point corroboration to avoid false-positive host
identification, and its Section 5 governs subscriber notification.
MVPS placement: MVPS is a detection/corroboration input to the
RFC 6561 process. It MUST NOT trigger notification or quarantine
directly; it produces the corroborated evidence on which an RFC 6561
process MAY act.
2.2. DOTS -- mitigation request signaling
DOTS [RFC9132] (signal channel, obsoleting RFC 8782), [RFC8783]
(data channel), and [RFC8811] (architecture) let a client request
upstream DDoS mitigation.
MVPS placement: when the identified coordinated population is
actively flooding a protected resource, an MVPS broker MAY act as a
DOTS client and request mitigation, carrying the per-source evidence
set as DOTS aliases / scope. The decision to request mitigation is
an operator policy, not an automatic consequence of detection.
2.3. MUD -- constrained-device containment
MUD [RFC8520] lets a device's manufacturer publish the device's
intended communication profile so the network can confine it.
MVPS placement: for the IoT botnet class (e.g., Mirai-style),
MVPS-identified compromised devices that possess a MUD profile can
be contained by re-asserting that profile at the access network.
MVPS supplies the "this device is now deviating from its MUD" signal
with cross-vantage corroboration; enforcement is MUD's.
2.4. BCP 38 / BCP 84 -- anti-spoofing
BCP 38 [RFC2827] and BCP 84 [RFC3704] specify source-address
validation (ingress filtering). Botnet traffic frequently spoofs
source addresses; a spoofed source cannot be remediated by host
notification.
MVPS placement: MVPS evidence is host-actionable only for sources
that survive source-address validation. This profile therefore
REQUIRES that per-source evidence be tagged with whether the source
passed SAVI/BCP 38 validation; un-validated sources are reported as
"spoof-suspect" and routed to traffic-level mitigation (DOTS), not
to host-level remediation (RFC 6561).
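The tagging and routing rule above can be sketched as follows. This is a minimal illustration, not the evidence object of Section 6.1: the `SourceEvidence` record, the `route_evidence` function, and the "validated" tag string are hypothetical names introduced here, while `bcp38_validated` and "spoof-suspect" follow the terms used in this document. The function returns an advisory hand-off target only and performs no enforcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEvidence:
    """Hypothetical per-source record; field names are illustrative only."""
    source: str            # source IP address as text
    bcp38_validated: bool  # True if the source passed SAVI/BCP 38 validation


def route_evidence(ev: SourceEvidence) -> dict:
    """Name the external control point that MAY act on this evidence.

    Nothing is enforced here: the result is advisory input for the
    RFC 6561 process (host-level) or for DOTS (traffic-level).
    """
    if ev.bcp38_validated:
        # Validated source: host-actionable, hand off to RFC 6561.
        return {"source": ev.source, "tag": "validated", "handoff": "RFC6561"}
    # Un-validated source: cannot be host-remediated, route to DOTS.
    return {"source": ev.source, "tag": "spoof-suspect", "handoff": "DOTS"}


if __name__ == "__main__":
    # Addresses are taken from the documentation ranges of RFC 5737.
    for ev in (SourceEvidence("192.0.2.10", True),
               SourceEvidence("198.51.100.7", False)):
        print(route_evidence(ev))
```

Keeping the routing decision a pure function of the validation tag makes it straightforward to audit that no un-validated source ever reaches the host-level path.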
2.5. IoC / IODEF / RID -- evidence exchange
RFC 9424 frames what an MVPS finding IS: a network-level Indicator
of Compromise (an IP/prefix/behavioural artefact), positioned on the
RFC 9424 "pyramid of pain" at the network-indicator tier. IODEF
[RFC7970] and its JSON binding [RFC8727] are the document formats
for sharing such findings; RID [RFC6545] is the inter-domain
request/coordination transport.
MVPS placement: MVPS exports each corroborated finding as an IoC
carried in an IODEF document, with the coherence statistics and the
V-vantage agreement count attached as confidence metadata.
2.6. Vantage integrity: RPKI/ROV, SAVI, and PQC identity
Theorems B2-B6 are only as sound as the vantages themselves. Three
existing mechanisms -- none invented here -- are what keep the
redundancy margin rho >= 1 of Theorem B6 real and the BCP 38 tag of
Section 2.4 meaningful at host granularity:
o RPKI [RFC6480] and BGP prefix origin validation [RFC6811],
distributed by the RPKI-to-Router protocol [RFC8210], protect
the route-view class of vantage against the BGP-hijack
poisoning that the AI-Coherence cascade model
([I-D.melegassi-mvps-ai-coherence] Section 15) quantifies: a
hijack that would silently move a vantage's view is rejected
(Invalid) rather than accepted, preserving vantage diversity
(rho) instead of collapsing it.
o SAVI [RFC7039] realises BCP 38 / BCP 84 at per-host binding
granularity, so the bcp38_validated tag of Section 2.4 / 6.1
reflects an actual source-binding state, not a coarse prefix
assumption. Without SAVI a "validated" tag is only as good as
the nearest ingress filter.
o PQC vantage identity: Theorem B6(iii) requires each vantage
report to be unforgeable against a quantum adversary. This
profile RECOMMENDS binding each report with a NIST
post-quantum signature, ML-DSA [FIPS204], hardware-rooted,
either replacing or wrapping the inherited Coherence-BFD
AuthHMAC-SHA256 TLV (0xE9).
2.7. Real-world grounding (documented incidents / CVEs)
The mechanisms above are not hypothetical. This subsection is
INFORMATIVE: the CVEs motivate the threat model and the hand-off
targets; they are not used as proof of any theorem (the proofs are in
Section 5 and the appendix).
o IoT command-and-control botnets (Mirai class) recruited hosts
through device vulnerabilities such as CVE-2017-17215 (Huawei
HG532), CVE-2014-8361 (Realtek SDK miniigd), and
CVE-2018-10561 / CVE-2018-10562 (Dasan/GPON). These are
exactly the constrained-device population that the MUD
[RFC8520] hand-off (Section 6.3) is designed to contain, and
their shared C2 is the shared command direction g of
Theorem B1.
o Reflection/amplification floods exploit exposed UDP services,
e.g. CVE-2018-1000115 (memcached UDP, the 1.35 Tbps class
event). Such volumetric, often spoofed, coordinated floods are
the DOTS [RFC9132] traffic-mitigation and BCP 38 spoof-suspect
path (Sections 2.2, 2.4, 6.3), and illustrate why Theorem B5's
volume/spread independence matters.
o Mass-exploitation events such as CVE-2021-44228 (Log4Shell)
drove simultaneous, identically-templated requests from many
sources -- a textbook coordinated command with a strong shared
direction g, i.e. the high-lambda_ratio signature of
Theorem B1.
In each case MVPS would IDENTIFY the coordinated population and EMIT
corroborated evidence; remediation remains owned by RFC 6561 / DOTS /
MUD / BCP 38 (Theorem B3).
3. Scope and Threat Model
In scope:
o Identifying that N_obs observed sources contain a coordinated
sub-population of size S (Theorem B1).
o Bounding the false-positive probability of labelling any
individual source as a member (Theorem B2).
o Exporting corroborated evidence to RFC 6561 / DOTS / MUD /
IODEF (Section 6).
Out of scope (delegated or future):
o Any host- or traffic-state change ("sanitization") -- owned by
RFC 6561 / DOTS / MUD / BCP 38.
o Malware classification, C2 protocol reverse-engineering, or
attribution to a threat actor.
o Operation against an adversary who controls a strict majority
of vantages or cells (Byzantine bound inherited from
[I-D.melegassi-mvps-ddos-resilience] Theorem D2).
Adversary model:
A1. Coordinated population. S sources receive correlated
commands (timing, target, payload shape) from one or more
controllers. This correlation is the signal.
A2. Decorrelation evasion. The adversary jitters or spreads
per-source behaviour to suppress the coordination signature;
this is bounded in Section 5.1 (a cost, not a free evasion)
and made EXACT in Section 5.6 (Theorem B5): the coherent
coordinated effect seen by the multi-vantage aggregate is
spread-invariant, so spreading thins the per-vantage signal
but never the aggregate.
A3. Spoofing. Sources spoof addresses; handled by the BCP 38
tagging requirement of Section 2.4 (spoofed sources cannot
be host-remediated and are routed to traffic mitigation).
A4. Vantage corruption / blinding. The adversary corrupts or
forges vantages to make the coordination invisible. Bounded
in Section 5.7 (Theorem B6): silent blinding is impossible
while redundancy rho >= 1 with diverse vantages, and the only
un-flagged path is PQC-gated forgery (Section 2.6).
4. The Coordination-Coherence Signature
At each tick t, each vantage v observes a set of sources and computes
a per-source feature vector f_{v,s}(t) in R^d (e.g., arrival rate,
inter-arrival regularity, destination entropy, flag mix, TTL
stability). Stacking the per-source vectors as rows gives a matrix
F_v(t).
The key empirical premise (made falsifiable in Section 10): a
COORDINATED population's feature matrix is approximately LOW RANK,
because many sources move together. An equally large set of
INDEPENDENT legitimate sources yields a near-full-rank, near-
diagonal feature covariance.
Define the cross-source coherence covariance at vantage v, where
F_v(t) is column-centred and n is the number of sources (rows of
F_v(t)):
C_v(t) = (1/n) * F_v(t)^T F_v(t)
and the leading eigenvalue ratio:
lambda_ratio_v(t) = lambda_1(C_v(t)) / trace(C_v(t)).
A high lambda_ratio indicates energy concentrated in one direction
-- the coordination signature. This is the per-source analogue of
the per-cell D^2 used for DDoS detection.
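As an illustration of the definitions above, the following sketch computes lambda_ratio for one feature matrix using only the Python standard library. It is not the code of the draft's scripts. The function names are hypothetical, the leading eigenvalue is obtained by plain power iteration (an assumption made to keep the example dependency-free), and the two toy inputs are constructed by hand: one population that moves along a single direction and one whose covariance is isotropic.

```python
import math


def coherence_covariance(features):
    """Column-centre an n x d feature matrix and return (1/n) * F^T F."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in features]
    return [[sum(r[i] * r[j] for r in centred) / n for j in range(d)]
            for i in range(d)]


def leading_eigenvalue(cov, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    d = len(cov)
    vec = [1.0 + 0.1 * i for i in range(d)]  # generic start vector
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in nxt]
    # Rayleigh quotient v^T C v for the unit vector v.
    return sum(vec[i] * cov[i][j] * vec[j]
               for i in range(d) for j in range(d))


def lambda_ratio(features):
    """lambda_1(C) / trace(C) for the centred covariance C of `features`."""
    cov = coherence_covariance(features)
    trace = sum(cov[i][i] for i in range(len(cov)))
    return leading_eigenvalue(cov) / trace if trace > 0 else 0.0


if __name__ == "__main__":
    # Every source is a multiple of one direction g = (1, 2, 2): rank one.
    coordinated = [[a * 1.0, a * 2.0, a * 2.0] for a in (-1.0, 0.0, 1.0, 2.0)]
    # Sources spread evenly over the axes: isotropic covariance, d = 3.
    independent = [[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                   [0, -1, 0], [0, 0, 1], [0, 0, -1]]
    print("coordinated:", lambda_ratio(coordinated))
    print("independent:", lambda_ratio(independent))
```

The rank-one input places all energy in one direction, so its ratio sits at the top of the range, while the isotropic input sits at the floor 1/d. Power iteration converges slowly when the two largest eigenvalues are close, so a production implementation would more likely use a library eigensolver.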
5. Theorems and Proofs
5.1. Theorem B1 (Coordination Signature)
STATEMENT. Let S sources be driven by a common controller such
that each source's feature vector is f_s = a_s * g + e_s, where g is
a shared command direction, a_s a per-source gain, and e_s
independent zero-mean noise with per-coordinate variance sigma^2.
Let an equal number of independent legitimate sources have
f_s = e'_s with e'_s independent zero-mean, variance sigma^2.
Then, as S grows, the expected leading eigenvalue ratio of the
coordinated population is bounded below by
E[lambda_ratio_coord] >= (||g||^2 * Avar) /
(||g||^2 * Avar + d*sigma^2)
where Avar = E[a_s^2], while for the independent population
E[lambda_ratio_indep] -> 1/d as S -> infinity.
PROOF (sketch). For the coordinated population, C = ||g||^2 *
Avar * (gg^T/||g||^2) + sigma^2 * I + o(1) by the law of large
numbers over S, so lambda_1 -> ||g||^2*Avar + sigma^2 and
trace -> ||g||^2*Avar + d*sigma^2. The ratio therefore tends to
(||g||^2*Avar + sigma^2) / (||g||^2*Avar + d*sigma^2), which is at
least the stated bound because sigma^2 >= 0. For
the independent population C -> sigma^2 * I, whose eigenvalues are
equal, so lambda_1/trace -> 1/d. The two regimes are separated by a
gap that grows with the command strength ||g||^2*Avar relative to
the per-source noise; a threshold placed in the gap separates them.
QED (asymptotic; finite-S concentration and the false-alarm rate are
the subject of the synthetic study in Section 7 and the operational
validation of Section 10).
COROLLARY (evasion cost, addresses A2). Driving lambda_ratio_coord
down to the independent value 1/d requires ||g||^2*Avar -> 0, i.e.
removing the shared command component. A botnet with no shared
command component is not coordinated and loses the operational
advantage of coordination. Evasion is therefore not free; it is
paid in lost coordination. Section 5.6 (Theorem B5) sharpens this
from an asymptotic statement into an exact spread-invariant identity.
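The asymptotic quantities of Theorem B1 can be evaluated directly. The sketch below is plain arithmetic on the formulas in the statement and proof; the function names and the example parameter values (d = 5, sigma^2 = 1, ||g||^2*Avar = 5) are assumptions chosen for illustration and do not come from the draft's experiments.

```python
def coord_ratio_limit(g_norm_sq, avar, sigma_sq, d):
    """Large-S limit of lambda_1/trace for the coordinated population."""
    signal = g_norm_sq * avar
    return (signal + sigma_sq) / (signal + d * sigma_sq)


def coord_ratio_lower_bound(g_norm_sq, avar, sigma_sq, d):
    """Lower bound on E[lambda_ratio_coord] stated in Theorem B1."""
    signal = g_norm_sq * avar
    return signal / (signal + d * sigma_sq)


def indep_ratio_limit(d):
    """Large-S limit of lambda_1/trace for independent sources."""
    return 1.0 / d


if __name__ == "__main__":
    d, sigma_sq = 5, 1.0
    for signal in (5.0, 1.0, 0.1, 0.0):
        print("signal", signal,
              "limit", coord_ratio_limit(signal, 1.0, sigma_sq, d),
              "bound", coord_ratio_lower_bound(signal, 1.0, sigma_sq, d),
              "floor", indep_ratio_limit(d))
```

Sweeping the command strength towards zero shows the evasion corollary numerically: the coordinated limit approaches the independent floor 1/d only as the shared command component vanishes.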
5.2. Theorem B2 (Corroboration / False-Positive Bound)
STATEMENT. Suppose each of V vantages independently flags a given
source as a candidate member with false-positive probability at most
p (i.e., labels a benign source as a member with probability <= p).
Require that a source be admitted to the identified set only if all
V vantages agree. Then:
(i) Under vantage independence, the host-level false-positive
probability satisfies P_fp <= p^V.
(ii) Under partial correlation with pairwise correlation rho in
[0,1], P_fp <= p^V + (1 - (1-rho)^(V-1)) * (p - p^V),
which reduces to p^V at rho=0 and to p at rho=1.
PROOF. (i) is the product rule for independent events. (ii)
interpolates: with probability (1-rho)^(V-1) the V-1 confirming
judgements behave independently of the first (giving p^V), and
otherwise they may collapse onto a single shared error (giving p);
the convex combination yields the bound. QED.
REMARK. This is the closed-form expression of the qualitative
requirement in RFC 6561 Section 4 ("a combination of multiple bot
detection data points ... to avoid or minimize false-positive
identification of hosts").
COROLLARY B2(iii) (independence is the precondition). The
corroboration gain comes ENTIRELY from vantage independence. Under
independence (rho=0), V=3 at p=0.05 already gives P_fp <= 1.25e-4,
below a 1e-3 target. Under correlation, the bound of (ii) does NOT
vanish with V: as V grows it converges UP to p, and for rho=0.1 it is
floored at roughly 7e-3 (minimised near V=2). No number of
CORRELATED vantages reaches a 1e-3 target. Operationally: the
false-positive guarantee is only as strong as the path/observation
diversity of the vantages. This is verified in
validate_botnet_coherence.py check T-B2-3, and is exactly why a
legitimate flash crowd -- whose per-vantage errors are correlated --
is the principal residual false positive (Section 7, Section 10).
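The bound of Theorem B2 and the figures quoted in Corollary B2(iii) can be reproduced with a few lines of arithmetic. This is a sketch of the closed form only, not of validate_botnet_coherence.py; the function name `fp_bound` is hypothetical.

```python
def fp_bound(p, v, rho=0.0):
    """Theorem B2 bound: p^V + (1 - (1 - rho)^(V-1)) * (p - p^V)."""
    independent = p ** v
    # Weight on the case where the confirming vantages share one error.
    collapse_weight = 1.0 - (1.0 - rho) ** (v - 1)
    return independent + collapse_weight * (p - independent)


if __name__ == "__main__":
    p = 0.05
    print("independent, V=3:", fp_bound(p, 3))
    for v in range(1, 8):
        print("rho=0.1, V=%d:" % v, fp_bound(p, v, rho=0.1))
```

With rho = 0 the bound reduces to p^V, and with rho = 1 it reduces to p. For rho = 0.1 and p = 0.05 the printed values reach their minimum at V = 2 and then rise towards p, which is the behaviour the corollary describes.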
5.3. Theorem B3 (No Unilateral Action / Remediation Soundness)
STATEMENT. Under this profile, no source's host state or traffic is
altered by MVPS. Every state-changing action is performed by an
external control point governed by RFC 6561, DOTS, MUD, or BCP 38,
each of which receives MVPS output as advisory input.
JUSTIFICATION. The detector's only output is a signed evidence
object (Section 6.1). The hand-off interfaces of Section 6 are all
request/advisory: an RFC 6561 process MAY notify; a DOTS server MAY
mitigate; a MUD enforcement point MAY contain. Because MVPS holds
no enforcement capability, no false positive at the detector can, by
itself, quarantine a host -- it can only raise a corroborated
request that the responsible, policy-bound control point evaluates.
This is the property that makes the false-positive bound of
Theorem B2 a SAFETY bound and not merely an accuracy figure.
5.4. Theorem B4 (Falsifiability / coherence-collapse axis)
MOTIVATION. Theorems B1-B2 rest on a premise that does NOT always
hold: that a benign population's per-vantage errors are independent,
so corroboration suppresses them. The premise fails for a
legitimate FLASH CROWD: a real, correlated event (a news spike, a
software-update thundering herd) makes many sources move together in
the human-driven features (arrival rate, destination), so the crowd
looks coordinated across vantages and SURVIVES corroboration
(observed directly in Section 7: flash-crowd corroborated false
positive 0.115, exceeding the independence bound 0.035). This is the
coherence environment "collapsing": cross-vantage agreement no longer
certifies a botnet.
This is exactly the COHERENT_BUT_FALSE (CBF) failure mode of the MVPS
AI-Coherence extension [I-D.melegassi-mvps-ai-coherence] (Sections
6-7 there): a consensus that is internally coherent yet wrong. In
the botnet setting we call its dual COORDINATED-BUT-BENIGN (CBB).
The AI-Coherence extension's answer to CBF is a falsifiability axis
(its C_4): re-test the consensus on a dimension the failure mode
cannot fake. We import that axis here.
STATEMENT. Partition the per-source feature space R^d into a HUMAN
block H (features a legitimate crowd legitimately shares: arrival
rate, destination entropy) and a MACHINE-REGULARITY block M (inter-
arrival regularity, TTL stability, flag mix, payload-size dispersion
-- features only a machine fleet shares). Write the shared command
g = (g_H, g_M). Then the leading-eigenvalue ratio restricted to M
behaves as follows:
o flash crowd (g_M = 0): the cross-source covariance on M is
exactly sigma^2 I, so lambda_ratio|_M = 1/|M| -- it COLLAPSES
to the independent floor, orthogonal to its full-space
coordination (the exact analogue of C_4's orthogonality to
C_1/C_2/C_3 for CBF, [I-D.melegassi-mvps-ai-coherence]
Section 6.5);
o bot fleet (g_M != 0): lambda_ratio|_M >> 1/|M|.
ADMISSION RULE. Admit a source only if it is coordinated on the
full space (B1) AND survives the falsifiability axis on M. This
removes the flash-crowd residual that corroboration alone cannot.
PROOF. A flash crowd's shared component lies in H by hypothesis, so
projecting out H leaves only independent per-source noise on M, whose
covariance is sigma^2 I with all eigenvalues equal: lambda_ratio|_M =
1/|M| exactly. A bot fleet's command has g_M != 0, so the rank-1
term survives the projection and lambda_ratio|_M follows the B1 form
on |M| coordinates. QED (closed form, validate_botnet_coherence.py
checks T-B4-1..3).
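The admission rule can be illustrated with the population (large-S) form of the covariance, Avar * g g^T + sigma^2 * I, restricted to a block. The sketch below is an assumption-laden example rather than the draft's validator: the function names, the command vectors, and the `margin` threshold (a multiple of the independent floor) are all hypothetical, since this section does not fix a numeric threshold.

```python
def ratio_on_block(g_block, avar, sigma_sq):
    """Population lambda_1/trace of Avar*g g^T + sigma^2*I on one block."""
    signal = avar * sum(x * x for x in g_block)
    size = len(g_block)
    return (signal + sigma_sq) / (signal + size * sigma_sq)


def admit(g_h, g_m, avar=1.0, sigma_sq=1.0, margin=2.0):
    """Admit only if coordinated on the full space AND on block M.

    `margin` is a hypothetical threshold: the ratio must exceed
    margin times the independent floor 1/size on each space tested.
    """
    d, m = len(g_h) + len(g_m), len(g_m)
    full = ratio_on_block(list(g_h) + list(g_m), avar, sigma_sq)
    on_m = ratio_on_block(list(g_m), avar, sigma_sq)
    return full > margin / d and on_m > margin / m


if __name__ == "__main__":
    flash_crowd = ((3.0, 3.0), (0.0, 0.0, 0.0, 0.0))  # g_M = 0
    bot_fleet = ((3.0, 3.0), (2.0, 2.0, 2.0, 2.0))    # g_M != 0
    for name, (g_h, g_m) in (("flash crowd", flash_crowd),
                             ("bot fleet", bot_fleet)):
        print(name, "ratio on M:", ratio_on_block(list(g_m), 1.0, 1.0),
              "admitted:", admit(g_h, g_m))
```

In this toy setting both populations look coordinated on the full space, but only the bot fleet clears the test on M; the flash crowd sits exactly at 1/|M| there, as the proof states.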
EMPIRICAL CONFIRMATION (Section 7). On the machine-regularity
subspace the flash crowd becomes indistinguishable from independent
legitimate traffic (mean 0.299 vs 0.300) while the bot fleet stays
high (0.746); adding the axis to the admission rule drives the
flash-crowd corroborated false positive 0.115 -> 0.000 with botnet
detection held at 1.000.
COST AND LIMIT. The falsifiability axis costs the extra machine-
regularity features per source; it does not defend against a future
adversary that deliberately matches human-crowd statistics on M as
well (the CBB analogue of a perturbation-stable hallucination,
[I-D.melegassi-mvps-ai-coherence] Section 6.5 caveat). Such an
adversary pays the full coordination-suppression cost of the B1
corollary on every monitored feature.
5.5. When vantages collapse: Byzantine-robust aggregation
B2-B4 assume the vantages themselves are honest-but-noisy. A
compromised or hijacked vantage is a second way the coherence
environment collapses: one Byzantine vantage can drag an arithmetic-
mean centroid arbitrarily, forging or masking coordination. This
profile inherits the cell-aware minimax Byzantine bound
floor((k-1)/2) of [I-D.melegassi-mvps-ddos-resilience], and, where
per-vantage distributions are aggregated, MUST use the geometric-
median estimator C_2^gm of [I-D.melegassi-mvps-ai-coherence]
(Section 11), whose breakdown point is 1/2 (versus 1/N for the mean),
together with that document's SUSPECTED_BYZANTINE label for vantage
attribution. MVPS still emits evidence only (Theorem B3); Byzantine
robustness changes WHICH evidence is trustworthy, never whether MVPS
acts.
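The difference between the two estimators can be sketched in a few
lines. This is an illustration only: it uses the standard Weiszfeld
iteration as a stand-in for the C_2^gm estimator, whose exact
definition is in [I-D.melegassi-mvps-ai-coherence], and the honest
vantage positions are hypothetical.

```python
import math

def mean_point(points):
    n, d = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(d)]

def geometric_median(points, iters=200, eps=1e-12):
    """Weiszfeld iteration; eps guards against division by zero."""
    y = mean_point(points)
    for _ in range(iters):
        weights = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(weights)
        y = [sum(w * p[i] for w, p in zip(weights, points)) / total
             for i in range(len(y))]
    return y

# Seven honest vantages clustered near the true centroid (0, 0).
HONEST = [(0.1, 0.0), (-0.1, 0.05), (0.0, -0.1), (0.05, 0.1),
          (-0.05, -0.05), (0.1, 0.1), (-0.1, 0.0)]

def byzantine_drifts(magnitude):
    """Drift of each estimator when one vantage reports (m, m)."""
    points = HONEST + [(magnitude, magnitude)]
    origin = (0.0, 0.0)
    return (math.dist(mean_point(points), origin),
            math.dist(geometric_median(points), origin))

for m in (10.0, 100.0, 10000.0):
    mean_drift, gm_drift = byzantine_drifts(m)
    print(f"forgery {m:>8.0f}: mean drift {mean_drift:9.2f}, "
          f"geometric-median drift {gm_drift:.3f}")
```

The mean drift grows linearly with the forged magnitude, while the
geometric-median drift stays within a small multiple of the honest
cluster radius.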
5.6. Theorem B5 (No Free Decorrelation -- multi-vantage aggregation)
The B1 corollary states that evasion costs coordination, but only
asymptotically. This theorem makes it exact and removes the last
hope of an adversary: that spreading the coordination thinly enough
across sources defeats detection.
STATEMENT. Model the botnet's coordinated effect as a coherent
energy E that lives in the observable rowspace of the multi-vantage
operator (the "observable = actuated" regime: a coordinated effect
with no projection on the observation rowspace is, by definition, an
effect with no measurable consequence). An adversary that spreads E
evenly across N sources/vantages drives the per-vantage signal to
mu = sqrt(E/N) -> 0, but the multi-vantage coherent statistic is
T_agg = sqrt(E), INDEPENDENT of N,
and contains NO computational-cost term. Consequently:
(i) no free stealth: T_agg^2 = E exactly (what the dome sees
equals the coordinated effect delivered);
(ii) the advantage adv(N) = sqrt(E) - sqrt(E/N) is zero at N = 1,
positive for N > 1, and strictly increasing in N -- the very
spreading that blinds a single sensor WIDENS the multi-vantage
lead;
(iii) silent-coordination cap: any botnet delivering coherent
effect E >= tau^2 is detected with probability ~1 for ANY
spread; the region {large coordinated effect, detector
silent} is empty.
PROOF. Even spread gives per-source mean sqrt(E/N) over N terms;
the coherent sum normalised by sqrt(N) has mean sqrt(N)*sqrt(E/N) =
sqrt(E) regardless of N. No step references compute, so a faster
(including quantum) adversary cannot lower it: it is an information
bound, not a computational one. QED (closed form,
validate_botnet_coherence.py checks T-B5-1..3).
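The arithmetic of the proof can be checked directly. The sketch below
is not one of the validator checks. It ASSUMES unit-variance Gaussian
noise on the normalised statistic, and the values E = 36 and tau = 4
are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def per_vantage_signal(E, N):
    """mu = sqrt(E/N): what a single vantage sees under an even spread."""
    return math.sqrt(E / N)

def aggregate_statistic(E, N):
    """Coherent sum of N per-vantage means, normalised by sqrt(N)."""
    return N * per_vantage_signal(E, N) / math.sqrt(N)

def advantage(E, N):
    """adv(N) = sqrt(E) - sqrt(E/N)."""
    return aggregate_statistic(E, N) - per_vantage_signal(E, N)

def detection_probability(mean, tau):
    """P(z > tau) for z ~ Normal(mean, 1); unit noise is an assumption."""
    return 1.0 - NormalDist(mu=mean, sigma=1.0).cdf(tau)

E, TAU = 36.0, 4.0  # hypothetical coherent energy and threshold
for N in (1, 2, 4, 8, 16, 32, 64, 128):
    mu = per_vantage_signal(E, N)
    t_agg = aggregate_statistic(E, N)
    print(f"N={N:3d}  mu={mu:.3f}  T_agg={t_agg:.3f}  "
          f"adv={advantage(E, N):.3f}  "
          f"P_single={detection_probability(mu, TAU):.4f}  "
          f"P_agg={detection_probability(t_agg, TAU):.4f}")
```

T_agg stays at sqrt(E) for every N while the per-vantage signal and
the single-vantage detection probability fall, which is the content
of (i) and (ii).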
ADVERSARIAL RED-TEAM. Letting an adversary actually play
(scripts/simulate_botnet_redteam.py, Attack 1) confirms it: as the
adversary spreads E across N = 1..128 vantages the single-vantage
detection rate collapses 0.978 -> 0.0001 while the multi-vantage
aggregate stays at 0.977 with mean statistic fixed at sqrt(E) = 6.0.
The spread that blinds one eye only widens the multi-vantage lead.
CONSEQUENCE FOR A2 (decorrelation evasion). The orthogonal
decomposition stealth(a) + effect(a) = ||a||^2 (Pythagoras on the
rowspace/nullspace split of the observation operator) is exact:
every unit of coordination an adversary hides from the observable
space is a unit of coordinated effect that no longer happens.
"Hidden coordination is coordination that does nothing."
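The rowspace/nullspace split can be made concrete. The sketch below
assumes that effect(a) is the squared norm of the projection of a onto
the rowspace of the observation operator and that stealth(a) is the
squared norm of the remainder. The operator rows and the vector a are
hypothetical.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def orthonormal_basis(rows, tol=1e-12):
    """Gram-Schmidt basis for the rowspace of the observation operator."""
    basis = []
    for r in rows:
        w = list(r)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def split_energy(rows, a):
    """Return (effect, stealth): rowspace and nullspace energy of a."""
    basis = orthonormal_basis(rows)
    coeffs = [dot(a, b) for b in basis]
    projected = [sum(c * b[i] for c, b in zip(coeffs, basis))
                 for i in range(len(a))]
    residual = [ai - pi for ai, pi in zip(a, projected)]
    return dot(projected, projected), dot(residual, residual)

rows = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # observes coordinates 1 and 2
a = [3.0, 4.0, 12.0]                       # hypothetical coordination vector
effect, stealth = split_energy(rows, a)
print(effect, stealth, dot(a, a))
```

Energy moved into the unobserved third coordinate raises stealth(a)
and lowers effect(a) by the same amount, since their sum is fixed at
||a||^2.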
5.7. Theorem B6 (Non-Blinding of the corroboration set)
B2-B5 assume the vantages report honestly. The final evasion is to
corrupt the vantages so the coordination becomes invisible -- to
blind the detector. This theorem bounds it.
STATEMENT. Let V vantages observe a coordination subspace of
effective dimension d_eff, with redundancy rho = V - d_eff. Then:
(i) while k <= rho vantages are corrupted, the honest
sub-operator keeps full column rank and EVERY coordination
direction stays detectable -- silent blinding is impossible;
(ii) blinding requires k > rho corruptions, which open a null
space of dimension k - rho that the vantage-integrity
monitor reports as a stealth-gap > 0 (the count of missing
or invalid authenticated reports): "blind" implies
"known-blind";
(iii) the only UN-flagged corruption is forging authenticated
vantage reports, gated by a post-quantum signature (ML-DSA,
[FIPS204]) with forgery probability <= 2^-lambda.
Hence P(silent blinding) <= 0 + 2^-lambda: the geometric term is 0
while rho >= 1, and 2^-lambda is the PQC forgery term.
Because the geometry term (i)-(ii) carries no computational variable,
the bound holds against any future technology, including quantum; the
composite inherits only the PQC exponent (ML-DSA-65 gives ~2^-112
over a generous 2^80 ten-year quantum query budget).
PROOF. Rank/SVD of the honest sub-operator (closed form,
validate_botnet_coherence.py checks T-B6-1..3).
ADVERSARIAL RED-TEAM. Attack 2 of
scripts/simulate_botnet_redteam.py corrupts k = 0..7 of V = 8
vantages (rho = 3): for k <= 3 the honest sub-operator keeps full
column rank (stealth dimension 0, no blinding); for k > 3 a blinding
null space of dimension k - 3 appears AND the stealth-gap reported by
the integrity monitor equals it exactly (always flagged). Attack 3
confirms the Section 5.5 rule: a single Byzantine forgery of growing
magnitude drags the arithmetic-mean centroid without bound (drift
1.4 -> 1397) while the geometric median stays bounded (~0.42).
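The rank argument behind Attack 2 can be reproduced in miniature.
This sketch is not scripts/simulate_botnet_redteam.py. It ASSUMES a
hypothetical "diverse" observation operator (a Vandermonde matrix, in
which any d_eff rows are linearly independent) with V = 8 and
d_eff = 5, and it models corruption as removal of a vantage's row.

```python
from fractions import Fraction

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

def diverse_operator(V, d_eff):
    """Hypothetical V x d_eff operator with distinct Vandermonde nodes."""
    return [[(v + 1) ** j for j in range(d_eff)] for v in range(V)]

def stealth_dimension(V, d_eff, k):
    """Null-space dimension of the honest sub-operator after k removals."""
    honest = diverse_operator(V, d_eff)[k:]
    return d_eff - rank(honest)

V, D_EFF = 8, 5
RHO = V - D_EFF
for k in range(V):
    dim = stealth_dimension(V, D_EFF, k)
    print(f"k={k}: stealth dimension {dim}, max(0, k - rho) = {max(0, k - RHO)}")
```

For k <= rho the stealth dimension is 0. Above that it equals k - rho,
matching statements (i) and (ii). A less diverse operator (repeated
or dependent rows) loses rank earlier, which is why the design rule
below asks for diversity rather than count.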
DESIGN RULE (what this requires of a deployment). Ship V >= d_eff+1
DIVERSE vantages (diversity, not count, is what guarantees rank --
Section 2.6), authenticate every vantage report with a PQC signature
([FIPS204], replacing or wrapping the inherited Coherence-BFD
AuthHMAC-SHA256 TLV 0xE9), and surface the stealth-gap as a
first-class "known-blind" alarm. The promise is not "never blind"
but "never SILENTLY blind".
6. From Evidence to Sanitization (the pipeline)
6.1. Evidence object (IoC, RFC 9424 framing)
For each admitted source, the broker produces an evidence object
containing at least:
o source identifier (IP / prefix), and BCP 38 validation tag;
o lambda_ratio and D^2 time-series excerpts (the coordination
signature, Section 4);
o lambda_ratio_machine and the falsifiability_pass flag (the
machine-regularity-subspace falsifiability axis, Theorem B4);
o V (number of agreeing vantages) and the estimated P_fp
(Theorem B2);
o observation window and a content hash;
o an HMAC/signature inherited from
[I-D.melegassi-coherence-bfd] Section 12.
This object is a network-level IoC in the sense of RFC 9424 and is
serialised into an IODEF [RFC7970] document (JSON binding
[RFC8727]) for exchange.
The per-vantage observations the broker corroborates are ordinary
flow records: IPFIX [RFC7011] export, whose Information Elements
[RFC7012] (octet/packet counts, durations, ports, protocol) are the
feature substrate of the coordination signature, including
bidirectional flows exported as biflows [RFC5103]. No bespoke
telemetry is required; a deployment corroborates over IPFIX
collectors it already operates.
6.2. Hand-off to RFC 6561 remediation
For host-actionable sources (BCP 38 validated, not spoof-suspect),
the evidence object is delivered to the operator's RFC 6561 process.
That process -- NOT MVPS -- decides on notification, walled-garden
placement, or other remediation, subject to RFC 6561's privacy and
non-disruption requirements. The MVPS P_fp estimate SHOULD be
carried so the RFC 6561 operator can apply its own confidence
threshold.
6.3. Hand-off to DOTS / MUD / BCP 38
o Active flood, protected resource: the broker MAY originate a DOTS
[RFC9132] mitigation request scoped to the identified sources,
and MAY pre-stage the coordination metrics (lambda_ratio,
agreeing-vantages, p_fp_estimate) as DOTS telemetry [RFC9244] so
the operator sees the corroborated evidence before any action.
o Constrained/IoT source with a MUD profile: deviation evidence is
delivered to the MUD [RFC8520] enforcement point for containment.
o Spoof-suspect source (failed BCP 38 validation): NOT host-
remediated; routed to traffic-level mitigation and reported to
the upstream for ingress-filtering follow-up per BCP 84.
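One possible reading of the routing in Sections 6.2 and 6.3 is
sketched below. The class, the flag names active_flood and
has_mud_profile, and the destination labels are hypothetical; they
are not fields of the evidence object. The function only selects
recipients of the evidence and performs no actuation (Theorem B3).

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str
    bcp38_validated: bool
    active_flood: bool = False     # hypothetical flag
    has_mud_profile: bool = False  # hypothetical flag

def hand_off(ev):
    """Return the control points that should receive the evidence."""
    if not ev.bcp38_validated:
        # Spoof-suspect: never host remediation.
        return ["traffic-mitigation", "bcp84-upstream-report"]
    targets = ["rfc6561-process"]
    if ev.active_flood:
        targets.append("dots-mitigation-request")
    if ev.has_mud_profile:
        targets.append("mud-enforcement-point")
    return targets

print(hand_off(Evidence("192.0.2.1", bcp38_validated=False, active_flood=True)))
print(hand_off(Evidence("192.0.2.2", bcp38_validated=True, has_mud_profile=True)))
```

The spoof check comes first so that no later branch can route a
spoof-suspect source to a host-level process.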
6.4. Canonical export: YANG and JSON
The evidence object of Section 6.1 has a canonical machine encoding,
so it interoperates with model-driven and SIEM pipelines without a
bespoke format. It follows the conventions of the MVPS telemetry
export model [I-D.melegassi-opsawg-mvps-telemetry-export]:
o YANG module "catellix-mvps-botnet" (namespace
urn:ietf:params:xml:ns:yang:mvps-botnet, prefix mvpsb) defines
the read-only notification "mvps-botnet-evidence" carrying the
Section 6.1 fields (source-address, bcp38-validated,
lambda-ratio, d2, lambda-ratio-machine and falsifiability-pass
(Theorem B4 axis), agreeing-vantages, p-fp-estimate, window,
disposition -- including the coordinated-but-benign label --
content-hash, auth-hmac). The module is delivered via YANG-Push
[RFC8641] over NETCONF/RESTCONF, in the same way as the MVPS
telemetry export channel C. It is read-only: it carries no
command or actuation (Theorem B3).
o JSON Schema "mvps-botnet-evidence-v1" (2020-12) defines the
equivalent JSON object. Its stable identifier is
evidence_id = SHA-256(JCS(evidence \ {evidence_id})) per
JCS [RFC8785], identical in spirit to the telemetry event_id,
so producers are deterministic and consumers can deduplicate.
The JSON object is directly embeddable as a network-level Indicator
of Compromise in an IODEF [RFC7970] document (JSON binding
[RFC8727]); the lambda_ratio, agreeing-vantages, and p_fp_estimate
travel as confidence metadata.
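The evidence_id construction can be sketched as follows. The field
names in the example object are hypothetical and are not taken from
the JSON Schema. The canonical_json helper only APPROXIMATES JCS
[RFC8785]: sorted keys and compact separators agree with JCS for
ASCII keys and for string, integer, and boolean values, but JCS
number serialisation for floating-point values differs from Python's,
so a production implementation would need a real JCS library.

```python
import hashlib
import json

def canonical_json(obj):
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")

def evidence_id(evidence):
    """SHA-256 over the canonical form of the object minus evidence_id."""
    body = {k: v for k, v in evidence.items() if k != "evidence_id"}
    return hashlib.sha256(canonical_json(body)).hexdigest()

example = {  # hypothetical field names and values
    "source_address": "192.0.2.10",
    "bcp38_validated": True,
    "agreeing_vantages": 3,
    "falsifiability_pass": True,
    "window": "2026-06-03T00:00:00Z/PT60S",
}
example["evidence_id"] = evidence_id(example)
print(example["evidence_id"])
```

Because the identifier is excluded from its own hash input, a
consumer can recompute it from a received object and deduplicate on
it regardless of key order.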
7. Results: detection on labelled ground truth (synthetic and real)
7.1. Synthetic labelled ground truth
PROVENANCE. The following are results of a LABELLED synthetic
ground-truth experiment, not an operational botnet capture. The
question "did we find the botnet?" cannot be answered honestly
without ground truth, so every source is tagged at creation as one of
three classes and the detector is scored against labels it never
sees.
Reproducibility: scripts/simulate_botnet_coherence.py (seed
20260603, d = 6, V = 8 vantages, corroboration V_required = 3, 200
sources/population, 400 populations/class); receipt
evidence/botnet_coherence_sim_receipt.json,
body_sha256 460ccb48... The closed-form theorem checks are in
scripts/validate_botnet_coherence.py (19/19 PASS, B1-B6),
evidence/botnet_coherence_receipt.json, body_sha256 c1a2c31a...
Three classes: LEGIT (independent), FLASH CROWD (a legitimate event
correlated in a few features -- the deliberate hard negative of
Section 10), and BOTNET (one controller, shared command direction).
Coordination separation (Theorem B1), mean lambda_ratio
(floor 1/d = 0.167):
Class          mean lambda_ratio
-----------    -----------------
legit          0.213
flash crowd    0.290
botnet         0.662
Detection with V_required = 3 of 8 corroboration (threshold
calibrated to a per-vantage p = 0.05):
botnet detection rate ........ 1.000 (400/400 admitted)
ROC AUC (botnet vs rest) ..... 1.000
legit false-positive rate .... 0.000
flash-crowd false-positive ... 0.115 (reported, not hidden)
overall false-positive ....... 0.0575
The honest, load-bearing finding (Theorem B2 / B2(iii)). The legit
class's per-vantage errors are INDEPENDENT; corroboration collapses
their population false positive to ~0, matching the p^V bound. The
flash crowd's per-vantage errors are CORRELATED (the crowd shares a
real signal), so they SURVIVE corroboration: measured flash-crowd
population FP 0.115 EXCEEDS the independent binomial bound 0.035.
This is B2(iii) observed directly: the corroboration guarantee
requires vantage independence, and the flash crowd is the coherence-
collapse case.
The falsifiability axis resolves it (Theorem B4). Re-testing the
apparent coordination on the machine-regularity subspace M (the
AI-coherence axis of [I-D.melegassi-mvps-ai-coherence]) separates the
crowd from the fleet:
Class          mean lambda_ratio|_M (floor 1/|M| = 0.250)
-----------    ------------------------------------------
legit          0.300
flash crowd    0.299   (collapses to the legit floor)
botnet         0.746   (survives)
Admitting only sources that are coordinated (B1) AND survive the
falsifiability axis (B4) gives:
flash-crowd false-positive ... 0.115 -> 0.000
botnet detection rate ........ 1.000 (unchanged)
So the residual that pure corroboration cannot remove is removed by
the AI-coherence axis -- exactly the "coherence collapses, AI enters"
structure of [I-D.melegassi-mvps-ai-coherence]. These numbers are
synthetic and MUST be reproduced on operational data before any
non-experimental claim; in particular an adversary that fakes
human-crowd statistics on M as well is not defended (Section 5.4
cost-and-limit, Section 10). Sections 7.2 and 7.3 take the first
step of that reproduction on real labelled traffic.
7.2. Real labelled botnet traffic (CTU-13) -- detection
PROVENANCE. The following are MEASURED on real labelled botnet
traffic: the CTU-13 dataset of the Stratosphere IPS Laboratory,
bidirectional flow records [RFC5103] labelled Botnet / Normal /
Background. Five scenarios spanning three malware families were
used: Neris (scenario 9), Rbot (scenarios 4, 11), and Virut
(scenarios 5, 13). Reproducibility: scripts/collect_ctu13_botnet.py
(download + label-preserving reduction; per-capture provenance and
stream SHA in evidence/ctu13_raw/*_meta.json) and
scripts/validate_ctu13_coordination.py; receipts
evidence/ctu13_coordination_receipt_s*.json and the cross-family
summary evidence/ctu13_coordination_combined_receipt.json
(body_sha256 287f8bef...).
Two independent tests are run, deliberately, because the lab captures
contain very few simultaneously-infected hosts (often one), which is
too few to measure across-source coordination directly.
(a) Per-flow detectability (robust to host count). A held-out
Fisher-LDA on the per-flow features (octets, packets, duration,
source/total byte ratio, rate, protocol, destination port -- all
IPFIX Information Elements [RFC7012]) separates Botnet from Normal
flows with:
Family    Scenario    held-out AUC (Botnet vs Normal)
------    --------    -------------------------------
Neris     9           0.854
Rbot      4           0.966
Rbot      11          0.999
Virut     5           0.929
Virut     13          0.938
Mean AUC 0.937, minimum 0.854, replicated across three families.
This is a real-data DETECTION result; it is not, by itself, a proof
of the B1 coordination MECHANISM.
(b) Coordination signature (Theorem B1), where measurable. For the
across-source leading-eigenvalue ratio lambda_ratio to be meaningful
it MUST be compared to a same-size null, as lambda_ratio inflates
when the source count is small. Drawing equally many random
non-botnet sources (2000 draws) gives a null whose 95th percentile is
the bar to beat:
Scenario    #bot hosts    lambda_ratio    null p95    z vs null
--------    ----------    ------------    --------    ------------
9 Neris     10            0.813           0.711       +2.96
11 Rbot     3             1.000           0.967       +1.69
4,5,13      1             n/a             n/a         not testable
Where there are enough infected hosts to test (>= 3), the botnet
ratio exceeds the same-size null -- i.e. it is NOT a small-sample
artefact -- strongest in Neris at ~3 sigma. Where a scenario
captured a single infected host, across-source coordination is not
measurable and is reported as such rather than asserted.
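The same-size null comparison can be sketched as below. This is not
scripts/validate_ctu13_coordination.py. It ASSUMES that lambda_ratio
is the leading eigenvalue of the across-source feature covariance
divided by its trace, and the function names are hypothetical. Rows
are sources and columns are features.

```python
import math
import random
from statistics import mean, pstdev

def lambda_ratio(rows, iters=100):
    """Leading eigenvalue of the feature covariance over its trace."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    trace = sum(cov[i][i] for i in range(d))
    if trace == 0.0:
        return 1.0 / d  # degenerate input: report the floor
    v = [1.0 + 0.1 * i for i in range(d)]
    for _ in range(iters):  # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    top = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)
    return top / trace

def same_size_null(background, size, draws=2000, seed=0):
    """lambda_ratio of random non-botnet groups of the same size."""
    rng = random.Random(seed)
    return sorted(lambda_ratio(rng.sample(background, size))
                  for _ in range(draws))

def null_test(candidates, background, draws=2000, seed=0):
    """Return (observed ratio, null 95th percentile, z versus null)."""
    null = same_size_null(background, len(candidates), draws, seed)
    observed = lambda_ratio(candidates)
    p95 = null[int(0.95 * (len(null) - 1))]
    z = (observed - mean(null)) / pstdev(null)
    return observed, p95, z
```

Drawing the null at the same group size is what removes the
small-sample inflation: a group of three sources has a covariance of
rank at most two, so its ratio is high whether or not the sources are
coordinated.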
HONESTY. This establishes (i) real-data detectability across
families and (ii) a real, null-controlled B1 signature where the
host count permits. It does NOT establish B2 multi-vantage
corroboration on real data: CTU-13 is a single capture point
(Section 10).
7.3. The multi-vantage advantage on measured real effect sizes
PROVENANCE. Measured on the same CTU-13 captures. Reproducibility:
scripts/validate_mvps_advantage_ctu13_real.py; receipt
evidence/mvps_advantage_ctu13_real_receipt.json
(body_sha256 5f67a31d...).
Theorem B5 (No Free Decorrelation) states that the multi-vantage
coherent statistic is spread-invariant with no compute term, so the
detection advantage GROWS with aggregation. Until now this was
shown only on the synthetic z-game. Here its INPUT -- the
per-observation effect size delta -- is measured from real botnet
flows (the held-out separation in normal-sigma units), and the
prediction is checked empirically:
Family    delta (sigma)    single-flow AUC    coherent AUC (K=16)
------    -------------    ---------------    -------------------
Neris     1.43             0.854              0.9999
Rbot      19.96            0.999              1.000
Rbot      4.21             0.963              1.000
Virut     2.76             0.939              1.000
Virut     2.32             0.927              1.000
The empirical coherent statistic (aggregating K real flows) meets or
exceeds single-flow detection for every family and approaches 1 as K
grows -- the measured form of the B5 advantage. Instantiating the
z-game with the measured delta, the number of coherent observations
needed for >= 0.99 detection is 20 (Neris), 6-8 (Virut), and 1-3
(Rbot): even the weakest real family is decisively detected by a few
coherent observations, while a single diluted vantage is not.
HONESTY. delta is empirical; the detection-rate model uses the
Gaussian z-game convention (tau, sigma = 1) of the synthetic proof.
This is a real-data INSTANTIATION of the advantage, not a B2
corroboration across independent real observers.
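The mapping from delta to the number of coherent observations can be
sketched under the Gaussian z-game convention: the coherent statistic
of K observations is taken as Normal(sqrt(K) * delta, 1) and detection
means exceeding tau. The value tau = 4.0 below is an ASSUMPTION; the
text does not state the threshold, and this value is used here only
because it reproduces the counts quoted above. The function name is
hypothetical.

```python
import math
from statistics import NormalDist

def coherent_observations_needed(delta, tau, target=0.99):
    """Smallest K with P(Normal(sqrt(K)*delta, 1) > tau) >= target."""
    z = NormalDist().inv_cdf(target)
    return max(1, math.ceil(((tau + z) / delta) ** 2))

TAU = 4.0  # assumed threshold, not taken from the document
measured = [("Neris", 1.43), ("Rbot", 19.96), ("Rbot", 4.21),
            ("Virut", 2.76), ("Virut", 2.32)]
for family, delta in measured:
    print(f"{family:5s} delta={delta:5.2f}  "
          f"K={coherent_observations_needed(delta, TAU)}")
```

The required K scales as 1/delta^2, so halving the per-flow effect
size roughly quadruples the number of coherent observations needed.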
8. Security Considerations
This document introduces no new wire format or cryptographic
primitive; transport security, authentication, replay protection,
and control-plane isolation are inherited from
[I-D.melegassi-coherence-bfd] and
[I-D.melegassi-mvps-ddos-resilience].
Misuse risk. A corroborated-evidence engine could be misused to
justify wrongful blocking. Theorem B3 is the structural mitigation:
MVPS cannot itself block, and the false-positive bound of
Theorem B2 MUST be carried with every evidence object so the
downstream control point can apply policy. Operators MUST NOT
configure automatic host quarantine on single-vantage (V=1)
evidence.
Adversarial decorrelation (A2) is bounded exactly by Theorem B5:
spreading the coordination cannot hide its coherent energy from the
multi-vantage aggregate, and the bound carries no compute term
(AI/quantum cannot lower it). Vantage corruption / blinding (A4) is
bounded by Theorem B6: silent blinding is impossible while the
redundancy rho = V - d_eff >= 1 with diverse vantages, blinding above
that is flagged ("known-blind"), and the only un-flagged corruption
is PQC-gated forgery ([FIPS204], <= 2^-lambda). Adversarial control
of a strict majority of vantages/cells remains out of scope and
inherits the Byzantine bound floor((k-1)/2) of
[I-D.melegassi-mvps-ddos-resilience]; Section 2.6 (RPKI/ROV, SAVI)
is how rho is kept >= 1 in practice.
Evidence forgery is mitigated by the inherited HMAC and monotonic
sequence numbers; an evidence object with a broken signature MUST be
discarded and MUST NOT reach a remediation process.
9. Privacy Considerations
Identifying sources as botnet members is, by construction, the
handling of data about individual endpoints, which may be
personally identifiable. This profile therefore inherits the
privacy requirements of RFC 6561 (Section 5 and its privacy
discussion) and the framework of [RFC6973].
Specifically:
o Per-source evidence MUST be access-controlled and MUST NOT be
published in raw form.
o When shared cross-organisation, evidence SHOULD carry the
minimum necessary fields (source, confidence, window) and
SHOULD follow IODEF [RFC7970] handling/marking.
o The coherence statistics MUST NOT carry user payload.
o Retention of per-source evidence SHOULD be bounded to the
remediation window plus an audit period defined by operator
policy.
10. Operational and Validation Considerations
This document is Experimental. Sections 7.2 and 7.3 already take the
first reproduction step on REAL labelled traffic (CTU-13, three
families): real-data detectability (AUC 0.85-0.999) and a
null-controlled B1 signature where the host count permits. The
following remain REQUIRED before any progression or any
non-experimental claim:
o Close Theorem B2 on real data: corroborate the SAME event
across THREE OR MORE INDEPENDENT real vantages. CTU-13 is a
single capture point, so it demonstrates detection and the B1
coordination signature but NOT multi-vantage corroboration.
Suitable sources include a network telescope/darknet, multi-
provider IPFIX [RFC7011] export, or an operator's own labelled
incident observed from >= 3 collectors.
o Reproduce the B1 across-source signature on a capture with MANY
simultaneously-infected hosts (CTU-13 lab scenarios have few),
so the leading-eigenvalue test has high statistical power
across families, not only the ~3 sigma seen on Neris.
o Calibrate the per-vantage false-positive rate p and the
vantage correlation rho (the Theorem B2 correlation, not the
Theorem B6 redundancy rho = V - d_eff) on that data; Theorem B2
is only as good as the measured (p, rho).
o Confirm that the low-rank premise of Section 4 holds for real
coordinated populations and does NOT spuriously hold for
large flash-crowd legitimate events (the principal expected
false-positive source). On synthetic ground truth the botnet
is perfectly separated (ROC AUC 1.0, Section 7.1); the flash
crowd's 0.115 corroborated false positive is closed by the
falsifiability axis (Theorem B4) ON SYNTHETIC DATA, but the
feature partition H/M and the machine-regularity thresholds
MUST be calibrated and the closure REPRODUCED on operational
traces. Whether a real adversary can match human-crowd
statistics on the machine-regularity subspace M (defeating B4)
is the open adversarial question.
o Where vantages may be compromised, deploy the geometric-median
aggregation and SUSPECTED_BYZANTINE attribution of
[I-D.melegassi-mvps-ai-coherence] (Section 5.5 of this document) and
measure the realised breakdown fraction.
o Verify the BCP 38 tagging path end to end, since host-level
remediation of a spoofed source is both useless and harmful.
Manageability: implementations SHOULD expose counters for
sources_identified, mean_V_at_admission, estimated_P_fp,
spoof_suspect_count, and evidence_objects_emitted.
11. IANA Considerations
All packet formats, TLVs, and code points are inherited from
[I-D.melegassi-coherence-bfd] and
[I-D.melegassi-mvps-ddos-resilience]; this document requests no new
packet-format, TLV, or code-point assignments.
This document requests, upon adoption, registration of the YANG
module "catellix-mvps-botnet" (Section 6.4) in the "YANG Module
Names" registry, with a namespace URI of the form
urn:ietf:params:xml:ns:yang:mvps-botnet. Pending that assignment the
module is non-normative, consistent with the export module of
[I-D.melegassi-opsawg-mvps-telemetry-export].
12. References
12.1. Normative References
[I-D.melegassi-ippm-mvps-bundle]
Melegassi, L., "Multi-Vantage Path Synchrony Bundle
Envelope and Vector Algebra",
draft-melegassi-ippm-mvps-bundle-00, May 2026.
[I-D.melegassi-coherence-bfd]
Melegassi, L., "Coherence-BFD: Sub-Tick Coherence
Detection over BFD Mechanisms",
draft-melegassi-coherence-bfd-00, May 2026.
[I-D.melegassi-mvps-incremental-be]
Melegassi, L., "Incremental Bandwidth-Efficient
Multi-Vantage Path Synchrony (BE-MVPS): Cell-Partitioned
Coherence with epsilon-Gated Sherman-Morrison Updates",
draft-melegassi-mvps-incremental-be-00, May 2026.
[I-D.melegassi-mvps-ddos-resilience]
Melegassi, L., "Volume-Independent DDoS Detection via
Coherence-BFD: The MVPS DDoS Resilience Profile",
draft-melegassi-mvps-ddos-resilience-00, May 2026.
[I-D.melegassi-mvps-ai-coherence]
Melegassi, L., "MVPS AI-Coherence Extension: Semantic,
Byzantine, and Infrastructure-Cognitive Coherence for
AI-Serving Network Deployments",
draft-melegassi-mvps-ai-coherence-01, May 2026.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering:
Defeating Denial of Service Attacks which employ IP
Source Address Spoofing", BCP 38, RFC 2827, May 2000.
[RFC3704] Baker, F. and P. Savola, "Ingress Filtering for
Multihomed Networks", BCP 84, RFC 3704, March 2004.
[RFC6561] Livingood, J., Mody, N., and M. O'Reirdan,
"Recommendations for the Remediation of Bots in ISP
Networks", RFC 6561, March 2012.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2119 Key Words", BCP 14, RFC 8174, May 2017.
12.2. Informative References
[RFC4732] Handley, M., Ed., Rescorla, E., Ed., and IAB, "Internet
Denial-of-Service Considerations", RFC 4732,
December 2006.
[RFC5103] Trammell, B. and E. Boschi, "Bidirectional Flow Export
Using IP Flow Information Export (IPFIX)", RFC 5103,
January 2008.
[RFC6480] Lepinski, M. and S. Kent, "An Infrastructure to Support
Secure Internet Routing", RFC 6480, February 2012.
[RFC6545] Moriarty, K., "Real-time Inter-network Defense (RID)",
RFC 6545, April 2012.
[RFC6811] Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R.
Austein, "BGP Prefix Origin Validation", RFC 6811,
January 2013.
[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
"Specification of the IP Flow Information Export (IPFIX)
Protocol for the Exchange of Flow Information", STD 77,
RFC 7011, September 2013.
[RFC7012] Claise, B., Ed. and B. Trammell, Ed., "Information Model
for IP Flow Information Export (IPFIX)", RFC 7012,
September 2013.
[RFC7039] Wu, J., Bi, J., Bagnulo, M., Baker, F., and C. Vogt, Ed.,
"Source Address Validation Improvement (SAVI) Framework",
RFC 7039, October 2013.
[RFC8210] Bush, R. and R. Austein, "The Resource Public Key
Infrastructure (RPKI) to Router Protocol, Version 1",
RFC 8210, September 2017.
[FIPS204] National Institute of Standards and Technology,
"Module-Lattice-Based Digital Signature Standard
(ML-DSA)", FIPS 204, August 2024.
[RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
Morris, J., Hansen, M., and R. Smith, "Privacy
Considerations for Internet Protocols", RFC 6973,
July 2013.
[RFC7970] Danyliw, R., "The Incident Object Description Exchange
Format Version 2", RFC 7970, November 2016.
[RFC8520] Lear, E., Droms, R., and D. Romascanu, "Manufacturer
Usage Description Specification", RFC 8520, March 2019.
[RFC8727] Takahashi, T., Suzuki, M., and R. Danyliw, "JSON
Binding of the Incident Object Description Exchange
Format", RFC 8727, August 2020.
[RFC8783] Boucadair, M., Ed. and T. Reddy.K, Ed., "Distributed
Denial-of-Service Open Threat Signaling (DOTS) Data
Channel Specification", RFC 8783, May 2020.
[RFC8811] Mortensen, A., Ed., Reddy.K, T., Ed., Andreasen, F.,
Teague, N., and R. Compton, "DDoS Open Threat Signaling
(DOTS) Architecture", RFC 8811, August 2020.
[RFC9132] Boucadair, M., Ed., Shallow, J., and T. Reddy.K,
"Distributed Denial-of-Service Open Threat Signaling
(DOTS) Signal Channel Specification", RFC 9132,
September 2021.
[RFC9244] Boucadair, M., Ed., Reddy.K, T., Ed., Doron, E., Chen,
M., and J. Shallow, "Distributed Denial-of-Service Open
Threat Signaling (DOTS) Telemetry", RFC 9244, June 2022.
[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG
Notifications for Datastore Updates", RFC 8641,
September 2019.
[RFC8785] Rundgren, A., Jordan, B., and S. Erdtman, "JSON
Canonicalization Scheme (JCS)", RFC 8785, June 2020.
[RFC9424] Paine, K., Whitehouse, O., Sellwood, J., and A. Shaw,
"Indicators of Compromise (IoCs) and Their Role in
Attack Defence", RFC 9424, August 2023.
[I-D.melegassi-opsawg-mvps-telemetry-export]
Melegassi, L., "Exporting MVPS Coherence Events over
Standard Telemetry Channels (syslog, IPFIX, YANG-Push)",
draft-melegassi-opsawg-mvps-telemetry-export-00, May 2026.
[I-D.melegassi-opsawg-mvps-yang-model]
Melegassi, L., "A YANG Data Model for Multi-Vantage Path
Snapshots (MVPS)",
draft-melegassi-opsawg-mvps-yang-model-00, May 2026.
[I-D.melegassi-mvps-perfsec-coupling]
Melegassi, L., "MVPS Performance-Security Coupling
Profile", draft-melegassi-mvps-perfsec-coupling-00,
May 2026.
[I-D.melegassi-santos-ippm-mvps-cwt]
Melegassi, L. and Santos, "MVPS Trust Profile: Coherent-
Witness Trust (CWT)",
draft-melegassi-santos-ippm-mvps-cwt-00, May 2026.
Appendix A. Reproducibility (validators, simulations, receipts)
Every claim in this document is an algebraic identity checked by a
validator, a labelled simulation, or a measurement on labelled real
traffic (CTU-13), each with a signed receipt.
Theorem checks (closed form), B1-B6:
scripts/validate_botnet_coherence.py 19/19 PASS, exit 0
evidence/botnet_coherence_receipt.json body_sha256 c1a2c31a...
Labelled ground-truth detection experiment (incl. the B4
falsifiability-axis resolution of the flash-crowd residual):
scripts/simulate_botnet_coherence.py
evidence/botnet_coherence_sim_receipt.json
body_sha256 460ccb48...
docs/SIM_BOTNET_RESULTS.txt
Adversarial red-team, Monte-Carlo companion to B5/B6
(spreading, blinding, Byzantine centroid -- all defended):
scripts/simulate_botnet_redteam.py
evidence/botnet_redteam_receipt.json
body_sha256 d9a59fd8...
docs/SIM_BOTNET_REDTEAM_RESULTS.txt
Canonical export:
schema/catellix-mvps-botnet.yang (notification model)
schema/mvps-botnet-evidence.schema.json (JSON Schema 2020-12)
evidence/botnet_evidence_example.json (worked instance)
Real labelled botnet traffic (CTU-13, Stratosphere IPS Lab;
Section 7.2) -- collection, detection, and B1 with same-size null
(scenarios 9 Neris; 4,11 Rbot; 5,13 Virut):
scripts/collect_ctu13_botnet.py
evidence/ctu13_raw/ctu13_s*_meta.json (source URL + stream SHA)
scripts/validate_ctu13_coordination.py
evidence/ctu13_coordination_receipt_s9.json a046e86a...
evidence/ctu13_coordination_receipt_s11.json e26fb4f3...
evidence/ctu13_coordination_receipt_s4.json 57b9b36c...
evidence/ctu13_coordination_receipt_s5.json 94bc9038...
evidence/ctu13_coordination_receipt_s13.json 49a5fe3d...
evidence/ctu13_coordination_combined_receipt.json
body_sha256 287f8bef...
Multi-vantage advantage (Theorem B5) instantiated on measured
real effect sizes (Section 7.3):
scripts/validate_mvps_advantage_ctu13_real.py
evidence/mvps_advantage_ctu13_real_receipt.json
body_sha256 5f67a31d...
Honest negative (free public threat feeds observe DISJOINT
populations and so cannot, alone, corroborate B2 on real data --
the basis for the Section 10 requirement):
scripts/collect_threat_feeds.py
scripts/validate_real_botnet_coherence.py
evidence/real_botnet_coherence_receipt.json
body_sha256 455bc967...
The CTU-13 dataset is the labelled botnet corpus of Garcia et al.,
"An empirical comparison of botnet detection methods", Computers &
Security, 2014, distributed by the Stratosphere IPS Laboratory at
https://www.stratosphereips.org/datasets-ctu13. The collector
records the exact per-capture source URL and a streaming SHA-256 in
evidence/ctu13_raw/*_meta.json so the reduction is auditable; raw
per-source evidence is kept private per Section 9 and is NOT part of
any public page.
The body_sha256 of each receipt is computed over the JCS [RFC8785]
serialization of the receipt body BEFORE any environment-specific
field is attached, so it reproduces bit-for-bit on any machine.
Appendix B. Implementation and Deployment Guidance
This appendix is informative. It describes a minimal, standards-
based way to implement the profile; it adds no normative requirement
beyond the body of the document.
B.1. Components
o Vantages (>= V, diverse): existing flow exporters. Each emits
IPFIX [RFC7011] records (bidirectional biflows [RFC5103] are
preferred) and signs its telemetry with a post-quantum identity
(ML-DSA, [FIPS204]) so the non-blinding gate of Theorem B6 holds.
o Broker: collects per-vantage records, computes the coherence and
coordination statistics, applies corroboration, and emits evidence
objects (Section 6.1). The broker NEVER actuates (Theorem B3).
o Control points: the operator's existing RFC 6561 process, DOTS
[RFC9132]/[RFC9244] server, MUD [RFC8520] enforcement point, and
BCP 38 ingress filters. These -- not the broker -- remediate.
B.2. Per-window computation (the broker)
For each observation window of width (M-1)*T_tick:
1. Ingest IPFIX records per vantage; the features are the Information
Elements [RFC7012] of Section 7.2 (octets, packets, duration,
source/total byte ratio, rate, protocol, destination port).
2. Per candidate source, form its feature vector and score it on each
vantage; a per-vantage flag uses the calibrated per-vantage
false-positive rate p (Section 7.1, p = 0.05 in the reference run).
3. Coordination test (Theorem B1): assemble the across-source matrix
and compute the leading-eigenvalue ratio lambda_ratio. Compare it
to a SAME-SIZE null (equally many non-candidate sources, >= 1000
draws); admit the coordination signal only if it exceeds the null
95th percentile. This null control is mandatory because
lambda_ratio inflates for small source counts (Section 7.2).
4. Corroboration (Theorem B2): admit a source only if V_required of V
independent vantages agree (V_required = 3 in the reference run).
Carry the estimated P_fp and the measured vantage correlation rho
(distinct from the redundancy rho = V - d_eff of step 6).
5. Falsifiability re-test (Theorem B4): for sources that look
coordinated, re-run step 3 on the machine-regularity subspace M. A
flash crowd collapses to the independent floor and is dropped; a bot
fleet survives.
6. Integrity (Theorem B6): compute the redundancy rho = V - d_eff
(distinct from the vantage correlation rho of step 4). If the
redundancy is below 1, raise a "known-blind" alarm rather than
emitting results silently. Verify each vantage's [FIPS204] signature
and discard forged or unsigned reports.
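As an illustration of step 3 only, the following Python sketch
computes a leading-eigenvalue ratio and compares it to a same-size
null. It is not the reference implementation. It ASSUMES that each
source is represented by a per-tick time series over the window, that
the across-source matrix is the correlation matrix of those series,
and that lambda_ratio is the leading eigenvalue divided by the trace;
the normative definitions are those of Theorem B1 and Section 7.2.
The function names and the "pool" of non-candidate sources are
hypothetical.

```python
import math
import random


def _standardize(series):
    """Return the series with zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0.0:
        return [0.0] * n  # a constant series carries no co-variation
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in series]


def lambda_ratio(rows, iters=100):
    """ASSUMED statistic: leading eigenvalue / trace of the
    across-source correlation matrix, via power iteration."""
    z = [_standardize(r) for r in rows]
    n, m = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / m
             for j in range(n)] for i in range(n)]
    trace = sum(corr[i][i] for i in range(n))
    if trace == 0.0:
        return 0.0

    def apply(v):
        return [sum(corr[i][j] * v[j] for j in range(n))
                for i in range(n)]

    # Slightly non-uniform start vector, so that it is unlikely to be
    # orthogonal to the leading eigenvector.
    v = [1.0 + 0.01 * i for i in range(n)]
    for _ in range(iters):
        w = apply(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) iterate.
    lam = sum(a * b for a, b in zip(v, apply(v)))
    return lam / trace


def coordination_admitted(candidates, pool, draws=1000, q=0.95, seed=0):
    """Admit the coordination signal only if the observed ratio
    exceeds the q-th percentile of a same-size null drawn from
    non-candidate sources (nearest-rank percentile)."""
    rng = random.Random(seed)
    observed = lambda_ratio(candidates)
    null = sorted(lambda_ratio(rng.sample(pool, len(candidates)))
                  for _ in range(draws))
    threshold = null[min(draws - 1, math.ceil(q * draws) - 1)]
    return observed > threshold, observed, threshold
```

Drawing the null at the same source count as the candidate set is
what keeps the small-count inflation of the ratio from being mistaken
for coordination: both sides of the comparison inflate together.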
B.3. Calibration
The bounds are only as good as their measured inputs. Before
production, calibrate on local data: the per-vantage p, the vantage
correlation rho (Theorem B2 degrades from p^V toward the stated
mixture bound as rho rises), the H/M feature partition and the
machine-regularity thresholds of Theorem B4, and the detection
threshold tau. Section 7.3 shows how the per-flow effect size delta
maps to the number of coherent observations needed for a target
detection probability.
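To see how the calibrated p and the corroboration threshold interact,
the following Python sketch evaluates the false-positive probability
in the fully independent case (rho = 0) only. It ASSUMES that each
vantage flags a benign source independently with probability p, and
it generalises the p^V floor to "at least V_required of V" with a
binomial tail; that generalisation and the function name are
illustrative. The correlated-vantage bound of Theorem B2 is not
reproduced here.

```python
from math import comb


def independent_fp(p, v_required, v_total):
    """Probability that at least v_required of v_total vantages flag
    a benign source, ASSUMING independent vantages (rho = 0)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if not 0 <= v_required <= v_total:
        raise ValueError("need 0 <= v_required <= v_total")
    return sum(comb(v_total, k) * p ** k * (1.0 - p) ** (v_total - k)
               for k in range(v_required, v_total + 1))
```

With the reference-run values p = 0.05 and V_required = 3, calling
independent_fp(0.05, 3, 3) gives p^3, about 1.25e-4. Raising V to 5
while keeping V_required = 3 raises the independent-case figure to
about 1.16e-3, because more subsets of vantages can agree by chance.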
B.4. Export and hand-off
Emit each admitted source as the Section 6.1 evidence object: either
the YANG notification "mvps-botnet-evidence" via YANG-Push [RFC8641]
or the equivalent JSON, whose stable evidence_id is the SHA-256 digest
of its JCS [RFC8785] canonical form. The JSON is embeddable as an IoC
[RFC9424] in an IODEF [RFC7970] document (JSON binding [RFC8727]).
Route per Section 6.2/6.3: host-actionable and
BCP 38-validated sources to the RFC 6561 process; active floods to a
DOTS [RFC9132] request with coordination metrics pre-staged as DOTS
telemetry [RFC9244]; constrained devices to their MUD [RFC8520]
enforcement point; spoof-suspect sources to traffic-level mitigation
and BCP 84 follow-up, NEVER to host remediation.
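The hand-off can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the Section 6.1 schema.
The field names (host_actionable, bcp38_validated, active_flood,
constrained_device, spoof_suspect) and the route labels are
hypothetical. The canonicalisation only APPROXIMATES JCS: it sorts
keys and removes whitespace, which matches [RFC8785] for objects
whose keys are ASCII and whose values are strings, booleans, and
small integers, but it does not implement the RFC 8785 number
serialisation or UTF-16 key ordering. Excluding the evidence_id field
from its own digest is likewise an assumption.

```python
import hashlib
import json


def evidence_id(evidence):
    """SHA-256 hex digest over an APPROXIMATE JCS form of the object
    (sorted keys, no insignificant whitespace, UTF-8)."""
    body = {k: v for k, v in evidence.items() if k != "evidence_id"}
    canonical = json.dumps(body, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def route(evidence):
    """Return hypothetical hand-off labels for one admitted source."""
    # Spoof-suspect sources are checked first so that they can never
    # reach host remediation, whatever other flags are set.
    if evidence.get("spoof_suspect"):
        return ["traffic-mitigation", "bcp84-follow-up"]
    routes = []
    if (evidence.get("host_actionable")
            and evidence.get("bcp38_validated")):
        routes.append("rfc6561-remediation")
    if evidence.get("active_flood"):
        routes.append("dots-request")
    if evidence.get("constrained_device"):
        routes.append("mud-enforcement")
    return routes
```

Testing the spoof-suspect flag before any other condition makes the
"NEVER to host remediation" rule a structural property of the
function rather than something each later branch has to remember.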
B.5. Manageability
Expose the counters of Section 10 (sources_identified,
mean_V_at_admission, estimated_P_fp, spoof_suspect_count,
evidence_objects_emitted) plus the redundancy rho and the
known-blind alarm state, so operators can see when the corroboration
guarantee is in force and when it is not.
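One possible shape for this state is sketched below in Python. The
counter names are those listed above; the class name, the field
types, and the representation of V and d_eff as plain numbers are
assumptions made for illustration. The known-blind condition follows
step 6 of B.2 (redundancy below 1).

```python
from dataclasses import dataclass


@dataclass
class ProfileCounters:
    """Hypothetical container for the exposed counters."""
    sources_identified: int = 0
    mean_V_at_admission: float = 0.0
    estimated_P_fp: float = 0.0
    spoof_suspect_count: int = 0
    evidence_objects_emitted: int = 0
    V: int = 0          # number of vantages in use
    d_eff: float = 0.0  # ASSUMED to be a numeric effective dimension

    @property
    def redundancy(self):
        """Redundancy rho = V - d_eff (Theorem B6)."""
        return self.V - self.d_eff

    @property
    def known_blind(self):
        """True when the corroboration guarantee is NOT in force."""
        return self.redundancy < 1
```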
Acknowledgements
This profile was prompted by the observation that the MVPS DDoS
profile attributes an attack to a region but stops short of
identifying participating sources, and by the recognition that
RFC 6561 already names multi-point corroboration -- with false-
positive avoidance -- as the hard part of bot remediation. The
author thanks the IETF OPSEC, DOTS, and MILE communities for the
standards this document is careful to build on rather than
duplicate.
Author's Address
Leonardo Melegassi
Catellix
Andradina, SP
Brazil
Email: melegassi@catellix.com
URI: https://catellix.com/