
ML-KEM & BIKE Pass Deep Learning IND-CPA Security Test

New arXiv research uses DNN distinguishers to empirically validate ML-KEM, BIKE, and HQC hybrid KEMs. What the results mean for your PQC migration.

BeQuantum Intelligence · 8 min read

Last updated: 2025

Key Takeaways

  • Researchers applied deep neural network (DNN) distinguishers to ML-KEM, BIKE, and HQC — the three NIST-selected or candidate post-quantum KEMs — and found zero exploitable ciphertext patterns at a significance level of α = 0.01
  • Hybrid KEM constructions combining PQC primitives with RSA, RSA-OAEP, and even plaintext all preserved IND-CPA indistinguishability, provided at least one component was IND-CPA-secure
  • For your security team, this means empirical DNN-based validation can now complement formal proofs during PQC migration — closing the gap between theoretical guarantees and real-world implementation correctness

[IMAGE: Macro photograph of a quantum processor chip with entangled cyan light beams refracting through crystalline lattice structures]

The Implementation Gap That Keeps CISOs Awake

Your organization’s PQC migration plan probably references NIST FIPS 203 (ML-KEM) and points to formal security proofs as the foundation of trust. Those proofs are necessary — but they validate the mathematical construction, not your specific implementation running on your specific hardware stack with your specific hybrid configuration.

Consider this scenario: your team deploys a hybrid KEM that wraps ML-KEM around a legacy RSA-OAEP layer for backward compatibility during a phased migration. The formal proof says this should be secure. But has anyone run an adversarial distinguisher against the actual ciphertext output of that composition? If an attacker can train a classifier to distinguish your ciphertexts from random noise — even with marginal statistical advantage — your IND-CPA assumption breaks, and with it, the confidentiality guarantee your board signed off on.

This is precisely the gap that researchers behind arXiv:2604.06942 set out to close.

What the Research Actually Did

The paper — “Evaluating PQC KEMs, Combiners, and Cascade Encryption via Adaptive IND-CPA Testing Using Deep Learning” (arXiv:2604.06942) — reframes the classical IND-CPA security game as a binary classification problem. Instead of a theoretical adversary, the researchers deploy a deep neural network trained on labeled ciphertext data using Binary Cross-Entropy (BCE) loss. If the DNN learns to distinguish the ciphertext distributions with better-than-chance accuracy, the scheme fails the empirical test.

The scope covers three distinct evaluation targets:

1. Standalone PQC KEMs — the PKE schemes underlying ML-KEM, BIKE, and HQC

2. Hybrid KEM combinations — PQC KEMs composed with plain RSA, RSA-OAEP, and plaintext

3. Cascade symmetric encryption — sequential combinations of AES-CTR, AES-CBC, AES-ECB, ChaCha20, and DES-ECB
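The cascade constructions in item 3 apply ciphers sequentially, each layer encrypting the previous layer's output. The toy sketch below illustrates the composition pattern using stdlib-only SHAKE-128 keystreams standing in for AES-CTR/ChaCha20 — this is my own illustration of the cascade idea, not the paper's implementation, and not a production cipher:

```python
import hashlib
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data against a SHAKE-128 keystream.
    Stands in for AES-CTR/ChaCha20 here; illustrative only."""
    stream = hashlib.shake_128(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def cascade_encrypt(keys, nonces, plaintext: bytes) -> bytes:
    """Cascade (sequential) encryption: layer i encrypts layer i-1's output."""
    ct = plaintext
    for key, nonce in zip(keys, nonces):
        ct = keystream_xor(key, nonce, ct)
    return ct

keys = [os.urandom(16), os.urandom(16)]
nonces = [os.urandom(12), os.urandom(12)]
msg = b"phased PQC migration"
ct = cascade_encrypt(keys, nonces, msg)
# XOR keystream layers are involutions, so applying them again decrypts
pt = cascade_encrypt(list(reversed(keys)), list(reversed(nonces)), ct)
assert pt == msg
```

The security intuition mirrors the hybrid-KEM result below the table: a cascade is expected to be at least as strong as its strongest layer, which is why even combinations including DES-ECB showed no distinguisher advantage.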

The methodology is described by the authors as “adaptive, practical, and versatile” — designed to function as an empirical estimator for indistinguishability that complements, rather than replaces, analytical security analysis.

“No algorithm or combination of algorithms demonstrates a significant advantage (two-sided binomial test, significance level α = 0.01), consistent with theoretical guarantees that hybrids including at least one IND-CPA-secure component preserve indistinguishability, and with the absence of exploitable patterns under the considered DNN adversary model.” — arXiv:2604.06942

Technical Deep-Dive: How DNN Distinguishers Work Against KEMs

Modeling the IND-CPA Game as Classification

The IND-CPA game has a precise structure: a challenger encrypts one of two adversary-chosen messages, and the adversary must guess which. The researchers map this directly onto supervised learning: the DNN receives ciphertext samples labeled by which message was encrypted and attempts to learn a decision boundary. A network that converges to ~50% accuracy on held-out data is consistent with indistinguishability — the adversary gains no measurable advantage in telling the two ciphertext distributions apart.
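The mapping can be sketched concretely. The generator below (my own illustration, not the paper's code; a one-time pad stands in for a real KEM's encryption) produces the labeled (ciphertext, bit) pairs a distinguisher would train on:

```python
import os
import secrets

def ind_cpa_samples(encrypt, m0: bytes, m1: bytes, n: int):
    """Generate labeled (ciphertext, bit) pairs for the
    IND-CPA-as-classification view: the challenger picks a hidden bit,
    encrypts the corresponding message with fresh randomness, and the
    classifier must predict the bit from the ciphertext alone."""
    samples = []
    for _ in range(n):
        b = secrets.randbelow(2)           # challenger's hidden bit
        ct = encrypt(m1 if b else m0)      # fresh randomness per query
        samples.append((ct, b))
    return samples

def otp_encrypt(m: bytes) -> bytes:
    """Toy randomized 'encryption': XOR with a fresh one-time pad
    (perfectly hiding, so no classifier can beat chance)."""
    pad = os.urandom(len(m))
    return bytes(a ^ b for a, b in zip(m, pad))

data = ind_cpa_samples(otp_encrypt, b"msg-zero", b"msg-one!", 1000)
# Any classifier trained on `data` should converge to ~50% held-out accuracy.
```

Swapping `otp_encrypt` for a real KEM encapsulation routine yields the dataset the DNN distinguisher actually consumes.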

The two-sided binomial test at α = 0.01 provides the statistical threshold. This is a conservative significance level: it requires strong evidence of distinguishability before flagging a failure, reducing false positives while maintaining sensitivity to real weaknesses.
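Under this test, the minimum held-out accuracy that counts as a "significant advantage" follows directly from the binomial distribution. A stdlib-only sketch (the paper does not disclose its sample sizes, so n = 10,000 below is an assumed example):

```python
from math import exp, lgamma, log

def log_binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Log of the binomial PMF, computed in log-space to avoid underflow
    for large n."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def min_accuracy_to_flag(n: int, alpha: float = 0.01) -> float:
    """Smallest held-out accuracy that a two-sided binomial test at level
    alpha flags as a significant advantage. In the symmetric p = 0.5 case,
    the two-sided p-value for k > n/2 is twice the upper tail P(X >= k)."""
    pmf = [exp(log_binom_pmf(k, n)) for k in range(n + 1)]
    tail, best = 0.0, None
    for k in range(n, n // 2, -1):
        tail += pmf[k]           # running upper tail P(X >= k)
        if 2 * tail < alpha:
            best = k             # still significant; try a smaller count
        else:
            break
    return best / n if best else 1.0

# Assumed example: with 10,000 held-out samples, an accuracy of roughly
# 51.3% already rejects "no advantage" at alpha = 0.01.
threshold = min_accuracy_to_flag(10_000)
```

This is why α = 0.01 is conservative in practice: with small test sets the accuracy bar sits well above 50%, so only a clearly learnable pattern trips the test.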

Results by Algorithm Category

| Scheme / Combination | Type | DNN Distinguisher Result | IND-CPA Status |
| --- | --- | --- | --- |
| ML-KEM (underlying PKE) | PQC KEM | No significant advantage | ✅ Passes |
| BIKE (underlying PKE) | PQC KEM | No significant advantage | ✅ Passes |
| HQC (underlying PKE) | PQC KEM | No significant advantage | ✅ Passes |
| ML-KEM + RSA-OAEP (hybrid) | Hybrid KEM | No significant advantage | ✅ Passes |
| ML-KEM + plain RSA (hybrid) | Hybrid KEM | No significant advantage | ✅ Passes |
| AES-CTR + AES-CBC (cascade) | Symmetric | No significant advantage | ✅ Passes |
| AES-ECB + ChaCha20 (cascade) | Symmetric | No significant advantage | ✅ Passes |
| DES-ECB combinations | Symmetric | No significant advantage | ✅ Passes |

The result holds even for hybrid constructions that include RSA without OAEP padding — a configuration that would concern most security architects. The theoretical explanation is consistent with the data: as long as one component of the hybrid is IND-CPA-secure, the composition preserves indistinguishability.
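This "one secure component suffices" property is what KEM combiners formalize: derive the session key from both components' shared secrets so that the output stays pseudorandom as long as either input is. A hedged sketch of a concatenate-then-KDF combiner in the style of RFC 5869 HKDF — an illustrative construction, not the combiner used in the paper:

```python
import hashlib
import hmac

def combine_shared_secrets(ss_pqc: bytes, ss_classical: bytes,
                           ct_pqc: bytes, ct_classical: bytes) -> bytes:
    """Concatenate-then-KDF combiner (illustrative): derive one 32-byte
    session key from both shared secrets, binding in both ciphertexts.
    If either input secret is indistinguishable from random, the output
    is too, under standard KDF assumptions."""
    ikm = ss_pqc + ss_classical
    info = b"hybrid-kem-combiner-v1" + ct_pqc + ct_classical
    # HKDF-Extract with an all-zero salt, then a single HKDF-Expand block
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()

key = combine_shared_secrets(b"A" * 32, b"B" * 32, b"ct1", b"ct2")
assert len(key) == 32
```

Binding the ciphertexts into the `info` input is a standard hardening choice: it ties the derived key to the specific key-exchange transcript, not just the raw secrets.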

What the Methodology Cannot Yet Tell You

The research has documented gaps that matter for enterprise deployment decisions:

  • No DNN architecture specifics: Layer count, parameter count, and training dataset volume are not disclosed, making reproducibility difficult
  • No classical baseline: Results are not benchmarked against NIST randomness test suites (SP 800-22), so relative sensitivity is unknown
  • No adversarial implementation testing: The methodology was not applied to side-channel-leaking or adversarially crafted implementations — the scenario most relevant to real-world attacks
  • No false positive/negative rates: Without these, calibrating the test’s sensitivity for your own audit pipeline requires additional empirical work
  • Quantum adversary model excluded: The DNN adversary is classical; the framework does not address distinguishability under quantum query models

These gaps do not invalidate the findings — they define the boundaries of what this empirical tool can currently certify. Organizations should treat DNN-based distinguisher results as one layer of a defense-in-depth validation stack, not a standalone security certification.

Industry Context: Where PQC Validation Stands Right Now

Regulatory Timeline Pressure

NIST finalized FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), and FIPS 205 (SLH-DSA) in August 2024. NIST's draft transition guidance (IR 8547) calls for deprecating RSA and ECC key establishment by 2030, with full disallowance targeted for 2035 in federal systems. CISA's post-quantum roadmap echoes this timeline and explicitly requires agencies to inventory cryptographic assets and begin migration planning immediately.

For enterprises in regulated sectors — financial services, healthcare, critical infrastructure — the compliance burden is not abstract. Auditors will ask not just which algorithms you use, but how you validated your implementations. Empirical testing frameworks like the one in arXiv:2604.06942 provide exactly the kind of documented, reproducible evidence that compliance narratives require.

Who Is Moving and Who Is Lagging

Google announced TLS 1.3 hybrid key exchange using X25519Kyber768 in Chrome in 2023 and has since expanded deployment. Cloudflare reported post-quantum key exchange active on a significant portion of its network traffic. Signal updated its protocol to use PQXDH (combining X25519 with CRYSTALS-Kyber) in September 2023.

The laggards are predominantly mid-market enterprises and organizations with large legacy PKI footprints — exactly the organizations most likely to deploy hybrid KEMs during a phased migration and most in need of empirical validation tooling.

The Cost of Skipping Empirical Validation

Formal proofs guarantee security of the mathematical construction. They do not guarantee security of your compiled binary, your TLS library version, your HSM firmware, or your hybrid composition logic. Implementation flaws in cryptographic code have historically been the primary attack surface — not breaks in the underlying mathematics. The ROBOT attack (2017) exploited a 19-year-old RSA padding oracle in implementations that were theoretically sound. Empirical testing catches this class of vulnerability.

The BeQuantum Perspective: Bridging Proof and Practice

At BeQuantum, we observe a consistent pattern across enterprise PQC migrations: organizations trust the NIST algorithm selections — correctly — but underinvest in validating how those algorithms behave in their specific deployment context. The arXiv:2604.06942 methodology formalizes what our own Digital Notary and PQC Layer implementations have long required: a data-driven checkpoint between theoretical security and production deployment.

Our PQC Layer, for example, supports hybrid KEM configurations that combine ML-KEM with legacy RSA-OAEP for organizations that cannot execute a hard cutover. The research confirms that this class of hybrid preserves IND-CPA security — but our validation pipeline goes further, running implementation-level ciphertext analysis against the specific library versions and hardware configurations in each customer environment. The DNN distinguisher framework described in arXiv:2604.06942 represents the kind of tool that belongs in that pipeline as a standardized component.

For organizations using IceCase hardware security modules, the implication is direct: empirical distinguisher testing should run against ciphertext output from the HSM itself, not just from software reference implementations. Side-channel behavior at the hardware layer can introduce statistical artifacts that software-level proofs cannot anticipate.

What Your Team Should Do in the Next 90 Days

Step 1: Inventory your hybrid KEM configurations (Days 1–30)

Audit every TLS endpoint, VPN gateway, and key exchange protocol in your environment. Identify which are running hybrid constructions — PQC combined with RSA or ECDH — and document the specific library versions and parameter sets. This inventory is the prerequisite for any empirical validation effort and is also required for NIST compliance documentation.

Step 2: Establish a ciphertext indistinguishability baseline (Days 31–60)

Using the framework described in arXiv:2604.06942 as a reference, work with your cryptographic engineering team or a qualified third party to run DNN-based distinguisher tests against ciphertext samples from your actual implementations. Prioritize hybrid KEM configurations and any cascade encryption layers. Document results at the α = 0.01 significance threshold to align with the research methodology.

Step 3: Integrate empirical testing into your cryptographic CI/CD pipeline (Days 61–90)

One-time testing is insufficient — library updates, configuration changes, and hardware firmware upgrades can all alter ciphertext statistical properties. Build distinguisher testing into your deployment pipeline so that any change to a cryptographic component triggers an automated empirical validation run before promotion to production.
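A pipeline gate of this kind can be sketched as follows — a hypothetical hook, not a reference implementation; the accuracy numbers would come from your own distinguisher harness, and the normal approximation below should be swapped for an exact binomial test at small sample counts:

```python
import sys
from math import erf, sqrt

def distinguisher_gate(correct: int, total: int, alpha: float = 0.01) -> bool:
    """CI/CD gate: return False (block the build) if held-out distinguisher
    accuracy is statistically better than chance, using a normal
    approximation to the two-sided binomial test at level alpha."""
    z = (correct - total / 2) / sqrt(total / 4)
    p_two_sided = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_two_sided >= alpha  # True -> no significant advantage

# Hypothetical pipeline hook: 50.4% held-out accuracy on 10,000 samples
correct, total = 5040, 10_000
if not distinguisher_gate(correct, total):
    sys.exit("distinguisher advantage detected; blocking promotion")
print("PASS: no significant distinguisher advantage")
```

Wiring this check into the promotion stage means a library bump that subtly changes ciphertext statistics fails loudly in CI rather than silently in production.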

FAQ

Q: Does passing a DNN distinguisher test mean a PQC implementation is fully secure?

A: No. DNN-based IND-CPA testing validates ciphertext indistinguishability under a classical, passive adversary model — it does not cover side-channel attacks, active chosen-ciphertext attacks (IND-CCA2), or quantum adversaries. A passing result is a necessary but not sufficient condition for production security. Pair it with formal verification, penetration testing, and hardware-level side-channel analysis.

Q: If ML-KEM already has a formal security proof, why run empirical tests at all?

A: Formal proofs cover the mathematical construction. Your deployment runs a specific implementation — a compiled library, a specific parameter set, a specific hybrid composition — none of which are directly covered by the proof. The ROBOT attack demonstrated that theoretically sound RSA implementations can be exploited through implementation-level padding oracle vulnerabilities. Empirical testing catches this class of gap.

Q: Which hybrid KEM configuration should we use during a phased PQC migration?

A: The research confirms that any hybrid including at least one IND-CPA-secure component preserves indistinguishability. For most enterprises, ML-KEM combined with X25519 (the pattern deployed for TLS hybrid key exchange, e.g. the X25519MLKEM768 group) provides the strongest migration path — it maintains compatibility with existing ECDH infrastructure while adding post-quantum security. Avoid hybrid configurations that combine two non-IND-CPA-secure components, such as plain RSA without OAEP padding paired with an unvalidated PQC scheme.

Tags
post-quantum-cryptography · ML-KEM · IND-CPA · hybrid-KEM · deep-learning-security · BIKE

Ready to future-proof your platform?

See how BQ Provenance API can certify your content with quantum-resistant cryptography.