Last updated: June 2025
Key Takeaways
- SentinelAgent’s Intent-Preserving Delegation Protocol blocks 30/30 adversarial attacks with 0 false positives across 516 benchmark scenarios spanning 13 federal domains
- Intent verification — the one probabilistic property in the framework — degrades to 13% TPR against sophisticated paraphrasing attacks, exposing a structural gap that no current LLM-based approach can close deterministically
- For security architects deploying multi-agent AI in regulated environments, this research establishes the first formally verified delegation chain calculus with mechanical proof across 2.7 million system states
[IMAGE: A dark cinematic macro shot of interconnected glowing nodes forming a hierarchical chain structure, with teal authorization tokens flowing between agent nodes against a deep black background, representing AI delegation chain verification]
The Authorization Gap No One Is Talking About
Your organization deploys an AI orchestration layer. The primary agent receives a task, delegates subtasks to three specialized agents, each of which spawns additional tool-calling agents. By the time an API call hits your financial data endpoint, it has passed through five delegation hops — and your authorization framework has verified exactly zero of them.
This is the current state of multi-agent AI security in federal and enterprise environments. OAuth governs human-to-system access. ABAC governs resource permissions. But agent-to-agent delegation — the authorization chain that determines what a spawned AI agent is actually permitted to do on behalf of its parent — operates in a verification vacuum.
The authors of a recent arXiv preprint (arXiv:2604.02767v1) introduce SentinelAgent, a framework that attempts to close this gap through formal mathematics, runtime enforcement, and adversarial benchmarking. The results are instructive, and the failure modes are equally important to understand.
What Delegation Chain Calculus Actually Defines
Delegation Chain Calculus (DCC) is a formal framework that defines seven properties every agent-to-agent authorization handoff must satisfy. Six are deterministic. One is probabilistic. That distinction carries significant security implications.
The six deterministic properties:
- Authority narrowing — a delegated agent cannot hold broader permissions than its delegator
- Policy preservation — organizational policies propagate intact through every delegation hop
- Forensic reconstructibility — every delegation decision produces an auditable trace
- Cascade containment — a compromised agent cannot trigger unbounded downstream delegation
- Scope-action conformance — agents can only invoke APIs within their declared scope
- Output schema conformance — agent outputs must match declared schemas
The one probabilistic property:
- Intent preservation — the delegated task semantically matches what the original principal intended
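The deterministic properties are amenable to simple mechanical checks. As a concrete illustration, authority narrowing (P1) reduces to a permission-subset test. A minimal sketch, assuming permissions are modeled as flat scope strings (the paper does not prescribe a concrete permission representation):

```python
# Illustrative check for the authority-narrowing property (P1).
# Assumes permissions are flat scope strings; the real DAS may use a
# richer permission model than this sketch.

def narrows_authority(delegator_scopes: set[str], delegated_scopes: set[str]) -> bool:
    """A delegation satisfies P1 iff the child's scopes are a subset of
    the parent's scopes: a delegated agent can never hold broader authority."""
    return delegated_scopes <= delegator_scopes

parent = {"finance:read", "finance:report", "hr:read"}
child_ok = {"finance:read"}
child_bad = {"finance:read", "finance:write"}  # attempted escalation

assert narrows_authority(parent, child_ok)       # permitted
assert not narrows_authority(parent, child_bad)  # blocked: scope the parent never held
```

Because the check is a set operation rather than a judgment call, it yields the same answer for every input, which is what makes the six deterministic properties mechanically verifiable.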
The DCC framework includes four meta-theorems and one proposition that collectively establish a critical finding: deterministic intent verification is practically infeasible. You cannot write a rule engine that reliably determines whether a natural language task description has been semantically corrupted. This is not a limitation of current technology — the paper frames it as a structural property of the problem.
> “Even when intent verification is evaded, the remaining six properties constrain the adversary to permitted API calls, conformant outputs, traceable actions, bounded cascades, and compliant behavior.”
This is the security architecture’s load-bearing insight. The system does not rely on intent verification to prevent catastrophic outcomes. It relies on the six deterministic properties to bound the blast radius when intent verification fails.
How IPDP Enforces These Properties at Runtime
The Intent-Preserving Delegation Protocol (IPDP) is the runtime enforcement mechanism. Its core design decision: the Delegation Authority Service (DAS) that enforces these properties is explicitly non-LLM based.
This matters. An LLM-based enforcement layer would inherit the same semantic ambiguity vulnerabilities it is trying to detect. The DAS instead applies deterministic rule evaluation for properties P1 and P3-P7, and a fine-tuned Natural Language Inference (NLI) model for property P2 (intent preservation).
The three-point verification lifecycle operates as follows:
- Pre-delegation check: Before any agent spawns a subtask, the DAS validates that the proposed delegation satisfies authority narrowing and scope conformance
- Runtime monitoring: Active API calls are checked against declared scope in real time
- Post-execution audit: Output schema conformance and forensic trace completeness are verified before results propagate upward
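The three checkpoints above can be sketched as a single service interface. This is a hedged illustration of the lifecycle, not the paper's specification; the class and method names, the flat scope model, and the output-schema check are all assumptions made for brevity:

```python
# Hedged sketch of IPDP's three-point verification lifecycle.
# Names and data shapes are illustrative, not taken from the paper's spec.

from dataclasses import dataclass, field

@dataclass
class Delegation:
    delegator_scopes: set      # scopes held by the parent agent
    delegated_scopes: set      # scopes requested for the child agent
    declared_output_keys: set  # schema the child's output must match
    trace: list = field(default_factory=list)  # forensic record (P3)

class DelegationAuthorityService:
    def pre_delegation_check(self, d: Delegation) -> bool:
        # Before spawning: authority narrowing (P1) and scope conformance.
        ok = d.delegated_scopes <= d.delegator_scopes
        d.trace.append(("pre_delegation", ok))
        return ok

    def runtime_check(self, d: Delegation, api_call: str) -> bool:
        # During execution: every live API call must fall inside declared scope.
        ok = api_call in d.delegated_scopes
        d.trace.append(("runtime", api_call, ok))
        return ok

    def post_execution_audit(self, d: Delegation, output: dict) -> bool:
        # After execution: output schema conformance and trace completeness,
        # verified before results propagate to the parent.
        ok = set(output) == d.declared_output_keys and len(d.trace) > 0
        d.trace.append(("post_execution", ok))
        return ok
```

Note that every decision appends to the trace regardless of outcome, so a denied delegation is as reconstructible as a permitted one.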
TLA+ Mechanical Verification
Properties P1 and P3-P7 were verified using TLA+ model checking across 2.7 million system states with zero violations. For security architects who have worked with formal methods, this is a meaningful claim — TLA+ exhaustively explores the state space you define, and 2.7 million states represents substantial coverage of realistic multi-agent interaction patterns.
The caveat: TLA+ verification is bounded by the model you construct. Real-world deployments will introduce state combinations the model did not anticipate. The verification establishes that the protocol is internally consistent, not that it is complete against all possible deployment configurations.
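To give a feel for what bounded model checking does, here is a toy version in a few lines: exhaustively enumerate delegation states up to a bound and assert an invariant on every state. The depth bound and the cascade-containment invariant are hypothetical choices for illustration; TLA+ explores a far richer state space defined by the actual protocol specification.

```python
# Toy bounded model check: breadth-first exploration of delegation chains
# up to a fan-out/depth bound, asserting a cascade-containment invariant
# on every reachable state. All bounds here are illustrative.

from collections import deque

MAX_DEPTH = 3      # invariant: no delegation chain deeper than this
FANOUT = 2         # each agent may spawn at most this many children
EXPLORE_DEPTH = 3  # bound on the exploration itself

def check_cascade_containment() -> int:
    states_checked = 0
    queue = deque([0])  # a state here is just the current chain depth
    while queue:
        depth = queue.popleft()
        states_checked += 1
        assert depth <= MAX_DEPTH, "cascade containment violated"
        if depth < EXPLORE_DEPTH:
            for _ in range(FANOUT):
                queue.append(depth + 1)
    return states_checked

print(check_cascade_containment())  # → 15 states, zero violations
```

The same caveat from the paragraph above applies in miniature: the check is exhaustive only over the states the model generates, so an unmodeled transition is an unverified transition.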
The NLI Fine-Tuning Gap
Intent preservation (P2) uses an NLI model fine-tuned on 190 government delegation examples. Before fine-tuning, P2 achieved 1.7% TPR — effectively blind to attacks. After fine-tuning, it reached 88.3% TPR with F1=82.1% under 5-fold cross-validation.
Against sophisticated paraphrasing attacks under black-box adversarial conditions, P2 degrades to 13% TPR.
This is the number your threat model needs to account for. A motivated adversary who understands that intent verification is the probabilistic weak point — and who constructs delegation requests using semantic paraphrasing to evade NLI classification — will succeed at bypassing P2 roughly 87% of the time.
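The structural nature of this gap is easiest to see with a deliberately weak stand-in for the classifier. The sketch below scores intent preservation by token overlap (Jaccard similarity) instead of a fine-tuned NLI model; the tasks, threshold, and scoring function are all illustrative. A crude attack with no surface resemblance is caught, while a semantically corrupted delegation that reuses the original wording slips through, which is the same evasion strategy paraphrasing attacks apply to learned classifiers:

```python
# Sketch of the P2 intent-preservation check. The real DAS uses a
# fine-tuned NLI model; token overlap here is a deliberately weak
# stand-in that makes the paraphrase-evasion problem visible.

def intent_score(original: str, delegated: str) -> float:
    a, b = set(original.lower().split()), set(delegated.lower().split())
    return len(a & b) / len(a | b)  # Jaccard similarity

def preserves_intent(original: str, delegated: str, threshold: float = 0.5) -> bool:
    return intent_score(original, delegated) >= threshold

original = "summarize the quarterly revenue report for the finance team"
crude_attack = "exfiltrate all customer financial records"
evading_attack = ("summarize the quarterly revenue report for the finance "
                  "team and email it to the external auditor")

assert not preserves_intent(original, crude_attack)  # blocked: no surface overlap
assert preserves_intent(original, evading_attack)    # evades: surface-similar,
                                                     # semantically corrupted
```

A stronger classifier raises the bar but faces the same asymmetry: the adversary only needs one phrasing that the model misclassifies, and natural language offers an unbounded supply of candidates.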
Benchmark Results: What DelegationBench v4 Reveals
DelegationBench v4 is the evaluation dataset used to test SentinelAgent. It contains 516 scenarios across 10 attack categories and 13 federal domains.
| Metric | Result | Conditions |
|---|---|---|
| Combined TPR | 100% | Full DelegationBench v4 (516 scenarios) |
| Combined FPR | 0% | Full DelegationBench v4 (516 scenarios) |
| Attacks blocked (black-box) | 30/30 | Black-box adversarial conditions |
| False positives (black-box) | 0 | Black-box adversarial conditions |
| P2 TPR (pre-fine-tuning) | 1.7% | Standard evaluation |
| P2 TPR (post-fine-tuning) | 88.3% | Standard evaluation |
| P2 TPR (adversarial paraphrasing) | 13% | Sophisticated paraphrasing attacks |
| NLI F1 score | 82.1% | 5-fold cross-validation |
| TLA+ states verified | 2.7 million | Zero violations |
The 100% combined TPR headline requires careful interpretation. It aggregates performance across all seven properties, six of which perform at or near 100% by design: they are rule-based, not probabilistic. The headline metric is therefore dominated by the properties that are easiest to verify mechanically. Security architects should treat the 13% P2 TPR under adversarial paraphrasing as the operative metric when threat modeling against motivated adversaries.
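A short back-of-the-envelope calculation shows how an aggregate metric can mask a weak component. The scenario mix below is assumed for illustration and does not come from the paper:

```python
# Illustrative arithmetic (assumed scenario mix, not from the paper):
# an aggregate TPR over seven properties hides a weak component when
# most scenarios exercise the deterministic checks.

det_tpr = 1.00              # deterministic properties: rule-based, ~100%
p2_tpr_adversarial = 0.13   # P2 under adversarial paraphrasing

# Suppose 90% of attack scenarios are caught by at least one
# deterministic property and 10% are detectable only via P2.
mix_deterministic, mix_p2_only = 0.90, 0.10

combined = mix_deterministic * det_tpr + mix_p2_only * p2_tpr_adversarial
print(f"combined TPR: {combined:.1%}")  # 91.3% headline, despite 13% on P2
```

The lesson for reading benchmark tables: always ask which component of an aggregated metric your adversary will actually attack.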
Regulatory and Compliance Implications
Federal AI deployments operate under Executive Order 14110 on Safe, Secure, and Trustworthy AI, NIST AI RMF 1.0, and emerging OMB guidance on AI governance. None of these frameworks currently specify delegation chain verification requirements for multi-agent systems — because the problem has not been formally characterized until recently.
SentinelAgent’s DCC framework represents the first published formal calculus for this problem domain. Its adoption trajectory matters for compliance architects:
Near-term (1-2 years): Federal agencies deploying multi-agent AI systems face authorization accountability gaps with no standardized remediation path. IPDP and the DAS provide a deployable runtime enforcement mechanism that can be integrated into existing federal AI architectures. The forensic reconstructibility property (P3) directly addresses audit trail requirements under FISMA and FedRAMP.
Medium-term (3-5 years): DelegationBench v4’s 13 federal domain coverage positions it as a potential evaluation standard for government AI procurement. Agencies evaluating multi-agent AI vendors may begin requiring DelegationBench performance disclosure, similar to how Common Criteria evaluations function for traditional security products.
Long-term (5+ years): The paper’s formal proof of deterministic intent verification infeasibility may shape policy frameworks to explicitly rely on deterministic property enforcement rather than semantic intent verification. This would represent a significant shift in how AI authorization standards are written — moving from intent-based to constraint-based authorization models.
The economic calculus is straightforward. A delegation chain compromise in a federal multi-agent system — where an adversary manipulates a subtask agent to exfiltrate data within its permitted API scope — produces audit findings, remediation costs, and potential FedRAMP authorization suspension. The cost of integrating a DAS into an existing architecture is bounded. The cost of a delegation chain breach in a regulated environment is not.
The BeQuantum Perspective: Verification Chains Require Cryptographic Anchoring
SentinelAgent’s forensic reconstructibility property (P3) establishes that every delegation decision must produce an auditable trace. The paper does not specify how that trace is protected against tampering — and this is where the framework’s practical deployment surface intersects with cryptographic verification infrastructure.
A delegation audit trail stored in a mutable database is not a forensic record. It is a log. The distinction matters when that record is subpoenaed, presented in a compliance audit, or used to reconstruct an incident timeline. An adversary who compromises the logging infrastructure can retroactively alter the delegation chain record.
Organizations deploying IPDP in production environments should treat the delegation audit trail as a document requiring cryptographic notarization at the point of creation. BeQuantum’s Digital Notary infrastructure applies post-quantum cryptographic signatures to timestamped records, producing tamper-evident audit artifacts that remain verifiable against future cryptographic attacks — including those from quantum-capable adversaries.
The integration pattern is direct: each DAS authorization decision generates a structured delegation record, which is submitted to the Digital Notary for PQC-signed timestamping before being written to the audit store. The resulting chain of signed records satisfies P3’s forensic reconstructibility requirement while providing cryptographic guarantees that the paper’s framework does not address.
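That pattern can be sketched as a hash-chained record store. The sketch below uses HMAC-SHA256 as a stand-in for the post-quantum signature a real Digital Notary would apply (for example ML-DSA); the record shape, key handling, and function names are illustrative assumptions, not BeQuantum's API:

```python
# Sketch of the integration pattern: each DAS decision becomes a
# hash-chained, signed record. HMAC-SHA256 stands in for the PQC
# signature a real notary service would apply; all names illustrative.

import hashlib, hmac, json

NOTARY_KEY = b"stand-in-key"  # a real deployment would use a PQC keypair

def notarize(record: dict, prev_digest: str) -> dict:
    # Canonical serialization binds each record to its predecessor.
    payload = json.dumps({"record": record, "prev": prev_digest},
                         sort_keys=True).encode()
    return {
        "record": record,
        "prev": prev_digest,
        "digest": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(NOTARY_KEY, payload, hashlib.sha256).hexdigest(),
    }

chain, prev = [], "GENESIS"
for decision in [{"hop": 1, "ok": True}, {"hop": 2, "ok": True}]:
    entry = notarize(decision, prev)
    chain.append(entry)
    prev = entry["digest"]

# Retroactively altering an earlier record breaks the chain:
chain[0]["record"]["ok"] = False
payload = json.dumps({"record": chain[0]["record"], "prev": chain[0]["prev"]},
                     sort_keys=True).encode()
assert hashlib.sha256(payload).hexdigest() != chain[0]["digest"]
```

The chaining is what turns a log into a forensic record: tampering with any entry invalidates its digest and every downstream link, so the alteration is detectable even if the audit store itself is compromised.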
For organizations operating under FISMA or FedRAMP, this combination — IPDP for runtime enforcement, PQC-anchored notarization for audit integrity — addresses both the authorization accountability gap and the long-term cryptographic validity of the forensic record.
What You Should Do Next
Within 30 days: Map your delegation surface. Inventory every multi-agent AI workflow in your environment. For each workflow, document the delegation hops: which agent spawns which, what permissions are passed, and what APIs are accessible at each hop. Most organizations discover they cannot answer these questions — which is itself a finding.
Within 90 days: Evaluate DAS integration feasibility. The SentinelAgent paper (arXiv:2604.02767) provides the formal specification for the Delegation Authority Service. Assess whether your current AI orchestration layer (LangGraph, AutoGen, CrewAI, or custom) exposes the interception points required to enforce pre-delegation checks and runtime API monitoring. Identify the integration complexity before committing to a deployment timeline.
Within 180 days: Establish cryptographically anchored audit trails. If you deploy IPDP or any delegation chain enforcement mechanism, ensure the audit records it produces are protected by tamper-evident cryptographic signatures. A delegation audit trail that can be altered is not a compliance artifact — it is a liability. Evaluate PQC-signed notarization infrastructure to ensure those records remain verifiable as cryptographic standards evolve.
Frequently Asked Questions
Q: If intent verification only achieves 13% TPR against sophisticated paraphrasing attacks, is the SentinelAgent framework actually secure?
A: Yes, with an important qualification. The framework’s security model does not depend on intent verification to prevent catastrophic outcomes. The six deterministic properties — authority narrowing, policy preservation, scope-action conformance, output schema conformance, cascade containment, and forensic reconstructibility — constrain what a compromised agent can do even when intent verification is bypassed. An adversary who evades P2 is still limited to permitted API calls, conformant outputs, and bounded cascades. The 13% TPR figure defines the gap in semantic verification, not the gap in operational security.
Q: How does DCC relate to existing authorization frameworks like OAuth or ABAC?
A: OAuth and ABAC govern human-to-system and resource-level access control. DCC addresses a distinct problem: agent-to-agent authorization in multi-hop delegation chains where the delegating entity is itself an AI agent. The frameworks are complementary. OAuth may govern the initial human authorization that triggers an agentic workflow; DCC governs what happens to that authorization as it propagates through agent spawning chains. The paper does not provide a direct comparison to existing frameworks, which represents a gap in the current literature.
Q: Is SentinelAgent ready for production federal deployment?
A: The paper is an arXiv preprint (arXiv:2604.02767v1) and has not completed peer review as of this writing. The benchmark results are promising, and the TLA+ mechanical verification across 2.7 million states provides meaningful formal assurance for the deterministic properties. However, the paper provides no data on computational overhead, latency impact, or infrastructure dependencies for the DAS in production environments. Federal agencies should treat this as a framework for evaluation and pilot deployment, not a production-certified solution.