Observations on AARM v0.1: Enforcement Determinism, the Meaning of Context, and a Path Toward Rule Crystallization #10
Enforcement determinism is the right thing to obsess over. If two implementations of the same constraint language can produce different enforcement decisions for the same input, the entire verification model breaks. In Nobulex, I went with a deliberately minimal constraint language to keep enforcement deterministic: three keywords (permit, forbid, require); deny-override semantics (forbid always wins over permit); default-deny (anything not explicitly permitted is blocked); and conditions limited to simple comparisons, with no arbitrary expressions. This makes the enforcement function a pure function. The tradeoff is expressiveness: you can't write complex policies. But for behavioral proof that needs to be independently verifiable, determinism matters more than expressiveness. Spec: Proof-of-Behavior v0.1.0
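A minimal sketch of what a pure enforcement function with these semantics could look like. This is not Nobulex's actual implementation: the `Rule` shape, the `decide` function, and the operator table are assumptions made for illustration, and `require` is left out because its semantics are not described in this comment.

```python
import operator
from dataclasses import dataclass
from typing import Any, Mapping, Sequence

# Hypothetical operator table: conditions are restricted to simple comparisons.
_OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
        "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Rule:
    effect: str          # "permit" or "forbid"
    action: str          # action name the rule applies to
    field: str = ""      # optional attribute compared by the condition
    op: str = "=="
    value: Any = None

    def matches(self, action: str, attrs: Mapping[str, Any]) -> bool:
        if action != self.action:
            return False
        if not self.field:            # unconditional rule
            return True
        if self.field not in attrs:   # a missing attribute never satisfies a condition
            return False
        return _OPS[self.op](attrs[self.field], self.value)


def decide(rules: Sequence[Rule], action: str, attrs: Mapping[str, Any]) -> str:
    """Pure function: the result depends only on the arguments."""
    effects = [r.effect for r in rules if r.matches(action, attrs)]
    if "forbid" in effects:   # deny-override: forbid always wins over permit
        return "deny"
    if "permit" in effects:
        return "allow"
    return "deny"             # default-deny: nothing explicitly permitted the action
```

Because `decide` reads no clock, network, or mutable state, two independent implementations of the same rules can be compared by replaying identical inputs. One design caveat of this sketch: a conditional `forbid` rule does not fire when its attribute is missing, so a stricter variant might treat missing attributes as a denial.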
The determinism point is central. For autonomous action control, I would split the system into a small deterministic enforcement kernel and a broader advisory layer. The enforcement kernel should accept only normalized inputs: actor, delegated user if any, action type, resource, data class, environment, tool identity, declared side effects, and time window. Rules should evaluate over that normalized action envelope with default-deny and deny-overrides semantics. Anything involving free-form model judgment should happen before the kernel as classification or recommendation, not inside the enforcement decision itself.
For “context,” the main risk is letting it become an unbounded narrative field. Context used for enforcement should be typed and enumerable: production vs development, public vs restricted data, read vs write, reversible vs irreversible, low vs high financial impact. If a context value cannot be normalized, it should not decide allow/block directly.
Rule crystallization could also be handled conservatively: repeated human approvals become candidate rules, but they should require regression tests, counterexamples, and explicit owner approval before promotion. That keeps learning useful without turning operational convenience into silent policy expansion.
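The normalized action envelope and typed context described above might be sketched as follows. The field subset, the enum names, and the two example rules are assumptions chosen for illustration; the point is only that a value which fails to normalize never reaches a rule and falls through to the default block.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional


class Environment(Enum):
    PRODUCTION = "production"
    DEVELOPMENT = "development"


class DataClass(Enum):
    PUBLIC = "public"
    RESTRICTED = "restricted"


class Access(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class ActionEnvelope:
    # Hypothetical subset of the fields listed in the comment above.
    actor: str
    resource: str
    environment: Environment
    data_class: DataClass
    access: Access
    reversible: bool


def normalize(raw: Mapping[str, object]) -> Optional[ActionEnvelope]:
    """Return a typed envelope, or None if any field cannot be normalized."""
    try:
        if not isinstance(raw["reversible"], bool):
            return None
        return ActionEnvelope(
            actor=str(raw["actor"]),
            resource=str(raw["resource"]),
            environment=Environment(raw["environment"]),
            data_class=DataClass(raw["data_class"]),
            access=Access(raw["access"]),
            reversible=raw["reversible"],
        )
    except (KeyError, ValueError):
        return None


def kernel_decide(raw: Mapping[str, object]) -> str:
    env = normalize(raw)
    if env is None:
        return "block"    # un-normalizable context never decides "allow"
    # Deny rule is checked first (deny-overrides).
    if (env.environment is Environment.PRODUCTION
            and env.access is Access.WRITE and not env.reversible):
        return "block"
    # Example allow rule: reading public data.
    if env.access is Access.READ and env.data_class is DataClass.PUBLIC:
        return "allow"
    return "block"        # default-deny
```

Free-form model output would sit upstream of `normalize`, for example proposing a `data_class` label, so the kernel only ever sees enumerated values.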
Observations on AARM v0.1: Enforcement Determinism, the Meaning of Context, and a Path Toward Rule Crystallization
Author: David Pierce
Date: February 2026
Regarding: AARM Specification v0.1 by Herman Errico (arXiv:2602.09433)
Purpose
This document offers observations on the Autonomous Action Runtime Management (AARM) specification, v0.1, published by Herman Errico in February 2026. The observations come from the perspective of someone who has spent years working at the intersection of security, data governance, and AI deployment in federal environments. I am not an authority on AARM's internal design decisions, nor do I claim deep knowledge of the full body of research that informed it. What I offer instead is a practitioner's perspective on where the specification's assumptions may benefit from refinement, along with a constructive proposal for how one of its core mechanisms could be reframed.
The intent is to strengthen the specification. AARM is addressing a real and urgent problem. These observations are offered in that spirit.
Note about uncertainty. Throughout this document, I have attempted to distinguish between observations I am confident in and assertions where I may be working from incomplete information. Where the AARM paper has already acknowledged a limitation or considered a counterargument, I have tried to note that explicitly. It is entirely possible that decisions I question were made for sound reasons I do not fully appreciate. I welcome correction on any point where my reading of the specification is incomplete or where additional context would change the analysis.
What AARM Gets Right
Before identifying areas for refinement, it is worth being direct about what the AARM specification does well. These are not minor contributions.
The threat model is comprehensive and well-articulated. AARM identifies eleven distinct attack vectors specific to AI-driven actions, including prompt injection, confused deputy attacks, compositional data exfiltration, intent drift, and cross-agent propagation. The paper is also honest about which threats AARM fully mitigates versus those it only partially addresses. The specification explicitly notes that memory poisoning "remains a challenging threat that AARM mitigates but does not fully prevent," and identifies cross-agent propagation, side-channel leakage, and environmental manipulation as requiring "complementary infrastructure-level protections." This intellectual honesty strengthens the specification.
The action mediation architecture is sound. Intercepting actions before execution (not after) is the correct intervention point. AARM's insistence that all tool invocations must be captured and evaluated prior to reaching target systems reflects a fundamental security principle: prevention is preferable to detection. The specification is right that SIEM-based approaches that observe events after execution are insufficient when actions are irreversible.
The receipt and audit mechanism is genuinely useful. Tamper-evident, cryptographically signed receipts that bind action, context, decision, and outcome together create a forensic record that most existing agent frameworks lack entirely. For regulated environments such as federal, financial, and healthcare, this kind of audit trail is not optional. AARM's treatment of receipts as a first-class component is the right design choice.
The compositional risk problem is real and underserved. AARM correctly observes that individually permitted actions can constitute a breach when composed in sequence. An agent that reads sensitive data and then sends an email may satisfy policy on each action independently, but the combination represents exfiltration. Traditional RBAC and API gateway approaches evaluate actions in isolation and genuinely cannot detect this class of threat. This is a meaningful gap in existing security models, and AARM is right to center it.
The specification is appropriately vendor-neutral. By defining the behavioral requirements of a runtime security system without prescribing implementation, AARM creates space for diverse implementations while establishing common evaluation criteria. This is the right approach for an emerging market where proprietary fragmentation is a real risk.
The conformance requirements are concrete and testable. AARM's nine requirements, distinguished between MUST and SHOULD, give implementers and evaluators clear criteria. This is more actionable than most security specifications at the v0.1 stage.
The paper's treatment of existing security tools is fair. The paper explicitly acknowledges that policy languages like OPA and Cedar provide "expressive evaluation engines that can serve as backends for AARM's policy evaluation component" and that capability-based security "constrains authority propagation." AARM does not dismiss these tools; it argues they are necessary but insufficient. That is a reasonable claim, even if, as I discuss below, I think the insufficiency may be more addressable than the specification suggests.
These are meaningful contributions. The observations that follow are intended to build on this foundation.
Observation 1: The Term "Context" Needs a Sharper Definition
Throughout the AARM specification, the term "context" is used to refer to at least two fundamentally different things. The first is concrete, enumerable session state: prior actions taken, data classifications encountered, timestamps, tool outputs, identity attributes. These are discrete data points that can be stored, queried, and evaluated deterministically. The second is semantic intent alignment: whether an action "makes sense" given what the user originally asked for. These are not the same thing, and conflating them under a single term creates ambiguity in the specification that may propagate into implementations.
Consider AARM's four action classifications. "Forbidden" actions are evaluated against static policy, with no ambiguity. But "Context-Dependent Deny" is described as actions that are "allowed by policy, but blocked when context reveals inconsistency with the user's stated intent." The word "context" in that sentence is doing the work of two different systems: a data store that tracks what has happened in the session, and an inference engine that interprets whether the current action aligns with a natural language goal.
This matters because these two capabilities have very different properties for enforcement.
The Accountability Gap
The accountability row in the table above deserves emphasis because it introduces a dimension that is often absent from technical security specifications. When a human reviewer makes an authorization decision (approving a privileged access request, authorizing a data transfer, signing off on a firewall rule change), that decision carries institutional and legal weight. The human has standing in an accountability chain. Their judgment, however imperfect, is backed by training, clearance, role assignment, and legal liability. There are centuries of precedent for how human judgment operates in authorization contexts, and there are established remedies when that judgment fails.
An LLM-based intent evaluator operating in the enforcement plane has none of these properties. Its decisions are not reproducible: the same inputs may produce different outputs across evaluations. Its reasoning is not auditable in any legally meaningful way. There is no established framework for assigning liability when an AI-driven intent evaluation makes the wrong call, and no institutional precedent for treating a probabilistic inference as an authorization decision. The tamper-evident receipts AARM requires would record that a decision was made, but they would not make the decision itself accountable in the way a human authorization is.
The recommendation is straightforward: separate these two concepts in the specification. Define "session state" as the concrete, structured data that the context accumulator tracks. Define "intent alignment" as the semantic evaluation performed against that state. This distinction clarifies what is deterministic (and therefore suitable for enforcement) versus what is interpretive (and therefore more appropriate for detection, alerting, and offline analysis). It also gives implementers clearer guidance on what they need to build and how to test it.
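To make the proposed split concrete, here is a minimal sketch in which session state is a structured record and intent alignment is a separate, advisory output. All names (`SessionState`, `IntentAssessment`, `ALLOWED_ACTIONS`, the 0.5 threshold) are hypothetical and are not taken from the AARM specification.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical allow-list standing in for a real policy.
ALLOWED_ACTIONS = frozenset({"read_record", "send_external"})


@dataclass
class SessionState:
    """Concrete, enumerable facts about the session."""
    actions_taken: List[str] = field(default_factory=list)
    data_classes_seen: Set[str] = field(default_factory=set)


@dataclass(frozen=True)
class IntentAssessment:
    """Output of a semantic evaluator: probabilistic and advisory only."""
    aligned_score: float   # hypothetical score between 0.0 and 1.0
    rationale: str


def enforce(state: SessionState, action: str) -> str:
    """Deterministic: depends only on session state and the action."""
    if action == "send_external" and "sensitive" in state.data_classes_seen:
        return "DENY"
    if action in ALLOWED_ACTIONS:
        return "ALLOW"
    return "DENY"


def handle(state: SessionState, action: str,
           assessment: IntentAssessment) -> Tuple[str, List[str]]:
    decision = enforce(state, action)    # the assessment is never consulted here
    alerts: List[str] = []
    if assessment.aligned_score < 0.5:   # hypothetical alerting threshold
        alerts.append(f"low intent alignment for {action}: {assessment.rationale}")
    return decision, alerts
```

In this arrangement a low alignment score produces an alert for review but cannot change the enforcement result, which is what makes the decision path independently testable.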
Observation 2: Non-Deterministic Enforcement Is Not Required by the Problem
AARM explicitly distinguishes between "automated response" (deterministic enforcement of pre-specified rules) and "autonomous response" (a system that reasons over accumulated context to decide whether an action should be permitted). The specification positions itself in the autonomous category and frames this as necessary because AI agents are fundamentally different from prior security subjects.
I want to push back on this framing, though I acknowledge that the author may have considered and rejected this line of reasoning for reasons not fully visible in the published specification.
Human users are themselves non-deterministic. A human with database admin credentials can drop a production database. A human with email access can exfiltrate data by forwarding it to a personal account. A human can execute a sequence of individually permitted actions that constitute a breach in combination. Humans experience "intent drift," where they start a task with one goal and end up doing something different. Humans are susceptible to social engineering, which is the human-actor equivalent of prompt injection.
And yet, the security community has never argued that enforcement mechanisms for human users need to reason about intent in real time. We do not put an inference engine between a DBA and their SQL client that evaluates whether a DROP TABLE command aligns with their manager's instructions. We use deterministic controls: role-based access, least privilege, network segmentation, data classification enforcement, and audit logging. The non-determinism of the human actor is managed by constraining the environment the actor operates in.
This model works precisely because it does not try to solve the non-determinism problem. It accepts that you cannot reliably predict or interpret what an actor will do and instead ensures that, whatever they do, the blast radius is bounded by deterministic controls. Separation of duties, least privilege, and mandatory access controls exist because the security community collectively decided that evaluating human intent at runtime was neither feasible nor necessary, and instead built systems where intent does not matter if the controls are properly scoped.
What AI Agents Actually Change
AI agents do change the threat surface. The speed and scale arguments are real; an agent can execute hundreds of actions per minute where a human executes a few. The compositional risk argument is real; agents may chain actions in ways that are harder to anticipate. The untrusted orchestration argument is real; the agent's reasoning layer can be manipulated through prompt injection.
None of the characteristics AARM identifies (irreversibility, speed, compositional risk, untrusted orchestration, privilege amplification) require the enforcement plane to become non-deterministic. Speed is an argument for rate limiting and transaction controls. Compositional risk is an argument for workflow-level policy constraints, which can be deterministic. Untrusted orchestration is an argument for not trusting the agent, which means enforcement should be external and rule-based. These characteristics require the rule set to be richer and the control points to be more granular. That is an engineering challenge.
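As one illustration of the speed argument, a sliding-window rate limit can be written as a pure function of the recorded timestamps. The function name and parameters below are assumptions for the sketch, not part of AARM.

```python
from typing import Sequence


def within_rate_limit(prior_timestamps: Sequence[float], now: float,
                      max_actions: int, window_seconds: float) -> bool:
    """Return True if one more action at time `now` stays within the limit.

    The current time is passed in rather than read from a clock, so the
    same inputs always give the same answer.
    """
    recent = [t for t in prior_timestamps if now - window_seconds < t <= now]
    return len(recent) < max_actions
```

Passing `now` explicitly is the design choice that matters here: the check can be replayed later from the audit record and will reproduce the original decision.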
We did not abandon deterministic enforcement when we moved from single-user systems to multi-user, from local to networked, from human-speed to automated pipelines, or from on-premise to cloud. Each transition expanded the threat surface and required richer policy models, but enforcement remained deterministic. AI agents are the next step in that progression.
Observation 3: Perimeter and Privilege Controls Should Be Primary, Not Secondary
The specification dismisses perimeter defense and traditional access controls as insufficient, noting that firewalls protect perimeters but agents operate inside with legitimate credentials, and that IAM/RBAC evaluates permissions in isolation. Both observations are accurate. But the conclusion, that enforcement should therefore shift to screening outbound actions through a context-aware evaluation layer, inverts the defense model in a way that warrants reconsideration.
Screening every outbound action through a semantic evaluation engine is architecturally analogous to relying on a content filter as your primary security control in place of access control. It accepts that the agent has broad access and then tries to decide, action by action, whether that access is being used appropriately. This approach carries meaningful cost: every action must be intercepted, context must be evaluated, and decisions must be cryptographically signed. In high-throughput agent environments, this becomes a performance chokepoint and a single point of failure.
The alternative, following how security has been architected for decades, starts with constraining what is possible. If an agent's credentials are scoped to read from specific data sources and write to specific targets, enforced at the infrastructure level, then prompt injection, goal hijacking, and confused deputy attacks have a dramatically reduced blast radius regardless of what the agent intends. You do not need to interpret intent if the agent physically cannot perform the harmful action.
Recent work outside AARM reinforces this approach. The AgentSentry framework, published in October 2025, proposes that agent permissions should be dynamically scoped to the specific user-authorized task at hand, with minimal, temporary privileges automatically revoked upon task completion. The AgentBound framework targets MCP servers specifically, arguing that enforcement should be shifted "close to the system layer" through access control policies that cannot be circumvented through prompt manipulation alone. Both of these represent perimeter-first models adapted to the agent context.
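A minimal sketch of a task-scoped, expiring grant in the spirit of the perimeter-first models described above. It does not reproduce the AgentSentry or AgentBound APIs; `TaskGrant`, `is_permitted`, and the resource names are hypothetical, and in practice the check would be enforced by infrastructure (IAM policy, database roles, network rules) rather than by application code.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TaskGrant:
    readable: FrozenSet[str]   # resources the agent may read for this task
    writable: FrozenSet[str]   # resources the agent may write for this task
    expires_at: float          # grant is void at or after this time


def is_permitted(grant: TaskGrant, operation: str, resource: str, now: float) -> bool:
    """Deterministic scope check: intent is never consulted."""
    if now >= grant.expires_at:
        return False
    if operation == "read":
        return resource in grant.readable
    if operation == "write":
        return resource in grant.writable
    return False               # unknown operations are denied
```

Under such a grant, a prompt-injected instruction to write to an unlisted target fails for the same reason a legitimate one would: the target is outside the scope.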
For organizations operating in regulated environments, I would argue the priority should be inverted: least privilege and infrastructure-enforced boundaries as the primary defense layer, with action mediation and behavioral monitoring as additional depth. The AARM threat model is well-suited to inform what those privilege boundaries should look like; the eleven attack vectors map naturally to scoping decisions.
A Constructive Proposal: Rule Crystallization Through Behavioral Analytics
The observations above converge on a single question: is there a way to preserve AARM's insight that session context matters for security decisions while keeping the enforcement plane deterministic? I believe there is, and it follows a pattern the security community has used repeatedly. Whether this pattern scales to the full range of threats AARM addresses is an open question that I acknowledge is not fully resolved.
The approach is to use behavioral analytics on agent execution patterns to derive deterministic, enforceable rules, and then apply those rules through conventional policy engines without requiring real-time semantic judgment. The non-deterministic analysis happens offline; enforcement remains deterministic.
The Lifecycle
Phase 1: Observe. Run agents in monitored environments. The context accumulator AARM already describes is well-suited for this phase: track action sequences, record what data was accessed before what actions were taken, map composition patterns. The focus here is on data collection.
Phase 2: Analyze. Apply behavioral analytics to the accumulated data. This is where non-deterministic, AI-assisted analysis belongs. Identify recurring legitimate action patterns. Identify composition sequences that consistently indicate policy violations. Cluster intent categories and map them to concrete action sequences. This is the analytical phase: probabilistic, iterative, human-reviewed.
Phase 3: Crystallize. Convert analytical findings into deterministic policy rules. "If an agent has accessed data marked as sensitive in the last N actions, DENY any action targeting an external endpoint" is a deterministic rule derived from behavioral observation. "An agent performing file deletion must have a preceding action matching one of these five verified cleanup workflow patterns" is enforceable without semantic judgment. The rules are concrete, testable, and auditable.
Phase 4: Enforce. Apply crystallized rules through a standard deterministic policy engine. OPA, Cedar, or any conformant engine can evaluate these rules. No runtime intent interpretation required. Given the same inputs, the same policy produces the same output.
Phase 5: Iterate. Continue behavioral monitoring alongside enforcement. When new patterns emerge that the existing rule set does not cover, cycle back through analysis and crystallization. The rule set grows more comprehensive over time, and the system's DEFER rate (actions that no rule covers and require conservative default handling) decreases.
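The first example rule from Phase 3 can be sketched as a pure function over a recorded action history. The `ActionRecord` shape and the `"NO_MATCH"` result are assumptions for illustration; in a real deployment the rule would more likely be expressed in the policy language of the chosen engine.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ActionRecord:
    name: str
    data_sensitive: bool = False    # the action accessed data marked as sensitive
    target_external: bool = False   # the action targets an external endpoint


def sensitive_then_external(history: Sequence[ActionRecord],
                            proposed: ActionRecord, n: int) -> str:
    """Crystallized rule: DENY external targets if sensitive data was
    accessed in the last n actions; otherwise the rule does not apply."""
    recent = history[-n:] if n > 0 else []   # guard: history[-0:] is the whole list
    if proposed.target_external and any(a.data_sensitive for a in recent):
        return "DENY"
    return "NO_MATCH"
```

Returning `"NO_MATCH"` rather than an allow leaves the final decision to the engine's combining logic, so a rule that does not fire never grants anything by itself.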
What This Changes in AARM
This reframing does not invalidate the specification's core components. It repositions them.
How DEFER Fits
AARM's DEFER decision remains valuable in this model, but its role shifts. Rather than being a normal operating mode for context-dependent decisions, DEFER becomes the conservative default for actions that no crystallized rule covers. It is the system saying: I have not seen this pattern before and do not have a rule for it, so pause and escalate. Early in deployment, DEFER rates will be higher. As the rule set matures through the crystallization lifecycle, DEFER rates decrease. A well-managed system should see its DEFER rate as a measurable indicator of rule set completeness.
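A minimal sketch of DEFER as the fallback for uncovered actions, together with the DEFER-rate metric. The rule signature and the dictionary-shaped action are assumptions, not AARM's interfaces.

```python
from typing import Callable, Iterable, Optional, Sequence

# Hypothetical rule type: returns "ALLOW", "DENY", or None when the rule does not apply.
Rule = Callable[[dict], Optional[str]]


def decide(rules: Sequence[Rule], action: dict) -> str:
    verdicts = [v for v in (rule(action) for rule in rules) if v is not None]
    if "DENY" in verdicts:     # deny-overrides
        return "DENY"
    if "ALLOW" in verdicts:
        return "ALLOW"
    return "DEFER"             # no crystallized rule covers this action


def defer_rate(decisions: Iterable[str]) -> float:
    """Fraction of decisions that were DEFER (0.0 for an empty log)."""
    log = list(decisions)
    if not log:
        return 0.0
    return log.count("DEFER") / len(log)
```

Tracking `defer_rate` over successive rule-set versions against the same replayed action log would give the completeness indicator described above, under the assumption that the log is representative of production traffic.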
The Strongest Case Against These Observations
In the interest of evaluating all perspectives, I want to identify the strongest counterargument to the position I have taken.
The AARM paper makes a point that is difficult to dismiss: the distinction between actions that are individually permitted but collectively dangerous may require more than richer rules alone. The paper's example of an agent that reads sensitive data and then composes an email to an external recipient illustrates a pattern where the threat emerges from the relationship between actions rather than from either action in isolation. The paper notes that "data retrieved in one action may be summarized, paraphrased, or embedded in unrelated content before being exfiltrated, making lineage tracking difficult."
This is a genuine challenge for the rule crystallization approach. If the threat pattern is not just "READ sensitive data, then SEND email" but rather "READ sensitive data, then the LLM silently incorporates fragments of that data into an unrelated action five steps later," then the deterministic rule needs to encode a far more complex condition. It may need to track data taint across transformations that are inherently opaque. At some point, the rule may become so complex that it approaches the semantic evaluation it was meant to replace.
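One deterministic response to this challenge is conservative taint propagation: any value derived from a tainted input inherits the taint, regardless of how it was transformed. The sketch below uses hypothetical names (`Value`, `derive`, `may_send_externally`) and assumes the runtime can observe which inputs fed each output, which is exactly the part that becomes opaque once data passes through an LLM's context.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Value:
    label: str
    taints: FrozenSet[str] = frozenset()


def derive(label: str, inputs: Iterable[Value]) -> Value:
    """Over-approximate lineage: the output carries every input's taints."""
    merged = frozenset().union(*(v.taints for v in inputs))
    return Value(label, merged)


def may_send_externally(value: Value) -> bool:
    return "sensitive" not in value.taints
```

The sketch stays deterministic by over-approximating: once sensitive data enters the agent's context, every later output is treated as tainted. That bounds the exfiltration risk but can block legitimate actions, so it illustrates the tradeoff rather than resolving it.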
Closing
AARM is tackling a real problem at the right time. The specification's threat model, action mediation architecture, and audit mechanisms are substantive contributions that the agent security community needs. The observations in this document are offered to sharpen the specification's foundation in three areas: clarifying what "context" means and separating its concrete and semantic components; preserving deterministic enforcement as a design principle; and considering whether perimeter and privilege controls deserve greater emphasis in the specification.
The rule crystallization proposal is offered as a practical mechanism for bridging these concerns. It preserves AARM's core insight, that session context matters for security decisions, while keeping enforcement deterministic, auditable, and reproducible. It follows a pattern the security community has applied successfully at every prior inflection point: observe new actor behaviors, derive new rules, enforce deterministically. Whether this pattern fully addresses the specific characteristics of AI agent threats is an open question deserving of further research and empirical validation.
I look forward to the specification's continued development and welcome the opportunity to discuss these observations with other AARM contributors. Corrections to any misreadings of the specification are genuinely welcomed.
David Pierce