Skip to content

feat(compiler): support registered host-await builtins for natural function call syntax#667

Open
kusha wants to merge 4 commits into
microsoft:mainfrom
kusha:markbirger/registered-host-await-builtins
Open

feat(compiler): support registered host-await builtins for natural function call syntax#667
kusha wants to merge 4 commits into
microsoft:mainfrom
kusha:markbirger/registered-host-await-builtins

Conversation

@kusha

@kusha kusha commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Extends the RVM compiler to accept a list of host function names at compile time. Calls to registered names are compiled directly into HostAwait instructions, allowing policy authors to write lookup(input.account_id) instead of __builtin_host_await(input.account_id, "lookup").

Motivation

  1. No compile-time validation of host-supported identifiers: The identifier passed to __builtin_host_await is a runtime string. Typos like "lokup" instead of "lookup" are silently accepted by the compiler and only discovered at runtime when the host rejects the identifier.

  2. Wrapper workaround breaks caller identification: Defining a wrapper lookup(x) := __builtin_host_await(x, "lookup") compiles the HostAwait instruction inside the wrapper module, not the calling module — breaking caller identification for authorization and auditing.

  3. Host can override standard builtins: Registering a name that matches a standard Rego builtin (e.g. time.parse_duration_ns) lets the host intercept that call and provide its own implementation.

Changes

Compiler (src/languages/rego/compiler/)

  • register_host_await_builtin() on Compiler to declare host function names. Validates arg_count == 1 and rejects the reserved name __builtin_host_await.
  • compile_from_policy_with_host_await() entry point that accepts the builtin list. Existing compile_from_policy() delegates with an empty list.
  • determine_call_target() extended with resolution order: explicit __builtin_host_await → registered host-await → user-defined → standard builtin.
  • Both explicit and registered paths emit identical HostAwait { dest, arg, id } bytecode.

Documentation (docs/rvm/)

  • instruction-set.md: Registered builtin syntax, resolution order, argument handling.
  • architecture.md: Updated to describe both emission paths.

Tests (tests/rvm/rego/)

  • Extended test harness with HostAwaitBuiltinSpec and argument assertion.
  • 9 YAML test cases: suspend/resume, run-to-completion, multiple names, queue, shadowing, object packing, arg_count rejection, reserved name rejection, standard builtin override.

Constraints

  • arg_count must be 1: The HostAwait instruction carries a single argument register. Use object packing to pass multiple values: lookup({"key1": v1, "key2": v2}). This can be lifted in the future by having the compiler auto-pack multiple arguments into an array or object.
  • __builtin_host_await is reserved: Attempting to register this name produces a compile-time error. It is handled by a dedicated code path (explicit 2-argument form) and cannot be overridden.

Limitations

  • No conflict detection: A registered name that collides with a standard builtin silently wins. No warning is emitted.
  • __builtin_host_await still exists: The raw builtin remains functional. Policy authors who discover it can use arbitrary identifier strings. However, the host controls how identifiers are resolved and can reject or error on unexpected identifiers at runtime.

Allow hosts to register function names at compile time so that calls to
those names emit HostAwait instructions directly, enabling natural syntax
like fetch(x) instead of __builtin_host_await(x, "fetch").

- Add host_await_builtins map and register_host_await_builtin() to Compiler
- Validate arg_count == 1 and reject reserved __builtin_host_await name
- Extend determine_call_target() resolution: explicit > registered > user > builtin
- Both explicit and registered paths emit identical HostAwait bytecode
- Add compile_from_policy_with_host_await() entry point in rules.rs
- Extended test harness with HostAwaitBuiltinSpec and args assertion
- 9 YAML test cases: suspend/resume, run-to-completion, multiple names,
  queue, shadowing, object packing, arg_count rejection, reserved name
  rejection, standard builtin override
- Documentation: instruction-set.md, architecture.md

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the Rego→RVM compiler to support “registered” host-await builtins, allowing natural function-call syntax (e.g. lookup(x)) to compile directly into HostAwait instructions, while retaining the explicit __builtin_host_await(arg, id) path.

Changes:

  • Add Compiler::compile_from_policy_with_host_await(...) and Compiler::register_host_await_builtin(...) to configure host-await builtin names (with arg_count == 1 enforced).
  • Extend call target resolution to prioritize explicit __builtin_host_await, then registered host-await names, then user-defined functions, then standard builtins.
  • Update the YAML-based RVM Rego test harness to pass registered builtins and (in suspendable mode) assert the host-await argument payload; add a new YAML suite covering registered host-await scenarios.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/languages/rego/compiler/mod.rs Adds compiler state + API to register host-await builtins and validate registration constraints.
src/languages/rego/compiler/rules.rs Adds a new compile entry point that accepts registered host-await builtins and wires it into compilation.
src/languages/rego/compiler/function_calls.rs Implements resolution order + emits HostAwait for registered names by auto-loading the identifier literal.
tests/rvm/rego/mod.rs Extends harness to pass registered builtin specs into compilation and validate suspendable host-await arguments.
tests/rvm/rego/cases/registered_host_await.yaml Adds test coverage for registered host-await behavior (shadowing, queueing, override, validation errors).
docs/rvm/instruction-set.md Documents registered host-await builtin syntax, resolution order, and argument handling constraints.
docs/rvm/architecture.md Updates architecture walkthrough to describe both explicit and registered HostAwait emission paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/languages/rego/compiler/function_calls.rs Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Mark Birger <birgerm@yandex.ru>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/rvm/rego/mod.rs Outdated
Comment thread tests/rvm/rego/cases/registered_host_await.yaml Outdated

@anakrish anakrish left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

Comment thread tests/rvm/rego/cases/registered_host_await.yaml
Comment thread src/languages/rego/compiler/rules.rs
(arg_regs[0], arg_regs[1])
} else {
// Registered host-awaitable builtin — identifier is the function name
if arg_regs.len() != 1 {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to define behavior when registered host await builtin shadows a user write policy rule (both regular and function)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move it out of scope for this PR? Shadowing already happens for __builtin_host_await. Also we need to decide on approach. IMO compilation check is not great choice as it makes compiled policies dependent on engine state. The easiest would be to prioritize user defined if it is present.

}

// Check registered host-awaitable builtins
if let Some(&arg_count) = self.host_await_builtins.get(original_fcn_path) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to detect shadowing here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above

Comment thread src/languages/rego/compiler/function_calls.rs Outdated
Mark Birger and others added 2 commits June 5, 2026 17:48
… registration

- Compiler::register_host_await_builtin now rejects duplicate, empty,
  and whitespace-only names. Previously a duplicate registration would
  silently overwrite the existing entry, which could mask the host's
  own registration mistakes.
- YAML test cases added: empty registration list as no-op, duplicate
  name rejection, empty/whitespace name rejection, out-param (a, out)
  calling syntax with a single-arg registered builtin, and mixed
  __builtin_host_await + registered builtins in the same policy
  consuming from their respective identifier queues.
- Test harness: replace assert_eq! on HostAwait argument mismatch with
  anyhow::Error so mismatches propagate through the case reporter
  instead of panicking and skipping the harness's normal error path.
- YAML comment fix: "Registration panics" -> "Registration fails with
  an error" (registration returns Err, never panics).

Addresses anakrish + Copilot inline review comments on PR microsoft#667.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…riants

Addresses PR microsoft#667 review item microsoft#8: at the emit site in
`compile_function_call`, the discrimination between explicit
`__builtin_host_await(arg, id)` and a registered host-awaitable
builtin was being recovered by string-comparing `original_fcn_path`
against `"__builtin_host_await"`. The information was already known
in `determine_call_target` and was being thrown away.

Replace the single `CallTarget::HostAwait` variant with two:

* `ExplicitHostAwait` (unit) — the two-argument call form. The
  identifier register comes from the user's second argument.
* `RegisteredHostAwait { identifier: String }` — the one-argument
  call form for registered builtins. The identifier is the registered
  name and is captured in the variant at recognition time, so the
  emit site never re-derives it from the function path.

This removes the magic-string comparison at the emit site (the source
of truth is now `determine_call_target`) and makes both match sites
in `compile_function_call` exhaustive over the two forms — adding a
third host-await form in the future would force a compile error at
every match site instead of silently falling through.

Arities are now hardcoded in the `expected_args` extraction
(`Some(2)` for explicit, `Some(1)` for registered) rather than
carried in the variant; registered builtins are constrained to
`arg_count == 1` at registration time, so there is no per-call
variability to carry.

Bytecode output is unchanged; the full RVM test suite (97 cases) and
the registered_host_await suite (15 cases) pass without modification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anakrish

anakrish commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

@copilot Review this PR using all the skills in this repo. Use a separate agent for each skill.

@anakrish

anakrish commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Deep review found a few issues worth addressing before merge. I reviewed this with separate passes for value-flow, safety/API, type conversion, test adequacy, adversarial verification, and an independent deep-review agent.

Findings

  1. Medium: fully-qualified calls can bypass registered host-await shadowing

    The docs/tests say registered names shadow user-defined functions and builtins, but determine_call_target only checks host_await_builtins using original_fcn_path:

    if self.host_await_builtins.contains_key(original_fcn_path) {
        return Ok(CallTarget::RegisteredHostAwait {
            identifier: original_fcn_path.to_string(),
        });
    }

    For package demo, registering ("resolve", 1) intercepts resolve("k"), but a policy can call data.demo.resolve("k"). In that case original_fcn_path is data.demo.resolve, so it does not match the registered key resolve; the call then falls through to the user-defined function path. Either the compiler should also account for the package-qualified form, or the docs/tests should explicitly state that registered host-await builtins only intercept unqualified calls.

  2. Low: suspendable HostAwait arg checks reprocess runtime values as YAML fixtures

    In tests/rvm/rego/mod.rs, the suspendable harness does:

    let actual = process_value(argument)?;
    if actual != expected {
        return Err(anyhow::anyhow!(...));
    }

    argument is already a runtime VM Value; process_value is intended for YAML fixture decoding. This can turn legitimate runtime values into fixture sentinels before comparison. Example: policy passes lookup("#undefined"); YAML expected args: "#undefined" becomes Value::Undefined, and the runtime string is also converted to Value::Undefined, so the test can pass for the wrong reason. Compare argument.clone() directly to the already-processed expected value instead.

  3. Low: run-to-completion args: are silently ignored

    build_host_await_response_vec parses expected args via the shared map builder and then drops them:

    .map(|(id, values)| (id, values.into_iter().map(|(_, output)| output).collect()))

    A run-to-completion YAML case with host_await_responses: [{ id: "translate", args: "WRONG", value: "hola" }] would still pass if the output matches, because the argument is never checked. Consider rejecting args for run-to-completion mode, documenting the limitation clearly, or adding instrumentation so RTC tests can validate payloads too.

  4. Low: HostAwait tests do not fail on unused expected responses

    The suspendable harness pops responses when host-await calls occur, but does not assert that all response queues are empty after completion. A test can provide two expected responses for lookup while the policy calls it once; the leftover response is ignored. This can hide missing calls or too-few HostAwait executions. After completion, assert all suspendable response queues were consumed; add an equivalent check or explicit limitation for RTC mode.

  5. Low: whitespace-padded registered names create unreachable registrations

    register_host_await_builtin rejects all-whitespace names via name.trim().is_empty(), but accepts names like " lookup" or "lookup ". Those get inserted into host_await_builtins, but normal Rego function-call paths will produce lookup, so the registration is effectively dead. Consider rejecting name != name.trim() or normalizing before insertion.

I dropped the general “registered names shadow user/standard builtins” concern because the current PR explicitly documents and tests that precedence as intended behavior. The FQN case above is narrower: the documented shadowing behavior does not appear to hold for fully-qualified call syntax.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread tests/rvm/rego/mod.rs
)
})?;
if let Some(expected) = expected_args {
let actual = process_value(argument)?;
@anakrish

anakrish commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

@zicklag Does this PR address your needs in regards to wrapping the host await builtin?

@zicklag

zicklag commented Jun 8, 2026

Copy link
Copy Markdown

Ooh, yeah, from the PR description this looks really useful. Especially being able to move some of the checks to compile time checks instead of runtime checks.

It is worth mentioning that maybe I'd sometimes want to add wrapper functions in the rego policy just to allow you to call functions with more than one arg, ( and just forward that to the single arg async builtin ) but that doesn't reduce the value of this, and passing in a single object as an arg might be best in many cases anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants