Feat/user evals by PClmnt · Pull Request #18418 · Budibase/budibase

PClmnt · 2026-03-30T13:11:57Z

Description

This PR adds the ability to run evaluations of multiple types against your agent instructions.

You can use the following methods:

Exact match
Contains
LLM as a Judge
Tool used

For each agent you can create a suite of evaluations, each containing multiple test cases. Each of these test cases can test for multiple conditions, using the methods listed above.
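As a rough illustration of how a test case with several conditions might be checked, here is a minimal sketch. The type names (`Reviewer`, `TestCase`, `AgentOutput`, `Judge`) and field names are assumptions made for this example and are not taken from the PR's code. The LLM judge is passed in as a function so the sketch runs without a model call, and the rule that a case passes only when all of its conditions pass is also an assumption.

```typescript
// Hypothetical shapes: the PR's real types are not shown in this thread.
type Reviewer =
  | { type: "exact_match"; expected: string }
  | { type: "contains"; text: string }
  | { type: "tool_used"; tool: string }
  | { type: "llm_judge"; criteria: string }

interface AgentOutput {
  text: string
  toolsCalled: string[]
}

interface TestCase {
  name: string
  input: string
  reviewers: Reviewer[]
}

// The judge is injected so the sketch stays runnable without an LLM.
type Judge = (criteria: string, outputText: string) => boolean

function runReviewer(
  reviewer: Reviewer,
  output: AgentOutput,
  judge: Judge
): boolean {
  switch (reviewer.type) {
    case "exact_match":
      return output.text === reviewer.expected
    case "contains":
      return output.text.indexOf(reviewer.text) !== -1
    case "tool_used":
      return output.toolsCalled.indexOf(reviewer.tool) !== -1
    case "llm_judge":
      return judge(reviewer.criteria, output.text)
  }
}

// Assumption: a case passes only when every one of its reviewers passes.
function casePasses(
  testCase: TestCase,
  output: AgentOutput,
  judge: Judge
): boolean {
  return testCase.reviewers.every(r => runReviewer(r, output, judge))
}
```

Modelling the methods as a discriminated union keeps each condition's configuration next to its type, so adding a new method means adding one union member and one `switch` branch.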

All suites are run against the instructions present in the configuration tab at the time of the run.

It's currently gated behind the AI_EVALS feature flag.

Screenshots

Launchcontrol

Adds the ability to run evaluations on your agent prompt and tools

Summary by cubic

Adds agent evaluations (Evals) so you can define test cases, run suites, and review results for agents. The feature is behind the AI_EVALS flag and includes new UI, API, SDK, and logs support.

New Features
- Builder UI to create/edit cases, run suites, and inspect results with per-case reviewers and snapshots; new Evals tab appears only when AI_EVALS is enabled.
- Reviewer types: exact match, contains text, tool used, and LLM judge.
- API and frontend-core client: GET/PUT /api/agent/:agentId/evals, POST /api/agent/:agentId/evals/run, with validation and AI_EVALS gating (403 when disabled).
- Server SDK for eval suites (CRUD), running suites, and result aggregation; new doc IDs and types for suites and runs.
- Logs classify “Eval” sessions separately; suite runs and latest case results are surfaced in both UI and API.

Migration
- Enable AI_EVALS to use the feature.
- Duration utils moved to @budibase/shared-core (Duration, DurationType); update imports from backend-core to @budibase/shared-core.
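
To make the endpoints listed under New Features concrete, the sketch below builds request descriptors for the three routes and classifies response statuses. Only the HTTP methods, the paths, and the 403 behaviour when AI_EVALS is disabled come from the summary. The function names, the `EvalRequest` shape, and the idea of sending the suite as a JSON body on PUT are assumptions for illustration, not the PR's frontend-core client.

```typescript
// Hypothetical request descriptor; the real client in the PR may differ.
interface EvalRequest {
  method: "GET" | "PUT" | "POST"
  path: string
  body?: string
}

function evalsPath(agentId: string): string {
  return `/api/agent/${encodeURIComponent(agentId)}/evals`
}

// GET /api/agent/:agentId/evals
function fetchEvals(agentId: string): EvalRequest {
  return { method: "GET", path: evalsPath(agentId) }
}

// PUT /api/agent/:agentId/evals
// Assumption: the suite is sent as a JSON body; its schema is not shown here.
function saveEvals(agentId: string, suite: unknown): EvalRequest {
  return {
    method: "PUT",
    path: evalsPath(agentId),
    body: JSON.stringify(suite),
  }
}

// POST /api/agent/:agentId/evals/run
function runEvals(agentId: string): EvalRequest {
  return { method: "POST", path: `${evalsPath(agentId)}/run` }
}

type StatusKind = "ok" | "forbidden_or_flag_disabled" | "error"

// The summary states the API answers 403 when AI_EVALS is disabled.
// A 403 can have other causes too, so the label stays deliberately broad.
function classifyStatus(status: number): StatusKind {
  if (status === 403) {
    return "forbidden_or_flag_disabled"
  }
  return status >= 200 && status < 300 ? "ok" : "error"
}
```

A caller could pass each descriptor to whatever HTTP layer it already uses, for example `fetch(req.path, { method: req.method, body: req.body })`, and branch on `classifyStatus(res.status)` to show a "feature not enabled" message instead of a generic error.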

Written for commit 32ce9b2. Summary will update on new commits.

PClmnt · 2026-03-30T13:17:30Z

packages/shared-core/src/duration.ts

This was just moved from another location to be in shared-core.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e86592388e

packages/builder/src/pages/builder/workspace/[application]/agent/[agentId]/evals.svelte

PClmnt added 5 commits March 25, 2026 15:40

evals

8164c21

Add LLM judge support to agent evals

dc2d27b

improved session handling of evluations:

9be3513

massive refactor and feature flag

5052569

Fix ordering bug

9aa5041

github-actions bot added firestorm Data/Infra/Revenue Team size/xl labels Mar 30, 2026

revert accidental deletion

b4638ca

clean up

3d3cc17

PClmnt marked this pull request as ready for review March 30, 2026 13:36

PClmnt requested a review from a team as a code owner March 30, 2026 13:36

PClmnt removed the request for review from a team March 30, 2026 13:36

Merge branch 'master' into feat/user-evals

e865923

PClmnt requested a review from adrinr March 30, 2026 13:36

chatgpt-codex-connector bot reviewed Mar 30, 2026

packages/builder/src/pages/builder/workspace/[application]/agent/[agentId]/evals.svelte

PClmnt added 2 commits March 30, 2026 15:45

fix eval crud spec mock initialization

e128e71

use real docIds in eval crud spec

32ce9b2

github-actions bot added the stale label Apr 6, 2026

PClmnt wants to merge 10 commits into master from feat/user-evals