
feat(import): add evaluator and online eval config import subcommands #780

Merged
jesseturner21 merged 18 commits into main from feat/import-evaluator
Apr 7, 2026


@jesseturner21 (Contributor) commented Apr 7, 2026

Description

Adds the `agentcore import evaluator` and `agentcore import online-eval` commands, which import existing AWS evaluators (LLM-as-a-Judge and code-based) and online evaluation configs into CLI projects. Includes TUI import wizard support (type selection screen, ARN input, progress tracking).

Extracts a generic import orchestrator (executeResourceImport) using a descriptor pattern, reducing ~1,400 lines of duplicated orchestration across import handlers to ~600 lines. Each resource type provides a thin descriptor declaring its specific behavior (AWS APIs, CFN types, spec conversion, hooks).

New features

| File | Change |
| --- | --- |
| `import-evaluator.ts` | Evaluator import handler with `toEvaluatorSpec` mapping and the evaluator descriptor |
| `import-online-eval.ts` | Online eval import handler with agent/evaluator reference resolution and service name parsing |
| `agentcore-control.ts` | Enhance `getEvaluator` to extract the full config; add `listAllEvaluators`, `listAllOnlineEvaluationConfigs`, `getOnlineEvaluationConfig` |
| TUI screens (4 files) | Add evaluator and online eval config to the import flow, ARN validation, progress dispatch |
| `import-evaluator.test.ts` | 17 tests for spec conversion, template lookup, error cases |
| `import-online-eval.test.ts` | 19 tests for agent name extraction, spec conversion, template lookup |

Refactoring — generic import orchestrator

| File | Change |
| --- | --- |
| `resource-import.ts` | New `executeResourceImport<TDetail, TSummary>()` generic orchestrator that owns the full 10-step import sequence |
| `types.ts` | Add the `ResourceImportDescriptor` interface and `BeforeWriteContext` hook type |
| `constants.ts` | Add shared `NAME_REGEX` and ANSI constants (previously copy-pasted in each handler) |
| `import-memory.ts` | Refactored to a thin descriptor + wrapper (312 → ~100 lines) |
| `import-evaluator.ts` | Refactored to a thin descriptor + wrapper (309 → ~85 lines) |
| `import-online-eval.ts` | Refactored to a descriptor factory with a closure for `beforeConfigWrite` state sharing (~170 lines) |
| `import-runtime.ts` | Refactored to a descriptor factory with `beforeConfigWrite` + `rollbackExtra` hooks (~130 lines) |
| `command.ts`, `import-utils.ts` | Replace local ANSI constants with the shared import |
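
For readers new to the pattern, here is a heavily condensed sketch of how a descriptor and a generic orchestrator can split the work. It is an illustration under stated assumptions, not the PR's code: the real `ResourceImportDescriptor` and `executeResourceImport()` run a 10-step sequence with AWS and CloudFormation calls, while `ImportDescriptor`, `executeImport`, `fetchDetail`, `toSpec`, `collectionKey`, and `deployCfnImport` below are hypothetical. Only the hook names `beforeConfigWrite` and `rollbackExtra` come from the table above.

```typescript
// Hypothetical, simplified descriptor: each resource type declares only what differs.
interface ImportDescriptor<TDetail, TSpec extends { name: string }> {
  resourceType: string;
  cfnType: string; // CloudFormation resource type used for the CDK import
  collectionKey: string; // where specs of this type live in the project config
  fetchDetail(id: string): TDetail;
  toSpec(detail: TDetail, localName: string): TSpec;
  beforeConfigWrite?(spec: TSpec): void;
  rollbackExtra?(): void;
}

type ProjectConfig = Record<string, unknown[]>;

// Hypothetical generic orchestrator: the shared steps live here exactly once.
function executeImport<TDetail, TSpec extends { name: string }>(
  desc: ImportDescriptor<TDetail, TSpec>,
  config: ProjectConfig,
  id: string,
  localName: string,
  deployCfnImport: (cfnType: string, spec: TSpec) => void,
): TSpec {
  const existing = (config[desc.collectionKey] ?? []) as TSpec[];
  // Duplicate detection is shared by every resource type.
  if (existing.some((s) => s.name === localName)) {
    throw new Error(`${desc.resourceType} "${localName}" already exists in the project`);
  }
  const spec = desc.toSpec(desc.fetchDetail(id), localName);
  desc.beforeConfigWrite?.(spec);
  config[desc.collectionKey] = [...existing, spec];
  try {
    deployCfnImport(desc.cfnType, spec);
  } catch (err) {
    // Restore the previous config, then let the descriptor undo its own extras.
    config[desc.collectionKey] = existing;
    desc.rollbackExtra?.();
    throw err;
  }
  return spec;
}
```

Under this split, supporting another resource type means writing one more descriptor rather than another copy of the sequence.
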

Related Issue

Documentation PR

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Additional E2E testing:

  • E2E tested all 4 import commands against real AWS (us-east-1):
    • import evaluator — LLM-as-a-Judge with rating scale, tags
    • import memory — 3 strategies (semantic, summarization, user_preference), tags
    • import runtime — CodeZip Python 3.12, lifecycle config, execution role, source copy
    • import online-eval — agent reference resolved, evaluator local name resolved via deployed state, sampling 50%, enableOnCreate
  • agentcore status shows all imported resources with correct state
  • Rollback works correctly (config restored after CFN failure)
  • Duplicate detection works for all resource types

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add `agentcore import evaluator` to import existing AWS evaluators into
CLI projects. Refactor import types and utilities for extensibility so
future resource types require minimal new code.

Changes:
- Add import-evaluator.ts handler with toEvaluatorSpec mapping (LLM-as-a-Judge
  and code-based evaluators), duplicate detection, and CDK import pipeline
- Enhance getEvaluator API wrapper to extract full evaluatorConfig (model,
  instructions, ratingScale) and tags from SDK tagged unions
- Add listAllEvaluators pagination helper filtering out built-in evaluators
- Widen ImportableResourceType union and shared utilities for evaluator support
- Add evaluator to TUI import flow (select, ARN input, progress screens)
- Add 17 unit tests covering spec conversion, template lookup, and error cases
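
The tagged-union extraction above can be pictured with a small sketch. AWS SDK for JavaScript v3 models unions as objects in which one member is set (plus an `$unknown` escape hatch), so a mapper checks each member in turn. The type and field names here (`EvaluatorConfigUnion`, `llmAsAJudge`, `codeBased`, `lambdaArn`, `EvaluatorSpecSketch`) are assumptions for illustration, not the SDK's or the CLI's actual definitions.

```typescript
// Hypothetical shapes; the real SDK and CLI types differ.
interface LlmJudgeConfig {
  modelId: string;
  instructions: string;
  ratingScale?: string[];
}

interface CodeBasedConfig {
  lambdaArn: string;
}

// SDK-style tagged union: at most one member is populated.
type EvaluatorConfigUnion =
  | { llmAsAJudge: LlmJudgeConfig; codeBased?: undefined }
  | { codeBased: CodeBasedConfig; llmAsAJudge?: undefined };

type EvaluatorSpecSketch =
  | { name: string; type: "llm-as-a-judge"; modelId: string; instructions: string; ratingScale: string[] }
  | { name: string; type: "code-based"; lambdaArn: string };

function toEvaluatorSpecSketch(name: string, config: EvaluatorConfigUnion): EvaluatorSpecSketch {
  if (config.llmAsAJudge) {
    const { modelId, instructions, ratingScale } = config.llmAsAJudge;
    return { name, type: "llm-as-a-judge", modelId, instructions, ratingScale: ratingScale ?? [] };
  }
  if (config.codeBased) {
    return { name, type: "code-based", lambdaArn: config.codeBased.lambdaArn };
  }
  // Covers unknown or unsupported members the service may return.
  throw new Error(`Evaluator "${name}" has an unsupported config type`);
}
```
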

Tested end-to-end against real AWS evaluator (bugbash_eval_1775226567-zrDxm7Gpcw)
with verified field mapping for all config fields, tags, and deployed state.

The TUI import wizard hardcoded importType as 'memory' for all non-runtime
resources, causing evaluator imports to fail with "ARN resource type
evaluator does not match expected type memory". Use flow.resourceType
instead so the correct handler is dispatched.
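
A stripped-down sketch of the bug and the fix. `ImportFlow`, `importTypeBefore`, `importTypeAfter`, and `checkArnType` are hypothetical names; only `flow.resourceType` and the error wording come from the commit message.

```typescript
// Hypothetical, simplified shapes; the real TUI flow state differs.
type ImportableResourceType = "runtime" | "memory" | "evaluator" | "online-eval";

interface ImportFlow {
  resourceType: ImportableResourceType;
  arn: string;
}

// Before the fix: anything that was not a runtime was dispatched as 'memory'.
function importTypeBefore(flow: ImportFlow): ImportableResourceType {
  return flow.resourceType === "runtime" ? "runtime" : "memory";
}

// After the fix: dispatch on the type the user actually selected.
function importTypeAfter(flow: ImportFlow): ImportableResourceType {
  return flow.resourceType;
}

// Illustrative version of the ARN type check that surfaced the bug.
function checkArnType(arnResourceType: string, expected: ImportableResourceType): void {
  if (arnResourceType !== expected) {
    throw new Error(`ARN resource type ${arnResourceType} does not match expected type ${expected}`);
  }
}
```
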
Add `agentcore import online-eval` to import existing online evaluation
configs from AWS into CLI-managed projects. Follows the same pattern as
runtime, memory, and evaluator imports.

The command extracts the agent reference from the config's service names
(pattern: {agentName}.DEFAULT), maps evaluator IDs to local names or
ARN fallbacks, and runs the full CDK import pipeline.

Also removes incorrect project-prefix stripping from evaluator and
runtime imports — imported resources come from outside the project
and won't have the project prefix.

Constraint: Agent must exist in project runtimes[] before import (schema enforces cross-reference)
Constraint: Evaluators not in project fall back to ARN format to bypass schema validation
Rejected: Loose agent validation | schema writeProjectSpec() enforces runtimes[] cross-reference
Confidence: high
Scope-risk: moderate

Add 'Online Eval Config' option to the interactive import flow so users
can import online evaluation configs via the TUI, not just the CLI.

Follows the same ARN-only pattern as evaluator and memory imports:
select type → enter ARN → import progress → success/error.

Screenshots captured from the TUI import flow showing:
- Import type selection menu with Online Eval Config option
- ARN input screen for online eval config
- ARN input with a real config ARN filled in

… pattern

Reduce ~1,400 lines of duplicated orchestration across four import handlers
(runtime, memory, evaluator, online-eval) to ~600 lines by extracting shared
logic into executeResourceImport(). Each resource type now provides a thin
descriptor declaring its specific behavior.

Constraint: Public handleImport* function signatures unchanged (TUI depends on them)
Constraint: Factory functions needed for runtime/online-eval to share mutable state between hooks
Rejected: Strategy class hierarchy | descriptor objects are simpler and more composable
Confidence: high
Scope-risk: moderate
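
Why a factory rather than a plain descriptor object: state written in one hook (say, `beforeConfigWrite`) must be visible to a later hook (`rollbackExtra`) but must not leak between imports. A minimal sketch, assuming a runtime import that copies source code and deletes the copy on rollback; `RuntimeImportHooks`, `createRuntimeHooks`, `copySource`, and `removeDir` are hypothetical.

```typescript
// Hypothetical hook pair, modeled on beforeConfigWrite / rollbackExtra.
interface RuntimeImportHooks {
  beforeConfigWrite(localName: string): void;
  rollbackExtra(): void;
}

// Each call returns fresh hooks; `copiedDir` is private to this one import.
function createRuntimeHooks(
  copySource: (localName: string) => string,
  removeDir: (dir: string) => void,
): RuntimeImportHooks {
  let copiedDir: string | undefined; // set by beforeConfigWrite, read by rollbackExtra
  return {
    beforeConfigWrite(localName) {
      copiedDir = copySource(localName);
    },
    rollbackExtra() {
      // Nothing to undo if the failure happened before the copy.
      if (copiedDir !== undefined) {
        removeDir(copiedDir);
        copiedDir = undefined;
      }
    },
  };
}
```

A module-level variable would carry `copiedDir` from one import into the next; the factory gives each import its own copy.
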
…-control

Deduplicates identical pagination loops across 4 listAll* functions and
identical tag-fetching try/catch blocks across 3 getDetail functions.
Also adds optional client param to listEvaluators and
listOnlineEvaluationConfigs for connection reuse during pagination.

Addresses deferred review feedback from PR #763.

Constraint: evaluator listAll still filters out Builtin.* entries
Confidence: high
Scope-risk: narrow
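
The deduplicated pagination loop can be sketched as one generic helper that keeps requesting pages until no `nextToken` comes back. The `Page` shape, the `listAll` and `listAllCustomEvaluators` names, and the assumption that built-in entries are recognizable by an `evaluatorId` starting with `Builtin.` are illustrative; the actual `listAll*` helpers wrap specific SDK commands.

```typescript
// Hypothetical page shape: AWS list APIs typically return items plus an optional nextToken.
interface Page<T> {
  items: T[];
  nextToken?: string;
}

// One loop shared by every listAll* function.
async function listAll<T>(fetchPage: (nextToken?: string) => Promise<Page<T>>): Promise<T[]> {
  const all: T[] = [];
  let nextToken: string | undefined;
  do {
    const page = await fetchPage(nextToken);
    all.push(...page.items);
    nextToken = page.nextToken;
  } while (nextToken);
  return all;
}

interface EvaluatorSummary {
  evaluatorId: string;
}

// Built-in evaluators cannot be imported, so they are filtered out after pagination.
async function listAllCustomEvaluators(
  fetchPage: (nextToken?: string) => Promise<Page<EvaluatorSummary>>,
): Promise<EvaluatorSummary[]> {
  const all = await listAll(fetchPage);
  return all.filter((e) => !e.evaluatorId.startsWith("Builtin."));
}
```
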
…rted evaluators

resolveEvaluatorReferences used string-contains matching (evaluatorId.includes(localName))
which only works when the evaluator was deployed by the same project. Imported evaluators
with renamed local names never matched, falling back to raw ARNs in the config.

Now reads deployed-state.json to build an evaluatorId → localName reverse map and checks
it first, before the string-contains heuristic.

Constraint: Deployed state may not exist yet (first import) — .catch() handles gracefully
Rejected: Passing deployed state through descriptor interface | only online-eval needs this
Confidence: high
Scope-risk: narrow
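
A sketch of the resolution order described above. The `DeployedStateSketch` shape and both function names are hypothetical; the PR reads the real `deployed-state.json` and treats a missing file as empty via `.catch()`, which this sketch models by accepting `undefined`. Note that a later commit in this PR switches online eval configs to ARN-only evaluator references.

```typescript
// Hypothetical deployed-state shape; the real deployed-state.json layout may differ.
interface DeployedStateSketch {
  evaluators?: Record<string, { evaluatorId: string }>;
}

// evaluatorId -> localName, so imported evaluators with custom local names resolve exactly.
function buildEvaluatorReverseMap(state: DeployedStateSketch | undefined): Map<string, string> {
  const reverse = new Map<string, string>();
  if (!state?.evaluators) return reverse; // first import: no deployed state yet
  for (const [localName, entry] of Object.entries(state.evaluators)) {
    reverse.set(entry.evaluatorId, localName);
  }
  return reverse;
}

// Exact match from deployed state first, then the old string-contains heuristic,
// then the ARN as a last resort.
function resolveEvaluatorReference(
  evaluatorId: string,
  evaluatorArn: string,
  localNames: string[],
  reverse: Map<string, string>,
): string {
  const exact = reverse.get(evaluatorId);
  if (exact !== undefined) return exact;
  return localNames.find((name) => evaluatorId.includes(name)) ?? evaluatorArn;
}
```
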
…ring import

Evaluators referenced by ENABLED online eval configs are locked by the service
(lockedForModification=true), causing CFN import to fail when it tries to apply
stack-level tags. Now the evaluator import detects the lock, temporarily disables
referencing online eval configs, performs the import, then re-enables them.

Constraint: Re-enable runs in finally block so configs are restored on both success and failure
Constraint: Only disables configs that actually reference this specific evaluator
Rejected: Refuse import with manual guidance | user can't pause configs not yet in project
Confidence: high
Scope-risk: moderate
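
A sketch of the pause, import, resume sequence with the re-enable in `finally`, as the constraints above describe. The `OnlineEvalConfigView` shape and the `setEnabled`/`runImport` callbacks are hypothetical stand-ins for the real control-plane calls. The next commit in this PR replaces this approach by blocking the import instead.

```typescript
// Hypothetical minimal view of an online eval config.
interface OnlineEvalConfigView {
  id: string;
  enabled: boolean;
  evaluatorIds: string[];
}

// Pause only the enabled configs that reference this evaluator, run the import,
// and re-enable whatever was paused, whether the import succeeded or failed.
async function importWithReferencingConfigsPaused<T>(
  evaluatorId: string,
  configs: OnlineEvalConfigView[],
  setEnabled: (configId: string, enabled: boolean) => Promise<void>,
  runImport: () => Promise<T>,
): Promise<T> {
  const paused: string[] = [];
  try {
    for (const config of configs) {
      if (config.enabled && config.evaluatorIds.includes(evaluatorId)) {
        await setEnabled(config.id, false);
        paused.push(config.id); // record only after the disable succeeded
      }
    }
    return await runImport();
  } finally {
    for (const id of paused) {
      await setEnabled(id, true);
    }
  }
}
```
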
…se ARN-only references

Evaluators locked by an online eval config cannot be CFN-imported because
CloudFormation triggers a post-import TagResource call that the resource
handler rejects. Instead of stripping tags from the import template, block
the import with a clear error and a suggestion to use import online-eval.

Online eval config import now always references evaluators by ARN rather
than resolving to local names, since the evaluators cannot be imported
into the project alongside the config.

Constraint: CFN IMPORT triggers TagResource which fails on locked evaluators
Rejected: Strip Tags from import template | still fails on some resource types
Confidence: high
Scope-risk: narrow
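
A minimal sketch of the guard, assuming the evaluator detail exposes the `lockedForModification` flag named in the previous commit. `EvaluatorLockView`, `assertEvaluatorImportable`, and the error wording are hypothetical.

```typescript
// Hypothetical subset of the evaluator detail returned by the control plane.
interface EvaluatorLockView {
  evaluatorId: string;
  lockedForModification?: boolean;
}

// Refuse up front instead of letting CloudFormation fail on the post-import TagResource call.
function assertEvaluatorImportable(detail: EvaluatorLockView): void {
  if (detail.lockedForModification) {
    throw new Error(
      `Evaluator ${detail.evaluatorId} is locked by an online evaluation config and cannot be imported. ` +
        "Import the config with `agentcore import online-eval` instead; it references the evaluator by ARN.",
    );
  }
}
```
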
…ime has custom name

extractAgentName() derives the AWS runtime name from the OEC service
name pattern, but this fails to match when the runtime was imported
with --name since the project spec stores the local name. Now falls
back to listing runtimes to find the runtime ID, then looks up the
local name in deployed-state.json.

…lving agent

CDK constructs set the OEC service name as "{projectName}_{agentName}.DEFAULT".
extractAgentName() strips ".DEFAULT" but not the project prefix, so the
lookup fails against local runtime names. Now strips the prefix as a fast
path before falling back to the deployed-state API lookup.
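
Putting the two service-name fixes above together, a sketch of the lookup order. The name format comes from the commit text; `extractAgentNameSketch`, `resolveLocalRuntimeName`, and the `awsNameToLocalName` map (which collapses the list-runtimes plus deployed-state lookup into one step) are hypothetical.

```typescript
const DEFAULT_SUFFIX = ".DEFAULT";

// Derive the agent name from an OEC service name, assumed to look like
// "{projectName}_{agentName}.DEFAULT" when created by this project's CDK constructs.
function extractAgentNameSketch(serviceName: string, projectName: string): string {
  const base = serviceName.endsWith(DEFAULT_SUFFIX)
    ? serviceName.slice(0, -DEFAULT_SUFFIX.length)
    : serviceName;
  const prefix = `${projectName}_`;
  // Fast path: strip the project prefix so the result matches local runtime names.
  return base.startsWith(prefix) ? base.slice(prefix.length) : base;
}

// Resolve to a local runtime name: direct match first, then a lookup standing in for
// "list runtimes, find the runtime ID, map it via deployed-state.json".
function resolveLocalRuntimeName(
  serviceName: string,
  projectName: string,
  localRuntimeNames: string[],
  awsNameToLocalName: Map<string, string>,
): string | undefined {
  const candidate = extractAgentNameSketch(serviceName, projectName);
  if (localRuntimeNames.includes(candidate)) return candidate;
  return awsNameToLocalName.get(candidate);
}
```
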
getEvaluator() now catches ResourceNotFoundException and
ValidationException from the SDK and rethrows a clear message
instead of exposing the raw regex validation error.
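
A sketch of the error translation, assuming AWS SDK for JavaScript v3 semantics, where a service exception's `name` is the modeled exception name (for example `ResourceNotFoundException`). `getEvaluatorWithFriendlyErrors` and the message text are hypothetical.

```typescript
// Wrap an SDK call and replace low-level "not found"/validation errors with a clear message.
async function getEvaluatorWithFriendlyErrors<T>(evaluatorId: string, call: () => Promise<T>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    const name = err instanceof Error ? err.name : undefined;
    if (name === "ResourceNotFoundException" || name === "ValidationException") {
      throw new Error(`Evaluator "${evaluatorId}" not found. Check the evaluator ID or ARN and the target region.`);
    }
    // Anything else (throttling, auth, network) is rethrown unchanged.
    throw err;
  }
}
```
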
import online-eval used a naive regex to extract the config ID from the
ARN, skipping resource type, region, and account validation. Now uses
parseAndValidateArn like all other import commands. Added an ARN resource
type mapping to handle the online-eval vs online-evaluation-config
mismatch between ImportableResourceType and the ARN format.
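
A sketch of what ARN validation with a type-name mapping can look like. `parseAndValidateArnSketch` is a hypothetical stand-in for `parseAndValidateArn`, and only the `online-eval` to `online-evaluation-config` mismatch is taken from the commit; the other ARN segments are assumed to match the CLI type names.

```typescript
// CLI import types; "online-eval" is the CLI name, not the ARN segment.
type ImportableResourceType = "runtime" | "memory" | "evaluator" | "online-eval";

// Assumed ARN resource-type segments; only the online-eval mismatch comes from the PR.
const ARN_RESOURCE_TYPE: Record<ImportableResourceType, string> = {
  runtime: "runtime",
  memory: "memory",
  evaluator: "evaluator",
  "online-eval": "online-evaluation-config",
};

interface ParsedArn {
  region: string;
  account: string;
  resourceId: string;
}

// Checks format, resource type, region, and account instead of regex-grabbing the ID.
function parseAndValidateArnSketch(
  arn: string,
  expected: ImportableResourceType,
  region: string,
  account: string,
): ParsedArn {
  // arn:<partition>:<service>:<region>:<account>:<resourceType>/<resourceId>
  const parts = arn.split(":");
  if (parts.length !== 6 || parts[0] !== "arn") {
    throw new Error(`Not a valid ARN: ${arn}`);
  }
  const [, , , arnRegion, arnAccount, resource] = parts;
  const slash = resource.indexOf("/");
  if (slash <= 0 || slash === resource.length - 1) {
    throw new Error(`ARN is missing a resource type or ID: ${arn}`);
  }
  const resourceType = resource.slice(0, slash);
  const expectedType = ARN_RESOURCE_TYPE[expected];
  if (resourceType !== expectedType) {
    throw new Error(`ARN resource type ${resourceType} does not match expected type ${expectedType}`);
  }
  if (arnRegion !== region) throw new Error(`ARN region ${arnRegion} does not match ${region}`);
  if (arnAccount !== account) throw new Error(`ARN account ${arnAccount} does not match ${account}`);
  return { region: arnRegion, account: arnAccount, resourceId: resource.slice(slash + 1) };
}
```
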
@jesseturner21 jesseturner21 requested a review from a team April 7, 2026 12:57
@github-actions github-actions bot added the size/xl PR size: XL label Apr 7, 2026

github-actions bot commented Apr 7, 2026

Package Tarball

aws-agentcore-0.6.0.tgz

How to install

npm install https://github.com/aws/agentcore-cli/releases/download/pr-780-tarball/aws-agentcore-0.6.0.tgz

github-actions bot commented Apr 7, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 45.45% | 6802 / 14964 |
| 🔵 | Statements | 45.04% | 7224 / 16037 |
| 🔵 | Functions | 44.03% | 1228 / 2789 |
| 🔵 | Branches | 45.52% | 4535 / 9962 |

Generated in workflow #1638 for commit 426d782 by the Vitest Coverage Report Action

Hweinstock previously approved these changes Apr 7, 2026

@Hweinstock (Contributor) left a comment

Nice! I like the refactor, and I think it will make it easier to add more import logic. A few nit comments, and one edge case that seems unlikely.

- Add `red` to ANSI constants, replace inline escape codes
- Type GetEvaluatorResult.level as EvaluationLevel at boundary
- Combine ARN_RESOURCE_TYPE_MAP, collectionKeyMap, idFieldMap into
  single RESOURCE_TYPE_CONFIG to prevent drift
- Export IMPORTABLE_RESOURCES as const array, derive type from it,
  replace || chains with .includes()
- Fix samplingPercentage === 0 false positive (use == null)
- Document closure state sequencing contract on descriptor hooks
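
A sketch of three of the review fixes above, under assumptions: the array lists the four import types in this PR, but the `ResourceTypeConfig` fields and values and the `isImportableResource`/`resolveSampling` helpers are hypothetical. It also assumes the original sampling check was a falsy test, which is the usual way `0` ends up treated as missing.

```typescript
// Single source of truth; the union type is derived from the array.
const IMPORTABLE_RESOURCES = ["runtime", "memory", "evaluator", "online-eval"] as const;
type ImportableResourceType = (typeof IMPORTABLE_RESOURCES)[number];

// Type guard that replaces `t === "runtime" || t === "memory" || ...` chains.
function isImportableResource(value: string): value is ImportableResourceType {
  return (IMPORTABLE_RESOURCES as readonly string[]).includes(value);
}

interface ResourceTypeConfig {
  arnType: string; // resource-type segment in the ARN
  collectionKey: string; // project config array holding this type
  idField: string; // ID property in deployed state
}

// Record<ImportableResourceType, ...> fails to compile if a type is added without an entry.
// Values are illustrative assumptions, not the CLI's actual keys.
const RESOURCE_TYPE_CONFIG: Record<ImportableResourceType, ResourceTypeConfig> = {
  runtime: { arnType: "runtime", collectionKey: "runtimes", idField: "runtimeId" },
  memory: { arnType: "memory", collectionKey: "memories", idField: "memoryId" },
  evaluator: { arnType: "evaluator", collectionKey: "evaluators", idField: "evaluatorId" },
  "online-eval": {
    arnType: "online-evaluation-config",
    collectionKey: "onlineEvalConfigs",
    idField: "onlineEvaluationConfigId",
  },
};

// `!samplingPercentage` would treat 0 as "not set"; `== null` matches only null and undefined.
function resolveSampling(samplingPercentage: number | null | undefined, fallback: number): number {
  return samplingPercentage == null ? fallback : samplingPercentage;
}
```
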
The test exercised a defensive fallback in toEvaluatorSpec for an empty
level string, but now that GetEvaluatorResult.level is typed as
EvaluationLevel, the boundary cast in getEvaluator prevents this case
from ever reaching toEvaluatorSpec.
@github-actions github-actions bot added size/xl PR size: XL and removed size/xl PR size: XL labels Apr 7, 2026

@Hweinstock (Contributor) left a comment

LGTM! Thanks for addressing nits.

@jesseturner21 jesseturner21 merged commit e266576 into main Apr 7, 2026
23 checks passed
@jesseturner21 jesseturner21 deleted the feat/import-evaluator branch April 7, 2026 16:09

Labels

size/xl PR size: XL


2 participants