Add lookup-schema eval test cases by GiorgioUghini · Pull Request #121 · microsoft/skills-for-copilot-studio

GiorgioUghini · 2026-04-01T14:04:10Z

Summary

Adds the first eval test suite for the lookup-schema skill — a reference/query skill that looks up Copilot Studio YAML schema definitions using three commands: lookup, search, and resolve.

Because this skill only writes to stdout (it creates no YAML files), all checks use `stdout_contains`, `stdout_not_contains`, and `exit_code`.
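To make the three check types concrete, here is a minimal Python sketch of how a harness might evaluate them against a captured command result. The dict shape (`"type"`/`"value"` keys), the `CommandResult` class, and the case-sensitive substring matching are all assumptions for illustration. This is not the actual `evaluate.py` logic.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical container for what a skill run produced."""
    stdout: str
    exit_code: int


def evaluate_check(check: dict, result: CommandResult) -> bool:
    """Return True if `result` satisfies a single check.

    Assumed (hypothetical) check format: {"type": <check type>, "value": <expected>}.
    String checks are plain case-sensitive substring tests.
    """
    check_type = check["type"]
    expected = check["value"]
    if check_type == "stdout_contains":
        return expected in result.stdout
    if check_type == "stdout_not_contains":
        return expected not in result.stdout
    if check_type == "exit_code":
        return result.exit_code == expected
    raise ValueError(f"Unsupported check type: {check_type!r}")


def evaluate_all(checks: list, result: CommandResult) -> bool:
    """A test case passes only if every one of its checks passes."""
    return all(evaluate_check(c, result) for c in checks)
```

A negative case such as test 6 would pair one `stdout_contains` check for "not found" with several `stdout_not_contains` checks for crash markers.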

Test Cases

| # | Name | Command tested | Description |
|---|------|----------------|-------------|
| 1 | Lookup known definition — SendActivity | `lookup` | Looks up a common, well-known kind. Verifies the response includes `SendActivity`, `text`, and `kind`. |
| 2 | Lookup trigger kind — OnRecognizedIntent | `lookup` | Looks up a trigger definition. Verifies the response explains the `intent` and `trigger` concepts. |
| 3 | Lookup top-level kind — AdaptiveDialog | `lookup` | Looks up the root topic kind. Verifies that `beginDialog` and `trigger` are described. |
| 4 | Search for model-related definitions | `search` | Searches for model-related schema elements. Checks for specific terms such as `modelDescription` and `properties` rather than just the generic word "model". |
| 5 | Resolve $ref references in a definition | `resolve` | Resolves `QuestionWithOptions` together with its `$ref` sub-definitions. Verifies that the fully resolved output includes `properties`. |
| 6 | Lookup non-existent definition — graceful fallback | `lookup` (negative) | Looks up `FakeNonExistentKind`. Verifies the skill reports "not found" and that the output contains no crash markers (`ENOENT`, `stack trace`, `FATAL`). |

Coverage

✅ All 3 skill commands covered (lookup, search, resolve)
✅ Positive and negative test cases
✅ Uses basic-agent fixture (consistent with other eval suites)

Add 6 eval test cases for the lookup-schema skill covering:

- Lookup of known definitions (SendActivity, OnRecognizedIntent, AdaptiveDialog)
- Search command for model-related definitions
- Resolve command for reference resolution
- Negative test for graceful handling of non-existent definitions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add explicit UTF-8 encoding and error replacement to subprocess.run to prevent cp1252 codec failures on Windows. Also guard against None stdout. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
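The kind of encoding fix this commit describes can be sketched as follows. When `subprocess.run` is asked for text output without an explicit `encoding`, it decodes with the locale's preferred encoding, which is cp1252 on many Windows systems, so non-ASCII schema text can raise `UnicodeDecodeError`. The helper name `run_skill_command` is hypothetical; this is a minimal sketch of the technique, not the code from the commit.

```python
import subprocess
import sys


def run_skill_command(args: list) -> tuple:
    """Run a command and return (stdout, exit_code), decoding output as UTF-8.

    encoding="utf-8" overrides the locale default (e.g. cp1252 on Windows);
    errors="replace" substitutes U+FFFD for undecodable bytes instead of
    raising UnicodeDecodeError.
    """
    proc = subprocess.run(
        args,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    # Defensive guard: stdout is None whenever output is not captured
    # (e.g. if capture_output is later removed); normalise it to "".
    stdout = proc.stdout if proc.stdout is not None else ""
    return stdout, proc.returncode


if __name__ == "__main__":
    # The child writes raw bytes, including one invalid UTF-8 byte (0xff).
    out, code = run_skill_command(
        [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'ok \\xff')"]
    )
    # ascii() keeps the printed text ASCII-only, so printing cannot itself
    # hit a console encoding error.
    print(ascii(out), code)  # 'ok \ufffd' 0
```

Using `errors="replace"` trades exact fidelity for robustness: a `stdout_contains` check can still match the surrounding ASCII text even if a stray byte is mangled.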

adilei · 2026-04-05T07:05:59Z

Hey @GiorgioUghini — the evals refactor in #130 changed the eval structure, so this will need a small update before merging:

- Move `evals/skills/lookup-schema.json` → `evals/scenarios/lookup-schema.json`.
- Rename the top-level keys from `skill`/`evals` to `scenario_name`/`evals` (see `evals/scenarios/agent-settings.json` for reference).
- Rewrite prompts as natural language. For example, instead of "Use the lookup-schema skill to look up SendActivity", use something like "What properties does the SendActivity kind have in the Copilot Studio YAML schema?"
- Add routing checks: `skill_invoked: "copilot-studio:lookup-schema"`, and optionally `agent_invoked` if the request should route through a sub-agent.
- Drop any changes to `evaluate.py`; the harness was rewritten in #130 ("Refactor evals: scenario-based testing instead of skill isolation").
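The structural part of this migration could be scripted. Below is a minimal Python sketch, assuming each eval is a dict with a `checks` list and that a routing check is written as `{"type": "skill_invoked", "value": ...}`; both shapes are hypothetical, so compare against `evals/scenarios/agent-settings.json` before relying on it. Rewriting prompts as natural language is left manual, since it needs human judgment.

```python
import copy
import json


def migrate_to_scenario(old: dict, skill_id: str) -> dict:
    """Convert a legacy {"skill", "evals"} file into the {"scenario_name", "evals"} layout.

    Hypothetical assumptions: each eval holds a "checks" list, and a routing
    check looks like {"type": "skill_invoked", "value": <skill id>}.
    The input dict is left unmodified.
    """
    evals = copy.deepcopy(old["evals"])
    routing_check = {"type": "skill_invoked", "value": skill_id}
    for case in evals:
        checks = case.setdefault("checks", [])
        # Add the routing check once; existing stdout/exit_code checks are kept.
        if routing_check not in checks:
            checks.append(dict(routing_check))
    return {"scenario_name": old["skill"], "evals": evals}


if __name__ == "__main__":
    legacy = {
        "skill": "lookup-schema",
        "evals": [
            {
                "prompt": "What properties does the SendActivity kind have "
                          "in the Copilot Studio YAML schema?",
                "checks": [{"type": "stdout_contains", "value": "SendActivity"}],
            }
        ],
    }
    print(json.dumps(migrate_to_scenario(legacy, "copilot-studio:lookup-schema"), indent=2))
```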

The actual check types you're using (stdout_contains, stdout_not_contains, exit_code) are still supported, so the test cases themselves are fine — it's mostly a structural move.

GiorgioUghini and others added 2 commits April 1, 2026 15:03

Fix Windows encoding error in evaluate.py (e4fa0e3)

ChrisGarty added the `type/infra` label (Evals, hooks, CI, build, scripts) on Apr 7, 2026.

GiorgioUghini wants to merge 2 commits into `main` from `giorgioughini/lookup-schema-evals`.
