feat: add interactive (multi-turn) evaluation support by joyyc · Pull Request #24 · mgechev/skillgrade

joyyc · 2026-06-17T12:52:53Z

Add multi-turn conversation evaluation, in which agents engage in iterative dialogue with simulated user inputs. Key components:

- InteractiveSession: orchestrates multi-turn conversation execution
- InputInjectorManager: manages input injection, pattern matching, and stop conditions
- ClaudeStreamAgent: persistent streaming agent for true multi-turn conversations via the stream-json protocol
- Output marker parser for agent signals ([NEEDS_INPUT:type], etc.)
- LLM grader support for multi-turn transcripts with context-aware hints
- Automatic switch from claude to claude-stream when interactive is enabled
- Error-path grading: attempts evaluation even when agent setup fails
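As a rough illustration of what the output marker parser has to do, here is a minimal TypeScript sketch that extracts bracketed signals such as [NEEDS_INPUT:type] from agent output. The OutputMarker type, the parseMarkers and needsInput helpers, and the marker grammar (an upper-case name with an optional :argument) are hypothetical assumptions, not code from this PR. The Chinese-language markers mentioned in the prompt are not covered because their exact form is not shown here.

```typescript
// Hypothetical shape of a parsed agent signal; the PR's real types may differ.
interface OutputMarker {
  kind: string; // e.g. "NEEDS_INPUT"
  arg?: string; // e.g. "confirmation" in [NEEDS_INPUT:confirmation]
  index: number; // character offset of the marker in the output
}

// Assumed grammar: [NAME] or [NAME:arg], NAME = upper-case letters/underscores.
const MARKER_RE = /\[([A-Z][A-Z_]*)(?::([^\]\s]+))?\]/g;

// Return every marker found in the agent output, in order of appearance.
function parseMarkers(output: string): OutputMarker[] {
  const markers: OutputMarker[] = [];
  for (const m of output.matchAll(MARKER_RE)) {
    markers.push({ kind: m[1], arg: m[2], index: m.index ?? 0 });
  }
  return markers;
}

// Return the first NEEDS_INPUT marker, or undefined if the agent did not ask for input.
function needsInput(output: string): OutputMarker | undefined {
  return parseMarkers(output).find((m) => m.kind === "NEEDS_INPUT");
}
```

A pattern this permissive also matches any upper-case bracketed token (for example [TODO]) in ordinary output, so a production parser would probably check kind against a list of known signals.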

Also includes an eval.yaml configuration reference, an interactive-demo example, and template updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
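ClaudeStreamAgent keeps one Claude Code process alive and talks to it through the stream-json protocol (--input-format stream-json --output-format stream-json), i.e. newline-delimited JSON on stdin and stdout. The TypeScript sketch below covers only the framing: encodeUserTurn writes one user turn per line, and TurnReader buffers stdout chunks until a result event closes the turn. The message shapes (a "user" message with role and text content, "assistant" events carrying text content blocks, a closing "result" event) are assumptions based on Claude Code's headless-mode documentation and should be checked against the CLI version in use; the helper names are hypothetical.

```typescript
// Assumed stream-json event shape; only the fields used below are modelled.
interface StreamEvent {
  type: string;
  subtype?: string;
  message?: { content?: Array<{ type: string; text?: string }> };
}

// Serialize one user turn as a single newline-terminated JSON line for stdin.
function encodeUserTurn(text: string): string {
  const msg = {
    type: "user",
    message: { role: "user", content: [{ type: "text", text }] },
  };
  return JSON.stringify(msg) + "\n";
}

// Buffers stdout chunks and returns the turn's assistant text once a "result"
// event arrives. Lines after that result stay buffered for the next push().
class TurnReader {
  private buffer = "";
  private texts: string[] = [];

  push(chunk: string): string | undefined {
    this.buffer += chunk;
    let nl: number;
    while ((nl = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, nl).trim();
      this.buffer = this.buffer.slice(nl + 1);
      if (line === "") continue; // tolerate blank lines
      const event = JSON.parse(line) as StreamEvent;
      if (event.type === "assistant") {
        for (const block of event.message?.content ?? []) {
          if (block.type === "text" && block.text) this.texts.push(block.text);
        }
      } else if (event.type === "result") {
        const turnText = this.texts.join("\n");
        this.texts = [];
        return turnText;
      }
      // Any other event type is ignored in this sketch.
    }
    return undefined; // turn still in progress
  }
}
```

In an agent, these helpers would wrap a child process started with the two stream-json flags: write encodeUserTurn(text) to its stdin, pass each stdout chunk to push, and treat a non-undefined return value as the end of the turn. Keeping the process alive between turns is presumably what distinguishes ClaudeStreamAgent from the one-shot claude agent.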

mgechev · 2026-06-22T16:59:27Z

@joyyc, would you share the prompt you used to generate this change? There are 20 modified files and 1.8k added lines, which makes the PR hard to review. I'd love to understand your feature request better and run the prompt in my trusted workflow.

joyyc · 2026-07-02T08:29:48Z

> @joyyc would you share the prompt you used to generate this change? There are 20 modifier changes and 1.8k lines added which makes the PR hard to review. I'd love to understand your feature request better and run the prompt in my trusted workflow.

To be candid, this change was arrived at through multiple rounds of conversation and test-driven iteration with Claude Code. The distilled prompt follows (the original was written in Chinese; the English translation below is provided for readability):

Add interactive multi-turn evaluation support to skillgrade. Implement a persistent multi-turn conversational agent (ClaudeStreamAgent) on top of Claude Code's stream-json protocol (--input-format stream-json --output-format stream-json).
Core requirements:

- An InteractiveSession that drives the multi-turn conversation loop and honors max_turns and timeout_per_turn;
- An InputInjectorManager that handles input injection (via triggers such as on_turn / on_output_contains) and termination conditions;
- Output marker parsing (e.g. [NEEDS_INPUT:type], including Chinese markers);
- A new interactive configuration section in eval.yaml;
- LLM grader support for scoring multi-turn conversations;
- A complete interactive-demo example;
- Type definitions and README documentation.
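The requirements above boil down to a bounded loop: send the task, read the agent's reply, check stop conditions, and ask the injectors for the next simulated user input. The TypeScript sketch below shows one way to wire that together. Only max_turns, timeout_per_turn, on_turn and on_output_contains come from the prompt; the InteractiveConfig and Injector shapes, the stop_when_output_contains field, the millisecond unit for timeout_per_turn, and the runSession, pickInput and withTimeout helpers are hypothetical.

```typescript
// Hypothetical types mirroring the eval.yaml `interactive` fields named in the prompt.
interface Injector {
  on_turn?: number; // supply the input for this (1-based) turn
  on_output_contains?: string; // fire when the previous reply contains this text
  input: string; // simulated user reply
}

interface InteractiveConfig {
  max_turns: number;
  timeout_per_turn: number; // assumed to be milliseconds
  injectors: Injector[];
  stop_when_output_contains?: string[]; // hypothetical stop condition
}

interface TurnAgent {
  send(input: string): Promise<string>;
}

interface Turn {
  input: string;
  output: string;
}

// Return the input of the first injector whose trigger matches, if any.
function pickInput(injectors: Injector[], turn: number, lastOutput: string): string | undefined {
  const hit = injectors.find(
    (inj) =>
      inj.on_turn === turn ||
      (inj.on_output_contains !== undefined && lastOutput.includes(inj.on_output_contains)),
  );
  return hit?.input;
}

// Reject if the agent does not answer within `ms`; the timer is always cleared.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`turn timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Drive the conversation: turn 1 is the task prompt, later turns come from injectors.
async function runSession(agent: TurnAgent, task: string, cfg: InteractiveConfig): Promise<Turn[]> {
  const transcript: Turn[] = [];
  let input = task;
  for (let turn = 1; turn <= cfg.max_turns; turn++) {
    const output = await withTimeout(agent.send(input), cfg.timeout_per_turn);
    transcript.push({ input, output });
    if (cfg.stop_when_output_contains?.some((s) => output.includes(s))) break;
    const next = pickInput(cfg.injectors, turn + 1, output);
    if (next === undefined) break; // no simulated reply: end the conversation
    input = next;
  }
  return transcript;
}
```

In this sketch an injector with on_turn: N supplies the input for turn N, the first matching injector wins, and the session ends early when no injector fires; the actual InputInjectorManager may order or combine triggers differently. The returned transcript is the kind of input a multi-turn LLM grader would score.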


Merge branch 'main' into upstream (468cc30)

joyyc wants to merge 2 commits into mgechev:main from joyyc:upstream