Commit e6b631b

feat: add /strands test command for TUI testing via MCP harness

- Add tester mode to process-inputs.cjs (routes /strands test)
- Add task-tester.sop.md with TUI testing instructions
- Add tui-test-flows.md with 5 test flows
- Add Node.js setup + build steps for tester mode in workflow
- Wire TUI harness MCP server (stdio) into the Strands agent

1 parent d41e14b commit e6b631b

4 files changed: 165 additions and 5 deletions

.github/agent-sops/task-tester.sop.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# Task Tester SOP
+
+## Role
+
+You are a TUI Tester. Your goal is to verify the AgentCore CLI's interactive TUI behavior by driving it through
+predefined test flows using the TUI harness MCP tools. You post results as PR comments.
+
+You MUST NOT modify any code, create branches, or push commits. Your only output is test result comments.
+
+## Tools Available
+
+You have TUI harness MCP tools: `tui_launch`, `tui_send_keys`, `tui_action`, `tui_wait_for`, `tui_screenshot`,
+`tui_read_screen`, `tui_close`, `tui_list_sessions`.
+
+You also have `shell` for setup commands and GitHub tools for posting comments.
+
+## Steps
+
+### 1. Setup
+
+- Read the test spec file at `.github/agent-sops/tui-test-flows.md`
+- The CLI is installed globally as `agentcore`. Launch TUI sessions using `tui_launch` with `command: "agentcore"` and
+  the appropriate `args`.
+- For non-interactive commands (e.g., `--json` output), prefer `shell` over `tui_launch`.
+
+### 2. Run Test Flows
+
+For each flow in the test spec:
+
+1. Create any required setup (e.g., temp directories, minimal projects) using `shell`
+2. Use `tui_launch` to start the CLI with the specified arguments and `cwd`
+3. Follow the flow steps: use `tui_action` (preferred — combines send + wait + read in one call) or `tui_wait_for` +
+   `tui_send_keys` for multi-step interactions
+4. Verify each expectation against the screen content
+5. On **pass**: record the flow name as passed
+6. On **failure**: use `tui_screenshot` to capture the terminal state, record the flow name, expected behavior, actual
+   behavior, and the screenshot text
+7. Always `tui_close` the session when done, even on failure
+
+**Constraints:**
+
+- Use `timeoutMs: 10000` (10 seconds) minimum for all `tui_wait_for` and `tui_action` pattern waits
+- Use small terminal dimensions: `cols: 100, rows: 24`
+- If a wait times out, retry once before declaring failure
+- Use text format screenshots only (not SVG)
+- Keep terminal dimensions consistent across all flows
+
+### 3. Post Results
+
+Post a single summary comment on the PR with this format:
+
+```markdown
+## 🧪 TUI Test Results
+
+**X/Y flows passed**
+
+### ✅ Passed
+
+- Flow name 1
+- Flow name 2
+
+### ❌ Failed
+
+#### Flow name 3
+
+**Expected:** description of what should have happened **Actual:** description of what happened
+
+<details>
+<summary>Screenshot</summary>
+```
+
+(terminal screenshot here)
+
+```
+
+</details>
+```
+
+If all flows pass, omit the Failed section.
+
+## Forbidden Actions
+
+- You MUST NOT modify, create, or delete any source files
+- You MUST NOT run git add, git commit, or git push
+- You MUST NOT create or update branches
+- You MUST NOT approve or merge the pull request
+- You MUST NOT run deploy, invoke, or any command that creates AWS resources
+- Your ONLY output is test result comments on the pull request
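
The per-flow procedure in the SOP (launch, wait, retry once on timeout, capture a text screenshot on failure, always close) might look roughly like the sketch below. This is an illustration, not the harness's actual API. `callTool` is a hypothetical stand-in for whatever MCP client invokes the tools, and the `sessionId`, `pattern`, `found`, `text`, and `format` fields are assumed shapes. Only the tool names and the `command`, `args`, `cwd`, `cols`, `rows`, and `timeoutMs` parameters come from the SOP.

```javascript
// Sketch only: `callTool(name, args)` is a hypothetical async MCP client call.
// Result shapes ({ sessionId }, { found }, { text }) are assumed for illustration.
const TERMINAL = { cols: 100, rows: 24 }; // same dimensions for every flow
const MIN_TIMEOUT_MS = 10000; // SOP minimum for pattern waits

async function waitWithRetry(callTool, sessionId, pattern) {
  // The SOP allows one retry after a timed-out wait before declaring failure.
  for (let attempt = 0; attempt < 2; attempt++) {
    const res = await callTool('tui_wait_for', { sessionId, pattern, timeoutMs: MIN_TIMEOUT_MS });
    if (res.found) return true;
  }
  return false;
}

async function runFlow(callTool, flow) {
  const { sessionId } = await callTool('tui_launch', {
    command: 'agentcore',
    args: flow.args,
    cwd: flow.cwd,
    ...TERMINAL,
  });
  try {
    if (await waitWithRetry(callTool, sessionId, flow.waitFor)) {
      return { name: flow.name, passed: true };
    }
    // On failure, capture the terminal as text (not SVG) for the PR comment.
    const shot = await callTool('tui_screenshot', { sessionId, format: 'text' });
    return { name: flow.name, passed: false, expected: flow.waitFor, screenshot: shot.text };
  } finally {
    // Always close the session, even when the flow failed or threw.
    await callTool('tui_close', { sessionId });
  }
}
```

Putting `tui_close` in a `finally` block is one way to honor step 7 without repeating the call on every exit path.
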
.github/agent-sops/tui-test-flows.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# TUI Test Flows
+
+Each flow describes a user interaction to verify. The tester agent drives these using the TUI harness MCP tools.
+
+---
+
+## Flow: Help text lists all subcommands
+
+1. Launch: `agentcore --help` (use `tui_launch` with `command: "agentcore"`, `args: ["--help"]`)
+2. Wait for: "Usage:" on screen
+3. Expect all of these subcommands visible: `create`, `deploy`, `invoke`, `status`, `logs`, `add`, `remove`
+4. Close session
+
+---
+
+## Flow: Create wizard prompts for project name
+
+1. Launch: `agentcore create` (no flags, in a temp directory)
+2. Wait for: a prompt asking for the project name (look for "name" or "project")
+3. Expect: an input field or prompt is visible
+4. Close session (Ctrl+C)
+
+---
+
+## Flow: Create with --json produces valid JSON
+
+1. In a temp directory, run via shell:
+   `agentcore create --name TestProj --language Python --framework Strands --model-provider Bedrock --memory none --json`
+2. Expect: stdout contains valid JSON with `"success": true` and `"projectPath"`
+3. Verify the project directory was created
+
+---
+
+## Flow: Add agent shows framework selection
+
+1. First create a project via shell: `agentcore create --name AgentTest --no-agent --json` (in a temp directory)
+2. Launch: `agentcore add agent` in the created project directory
+3. Wait for: agent name prompt
+4. Type a name, press Enter
+5. Wait for: framework or language selection to appear
+6. Expect: at least "Strands" and "LangChain_LangGraph" visible as options
+7. Close session (Ctrl+C)
+
+---
+
+## Flow: Invalid project name shows error
+
+1. In a temp directory, run via shell:
+   `agentcore create --name "123invalid" --language Python --framework Strands --model-provider Bedrock --memory none --json`
+2. Expect: exit code is non-zero OR output contains an error about the project name (must start with a letter)
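
Two of the expectations in these flows are mechanical enough to sketch as plain checks. The helpers below are hypothetical and not part of the commit. `checkCreateJson` assumes stdout holds a single JSON document and nothing else, and `missingSubcommands` assumes the help screen is available as plain text (for example from `tui_read_screen`).

```javascript
// Sketch: evaluate two flow expectations against captured text.
// Assumption: stdout holds a single JSON document and nothing else.
function checkCreateJson(stdout) {
  let parsed;
  try {
    parsed = JSON.parse(stdout);
  } catch (err) {
    return { ok: false, reason: `stdout is not valid JSON: ${err.message}` };
  }
  if (parsed === null || typeof parsed !== 'object') {
    return { ok: false, reason: 'stdout is JSON but not an object' };
  }
  if (parsed.success !== true) return { ok: false, reason: '"success" is not true' };
  if (!('projectPath' in parsed)) return { ok: false, reason: '"projectPath" is missing' };
  return { ok: true, projectPath: parsed.projectPath };
}

// Subcommands the help flow expects to see on screen.
const EXPECTED_SUBCOMMANDS = ['create', 'deploy', 'invoke', 'status', 'logs', 'add', 'remove'];

function missingSubcommands(screenText) {
  // Whole-word match so that "created" does not count as "create".
  return EXPECTED_SUBCOMMANDS.filter((cmd) => !new RegExp(`\\b${cmd}\\b`).test(screenText));
}
```

A flow would pass when `checkCreateJson(stdout).ok` is true or when `missingSubcommands(screen)` returns an empty array. The spec's third step for the `--json` flow (verifying the directory exists) would still need a separate filesystem check.
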

.github/scripts/javascript/process-inputs.cjs

Lines changed: 9 additions & 5 deletions
@@ -78,6 +78,7 @@ function buildPrompts(mode, issueId, isPullRequest, command, branchName, inputs)
     implementer: '.github/agent-sops/task-implementer.sop.md',
     reviewer: '.github/agent-sops/task-reviewer.sop.md',
     refiner: '.github/agent-sops/task-refiner.sop.md',
+    tester: '.github/agent-sops/task-tester.sop.md',
   };
   const scriptFile = sopFiles[mode] || sopFiles.refiner;
 
@@ -94,11 +95,13 @@ module.exports = async (context, github, core, inputs) => {
     const { issueId, command, issue } = await getIssueInfo(github, context, inputs);
 
     const isPullRequest = !!issue.data.pull_request;
-    const mode = command.startsWith('review')
-      ? 'reviewer'
-      : isPullRequest || command.startsWith('implement')
-        ? 'implementer'
-        : 'refiner';
+    const mode = command.startsWith('test')
+      ? 'tester'
+      : command.startsWith('review')
+        ? 'reviewer'
+        : isPullRequest || command.startsWith('implement')
+          ? 'implementer'
+          : 'refiner';
     console.log(`Is PR: ${isPullRequest}, Mode: ${mode}`);
 
     const branchName = await determineBranch(github, context, issueId, mode, isPullRequest);
@@ -113,6 +116,7 @@ module.exports = async (context, github, core, inputs) => {
     core.setOutput('session_id', sessionId);
     core.setOutput('system_prompt', systemPrompt);
     core.setOutput('prompt', prompt);
+    core.setOutput('mode', mode);
   } catch (error) {
     const errorMsg = `Failed: ${error.message}`;
     console.error(errorMsg);
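
The precedence encoded by the nested ternary may be easier to read as early returns. The function below is a hypothetical restatement for illustration, not code from the commit, and it assumes `command` holds the text that follows `/strands`.

```javascript
// Sketch of the mode precedence; `selectMode` is a hypothetical name.
function selectMode(command, isPullRequest) {
  if (command.startsWith('test')) return 'tester'; // checked first, so it wins on PRs too
  if (command.startsWith('review')) return 'reviewer';
  if (isPullRequest || command.startsWith('implement')) return 'implementer';
  return 'refiner'; // default for plain issues
}

console.log(selectMode('test', true)); // tester
console.log(selectMode('review', true)); // reviewer
console.log(selectMode('', true)); // implementer
console.log(selectMode('', false)); // refiner
```

Because the check is a prefix match, any command beginning with `test` (such as `testing`) also selects tester mode.
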

.github/workflows/strands-command.yml

Lines changed: 18 additions & 0 deletions
@@ -94,6 +94,21 @@ jobs:
             };
             await processInputs(context, github, core, inputs);
 
+      - name: Setup Node.js (tester mode)
+        if: steps.process-inputs.outputs.mode == 'tester'
+        uses: actions/setup-node@v6
+        with:
+          node-version: 20.x
+          cache: 'npm'
+
+      - name: Build CLI and TUI harness (tester mode)
+        if: steps.process-inputs.outputs.mode == 'tester'
+        run: |
+          npm ci
+          npm run build
+          npm run build:harness
+          npm install -g "$(npm pack | tail -1)"
+
       - name: Run Strands Agent
         uses: ./.github/actions/strands-action
         with:
@@ -102,6 +117,9 @@ jobs:
           provider: 'bedrock'
           model: 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
           tools: 'strands_tools:shell,retrieve'
+          mcp_servers:
+            ${{ steps.process-inputs.outputs.mode == 'tester' &&
+            '{"mcpServers":{"tui-harness":{"command":"node","args":["dist/mcp-harness/index.mjs"]}}}' || '' }}
           aws_role_arn: ${{ secrets.AWS_ROLE_ARN }}
           aws_region: 'us-west-2'
           pat_token: ${{ secrets.GITHUB_TOKEN }}
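
The `mcp_servers` input uses the `cond && value || fallback` idiom, which GitHub Actions expressions evaluate much like JavaScript does. The sketch below is a JavaScript analogue for illustration only. `mcpServersInput` is a hypothetical name, and how the Strands action consumes the resulting string is not shown in this commit.

```javascript
// JavaScript analogue of the workflow expression: cond && value || fallback.
function mcpServersInput(mode) {
  const config =
    '{"mcpServers":{"tui-harness":{"command":"node","args":["dist/mcp-harness/index.mjs"]}}}';
  // The idiom only behaves like a ternary because `config` is a non-empty (truthy) string.
  return (mode === 'tester' && config) || '';
}

const parsed = JSON.parse(mcpServersInput('tester'));
console.log(parsed.mcpServers['tui-harness'].args[0]); // dist/mcp-harness/index.mjs
console.log(JSON.stringify(mcpServersInput('reviewer'))); // ""
```

In every mode other than `tester` the input is an empty string, so the harness server is configured only when the CLI and harness have been built by the preceding tester-mode steps.
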
