Maintain Agent Skills with functional programming discipline.
Functional Skill is an engineering methodology, designed for complex Skill maintenance and iteration. Combined with trace logging and unit testing, it makes Skills modular, traceable, and testable:
- Treat each Step as a Function with explicit Input/Output (pure function first)
- SKILL.md only orchestrates the pipeline of Functions, consuming only external inputs and reference dependencies
- Shared rules across Functions go into references/
- Deterministic logic called by Functions becomes scripts/
- Add trace logging for every Function, running locally, recording input/output/token consumption/duration
- Equip every Function with unit tests and E2E tests, ensuring neither individual Functions nor the full pipeline regress
npx skills add Shopee-Eng/functional-skill-creator --skill fskill-creator -y
Install via skills.sh. Add -a <agent> to target one agent; add -g only if that agent supports global install.
Create — new functional skill from a workflow brief:
/fskill-creator create a functional skill for <workflow>
Migrate — convert an existing legacy skill directory into a functional skill:
/fskill-creator migrate <path-to-skill-dir>
The migrate lane reads the whole skill package — SKILL.md, references/, scripts/, and other companion files — not a single markdown file in isolation.
Optional: include_report, include_unittest, include_viewers (on by default; set to false to skip).
Your Skill is getting bloated.
As your Skill's capabilities iterate, your SKILL.md and references/*.md grow longer, rules pile up, edge cases get patched ever more finely — slowly turning into an unmaintainable wall of prose.
A typical example is when everything keeps getting stuffed into a handful of markdown files:
flowchart TB
subgraph mono["Monolithic Skill — tangled dependencies"]
SKILL["SKILL.md<br/>Goal · Workflow · Rules · Output · Known issues"]
W["Workflow + Step 1.5 patch"]
RL["Rules · edge cases · stop rules"]
OUT["Output handler · validation"]
RULES["references/rules.md<br/>Terms · policies · snippets"]
EX["references/examples.md<br/>Success · failures · workarounds"]
S1["parse_input.js"]
S2["parse_input_new.js"]
S3["fix_edge_case_once.sh"]
S4["migrate_old_do_not_delete.js"]
end
SKILL <--> W
SKILL <--> RL
W <--> RL
RL <--> OUT
SKILL <--> RULES
RULES <--> EX
EX <--> W
OUT <--> EX
W --> S1
RL --> S2
S1 <--> S2
S2 --> S3
S3 <--> EX
OUT --> S4
S4 <--> SKILL
S3 <--> RL
S1 --> OUT
style mono fill:#fff5f5,stroke:#c53030
style SKILL fill:#fde8e8,stroke:#c53030
style W fill:#fde8e8,stroke:#c53030
style RL fill:#fde8e8,stroke:#c53030
style OUT fill:#fde8e8,stroke:#c53030
style RULES fill:#fde8e8,stroke:#c53030
style EX fill:#fde8e8,stroke:#c53030
style S1 fill:#fde8e8,stroke:#c53030
style S2 fill:#fde8e8,stroke:#c53030
style S3 fill:#fde8e8,stroke:#c53030
style S4 fill:#fde8e8,stroke:#c53030
Functional Skill Creator provides an engineering methodology that makes Skills modular, traceable, and testable:
- Break each step into a Function with explicit Input/Output
- SKILL.md only orchestrates the pipeline of Functions, consuming only external inputs and reference dependencies
- Shared rules across Functions go into references/
- Deterministic logic called by Functions becomes scripts/
- Add trace logging for every Function, running locally, recording input/output/token consumption/duration
- Equip every Function with unit tests and E2E tests, ensuring neither individual Functions nor the full pipeline regress
flowchart TB
subgraph exec["execution — compose(f₄ ∘ f₃ ∘ f₂ ∘ f₁)"]
direction LR
ORCH["SKILL.md<br/>orchestration only"] --> F1["f₁ load_input<br/>(raw) → loaded"] --> F2["f₂ extract<br/>(loaded) → req"] --> F3["f₃ generate<br/>(req) → plan"] --> F4["f₄ validate<br/>(plan) → out"]
end
subgraph sup["supporting layers — read-only dependencies"]
direction LR
REF["references/"] --- SCR["scripts/"] --- TC["testcases/"] --- LOG["logs/runs/"]
end
exec --- sup
style exec fill:#f0fff4,stroke:#2f855a,stroke-width:2px
style sup fill:#ebf8ff,stroke:#2b6cb0,stroke-width:2px
style ORCH fill:#e6ffed,stroke:#2f855a
style F1 fill:#e6ffed,stroke:#2f855a
style F2 fill:#e6ffed,stroke:#2f855a
style F3 fill:#e6ffed,stroke:#2f855a
style F4 fill:#e6ffed,stroke:#2f855a
style REF fill:#ebf8ff,stroke:#2b6cb0
style SCR fill:#ebf8ff,stroke:#2b6cb0
style TC fill:#ebf8ff,stroke:#2b6cb0
style LOG fill:#ebf8ff,stroke:#2b6cb0
Functional Skill does not aim to make Skills complex, but to put complexity where it belongs: judgment goes to Functions, rules go into references, deterministic actions go to scripts, and regression behavior solidifies into testcases.
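The pipeline idea above can be illustrated with a minimal sketch. It assumes plain JavaScript functions stand in for Functions; in a real Skill most of them would be agent-executed steps, so only the contract shape carries over. The function bodies and the field names (text, requirements, steps, ok) are hypothetical; only the four stage names come from the diagram.

```javascript
// Hypothetical stand-ins for the four stages; each takes one explicit input
// object and returns one explicit output object, with no shared state.
const loadInput = (raw) => ({ text: String(raw).trim() });
const extract = (loaded) => ({
  requirements: loaded.text.split(",").map((s) => s.trim()).filter(Boolean),
});
const generate = (req) => ({
  steps: req.requirements.map((r, i) => `${i + 1}. ${r}`),
});
const validate = (plan) => ({ ok: plan.steps.length > 0, steps: plan.steps });

// Left-to-right composition: pipe(f1, f2, f3, f4)(x) equals f4(f3(f2(f1(x)))).
const pipe = (...fns) => (input) => fns.reduce((value, fn) => fn(value), input);

// The orchestrator only wires stages together; it holds no rules of its own.
const runPipeline = pipe(loadInput, extract, generate, validate);

const out = runPipeline("  parse brief, draft plan ");
console.log(out);
// → { ok: true, steps: [ '1. parse brief', '2. draft plan' ] }
```

Because each stage depends only on its input, any one of them can be replaced or tested without touching the others.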
- You are maintaining a long-evolving agent skill and don't want to rely on gut feeling for every regression check.
- Your SKILL.md and references/ have become impossible to maintain manually, leaving you no choice but to blindly let AI iterate down one path.
- You want to extract deterministic work like parsing, formatting, and validation from prompts and execute it reliably through scripts.
- You want your skill's functionality and execution flow to become traceable, so you can pinpoint exactly where each run went wrong.
- You want your skill to capture real failure cases and ideal runs, turning them into repeatable test suites.
In other words, if your Skill is concise, or you've already split it cleanly in a modular way and are confident it's maintainable — you don't need to make it Functional.
Skills generated by fskill-creator include basic report / unittest tooling by default. Use include_report=false and include_unittest=false to disable them separately, or use include_viewers=true|false to control whether local viewers are generated.
- scripts/report.mjs: Writes function-level report logs; supports report_mode=off|local|remote.
- scripts/runtime.mjs: Exports runStep, writeStepReport, and applyReportMode for wrapping each Step in the function workflow.
- scripts/test_report.mjs: Validates that the report runtime can write JSONL and checks sensitive field redaction.
- scripts/test_cases.mjs: Runs testcases/**/*.case.json function input/output assertions, and can also export trace records as testcases.
- logs/runs/: JSONL traces written when report_mode=local.
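To make the idea of function-level tracing concrete, here is a rough sketch of a step wrapper that emits one JSONL record per call and redacts sensitive fields. It is not the actual runStep from scripts/runtime.mjs, whose signature is not shown here. The names traceStep, redact, SENSITIVE_KEYS, and the record fields are all assumptions. Token consumption and the remote mode are left out because this sketch makes no model calls.

```javascript
// Hypothetical list of keys whose values must never reach a log line.
const SENSITIVE_KEYS = new Set(["token", "password", "api_key"]);

// Recursively replaces the values of sensitive keys in JSON-like data.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [
        key,
        SENSITIVE_KEYS.has(key) ? "[REDACTED]" : redact(v),
      ])
    );
  }
  return value;
}

// Runs one synchronous Function and passes a single JSONL line to `sink`.
// mode "off" skips logging; any other mode logs through the sink.
function traceStep(name, fn, input, { mode = "local", sink = console.log } = {}) {
  const startedAt = Date.now();
  const output = fn(input);
  if (mode !== "off") {
    const record = {
      step: name,
      input: redact(input),
      output: redact(output),
      duration_ms: Date.now() - startedAt,
    };
    sink(JSON.stringify(record));
  }
  return output;
}

// Usage: collect lines in memory; a local mode would append them to a file instead.
const lines = [];
const result = traceStep(
  "extract",
  (x) => ({ req: x.text.toUpperCase() }),
  { text: "hi", token: "s3cret" },
  { sink: (line) => lines.push(line) }
);
console.log(result, lines);
```

Passing the sink as a parameter keeps the wrapper testable without touching the file system.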
Functional Skill encourages turning real execution traces into regression assets:
Run skill → Check trace → Review function behavior → Export testcase → Fix function → Run tests
If Function1 fails, add a testcase for Function1;
if normalize_input is deterministic logic, push it down into scripts/ and write a script test.
Problems stay at the layer where they occur — maintenance cost does not spread across the entire skill.
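The trace-to-testcase loop can be sketched as follows, assuming a trace record carries the step name, input, and output. The real *.case.json schema is not shown here, so the case shape (function, input, expected) and the helpers traceToTestcase, deepEqual, and runCases are hypothetical. The normalize_input body is likewise only an illustration of deterministic logic.

```javascript
// Turns one trace record from a good run into a regression case.
function traceToTestcase(record) {
  return { function: record.step, input: record.input, expected: record.output };
}

// Structural equality for JSON-like data, ignoring object key order.
function deepEqual(a, b) {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every((key) => keysB.includes(key) && deepEqual(a[key], b[key]))
  );
}

// Runs each case against the matching function and reports pass or fail.
function runCases(functions, cases) {
  return cases.map((testcase) => {
    const actual = functions[testcase.function](testcase.input);
    return {
      function: testcase.function,
      passed: deepEqual(actual, testcase.expected),
      actual,
    };
  });
}

// Deterministic logic like this is a natural candidate for scripts/.
const normalizeInput = (input) => ({ query: input.query.trim().toLowerCase() });

// A trace record whose output was reviewed and judged correct.
const trace = {
  step: "normalize_input",
  input: { query: "  Refund POLICY " },
  output: { query: "refund policy" },
};

const results = runCases({ normalize_input: normalizeInput }, [traceToTestcase(trace)]);
console.log(results);
```

If a later change to normalize_input alters its output for this input, the case fails at that one function instead of surfacing as a vague end-to-end regression.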
Methodology details are in docs/functional-skill.md, and the Function contract specification is in docs/function-contract.md. For guidance on when to put logic into scripts/, see docs/scripting.md. Testing and tracing are covered in docs/testing.md and docs/observability.md.
Migration exposes structural issues that were already present in the legacy skill — I/O mismatches, blurred function boundaries, ambiguous step definitions, and similar. That is expected, not a migration failure. Review the proposal, fix the contracts, add testcases, and the skill will run more reliably than before.
Migration does not invent new problems — it makes existing ones visible.
Monolithic skills often work despite implicit handoffs: step 2 assumes something step 1 "obviously" produces, boundaries between parsing and judgment are fuzzy, and shared rules are duplicated across sections. When you split into function contracts, those assumptions become explicit — and mismatches show up immediately.
Common findings after migrate:
- I/O mismatch — a downstream function expects fields that upstream does not output
- Blurred boundaries — one legacy section maps to multiple functions, or one function owns too much
- Ambiguous definitions — inputs and outputs described in prose, not stable object fields
Treat these as migration output, not migration failure. Suggested follow-up:
- Review migration_proposal and function contracts
- Align pipeline I/O: rename fields, split or merge functions, add entries to references/shared-glossary.md
references/shared-glossary.md - Run the skill once with trace enabled; export failing steps as testcases
- Re-run tests until the pipeline is consistent
Fixing these issues is the point of migration. A skill with explicit, tested function contracts hallucinates less and fails more predictably than a prose wall that only worked by accident.
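The I/O mismatch finding can be checked mechanically once contracts are explicit. The sketch below assumes each contract lists the fields it requires and produces. This contract shape, the findIoMismatches helper, and the deliberately missing constraints field are illustrative assumptions, not the format defined in docs/function-contract.md.

```javascript
// Hypothetical contracts for a four-stage pipeline. "constraints" is
// intentionally required by generate but produced by no one.
const contracts = [
  { name: "load_input", requires: ["raw"], produces: ["loaded"] },
  { name: "extract", requires: ["loaded"], produces: ["req"] },
  { name: "generate", requires: ["req", "constraints"], produces: ["plan"] },
  { name: "validate", requires: ["plan"], produces: ["out"] },
];

// Walks the pipeline in order and reports every field a Function requires
// that neither the external inputs nor an upstream Function provides.
function findIoMismatches(pipeline, externalInputs) {
  const available = new Set(externalInputs);
  const mismatches = [];
  for (const fn of pipeline) {
    for (const field of fn.requires) {
      if (!available.has(field)) {
        mismatches.push({ function: fn.name, missing: field });
      }
    }
    fn.produces.forEach((field) => available.add(field));
  }
  return mismatches;
}

const mismatches = findIoMismatches(contracts, ["raw"]);
console.log(mismatches);
// → [ { function: 'generate', missing: 'constraints' } ]
```

In a prose-only skill the same gap would typically surface only at run time, when the generate step silently improvises the missing input.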
skills/
  fskill-creator/       Create, maintain, or migrate functional skills
    sub-skills/
      create/           Form create_context from requirement brief
      migrate/          Form migration_context from a legacy skill directory
docs/                   Methodology and specifications
templates/              Reusable skill templates
examples/               Runnable functional skill examples
Currently v0.1.0 alpha. File formats and script conventions are usable, but may still change before 1.0.
This project is not bound to any agent platform, model vendor, or workflow engine. The built-in testcase runner is a runtime-agnostic assertion engine: it only validates output and does not execute agents or call models.
Issues and PRs are welcome. See CONTRIBUTING.md for the development guide. For security-sensitive submissions, please read SECURITY.md first.
MIT. See LICENSE.