The code generation module transforms retrieved shadcn/ui patterns into production-ready React/TypeScript components using an LLM-first 3-stage pipeline with comprehensive validation and observability.
Epic 4.5 Refactor: Previously used an 8-stage template-based approach. Now uses LLM-first generation with structured output parsing and iterative validation.
Generation Pipeline Flow:
```
┌─────────────┐
│   Pattern   │  (from Epic 3: Pattern Retrieval)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Tokens    │  (from Epic 1: Design Token Extraction)
└──────┬──────┘
       │
       ▼
┌─────────────┐
│Requirements │  (from Epic 2: Requirement Proposal)
└──────┬──────┘
       │
       ▼
┌──────────────────────────────────────────────────────┐
│         LLM-First Code Generation Pipeline           │
│                                                      │
│  STAGE 1: LLM Generation                             │
│  ├─ Load pattern as reference                        │
│  ├─ Build comprehensive prompt with exemplars        │
│  ├─ Call LLM (GPT-4) for single-pass generation      │
│  └─ Parse structured output                          │
│                                                      │
│  STAGE 2: Validation & Iterative Fixes               │
│  ├─ TypeScript validation (tsc)                      │
│  ├─ ESLint validation                                │
│  ├─ If invalid: LLM fix loop (max 2 retries)         │
│  └─ Quality scoring                                  │
│                                                      │
│  STAGE 3: Post-Processing                            │
│  ├─ Import resolution & ordering                     │
│  ├─ Provenance header injection                      │
│  ├─ Code formatting (Prettier)                       │
│  └─ File assembly                                    │
│                                                      │
└───────────────────────┬──────────────────────────────┘
                        │
                        ▼
              ┌────────────────┐
              │ Generated Code │
              │  - Component   │
              │  - Stories     │
              │  - Types       │
              │  - Metadata    │
              └────────────────┘
```
| Aspect | Old (8-stage) | New (LLM-first) |
|---|---|---|
| Approach | Template-based | LLM-based with structured output |
| Stages | 8 sequential steps | 3 stages (Generate → Validate → Post-process) |
| Flexibility | Rigid templates | Adaptive to requirements |
| Quality | Manual rules | LLM understanding + validation |
| Observability | Limited | Full LangSmith tracing |
| Fix Loop | Manual | Automated LLM-driven fixes |
- Purpose: Parse pattern JSON and extract component structure
- Input: Pattern JSON from pattern library
- Output: `PatternStructure` with component name, props, imports, variants
- Performance: <100ms per pattern
- Status: ✅ Kept (used as reference for LLM)
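A minimal sketch of what the pattern parser does; the `PatternStructure` fields and JSON keys here are illustrative, not the module's actual schema:

```python
import json
from dataclasses import dataclass, field

@dataclass
class PatternStructure:
    # Hypothetical mirror of the parser's output type; field names are
    # illustrative, not the module's actual dataclass.
    component_name: str
    props: list = field(default_factory=list)
    imports: list = field(default_factory=list)
    variants: list = field(default_factory=list)

def parse_pattern(pattern_json: str) -> PatternStructure:
    """Extract component structure from a pattern library entry."""
    data = json.loads(pattern_json)
    return PatternStructure(
        component_name=data["name"],
        props=[p["name"] for p in data.get("props", [])],
        imports=data.get("imports", []),
        variants=data.get("variants", []),
    )

pattern = '{"name": "Button", "props": [{"name": "variant"}], "variants": ["primary", "ghost"]}'
structure = parse_pattern(pattern)
```

Because this is pure JSON traversal with no I/O or LLM calls, staying under the 100ms budget is straightforward.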
- Purpose: Generate complete component code using GPT-4 with structured output
- Input: Comprehensive prompt with pattern, tokens, requirements, exemplars
- Output: Structured JSON with component code, stories, types
- Features:
- Single-pass generation with full context
- Few-shot learning with exemplars
- Structured output parsing (JSON mode)
- Token usage tracking
- Retry logic with exponential backoff
- Performance: ~5-15s per generation
- Model: GPT-4 (default), configurable
- Purpose: Validate generated code and drive LLM fix iterations
- Input: Generated component and stories code
- Output: Validation result with errors and quality score
- Features:
- TypeScript compilation validation (tsc)
- ESLint rule validation
- Iterative LLM-driven fixes (max 2 retries)
- Quality scoring (0-100)
- Parallel validation
- Performance: ~2-5s per validation
- Fix Loop: Automatic convergence with LLM
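The fix loop described above can be sketched as follows; `validate` and `llm_fix` are stand-ins for the real tsc/ESLint runners and the LLM repair call:

```python
def validate_with_fix_loop(code: str, validate, llm_fix, max_retries: int = 2):
    """Run validators; on failure, ask the LLM to fix the code and re-validate.

    `validate` returns (errors, quality_score); `llm_fix` returns repaired code.
    Both callables are placeholders for the real tsc/ESLint and LLM calls.
    """
    attempts = 0
    errors, score = validate(code)
    while errors and attempts < max_retries:
        attempts += 1
        code = llm_fix(code, errors)    # the LLM receives the error list as context
        errors, score = validate(code)  # re-run TypeScript + ESLint checks
    return {"code": code, "errors": errors, "quality_score": score, "attempts": attempts}

# Toy validators: code counts as "valid" once the fix marker is present.
fake_validate = lambda c: ([], 90) if "fixed" in c else (["TS1005: ';' expected"], 40)
fake_fix = lambda c, errs: c + " // fixed"
result = validate_with_fix_loop("const x = 1", fake_validate, fake_fix)
```

An `attempts` value of 0 in the returned metadata means the code was valid on the first try, matching the "Validation attempts" field reported in generation metadata.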
- Purpose: Build comprehensive generation prompts with context
- Input: Pattern code, tokens, requirements, component metadata
- Output: System and user prompts with exemplars
- Features:
- Exemplar selection based on component type
- Token injection into prompt
- Requirement formatting
- Prompt versioning
- Token counting (for optimization)
- Performance: <100ms per prompt build
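A rough illustration of how such a prompt builder can assemble the system and user prompts; the section headings and wording are invented for the example, not the module's actual templates:

```python
import json

def build_prompts(pattern_code, tokens, requirements, exemplars):
    """Assemble system/user prompts from pattern, tokens, requirements,
    and few-shot exemplars (structure is illustrative only)."""
    system = (
        "You are an expert React/TypeScript engineer. "
        "Generate a complete shadcn/ui-style component as structured JSON."
    )
    exemplar_text = "\n\n".join(
        f"### Example: {e['type']}\n{e['code']}" for e in exemplars
    )
    user = (
        f"## Reference pattern\n{pattern_code}\n\n"
        f"## Design tokens\n{json.dumps(tokens, indent=2)}\n\n"
        "## Requirements\n"
        + "\n".join(f"- {r['name']} ({r['category']})" for r in requirements)
        + (f"\n\n## Exemplars\n{exemplar_text}" if exemplars else "")
    )
    return system, user

system, user = build_prompts(
    "export function Button() {}",
    {"colors": {"primary": "#3B82F6"}},
    [{"name": "variant", "category": "props"}],
    [{"type": "button", "code": "// exemplar code"}],
)
```

Since this is pure string assembly, the sub-100ms budget is dominated by exemplar loading, which is handled (and cached) separately.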
- Purpose: Load and manage high-quality component exemplars for few-shot learning
- Input: Component type (button, card, input, etc.)
- Output: Exemplar code with pattern, requirements, output
- Features:
- Type-specific exemplars
- Quality-ranked examples
- Caching for performance
- Exemplar validation
- Performance: <50ms per load (cached)
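Caching is the key to the sub-50ms load budget. A minimal sketch, assuming a hypothetical on-disk exemplar directory (the path is invented for illustration):

```python
from functools import lru_cache
from pathlib import Path

EXEMPLAR_DIR = Path("backend/data/exemplars")  # hypothetical location

@lru_cache(maxsize=32)
def load_exemplar(component_type: str) -> str:
    """Load the exemplar file for a component type, caching the result so
    repeated lookups avoid disk I/O entirely."""
    path = EXEMPLAR_DIR / f"{component_type}.json"
    if not path.exists():
        return ""  # fall back to zero-shot generation when no exemplar exists
    return path.read_text()
```

`lru_cache` keys on the `component_type` argument, so each type hits the filesystem at most once per process.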
- Purpose: Generate provenance headers for traceability
- Input: Pattern ID, tokens, requirements
- Output: Header comment with metadata, timestamps, SHA-256 hashes
- Features: ISO 8601 timestamps, content hashing, warning messages
- Status: ✅ Kept (used in post-processing)
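A sketch of the header generation, mirroring the provenance example at the end of this document (truncated 12-character hashes and the exact wording are assumptions):

```python
import hashlib
import json
from datetime import datetime, timezone

def provenance_header(pattern_id: str, tokens: dict, requirements: list) -> str:
    """Build the traceability comment injected at the top of generated files."""
    def short_hash(obj) -> str:
        # Canonical JSON (sorted keys) so identical inputs hash identically.
        payload = json.dumps(obj, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

    timestamp = datetime.now(timezone.utc).isoformat()  # ISO 8601 UTC
    return (
        "/**\n"
        " * Generated by ComponentForge\n"
        f" * Pattern: {pattern_id}\n"
        f" * Generated: {timestamp}\n"
        f" * Tokens Hash: {short_hash(tokens)}\n"
        f" * Requirements Hash: {short_hash(requirements)}\n"
        " *\n"
        " * WARNING: This file was automatically generated.\n"
        " */"
    )

header = provenance_header("shadcn-button", {"colors": {"primary": "#3B82F6"}}, [{"name": "variant"}])
```

The stable hashes are what make regeneration (Epic 8) possible: if the stored hash matches the current tokens/requirements, the component does not need regenerating.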
- Purpose: Resolve and order imports correctly
- Input: List of import statements
- Output: Ordered imports (external, internal, utils, types)
- Features: Deduplication, missing import detection, package.json generation
- Status: ✅ Kept (used in post-processing)
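The ordering groups (external → internal → utils → types) can be sketched like this; the grouping heuristics are an approximation of the resolver's actual rules:

```python
def order_imports(imports: list[str]) -> list[str]:
    """Deduplicate imports and sort them into the external -> internal ->
    utils -> types groups (classification rules are illustrative)."""
    def group(stmt: str) -> int:
        if "@/lib/utils" in stmt:
            return 2                      # utils
        if "@/types" in stmt or "import type" in stmt:
            return 3                      # types
        if '"@/' in stmt or "'@/" in stmt:
            return 1                      # internal (aliased to src/ via @/)
        return 0                          # external packages

    unique = list(dict.fromkeys(imports))  # dedupe while preserving first occurrence
    return sorted(unique, key=lambda s: (group(s), s))

ordered = order_imports([
    'import { cn } from "@/lib/utils"',
    'import * as React from "react"',
    'import * as React from "react"',       # duplicate, removed
    'import { Slot } from "@/components/slot"',
])
```

Sorting on a `(group, statement)` tuple keeps each group internally alphabetized, which makes the generated import blocks deterministic across runs.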
- Purpose: Assemble final component code and format with Prettier
- Input: All code parts (imports, CSS vars, types, component)
- Output: Formatted component.tsx and stories.tsx files
- Performance: <2s for formatting
- Status: ✅ Updated for LLM-first pipeline
- Purpose: Orchestrate the full LLM-first generation pipeline
- Input: `GenerationRequest` (pattern_id, tokens, requirements)
- Output: `GenerationResult` with generated code and metadata
- Performance: p50 ≤20s, p95 ≤30s (a significant improvement)
- Observability: Full LangSmith tracing on all operations
- Status: ✅ Refactored for 3-stage pipeline
The following modules were part of the old 8-stage template-based pipeline and have been removed:
- ❌ `token_injector.py` - Replaced by LLM understanding of tokens
- ❌ `tailwind_generator.py` - LLM generates Tailwind classes directly
- ❌ `requirement_implementer.py` - LLM implements requirements in a single pass
- ❌ `a11y_enhancer.py` - LLM includes accessibility in generation
- ❌ `type_generator.py` - LLM generates TypeScript types directly
- ❌ `storybook_generator.py` - LLM generates stories in a single pass
```python
from src.generation.generator_service import GeneratorService
from src.generation.types import GenerationRequest  # adjust to the actual module path

# Initialize service with LLM
generator = GeneratorService(use_llm=True)

# Generate component
result = await generator.generate(
    GenerationRequest(
        pattern_id="shadcn-button",
        tokens={
            "colors": {"primary": "#3B82F6"},
            "spacing": {"padding": "16px"}
        },
        requirements=[
            {"name": "variant", "category": "props", "approved": True},
            {"name": "onClick", "category": "events", "approved": True}
        ]
    )
)

# Access generated code
print(result.component_code)                    # Component.tsx
print(result.stories_code)                      # Component.stories.tsx
print(result.metadata.latency_ms)               # Performance metrics
print(result.validation_results.quality_score)  # Quality score (0-100)
```

The LLM-first generator tracks 3 main stages with LangSmith tracing:
1. `LLM_GENERATING` - Single-pass LLM generation with full context
   - Pattern loading
   - Prompt building with exemplars
   - LLM API call
   - Structured output parsing
2. `VALIDATING` - Code validation with iterative fixes
   - TypeScript validation (tsc)
   - ESLint validation
   - LLM fix loop (if needed, max 2 retries)
   - Quality scoring
3. `POST_PROCESSING` - Final assembly and formatting
   - Import resolution
   - Provenance injection
   - Code formatting
   - File assembly
Each component generation produces:
- `Component.tsx` - Main component file with:
  - Provenance header (pattern ID, timestamp, hashes)
  - Ordered imports (external, internal, utils, types)
  - TypeScript interfaces with strict types
  - Accessibility-enhanced component code
  - No `any` types
  - LLM-generated with validation
- `Component.stories.tsx` - Storybook stories with:
  - CSF 3.0 format
  - Meta object with argTypes
  - Default story
  - Variant stories (Primary, Secondary, Ghost, etc.)
  - State stories (Disabled, Loading, Error)
  - LLM-generated with examples
- Metadata - Generation metadata with:
  - Total latency (ms)
  - Stage latencies
  - Validation attempts (0 = valid first try)
  - Quality score (0-100)
  - Token usage (prompt + completion)
  - LangSmith trace URL
| Metric | Old Pipeline | New LLM-First | Status |
|---|---|---|---|
| Total Latency (p50) | ~60s | ≤20s | ✅ 3x faster |
| Total Latency (p95) | ~90s | ≤30s | ✅ 3x faster |
| First-Time Valid | ~60% | ≥85% | ✅ Better quality |
| Validation Fixes | Manual | 0-2 auto | ✅ Automated |
| Token Usage | N/A | ~2000-4000 | ✅ Tracked |
| Quality Score | N/A | ≥80/100 | ✅ Measured |
All LLM operations are traced with LangSmith for debugging and optimization:
```python
# Traces are automatically created with metadata
result = await generator.generate(request)

# Access trace URL from metadata (if available)
if hasattr(result.metadata, 'trace_url'):
    print(f"View trace: {result.metadata.trace_url}")
```

- Token Usage: Prompt tokens, completion tokens, total cost
- Latency: Per-stage and total latency
- Quality: Validation attempts, quality score, error types
- Success Rate: Valid first-time vs. needs fixes
- Error Patterns: Common validation errors, fix effectiveness
Access the LangSmith dashboard to:
- View all generation traces
- Analyze prompt effectiveness
- Debug validation failures
- Track token usage and costs
- Monitor quality trends
- A/B test prompt variations
See: PROMPTING_GUIDE.md for prompt optimization strategies
```bash
# Run all generation tests
pytest backend/tests/generation/ -v

# Run LLM-first component tests
pytest backend/tests/generation/test_llm_generator.py -v
pytest backend/tests/generation/test_code_validator.py -v
pytest backend/tests/generation/test_prompt_builder.py -v
pytest backend/tests/generation/test_exemplar_loader.py -v

# Run kept module tests
pytest backend/tests/generation/test_pattern_parser.py -v
pytest backend/tests/generation/test_provenance.py -v
pytest backend/tests/generation/test_import_resolver.py -v
pytest backend/tests/generation/test_code_assembler.py -v

# Run end-to-end generation tests
pytest backend/tests/generation/test_generator_service.py -v

# Run with coverage
pytest backend/tests/generation/ --cov=src.generation --cov-report=html

# Integration tests with real LLM (requires API key)
pytest backend/tests/generation/ -v -m integration
```

Target test coverage for LLM-first components:
- LLM Generator: ≥95%
- Code Validator: ≥95%
- Prompt Builder: ≥95%
- Exemplar Loader: ≥90%
- Pattern Parser: 100% (kept)
- Provenance Generator: 100% (kept)
- Import Resolver: 98% (kept)
- Code Assembler: ≥90% (updated)
The module uses structured error handling with automatic recovery:
```python
class GenerationError(Exception):
    """Base exception for generation errors."""
    pass

class LLMGenerationError(GenerationError):
    """Failed to generate code with LLM."""
    pass

class ValidationError(GenerationError):
    """Code validation failed."""
    pass

class CodeAssemblyError(GenerationError):
    """Failed to assemble code."""
    pass
```

Recovery strategies:

- LLM Errors: Retry with exponential backoff (max 3 attempts)
- Validation Errors: Automatic LLM fix loop (max 2 iterations)
- Rate Limits: Exponential backoff with jitter
- Timeouts: Configurable per-stage timeouts
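Exponential backoff with jitter can be sketched as below; the function name and callable-based interface are illustrative, not the pipeline's actual wrapper:

```python
import random
import time

def with_backoff(call, max_attempts: int = 3, base_delay: float = 1.0,
                 retryable=(TimeoutError,), sleep=time.sleep):
    """Retry `call` with exponential backoff plus jitter.

    Delay doubles on each attempt (1s, 2s, 4s, ...) with up to 10% random
    jitter added so concurrent clients don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # exhausted retries; surface the original error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))

# Simulated flaky LLM call: fails twice, then succeeds.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError
    return "ok"

outcome = with_backoff(flaky, sleep=lambda d: None)  # skip real sleeping here
```

Injecting `sleep` as a parameter keeps the retry logic deterministic and fast under test, which is why the example can run without actually waiting.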
See: TROUBLESHOOTING.md for common issues and solutions
If migrating from the old 8-stage pipeline:
- Code Changes: No changes needed - the API is backward compatible
- Configuration: Set `use_llm=True` (default) to use the new pipeline
- Environment: Add `OPENAI_API_KEY` and `LANGSMITH_API_KEY`
- Testing: Run the full test suite to verify behavior
- Monitoring: Set up LangSmith dashboard for observability
```python
# Old API still works
generator = GeneratorService()
result = await generator.generate(request)

# New API with explicit LLM flag
generator = GeneratorService(use_llm=True, api_key="sk-...")
result = await generator.generate(request)
```

- PROMPTING_GUIDE.md - Prompt engineering and optimization strategies
- TROUBLESHOOTING.md - Common issues, debugging, and solutions
- .claude/epics/04-code-generation.md - Original Epic 4 (completed)
- .claude/epics/04.5-llm-first-generation-refactor.md - Epic 4.5 refactor details
The generation module is exposed via REST API:
```
# Generate component
POST /api/v1/generation/generate
{
  "pattern_id": "shadcn-button",
  "tokens": {...},
  "requirements": [...]
}

# Response
{
  "component_code": "...",
  "stories_code": "...",
  "files": {
    "Button.tsx": "...",
    "Button.stories.tsx": "..."
  },
  "metadata": {
    "latency_ms": 18000,
    "validation_attempts": 0,
    "quality_score": 92
  },
  "validation_results": {
    "valid": true,
    "quality_score": 92,
    "errors": []
  }
}
```

- Python: 3.11+
- FastAPI: Web framework
- Pydantic: Data validation
- LangChain: LLM integration
- LangSmith: AI observability and tracing
- OpenAI: GPT-4 for code generation
- Node.js: For TypeScript/ESLint validation and Prettier formatting
Environment variables:
- `OPENAI_API_KEY`: Required for LLM generation
- `LANGSMITH_API_KEY`: Optional for tracing (recommended)
- `LANGSMITH_PROJECT`: Project name for traces (default: "component-forge")
```bash
# LangSmith (for tracing)
LANGSMITH_API_KEY=your_key
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=component-forge

# OpenAI (if using AI enhancement in future)
OPENAI_API_KEY=your_key
```

- AI-powered code optimization (Epic 8)
- TypeScript compilation validation (Epic 5)
- ESLint auto-fixing (Epic 5)
- Component preview generation
- Multi-framework support (Vue, Angular)
The Polish Stream adds production-quality enhancements:
- Pattern ID and version tracking
- ISO 8601 UTC timestamps
- SHA-256 content hashes for tokens and requirements
- Warning about manual edits
- Metadata for future regeneration (Epic 8)
- Automatic import ordering (external → internal → utils → types)
- Deduplication of identical imports
- Missing import detection and addition
- Package.json dependency generation
- Alias handling (@/ for src/)
- Component-specific ARIA attributes
- Semantic HTML elements
- Keyboard navigation support
- Focus indicators
- Screen reader support
- Supports: Button, Input, Card, Checkbox, Radio, Select, Switch, Tabs, Alert, Badge
- Strict TypeScript interfaces
- Zero `any` types
- Return type annotations
- Ref forwarding types
- JSDoc comments
- Variant union types
- Utility type usage (Omit, Pick, Partial, etc.)
- CSF 3.0 format
- Meta object with component info
- ArgTypes for interactive controls
- Default story
- Variant stories for all component variants
- State stories (Disabled, Loading, Error)
- Play functions for interaction testing (buttons)
- Documentation parameters
```tsx
/**
 * Generated by ComponentForge
 * Version: 1.0.0
 * Pattern: shadcn-button
 * Generated: 2024-01-15T10:30:00.000Z
 * Tokens Hash: a1b2c3d4e5f6
 * Requirements Hash: f6e5d4c3b2a1
 *
 * WARNING: This file was automatically generated.
 * Manual edits may be lost when regenerating.
 * Use ComponentForge to make changes instead.
 */

import * as React from "react"

import { cn } from "@/lib/utils"

interface ButtonProps {
  /** Visual variant of the button */
  variant?: "default" | "primary" | "secondary" | "ghost";
  /** Size of the button */
  size?: "sm" | "md" | "lg";
  /** Whether the button is disabled */
  disabled?: boolean;
  className?: string;
  children?: React.ReactNode;
}

const Button = React.forwardRef<HTMLButtonElement, ButtonProps>(
  ({ variant = "default", size = "md", disabled, className, children, ...props }, ref): React.ReactElement => {
    return (
      <button
        ref={ref}
        type="button"
        disabled={disabled}
        aria-disabled={disabled}
        className={cn(
          "inline-flex items-center justify-center rounded-md font-medium",
          className
        )}
        {...props}
      >
        {children}
      </button>
    )
  }
)
Button.displayName = "Button"

export { Button }
```