This document provides a high-level overview of the Ruff programming language's architecture, design decisions, and internal structure.
Last Updated: January 26, 2026
Version: v0.8.0 (v0.9.0 modularization in progress)
- Overview
- Architecture Diagram
- Data Flow
- Core Components
- Interpreter Module Structure
- Execution Models
- Memory Management
- Concurrency Model
- Design Decisions
- Future Architecture
Ruff is a dynamically-typed, interpreted programming language implemented in Rust. It features:
- Tree-walking interpreter (current primary execution path)
- Bytecode VM (experimental, not yet default)
- Async/await with promise-based concurrency
- Generators and iterators
- Lexical scoping with closures
- Pattern matching and destructuring
- Result/Option types for error handling
The interpreter executes programs by traversing an Abstract Syntax Tree (AST) constructed from the source code.
┌─────────────────────────────────────────────────────────────┐
│ Ruff Program (.ruff) │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌────────────────┐
│ Lexer │ ──────────► Tokens
│ (lexer.rs) │
└────────┬───────┘
│
▼
┌────────────────┐
│ Parser │ ──────────► AST (Stmt, Expr)
│ (parser.rs) │
└────────┬───────┘
│
┌─────┴──────┐
│ │
▼ ▼
┌────────────┐ ┌──────────────┐
│Type Checker│ │ Compiler │
│(optional) │ │(experimental)│
└────────────┘ └──────┬───────┘
│
▼
┌─────────────┐
│ Bytecode │
│ (vm.rs) │
└─────────────┘
┌─────────────────────┐
│ Interpreter │ ◄─── Current Default
│ (interpreter/mod.rs)│
└─────────┬───────────┘
│
▼
┌────────────────┐
│ Execution │
│ (Value) │
└────────────────┘
- Lexical Analysis (Lexer):
  - Input: Raw source code string
  - Output: Token stream
  - File: `src/lexer.rs`
  - Process: Character-by-character scanning, keyword recognition, operator identification
- Parsing (Parser):
  - Input: Token stream
  - Output: Abstract Syntax Tree (AST)
  - Files: `src/parser.rs`, `src/ast.rs`
  - Process: Recursive descent parsing with precedence climbing for operators
  - AST Nodes: `Stmt` (statements), `Expr` (expressions), `Pattern` (destructuring)
- Optional Type Checking:
  - Input: AST
  - Output: Type errors or validated AST
  - File: `src/type_checker.rs`
  - Status: Optional, not enforced by default
  - Note: Type checker and runtime function registry are separate
- Interpretation (Tree-Walking):
  - Input: AST
  - Output: Program execution / side effects
  - File: `src/interpreter/mod.rs` (14,802 lines, being modularized)
  - Process: `eval_stmt()` executes statements, `eval_expr()` evaluates expressions to `Value`s; environments manage scoping
- Alternative: Bytecode Compilation (Experimental):
  - Input: AST
  - Output: Bytecode instructions
  - Files: `src/compiler.rs`, `src/bytecode.rs`, `src/vm.rs`
  - Status: Implemented but not the default execution path (marked with `#[allow(dead_code)]`)
  - Future: Will become the primary execution path in v0.9.0+
Responsibility: Convert source code into tokens
Key Features:
- Character-by-character scanning
- Multi-character operator recognition (`:=`, `==`, `...`, etc.)
- String interpolation support
- Line number tracking for error reporting
Important:
- `Ok`, `Err`, `Some`, and `None` are ordinary identifiers, NOT reserved keywords; this allows them to be used in pattern matching
- Reserved keywords (`if`, `else`, `func`, etc.) cannot be used as function names
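The multi-character operator recognition described above amounts to a one-character lookahead. A minimal sketch (the `scan_operator` helper is hypothetical, not the actual `lexer.rs` code):

```rust
// Sketch of multi-character operator recognition: peek one character
// ahead to decide between ":" / ":=" and "=" / "==", etc. Hypothetical
// helper, not the actual lexer.rs implementation.
fn scan_operator(src: &[char], pos: usize) -> (String, usize) {
    let two: String = src[pos..(pos + 2).min(src.len())].iter().collect();
    match two.as_str() {
        ":=" | "==" | "!=" | "<=" | ">=" => (two, pos + 2),
        _ => (src[pos].to_string(), pos + 1),
    }
}

fn main() {
    let src: Vec<char> = ":= x == 1".chars().collect();
    assert_eq!(scan_operator(&src, 0), (":=".to_string(), 2));
    assert_eq!(scan_operator(&src, 5), ("==".to_string(), 7));
}
```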
Responsibility: Build AST from token stream
Techniques:
- Recursive descent parsing
- Pratt parser for expression precedence
- Lookahead for method call vs field access disambiguation
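Precedence climbing can be sketched as follows: parse a left operand, then keep consuming operators whose precedence is at least the current floor, recursing with a higher floor for the right-hand side. This is a toy illustration of the technique, not the actual `parser.rs` code:

```rust
// Minimal precedence-climbing sketch over a token slice. Hypothetical
// token type and evaluator, illustrating the technique only.
#[derive(Clone, Copy)]
enum Tok { Num(f64), Op(char) }

fn prec(op: char) -> u8 {
    match op { '+' | '-' => 1, '*' | '/' => 2, _ => 0 }
}

fn parse(toks: &[Tok], pos: &mut usize, min_prec: u8) -> f64 {
    let mut lhs = match toks[*pos] {
        Tok::Num(n) => { *pos += 1; n }
        _ => panic!("expected number"),
    };
    while *pos < toks.len() {
        // Only consume operators at or above the current precedence floor.
        let op = match toks[*pos] {
            Tok::Op(c) if prec(c) >= min_prec => c,
            _ => break,
        };
        *pos += 1;
        // Recurse with a higher floor so tighter operators bind first.
        let rhs = parse(toks, pos, prec(op) + 1);
        lhs = match op { '+' => lhs + rhs, '-' => lhs - rhs, '*' => lhs * rhs, _ => lhs / rhs };
    }
    lhs
}

fn main() {
    // 2 + 3 * 4 parses as 2 + (3 * 4) because '*' binds tighter.
    let toks = [Tok::Num(2.0), Tok::Op('+'), Tok::Num(3.0), Tok::Op('*'), Tok::Num(4.0)];
    assert_eq!(parse(&toks, &mut 0, 1), 14.0);
}
```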
AST Structure (src/ast.rs):
pub enum Stmt {
Let { pattern: Pattern, value: Box<Expr> },
Assignment { target: Box<Expr>, value: Box<Expr> },
FuncDef { name: String, params: Vec<String>, body: Vec<Stmt>, is_generator: bool },
Return(Box<Expr>),
If { condition: Box<Expr>, then_block: Vec<Stmt>, else_block: Option<Vec<Stmt>> },
While { condition: Box<Expr>, body: Vec<Stmt> },
For { var: String, iterable: Box<Expr>, body: Vec<Stmt> },
Match { value: Box<Expr>, cases: Vec<(Pattern, Vec<Stmt>)> },
// ... more variants
}
pub enum Expr {
Number(f64),
Str(String),
Identifier(String),
Binary { op: BinaryOp, left: Box<Expr>, right: Box<Expr> },
Call { func: Box<Expr>, args: Vec<Expr> },
MethodCall { object: Box<Expr>, method: String, args: Vec<Expr> },
// ... more variants
}Responsibility: Execute AST and manage runtime state
Current Structure (Being Modularized in v0.9.0):
- 14,802 lines in single file (too large)
- Contains: Value enum, Environment, Interpreter, builtins, test runner
- Target: Split into focused modules (~500-2000 lines each)
Key Components:
pub enum Value {
Int(i64),
Float(f64),
Str(String),
Bool(bool),
Null,
Array(Vec<Value>),
Dict(HashMap<String, Value>),
Function(Vec<String>, LeakyFunctionBody, Option<Arc<Mutex<Environment>>>),
AsyncFunction(...),
Generator {...},
Iterator {...},
Promise {...},
Result { is_ok: bool, value: Box<Value> },
Option { is_some: bool, value: Box<Value> },
// ... 30+ variants total
}

pub struct Environment {
    scopes: Vec<HashMap<String, Value>>,
}

- Stack-based lexical scoping
- `push_scope()` / `pop_scope()` for blocks
- `get()` searches from inner to outer scopes
- `set()` updates an existing variable or creates one in the current scope
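The scope-stack behavior above can be sketched as follows (values simplified to `i64` for the sketch; the real `Environment` stores `Value`):

```rust
use std::collections::HashMap;

// Sketch of the scope-stack lookup: `get` searches from the innermost
// scope outward; `set` updates an existing binding if one exists in any
// enclosing scope, otherwise creates one in the current scope.
struct Environment { scopes: Vec<HashMap<String, i64>> }

impl Environment {
    fn new() -> Self { Environment { scopes: vec![HashMap::new()] } }
    fn push_scope(&mut self) { self.scopes.push(HashMap::new()); }
    fn pop_scope(&mut self) { self.scopes.pop(); }
    fn get(&self, name: &str) -> Option<i64> {
        self.scopes.iter().rev().find_map(|s| s.get(name).copied())
    }
    fn set(&mut self, name: &str, value: i64) {
        for scope in self.scopes.iter_mut().rev() {
            if let Some(slot) = scope.get_mut(name) { *slot = value; return; }
        }
        self.scopes.last_mut().unwrap().insert(name.to_string(), value);
    }
}

fn main() {
    let mut env = Environment::new();
    env.set("x", 1);
    env.push_scope();
    env.set("x", 2);                    // updates the outer binding
    env.set("y", 3);                    // created in the inner scope
    assert_eq!(env.get("x"), Some(2));
    env.pop_scope();
    assert_eq!(env.get("x"), Some(2));  // outer binding was mutated
    assert_eq!(env.get("y"), None);     // inner binding discarded
}
```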
pub struct Interpreter {
pub env: Environment,
return_value: Option<Value>,
// ... builtin functions, async runtime, module loader, etc.
}

Execution Methods:
- `eval_stmt(&mut self, stmt: &Stmt)` - Execute a statement
- `eval_expr(&mut self, expr: &Expr) -> Value` - Evaluate an expression
- `register_builtins(&mut self)` - Register native functions (~600 lines)
Responsibility: Provide standard library functions
Categories:
- I/O: `read_file`, `write_file`, `http_get`, `http_post`
- Collections: `map`, `filter`, `reduce`, `zip`, `chunk`
- Strings: `split`, `join`, `trim`, `replace`, `regex_match`
- Math: `sqrt`, `pow`, `abs`, `floor`, `ceil`
- System: `env`, `args`, `exec`, `spawn_process`
- Database: `db_connect`, `db_query`, `db_transaction`
- Async: `sleep`, `timeout`, `race`
- Testing: `assert_equal`, `assert_true`, `run_tests`
Integration: Called via `Value::NativeFunction(name)` with dispatch in the interpreter
Responsibility: Load and manage imported modules
Features:
- `import "path/to/module.ruff"` statement
- Caches loaded modules to prevent re-execution
- Module exports tracked in environment
- Circular import detection
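The caching behavior can be sketched as follows (`Module` and the loader shape are hypothetical stand-ins for the real module loader):

```rust
use std::collections::HashMap;

// Sketch of module caching: each path is loaded at most once; repeated
// imports return the cached exports instead of re-executing the file.
// `Module` and the `loads` counter are hypothetical illustrations.
#[derive(Clone)]
struct Module { exports: Vec<String> }

struct ModuleLoader { cache: HashMap<String, Module>, loads: usize }

impl ModuleLoader {
    fn import(&mut self, path: &str) -> Module {
        if let Some(m) = self.cache.get(path) {
            return m.clone(); // cache hit: no re-execution
        }
        self.loads += 1; // in the real loader: lex, parse, execute the file
        let m = Module { exports: vec![format!("exports of {path}")] };
        self.cache.insert(path.to_string(), m.clone());
        m
    }
}

fn main() {
    let mut loader = ModuleLoader { cache: HashMap::new(), loads: 0 };
    loader.import("util.ruff");
    loader.import("util.ruff"); // second import hits the cache
    assert_eq!(loader.loads, 1);
}
```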
src/
├── interpreter.rs (14,802 lines - MONOLITHIC)
├── lexer.rs
├── parser.rs
├── ast.rs
├── builtins.rs
├── compiler.rs (experimental)
├── bytecode.rs (experimental)
└── vm.rs (experimental)
src/interpreter/
├── mod.rs (~14,071 lines) - Core Interpreter + call_native_function_impl + register_builtins
├── value.rs (~497 lines) - Value enum with 30+ variants, DB types
├── environment.rs (~109 lines) - Environment struct with lexical scoping
├── control_flow.rs (~22 lines) - ControlFlow enum for break/continue
├── test_runner.rs (~230 lines) - TestRunner, TestCase, TestResult, TestReport
└── legacy_full.rs (~14,754 lines) - Backup of original monolithic file
Progress Summary:
- ✅ Phase 1 (Jan 26): Extracted Value enum (500 lines) and Environment struct (110 lines) → -517 lines
- ✅ Phase 2 (Jan 26): Extracted ControlFlow enum (22 lines) and test framework (230 lines) → -214 lines
- ✅ Total reduction: 14,802 → 14,071 lines (-731 lines, ~5% reduction)
- ✅ Zero compilation errors, minimal warnings
- ✅ All functionality preserved, tests passing
Key Design Constraints:
- The 5,700-line `call_native_function_impl` method remains in mod.rs: it takes `&mut self` and is tightly coupled to the interpreter's internal state
- The 564-line `register_builtins` method remains in mod.rs, as it directly mutates `self.env`
- Both methods are well organized with category comments, but extracting them would require significant architectural refactoring
Benefits Achieved:
- Easier navigation with IDE "Go to File"
- Parallel compilation of modules
- Clear separation of concerns
- Easier code review (smaller diffs)
- Reduced mental overhead per file
- Improved onboarding for new contributors
Status: ✅ Phase 2 Complete - Further modularization requires architectural changes to interpreter design
How It Works:
- Parse source to AST
- Recursively traverse AST nodes
- Execute statements, evaluate expressions
- Return values bubble up the tree
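The steps above can be sketched with a toy subset of the expression evaluator (not the real `eval_expr`, which handles 30+ `Value` variants):

```rust
// Minimal tree-walking evaluation: recursively evaluate sub-expressions
// and combine the results at each node; values bubble up the tree.
enum Expr {
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn eval_expr(e: &Expr) -> f64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Add(l, r) => eval_expr(l) + eval_expr(r),
        Expr::Mul(l, r) => eval_expr(l) * eval_expr(r),
    }
}

fn main() {
    // AST for (1 + 2) * 4
    let ast = Expr::Mul(
        Box::new(Expr::Add(Box::new(Expr::Num(1.0)), Box::new(Expr::Num(2.0)))),
        Box::new(Expr::Num(4.0)),
    );
    assert_eq!(eval_expr(&ast), 12.0);
}
```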
Pros:
- Simple to implement and debug
- Direct mapping from source to execution
- Easy to add new features
Cons:
- Slow: 100-500x slower than compiled languages
- Repeated AST traversal overhead
- No optimization passes
Performance: Acceptable for scripts, not production workloads
How It Works:
- Parse source to AST
- Compile AST to bytecode instructions
- VM executes bytecode with stack machine
- Optimizations during compilation
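The stack-machine execution step can be sketched as follows (hypothetical opcodes, not the actual `bytecode.rs` instruction set):

```rust
// Sketch of stack-machine dispatch: each instruction pushes operands
// onto a value stack or pops them, operates, and pushes the result.
enum Op { Push(f64), Add, Mul }

fn run(code: &[Op]) -> f64 {
    let mut stack = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a * b);
            }
        }
    }
    stack.pop().unwrap() // the final result is left on top of the stack
}

fn main() {
    // (1 + 2) * 4 compiled to postfix bytecode
    let code = [Op::Push(1.0), Op::Push(2.0), Op::Add, Op::Push(4.0), Op::Mul];
    assert_eq!(run(&code), 12.0);
}
```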
Pros:
- 10-50x faster than tree-walking
- Optimization opportunities (constant folding, dead code elimination)
- Smaller memory footprint (no AST at runtime)
Cons:
- More complex implementation
- Harder to debug
- Startup overhead for compilation
Status:
- Compiler, bytecode, and VM exist (`src/compiler.rs`, `src/bytecode.rs`, `src/vm.rs`)
- Marked with `#[allow(dead_code)]` - not actively used
- Roadmap: Complete in v0.9.0 Task #28
Planned Features (v0.9.0+):
- Detect hot code paths at runtime
- Compile bytecode to native machine code (Cranelift or LLVM)
- Type specialization for common types
- Expected: 100-500x faster than tree-walking, competitive with Go
Ruff uses Rust's ownership for memory safety:
- Values are `Clone` - copied when needed
- Environments wrapped in `Arc<Mutex<>>` for shared access
- Function closures capture environments via `Arc<Mutex<Environment>>`
let x := [1, 2, 3] // Array allocated
let y := x // Array cloned (deep copy)
// x and y are independent

Important: Ruff does NOT use reference counting for user-visible references. Values are copied on assignment.
Problem:
- Function bodies are `Vec<Stmt>`
- Statements contain nested `Vec<Stmt>` (in loops, conditionals)
- Recursive Drop causes stack overflow on deeply nested ASTs
Current Workaround:
pub struct LeakyFunctionBody(ManuallyDrop<Arc<Vec<Stmt>>>);

- Intentionally leaks memory to avoid stack overflow
- Memory reclaimed by OS at program exit
- Not ideal but prevents crashes
Future Fix (Roadmap Task #29):
- Implement iterative Drop traversal
- Or use arena allocation
- Or flatten statement structures
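The iterative-Drop option can be sketched as follows (a simplified stand-in `Stmt`, not the real AST; the drop counter is only there to make the behavior observable):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch of iterative Drop: instead of letting nested statements drop
// recursively, move children into an explicit work list and drain it,
// so stack depth stays bounded regardless of AST nesting depth.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Stmt { children: Vec<Stmt> }

impl Drop for Stmt {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
        // Take the children out; each popped node has its own children
        // appended to the queue before it drops, so its Drop sees an
        // empty Vec and never recurses deeply.
        let mut queue: Vec<Stmt> = std::mem::take(&mut self.children);
        while let Some(mut s) = queue.pop() {
            queue.append(&mut s.children);
        }
    }
}

// Build a chain `depth` levels deep, drop it, and report how many
// nodes were dropped (hypothetical helper for demonstration).
fn drop_deep(depth: usize) -> usize {
    let before = DROPS.load(Ordering::Relaxed);
    let mut root = Stmt { children: vec![] };
    for _ in 0..depth {
        root = Stmt { children: vec![root] };
    }
    drop(root); // would overflow the stack under naive recursive Drop
    DROPS.load(Ordering::Relaxed) - before
}

fn main() {
    assert_eq!(drop_deep(200_000), 200_001);
}
```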
Change: Replaced Rc<RefCell<>> with Arc<Mutex<>>
Reason: Enable thread-safe async/await
Impact:
- All environments, channels, connections now thread-safe
- Enables true concurrency (not just cooperative multitasking)
- Required for async functions running in separate threads
Architecture:
async func fetch_data(id) {
// Runs in separate thread
let result := http_get("/api/data/${id}")
return result
}
let promise := fetch_data(42) // Returns immediately
let data := await promise // Blocks until complete

Implementation:
- Async functions return `Value::Promise`
- Promise wraps `mpsc::Receiver<Result<Value, String>>`
- The async body executes in a new thread via `std::thread::spawn`
- `await` blocks on the receiver until the result is available
Thread Safety: All shared state uses Arc<Mutex<>>
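The promise mechanism described above can be sketched as follows (result type simplified to `i64`; `spawn_async` and `await_value` are hypothetical names, not the real runtime API):

```rust
use std::sync::mpsc;
use std::thread;

// Sketch of thread-per-async-function promises: the async body runs on
// its own thread and sends its result over an mpsc channel; awaiting is
// a blocking recv on the receiving end.
struct Promise { rx: mpsc::Receiver<Result<i64, String>> }

fn spawn_async<F>(body: F) -> Promise
where
    F: FnOnce() -> Result<i64, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(body()); // deliver the result to the awaiter
    });
    Promise { rx }
}

impl Promise {
    fn await_value(self) -> Result<i64, String> {
        self.rx.recv().expect("async thread terminated") // blocks
    }
}

fn main() {
    let promise = spawn_async(|| Ok(42)); // returns immediately
    assert_eq!(promise.await_value(), Ok(42)); // blocks until complete
}
```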
spawn {
// Runs in separate thread
process_large_file("data.txt")
}
Implementation: Creates detached thread, no return value
let (tx, rx) := channel()
spawn {
tx.send("Hello from thread!")
}
let msg := rx.recv() // Blocks until message
Implementation: Wraps std::sync::mpsc::channel()
func* numbers(n) {
for i in range(n) {
yield i
}
}
for num in numbers(5) {
print(num)
}
Implementation:
- `Value::Generator` stores: params, body, environment, program counter
- `yield` returns a value and saves state
- The next call resumes from the program counter
- Status: Syntax parses, execution needs refinement
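The saved-program-counter model can be sketched with a toy counting generator (a stand-in for `Value::Generator`, not the real implementation):

```rust
// Sketch of resumable generator state: the generator keeps a program
// counter; each call yields the next value and saves where to resume.
struct Generator { pc: usize, n: usize }

impl Generator {
    fn next(&mut self) -> Option<usize> {
        if self.pc < self.n {
            let v = self.pc; // "yield i": produce a value...
            self.pc += 1;    // ...and save the resume point
            Some(v)
        } else {
            None // generator exhausted
        }
    }
}

fn main() {
    // Equivalent of iterating `numbers(3)` from the example above.
    let mut g = Generator { pc: 0, n: 3 };
    let mut out = vec![];
    while let Some(v) = g.next() {
        out.push(v);
    }
    assert_eq!(out, vec![0, 1, 2]);
}
```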
Decision: Implement tree-walking interpreter before bytecode VM
Rationale:
- Faster to implement (get language working sooner)
- Easier to debug and add features
- Establish language semantics before optimizing
- Prove language viability before investing in performance
Trade-off: Slow execution, but that's acceptable for v0.x
Decision: No compile-time type checking by default
Rationale:
- Simpler for scripting and prototyping
- Faster iteration during development
- Python/Ruby/JavaScript model familiar to users
Future: Optional type annotations (v0.10.0+) for performance and tooling
Decision: Inner scopes can access outer scope variables
Rationale:
- Matches JavaScript, Python behavior
- Enables closures naturally
- Intuitive for most programmers
Implementation: Environment scope stack
Decision: Accept memory leak to prevent stack overflow
Rationale:
- Stack overflow is worse than leak (crashes vs leaks at exit)
- Temporary workaround until proper fix (Task #29)
- OS reclaims memory at program exit anyway
Future: Eliminate with iterative drop or arena allocation
Decision: Thread-per-async-function model, not async runtime like Tokio
Rationale:
- Simpler implementation (std::thread)
- True parallelism on multi-core
- No need for executor, no event loop complexity
Trade-off: More memory per task, but acceptable for Ruff's use cases
Goals:
- Split interpreter into focused modules
- Make bytecode VM the default execution path
- Improve error messages with source locations
- Comprehensive architecture documentation
Timeline: Q2 2026 (2-3 months)
Goals:
- Add type annotation syntax
- Optional runtime type checking
- JIT compilation for typed code (Cranelift or LLVM)
- Performance competitive with Go
Timeline: TBD (exploratory)
Success Criteria:
- Performance within 2-5x of Go
- Comprehensive standard library
- Excellent error messages
- Professional tooling (LSP, formatter, linter)
- Battle-tested in real applications
See CONTRIBUTING.md for:
- How to add a new built-in function
- How to add a new language feature
- Code style guidelines
- Testing requirements
- ROADMAP.md - Planned features and timeline
- CHANGELOG.md - Version history
- notes/GOTCHAS.md - Common pitfalls and lessons learned
Questions? Open an issue on GitHub or reach out to the maintainers.