A production-quality LLVM backend for the Texas Instruments TMS9900 β a unique and long-forgotten 16-bit processor that was surprisingly modern in its architecture, but famously hamstrung in the ill-fated TI-99/4A home computer. Compiles C, C++ (with STL), and Rust π¦ to native TMS9900 machine code.
- Full LLVM 20 toolchain β clang, opt, lld, llvm-objcopy, assembler, and disassembler. No external tools required.
- Heavily stress-tested β 100 Csmith random programs, CoreMark, MiniLZO, sprintf test suite, and 20 hand-written benchmarks (60/60 pass across -O0, -O1, -O2, and -Os).
- C++ with STL β freestanding libc++ headers:
vector,string,tuple,optional,unique_ptr,bitset, algorithms, and more. Lambdas, multiple inheritance, move semantics, variadic templates all working. - Rust
#![no_std]π¦ β builds withcargo +tms9900 build -Z build-std=core. See Rust Support below. - FreeRTOS port β preemptive multitasking with near-zero-cost context switches via the TMS9900's workspace pointer. See FreeRTOS.
- Hand-tuned runtime library β assembly-optimized 32-bit multiply/divide/shift, 64-bit arithmetic,
memcpy/memset/memmoveexploiting auto-increment addressing, and IEEE 754 soft-float. - picolibc math β single-precision
libm(sin, cos, sqrt, exp, log, pow, ...) built from picolibc sources, with a custom compact sinf/cosf (1.2KB vs 7KB+ standard). - GDB/LLDB debugging β the tms9900-trace emulator includes a GDB Remote Serial Protocol stub for interactive debugging with breakpoints, single-stepping, and memory inspection.
The TMS9900 (1976) was TI's flagship 16-bit microprocessor. Its most distinctive feature is the workspace pointer architecture: instead of fixed hardware registers, the CPU's 16 "registers" are just 32 bytes of RAM pointed to by the Workspace Pointer (WP). A context switch is a single instruction (BLWP) that atomically swaps all 16 registers by changing one pointer β no pushing, no popping, no saving.
The chip also has hardware multiply (16Γ16β32) and divide (32Γ·16β16), auto-increment addressing, and a bit-addressable I/O bus (CRU). It was genuinely ahead of its time.
For full details, see the TMS 9900 Microprocessor Data Manual (1976).
git clone git@github.com:apullin/llvm-tms9900.git
cd llvm-tms9900
git submodule update --init --recursive
cd llvm-project
mkdir build && cd build
cmake -G Ninja \
-DLLVM_TARGETS_TO_BUILD="TMS9900" \
-DLLVM_ENABLE_PROJECTS="clang;lld" \
-DCMAKE_BUILD_TYPE=Release \
../llvm
ninja# Compile C to object file
clang --target=tms9900 -O2 -c main.c -o main.o
# Assemble startup code
clang --target=tms9900 -c startup.S -o startup.o
# Link with LLD
ld.lld -T linker.ld startup.o main.o -o program.elf
# Extract raw binary
llvm-objcopy -O binary program.elf program.bintms9900-trace -l 0x0000 program.bin -n 100000 -Sllvm-tms9900/
βββ llvm-project/ # LLVM 18 fork (submodule, branch: tms9900)
βββ libtms9900/
β βββ builtins/ # Runtime intrinsics (32/64-bit ops, memcpy, soft-float)
β βββ libm/ # Math library (picolibc-based, float32)
β βββ picolibc/ # Embedded C library source
βββ FreeRTOS/ # FreeRTOS port with TMS9900 workspace-based context switch
βββ tests/
β βββ benchmarks/ # 20 benchmarks (14 C + 6 C++), 60/60 pass
β βββ stress/ # Csmith, CoreMark, MiniLZO, sprintf
βββ cart_example/ # TI-99/4A cartridge demos (Game of Life, etc.)
βββ llvm2xas99.py # LLVM asm β xas99 format converter (legacy)
βββ PROJECT_JOURNAL.md # Detailed development log
The backend produces ELF object files directly β no external assembler needed:
.c / .S βββΆ clang -c βββΆ .o (ELF) βββΆ ld.lld βββΆ llvm-objcopy βββΆ .bin
All standard LLVM tools work: opt for IR optimization, llc for code generation, llvm-objdump for disassembly, llvm-size for section sizes.
Full C11 support in freestanding mode (-ffreestanding). All integer sizes through 64-bit, IEEE 754 soft-float, switch via jump tables, inline assembly, interrupt handlers, naked functions.
Freestanding C++ with header-only libc++ from the LLVM tree:
clang --target=tms9900 -O2 -ffreestanding -fno-exceptions -fno-rtti \
-fno-threadsafe-statics -nostdinc++ \
-isystem libcxx_config -isystem llvm-project/libcxx/include \
-c program.cpp -o program.oWorking STL containers and utilities: vector, string, pair, tuple, optional, string_view, unique_ptr, initializer_list, bitset, numeric_limits, array, and algorithms (find, count, reverse, min, max, sort).
Working language features: lambdas (all capture modes), multiple inheritance with virtual dispatch, move semantics, perfect forwarding, variadic templates, structured bindings, constexpr, enum class, static local init.
A minimal C++ runtime (cxxrt.cpp) provides operator new/delete and __cxa_* stubs. See tests/benchmarks/ for examples.
Rust #![no_std] cross-compilation is supported via a custom Rust 1.81.0 build. π¦
Setup (one-time):
- Clone Rust 1.81.0 into a separate directory and apply TMS9900 patches (calling convention, LLVM component registration)
- Build stage 1:
python3 x.py build --stage 1 library - Register toolchain:
rustup toolchain link tms9900 build/<host>/stage1
Build a Rust program:
cargo +tms9900 build -Z build-std=core --target tms9900-unknown-none.jsonSee the rust-tms9900 repository for the patched Rust fork, target spec, and examples.
Note: core::arch::asm! is not available for custom targets. Use extern "C" calls to assembly files instead.
The FreeRTOS/ directory contains a full FreeRTOS port that exploits the TMS9900's workspace pointer for near-zero-cost context switching.
On most CPUs, a context switch means saving and restoring all registers β typically dozens of push/pop instructions. On the TMS9900, each task owns a permanent 32-byte workspace in RAM. Switching tasks means changing a single 16-bit pointer (WP). The entire register file swaps atomically β no copying, no stack gymnastics.
The FreeRTOS port stores each task's workspace at the top of its stack allocation. On context switch, only 4 words are saved (WP, PC, ST, critical nesting counter), then the new task's WP is loaded and execution resumes with all 16 registers already in place.
Basic two-task test (main.c): Two equal-priority tasks increment separate memory markers under preemptive time-slicing. Verifies that the scheduler correctly alternates between tasks.
Queue + event group rendezvous (main_queue.c): A port of an MSPM0 Rust demo. A manager task generates random work items and pushes them to a queue; three worker tasks consume items and synchronize at a 4-way event-group rendezvous barrier. Exercises queues, event groups, multiple priority levels, and tick-based delays.
cd FreeRTOS
make run # basic two-task test
make run-queue # queue + rendezvous testDevelopment and testing uses tms9900-trace, a standalone TMS9900 CPU emulator. It was invaluable during compiler development β every backend bug was caught and diagnosed using its cycle-accurate execution traces, memory dumps, and JSON output mode.
The emulator includes a full GDB Remote Serial Protocol (RSP) stub. Connect with GDB or LLDB for interactive debugging:
# Start emulator in GDB server mode
tms9900-trace -l 0x0000 program.bin --gdb
# In another terminal
lldb
(lldb) gdb-remote localhost:1234
(lldb) register read
(lldb) breakpoint set -a 0x0100
(lldb) continueSupports breakpoints, single-stepping, register and memory inspection, and async break (Ctrl-C).
Lightweight PC-hit profiling without full instruction trace overhead:
tms9900-trace -l 0x0000 program.bin --tracepoint 0x0100 --tracepoint 0x0200 -n 500000 -SReports hit counts, first/last cycle timestamps β useful for measuring function call frequency and loop iteration counts.
DWARF emission is not yet implemented. ELF binaries contain symbol tables (function names, globals) which are sufficient for the GDB stub's breakpoint and symbol lookup features. Full source-level debugging with line numbers and variable inspection is a future goal.
Hand-coded TMS9900 assembly for performance-critical operations:
| Category | Functions |
|---|---|
| 32-bit integer | __mulsi3, __divsi3, __udivsi3, __modsi3, __umodsi3 |
| 32-bit shifts | __ashlsi3, __lshrsi3, __ashrsi3 |
| 64-bit integer | __muldi3, __divdi3, __udivdi3, __moddi3, __umoddi3, __udivmoddi4 |
| Memory | memcpy, memset, memmove (auto-increment optimized) |
| Soft-float | __addsf3, __subsf3, __mulsf3, __divsf3, comparisons, conversions |
The 32-bit division includes a hardware DIV fast path for 16-bit divisors. Memory routines exploit the TMS9900's *R+ auto-increment addressing mode.
Single-precision math built from picolibc sources (~21KB at -Os):
sinf, cosf, tanf, asinf, acosf, atanf, atan2f, sinhf, coshf, tanhf, expf, logf, log10f, log2f, powf, sqrtf, fabsf, fmodf, ceilf, floorf, roundf, scalbnf, copysignf, ldexpf, frexpf
Custom compact sinf/cosf implementation: 1.2KB combined (vs 7KB+ from stock picolibc).
| Register | Usage |
|---|---|
| R0 | Return value (16-bit), or R0:R1 (32-bit high:low) |
| R1βR9 | Arguments (first 9 words), caller-saved |
| R10 | Stack Pointer |
| R11 | Link Register (return address) |
| R12 | CRU Base / general purpose |
| R13βR15 | Callee-saved |
Arguments: R1βR9, then stack. Stack grows downward, 4-byte aligned. 32-bit arguments use register pairs (R1:R2, R3:R4, ...).
20 programs (14 C + 6 C++), all passing at every optimization level:
| Benchmark | Description | -O0 | -O1 | -O2 | -O0 | -O1 | -O2 |
|---|---|---|---|---|---|---|---|
| Code | Code | Code | Cycles | Cycles | Cycles | ||
| fib | Fibonacci | 184 | 108 | 282 | 7.1K | 2.2K | 1.5K |
| bubble_sort | Array sort | 440 | 194 | 392 | 117K | 26.9K | 21.6K |
| deep_recursion | Tail-call recursion | 218 | 108 | 144 | 45.6K | 6.3K | 766 |
| crc32 | CRC-32 hash | 514 | 220 | 418 | 204K | 111K | 77.5K |
| q7_8_matmul | Fixed-point matrix multiply | 392 | 134 | 134 | 9.9K | 512 | 512 |
| json_parse | JSON tokenizer | 2.7K | 1.0K | 978 | 115K | 22.5K | 20.4K |
| string_torture | String operations | 740 | 500 | 502 | 34.0K | 12.3K | 11.5K |
| float_torture | IEEE 754 soft-float | 11.4K | 8.6K | 8.6K | 48.0K | 1.7K | 1.7K |
| bitops_torture | popcount, bswap, clz, ctz | 3.3K | 2.1K | 2.1K | 72.5K | 27.1K | 27.1K |
| vertex3d | 3D vertex transforms | 1.1K | 424 | 852 | 95.0K | 57.2K | 56.0K |
| huffman | Huffman codec | 3.6K | 1.4K | 1.3K | 1.86M | 554K | 523K |
| long_torture | 30 tests of 32-bit ops | 3.3K | 1.8K | 1.8K | 41.2K | 19.8K | 19.8K |
| heap4 | FreeRTOS-style allocator | 4.2K | 1.8K | 2.1K | 401K | 125K | 121K |
| i64_torture | 64-bit arithmetic | 9.1K | 7.1K | 7.1K | 374K | 349K | 349K |
| cpp_test | Ctors, vtables, templates | 2.1K | 510 | 510 | 15.5K | 3.0K | 3.0K |
| lambda_test | Lambda expressions | 18.9K | 3.1K | 2.9K | 66.5K | 5.9K | 4.0K |
| mi_test | Multiple inheritance | 7.5K | 1.9K | 1.7K | 63.7K | 9.8K | 9.6K |
| cpp_adv_test | Move, forwarding, variadic | 4.8K | 1.2K | 1.1K | 92.4K | 16.9K | 17.8K |
| stl_test | vector, string | 54.1K | 16.5K | 15.4K | 716K | 34.2K | 30.8K |
| stl_util_test | tuple, optional, unique_ptr | 12.0K | 1.6K | 1.5K | 188K | 10.5K | 9.1K |
Code sizes in bytes. Cycle counts from tms9900-trace emulator.
| Test | Result | Notes |
|---|---|---|
| Csmith (100 programs) | 100/100 self-consistent | O0=O1=O2 where all complete |
| CoreMark | O0βOs | ~1.3M cycles at O2 |
| MiniLZO | O1βOs | Lossless compression + decompression round-trip |
| sprintf (35 tests) | 35/35 at O0βO2 | Variadic functions, format strings |
98 tests (82 CodeGen + 16 MC), all passing.
The integrated assembler supports both LLVM and xas99 syntax:
| Feature | LLVM Style | xas99 Style |
|---|---|---|
| Labels | LABEL: |
LABEL (no colon) |
| Hex | 0x1234 |
>1234 |
| Comments | #, //, ; |
; |
xas99 directives: DATA, BYTE, TEXT, BSS, DEF, REF, EQU, END.
CRU bit-addressing: SBO, SBZ, TB with symbolic offsets.
The cart_example/ directory has several demos:
- life2x.c β Game of Life with 2Γ pixels, keyboard controls, dirty-tile tracing
- ball.c / ball2.c β Bouncing ball demos
- banner.c β Text banner
cd cart_example
make life2x.img # EA5 executable
make banner.rpk # MAME cartridge- Workspace-aware compilation β The TMS9900's workspace pointer architecture means a context switch (
BLWP) swaps all 16 registers atomically by changing one pointer. A workspace-aware compiler could allocate "hot" variables directly in the workspace for ISR-heavy or coroutine-style code, avoiding save/restore overhead entirely. This is architecturally unique to the TMS9900 family and has no equivalent in modern compiler backends. - Migrate to LLVM latest β The backend currently targets LLVM 18. Porting to the LLVM main branch would unlock newer optimization passes and keep pace with upstream improvements.
- Full DWARF debug info β Source-level debugging with line numbers and variable inspection.
- CRU-aware register allocation β Reserve R12 only when CRU I/O instructions are used (currently opt-in via
-mattr=+reserve-cru).
- TMS 9900 Microprocessor Data Manual (1976) β the definitive reference
- tms9900-trace β TMS9900 CPU emulator with GDB stub
- xdt99 β Cross-development tools for TI-99/4A
- PROJECT_JOURNAL.md β Detailed development log
Apache 2.0 with LLVM Exceptions (follows upstream LLVM licensing).
