Helix is a (very much work-in-progress!) proof of concept of a semantic Just-In-Time (JIT) compiler for AI workloads. It optimizes Python code at runtime by offloading "hot loops" to a distributed cluster of Large Language Models (LLMs), synthesizing hardware-accelerated kernels (Apple Metal/MLX), and hot-swapping them in without stopping execution. Evolutionary strategies then select among the candidate kernels.
Deep learning researchers often write complex scalar loops in Python (e.g., custom PDE solvers, RL environments) that are too dynamic for `torch.compile` or XLA. Converting these to custom CUDA/Metal kernels by hand takes hours of engineering time.
Helix introduces a Neuro-Evolutionary Compilation engine:
- Trace: Detects hotspots in live Python code.
- Dispatch: Offloads the AST to a background Ray cluster (non-blocking).
- Evolve: Spawns multiple "Compilation Strategies" in parallel (e.g., Aggressive Vectorization vs. Conservative Stability) driven by LLMs (Gemini/GPT-4).
- Race: The first strategy to generate valid, compilable code "wins" the race.
- Hot-Swap: The Python bytecode is replaced by the compiled Metal kernel in real-time.
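The trigger-and-swap mechanics above can be sketched as a decorator. This is an illustrative stand-in, not the real Helix API: `helix_jit`, `hot_threshold`, and the compile callback are all hypothetical names, and a plain thread stands in for the Ray dispatch.

```python
import functools
import threading

def helix_jit(compile_fn, hot_threshold=3):
    """Hypothetical sketch of a hot-swapping JIT decorator.

    After `hot_threshold` calls, compilation is kicked off on a background
    thread (standing in for the async Ray task). Once a kernel is ready,
    the wrapper swaps its target so later calls hit the compiled version;
    the caller is never blocked.
    """
    def decorator(fn):
        state = {"calls": 0, "impl": fn, "compiling": False}

        def _compile():
            # Stand-in for the distributed LLM pipeline (steps 2-5).
            fast = compile_fn(fn)
            # Step 6: hot-swap the function pointer.
            state["impl"] = fast

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            state["calls"] += 1
            if state["calls"] >= hot_threshold and not state["compiling"]:
                state["compiling"] = True  # trigger compilation exactly once
                threading.Thread(target=_compile, daemon=True).start()
            # Always dispatch through the current implementation pointer.
            return state["impl"](*args, **kwargs)
        return wrapper
    return decorator
```

Because the swap is a single attribute assignment, in-flight calls finish on whichever implementation they started with and new calls pick up the compiled kernel.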
On a standard 2D Diffusion PDE solver (512x512 grid):
- Python Baseline: 0.177s / iter
- Helix (Metal): 0.0004s / iter
- Speedup: ~440x
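For context, a 2D diffusion step is a 5-point stencil, and the gap comes from running it as a scalar Python loop versus a fused array kernel. The sketch below shows both forms side by side; the grid size, `alpha`, and function names are illustrative and not the benchmark's exact solver (the synthesized Metal/MLX kernel would compute the same stencil on the GPU, shown here with NumPy slicing).

```python
import numpy as np

def diffuse_step_python(u, alpha=0.1):
    """Pure-Python 5-point stencil step: the kind of scalar hot loop Helix targets."""
    n, m = u.shape
    out = u.copy()
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i, j] = u[i, j] + alpha * (
                u[i + 1, j] + u[i - 1, j]
                + u[i, j + 1] + u[i, j - 1]
                - 4 * u[i, j]
            )
    return out

def diffuse_step_vectorized(u, alpha=0.1):
    """The same stencil as one fused array expression over the interior."""
    out = u.copy()
    out[1:-1, 1:-1] = u[1:-1, 1:-1] + alpha * (
        u[2:, 1:-1] + u[:-2, 1:-1]
        + u[1:-1, 2:] + u[1:-1, :-2]
        - 4 * u[1:-1, 1:-1]
    )
    return out
```

Both functions compute identical results; only the execution strategy differs, which is exactly the transformation a synthesized kernel performs.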
```mermaid
graph TD
    User[User Code] -- "1. Triggers JIT" --> Decorator[Helix Decorator]
    Decorator -- "2. Async Task" --> Ray[Ray Cluster]
    subgraph "Distributed Optimization"
        Ray --> Worker1[Worker: Aggressive]
        Ray --> Worker2[Worker: Balanced]
        Ray --> Worker3[Worker: Stable]
        Worker1 -- "3. Prompt" --> LLM[Google Gemini]
        Worker2 -- "3. Prompt" --> LLM
        Worker3 -- "3. Prompt" --> LLM
        LLM -- "4. Kernel Code" --> Worker1
    end
    Worker1 -- "5. Winner Found" --> Decorator
    Decorator -- "6. Hot-Swap Ptr" --> User
```
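The race between the parallel strategies (steps 3–5) can be sketched with a thread pool: each strategy attempts to synthesize a kernel, and the first candidate that passes validation wins. In Helix this runs as Ray tasks backed by LLM calls; here plain threads and the `race_strategies` name are illustrative stand-ins.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def race_strategies(strategies, source):
    """Hypothetical sketch of the 'race' step.

    Each strategy is a callable that takes the hotspot source and returns a
    validated kernel candidate, or None if its attempt failed validation
    (e.g. the generated code did not compile or gave wrong outputs).
    """
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        pending = {pool.submit(strategy, source) for strategy in strategies}
        while pending:
            # Wake up as soon as any strategy finishes, not all of them.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                candidate = future.result()
                if candidate is not None:
                    # Winner found; stragglers finish on pool shutdown.
                    return candidate
    # Every strategy failed: keep the original Python implementation.
    return None
```

Returning `None` on total failure is what makes the scheme safe: the decorator simply never swaps, and the original Python function keeps running.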
| Backend | Status | Technology |
|---|---|---|
| Apple Silicon | ✅ Active | MLX / Metal |
| Nvidia GPU | 🚧 Planned | OpenAI Triton |
| CPU | 🚧 Planned | NumPy / C++ |