Includes:
- Lexer: Tokenizes input into numbers, operators (+, -, *, /), and parentheses
- Parser: Recursive descent parser implementing operator precedence (*, / before +, -)
- Code Generator: Emits x86-64 assembly instructions directly
Tokenization: get_next_token scans characters, classifies them into token types, and handles numeric parsing with decimal-to-binary conversion.
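The scanning loop described above can be illustrated in a few lines of Python. This is only a sketch, not the assembly routine itself: the function name tokenize, the token kind labels, and the decision to skip whitespace are assumptions made for the example. The digit loop mirrors the usual way of turning decimal text into a binary integer, multiplying the running value by 10 and adding each digit.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[object]]  # (kind, value); names are illustrative


def tokenize(text: str) -> List[Token]:
    """Scan characters left to right and classify them into tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in " \t\r\n":
            i += 1  # assumption: whitespace is skipped
        elif "0" <= ch <= "9":
            # Decimal-to-binary conversion: value = value * 10 + digit
            value = 0
            while i < len(text) and "0" <= text[i] <= "9":
                value = value * 10 + (ord(text[i]) - ord("0"))
                i += 1
            tokens.append(("NUMBER", value))
        elif ch in "+-*/":
            tokens.append(("OP", ch))
            i += 1
        elif ch in "()":
            tokens.append(("PAREN", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    tokens.append(("EOF", None))
    return tokens
```

In assembly the same accumulation is typically an imul by 10 followed by an add of the digit value inside the scanning loop.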
Parsing: Three-level precedence hierarchy:
- parse_expression: Handles addition/subtraction (lowest precedence)
- parse_term: Handles multiplication/division
- parse_factor: Handles numbers and parenthetical expressions (highest precedence)
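To show how the three levels produce the right precedence, here is a minimal Python sketch of the same recursive-descent shape. It is an illustration under assumptions: it evaluates the expression directly instead of emitting assembly, it tokenizes with a regular expression for brevity, and the Parser class with its peek and advance helpers is hypothetical. Only the three parse_* names come from the description above.

```python
import re


class Parser:
    """Recursive-descent sketch: each precedence level calls the next higher one."""

    def __init__(self, text: str):
        # Simplified tokenization for the example: integers, operators, parentheses.
        self.tokens = re.findall(r"[0-9]+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expression(self) -> int:
        # Lowest precedence: + and -, left-associative.
        value = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.parse_term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_term(self) -> int:
        # Middle level: * and /, left-associative.
        value = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.parse_factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def parse_factor(self) -> int:
        # Highest precedence: a number or a parenthesized expression.
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            value = self.parse_expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")
```

Because parse_expression only sees the results of parse_term, a multiplication is always finished before the surrounding addition is applied; parentheses restart the hierarchy from the lowest level.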
Code Generation: Emits assembly using register rax for computation:
- Numbers: mov rax, immediate
- Addition: add rax, operand
- Multiplication: imul rax, operand
- Division: mov rdx, 0; mov rbx, divisor; idiv rbx
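The instruction patterns above can be tied together with a small Python sketch that walks an expression tree and collects instruction strings. Several details are assumptions, since the description does not say how the left operand is preserved while the right one is computed: this sketch saves it with push/pop and moves the right operand into rbx. The emit function, the tuple-based tree, and the sub instruction for subtraction are likewise illustrative.

```python
from typing import List, Tuple, Union

# Hypothetical tree shape: an int literal, or (operator, left, right).
Node = Union[int, Tuple[str, "Node", "Node"]]


def emit(node: Node, out: List[str]) -> None:
    """Append instructions that leave the value of `node` in rax."""
    if isinstance(node, int):
        out.append(f"mov rax, {node}")
        return
    op, left, right = node
    emit(left, out)
    out.append("push rax")       # assumption: save left operand on the stack
    emit(right, out)
    out.append("mov rbx, rax")   # right operand (or divisor) goes to rbx
    out.append("pop rax")        # left operand back in rax
    if op == "+":
        out.append("add rax, rbx")
    elif op == "-":
        out.append("sub rax, rbx")
    elif op == "*":
        out.append("imul rax, rbx")
    elif op == "/":
        out.append("mov rdx, 0")  # clears the high half of the dividend
        out.append("idiv rbx")
    else:
        raise ValueError(f"unknown operator {op!r}")


code: List[str] = []
emit(("+", 2, ("*", 3, 4)), code)   # 2+3*4
print("\n".join(code))
```

Zeroing rdx before idiv is only correct when the dividend in rax is non-negative; for signed values the usual instruction is cqo, which sign-extends rax into rdx:rax.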
Memory: Uses fixed-size buffers for input (256 bytes), output (2048 bytes), and token storage. An output pointer tracks the current write position.
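The fixed-buffer scheme can be modeled in Python as a preallocated byte array plus an offset. This is a sketch only: the OutputBuffer class and its method names are hypothetical, and the overflow check is an assumption, since the description does not say what happens when the 2048-byte output buffer fills up.

```python
class OutputBuffer:
    """Fixed-capacity buffer with a write position, like a pointer into a reserved region."""

    def __init__(self, capacity: int = 2048):
        self.data = bytearray(capacity)  # reserved once, never resized
        self.pos = 0                     # current write position

    def write(self, text: str) -> None:
        raw = text.encode("ascii")
        if self.pos + len(raw) > len(self.data):
            # Assumption: refuse to write past the end of the buffer.
            raise OverflowError("output buffer full")
        self.data[self.pos:self.pos + len(raw)] = raw
        self.pos += len(raw)             # advance the write position

    def contents(self) -> str:
        return bytes(self.data[:self.pos]).decode("ascii")
```

Each emitted instruction is copied to the current position and the position is advanced by its length, so the final output is simply the first pos bytes of the buffer.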
Usage:
- Compile with nasm -f elf64 compiler.asm && ld compiler.o
- Run ./a.out and enter an expression like 2+3*4; the program outputs complete assembly code for it
- The generated code can be assembled and run independently
Warning
The compiler handles positive integers only, has only basic error handling, and generates straightforward register-based code without optimization. Real compilers would add symbol tables, type checking, register allocation, and optimization passes.