Compiler Architecture
Zeus is a statically-typed, compiled programming language with automatic garbage collection. The compiler is written in Go and uses LLVM for code generation. The runtime is implemented in Zig and provides garbage collection with precise stack scanning via LLVM statepoints.
High-Level Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                          Zeus Compiler                          │
    ├─────────────────────────────────────────────────────────────────┤
    │                                                                 │
    │  Source Code (.zs)                                              │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │    Lexer     │ → Tokens                                      │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │    Parser    │ → AST (Abstract Syntax Tree)                  │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │   Zeus IR    │ → Intermediate Representation                 │
    │  │  Generator   │                                               │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │ Type Checker │ → Type-checked IR                             │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │   LLVM IR    │ → LLVM Intermediate Representation            │
    │  │  Generator   │                                               │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │ Optimization │ → PlaceSafepoints,                            │
    │  │    Passes    │   RewriteStatepointsForGC                     │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │    Object    │ → .o files                                    │
    │  │  Generator   │                                               │
    │  └──────────────┘                                               │
    │         ↓                                                       │
    │  ┌──────────────┐                                               │
    │  │    Linker    │ → Executable                                  │
    │  └──────────────┘                                               │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

Compiler Phases
1. Lexical Analysis (Lexer)
Location: internal/lexer/lexer.go
Converts raw source code into a stream of tokens.
Key Features:
- Handles identifiers, keywords, operators, literals (numbers, strings, booleans)
- Supports numeric separators (e.g., 2_00_00_000)
- Multiple number bases (binary, octal, decimal, hexadecimal)
- Single-line comments (//)
- Position tracking for error reporting
Token Types (internal/token/token.go):
| Category | Tokens |
|---|---|
| Delimiters | (, ), {, }, [, ], ;, :, ,, . |
| Operators | +, -, *, /, =, !, ==, !=, >, >=, <, <= |
| Keywords | let, const, function, return, if, else, while, class, new, import, export |
| Data types | i8, i16, i32, i64, u8, u16, u32, u64, f32, f64, boolean, void, null |
| Literals | numbers, strings, identifiers |
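To illustrate the numeric-literal features listed above, here is a minimal Go sketch that scans one integer literal with `_` separators and an optional base prefix. The function name `scanNumber` and the `0b`/`0o`/`0x` prefix spellings are assumptions made for this example; the actual rules live in internal/lexer/lexer.go and may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// scanNumber reads one integer literal starting at src[pos] and returns the
// raw lexeme, its value, and the offset just past the literal.
// Assumption (not confirmed by the Zeus sources): 0b/0o/0x select the base
// and '_' may appear anywhere between digits.
func scanNumber(src string, pos int) (lexeme string, value int64, next int, err error) {
	start := pos
	base := 10
	switch {
	case strings.HasPrefix(src[pos:], "0b"):
		base, pos = 2, pos+2
	case strings.HasPrefix(src[pos:], "0o"):
		base, pos = 8, pos+2
	case strings.HasPrefix(src[pos:], "0x"):
		base, pos = 16, pos+2
	}
	digitsStart := pos
	for pos < len(src) && (src[pos] == '_' || isDigit(src[pos], base)) {
		pos++
	}
	// Separators carry no value, so drop them before conversion.
	digits := strings.ReplaceAll(src[digitsStart:pos], "_", "")
	if digits == "" {
		return "", 0, start, fmt.Errorf("expected digits at offset %d", digitsStart)
	}
	value, err = strconv.ParseInt(digits, base, 64)
	return src[start:pos], value, pos, err
}

// isDigit reports whether c is a valid digit in the given base.
func isDigit(c byte, base int) bool {
	switch {
	case c >= '0' && c <= '9':
		return int(c-'0') < base
	case (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'):
		return base == 16
	}
	return false
}

func main() {
	for _, s := range []string{"2_00_00_000", "0xFF", "0b1010;"} {
		lexeme, value, _, err := scanNumber(s, 0)
		fmt.Println(lexeme, value, err)
	}
}
```

Returning the offset alongside the lexeme is what lets a lexer keep the position information needed for error reporting.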
2. Syntax Analysis (Parser)
Location: internal/parser/parser.go
Converts the token stream into an Abstract Syntax Tree (AST).
Parser Design:
- Pratt Parser: Uses precedence climbing for expression parsing
- Recursive Descent: For statement parsing
- Error Recovery: Synchronization mechanism to continue parsing after errors
Parsing Strategy:
- Prefix Parselets: Handle tokens at the start of expressions (literals, unary operators, parentheses)
- Infix Parselets: Handle binary operators, function calls, member access
Operator Precedence:
| Level | Operations |
|---|---|
| 6 | Member access (.) |
| 5 | Function call, new |
| 4 | Unary (-, !), Multiplication (*), Division (/) |
| 3 | Addition (+), Subtraction (-), Comparison (<, >, <=, >=) |
| 2 | Equality (==, !=) |
| 1 | Assignment (=) |
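The precedence table can be read as a set of binding powers for a Pratt parser. The following Go sketch is only an illustration under simplifying assumptions: tokens are whitespace-separated, function calls and `new` are omitted, assignment is treated as right-associative, and the output is a parenthesized prefix string rather than AST nodes. The names `parser`, `parseExpr`, and `infixPower` are hypothetical and not taken from internal/parser/parser.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Binding powers copied from the precedence table above.
var infixPower = map[string]int{
	"=":  1,
	"==": 2, "!=": 2,
	"+": 3, "-": 3, "<": 3, ">": 3, "<=": 3, ">=": 3,
	"*": 4, "/": 4,
	".": 6,
}

const unaryPower = 4 // unary - and ! share level 4 with * and /

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// parseExpr parses an expression, consuming only infix operators that bind
// more tightly than minPower.
func (p *parser) parseExpr(minPower int) string {
	// Prefix position: unary operator, parenthesized group, or operand.
	var left string
	switch t := p.next(); t {
	case "-", "!":
		left = "(" + t + " " + p.parseExpr(unaryPower) + ")"
	case "(":
		left = p.parseExpr(0)
		p.next() // consume ")"
	default:
		left = t
	}
	// Infix position: fold operators while they bind tightly enough.
	for {
		op := p.peek()
		power, ok := infixPower[op]
		if !ok || power <= minPower {
			return left
		}
		p.next()
		rightMin := power
		if op == "=" {
			rightMin = power - 1 // right-associative (assumption)
		}
		left = "(" + op + " " + left + " " + p.parseExpr(rightMin) + ")"
	}
}

func parse(src string) string {
	p := &parser{toks: strings.Fields(src)}
	return p.parseExpr(0)
}

func main() {
	fmt.Println(parse("a = b + c * 2 == d")) // (= a (== (+ b (* c 2)) d))
	fmt.Println(parse("- x * ( y - 1 )"))    // (* (- x) (- y 1))
}
```

Because the table places comparison on the same level as addition, an input such as `a < b + c` groups left to right as `(+ (< a b) c)` in this sketch.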
AST Structure (internal/ast/):
Expressions (expr.go):
- Literals: NumberExprNode, BooleanExprNode, NullExprNode
- Binary: BinaryExprNode
- Unary: UnaryExprNode
- Identifiers: IdentifierExprNode
- Functions: FunctionDeclExprNode, FunctionCallExprNode
- Classes: ClassDeclExprNode, NewExprNode, ObjectPropertyAccessExprNode
- Type expressions: TypeExpressionNode (for arrays)
Statements (stmt.go):
- Variable declarations: VarDeclStmtNode
- Control flow: IfStmtNode, WhileStmtNode, ReturnStmtNode
- Blocks: BlockStmtNode
- Module system: ImportStmtNode, ExportStmtNode
- Expression statements: ExprStmtNode
3. Zeus IR Generation
Location: internal/ir/ir.go, internal/ir/builder.go, internal/ir/instr.go
Converts the AST into a custom intermediate representation optimized for Zeus semantics.
IR Design:
- Three-Address Code: Most instructions have at most three operands
- SSA-like: Temporary variables are immutable (assigned once)
- Basic Blocks: Code organized into basic blocks with control flow edges
- Symbol Tables: Scoped symbol management for variables and functions
Instruction Types:
| Category | Instructions |
|---|---|
| Arithmetic | ADD, SUB, MUL, DIV, NEG |
| Comparison | EQ_EQ, NOT_EQ, LESS_THAN, LESS_THAN_EQ, GREATER_THAN, GREATER_THAN_EQ |
| Logical | NOT |
| Memory | LOAD, STORE, DECLARE_VAR |
| Control Flow | JMP, COND_JMP, RETURN |
| Functions | DECLARE_FUNC, CALL_FUNC, CALL_INDIRECT_FUNC |
| Classes | DECLARE_CLASS, DECLARE_CLASS_METHOD, NEW_OBJ, OBJECT_PROPERTY_ACCESS |
| Modules | IMPORT, EXPORT |
| Type Conversion | CAST |
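As an illustration of three-address code with single-assignment temporaries, the Go sketch below lowers a small expression tree into instructions named after the table above. The node types, the `Builder` type, and the textual form `t0 = MUL b, 2` are hypothetical; the real instruction encoding is defined in internal/ir/instr.go.

```go
package main

import "fmt"

// Minimal expression nodes for the example (hypothetical, not the Zeus AST).
type Expr interface{}
type Num struct{ Value int }
type Var struct{ Name string }
type Bin struct {
	Op          string // "ADD", "SUB", "MUL", or "DIV"
	Left, Right Expr
}

// Builder accumulates instructions and hands out fresh temporaries.
type Builder struct {
	Instrs  []string
	nextTmp int
}

// lower emits instructions for e and returns the operand that holds its value.
// Each temporary is assigned exactly once.
func (b *Builder) lower(e Expr) string {
	switch e := e.(type) {
	case Num:
		return fmt.Sprint(e.Value)
	case Var:
		return e.Name
	case Bin:
		l := b.lower(e.Left)
		r := b.lower(e.Right)
		t := fmt.Sprintf("t%d", b.nextTmp)
		b.nextTmp++
		b.Instrs = append(b.Instrs, fmt.Sprintf("%s = %s %s, %s", t, e.Op, l, r))
		return t
	}
	panic("unknown expression node")
}

func main() {
	// a + b * 2
	expr := Bin{"ADD", Var{"a"}, Bin{"MUL", Var{"b"}, Num{2}}}
	b := &Builder{}
	result := b.lower(expr)
	for _, instr := range b.Instrs {
		fmt.Println(instr)
	}
	fmt.Println("result in", result)
}
```

For `a + b * 2` this emits `t0 = MUL b, 2` followed by `t1 = ADD a, t0`, so each instruction has one destination and at most two source operands.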
Key Features:
- Primordial Classes: Built-in classes (Array) with runtime implementations
- Method Name Mangling: ClassName.methodName for disambiguation
- Scope Management: Nested scopes for blocks, functions, and classes
- Circular Dependency Detection: Prevents infinite import loops
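Nested scopes and method name mangling can be pictured with a small chained symbol table. This Go sketch is an assumption about the general shape only; `Scope`, `Declare`, `Lookup`, and `MangleMethod` are illustrative names, not the API in internal/ir/.

```go
package main

import (
	"errors"
	"fmt"
)

// Scope is one level of a nested symbol table (block, function, or class).
type Scope struct {
	parent  *Scope
	symbols map[string]string // symbol name -> type name
}

func NewScope(parent *Scope) *Scope {
	return &Scope{parent: parent, symbols: map[string]string{}}
}

// Declare adds a symbol to this scope; redeclaring in the same scope fails.
func (s *Scope) Declare(name, typ string) error {
	if _, exists := s.symbols[name]; exists {
		return errors.New("redeclared in this scope: " + name)
	}
	s.symbols[name] = typ
	return nil
}

// Lookup searches this scope first, then each enclosing scope.
func (s *Scope) Lookup(name string) (string, bool) {
	for sc := s; sc != nil; sc = sc.parent {
		if typ, ok := sc.symbols[name]; ok {
			return typ, true
		}
	}
	return "", false
}

// MangleMethod builds the ClassName.methodName form described above.
func MangleMethod(class, method string) string {
	return class + "." + method
}

func main() {
	global := NewScope(nil)
	global.Declare(MangleMethod("Point", "length"), "function")

	body := NewScope(global)
	body.Declare("x", "i32")

	typ, ok := body.Lookup("Point.length")
	fmt.Println(typ, ok) // function true

	_, ok = global.Lookup("x")
	fmt.Println(ok) // false: inner symbols are not visible from outside
}
```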
4. Type Checking
Location: internal/ir/tc.go
Validates types and performs semantic analysis using a pass-based system.
Type Checking Passes:
1. ToKnownTypesPass:
   - Resolves user-defined types to their actual types
   - Converts array type syntax to object types
   - Validates type usage (e.g., void only as return type)
   - Resolves class and function types
2. TypeCheckingPass:
   - Type Compatibility: Checks operand types for operations
   - Implicit Casting:
     - Integer to float
     - Smaller int to larger int (with sign considerations)
     - Smaller float to larger float
     - Null to object types
   - Function Call Validation: Parameter count and types
   - Return Value Checking: Ensures all code paths return
   - Class Validation: Constructor signatures, access modifiers
   - Entry Point: Validates that a main function exists in entry modules
3. UnusedWarningPass:
   - Tracks variable, function, and class usage
   - Generates warnings for unused declarations
   - Excludes temporary variables and system functions
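The implicit casting rules of the TypeCheckingPass can be sketched as a single predicate. The Go code below is one plausible reading, not the checker in internal/ir/tc.go. In particular, the handling of "sign considerations" is an assumption: widening within the same signedness is allowed, unsigned to strictly wider signed is allowed, and signed to unsigned is rejected. Any name that is not a primitive is treated as an object type.

```go
package main

import "fmt"

type numeric struct {
	bits    int
	signed  bool
	isFloat bool
}

var numerics = map[string]numeric{
	"i8": {8, true, false}, "i16": {16, true, false},
	"i32": {32, true, false}, "i64": {64, true, false},
	"u8": {8, false, false}, "u16": {16, false, false},
	"u32": {32, false, false}, "u64": {64, false, false},
	"f32": {32, true, true}, "f64": {64, true, true},
}

// isObjectType treats every non-primitive name as a class or array type.
func isObjectType(t string) bool {
	_, isNum := numerics[t]
	return !isNum && t != "boolean" && t != "void" && t != "null"
}

// CanImplicitlyCast reports whether a value of type from may be used where
// type to is expected without an explicit CAST.
func CanImplicitlyCast(from, to string) bool {
	if from == to {
		return true
	}
	if from == "null" {
		return isObjectType(to)
	}
	f, okF := numerics[from]
	t, okT := numerics[to]
	if !okF || !okT {
		return false
	}
	switch {
	case f.isFloat && t.isFloat:
		return f.bits < t.bits // smaller float to larger float
	case f.isFloat:
		return false // float to integer would lose information
	case t.isFloat:
		return true // integer to float
	case !f.signed || t.signed:
		// Same signedness or unsigned to signed: the target must be wider.
		// Signed to signed, unsigned to unsigned, unsigned to signed land here.
		return f.bits < t.bits
	default:
		return false // signed to unsigned (assumed to be rejected)
	}
}

func main() {
	fmt.Println(CanImplicitlyCast("i32", "f64"))    // true
	fmt.Println(CanImplicitlyCast("f32", "i32"))    // false
	fmt.Println(CanImplicitlyCast("u8", "i16"))     // true
	fmt.Println(CanImplicitlyCast("i8", "u16"))     // false
	fmt.Println(CanImplicitlyCast("null", "Point")) // true
}
```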
5. LLVM IR Generation
Location: internal/codegen/codegen.go
Translates Zeus IR to LLVM IR for optimization and code generation.
Type Mapping:
| Zeus Type | LLVM Type |
|---|---|
| i8 | i8 |
| i16 | i16 |
| i32 | i32 |
| i64 | i64 |
| f32 | float |
| f64 | double |
| boolean | i1 |
| Objects | Pointer to struct (address space 1 for GC) |
| Functions | Pointer to function |
Class Representation:
Each Zeus class generates three LLVM structs:
1. VTable Struct: Function pointers for virtual methods

    %ClassName_vtable = type { ptr, ptr, ... }

2. Object Header Struct: Metadata for GC and runtime

    %ClassName_header = type {
      ptr,        ; vtable pointer
      ptr,        ; type info
      i8,         ; gc offsets count
      [n x i8]    ; gc offsets array
    }

3. Class Struct: Actual object layout

    %ClassName = type {
      ptr,              ; object header pointer
      <field1_type>,    ; property fields
      <field2_type>,
      ...
    }

Memory Management:
- Allocation: zeus_gc_alloc(size) returns GC-managed memory
- Address Space 1: All GC objects use address space 1
- Object Headers: Prepended to all objects for GC tracking
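To make the "gc offsets" idea in the object header concrete, the following Go sketch computes the byte offsets of GC-managed pointer fields in a class struct. It is a rough model under stated assumptions: a 64-bit target with 8-byte pointers, natural alignment (each field aligned to its own size), the header pointer at offset 0, and the header pointer itself not listed among the offsets. `Field` and `GCOffsets` are hypothetical names.

```go
package main

import "fmt"

// Field describes one property of a class (hypothetical model).
type Field struct {
	Name  string
	Size  int  // size in bytes; also used as the alignment
	IsRef bool // true for pointers to GC-managed objects
}

const headerPtrSize = 8 // object header pointer occupies the first 8 bytes (assumed)

// GCOffsets returns the byte offsets of all GC-managed pointer fields,
// in declaration order. Offsets are stored as u8, matching the header layout.
func GCOffsets(fields []Field) []uint8 {
	offset := headerPtrSize
	var offsets []uint8
	for _, f := range fields {
		// Round the offset up to the field's alignment.
		if rem := offset % f.Size; rem != 0 {
			offset += f.Size - rem
		}
		if f.IsRef {
			offsets = append(offsets, uint8(offset))
		}
		offset += f.Size
	}
	return offsets
}

func main() {
	fields := []Field{
		{"x", 4, false},    // i32 at offset 8
		{"next", 8, true},  // pointer, aligned up to offset 16
		{"flag", 1, false}, // i8 at offset 24
		{"other", 8, true}, // pointer, aligned up to offset 32
	}
	fmt.Println(GCOffsets(fields)) // [16 32]
}
```

During the mark phase, a collector can add each of these offsets to an object's base address to find the child pointers it needs to follow.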
6. LLVM Optimization Passes
Location: internal/zeus_compiler/compiler.go (RunOptimizationPasses)
Optimization Pipeline:
- mem2reg:
  - Promotes allocas to SSA registers
  - Essential for performance
- place-safepoints (if GC enabled):
  - Inserts GC safepoint polls at function entries, loop backedges, and before potentially allocating calls
- rewrite-statepoints-for-gc (if GC enabled):
  - Transforms function calls to statepoint intrinsics
  - Preserves GC root information
  - Enables precise stack scanning
7. Object File Generation and Linking
Object Generation (EmitObjFiles):
- Generates temporary object files for each module
- Uses LLVM’s target machine to emit machine code
- Outputs LLVM IR files in debug mode
Linking (LinkObjFiles):
- Linker: Uses clang on macOS, gcc on Linux
- Runtime: Links with Zeus runtime (zeus-runtime.o)
- System Libraries: Automatically links required libraries
- Platform-Specific:
  - macOS: Sets deployment target (12.0), SDK path
  - Handles different Xcode and Command Line Tools installations
Runtime System
Location: runtime/ (Zig implementation)
Garbage Collection
Type: Mark-and-sweep with precise stack scanning
Components:
1. GC Allocator (gc.zig):

    pub const GC = struct {
        allocator: std.mem.Allocator,
        gc_roots: ArrayList(*ZeusObj),
        allocated_objects: ArrayList(AllocatedObject),
        alloc_mutex: std.Thread.Mutex,
    };

2. Stack Walking (stackmap.zig):
   - Uses libunwind for robust stack traversal
   - Parses LLVM-generated stack maps
   - Extracts GC root pointers from stack frames
3. GC Runtime (gc_runtime.zig):
   - zeus_gc_alloc(size): Allocates GC-tracked memory
   - zeus_gc_poll(): Triggers GC cycle
     - Walk stack and collect roots
     - Register roots with GC
     - Run mark-and-sweep
GC Algorithm:
Mark Phase:
- Start from GC roots (stack-scanned pointers)
- Recursively mark reachable objects
- Follow object header GC offsets for nested objects
- Handle arrays with object elements specially
Sweep Phase:
- Iterate through allocated objects
- Free unmarked objects
- Cleanup array data buffers
- Update tracking structures
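The two phases above can be sketched on a simulated heap. This Go example is a simplified model, not the Zig implementation in runtime/gc.zig: objects hold explicit reference slices instead of header offsets, roots are passed in directly instead of being found by stack scanning, and marking uses an explicit worklist rather than recursion. `Object`, `Heap`, and `Collect` are illustrative names.

```go
package main

import "fmt"

// Object is a simulated heap object with outgoing references.
type Object struct {
	Name   string
	Refs   []*Object
	marked bool
}

// Heap tracks every allocated object, like allocated_objects in the runtime.
type Heap struct {
	Objects []*Object
}

// mark flags every object reachable from the roots.
func mark(roots []*Object) {
	worklist := append([]*Object(nil), roots...)
	for len(worklist) > 0 {
		obj := worklist[len(worklist)-1]
		worklist = worklist[:len(worklist)-1]
		if obj == nil || obj.marked {
			continue // already visited; this also terminates cycles
		}
		obj.marked = true
		worklist = append(worklist, obj.Refs...)
	}
}

// sweep drops unmarked objects, clears marks on survivors, and returns the
// names of the objects that were freed.
func (h *Heap) sweep() []string {
	var freed []string
	live := h.Objects[:0]
	for _, obj := range h.Objects {
		if obj.marked {
			obj.marked = false
			live = append(live, obj)
		} else {
			freed = append(freed, obj.Name)
		}
	}
	h.Objects = live
	return freed
}

// Collect runs one full mark-and-sweep cycle.
func (h *Heap) Collect(roots []*Object) []string {
	mark(roots)
	return h.sweep()
}

func main() {
	a := &Object{Name: "a"}
	b := &Object{Name: "b"}
	c := &Object{Name: "c"}
	d := &Object{Name: "d"}
	a.Refs = []*Object{b}
	c.Refs = []*Object{d}
	d.Refs = []*Object{c} // c and d form an unreachable cycle

	h := &Heap{Objects: []*Object{a, b, c, d}}
	freed := h.Collect([]*Object{a})
	fmt.Println(freed, len(h.Objects)) // [c d] 2
}
```

The unreachable cycle between `c` and `d` is reclaimed, which is the main advantage of tracing collection over reference counting.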
Object Layout (Zeus ABI)
Object Header (runtime/abi.zig):
    pub const ZeusObjectHeader = extern struct {
        vtable_ptr: *anyopaque,          // VTable pointer
        type_info: *ZeusObjectTypeInfo,  // Runtime type info
        gc_offsets_count: u8,            // Number of GC-tracked fields
        gc_offsets: [*]u8,               // Byte offsets to GC fields
    };

Object Structure:

    pub const ZeusObj = extern struct {
        obj_header: *ZeusObjectHeader,   // Header pointer
        // Object fields follow
    };

Array Implementation
Arrays in Zeus are implemented as primordial classes - first-class objects with built-in methods and type-safe operations.
Design Features:
- Dynamic: Can grow/shrink at runtime
- Type-safe: Element type tracked at compile-time and runtime
- Multi-dimensional: u8[][], Point[][][] fully supported
- Dual syntax: Bracket notation (arr[i]) and method calls (arr.get(i))
- GC-managed: Both array object and data buffer are tracked
Memory Layout:
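    ; Array object struct
    %"u8[]" = type {
      ptr addrspace(1),   ; obj_header pointer (GC managed)
      i32,                ; capacity
      i32,                ; length
      ptr                 ; data buffer pointer
    }

Growth Strategy: Capacity doubles on growth, with a minimum capacity of 4, as Rust's Vec does for typical element sizes. C++ std::vector and Java ArrayList also grow geometrically, but with different factors: ArrayList grows by 1.5x, and the std::vector factor is implementation-defined.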
Time Complexity:
- push(): Amortized O(1)
- pop(): O(1)
- get(i): O(1)
- set(i, value): O(1) average
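The amortized O(1) cost of push() follows from the doubling strategy. The Go sketch below simulates it by counting buffer growths and element copies for a sequence of pushes. `nextCapacity` and `simulatePushes` are hypothetical helpers, and the exact growth rule in runtime/array_runtime.zig may differ in detail.

```go
package main

import "fmt"

const minCapacity = 4

// nextCapacity applies the doubling rule with a floor of minCapacity.
func nextCapacity(current int) int {
	if current < minCapacity {
		return minCapacity
	}
	return current * 2
}

// simulatePushes appends n elements to an empty array and reports the final
// capacity, the number of buffer growths, and the total elements copied.
func simulatePushes(n int) (capacity, growths, copies int) {
	length := 0
	for i := 0; i < n; i++ {
		if length == capacity {
			copies += length // existing elements move to the new buffer
			capacity = nextCapacity(capacity)
			growths++
		}
		length++
	}
	return capacity, growths, copies
}

func main() {
	fmt.Println(simulatePushes(1000)) // 1024 9 1020
}
```

For 1000 pushes the buffer grows only 9 times and copies 1020 elements in total, which is close to one copy per push on average. With doubling, the total number of copies stays below 2n.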
Module System
Import/Export Mechanism:
- Module Resolution (internal/module/module.go):
  - Resolves relative paths
  - Handles standard library (@std/...)
  - Circular dependency detection
- Dependency Graph:
  - BFS traversal of import tree
  - Topological ordering for compilation
  - Each source file maintains its own IR and symbols
- Symbol Scoping:
  - Exported symbols have module-scoped names
  - Private symbols use internal linkage
  - Import/export tracked in IR
Compilation Order:
    Entry File (main.zs)
        ↓ imports
    Module A
        ↓ imports
    Module B

Compiled in reverse order: B → A → main
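A dependencies-first compilation order with circular import detection can be sketched as follows. Note that this Go example uses a depth-first post-order walk with three visit states, which is a different technique from the BFS traversal mentioned above, chosen here because it yields the order and detects cycles in one pass. `compileOrder` and the import map are hypothetical and not taken from internal/module/module.go.

```go
package main

import "fmt"

// compileOrder returns the modules reachable from entry, ordered so that
// every module appears after all of its imports. It fails on circular imports.
func compileOrder(entry string, imports map[string][]string) ([]string, error) {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var order []string

	var visit func(module string) error
	visit = func(module string) error {
		switch state[module] {
		case inProgress:
			// Reached a module that is still on the current import chain.
			return fmt.Errorf("circular import involving %q", module)
		case done:
			return nil
		}
		state[module] = inProgress
		for _, dep := range imports[module] {
			if err := visit(dep); err != nil {
				return err
			}
		}
		state[module] = done
		order = append(order, module) // emitted only after its dependencies
		return nil
	}

	if err := visit(entry); err != nil {
		return nil, err
	}
	return order, nil
}

func main() {
	imports := map[string][]string{
		"main.zs": {"a.zs"},
		"a.zs":    {"b.zs"},
	}
	order, err := compileOrder("main.zs", imports)
	fmt.Println(order, err) // [b.zs a.zs main.zs] <nil>
}
```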
Type System Details
Primitive Types
| Type | Description |
|---|---|
| i8, i16, i32, i64 | Signed integers |
| u8, u16, u32, u64 | Unsigned integers |
| f32 | Single precision float |
| f64 | Double precision float |
| boolean | true/false |
| void | Function return only |
| null | Object default value |
Implicit Casts
- Integer to float
- Smaller to larger integers (when safe)
- Null to any object type
- Array type to object type (for built-in arrays)
Error Handling
Compile-Time Errors
Error Reporting (internal/zeus_error/, internal/logger/):
- Severity Levels: Error, Warning
- Error Tracking: Per-source-file error lists
- Pretty Printing: Shows source context with error location
Example Error Output:
    error: type 'f32' is not assignable to type 'i32'
     --> main.zs:5:10
      |
    5 | let x: i32 = 3.14;
      | ^^^

Key Files Reference
| Stage | File(s) |
|---|---|
| Lexer | internal/lexer/lexer.go |
| Parser | internal/parser/parser.go |
| AST | internal/ast/*.go |
| IR & Type Checking | internal/ir/*.go |
| Code Generation | internal/codegen/*.go |
| Compiler Orchestration | internal/zeus_compiler/compiler.go |
Runtime
| File | Purpose |
|---|---|
| runtime/gc_runtime.zig | GC entry points |
| runtime/gc.zig | GC implementation |
| runtime/array_runtime.zig | Array operations |
| runtime/abi.zig | Runtime ABI definitions |
| runtime/stackmap.zig | Stack map parsing |
Platform Support
Supported:
- macOS (primary development platform)
- Linux
Requirements:
- LLVM 13+
- Zig 0.14+
- libunwind
- Clang/GCC
Architecture:
- x86_64
- ARM64 (Apple Silicon)
Future Enhancements
Planned Features:
- Generational GC: Reduce GC pause times
- Concurrency: Goroutine-style concurrency
- Generics: Parametric polymorphism
- Pattern Matching: Enhanced control flow
- Standard Library: Expanded built-ins
- Incremental Compilation: Faster builds
- JIT Compilation: LLVM ORC JIT integration
External Dependencies
Go Packages:
- github.com/spf13/cobra: CLI framework
- tinygo.org/x/go-llvm: LLVM Go bindings
System Libraries:
- LLVM (code generation and optimization)
- libunwind (stack unwinding)
- libc (standard C library)