Ura is a work-in-progress programming language that compiles to LLVM IR. The compiler is written in C and uses LLVM for code generation and optimization.
Ura is designed as a statically-typed, imperative language with a focus on simplicity and direct compilation to native code through LLVM. The language supports basic data types, control flow, functions, references, and interoperability with C libraries through prototype declarations.
.
├── src/ # Main compiler source code
│ ├── main.c # Entry point, tokenizer, parser
│ ├── gen.c # IR generation and LLVM code generation
│ ├── llvm.c # LLVM C API wrappers
│ ├── utils.c # Utility functions, memory management
│ ├── header.h # Type definitions and function declarations
│ ├── config.sh # Build system and development commands
│ ├── file.ura # Working file for development
│ ├── tests/ # Test suite organized by feature
│ │ ├── builtins/ # Built-in function tests
│ │ ├── data_types/ # Type system tests (including references)
│ │ ├── if/ # Conditional statement tests
│ │ ├── while/ # Loop tests
│ │ ├── op/ # Operator tests
│ │ └── libft/ # Standard library function implementations
│ ├── ura-lib/ # Standard library prototypes
│ │ ├── header.ura # Main library header
│ │ ├── io.ura # I/O operations
│ │ ├── string.ura # String manipulation
│ │ ├── memory.ura # Memory management
│ │ ├── math.ura # Mathematical functions
│ │ ├── stdlib.ura # Standard library functions
│ │ ├── time.ura # Time operations
│ │ └── signals.ura # Signal handling
│ └── build/ # Compiler output directory
│
├── llvm/ # LLVM experimentation and testing
│ ├── main.c # Experimental compiler implementation
│ ├── utils.c # Utility functions for experiments
│ ├── wrapper.c # LLVM wrapper functions
│ ├── header.h # Header for experimental code
│ ├── config.sh # Build configuration for experiments
│ ├── examples/ # Example programs (000-013)
│ ├── tests/ # Expected LLVM IR outputs
│ └── build/ # Build artifacts
│
├── workspaces/ # VS Code workspace configurations
├── PLAN.md # Development roadmap
├── TODO.md # Current tasks
└── README.md # This file
Ura supports the following primitive types:
- int: 32-bit signed integer
- long: 64-bit signed integer
- short: 16-bit signed integer
- char: 8-bit character
- chars: String (pointer to char array)
- bool: Boolean (True or False)
- float: 32-bit floating point
- void: No return value
- pointer: Generic pointer type
- ref: Reference type modifier
main():
    a int = 10
    name chars = "Hello"
    flag bool = True
    c char = 'x'
References allow binding to existing variables:
main():
    a int = 10
    b ref int = a // b is a reference to a
    b = 20 // modifies a through the reference
    return a // returns 20
Function declaration syntax:
fn function_name(param1 type1, param2 type2) return_type:
    // function body
    return value
Example:
fn add(a int, b int) int:
    return a + b
main():
    result int = add(5, 3)
    return result
External C functions can be declared using proto:
proto fn puts(str chars) int
proto fn malloc(size int) pointer
proto fn free(ptr pointer) void
main():
    puts("Hello, World!")
Functions with variable arguments:
proto fn printf(format chars, ...) int
main():
    a int = 5
    if a < 10:
        return 1
    elif a < 20:
        return 2
    else:
        return 3

main():
    i int = 0
    while i < 10:
        i += 1
    return i

main():
    i int = 0
    while i < 10:
        i += 1
        if i == 5:
            break
    return i
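Arithmetic operators:
- +: Addition
- -: Subtraction
- *: Multiplication
- /: Division
- %: Modulo
Comparison operators:
- == or is: Equal
- !=: Not equal
- <: Less than
- >: Greater than
- <=: Less than or equal
- >=: Greater than or equal
Logical operators:
- and or &&: Logical AND
- or or ||: Logical OR
- not or !: Logical NOT
Assignment operators:
- =: Assignment
- +=: Add and assign
- -=: Subtract and assign
- *=: Multiply and assign
- /=: Divide and assign
- %=: Modulo and assign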
main():
    str chars = "hello"
    c char = str[0] // Access first character
    str[0] = 'H' // Modify first character

main():
    a int = 65
    c char = a as char
    return c
Dynamic allocation through a C prototype (note that calloc allocates on the heap, not the stack):
proto fn calloc(len int, size int) chars
main():
    buffer chars = calloc(100, 1)
    buffer[0] = 'A'
// Single line comment
/*
Multi-line
comment
*/
Import other Ura files:
use "io"
use "string"
The Ura compiler follows a multi-stage compilation process:
- Tokenization (tokenize() in main.c)
  - Reads source file
  - Handles use statements for imports
  - Produces token stream with type, value, and position information
  - Tracks indentation for block structure
- Parsing (parse() in main.c)
  - Builds Abstract Syntax Tree (AST)
  - Recursive descent parser
  - Operator precedence handling
  - Scope management
- IR Generation (gen_ir() in gen.c)
  - Type checking
  - Symbol resolution
  - Semantic analysis
  - Intermediate representation construction
- Code Generation (gen_asm() in gen.c)
  - LLVM IR emission
  - Register allocation (handled by LLVM)
  - Optimization passes
- Output (finalize() in utils.c)
  - Module verification
  - LLVM IR file generation (.ll)
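The "operator precedence handling" step of a recursive descent parser is commonly done with precedence climbing. The C sketch below illustrates that technique only: it evaluates integer expressions directly instead of building an AST, and the names (`eval_expr`, `eval_binary`, `prec_of`) and the precedence table are hypothetical, not taken from main.c.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical precedence table: higher binds tighter, 0 means "not a binary operator". */
static int prec_of(char op)
{
   switch (op) {
   case '+': case '-': return 1;
   case '*': case '/': case '%': return 2;
   default: return 0;
   }
}

static void skip_spaces(const char **s)
{
   while (isspace((unsigned char)**s))
      (*s)++;
}

static long eval_binary(const char **s, int min_prec);

/* primary := integer | '(' expression ')' */
static long eval_primary(const char **s)
{
   skip_spaces(s);
   if (**s == '(') {
      (*s)++;
      long inner = eval_binary(s, 1);
      skip_spaces(s);
      assert(**s == ')');
      (*s)++;
      return inner;
   }
   assert(isdigit((unsigned char)**s));
   long value = 0;
   while (isdigit((unsigned char)**s)) {
      value = value * 10 + (**s - '0');
      (*s)++;
   }
   return value;
}

/* Precedence climbing: consume operators whose precedence is >= min_prec. */
static long eval_binary(const char **s, int min_prec)
{
   long left = eval_primary(s);
   for (;;) {
      skip_spaces(s);
      char op = **s;
      int prec = prec_of(op);
      if (prec == 0 || prec < min_prec)
         return left;
      (*s)++;
      /* prec + 1 makes every operator left-associative */
      long right = eval_binary(s, prec + 1);
      switch (op) {
      case '+': left += right; break;
      case '-': left -= right; break;
      case '*': left *= right; break;
      case '/': assert(right != 0); left /= right; break;
      case '%': assert(right != 0); left %= right; break;
      }
   }
}

long eval_expr(const char *src)
{
   return eval_binary(&src, 1);
}
```

With this table, `eval_expr("1 + 2 * 3")` groups the multiplication first and yields 7, while `eval_expr("10 - 4 - 3")` associates to the left and yields 3.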
- Tokenizer: Lexical analysis, converts source text to tokens
- Parser: Builds AST from token stream
- Entry point: Orchestrates compilation phases
- Scope management: Tracks variables, functions, and structs per scope
- File handling: Manages use statements and file imports
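Because Ura uses indentation for block structure, the tokenizer has to turn leading whitespace into block boundaries. One common way to do this is a stack of indentation widths, sketched below in C. This is an assumption about the general technique, not a description of how tokenize() actually works; `IndentTracker`, `leading_spaces`, and `indent_update` are hypothetical names.

```c
#include <assert.h>

#define MAX_DEPTH 64

/* Hypothetical tracker: a stack holding the indentation width of each open block. */
typedef struct {
   int widths[MAX_DEPTH];
   int depth;               /* number of open blocks, 0 means top level */
} IndentTracker;

/* Count the leading spaces of one source line (tabs are not handled in this sketch). */
int leading_spaces(const char *line)
{
   int n = 0;
   while (line[n] == ' ')
      n++;
   return n;
}

/*
 * Feed the indentation width of the next non-blank line.
 * Returns +1 if a new block opened, 0 if the level is unchanged,
 * or -k if k blocks were closed.
 */
int indent_update(IndentTracker *t, int width)
{
   int current = t->depth > 0 ? t->widths[t->depth - 1] : 0;
   if (width > current) {
      assert(t->depth < MAX_DEPTH);
      t->widths[t->depth++] = width;
      return 1;
   }
   int closed = 0;
   while (t->depth > 0 && t->widths[t->depth - 1] > width) {
      t->depth--;
      closed++;
   }
   return -closed;
}
```

A parser can then treat a positive result as "block begins" and a negative result as that many "block ends", which is equivalent to the braces or `end` keywords of other syntaxes.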
- IR Generation: Type checking and semantic analysis
- LLVM Code Generation: Emits LLVM IR instructions
- Reference handling: Implements reference semantics
- Operator implementation: Binary and unary operations
- Control flow: If/while/break/continue code generation
- Function calls: Parameter passing and return values
- LLVM Wrappers: Thin wrappers around LLVM C API
- Builder operations: Instruction emission helpers
- Type operations: Type creation and manipulation
- Module operations: Function and global management
- Error handling: Compilation error reporting
- Debug output: Token and AST printing
- Memory management: Allocation and cleanup
- String utilities: String manipulation helpers
- Symbol tables: Variable and function lookup
- Type utilities: Type conversion and checking
- LLVM initialization: Context, module, and builder setup
- Type definitions: Token, Node, LLVM wrapper types
- Enumerations: Token types, data types
- Global variables: Compiler state
- Function declarations: All public interfaces
- Macros: Debug, allocation, error checking
A Token represents a lexical unit with:
- Type (keyword, identifier, literal, operator)
- Value (for literals)
- Position (file, line, column)
- Metadata (is_dec, is_ref, is_variadic, etc.)
- LLVM value reference
A Node is an AST node with:
- Token reference
- Left/right children (binary operations)
- Child array (statements, function bodies)
- Symbol tables (variables, functions, structs)
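The two descriptions above could map onto C structs along the following lines. This is a simplified sketch, not the definitions in header.h: apart from `is_dec`, `is_ref`, and `is_variadic`, every field and function name is hypothetical, the LLVM value is replaced by a `void *` so the example compiles without LLVM, and the per-node symbol tables are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { TK_INT, TK_ID, TK_ADD } Type;   /* hypothetical subset of token types */

typedef struct {
   Type type;
   const char *name;                   /* identifier text, if any */
   long int_value;                     /* value for integer literals */
   const char *file;                   /* position information */
   int line, column;
   bool is_dec, is_ref, is_variadic;   /* metadata flags named in the list above */
   void *llvm_value;                   /* stand-in for an LLVM value reference */
} Token;

typedef struct Node {
   Token *token;
   struct Node *left, *right;          /* operands of a binary operation */
   struct Node **children;             /* statements of a block or function body */
   size_t child_count;
} Node;

/* Allocate a node for `token`; returns NULL if allocation fails. */
Node *new_node(Token *token, Node *left, Node *right)
{
   assert(token != NULL);
   Node *n = calloc(1, sizeof(*n));
   if (n != NULL) {
      n->token = token;
      n->left = left;
      n->right = right;
   }
   return n;
}

/* Count the nodes in a tree, including those reachable through child arrays. */
size_t count_nodes(const Node *n)
{
   if (n == NULL)
      return 0;
   size_t total = 1 + count_nodes(n->left) + count_nodes(n->right);
   for (size_t i = 0; i < n->child_count; i++)
      total += count_nodes(n->children[i]);
   return total;
}
```

An expression such as `1 + 2` would then be one node for the operator token with the two literals as its left and right children.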
The compiler maintains a scope stack (Gscoop) for:
- Variable declarations and lookups
- Function declarations and lookups
- Nested function support
- Block-level scoping
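A scope stack of this kind can be sketched in a few lines of C. The code below is a minimal illustration with fixed-size arrays and variable names only; the types and functions (`ScopeStack`, `scope_enter`, `scope_declare`, `scope_lookup`) are hypothetical and do not reflect the actual layout of Gscoop.

```c
#include <assert.h>
#include <string.h>

#define MAX_SCOPES 32
#define MAX_VARS   32

typedef struct {
   const char *names[MAX_VARS];
   int count;
} Scope;

typedef struct {
   Scope scopes[MAX_SCOPES];
   int depth;                  /* number of active scopes */
} ScopeStack;

/* Open a new, empty scope (function body, if/while block, ...). */
void scope_enter(ScopeStack *s)
{
   assert(s->depth < MAX_SCOPES);
   s->scopes[s->depth++].count = 0;
}

/* Close the innermost scope; its declarations become invisible. */
void scope_exit(ScopeStack *s)
{
   assert(s->depth > 0);
   s->depth--;
}

/* Declare `name` in the innermost scope; returns 0 if it already exists there. */
int scope_declare(ScopeStack *s, const char *name)
{
   assert(s->depth > 0);
   Scope *top = &s->scopes[s->depth - 1];
   for (int i = 0; i < top->count; i++)
      if (strcmp(top->names[i], name) == 0)
         return 0;
   assert(top->count < MAX_VARS);
   top->names[top->count++] = name;
   return 1;
}

/* Search from the innermost scope outward; returns the scope index or -1. */
int scope_lookup(const ScopeStack *s, const char *name)
{
   for (int d = s->depth - 1; d >= 0; d--)
      for (int i = 0; i < s->scopes[d].count; i++)
         if (strcmp(s->scopes[d].names[i], name) == 0)
            return d;
   return -1;
}
```

Searching from the innermost scope outward is what gives block-level scoping: a name declared inside a `while` body shadows outer names while the block is open and disappears once `scope_exit` runs.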
References are implemented as:
- Pointers at LLVM level
- Automatic dereferencing on read
- Binding semantics on first assignment
- Value assignment on subsequent assignments
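The four rules above can be modelled in plain C, where a reference is a pointer that starts unbound. This is only an illustrative model: the names (`RefInt`, `ref_assign_var`, `ref_assign_value`, `ref_read`) are hypothetical, and the sketch decides "first assignment" at run time, whereas the compiler may well decide it at compile time when emitting LLVM IR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a `ref int` variable: a pointer that starts unbound. */
typedef struct {
   int *target;
} RefInt;

/* Assigning a variable to a reference: the first assignment binds, later ones copy the value. */
void ref_assign_var(RefInt *r, int *var)
{
   if (r->target == NULL)
      r->target = var;          /* binding, as in: b ref int = a */
   else
      *r->target = *var;        /* value assignment through the reference */
}

/* Assigning a plain value stores through the bound pointer, as in: b = 20 */
void ref_assign_value(RefInt *r, int value)
{
   assert(r->target != NULL);   /* writing through an unbound reference is an error here */
   *r->target = value;
}

/* Reads dereference automatically. */
int ref_read(const RefInt *r)
{
   assert(r->target != NULL);
   return *r->target;
}
```

Replaying the earlier Ura example with this model, binding `b` to `a` and then assigning 20 through `b` leaves `a` equal to 20, matching the `return a // returns 20` comment.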
The llvm/ directory contains experimental code and prototypes for testing LLVM features before integration into the main compiler.
- Feature exploration: Test LLVM capabilities
- Proof of concepts: Validate implementation approaches
- Learning resource: Examples of LLVM C API usage
- Regression testing: Verify LLVM IR generation
- main.c: Experimental compiler with different syntax
- utils.c: Parsing and utility functions
- wrapper.c: LLVM API wrappers
- header.h: Type definitions
- examples/: 14 example programs (000-013)
- tests/: Expected LLVM IR output for each example
- config.sh: Build and test commands
The examples/ directory contains numbered test programs:
- 000: Simple main function
- 001: While loop with compound assignment
- 002: If/elif/else conditionals
- 003: Built-in functions and array access
- 004: Functions with multiple parameters
- 005: Nested functions
- 006: Variadic functions
- 007: References
- 008: Reference parameters
- 009: Type casting
- 010: Stack allocation
- 011: Try/catch exception handling
- 012: Array bounds checking
- 013: Additional features
Each example has a corresponding .ll file in tests/ with the expected LLVM IR output.
The LLVM experimental compiler uses slightly different syntax:
- def keyword for functions instead of fn
- end keyword to close blocks instead of indentation
- protoFunc instead of proto fn
- Different reference semantics
- C compiler (clang recommended)
- LLVM development libraries
- llvm-config tool
- Standard build tools (make, etc.)
Source the configuration to load the build environment:
cd src
source config.sh
Compiles the Ura compiler from C source files.
build
Compiles: main.c, gen.c, utils.c, llvm.c with:
- Address sanitizer
- Null pointer checks
- LLVM flags from llvm-config
- Warning flags
Output: build/ura executable
Compiles file.ura to LLVM IR.
ir
Requirements: file.ura must exist in src/
Output: build/file.ll
Converts LLVM IR to assembly and links to executable.
asm
Requirements: build/file.ll must exist (run ir first)
Steps:
- Runs llc to generate assembly (build/file.s)
- Links with clang to create executable (build/exe.out)
Complete compilation pipeline: build + ir + asm.
comp
Equivalent to running build && ir && asm
Compiles and executes the program.
run
Equivalent to comp && build/exe.out
Runs the test suite.
tests [folder]
- Without argument: runs all tests
- With folder name: runs tests in specific category
Examples:
tests # Run all tests
tests op # Run operator tests only
tests builtins # Run built-in function tests
Test process:
- Compiles each .ura file in tests/
- Generates LLVM IR
- Compares with expected .ll file (ignoring first 2 lines)
- Reports pass/fail for each test
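The comparison step can be expressed as a small C helper. Skipping the first two lines is presumably needed because LLVM IR files typically begin with a `; ModuleID` comment and a `source_filename` line, both of which depend on the input path. The function names below (`skip_lines`, `ir_matches`) are hypothetical; the actual test runner lives in config.sh and may be implemented differently.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer just past the first `n` lines of `text` (or to its end). */
const char *skip_lines(const char *text, int n)
{
   assert(text != NULL);
   while (n > 0 && *text != '\0') {
      const char *newline = strchr(text, '\n');
      if (newline == NULL)
         return text + strlen(text);
      text = newline + 1;
      n--;
   }
   return text;
}

/* Returns 1 if the two IR texts are identical once the first two lines are dropped. */
int ir_matches(const char *actual, const char *expected)
{
   return strcmp(skip_lines(actual, 2), skip_lines(expected, 2)) == 0;
}
```

Two IR files generated from the same program under different file names then compare as equal, while any difference in the emitted functions is still reported as a failure.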
Saves current file.ura and generated IR to test directory.
copy <folder> <filename>
Example:
copy op add
Creates:
- tests/op/add.ura (copy of file.ura)
- tests/op/add.ll (generated IR)
Formats C source code using astyle.
indent
Formatting rules:
- C mode
- 3-space indentation
- Pad operators and headers
- Keep one-line statements/blocks
- Convert tabs to spaces
- Max line length: 150
Reloads the configuration file.
update
Useful after modifying config.sh
- Write code in file.ura
- Test compilation: run
- If working, save to tests: copy <category> <name>
- Run test suite: tests
- Install LLVM development libraries:
# macOS
brew install llvm
# Linux (Ubuntu/Debian)
sudo apt-get install llvm-dev
# Linux (Fedora)
sudo dnf install llvm-devel
- Clone the repository:
git clone <repository-url>
cd ura-lang
- Navigate to the src directory:
cd src
- Source the build environment:
source config.sh
- Create a program in file.ura:
main():
    a int = 42
    return a
- Compile and run:
run

proto fn puts(str chars) int
main():
    puts("Hello, World!")
    return 0

fn fib(n int) int:
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
main():
    result int = fib(10)
    return result

fn strlen(str chars) int:
    i int = 0
    while str[i] != '\0':
        i += 1
    return i
main():
    len int = strlen("Hello")
    return len
The src/ura-lib/ directory contains prototype declarations for C standard library functions, organized by category:
- io.ura: File I/O, formatted I/O (printf, scanf, etc.)
- string.ura: String manipulation (strlen, strcmp, strcpy, etc.)
- memory.ura: Memory management (malloc, free, memcpy, etc.)
- math.ura: Mathematical functions
- stdlib.ura: General utilities
- time.ura: Time and date functions
- signals.ura: Signal handling
To use standard library functions:
use "ura-lib/header"
main():
    // Now you can use any standard library function
    puts("Hello")
The test suite is organized by feature category:
- builtins/: Built-in functions (putchar, puts, stack, typeof)
- data_types/: Type system tests, especially references
- if/: Conditional statements
- while/: Loop constructs
- op/: All operators (arithmetic, logical, comparison)
- libft/: Standard library function implementations
Each test consists of:
- .ura file: Source code
- .ll file: Expected LLVM IR output
Run tests with:
tests # All tests
tests op # Specific category
- ✅ Basic data types (int, char, chars, bool, void)
- ✅ Variables
- ✅ Arithmetic operators
- ✅ Comparison operators
- ✅ Logical operators
- ✅ Assignment operators
- ✅ Functions with parameters and return values
- ✅ Function prototypes for C interop
- ✅ If/elif/else conditionals
- ✅ While loops
- ✅ Break and continue
- ✅ Array indexing
- ✅ Type casting
- ✅ References
- ✅ Module imports (use statement)
- ✅ Comments (single and multi-line)
- ✅ Variadic functions
- ✅ Stack allocation
- ⏳ For loops
- ⏳ Structs and methods
- ⏳ Arrays (proper declarations)
- ⏳ Global variables
- ⏳ Const/immutable variables
- ⏳ Type inference
- ⏳ Exception handling (try/catch)
- ⏳ Operator overloading
- ⏳ Generics/templates
- ⏳ Garbage collection
- ⏳ Package manager
See PLAN.md for detailed roadmap and TODO.md for current tasks.
Enable debug output by setting DEBUG flag in header.h:
#define DEBUG 1
This enables:
- Token stream printing
- AST visualization
- IR generation tracing
- Update token types in header.h (Type enum)
- Add tokenization logic in main.c (tokenize function)
- Add parsing logic in main.c (appropriate parser function)
- Add IR generation in gen.c (gen_ir function)
- Add code generation in gen.c (gen_asm function)
- Add tests in src/tests/
- Update documentation
- 3-space indentation
- K&R brace style
- Descriptive variable names
- Comments for complex logic
- Use the indent command to format
This is a work-in-progress educational project. Contributions, suggestions, and feedback are welcome.
- LLVM Project for the compiler infrastructure
- Inspired by various programming language implementations