Radiant Log #010
The semantic analyzer, or “type checker,” written in R’ is now complete, insofar as it can validate itself and all of the R’ code we have without error. The Radiance compiler driver can now parse itself and validate that the language semantics are fully upheld.
> emulator -run bin/radiance.bin \
-pkg std \
-mod lib/std.rad \
-mod lib/std/fmt.rad \
-mod lib/std/mem.rad \
-mod lib/std/vec.rad \
-mod lib/std/sys.rad \
-mod lib/std/io.rad \
-mod lib/std/lang.rad
...
radiance: std: parsing ( lib/std.rad )
radiance: std: parsing ( lib/std/fmt.rad )
radiance: std: parsing ( lib/std/intrinsics.rad )
radiance: std: parsing ( lib/std/mem.rad )
radiance: std: parsing ( lib/std/vec.rad )
radiance: std: parsing ( lib/std/debug.rad )
radiance: std: parsing ( lib/std/sys.rad )
radiance: std: parsing ( lib/std/io.rad )
radiance: std: parsing ( lib/std/sys/unix.rad )
radiance: std: parsing ( lib/std/lang.rad )
...
radiance: std: running semantic analysis..
radiance: std: ok
Besides all the obvious things a semantic analyzer is supposed to check, we also have the following:
- Module system - `mod`, `use`, `super` and visibility (`pub`) checking
- Type inference - infers types from initializers, where it can
- Type coercion - automatic `T` → `?T` lifting, numeric widening
- Numeric overflow detection - validates literals fit target type
- Array/slice indexing - static bounds checking for constant values
- Match exhaustiveness - unions require all variants to be handled
- `let`-`else` divergence - else branch must not fall through
- Missing return detection - non-`void` functions must return
- Missing `try` detection - must use `try` on fallible calls
- Mutability checking - can’t assign to immutable bindings
- Opaque pointer type - no dereference, arithmetic, or value creation
- Built-in intrinsics - `@sizeOf` and `@alignOf` computed statically
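To make a few of these concrete, here’s a hypothetical snippet that the analyzer would reject. The syntax and diagnostic wording are illustrative only, not taken from the real compiler:

```
/// A union with two variants.
union Shape {
    Circle(u32),
    Square(u32),
}

fn area(s: Shape) -> u32 {
    // error: match is not exhaustive; variant `Square` is not handled
    match s {
        Shape::Circle(r) => r * r * 3,
    }
}

fn demo() {
    let xs: [u8; 4] = [1, 2, 3, 4];
    let a = xs[7];    // error: index 7 out of bounds for `[u8; 4]`
    let b: u8 = 300;  // error: literal `300` does not fit in `u8`
    let c: u32 = 1;
    c = 2;            // error: cannot assign to immutable binding `c`
}
```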
The analyzer sits at around 4K lines of code, with another 4K lines of test code covering all of this functionality across 226 tests. It takes a ModuleGraph as input and spits out diagnostics.
/// Global analyzer state.
pub record Analyzer {
    /// Current scope.
    scope: *mut Scope,
    /// Package scope containing package roots and top-level symbols.
    pkgScope: *mut Scope,
    /// Nesting depth of active loop constructs.
    loopDepth: u32,
    /// Signature of the function currently being analyzed.
    currentFn: ?*FnType,
    /// Current module being analyzed.
    currentMod: u16,
    /// Configuration for semantic analysis.
    config: Config,
    /// Arena of symbols allocated during analysis.
    symbols: SymbolArena,
    /// Arena of scopes.
    scopes: ScopeArena,
    /// Combined semantic metadata table.
    nodeSemant: NodeSemantTable,
    /// Known types.
    types: Types,
    /// Diagnostics recorded so far.
    errors: ErrorList,
    /// Module graph for this package.
    moduleGraph: *module::ModuleGraph,
}
At the moment, the analyzer traverses the AST in multiple passes: first to bind names, then to check types, and finally to validate function bodies. This avoids the need for forward declarations and allows circular dependencies within a compilation unit, called a package. Each package consists of multiple modules, which map directly to source files.
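The multi-pass flow above could be sketched as a driver like the following; the pass function names here are assumptions for illustration, not the analyzer’s real API:

```
/// Hypothetical sketch of the pass ordering.
fn analyze(a: *mut Analyzer, graph: *module::ModuleGraph) {
    // Pass 1: bind top-level names so declarations can reference
    // each other without forward declarations.
    bindNames(a, graph);
    // Pass 2: resolve and check the types of all declarations.
    checkTypes(a, graph);
    // Pass 3: validate function bodies against the resolved signatures.
    checkBodies(a, graph);
}
```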
Here’s the full module graph for the current standard library (std), which
includes the R’ compiler.
std {id: 0, state: Parsed, file: "lib/std.rad", path: std, children: 8, ast: true}
  fmt {id: 1, state: Parsed, file: "lib/std/fmt.rad", path: std::fmt, children: 0, ast: true}
  intrinsics {id: 2, state: Parsed, file: "lib/std/intrinsics.rad", path: std::intrinsics, children: 0, ast: true}
  mem {id: 3, state: Parsed, file: "lib/std/mem.rad", path: std::mem, children: 0, ast: true}
  vec {id: 4, state: Parsed, file: "lib/std/vec.rad", path: std::vec, children: 0, ast: true}
  debug {id: 5, state: Parsed, file: "lib/std/debug.rad", path: std::debug, children: 0, ast: true}
  sys {id: 6, state: Parsed, file: "lib/std/sys.rad", path: std::sys, children: 1, ast: true}
    unix {id: 8, state: Parsed, file: "lib/std/sys/unix.rad", path: std::sys::unix, children: 0, ast: true}
  io {id: 7, state: Parsed, file: "lib/std/io.rad", path: std::io, children: 0, ast: true}
  lang {id: 9, state: Parsed, file: "lib/std/lang.rad", path: std::lang, children: 7, ast: true}
    strings {id: 10, state: Parsed, file: "lib/std/lang/strings.rad", path: std::lang::strings, children: 0, ast: true}
    ast {id: 11, state: Parsed, file: "lib/std/lang/ast.rad", path: std::lang::ast, children: 1, ast: true}
      printer {id: 12, state: Parsed, file: "lib/std/lang/ast/printer.rad", path: std::lang::ast::printer, children: 0, ast: true}
    scanner {id: 13, state: Parsed, file: "lib/std/lang/scanner.rad", path: std::lang::scanner, children: 0, ast: true}
    parser {id: 14, state: Parsed, file: "lib/std/lang/parser.rad", path: std::lang::parser, children: 0, ast: true}
    semant {id: 15, state: Parsed, file: "lib/std/lang/semant.rad", path: std::lang::semant, children: 1, ast: true}
      printer {id: 16, state: Parsed, file: "lib/std/lang/semant/printer.rad", path: std::lang::semant::printer, children: 0, ast: true}
    module {id: 17, state: Parsed, file: "lib/std/lang/module.rad", path: std::lang::module, children: 1, ast: true}
      printer {id: 18, state: Parsed, file: "lib/std/lang/module/printer.rad", path: std::lang::module::printer, children: 0, ast: true}
    package {id: 19, state: Parsed, file: "lib/std/lang/package.rad", path: std::lang::package, children: 0, ast: true}
The analyzer traverses the module graph and assigns types to all nodes without mutating the AST. This makes it easier to parallelize type checking in the future.
/// Describes a type computed during semantic analysis.
union Type {
    /// A type that couldn't be decided.
    Invalid,
    /// Types only used during inference.
    Nil, Undefined, Int,
    /// Primitive types.
    Void, Opaque, Never, Bool,
    /// Integer types.
    U8, U16, U32, I8, I16, I32,
    /// Range types, eg. `start..end`.
    Range {
        start: ?*Type,
        end: ?*Type,
    },
    /// Eg. `*T`.
    Pointer {
        target: *Type,
        mutable: bool,
    },
    /// Eg. `*[i32]`.
    Slice {
        item: *Type,
        mutable: bool,
    },
    /// Eg. `[i32; 32]`.
    Array {
        item: *Type,
        length: u32,
    },
    /// Eg. `?T`.
    Optional(*Type),
    /// Eg. `fn id(i32) -> i32`.
    Fn(*TypeInfo),
    /// Named, ie. user-defined types.
    Nominal(*TypeInfo),
}
Symbols are created for each type or value identifier and stored per scope. One area of future improvement would be lazy, on-demand type resolution, which would give the language more flexibility and avoid issues with recursive definitions.
/// Resolved symbol allocated during semantic analysis.
record Symbol {
    /// Symbol name in source code.
    name: *[u8],
    /// Data associated with the symbol.
    data: SymbolData,
    /// Bitset of attributes applied to the declaration.
    attrs: u32,
    /// AST node that introduced the symbol.
    node: *ast::Node,
}

/// Detailed payload attached to a symbol, specialized per symbol kind.
union SymbolData {
    /// Payload describing mutable bindings like variables or functions.
    Value {
        /// Whether the binding permits mutation.
        mutable: bool,
        /// Custom alignment requirement, or 0 for default.
        alignment: u32,
        /// Resolved type associated with the value.
        type: Type,
    },
    /// Payload describing constants.
    Constant {
        /// Resolved type associated with the value.
        type: Type,
        /// Constant value, if any.
        value: ?ConstValue,
    },
    /// Payload describing union variants and the union type they instantiate.
    Variant {
        /// Variant payload.
        payload: Type,
        /// Union declaration node.
        decl: *ast::Node,
    },
    /// Module reference.
    Module {
        /// Module entry in the graph.
        entry: *module::ModuleEntry,
        /// Module scope.
        scope: *Scope,
    },
    /// Payload describing type symbols with their resolved type.
    Type(*mut TypeInfo),
}
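A per-scope symbol table naturally resolves names by walking from the innermost scope outwards. Here’s a sketch of what that lookup could look like; the `parent` field, the `find` helper, and the control-flow syntax are all assumptions, not the analyzer’s real implementation:

```
/// Hypothetical name lookup walking the scope chain.
fn lookup(start: *Scope, name: *[u8]) -> ?*Symbol {
    var scope: ?*Scope = start;
    // Try each enclosing scope until the name resolves.
    while (scope != nil) {
        let found = scope.find(name);
        if (found != nil) {
            return found;
        }
        scope = scope.parent;
    }
    return nil;
}
```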
Now that this is in a good place, I can move to the compiler backend. This involves lowering the typed AST to an intermediate representation (IR) that can then be used to generate RISC-V machine code.
I’m currently debating whether to do this with a single IR:
AST → IR → Machine Code
Or to split IR generation into two phases: first lowering the AST to a small subset of R’ while maintaining the general language structure, and then using that subset to generate the IR.
AST → Lowered AST → IR → Machine Code
This would make linear IR generation more straightforward and easier to debug, but it introduces an extra compiler pass, which means more code.
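For illustration, here’s what such a lowering step could look like for a hypothetical `for` loop. Both forms below, and the loop syntax itself, are assumptions rather than actual compiler output:

```
// Source form:
for (items) |item| {
    use(item);
}

// Hypothetical lowered form, expressed in the smaller R' subset:
var i: u32 = 0;
while (i < items.len) {
    let item = items[i];
    use(item);
    i = i + 1;
}
```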
Simpler languages tend to have compilers that go straight from AST to
IR (e.g. clang), while more complex languages like Rust and Zig have multiple
IRs.
Then comes the question of what IR to use before code generation. There are many options here, from QBE to SIL. Since I can’t use anything off-the-shelf, I’ll be porting or adapting an existing IR to R’.
Finally, another important part of semantic analysis which I haven’t touched upon here is data-flow analysis, such as detecting the use of uninitialized variables. This is very handy for code correctness, but it’s more easily done on the IR, so I’ll be tackling it soon.
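As a hypothetical example of the kind of bug this analysis would catch (syntax and diagnostic wording are illustrative):

```
fn pick(cond: bool) -> i32 {
    var x: i32;
    if (cond) {
        x = 1;
    }
    // error: `x` may be uninitialized when `cond` is false
    return x;
}
```

On the IR, this falls out of a standard reaching-definitions style analysis over the control-flow graph, rather than requiring special-case handling in the AST walker.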