Radiant Log #010
The semantic analyzer, or “type checker,” written in R’ is now complete, insofar as it can validate itself and all of the R’ code we have without error. The Radiance compiler driver can now parse itself and validate that the language semantics are fully upheld.
> emulator -run bin/radiance.bin \
-pkg std \
-mod lib/std.rad \
-mod lib/std/fmt.rad \
-mod lib/std/mem.rad \
-mod lib/std/vec.rad \
-mod lib/std/sys.rad \
-mod lib/std/io.rad \
-mod lib/std/lang.rad
...
radiance: std: parsing ( lib/std.rad )
radiance: std: parsing ( lib/std/fmt.rad )
radiance: std: parsing ( lib/std/intrinsics.rad )
radiance: std: parsing ( lib/std/mem.rad )
radiance: std: parsing ( lib/std/vec.rad )
radiance: std: parsing ( lib/std/debug.rad )
radiance: std: parsing ( lib/std/sys.rad )
radiance: std: parsing ( lib/std/io.rad )
radiance: std: parsing ( lib/std/sys/unix.rad )
radiance: std: parsing ( lib/std/lang.rad )
...
radiance: std: running semantic analysis..
radiance: std: ok
Besides all the obvious things a semantic analyzer is supposed to check, we also have the following:
- Module system - `mod`, `use`, `super` and visibility (`pub`) checking
- Type inference - infers types from initializers, where it can
- Type coercion - automatic `T` → `?T` lifting, numeric widening
- Numeric overflow detection - validates literals fit target type
- Array/slice indexing - static bounds checking for constant values
- Match exhaustiveness - unions require all variants to be handled
- `let`-`else` divergence - else branch must not fall through
- Missing return detection - non-`void` functions must return
- Missing `try` detection - must use `try` on fallible calls
- Mutability checking - can’t assign to immutable bindings
- Opaque pointer type - no dereference, arithmetic, or value creation
- Built-in intrinsics - `@sizeOf` and `@alignOf` computed statically
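To make a few of these concrete, here’s a hypothetical snippet that the analyzer would reject. The syntax and diagnostic wording are illustrative only, not taken from the real compiler:

```
/// A union with two variants.
union Shape {
    Circle(u32),
    Square(u32),
}

fn area(s: Shape) -> u32 {
    // error: match is not exhaustive; variant `Square` is not handled
    match s {
        Shape::Circle(r) => r * r * 3,
    }
}

fn demo() {
    let xs: [u8; 4] = [1, 2, 3, 4];
    let a = xs[7];    // error: index 7 out of bounds for `[u8; 4]`
    let b: u8 = 300;  // error: literal `300` does not fit in `u8`
    let c: u32 = 1;
    c = 2;            // error: cannot assign to immutable binding `c`
}
```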
The analyzer sits at around 4K lines of code, with another 4K lines of test code covering all of this functionality across 226 tests. It takes a ModuleGraph as input and spits out diagnostics.
/// Global analyzer state.
pub record Analyzer {
    /// Current scope.
    scope: *mut Scope,
    /// Package scope containing package roots and top-level symbols.
    pkgScope: *mut Scope,
    /// Nesting depth of active loop constructs.
    loopDepth: u32,
    /// Signature of the function currently being analyzed.
    currentFn: ?*FnType,
    /// Current module being analyzed.
    currentMod: u16,
    /// Configuration for semantic analysis.
    config: Config,
    /// Arena of symbols allocated during analysis.
    symbols: SymbolArena,
    /// Arena of scopes.
    scopes: ScopeArena,
    /// Combined semantic metadata table.
    nodeSemant: NodeSemantTable,
    /// Known types.
    types: Types,
    /// Diagnostics recorded so far.
    errors: ErrorList,
    /// Module graph for this package.
    moduleGraph: *module::ModuleGraph,
}
At the moment, the analyzer traverses the AST in multiple passes: first to bind names, then to check types, and finally to validate function bodies. This avoids the need for forward declarations and allows circular dependencies within a compilation unit, called a package. Each package consists of multiple modules, which map directly to source files.
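The multi-pass flow above could be sketched as a driver like the following; the pass function names here are assumptions for illustration, not the analyzer’s real API:

```
/// Hypothetical sketch of the pass ordering.
fn analyze(a: *mut Analyzer, graph: *module::ModuleGraph) {
    // Pass 1: bind top-level names so declarations can reference
    // each other without forward declarations.
    bindNames(a, graph);
    // Pass 2: resolve and check the types of all declarations.
    checkTypes(a, graph);
    // Pass 3: validate function bodies against the resolved signatures.
    checkBodies(a, graph);
}
```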
Here’s the full module graph for the current standard library (std), which
includes the R’ compiler.
std {id: 0, state: Parsed, file: "lib/std.rad", path: std, children: 8, ast: true}
  fmt {id: 1, state: Parsed, file: "lib/std/fmt.rad", path: std::fmt, children: 0, ast: true}
  intrinsics {id: 2, state: Parsed, file: "lib/std/intrinsics.rad", path: std::intrinsics, children: 0, ast: true}
  mem {id: 3, state: Parsed, file: "lib/std/mem.rad", path: std::mem, children: 0, ast: true}
  vec {id: 4, state: Parsed, file: "lib/std/vec.rad", path: std::vec, children: 0, ast: true}
  debug {id: 5, state: Parsed, file: "lib/std/debug.rad", path: std::debug, children: 0, ast: true}
  sys {id: 6, state: Parsed, file: "lib/std/sys.rad", path: std::sys, children: 1, ast: true}
    unix {id: 8, state: Parsed, file: "lib/std/sys/unix.rad", path: std::sys::unix, children: 0, ast: true}
  io {id: 7, state: Parsed, file: "lib/std/io.rad", path: std::io, children: 0, ast: true}
  lang {id: 9, state: Parsed, file: "lib/std/lang.rad", path: std::lang, children: 7, ast: true}
    strings {id: 10, state: Parsed, file: "lib/std/lang/strings.rad", path: std::lang::strings, children: 0, ast: true}
    ast {id: 11, state: Parsed, file: "lib/std/lang/ast.rad", path: std::lang::ast, children: 1, ast: true}
      printer {id: 12, state: Parsed, file: "lib/std/lang/ast/printer.rad", path: std::lang::ast::printer, children: 0, ast: true}
    scanner {id: 13, state: Parsed, file: "lib/std/lang/scanner.rad", path: std::lang::scanner, children: 0, ast: true}
    parser {id: 14, state: Parsed, file: "lib/std/lang/parser.rad", path: std::lang::parser, children: 0, ast: true}
    semant {id: 15, state: Parsed, file: "lib/std/lang/semant.rad", path: std::lang::semant, children: 1, ast: true}
      printer {id: 16, state: Parsed, file: "lib/std/lang/semant/printer.rad", path: std::lang::semant::printer, children: 0, ast: true}
    module {id: 17, state: Parsed, file: "lib/std/lang/module.rad", path: std::lang::module, children: 1, ast: true}
      printer {id: 18, state: Parsed, file: "lib/std/lang/module/printer.rad", path: std::lang::module::printer, children: 0, ast: true}
    package {id: 19, state: Parsed, file: "lib/std/lang/package.rad", path: std::lang::package, children: 0, ast: true}
The analyzer traverses the module graph and assigns types to all nodes without mutating the AST. This makes it easier to parallelize type checking in the future.
/// Describes a type computed during semantic analysis.
union Type {
    /// A type that couldn't be decided.
    Invalid,
    /// Types only used during inference.
    Nil, Undefined, Int,
    /// Primitive types.
    Void, Opaque, Never, Bool,
    /// Integer types.
    U8, U16, U32, I8, I16, I32,
    /// Range types, eg. `start..end`.
    Range {
        start: ?*Type,
        end: ?*Type,
    },
    /// Eg. `*T`.
    Pointer {
        target: *Type,
        mutable: bool,
    },
    /// Eg. `*[i32]`.
    Slice {
        item: *Type,
        mutable: bool,
    },
    /// Eg. `[i32; 32]`.
    Array {
        item: *Type,
        length: u32,
    },
    /// Eg. `?T`.
    Optional(*Type),
    /// Eg. `fn id(i32) -> i32`.
    Fn(*TypeInfo),
    /// Named, ie. user-defined types.
    Nominal(*TypeInfo),
}
Symbols are created for each type or value identifier and stored per scope. One area of future improvement would be lazy, on-demand type resolution, which would give the language more flexibility and avoid issues with recursive definitions.
/// Resolved symbol allocated during semantic analysis.
record Symbol {
    /// Symbol name in source code.
    name: *[u8],
    /// Data associated with the symbol.
    data: SymbolData,
    /// Bitset of attributes applied to the declaration.
    attrs: u32,
    /// AST node that introduced the symbol.
    node: *ast::Node,
}

/// Detailed payload attached to a symbol, specialized per symbol kind.
union SymbolData {
    /// Payload describing mutable bindings like variables or functions.
    Value {
        /// Whether the binding permits mutation.
        mutable: bool,
        /// Custom alignment requirement, or 0 for default.
        alignment: u32,
        /// Resolved type associated with the value.
        type: Type,
    },
    /// Payload describing constants.
    Constant {
        /// Resolved type associated with the value.
        type: Type,
        /// Constant value, if any.
        value: ?ConstValue,
    },
    /// Payload describing union variants and the union type they instantiate.
    Variant {
        /// Variant payload.
        payload: Type,
        /// Union declaration node.
        decl: *ast::Node,
    },
    /// Module reference.
    Module {
        /// Module entry in the graph.
        entry: *module::ModuleEntry,
        /// Module scope.
        scope: *Scope,
    },
    /// Payload describing type symbols with their resolved type.
    Type(*mut TypeInfo),
}
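A per-scope symbol table naturally resolves names by walking from the innermost scope outwards. Here’s a sketch of what that lookup could look like; the `parent` field, the `find` helper, and the control-flow syntax are all assumptions, not the analyzer’s real implementation:

```
/// Hypothetical name lookup walking the scope chain.
fn lookup(start: *Scope, name: *[u8]) -> ?*Symbol {
    var scope: ?*Scope = start;
    // Try each enclosing scope until the name resolves.
    while (scope != nil) {
        let found = scope.find(name);
        if (found != nil) {
            return found;
        }
        scope = scope.parent;
    }
    return nil;
}
```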
Now that this is in a good place, I can move to the compiler backend. This involves lowering the typed AST to an intermediate representation (IR) that can then be used to generate RISC-V machine code.
I’m currently debating whether to do this with a single IR:
AST → IR → Machine Code
Or to split IR generation into two phases: first lowering the AST to a small subset of R’ while maintaining the general language structure, and then using that subset to generate the IR.
AST → Lowered AST → IR → Machine Code
This would make linear IR generation more straightforward and easier to debug, but it introduces an extra compiler pass, which means more code.
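For illustration, here’s what such a lowering step could look like for a hypothetical `for` loop. Both forms below, and the loop syntax itself, are assumptions rather than actual compiler output:

```
// Source form:
for (items) |item| {
    use(item);
}

// Hypothetical lowered form, expressed in the smaller R' subset:
var i: u32 = 0;
while (i < items.len) {
    let item = items[i];
    use(item);
    i = i + 1;
}
```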
Simpler languages tend to have compilers that go straight from AST to
IR (e.g. clang), while more complex languages like Rust and Zig have multiple
IRs.
Then comes the question of what IR to use before code generation. There are many options here, from QBE to SIL. Since I can’t use anything off-the-shelf, I’ll be porting or adapting an existing IR to R’.
Finally, another important part of semantic analysis which I haven’t touched upon here is data-flow analysis, such as detecting the use of uninitialized variables. This is very handy for code correctness, but it’s more easily done on the IR, so I’ll be tackling it soon.
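As a hypothetical example of the kind of bug this analysis would catch (syntax and diagnostic wording are illustrative):

```
fn pick(cond: bool) -> i32 {
    var x: i32;
    if (cond) {
        x = 1;
    }
    // error: `x` may be uninitialized when `cond` is false
    return x;
}
```

On the IR, this falls out of a standard reaching-definitions style analysis over the control-flow graph, rather than requiring special-case handling in the AST walker.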