Radiant Log #008

I’ve completed the parser for the R’ language, in R’. It can parse its own source code, which stands at around 4.7K lines of code, including the AST type definitions and tests. This milestone completes the syntactic analysis phase of the compiler—the parser successfully transforms R’ source text into an abstract syntax tree that can be handed over to the semantic analyzer.

The implementation demonstrates several R’ language features: pattern matching with if let and let case, error propagation with throws and try, optional types tagged unions with payloads.

The AST module defines the node types that represent parsed R’ programs:

/// Tagged union describing every possible AST node payload.
pub union NodeValue: Debug + Eq {
    /// Placeholder `_` expression.
    Placeholder,
    /// Nil literal (`nil`).
    Nil,
    /// Undefined literal (`undefined`).
    Undef,
    /// Boolean literal (`true` or `false`).
    Bool(bool),
    /// Character literal like `'x'`.
    Char(u8),
    /// String literal like `"Hello World!"`.
    String(*[u8]),
    /// Identifier expression.
    Ident(*[u8]),
    /// Numeric literal such as `42` or `0xFF`.
    Number {
        /// Raw characters that comprised the literal.
        text: *[u8],
    },

    // ... expressions ...

    /// Binary operator expression.
    BinOp {
        /// Operator applied to the operands.
        op: BinaryOp,
        /// Left-hand operand.
        left: *Node,
        /// Right-hand operand.
        right: *Node,
    },
    /// Unary operator expression.
    UnOp {
        /// Operator applied to the operand.
        op: UnaryOp,
        /// Operand expression.
        value: *Node,
    },

    // ... control flow ...

    /// Conditional statement.
    If(If),
    /// `if let` conditional binding.
    IfLet(IfLet),
    /// While loop statement.
    While(While),
    /// `for` loop statement.
    For(For),
    /// Match statement.
    Match(Match),

    // ... declarations ...

    /// Function declaration.
    FnDecl(FnDecl),
    /// Record type declaration.
    RecordDecl(RecordDecl),
    /// Union type declaration.
    UnionDecl(UnionDecl),
    /// Type signature node.
    TypeSig(TypeSig),
}

/// Full AST node with shared metadata and variant-specific payload.
pub record Node: Eq {
    /// Source span describing where the node originated.
    span: Span,
    /// Variant-specific payload for the node.
    value: NodeValue,
}

As you can see above, R’ has syntax for deriving type instances: in the above example, we tell the compiler to derive Eq and Debug instances, which gives us equality and debugging support for these types.

Recursive Descent Parser

The parser is a hand-written recursive descent parser for an LL(1) grammar of R’. Compound expressions are handled using precedence climbing, which elegantly manages operator precedence without needing a separate parsing function for each level:

/// Parse binary expressions using precedence climbing.
fn parseBinary(p: *mut Parser, left: *ast::Node, minPrec: i32) -> *ast::Node
    throws (ParseErr)
{
    mut result = left;

    loop {
        let opInfo = findNextOp(p.current.kind, minPrec) else break;
        advance(p);

        mut right = try parseUnary(p);

        while hasHigherPrecedence(p.current.kind, opInfo.prec) {
            right = try parseBinary(p, right, opInfo.prec);
        }
        result = node(p, ast::NodeValue::BinOp {
            op: opInfo.op,
            left: result,
            right: right,
        });
    }
    return result;
}

/// Parse a type annotation.
///
/// Handles primitive types (u8, u16, u32, i8, i16, i32, bool, void),
/// optional types (?T), pointer types (*T), array types ([T; N]),
/// slice types (*[T]), record types, and function types.
fn parseType(p: *mut Parser) -> *ast::Node
    throws (ParseErr)
{
    match p.current.kind {
        case scanner::TokenKind::Record => {
            return try parseRecordType(p);
        }
        case scanner::TokenKind::Question => {
            advance(p);
            let valueType = try parseType(p);

            return node(p, ast::NodeValue::TypeSig(
                ast::TypeSig::Optional { valueType: valueType }
            ));
        }
        case scanner::TokenKind::Star => {
            return try parsePointerType(p);
        }
        case scanner::TokenKind::LBracket => {
            return try parseArrayType(p);
        }
        case scanner::TokenKind::Ident => {
            mut path = try parseIdent(p, "expected type identifier");

            while consume(p, scanner::TokenKind::ColonColon) {
                let part = try parseIdent(p, "expected identifier after `::`");
                path = node(p, ast::NodeValue::ScopeAccess(
                    ast::Access { parent: path, child: part }
                ));
            }
            return path;
        }
        case scanner::TokenKind::U8 => {
            advance(p);
            return nodeTypeInt(p, 1, ast::Signedness::Unsigned);
        }
        case scanner::TokenKind::U16 => {
            advance(p);
            return nodeTypeInt(p, 2, ast::Signedness::Unsigned);
        }
        case scanner::TokenKind::U32 => {
            advance(p);
            return nodeTypeInt(p, 4, ast::Signedness::Unsigned);
        }
        case scanner::TokenKind::I8 => {
            advance(p);
            return nodeTypeInt(p, 1, ast::Signedness::Signed);
        }
        case scanner::TokenKind::I16 => {
            advance(p);
            return nodeTypeInt(p, 2, ast::Signedness::Signed);
        }
        case scanner::TokenKind::I32 => {
            advance(p);
            return nodeTypeInt(p, 4, ast::Signedness::Signed);
        }
        case scanner::TokenKind::Bool => {
            advance(p);
            return node(p, ast::NodeValue::TypeSig(ast::TypeSig::Bool));
        }
        case scanner::TokenKind::Void => {
            advance(p);
            return node(p, ast::NodeValue::TypeSig(ast::TypeSig::Void));
        }
        case scanner::TokenKind::Fn => {
            return try parseFnType(p);
        }
        default => {
            try failParsing(p, "expected type");
        }
    }
}

The parser maintains state and allocates nodes from an arena, R’ doesn’t use dynamic memory allocation:

/// Parser state.
pub record Parser {
    /// The scanner that provides tokens.
    scanner: scanner::Scanner,
    /// The current token being examined.
    current: scanner::Token,
    /// The most recently consumed token.
    previous: scanner::Token,
    /// Collection of errors encountered during parsing.
    errors: ErrorList,
    /// Pointer to the node allocation arena.
    nodes: *mut [ast::Node],
    // ...
}

/// Allocate a new node from the parser's arena.
fn node(p: *mut Parser, value: ast::NodeValue) -> *ast::Node {
    assert(p.nodesCount < p.nodes.len);

    p.nodes[p.nodesCount] = ast::Node {
        span: ast::Span {
            offset: p.current.offset,
            length: p.current.source.len,
        },
        value: value,
    };
    let node = &mut p.nodes[p.nodesCount];
    finishSpan(p, node);
    p.nodesCount = p.nodesCount + 1;

    return node;
}

Error handling uses R’s throws declarations and try keyword for propagation:

pub fn parseBlock(p: *mut Parser) -> *ast::Node
    throws (ParseErr)
{
    let blk = mkBlock();

    if not consume(p, scanner::TokenKind::LBrace) {
        try failParsing(p, "expected `{`");
    }
    try parseStmtsUntil(p, scanner::TokenKind::RBrace, &blk);
    try expect(p, scanner::TokenKind::RBrace, "expected `}`");

    return node(p, ast::NodeValue::Block(blk));
}

What’s Next

With lexical and syntactic analysis complete, the compiler can now read R’ source files and produce abstract syntax trees. The next phase is semantic analysis: type checking, name resolution, and validation of the program’s meaning.

Semantic analysis will walk the AST to:

Resolve identifiers to their declarations
Infer and check types across expressions
Validate control flow
Build symbol tables for each scope
Detect semantic errors before code generation

This phase transforms a syntactically valid AST into a semantically verified representation ready for optimization and code generation.