Radiant Log #008
I’ve completed the parser for the R’ language, in R’. It can parse its own source code, which stands at around 4.7K lines of code, including the AST type definitions and tests. This milestone completes the syntactic analysis phase of the compiler—the parser successfully transforms R’ source text into an abstract syntax tree that can be handed over to the semantic analyzer.
The implementation demonstrates several R’ language features: pattern
matching with if let and let case, error propagation with throws and try,
optional types tagged unions with payloads.
The AST module defines the node types that represent parsed R’ programs:
/// Tagged enum describing every possible AST node payload.
pub enum NodeValue: Debug + Eq {
/// Placeholder `_` expression.
Placeholder: void,
/// Nil literal (`nil`).
Nil: void,
/// Undefined literal (`undefined`).
Undef: void,
/// Boolean literal (`true` or `false`).
Bool: bool,
/// Character literal like `'x'`.
Char: u8,
/// String literal like `"Hello World!"`.
String: *[u8],
/// Identifier expression.
Ident: *[u8],
/// Numeric literal such as `42` or `0xFF`.
Number: struct {
/// Raw characters that comprised the literal.
text: *[u8],
},
// ... expressions ...
/// Binary operator expression.
BinOp: struct {
/// Operator applied to the operands.
op: BinaryOp,
/// Left-hand operand.
left: *Node,
/// Right-hand operand.
right: *Node,
},
/// Unary operator expression.
UnOp: struct {
/// Operator applied to the operand.
op: UnaryOp,
/// Operand expression.
value: *Node,
},
// ... control flow ...
/// Conditional statement.
If: If,
/// `if let` conditional binding.
IfLet: IfLet,
/// While loop statement.
While: While,
/// `for` loop statement.
For: For,
/// Switch statement.
Switch: Switch,
// ... declarations ...
/// Function declaration.
FnDecl: FnDecl,
/// Struct type declaration.
StructDecl: StructDecl,
/// Enum type declaration.
EnumDecl: EnumDecl,
/// Type signature node.
TypeSig: TypeSig,
}
/// Full AST node with shared metadata and variant-specific payload.
pub struct Node: Eq {
/// Source span describing where the node originated.
span: Span,
/// Variant-specific payload for the node.
value: NodeValue,
}
As you can see above, R’ has syntax for deriving type instances: in the above
example, we tell the compiler to derive Eq and Debug instances, which gives
us equality and debugging support for these types.
Recursive Descent Parser
The parser is a hand-written recursive descent parser for an
LL(1) grammar of R’. Compound expressions are handled using precedence
climbing, which elegantly manages operator precedence without needing a
separate parsing function for each level:
/// Parse binary expressions using precedence climbing.
fn parseBinary(p: *mut Parser, left: *ast::Node, minPrec: i32) -> *ast::Node
throws (ParseErr)
{
mut result = left;
loop {
let opInfo = findNextOp(p.current.kind, minPrec) else break;
advance(p);
mut right = try parseUnary(p);
while hasHigherPrecedence(p.current.kind, opInfo.prec) {
right = try parseBinary(p, right, opInfo.prec);
}
result = node(p, ast::NodeValue::BinOp {
op: opInfo.op,
left: result,
right: right,
});
}
return result;
}
/// Parse a type annotation.
///
/// Handles primitive types (u8, u16, u32, i8, i16, i32, bool, void),
/// optional types (?T), pointer types (*T), array types ([T; N]),
/// slice types (*[T]), struct types, and function types.
fn parseType(p: *mut Parser) -> *ast::Node
throws (ParseErr)
{
switch p.current.kind {
case scanner::TokenKind::Struct => {
return try parseStructType(p);
}
case scanner::TokenKind::Question => {
advance(p);
let valueType = try parseType(p);
return node(p, ast::NodeValue::TypeSig(
ast::TypeSig::Optional { valueType: valueType }
));
}
case scanner::TokenKind::Star => {
return try parsePointerType(p);
}
case scanner::TokenKind::LBracket => {
return try parseArrayType(p);
}
case scanner::TokenKind::Ident => {
mut path = try parseIdent(p, "expected type identifier");
while consume(p, scanner::TokenKind::ColonColon) {
let part = try parseIdent(p, "expected identifier after `::`");
path = node(p, ast::NodeValue::ScopeAccess(
ast::Access { parent: path, child: part }
));
}
return path;
}
case scanner::TokenKind::U8 => {
advance(p);
return nodeTypeInt(p, 1, ast::Signedness::Unsigned);
}
case scanner::TokenKind::U16 => {
advance(p);
return nodeTypeInt(p, 2, ast::Signedness::Unsigned);
}
case scanner::TokenKind::U32 => {
advance(p);
return nodeTypeInt(p, 4, ast::Signedness::Unsigned);
}
case scanner::TokenKind::I8 => {
advance(p);
return nodeTypeInt(p, 1, ast::Signedness::Signed);
}
case scanner::TokenKind::I16 => {
advance(p);
return nodeTypeInt(p, 2, ast::Signedness::Signed);
}
case scanner::TokenKind::I32 => {
advance(p);
return nodeTypeInt(p, 4, ast::Signedness::Signed);
}
case scanner::TokenKind::Bool => {
advance(p);
return node(p, ast::NodeValue::TypeSig(ast::TypeSig::Bool));
}
case scanner::TokenKind::Void => {
advance(p);
return node(p, ast::NodeValue::TypeSig(ast::TypeSig::Void));
}
case scanner::TokenKind::Fn => {
return try parseFnType(p);
}
default => {
try failParsing(p, "expected type");
}
}
}
The parser maintains state and allocates nodes from an arena, R’ doesn’t use dynamic memory allocation:
/// Parser state.
pub struct Parser {
/// The scanner that provides tokens.
scanner: scanner::Scanner,
/// The current token being examined.
current: scanner::Token,
/// The most recently consumed token.
previous: scanner::Token,
/// Collection of errors encountered during parsing.
errors: ErrorList,
/// Pointer to the node allocation arena.
nodes: *mut [ast::Node],
// ...
}
/// Allocate a new node from the parser's arena.
fn node(p: *mut Parser, value: ast::NodeValue) -> *ast::Node {
assert(p.nodesCount < p.nodes.len);
p.nodes[p.nodesCount] = ast::Node {
span: ast::Span {
offset: p.current.offset,
length: p.current.source.len,
},
value: value,
};
let node = &mut p.nodes[p.nodesCount];
finishSpan(p, node);
p.nodesCount = p.nodesCount + 1;
return node;
}
Error handling uses R’s throws declarations and try keyword for propagation:
pub fn parseBlock(p: *mut Parser) -> *ast::Node
throws (ParseErr)
{
let blk = mkBlock();
if not consume(p, scanner::TokenKind::LBrace) {
try failParsing(p, "expected `{`");
}
try parseStmtsUntil(p, scanner::TokenKind::RBrace, &blk);
try expect(p, scanner::TokenKind::RBrace, "expected `}`");
return node(p, ast::NodeValue::Block(blk));
}
What’s Next
With lexical and syntactic analysis complete, the compiler can now read R’ source files and produce abstract syntax trees. The next phase is semantic analysis: type checking, name resolution, and validation of the program’s meaning.
Semantic analysis will walk the AST to:
- Resolve identifiers to their declarations
- Infer and check types across expressions
- Validate control flow
- Build symbol tables for each scope
- Detect semantic errors before code generation
This phase transforms a syntactically valid AST into a semantically verified representation ready for optimization and code generation.