Radiant Log #009

At this point, I’ve written about 12K lines of R’ code for the new self-hosting R’ compiler, and I’m starting to hit the same language design issues over and over. This is a good time to pause and analyze these issues, and to figure out if we’re okay with having them in the language (no language is perfect), or if we want to remedy them.

R’ already has many features that I would consider quite ergonomic, that I’ve added to make code easier to write, reduce repetitive patterns, or make code safer. But there are also some rough edges which are slowing down the development process.

First let’s start with some nice things we can now do in R’:

One common pattern is let-else-throw, which essentially turns an absent value into an error:

let currentMod = module::get(self.moduleGraph, self.currentMod)
    else throw emitError(self, node, ErrorKind::NotFound);

std/lang/semant.rad

We also support if-let syntax which is common in Rust and Zig. Here, the if branch is executed if the optional returned by checkSuperAccess has a value:

if let superAccess = try checkSuperAccess(self, module) {
    startScope = superAccess.scope;
    pathNode = superAccess.child;
}

std/lang/semant.rad

Using catch, we can turn an error into control flow:

for i in 0..block.statements.len {
    try visitDecl(sem, block.statements.list[i]) catch {
        break;
    };
}

std/lang/semant.rad

In the snippet below, we can see many of the language features at play: function pointers, error handling, mutability annotations, module namespace access and type inference:

/// Parse a comma-separated list enclosed by the given delimiters.
fn parseList(
    p: *mut Parser,
    open: scanner::TokenKind,
    close: scanner::TokenKind,
    parseItem: fn (*mut Parser) throws (ParseErr) -> *ast::Node
) -> ast::NodeList throws (ParseErr) {
    try expect(p, open, listExpectMessage(open));
    mut items = ast::nodeList(p.arena, 8);

    while not check(p, close) {
        let item = try parseItem(p);
        ast::nodeListPush(&mut items, item);

        if not consume(p, scanner::TokenKind::Comma) {
            break;
        }
    }
    try expect(p, close, listExpectMessage(close));

    return items;
}

std/lang/parser.rad

All of this is great. There are however some areas of the language that still need work.

What needs work

R’ is a subset of Radiance, and is meant to be the language used to write RadiantOS and the first few versions of the Radiance compiler.

The primary goals are that it be simple to implement a compiler for, be easy to read and write, and have good performance.

A fairly up to date language reference for R’ can be found here: /system/radiance/prime, though there are already several planned changes to the syntax which don’t figure in that document.

Radiance will later build on R’, and add the following features that will need to be compatible with R’:

Generics, including type bounds
Deriving type bounds automatically
Concurrency primitives
Effects, via object capabilities
Methods on types
Affine types
Very basic borrow-checking
Dynamic memory allocation
Reference-counted pointers
Some form of structural typing for IPC

Ideally, there shouldn’t be any overlap in functionality between the languages, i.e. features should be orthogonal. The hardest part about this is for generics, because some level of generalization is really needed in R’, but we don’t want to implement features that will then collide with generics in Radiance.

Generic growable vectors

One of the pain points with R’ right now is that I have to re-implement list creation and append operations for many different list types, since R’ doesn’t have a generic Vec<T> type with push and pop operations.

Here’s an example of what I do for each list type at the moment:

fn addTypeEntry(a: *mut Analyzer, node: *ast::Node, ty: Type) -> *mut TypeEntry {
    if a.nodeTypes.entriesLen >= a.nodeTypes.entries.len {
        panic "addTypeEntry: node type overflow";
    }
    a.nodeTypes.entries[a.nodeTypes.entriesLen] = TypeEntry { node, ty };
    a.nodeTypes.entriesLen = a.nodeTypes.entriesLen + 1;

    return &mut a.nodeTypes.entries[a.nodeTypes.entriesLen - 1];
}

std/lang/semant.rad

One solution I’m exploring is to use raw vectors, i.e. via opaque pointers, but I’m wondering then if I should have a new opaque pointer type instead of *void, for additional type safety. The idea of a raw vector is one that operates on a u8 slice, and keeps track of element stride and alignment. It could look something like this:

// Raw `i32` vector.
mut arena: [u8; 16] align(@alignOf(i32)) = undefined; // Backing storage.
mut v = vec::new(&mut arena[..], @sizeOf(i32), @alignOf(i32));

// The `i32` is passed by address and copied into the vector.
let val: i32 = 42;
vec::push(&mut v, &val);
assert(vec::len(&v) == 1);

// We can then retrieve the address of the value,
// cast it and dereference it.
let ptr = vec::get(&v, 0) else {
    panic;
};
assert(*(ptr as *i32) == 42);

It’s a bit of a pain, but works well enough for most use-cases, and I could build the generic vector on top of this perhaps.

The other option is a built-in generic slice type like in Go, with a built-in append operation. But this doesn’t extend well to generics in Radiance, i.e. it doesn’t work well given that R’ is supposed to be a subset of Radiance which will have proper generics.

Static memory management

R’ doesn’t use dynamic memory. It uses static arenas for everything. This is by design, since R’ is meant to be used for the Radiant kernel. However the ergonomics of this are pretty poor, basically I end up having to define many static arrays like this:

/// Global arena for allocating symbols during analysis.
static SYMBOL_STORAGE: [Symbol; MAX_SYMBOLS] = undefined;
/// Global storage for lexical scopes.
static SCOPE_STORAGE: [Scope; MAX_SCOPE_RECORDS] = undefined;
/// Global storage for resolved type descriptors.
static TYPE_STORAGE: [Type; MAX_RESOLVED_TYPES] = undefined;

Then I have a type for all storage arenas like this:

pub record AnalyzerStorage {
    symbols: *mut [Symbol],
    nodeSymbols: *mut [NodeSymbol],
    nodeScopes: *mut [NodeScope],
    nodeTypes: *mut [NodeTypeEntry],
    nodeCoercions: *mut [NodeCoercion],
    ...
}

And then a function for creating this storage:

pub fn defaultStorage() -> AnalyzerStorage {
    return AnalyzerStorage {
        symbols: &mut SYMBOL_STORAGE[..],
        nodeSymbols: &mut NODE_SYMBOL_STORAGE[..],
        nodeScopes: &mut NODE_SCOPE_STORAGE[..],
        nodeTypes: &mut NODE_TYPE_STORAGE[..],
        nodeCoercions: &mut NODE_COERCION_STORAGE[..],
        constValues: &mut CONST_VALUE_STORAGE[..],
        ...
    };
}

This problem is related to the lack of generic vectors, though it’s more about how to work with static memory in an ergonomic way. The way in which it’s related to generics is that I have to implement an append function for each of these arenas, that checks the capacity of the backing array, and tracks the lengths separately. I could implement something similar to malloc with a single arena, but using static memory, but then I’d lose some safety and create complexity in other places.

I’ve considered an array-backed-slice type that tracks its capacity as well, but this needs a proper design. The idea would be that instead of slices being represented like this:

record {
    ptr: *T,
    len: u32,
}

They’d be represented like this:

record {
    ptr: *T,
    len: u32,
    cap: u32,
}

Then, I could do certain things like grow the slice via the existing syntax, but in a safe way. If the capacity is exceeded, we would get a runtime panic:

store.nodes = &mut store.nodes[..store.nodes.len + 1];

I’d use the slice length as the actual number of items in the slice, while the capacity would be the length of the backing array. Currently I have to keep track of the number of items in the slice manually. This is potentially a nice compromise that doesn’t introduce built-in generic functions before proper generics land in Radiance.

Safety

It would be nice if R’ had better memory safety while still allowing free use of pointers and some pointer arithmetic, given that it needs to be used inside the kernel. I know that the Oberon language was used to implement the Oberon operating system and it was a safe language, but of course they had a GC. There are certainly safety features that can be added without a GC and that don’t involve implementing a full borrow checker.

The primary issue I’ve been having so far in R’ is with invalid pointers. Since R’ doesn’t use dynamic memory, there are no memory leaks in the traditional sense, and no double frees. R’ also protects against null pointer dereferences. However, it’s currently possible to return a stack pointer from a function and then dereference it. This is something I’d like to prevent.

Conditional expressions

Another ergonomic issue is assigning the results of a conditional. In more imperative languages like C and Swift, you have the ternary operator:

let x = y ? t : f;

Which I find a little weird, because it’s an operator that doesn’t look or work like anything else. In more functional languages or hybrid languages like Rust and Zig, if is an expression, so you’d do:

let x = if y { t } else { f };

I don’t like this that much either, as it doesn’t read well:

'x' is if 'y' is true, then 't', else 'f'.

It’s also not super obvious that the last value in a block has special meaning, especially when there are other statements in the block:

let result = if {
    prepareThings();
    andOtherThings() // The return value will be assigned to `result`.
} else {
    ...
};

What I currently do is essentially this:

mut x = undefined;
if y {
    x = t;
} else {
    x = f;
}

Which reads a lot better:

if 'y' is true, then 'x' is 't', else 'x' is 'f'.

I don’t think control flow should be mixed with expressions, but I also would like to make this as ergonomic and safe as possible. For the above to be safe it would require checking that undefined variables are not used before they are set.

There is one other idea which I’ve seen in Hare recently, and it’s to use a keyword yield to “return” from a conditional. This makes it clearer that we’re assigning the return value of andOtherThings() to result.

let result = if {
    prepareThings();
    yield andOtherThings();
} else {
    ...
};

Equality

Finally, there’s the problem of equality. This is a small but important one. At the moment, R’ has the following rules:

For primitive types, the “==” operator is value equality.
For arrays and slices, the “==” operator is also value equality.
For pointers, the “==” operator is address equality.

This would be pretty standard if it weren’t for slice equality. Slices aren’t pointers, but they do point to something, so I’m on the fence on whether “==” should instead be pointer plus length equality. Rust compares slices by value, while Zig compares them by address plus length, and uses std.mem.eql to compare by value. To be clear, by “value” here, I mean the slice elements.

If we’re trying to make the more common case the simplest, then I do think for R’, going with Zig’s approach is best. In most cases, slice comparisons are used in the context of comparing strings (*[u8]), and strings should be interned and compared via pointer equality.

Conclusion

These are the issues I’ve been running into most often; finding solutions for these would go a long way. Unlike Radiance, R’ is not meant to be the most convenient and easy to use language – its main use will be in privileged environments like the kernel, so some level of friction and verbosity is ok if it means keeping the implementation small. The key will be to find elegant solutions that can be built upon in Radiance, and don’t increase internal complexity. This is what I’ll be focused on next.