Radiant Log #009
At this point, I’ve written about 12K lines of R’ code for the new self-hosting R’ compiler, and I’m starting to hit the same language design issues over and over. This is a good time to pause and analyze these issues, and to figure out if we’re okay with having them in the language (no language is perfect), or if we want to remedy them.
R’ already has many features that I would consider quite ergonomic, that I’ve added to make code easier to write, reduce repetitive patterns, or make code safer. But there are also some rough edges which are slowing down the development process.
First let’s start with some nice things we can now do in R’:
One common pattern is let-else-throw, which essentially turns an absent
value into an error:
let currentMod = module::get(self.moduleGraph, self.currentMod)
else throw emitError(self, node, ErrorKind::NotFound);
We also support if-let syntax which is common in Rust and Zig.
Here, the if branch is executed if the optional returned by
checkSuperAccess has a value:
if let superAccess = try checkSuperAccess(self, module) {
startScope = superAccess.scope;
pathNode = superAccess.child;
}
Using catch, we can turn an error into control flow:
for i in 0..block.statements.len {
try visitDecl(sem, block.statements.list[i]) catch {
break;
};
}
In the snippet below, we can see many of the language features at play: function pointers, error handling, mutability annotations, module namespace access and type inference:
/// Parse a comma-separated list enclosed by the given delimiters.
fn parseList(
p: *mut Parser,
open: scanner::TokenKind,
close: scanner::TokenKind,
parseItem: fn (*mut Parser) throws (ParseErr) -> *ast::Node
) -> ast::NodeList throws (ParseErr) {
try expect(p, open, listExpectMessage(open));
mut items = ast::nodeList(p.arena, 8);
while not check(p, close) {
let item = try parseItem(p);
ast::nodeListPush(&mut items, item);
if not consume(p, scanner::TokenKind::Comma) {
break;
}
}
try expect(p, close, listExpectMessage(close));
return items;
}
All of this is great. There are however some areas of the language that still need work.
What needs work
R’ is a subset of Radiance, and is meant to be the language used to write RadiantOS and the first few versions of the Radiance compiler.
The primary goals are that it be simple to implement a compiler for, be easy to read and write, and have good performance.
A fairly up to date language reference for R’ can be found here: /system/radiance/prime, though there are already several planned changes to the syntax which don’t figure in that document.
Radiance will later build on R’, and add the following features that will need to be compatible with R’:
- Generics, including type bounds
- Deriving type bounds automatically
- Concurrency primitives
- Effects, via object capabilities
- Methods on types
- Affine types
- Very basic borrow-checking
- Dynamic memory allocation
- Reference-counted pointers
- Some form of structural typing for IPC
Ideally, there shouldn’t be any overlap in functionality between the languages, i.e. features should be orthogonal. The hardest part about this is for generics, because some level of generalization is really needed in R’, but we don’t want to implement features that will then collide with generics in Radiance.
Generic growable vectors
One of the pain points with R’ right now is that I have to re-implement list
creation and append operations for many different list types, since R’ doesn’t
have a generic Vec<T> type with push and pop operations.
Here’s an example of what I do for each list type at the moment:
fn addTypeEntry(a: *mut Analyzer, node: *ast::Node, ty: Type) -> *mut TypeEntry {
if a.nodeTypes.entriesLen >= a.nodeTypes.entries.len {
panic "addTypeEntry: node type overflow";
}
a.nodeTypes.entries[a.nodeTypes.entriesLen] = TypeEntry { node, ty };
a.nodeTypes.entriesLen = a.nodeTypes.entriesLen + 1;
return &mut a.nodeTypes.entries[a.nodeTypes.entriesLen - 1];
}
One solution I’m exploring is to use raw vectors, i.e. via opaque pointers, but
I’m wondering then if I should have a new opaque pointer type instead of *void,
for additional type safety. The idea of a raw vector is one that operates on a u8
slice, and keeps track of element stride and alignment. It could look something
like this:
// Raw `i32` vector.
mut arena: [u8; 16] align(@alignOf(i32)) = undefined; // Backing storage.
mut v = vec::new(&mut arena[..], @sizeOf(i32), @alignOf(i32));
// The `i32` is passed by address and copied into the vector.
let val: i32 = 42;
vec::push(&mut v, &val);
assert(vec::len(&v) == 1);
// We can then retrieve the address of the value,
// cast it and dereference it.
let ptr = vec::get(&v, 0) else {
panic;
};
assert(*(ptr as *i32) == 42);
It’s a bit of a pain, but works well enough for most use-cases, and I could build the generic vector on top of this perhaps.
The other option is a built-in generic slice type like in Go, with a built-in
append operation. But this doesn’t extend well to generics in Radiance, i.e.
it doesn’t work well given that R’ is supposed to be a subset of Radiance which
will have proper generics.
Static memory management
R’ doesn’t use dynamic memory. It uses static arenas for everything. This is
by design, since R’ is meant to be used for the Radiant kernel. However the
ergonomics of this are pretty poor, basically I end up having to define many
static arrays like this:
/// Global arena for allocating symbols during analysis.
static SYMBOL_STORAGE: [Symbol; MAX_SYMBOLS] = undefined;
/// Global storage for lexical scopes.
static SCOPE_STORAGE: [Scope; MAX_SCOPE_RECORDS] = undefined;
/// Global storage for resolved type descriptors.
static TYPE_STORAGE: [Type; MAX_RESOLVED_TYPES] = undefined;
Then I have a type for all storage arenas like this:
pub struct AnalyzerStorage {
symbols: *mut [Symbol],
nodeSymbols: *mut [NodeSymbol],
nodeScopes: *mut [NodeScope],
nodeTypes: *mut [NodeTypeEntry],
nodeCoercions: *mut [NodeCoercion],
...
}
And then a function for creating this storage:
pub fn defaultStorage() -> AnalyzerStorage {
return AnalyzerStorage {
symbols: &mut SYMBOL_STORAGE[..],
nodeSymbols: &mut NODE_SYMBOL_STORAGE[..],
nodeScopes: &mut NODE_SCOPE_STORAGE[..],
nodeTypes: &mut NODE_TYPE_STORAGE[..],
nodeCoercions: &mut NODE_COERCION_STORAGE[..],
constValues: &mut CONST_VALUE_STORAGE[..],
...
};
}
This problem is related to the lack of generic vectors, though it’s more about
how to work with static memory in an ergonomic way. The way in which it’s
related to generics is that I have to implement an append function for each of
these arenas, that checks the capacity of the backing array, and tracks the
lengths separately. I could implement something similar to malloc with a
single arena, but using static memory, but then I’d lose some safety and create
complexity in other places.
I’ve considered an array-backed-slice type that tracks its capacity as well, but this needs a proper design. The idea would be that instead of slices being represented like this:
struct {
ptr: *T,
len: u32,
}
They’d be represented like this:
struct {
ptr: *T,
len: u32,
cap: u32,
}
Then, I could do certain things like grow the slice via the existing syntax, but in a safe way. If the capacity is exceeded, we would get a runtime panic:
store.nodes = &mut store.nodes[..store.nodes.len + 1];
I’d use the slice length as the actual number of items in the slice, while the capacity would be the length of the backing array. Currently I have to keep track of the number of items in the slice manually. This is potentially a nice compromise that doesn’t introduce built-in generic functions before proper generics land in Radiance.
Safety
It would be nice if R’ had better memory safety while still allowing free use of pointers and some pointer arithmetic, given that it needs to be used inside the kernel. I know that the Oberon language was used to implement the Oberon operating system and it was a safe language, but of course they had a GC. There are certainly safety features that can be added without a GC and that don’t involve implementing a full borrow checker.
The primary issue I’ve been having so far in R’ is with invalid pointers. Since
R’ doesn’t use dynamic memory, there are no memory leaks in the traditional
sense, and no double frees. R’ also protects against null pointer
dereferences. However, it’s currently possible to return a stack pointer from a
function and then dereference it. This is something I’d like to prevent.
Conditional expressions
Another ergonomic issue is assigning the results of a conditional. In more imperative languages like C and Swift, you have the ternary operator:
let x = y ? t : f;
Which I find a little weird, because it’s an operator that doesn’t look or work
like anything else. In more functional languages or hybrid languages like Rust
and Zig, if is an expression, so you’d do:
let x = if y { t } else { f };
I don’t like this that much either, as it doesn’t read well:
'x' is if 'y' is true, then 't', else 'f'.
It’s also not super obvious that the last value in a block has special meaning, especially when there are other statements in the block:
let result = if {
prepareThings();
andOtherThings() // The return value will be assigned to `result`.
} else {
...
};
What I currently do is essentially this:
mut x = undefined;
if y {
x = t;
} else {
x = f;
}
Which reads a lot better:
if 'y' is true, then 'x' is 't', else 'x' is 'f'.
I don’t think control flow should be mixed with expressions, but I also would
like to make this as ergonomic and safe as possible. For the above to
be safe it would require checking that undefined variables are not used before
they are set.
There is one other idea which I’ve seen in Hare recently, and it’s to
use a keyword yield to “return” from a conditional. This makes it clearer
that we’re assigning the return value of andOtherThings() to result.
let result = if {
prepareThings();
yield andOtherThings();
} else {
...
};
Equality
Finally, there’s the problem of equality. This is a small but important one. At the moment, R’ has the following rules:
- For primitive types, the “
==” operator is value equality. - For arrays and slices, the “
==” operator is also value equality. - For pointers, the “
==” operator is address equality.
This would be pretty standard if it weren’t for slice equality. Slices aren’t
pointers, but they do point to something, so I’m on the fence on whether “==”
should instead be pointer plus length equality. Rust compares slices by value,
while Zig compares them by address plus length, and uses std.mem.eql to compare
by value. To be clear, by “value” here, I mean the slice elements.
If we’re trying to make the more common case the simplest, then I do think for
R’, going with Zig’s approach is best. In most cases, slice comparisons are
used in the context of comparing strings (*[u8]), and strings should be
interned and compared via pointer equality.
Conclusion
These are the issues I’ve been running into most often; finding solutions for these would go a long way. Unlike Radiance, R’ is not meant to be the most convenient and easy to use language – its main use will be in privileged environments like the kernel, so some level of friction and verbosity is ok if it means keeping the implementation small. The key will be to find elegant solutions that can be built upon in Radiance, and don’t increase internal complexity. This is what I’ll be focused on next.