Radiant Log #003
“I found myself sinking in a swamp. The weight of my horse and arms was not inconsiderable. I was sinking deeper and deeper. What do you think I did? I took hold of my own pigtail, turned and twisted, using it as a lever, and hoisted myself and my horse out of the quagmire with one mighty heave.”
Though this story from the Baron of Munchausen sounds improbable, in compiler theory, the act of pulling yourself from your own bootstraps, or bootstrapping is fairly common; it refers to the process of developing a compiler for some target language that is eventually able to compile itself. We call this a self-hosting compiler.
The purpose of bootstrapping is independence from external systems. A self-hosting language does not require any other tools for development.
This process follows a distinct pattern: you typically start by implementing a minimal version of the target language compiler in an existing language (the host language). Once it’s mature enough to compile complex programs (particularly a compiler), you port it to the target language. Then, using the original compiler, you compile the newly ported compiler, producing an executable. At this point, you have a compiler executable and its source code, both in the target language. The original host language compiler is no longer needed, and the target language is considered bootstrapped. It can now compile newer versions of itself: compiler version N can compile version N + 1, and so on, creating a self-hosting system.
The trust problem
One of the less talked about yet fundamental challenges in bootstrapped languages is the trust problem: how can you verify that a program does what its source code says it does? The obvious answer is to read the program source and examine the compiler that translated it into executable code.
But here’s the paradox: the compiler itself is just another program. To trust it, you’d need to verify its source code too, and eventually compile it. Only to do that would require another compiler. This creates a recursive verification problem: to verify compiler version N, you must first verify version N - 1, then N - 2, and so on, all the way back to the original host language used to bootstrap the compiler. This is the essence of what Ken Thompson describes in his lecture, Reflections on Trusting Trust. Though he doesn’t specifically refer to bootstrapping, his example involves a self-hosting C compiler.
A better approach to verifying such a program would be to start from the original host language compiler, and follow the bootstrapping process, reproducing every step up to the final version of the compiler. By taking this approach, only the original source material needs to be verified, along with the program in question, since every subsequent version of the compiler can be produced from it.
In Thompson’s lecture he concedes that we must eventually rely on something we didn’t personally verify. In other words, the only practical solution to the trust problem is to choose who to trust. Though that may be the case, we can still attempt to provide those who don’t trust us the ability to verify our work. Therefore, the specific bootstrapping process we choose is important, as it’ll make the verification process more or less straightforward to follow.
Choosing a host language
The first choice we have to make is what to use as the host language. Ideally, we want something well known and mature, with more than one available implementation. For bootstrapping the Radiant language compiler, I chose C (specifically the C99 standard) as the host language.
It’s important that the initial implementation (in the host language) not stray too far from what the target language will be capable of expressing, to facilitate the process of porting. That’s why language designers often pick something that is similar in spirit to their target language.
The transition from host to target language requires careful planning so that the features needed to implement the compiler itself are prioritized early in the language’s development. The language must be powerful enough to implement its own compiler, which is a practical test of the language’s expressiveness. This process helps refine the language design through direct usage, and is the reason why aiming for self-hosting early is a good idea: it’s the best way to find out if the language design makes sense. The trade-off however is that any major change to the language will require rewriting parts of the compiler.
There’s a few reasons why I’ve chosen C as the host language:
- I know C, and have implemented parsers and code generators in C before.
- C codebases become hard to manage as they grow in size, this will force me to keep the implementation small.
- The target language is C-like, thus making the porting process more straightforward.
- Re-acquainting myself with the pains of writing C is a good refresher for what language features I should focus on.
- If I want to create a reproducible bootstrapping process to facilitate verification, starting from a C compiler is ideal: there are plenty of C compilers to choose from, and C compilers are generally available on all platforms and mature enough to have an established level of trust.
Ultimately, bootstrapping a compiler represents more than just a technical challenge; it embodies the philosophical core of what programming languages are designed to achieve: independence, self-sufficiency, and verifiability. Like the Baron pulling himself from the swamp, a bootstrapped language lifts itself into existence through its own capabilities, creating a complete system that can evolve and improve without external dependencies.