In this post, we delve into the essence, purpose and potential of the novel programming language Zig.
As efficient and portable as C, without the “footguns”
Not every technology is capable of direct value capture. This can be perilous as an infra investor, as it’s easy to become enamored by something that’s innovative, yet devoid of a business model or meaningful market.
Programming languages are a good example here. How Rust handles concurrency is a thing of beauty, Roc’s VM-less memory allocation is pretty great, and there’s a lot to like about Zig. However, these languages aren’t licensed out, and their value capture certainly isn’t tied to usage or utility.
What helps us sleep at night, however, is that these technologies (think languages, runtimes, protocols, etc.) can serve as proxies for sophisticated and opinionated technologists – dissenters capable of being wrong.
Now, this is something that history would tell us capital is worth putting towards.
Take Zig, for example. Bun is outpacing Node.js and its anagrammatic counterpart, Deno. How so? Bun does a lot of things well, but it does tout Zig’s low-level control of memory and lack of hidden control flow as a key unlock.
Continuing down this proxy, we find Stephen Gutekanst / Mach Engine working 24/7 on a game engine to help “upend the gaming industry”. Building a game engine from scratch is no small feat.
Finally, we have TigerBeetle. Again, one doesn’t simply build a financial database from the ground up over a leisurely weekend. Zig has served as a beacon, attracting those who are thinking orthogonally. We’re fond of folk like this.
In this primer we’re going to dig into Zig, getting our heads around why companies are opting to build with this nascent language, as well as what we should expect next.
Zig is many things, but at its core it is two things: a general-purpose programming language (think Python or JavaScript), and a “toolchain”.
The programming language part here is familiar, and hence, easy to grok. However, it’s worth lingering on. Our favorite quality of Zig’s is its simplicity.
What does this mean?
Firstly, the language is tiny. It’s specified with a 500-line PEG grammar file. For context, a PEG (parsing expression grammar) file is typically used to define the structure and syntax of a programming language.
In short, the benefit here is that a “small” language ultimately means that you have fewer language-specific keywords, etc., to remember. Hence, the language is “simple”.
Having fewer keywords also means that there’s ideally “only one obvious way to do things.” Thus, it becomes much easier to read your own or someone else’s code when you know that a specific keyword is typically used for a handful (vs. an infinite number) of things.
Zig also has “no hidden control flow”. This essentially means that each line-of-code written in Zig executes sequentially, as you would expect:
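To make this concrete, here’s a minimal sketch of ours (not an official example): in Zig, an expression like a + b can only ever be plain addition – there’s no operator overloading – and any function call must be spelled out at the call site.

```zig
const std = @import("std");

pub fn main() void {
    const a: i32 = 2;
    const b: i32 = 3;
    // `a + b` is just integer addition. Zig has no operator overloading,
    // so no hidden function can run behind this expression.
    const sum = a + b;
    // Function calls are always explicit and visible at the call site.
    std.debug.print("sum = {d}\n", .{sum});
}
```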
If you haven’t written code before this is likely confusing: doesn’t all code execute line-by-line (i.e., sequentially)? Nope. Often, languages (e.g., JavaScript) will take helpful but “hidden” steps for the developer to fix or improve the execution order (i.e., “control flow”) of their code.
As the steps – while helpful – are hidden from the developer, it can make their code more difficult to reason about (especially for other developers).
Why is this important? Think about it – a developer could forget about the hidden control flow rules of a given language; all of a sudden, their code isn’t executing in the order they expected. This happens, often in subtle ways, and is confusing for all involved.
As Zig eloquently puts it, you should “focus on debugging your application, not your programming language knowledge”.
Bun provides an equally glowing endorsement: “low-level control over memory and lack of hidden control flow makes it much simpler to write fast software.”
So, we’ve introduced a new value prop here: “Low-level control over memory”.
What is this?
Our computers have “memory”, i.e., spaces (e.g., RAM), within the system where they store data or instructions that will ultimately be used/manipulated again.
For example, in a JavaScript program (example.js) we might create a variable: let age = 28. JavaScript will then pull some Houdini-work and dynamically allocate enough space in memory at “runtime” to store the variable age for us. Helpful.
Zig is less…presumptuous. Within Zig we have to specify what type our variable is, which reveals a tad more information about how much memory our variable requires.
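For instance (our own minimal example), the equivalent declaration in Zig carries its size in its type:

```zig
// The explicit type `i32` tells the compiler exactly how much memory
// `age` needs: 32 bits (4 bytes).
const age: i32 = 28;
```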
Zig doesn’t stop here though. Defining your types “statically” is the easy part.
Zig, much like C and C++, enables developers to allocate memory (remember, a space) manually. This means that Zig developers can ~precisely state how much memory they require for a given variable, function, etc., as well as when this memory should be freed, and hence, used elsewhere.
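Here’s a minimal sketch of what this looks like in practice, using the standard library’s general-purpose allocator (details vary slightly by Zig version):

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Explicitly request memory for exactly four i32 values...
    const numbers = try allocator.alloc(i32, 4);
    // ...and explicitly state when that memory should be freed.
    defer allocator.free(numbers);

    for (numbers, 0..) |*n, i| n.* = @intCast(i);
    std.debug.print("{any}\n", .{numbers});
}
```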
Again, to make the comparison to Brendan Eich’s creation — JavaScript handles this freeing of memory automatically. This process is known as “garbage collection”.
At this point, you may be thinking, “So what?”. What’s important to know — and what relates back to Bun’s proclamation in our intro — is that this granular control of memory leads to performance gains. Why?
Well, there are a few reasons. We’ll point out two:
1. Fragmentation: As memory is allocated and deallocated dynamically, free memory blocks become scattered across the “heap” (place especially for dynamic memory). This can result in fragmented memory, where there are small gaps between allocated blocks.
We’ve “drawn” a diagram that hopefully illustrates this issue more clearly, the main point being that these fragments, because they’re small, end up being a literal waste of space.
2. Garbage Collection: Yes, garbage collection (GC) packs a punch. GC introduces additional overhead. Why? Because it’s ultimately another “program” running in the background of your own.
Andrew Kelley, the creator of Zig, goes as far as saying that GC can result in “stop the world latency glitches”. When it comes to building critical systems (think aviation software), “latency” doesn’t cut it.
GC also makes memory deallocation “non-deterministic”, i.e., you can’t predict exactly when a given piece of memory will be freed, which makes performance (and memory usage) harder to reason about.
To reiterate, whilst potentially perilous, software written in C, C++, Zig, etc., can be more performant than software written in garbage-collected languages à la Python or JavaScript.
Once again, Zig’s explicitness (in this case, explicit memory allocation) is what makes it simple. You, and your crew of developers, don’t have to figure out how memory is allocated/freed in your application; you literally state this in your code.
As hard as it may be to believe, Zig does even more to foster “simple” codebases, for example by omitting a “preprocessor” and “macros”. Don’t worry, we’ll get into what these terms mean.
The meaning of “toolchain” is a little more difficult to scope accurately. However, the word typically means a set of utilities: libraries, compilers, build tools, etc., that the language, or users of the language, can leverage.
1. Libraries = code that someone else has written and packaged which can now be used by others to achieve a specific task. E.g., Rust’s Pola.rs library for data manipulation.
2. Compilers = take your high-level code and convert it to “machine code” (1s and 0s) that corresponds to a specific instruction set. Do some other helpful things like optimizing your code (e.g., removing “dead code”).
3. Build Tools = a build tool manages the entire build process, which includes compilation, but also includes dependency management, testing, packaging, etc. Tapestry’s Alex Mackenzie wrote about “building” software in detail on his Nix primer.
We can use Zig’s stated goals (“maintaining robust, optimal and reusable software”) to fine-tune our definition.
With these goals in mind, we consider the Zig toolchain’s most notable features to be the following two: Comptime, and its build system (with its build modes).
Now, we’ll delve into Zig’s “Comptime”.
Zig touts its Comptime as “A fresh approach to metaprogramming based on compile-time code execution and lazy evaluation.” Let’s unpack each emphasized term in turn. First, compile-time.
Software has a “lifecycle” that ultimately results in said software being executed (i.e., running on a computer):
Developers write code (think C or Zig), “compile” this code, “link” each compiled file generated (called an “object file”) into a final “executable” and then “run” (i.e., execute) this executable.
Programming languages are typically evaluated at either compile-time (e.g., TypeScript) or runtime (e.g., JavaScript). “Evaluation” essentially means checking for errors, determining the “type” of a given variable, etc., all with the ultimate aim of executing a program.
Like any technical decision, there isn’t an objectively “correct” way to evaluate a program. Rather, there are trade-offs.
For example, if you evaluate a language’s “types” at compile-time, then you’ll pick up the incorrect usage of a “string” in a function that expects an “integer” when you compile said program – before you run it anywhere. Thus you pick up a “bug” before your software is deployed. Phew.
The drawback of this compile-time eval is that developers have to specify the exact type of data they expect their function to receive. This can get rather tricky – end-users of software are unpredictable; they may end up submitting data of a type (e.g., an integer in a “first name” field on a form) that you didn’t expect.
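Sketching the compile-time check in Zig (a toy example of ours):

```zig
const std = @import("std");

fn double(x: i32) i32 {
    return x * 2;
}

pub fn main() void {
    std.debug.print("{d}\n", .{double(21)});
    // The line below would be rejected at compile time, before the
    // program ever ships: you can't pass a string where an i32 is expected.
    // _ = double("21");
}
```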
Zig takes a more democratic approach. The language enables developers to state explicitly which blocks of their code they’d like “evaluated” at compile-time vs. runtime.
This is handled via Zig’s comptime keyword:
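For example (a sketch of ours, not an official snippet):

```zig
const std = @import("std");

fn fibonacci(n: u32) u32 {
    if (n < 2) return n;
    return fibonacci(n - 1) + fibonacci(n - 2);
}

pub fn main() void {
    // `comptime` forces evaluation while compiling: the result (55)
    // is baked directly into the binary.
    const at_compile_time = comptime fibonacci(10);
    // Without the keyword, this call runs when the program executes.
    const at_runtime = fibonacci(10);
    std.debug.print("{d} == {d}\n", .{ at_compile_time, at_runtime });
}
```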
Taking all that we now know about Zig, we can assume that the primary goal of this explicit statement of compile-time vs. runtime evaluation is... you guessed it, explicitness.
A developer reading your Zig code doesn’t have to identify/recall what’s being evaluated at compile-time – you literally tell them. Much like Zig’s control flow, nothing is “hidden” from the developer.
Along with the benefit of explicitness, Comptime reiterates Zig’s ability to be fine-tuned for performance.
For example, if we offload type inference to the developer who compiles their software, then the end-user (think a general “consumer”) doesn’t have to handle type inference on their own machine at runtime. Nice.
Now we know what evaluation is and when it happens (compile-time / runtime), we’ll turn to the question: What is “lazy” evaluation?
Thankfully, it’s rather self-explanatory. Lazy evaluation, much like a “lazy person”, isn’t proactive; it only completes a task at the last minute, when it must.
We’ll make this more concrete with some simple Zig code which we’ll build on.
If we were to lazily evaluate this code, we would only check/determine the values of the variables: first_name (“Alex”) and second_name (“Mackenzie”), when we need them. In this case, we need these values to complete the first_name ++ second_name operation.
This means you’re not doing any heavy-lifting before you have to, which ultimately results in more-efficient resource allocation – why calculate the value of an expression if you’re only maybe (e.g., in the context of conditional logic) going to use it later?
We’re aware that this primer is longer-than-most, but programming languages are very much the aggregation of minute technical decisions which, in aggregate, support a handful of objectives. If you want to grok a language, you’ve got to appreciate its nuances.
Next, “Metaprogramming” – tying into our earlier (brief) mention of “preprocessors” and “macros”.
Metaprogramming is common in systems-level programming languages like C, C++, and Rust. It’s what you likely expect – a program, “programming” itself.
In practice, metaprogramming involves leveraging compile-time information (e.g., type declarations like: var age: i32 = 28;) to manipulate (e.g., edit/generate code) your program in some way.
For example, with this “type information” our program could automatically edit our variable age’s data type to be “i8” vs. “i32”. i8 is a smaller data type, and hence, takes up less memory. Thus, through metaprogramming, we have optimized our Zig code at compile-time.
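A more typical Zig example of metaprogramming is a generic function, where a type itself is passed as a compile-time parameter (again, a sketch of ours):

```zig
const std = @import("std");

// `comptime T: type` means the compiler generates a specialized version
// of this function for every type it's called with -- code writing code.
fn max(comptime T: type, a: T, b: T) T {
    return if (a > b) a else b;
}

pub fn main() void {
    std.debug.print("{d}\n", .{max(i32, 3, 7)}); // specialized for i32
    std.debug.print("{d}\n", .{max(f64, 0.5, 1.5)}); // specialized for f64
}
```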
As mentioned, Zig is supple. It has 4 “build modes”. (For a refresher on what constitutes “building software”, see Alex’s Nix primer.)
Zig’s 4 build modes are:
You select one of these build modes via the command line like so:
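For example (the exact flags vary slightly between Zig versions; this matches the -O form used by recent releases):

```shell
# Build an executable optimized for binary size:
zig build-exe hello.zig -O ReleaseSmall

# Or, via the build system:
zig build -Doptimize=ReleaseSafe
```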
In particular, these build modes speak to Zig’s stated goal of producing optimal and reusable software. Wanna run some Zig code on your toaster? Cool, use ReleaseSmall. Fancy building a database? Impressive, but please use ReleaseSafe.
As hard as it may be to believe, there’s so much more (build.zig, cross-compilation, etc.) that we’d like to take you through, but we feel we’ve covered the essentials needed to convey the essence and purpose of Zig.
If you’re interested in learning more, we also recommend the following: