Rabbit Hole: Bun #1

Tagged: zig rabbithole

Preface

I need to read more code that isn't by me. I love writing new code, but it's only in the last year or so that I've really gotten into the habit of reading code to find answers when I'm working. I still find reading a new codebase pretty intimidating. The truth is, reading code makes up a large part of my job - and it is getting easier. With that said, I often shrug off checking out open source projects because I might find them immediately intimidating.

Like most things I'm doing these days, I'm reminding myself that I just need to keep that practice train rolling (choo choo), and it will get easier.

Zig & Bun

I've been quietly watching Zig grow as a language and I'm keen to use it someday for something. I tried using it at the start of Advent of Code 2021, but frankly, Advent of Code isn't really for me (yet). I got stumped on the first day trying to open a file. I felt a little defeated. Yes, Zig is a new language with not a lot of documentation out there, and yes, I don't really like doing algorithm puzzles (and yes, I don't have much experience with low-level languages that do memory management) -- need I go on?

But the truth is, I could go farther and feed my curiosity with a little practice.

That brings in Bun : "bun is like postcss, babel, node & webpack in one 100x faster tool for building modern web frontends." Like Zig, I've been quietly watching the creator of Bun tweet about their progress (man, writing this I feel like such a lurker). I scoped the code a bit and felt pretty lost. But I have to say, I'm pretty inspired by seeing these tweets, and I'd like to know more about these kinds of things (parsers, compilers, low-level languages) in programming.

So! I think it's time to dig into Bun. I'm not sure what I'll write in this post, but I'm going to try and get Bun working on my computer, and after that do some reading.

After lunch that is.

Installation & Setup

Installing Bun is easy. You have to join a Discord channel to get access, via a bot, to the GitHub repo (am I spilling secrets by doing this?!). Then you can install it via curl.

I browsed through the CLI trying to pick a piece of the project to explore. Bun does a lot. I'm not sure where to start exactly, but I think I'll look into how Bun can ... replace npm!

I run bun create to learn how I can create a templated JavaScript project. I create a React project since that's familiar and seems easy. Then I just cd into the dir and run bun. It's bundling my files (and fast).

Running bun bun (cute) in your terminal will create your bundle. At first I was mystified by the .bun file that was created, but the readme explains this:

#### What is `.bun`?

The `.bun` file contains:

  • all the bundled source code

  • all the bundled source code metadata

  • project metadata & configuration

Here are some of the questions `.bun` files answer:

  • when I import `react/index.js`, where in the `.bun` is the code for that? (not resolving, just the code)

  • what modules of a package are used?

  • what framework is used? (e.g. Next.js)

  • where is the routes directory?

  • how big is each imported dependency?

  • what is the hash of the bundle’s contents? (for etags)

  • what is the name & version of every npm package exported in this bundle?

  • what modules from which packages are used in this project? ("project" defined as all the entry points used to generate the .bun)

All in one file.

It’s a little like a build cache, but designed for reuse. I hope people will eventually check it into version control so their coworkers don’t have to run `npm install` as often.

I'd like to come back to looking at the bundling mechanism (but by then I guess I'll want to look at the parser... there's so much going on here!). Let's see if I can find where the code lives that replaces the need for npm install.

Bun Install

From the readme:

## Using bun as a package manager

On Linux, `bun install` tends to install packages 20x - 100x faster than `npm install`. On macOS, it’s more like 4x - 80x.

// ...

```bash
bun install
```

Let's see if I can find it.

Start with the CLI.

Woops, I started poking around and found bun/src/js_parser/js_parser.zig. It's 15,000+ lines. I'm just going to back away slowly. Stay focused here.

I found a file called bun/src/cli/install_command.zig. Cool. I don't need to know how the CLI is running this command, but it's time to follow the yarn.

const Command = @import("../cli.zig").Command;
const PackageManager = @import("../install/install.zig").PackageManager;

pub const InstallCommand = struct {
    pub fn exec(ctx: Command.Context) !void {
        try PackageManager.install(ctx);
    }
};

If we head over to install.zig, we get another massive file (7000 lines). Ok, no need to worry, let's narrow in on figuring out what the install function does. I know it should be somewhere on the PackageManager struct:

    pub inline fn install(
        ctx: Command.Context,
    ) !void {
        var manager = try PackageManager.init(ctx, null, &install_params);

        if (manager.options.log_level != .silent) {
            Output.prettyErrorln("<r><b>bun install <r><d>v" ++ Global.package_json_version ++ "<r>\n", .{});
            Output.flush();
        }

        var package_json_contents = manager.root_package_json_file.readToEndAlloc(ctx.allocator, std.math.maxInt(usize)) catch |err| {
            if (manager.options.log_level != .silent) {
                Output.prettyErrorln("<r><red>{s} reading package.json<r> :(", .{@errorName(err)});
                Output.flush();
            }
            return;
        };

        try switch (manager.options.log_level) {
            .default => installWithManager(ctx, manager, package_json_contents, .default),
            .verbose => installWithManager(ctx, manager, package_json_contents, .verbose),
            .silent => installWithManager(ctx, manager, package_json_contents, .silent),
            .default_no_progress => installWithManager(ctx, manager, package_json_contents, .default_no_progress),
            .verbose_no_progress => installWithManager(ctx, manager, package_json_contents, .verbose_no_progress),
        };
    }

Found it. Lots happening, but not a long function. Let's pick a direction.
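One detail worth a note first: the switch at the end calls the same function in every branch and only changes the last argument. My guess (I haven't checked the signature of installWithManager, so treat this as an assumption) is that the last parameter is a comptime one, in which case a runtime value can't be passed straight through and has to be matched branch by branch. Here is a minimal sketch of that pattern. LogLevel, levelFromFlag and run are hypothetical names for illustration, not Bun's code.

```zig
const std = @import("std");

// Hypothetical enum, loosely modelled on the log levels in the snippet above.
const LogLevel = enum { default, verbose, silent };

// Hypothetical helper: produces a value that is only known at runtime.
fn levelFromFlag(verbose: bool) LogLevel {
    return if (verbose) LogLevel.verbose else LogLevel.default;
}

// `level` is a comptime parameter, so the compiler builds one specialised
// copy of this function per value it is called with.
fn run(comptime level: LogLevel) void {
    if (level != .silent) {
        std.debug.print("running at level {s}\n", .{@tagName(level)});
    }
}

pub fn main() void {
    const level = levelFromFlag(true);

    // `run(level)` would not compile here because `level` is a runtime value.
    // Each switch branch passes a literal that is known at compile time.
    switch (level) {
        .default => run(.default),
        .verbose => run(.verbose),
        .silent => run(.silent),
    }
}
```

If that guess is right, the repetition in install buys a version of the function specialised for each log level, at the cost of a slightly noisy call site.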

Detour - Exploring Globals

Some of the answers to how installing from npm works in Bun must lie somewhere in the PackageManager struct. It probably does quite a bit, since this install function is so short, so I'll circle back to it. I also see method calls on something called Output, which isn't declared in the function's scope, so I want to know what Output does. Is it some global variable? Let's find out. I'll mark where I am in the file in Emacs so I can jump back to it quickly with ' + a.

Thankfully, someone created an LSP plugin for Zig, and it works in Emacs. I can use consult-imenu to find the declaration for Output, rather than search by string (which has 201 results).

const _global = @import("../global.zig");
const string = _global.string;
const Output = _global.Output;
const Global = _global.Global;

So there is some global code. I know I'm getting a bit distracted, but I'd like to know what it is before checking out PackageManager. Let's go check out ../global.zig.

pub const Output = struct {
    // These are threadlocal so we don't have stdout/stderr writing on top of each other
    threadlocal var source: Source = undefined;
    threadlocal var source_set: bool = false;

    pub fn flush() void {
        if (Environment.isNative and source_set) {
            source.buffered_stream.flush() catch {};
            source.buffered_error_stream.flush() catch {};
            // source.stream.flush() catch {};
            // source.error_stream.flush() catch {};
        }
    }

    // ...

Ok, so this file (about 600 lines at time of writing) seems to set up some common structs, and methods on those structs, for interacting with global data, the terminal, etc. Let's look at flush since that's the method I saw in the previous file.

It looks like Environment.isNative checks what we were built for; src/env.zig seems to set up booleans describing the build target (native, wasm or wasi), worked out at compile time rather than while the program runs:

pub const BuildTarget = enum { native, wasm, wasi };
pub const build_target: BuildTarget = brk: {
    if (@import("builtin").target.isWasm() and @import("builtin").target.getOsTag() == .wasi) {
        break :brk BuildTarget.wasi;
    } else if (@import("builtin").target.isWasm()) {
        break :brk BuildTarget.wasm;
    } else {
        break :brk BuildTarget.native;
    }
};

pub const isWasm = build_target == .wasm;
pub const isNative = build_target == .native;

Cool - looks like there is some wasm stuff that could be happening here. Let's scan the readme quickly to see if that's mentioned...nope - nothing there. I'll ignore for now.
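The brk: syntax in that snippet was new to me. It's a labeled block: the block is an expression, and break :brk value is what it evaluates to. Because build_target is a top-level const, the whole thing is evaluated at compile time. Here is a small sketch of the same idea, assuming only that the CPU architecture is a good enough test for wasm. Flavor and flavor are hypothetical names, not something from Bun.

```zig
const std = @import("std");
const builtin = @import("builtin");

// Hypothetical enum for illustration; Bun's real one is BuildTarget in src/env.zig.
const Flavor = enum { wasm, native };

// A labeled block used as an expression. `break :blk value` supplies its result.
// As a top-level constant, this is evaluated at compile time.
const flavor: Flavor = blk: {
    const arch = builtin.target.cpu.arch;
    if (arch == .wasm32 or arch == .wasm64) {
        break :blk Flavor.wasm;
    }
    break :blk Flavor.native;
};

pub fn main() void {
    std.debug.print("built for: {s}\n", .{@tagName(flavor)});
}
```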

Figuring out flush()

Let's return to flush().

            source.buffered_stream.flush() catch {};
            source.buffered_error_stream.flush() catch {};
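A quick aside on the catch {} at the end of each line: flush() can return an error, and catch {} handles it by doing nothing, so a failed flush is silently ignored. A minimal sketch of the difference between swallowing an error and reporting it follows. flushSomething is a hypothetical stand-in, not Bun's code.

```zig
const std = @import("std");

// Hypothetical fallible function, standing in for a stream's flush().
fn flushSomething(should_fail: bool) !void {
    if (should_fail) return error.BrokenPipe;
}

pub fn main() void {
    // `catch {}` swallows the error: nothing is printed and execution continues.
    flushSomething(true) catch {};

    // Capturing the error instead lets us at least report it.
    flushSomething(true) catch |err| {
        std.debug.print("flush failed: {s}\n", .{@errorName(err)});
    };

    std.debug.print("still running\n", .{});
}
```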

In true rabbit-hole fashion (apt, considering Bun), I'm wondering what a buffered_stream is, even though I meant to go on and read about the PackageManager struct. There are no rules. Let's detour again!

What is a BufferedWriter?

For those who already know this answer - peruse at your own boredom!

Ok, now we're getting down to some standard library stuff, and I'm probably going to learn something new!

    pub const Source = struct {
        // ...
        pub const BufferedStream: type = std.io.BufferedWriter(4096, @typeInfo(std.meta.declarationInfo(StreamType, "writer").data.Fn.fn_type).Fn.return_type.?);

        buffered_stream: BufferedStream,
        buffered_error_stream: BufferedStream,

Time to break down the type of BufferedStream. It's one long intimidating line with Zig syntax that I don't know yet, but we're almost there (and then we'll call it a day 😅). So, this is a custom data type that is being aliased to the result of std.io.BufferedWriter. I think.

First up - std.io.BufferedWriter. This link brings us to some new and experimental documentation for the standard library. Unfortunately, we find this:
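The "aliased to the result of" part makes more sense once you see that, in Zig, a generic type is just a function that takes comptime arguments and returns a type. Here is a minimal sketch of that pattern. FixedBuffer and ByteBuffer are hypothetical names for illustration, not anything from Bun or the standard library.

```zig
const std = @import("std");

// Hypothetical generic container: a function that returns a struct type.
fn FixedBuffer(comptime capacity: usize, comptime T: type) type {
    return struct {
        items: [capacity]T = undefined,
        len: usize = 0,

        const Self = @This();

        // Appends an item, or returns false when the buffer is full.
        pub fn push(self: *Self, item: T) bool {
            if (self.len == capacity) return false;
            self.items[self.len] = item;
            self.len += 1;
            return true;
        }
    };
}

// Naming the returned type, in the same spirit as the BufferedStream alias.
const ByteBuffer = FixedBuffer(4, u8);

pub fn main() void {
    var buf = ByteBuffer{};
    _ = buf.push('a');
    _ = buf.push('b');
    std.debug.print("len = {d}\n", .{buf.len});
}
```

Read this way, the BufferedStream line calls BufferedWriter with a buffer size of 4096 and a writer type, and gives the struct type that comes back a name.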

This declaration is not tested or referenced, and it has therefore not been included in semantic analysis, which means the only documentation available is whatever is in the doc comments.

There are no doc comments for this declaration.

That's fair! Zig is a young language and as per this nascent rabbit hole post-genre - I'll keep digging. Let's go find the standard library code for BufferedWriter. Cool. Ok, so there's no documentation here.

Let's take a break and be honest with ourselves (me, with myself). I think at this point, if I googled "what is a buffered writer" I could probably find someone explaining what it is using Golang or something. But I'd be a little disappointed. If you've read this far, you might as well get to know me a bit better: sometimes when I'm coding and I start to get a bit fatigued with a problem (when I still don't get something) I often want to give up. Who wouldn't? But what is that going to accomplish? Nothing! What will writing long, winding posts like this accomplish? More than nothing!

Ok. We're looking at the std library code for a BufferedWriter. We've made it. We're in the weeds (in a good way!).

pub fn BufferedWriter(comptime buffer_size: usize, comptime WriterType: type) type {
    return struct {
        unbuffered_writer: WriterType,
        fifo: FifoType = FifoType.init(),

        pub const Error = WriterType.Error;
        pub const Writer = io.Writer(*Self, Error, write);

        const Self = @This();
        const FifoType = std.fifo.LinearFifo(u8, std.fifo.LinearFifoBufferType{ .Static = buffer_size });

        pub fn flush(self: *Self) !void {
            while (true) {
                const slice = self.fifo.readableSlice(0);
                if (slice.len == 0) break;
                try self.unbuffered_writer.writeAll(slice);
                self.fifo.discard(slice.len);
            }
        }

        pub fn write(self: *Self, bytes: []const u8) Error!usize {
            if (bytes.len >= self.fifo.writableLength()) {
                try self.flush();
                return self.unbuffered_writer.write(bytes);
            }
            self.fifo.writeAssumeCapacity(bytes);
            return bytes.len;
        }

        // ...
    };
}

We're getting somewhere - we've found the flush method. Let's try and answer two questions:

  1. Remember the original params encountered in the bun code? What are they being used for here?

  2. What is flush actually doing?

Before answering, I've got to draw a line here. This function is full of Zig syntax I don't know yet, and I'll have to stop somewhere before I end up reading the whole std lib.

The first param, buffer_size, gets passed to the construction of a fifo: first in, first out, which in my limited knowledge is a data structure where the first datum into it is the first datum out of it (that's the best I can do right now!). So for now, I'm going to assume that we're creating a kind of storage mechanism for bytes to live in, in a nice little line. The second param, WriterType, is the type of the unbuffered_writer field.

On the second question: I think readableSlice(0) hands back whatever bytes are currently sitting in the fifo. If that slice is empty, the loop breaks. Otherwise flush tries to call self.unbuffered_writer.writeAll(slice) (which I can only guess does the actual writing to the terminal) and then discards those bytes from the fifo, repeating until nothing is left.
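To make "first in, first out" concrete for myself, here is a toy byte queue. It is a sketch under heavy assumptions: ByteQueue is a hypothetical name, it holds at most 8 bytes, and it never reuses slots, so it is nothing like the real std.fifo.LinearFifo beyond the ordering.

```zig
const std = @import("std");

// Hypothetical fixed-size byte queue, for illustration only.
const ByteQueue = struct {
    storage: [8]u8 = undefined,
    head: usize = 0, // index of the next byte to read
    tail: usize = 0, // index of the next free slot

    // Adds a byte at the back. The caller must not exceed 8 bytes in total.
    fn put(self: *ByteQueue, byte: u8) void {
        self.storage[self.tail] = byte;
        self.tail += 1;
    }

    // Removes the byte at the front, or returns null when the queue is empty.
    fn take(self: *ByteQueue) ?u8 {
        if (self.head == self.tail) return null;
        const byte = self.storage[self.head];
        self.head += 1;
        return byte;
    }
};

pub fn main() void {
    var queue = ByteQueue{};
    for ("bun") |byte| queue.put(byte);

    // Bytes come back out in the order they went in.
    while (queue.take()) |byte| {
        std.debug.print("{c}", .{byte});
    }
    std.debug.print("\n", .{});
}
```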
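To check my reading, here is a stripped-down sketch of the same idea: small writes pile up in a buffer, and nothing reaches the underlying writer until flush is called or the buffer can't take the next write. CountingSink and TinyBufferedWriter are hypothetical names, and the sink only counts what it receives instead of writing to a terminal. This is my simplification, not the standard library's implementation.

```zig
const std = @import("std");

// Hypothetical stand-in for the unbuffered writer: it counts calls and bytes.
const CountingSink = struct {
    calls: usize = 0,
    bytes: usize = 0,

    fn writeAll(self: *CountingSink, data: []const u8) void {
        self.calls += 1;
        self.bytes += data.len;
    }
};

// Hypothetical, simplified buffered writer over a CountingSink.
fn TinyBufferedWriter(comptime buffer_size: usize) type {
    return struct {
        sink: *CountingSink,
        buf: [buffer_size]u8 = undefined,
        end: usize = 0, // number of buffered bytes

        const Self = @This();

        // Hands any buffered bytes to the sink and empties the buffer.
        pub fn flush(self: *Self) void {
            if (self.end == 0) return;
            self.sink.writeAll(self.buf[0..self.end]);
            self.end = 0;
        }

        // Buffers small writes. A write that does not fit flushes first
        // and then goes straight to the sink.
        pub fn write(self: *Self, data: []const u8) void {
            if (data.len >= buffer_size - self.end) {
                self.flush();
                self.sink.writeAll(data);
                return;
            }
            for (data) |byte| {
                self.buf[self.end] = byte;
                self.end += 1;
            }
        }
    };
}

pub fn main() void {
    var sink = CountingSink{};
    var writer = TinyBufferedWriter(8){ .sink = &sink };

    writer.write("ab");
    writer.write("cd");
    std.debug.print("before flush: {d} calls\n", .{sink.calls});

    writer.flush();
    std.debug.print("after flush: {d} calls, {d} bytes\n", .{ sink.calls, sink.bytes });
}
```

Two writes turn into a single call on the sink, which I take to be the point of buffering: fewer, larger writes to whatever sits underneath.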

At this point, I'd say I'm healthily stuck and guessing a fair bit. I'm not quite sure how a writer works, which is a bit frustrating because I've seen them in Rust and Go and I don't like not knowing what code does when I'm writing from a tutorial. I think I've earned myself a google. I found this article and it seems like a great overview.


Some meta reflections - this was a fun post to write; I let myself have free rein to follow whatever path I wanted. I don't expect it to be very interesting to read, but I feel like it's a pretty fast way to push myself to move past ignoring the "how" behind the scenes of libraries and languages.

Until next time - I hope I get back to that PackageManager struct!