Rabbit Hole: Bun #2

Tagged: zig rabbithole

Preface

Hello! This week I hope to continue what I started last week in exploring bun (pt 1). I went full throttle, following my curiosity wherever it led. I originally wanted to figure out how bun replaces npm install, and I ended up reading the code on BufferedWriters. Let's get back to what's going on in the codebase.

    pub inline fn install(
        ctx: Command.Context,
    ) !void {
        // I may have been avoiding this...
        var manager = try PackageManager.init(ctx, null, &install_params);
        // ...
    }

To start, we've got a param that isn't defined in the current scope: &install_params. When I navigate to it I see:

    pub const install_params = install_params_ ++ [_]ParamType{
        clap.parseParam("<POS> ...                         ") catch unreachable,
    };

Which seems to use clap (which I'm guessing is a CLI builder tool, similar to the Rust library I'm familiar with). We can ignore that (for now).
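Two bits of Zig in that snippet tripped me up at first: ++ and catch unreachable. Here's a minimal sketch of both. Param and parseParam are hypothetical stand-ins I made up (they are not clap's real types or signatures), and shared_params plays the role of install_params_.

```zig
const std = @import("std");

// Hypothetical stand-in for the element type; bun's real ParamType comes from clap.
const Param = struct { name: []const u8 };

// Hypothetical parser that can fail, standing in for clap.parseParam.
fn parseParam(spec: []const u8) error{EmptySpec}!Param {
    if (spec.len == 0) return error.EmptySpec;
    return Param{ .name = spec };
}

// Shared params, playing the role of install_params_.
// `catch unreachable` asserts that the parse cannot fail for these inputs.
const shared_params = [_]Param{
    parseParam("--dry-run") catch unreachable,
    parseParam("--force") catch unreachable,
};

// `++` concatenates two comptime-known arrays into a new, longer array.
const install_params = shared_params ++ [_]Param{
    parseParam("<POS> ...") catch unreachable,
};

pub fn main() void {
    for (install_params) |p| {
        std.debug.print("{s}\n", .{p.name});
    }
}
```

Since these are top-level constants, everything here is evaluated at compile time, so a bad spec string would show up as a compile error rather than a runtime crash.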

The init function returns a !*PackageManager. If I'm understanding Zig syntax correctly, *PackageManager is the type of a pointer to a PackageManager, and the ! at the beginning makes this an error union type. This means that the function will either return an error or a value of the type that follows the !.
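To convince myself I understood error unions, I wrote a tiny sketch. Manager is a made-up stand-in (it has nothing to do with the real PackageManager beyond the shape of init's return type), and the EmptyName error is invented for illustration.

```zig
const std = @import("std");

// Hypothetical, tiny stand-in for bun's PackageManager.
const Manager = struct {
    name: []const u8,

    // Returns either an error or a pointer to a heap-allocated Manager.
    fn init(allocator: std.mem.Allocator, name: []const u8) !*Manager {
        if (name.len == 0) return error.EmptyName;
        const self = try allocator.create(Manager);
        self.* = .{ .name = name };
        return self;
    }
};

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // `try` unwraps the success value, or returns the error to our caller.
    const manager = try Manager.init(allocator, "bun");
    defer allocator.destroy(manager);
    std.debug.print("created {s}\n", .{manager.name});

    // `if ... else |err|` handles both sides of the union in place.
    if (Manager.init(allocator, "")) |m| {
        allocator.destroy(m);
    } else |err| {
        std.debug.print("init failed: {s}\n", .{@errorName(err)});
    }
}
```

This is the same pattern as var manager = try PackageManager.init(...) in install: if init fails, the try hands the error straight back to whoever called install.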

The shape of PackageManager

Once we have a PackageManager, the install function runs the installWithManager function with whichever log_level is attached to the manager. So, before we look at the installWithManager function (which will probably answer some of my original question), I want to know what constitutes the PackageManager. There's quite a bit going on here - if I use lsp-ui-imenu I can get an overview of the data and the methods on the PackageManager struct. It's a bit overwhelming (and the contents below are heavily truncated too!):

PackageManager
  ┃ // some 50-70 methods and constants :|
  ┃ init
  ┃ install
  ┃ installPackages
  ┃ installWithManager
  ┃ // some more stuff...
  ┃ writeYarnLock
  ┃  ┗ ...
  ┃  ┃ parse
  ┃     ┗ ...
  ┃ NetworkTaskQueue
  ┃ Options
  ┃ PackageDedupeList
  ┃ PackageIndex
  ┃  ┃ installPackageWithNameAndResolution
  ┃     ┗ ...
  ┃  ┗ ...
  ┃ ParamType
  ┃ PreallocatedNetworkTasks
  ┃  ┗ ...
  ┃ ResolvedPackageResult
  ┃  ┗ ...
  ┃ dependency_lists_to_check
  ┃ install_params
  ┃ install_params_

At this point I think I need to narrow down my question for today's post. What I want to figure out is how/when Bun is downloading node modules and storing them into a node_modules structure.

installWithManager

Alright, let's continue on with installWithManager. I started searching around for more strings with the word install in them. Near the end of the installWithManager function there is this:

    var install_summary = PackageInstall.Summary{};
    if (manager.options.do.install_packages) {
        install_summary = try manager.installPackages(
            manager.lockfile,
            log_level,
        );
    }

But before we jump in and follow the installPackages() method, we should lay some groundwork for the surrounding context.

Three questions come to mind when I see the above code sample.

  1. What is happening in the hundreds of lines leading up to this call in installWithManager? Anything we need to know?

  2. What does the PackageInstall struct do - and how related is it to the call to installPackages (which is attached to the PackageManager struct, not the PackageInstall struct)?

  3. What does installPackages() do?

Number 3 above is the most important. We can mostly ignore 1 & 2 for now based on some assumptions:

  1. Seems to be mostly about lockfile diffing and other stuff I only briefly skimmed. I saw some stuff about managing and flushing queues that might be important. More or less though, if I get stuck ahead, I'll probably come back here.

  2. I'm not sure what PackageInstall does, but it seems like it has a struct within it called Summary that might contain stats, results, etc. It looks like calling installPackages gives us a Summary struct, which gets stored in install_summary.
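The PackageInstall.Summary{} pattern made more sense to me once I sketched it out. Everything below is hypothetical: the field names on Summary are my guesses, and this installPackages is a toy that only counts names. The point is the shape: a struct declared inside another struct is just namespaced under it, and default field values are what make the empty Summary{} legal.

```zig
const std = @import("std");

// Hypothetical shapes; the field names are guesses, not bun's actual fields.
const PackageInstall = struct {
    package_name: []const u8,

    // A struct declared inside another struct is namespaced under it.
    pub const Summary = struct {
        success: u32 = 0,
        fail: u32 = 0,
    };
};

// Hypothetical stand-in for installPackages: empty names count as failures.
fn installPackages(names: []const []const u8) !PackageInstall.Summary {
    if (names.len == 0) return error.NothingToInstall;
    var summary = PackageInstall.Summary{};
    for (names) |name| {
        if (name.len == 0) {
            summary.fail += 1;
        } else {
            summary.success += 1;
        }
    }
    return summary;
}

pub fn main() !void {
    // Every field has a default, so `Summary{}` is a valid zeroed value.
    var install_summary = PackageInstall.Summary{};
    const do_install = true;
    if (do_install) {
        install_summary = try installPackages(&[_][]const u8{ "left-pad", "", "react" });
    }
    std.debug.print("ok={d} failed={d}\n", .{ install_summary.success, install_summary.fail });
}
```

So the zeroed summary is a sensible fallback: if the install step is switched off, whatever reads the summary afterwards still has a valid value to work with.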

What does installPackages() do?

installPackages is a 300-line function, and I'm still finding it pretty intimidating. I'll try and break it down into manageable pieces:

The function signature says it returns an error union of a PackageInstall.Summary, which sounds self-explanatory. Next we have a few lines that set up *Progress.Node values, which seem to be for printing installation progress. Moving on.

Next we have a block that creates a node_modules folder if it doesn't exist, and opens the folder if it does (and crashes if neither works).

After that the function pulls data off the lockfile it is passed, seems to break it into several parts, and puts all those parts into a PackageInstaller:
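Here's roughly how I'd write that "create it or open it" step myself. This is my own simplification using std.fs, not bun's code: openOrCreate is a name I made up, the order of operations may differ from what bun does, and the std.fs API has shifted between Zig releases, so treat the exact calls as assumptions tied to the 0.11 to 0.14 era standard library.

```zig
const std = @import("std");

// Hypothetical helper: make sure `name` exists under `parent`, then open it.
fn openOrCreate(parent: std.fs.Dir, name: []const u8) !std.fs.Dir {
    // Try to create the folder; an existing folder is fine, anything else is fatal.
    parent.makeDir(name) catch |err| switch (err) {
        error.PathAlreadyExists => {},
        else => return err,
    };
    // Either way the folder should exist now, so open a handle to it.
    return parent.openDir(name, .{});
}

pub fn main() !void {
    var node_modules = try openOrCreate(std.fs.cwd(), "node_modules");
    defer node_modules.close();
    std.debug.print("node_modules is ready\n", .{});
}
```

Note that running this really does create a node_modules folder in the current directory. Any error other than "already exists" propagates out of main, which matches the "crash if neither works" behavior.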

    var installer = PackageInstaller{
        .manager = this,
        .options = &this.options,
        .metas = metas,
        .bins = parts.items(.bin),
        .root_node_modules_folder = node_modules_folder,
        .names = names,
        .resolutions = resolutions,
        .lockfile = lockfile,
        .node = &install_node,
        .node_modules_folder = node_modules_folder,
        .progress = progress,
        .skip_verify = skip_verify,
        .skip_delete = skip_delete,
        .summary = &summary,
        .force_install = force_install,
        .install_count = lockfile.buffers.hoisted_packages.items.len,
        .successfully_installed = try Bitset.initEmpty(lockfile.packages.len, this.allocator),
    };
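The parts.items(.bin) call looks a lot like the accessor on Zig's std.MultiArrayList, which stores each struct field in its own array. I haven't confirmed that's what parts is, so take this as a guess. Below is a small sketch of that container with a made-up Package struct. I'm also assuming a Zig version where std.MultiArrayList(T){} default-initializes (I believe newer releases also offer .empty for this).

```zig
const std = @import("std");

// Hypothetical package record; bun's real one has many more fields.
const Package = struct {
    name: []const u8,
    version: u32,
    has_bin: bool,
};

// Counts how many entries in a "has_bin" column are true.
fn countWithBin(has_bin: []const bool) usize {
    var count: usize = 0;
    for (has_bin) |b| {
        if (b) count += 1;
    }
    return count;
}

pub fn main() !void {
    const allocator = std.heap.page_allocator;

    // MultiArrayList keeps one contiguous array per field ("struct of arrays").
    var packages = std.MultiArrayList(Package){};
    defer packages.deinit(allocator);

    try packages.append(allocator, .{ .name = "left-pad", .version = 1, .has_bin = false });
    try packages.append(allocator, .{ .name = "typescript", .version = 5, .has_bin = true });

    // slice() captures the columns; items(.field) returns one column as a plain slice.
    const parts = packages.slice();
    const names = parts.items(.name);
    const bins = parts.items(.has_bin);

    var i: usize = 0;
    while (i < names.len) : (i += 1) {
        std.debug.print("{s} has_bin={any}\n", .{ names[i], bins[i] });
    }
    std.debug.print("packages with a bin: {d}\n", .{countWithBin(bins)});
}
```

If that guess is right, then names, metas, resolutions, and bins in the PackageInstaller are parallel slices: index i in each one describes the same package.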

Ok, next we have a while loop that iterates over node_modules and creates folders for each...dependency? And then calls a runTasks method on each one. The function wraps up by linking binaries into a .bin folder, I think. That's all I've got in me for today. Let's close out.

Challenges & Personal Preferences

I have to acknowledge I felt pretty challenged reading through this code. Every part of me just wanted to skip ahead and give up. Part of that is certainly my unfamiliarity with Zig, and part is that this is a massive project. It's hard to see what does what in this massive file - there are structs for all kinds of things, and then some structs have struct definitions inside them, which is throwing me off.

I think, partially, I find it very difficult to navigate a long file. This file is 3000 lines of code - which is long for me (I'm sure there are some people out there who scoff at that - so go ahead, scoff loud and hard!) - but I would be much more inclined to put each "subdomain" into its own file (one for LockFiles, one for PackageInstall, one for PackageManager). It's going to take practice to get used to this.

That's it for now. I think the next time I do a post like this I'd like to at least find how and where packages are fetched from NPM (or wherever they are specified in our lockfiles). I'll probably need to investigate the shape of a package-lock.json and a yarn.lock entry to get an idea of what constitutes a dependency.

Until next time!