weakty

A few days ago a post showed up on Hacker News detailing how to run a local LLM directly from emacs. Very cool stuff! I wanted to get this working, so I struggled with it for a few hours, poking at programs and files that I didn’t really understand.

The little script in the repo linked above basically pipes the contents of your buffer into an LLM that lives on your local computer. That’s an impressive feat, one I barely understand, but I’ll still try to deduce what’s happening for the fun of it.

It seems that a lot of this is possible thanks to the work of Justine Tunney, and how it all works has been covered elsewhere far better than I could manage here.

All I know is that somehow I can download an emacs binary (and tons of other stuff) from this page and it just works on my machine. And on other machines. Llamafiles, then, seem to be portable executables that bundle everything you need to run an LLM into a single file, on whatever operating system. After that, hooking one up to emacs is fairly trivial, if you know how to do that sort of thing. I know enough elisp to be dangerous, and with a bit of help I finally got it working.

On my machine, I had to run the LLM through the Actually Portable Executable (APE) loader, ape, but other than that it works. What I ended up with is very close to what’s in the original readme:

    (with-local-quit
      (call-process "ape"
                    nil (list (current-buffer) nil) t
                    "/Users/tees/Documents/llms/bin/wizardcoder-python-13b-main.llamafile"
                    "--prompt-cache" cash    ; cash: file caching the prompt state between runs
                    "--prompt-cache-all"     ; cache the generated text too
                    "--silent-prompt"        ; don't echo the prompt back into the buffer
                    "--temp" "0"             ; temperature 0, deterministic output
                    "-c" "1024"              ; context window size
                    "-ngl" "35"              ; layers to offload to the GPU
                    "-r" "```"               ; stop strings, so it halts at a code fence
                    "-r" "\n}"               ; ...or a closing brace
                    "-f" hist))              ; hist: file holding the prompt itself

Here’s a gif of the LLM being given some comments and the beginning of a handle_param function:

I see a lot of use in getting an extra hand when working in a language I’m not super familiar with, or reaching for this to write a data-munging function I’m struggling to wrap my head around. I don’t know what people are doing with integrating LLMs into their entire codebase, but working on a per-buffer basis, this seems great.

I can’t help but wonder whether this will make me lazy, or whether it will let me spend more time doing the kinds of things I actually want to do with the computer. I enjoy building products, not muddling around writing bulletproof functions, so I think I’m leaning toward the latter.