See ya async!

Tagged: clojure

The discussion evolves

I'm back with another report on investigating core.async in Trunk. My thread on clojureverse has evolved quite a bit, with interested contributions, including some from the prolific and prodigious Borkdude (who has helped me on more than one occasion). The conversation has evolved into a discussion on the merits of what's possible with core.async in contrast to using a library like promesa. I temporarily considered using promesa; I like the idea of avoiding using a complex library like core.async (and for some of the suggested arguments, including that I would be essentially be embedded another event loop inside the javascript event loop.). But before I could do that, Borkdude pointed out that there is a sqlite library that is synchronous (and purportedly better), I had come across better-sqlite-3 in my readings on callbacks in the mapbox sqlite-3 library for node, but had not noticed that it was not asynchronous. I did notice that it posited better benchmarks (up to 10x faster in some cases) - I filed it away in my mind as a possible thing to refactor to in the future in the case that my database rigging was not efficient or ended up being slow.


In the span of a short evening coding session (and an early morning one the next day) I quickly refactored to using the new library. Just like that, all my concerns with asynchronicity were gone. Funny. This latest refactor has brought up questions along with answers however.

First off, I'm still very glad that I posted my question. I've been meaning to get a better grasp of core.async. Further, I was happy to hear some folks discuss and debate its merits and challenges, especially in the javascript world. I learned a fair bit and in a short time I also noticed some of the challenges with it (notably, the challenge of debug inside the go block.)

Dumb questions?

I feel like I'm about to ask a dumb question but... why wouldn't I use a synchronous database library for a standard server/client application? Let's dig in...

Trunk has an operation that opens an article for you to read in your target language. To do this it performs the following:

  1. Send a message using IPCRenderer to the backend to say "hey! I want to read article =x="

  2. IPCMain receives this message, similar to a router, and makes the following database calls: a. article-get-by-id b. article-attach-words

  3. 2a is a simple get request, but 2b makes as many queries as there are words per article, and then attaches the results to the result of 2a before sending it back to the client.

In all this time, the client can't do anything. It can't show the article until it's back, and the database call for 2b can't be invoked until 2a is completed. I have no other things to process in the background - I have to execute these functions synchronously.

So why wouldn't I want a synchronous db library (that also happens to be faster than the existing async one...)?

I don't know the answer to this yet. I have basic understanding of the event loop in JavaScript, but I could probably use a refresher. I'll try and remember what I know. Using a async db call might be useful if I had other work that had to be executed on the main thread - but again, there is nothing. So, I'm a little stumped.

Before and After

Let's look a little at how the code evolved.

"Getting" an article, (as described above) started out like this:

(defn- words-get-for-article
  "Looks at a delimited string and queries for all the words in it.
  TODO: maybe just combine this with article-get.
  [article cb]
  (let [word-ids   (article :word_ids)
        words-orig (str/split word-ids "$")]
    (letfn [(recurse [words out]
              (if (= (count out) (count words-orig))
                (cb (clj->js (assoc article :word-data out))) ;; maybe not efficient in the future to use clj->js, could do raw js interop in this fn.
                (let [[x & xs] words]
                  (.get db "SELECT * FROM words WHERE word_id = ?" (array x)
                        (fn [err row]
                          (recurse xs (conj out (js->clj row))))))))]
      (recurse words-orig []))))

(defn article-get
  "Fetches an article, and computes the `:word-data` for it. Sets `last_opened` value before fetching."
  [id cb]
  (let [q1-sql    "UPDATE articles SET last_opened = ? WHERE article_id = ?"
        q1-params (array (js/ id)
        q2-sql    "SELECT * FROM articles WHERE article_id = ?"
        q2-params (array id)]
    (.run db q1-sql q1-params
          (fn [err]
            (.get db q2-sql q2-params (fn [err row]
                                    (words-get-for-article  (js->clj row :keywordize-keys true) cb)))))))

Then, here it is with core.async, in which I was able to make the db functions a bit cleaner, and then combine them in the IPC handler:

;; DB functions:
(defn <articles-get
  (<sql {:op :all :sql "SELECT * FROM articles ORDER BY date_created DESC"}))

;; Look into preparing statements -- less recursion:
(defn <article-attach-words
  "Split up word_ids in an article, and query for each word. This is/will be slow."
  (let [word-ids   (get article :word_ids)
        words-orig (str/split word-ids "$") ;; FIXME: shouldn't this be (u/split-article article)?
        out-chan   (promise-chan)]
    (letfn [(recurse [words out]
              (if (= (count out) (count words-orig))
                (put! out-chan (clj->js (assoc article :word-data out)))
                (let [[x & xs] words]
                  (.get db "SELECT * FROM words WHERE word_id = ?" (array x)
                        (fn [err row]
                          (recurse xs (conj out (js->clj row))))))))]
      (recurse words-orig []))

;; -- IPC Handler:

   ;; BEFORE, on node-sqlite3 (mapbox)
   ;; Time to get article with 341 words: 92.222ms
   ;; Time to get article with 2500~ words: 630.365ms
   ;; Time to get an article with
   (s-ev :article-get)
   (fn [event data]
     (go (let [id              (data :article_id)
             updated-article (db/<article-update-last-opened id)
             article-whole   (<! (db/<article-get-by-id id))
             final-res       (<! (db/<article-attach-words article-whole))]
         (reply! event (s-ev :article-received) final-res))))

And now? Now this is how things look with a synchronous library:

;; -- DB calls:

(defn articles-get
  (sql {:stmt "SELECT * FROM articles ORDER BY date_created DESC"
        :op :all}))

(defn article-attach-words
  (let [word-ids      (get article :word_ids)
        words-ids-vec (str/split word-ids "$") ;; FIXME: shouldn't this be (u/split-article article)?
        words-out     (atom [])]
    (doseq [word-id words-ids-vec]
      (let [res (sql {:stmt   "SELECT * FROM words WHERE word_id = ?"
                      :params [word-id]
                      :op     :get})]
        (swap! words-out conj res)))
    (assoc article :word-data @words-out)))

;; -- IPC handler:

   ;; AFTER, on better-sqlite3
   ;; Time to get article with 341 words: 52.726ms
   ;; Time to get article with 2500~ words: 251.335ms
   (s-ev :article-get)
   (fn [event data]
     (let [id            (data :article_id)]
       (db/article-update-last-opened id)
       (->> id
            (reply! event (s-ev :article-received)))))


As an aside, I thought to see if I could run a benchmark on my macbook before and after I refactored (and along the way I learned about the very useful console.time and console.timeEnd functions!).

On my MacBook Pro (13-inch, 2019, Two Thunderbolt 3 ports) with 1.4 GHz Quad-Core Intel Core i5 and 16 GB 2133 MHz LPDDR3 the timing was cut in half, pretty much. (Disclaimer, I don't know much about benchmarking...yet.)

Wrap up

All in all, I've encountered another technical decision that is a bit intimidating to make (am I just picking up a different footgun?). However, I'm confident that I'll stay with this change for now. I've also written up a wrapper sql function that makes most of the calls, so if things change again, I'm hopefuly I'll only need to change one function and the rest will fall into place.

Thanks for reading!