A Naive mini-parser

Tagged: elixir

At the end of last week, I had a bit of fun working out how to migrate recipes from Ari's Garden to Galley.

The high-level problem is that Ari's Garden is an Elm application that consumes a big JSON file, whereas Galley needs recipe data to be entered through a form.

Most of this is easily accomplished: use Poison to decode the JSON, then iterate over the entries and map each one to an Elixir map that can happily be pushed into the database.
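As a rough sketch of that mapping step (not the actual migration code): the field names "name" and "steps" and the module name RecipeImport are assumptions made up for illustration, and the decoded list is written by hand as a stand-in for what Poison.decode!/1 would return for a JSON array of objects.

```elixir
defmodule RecipeImport do
  # Hypothetical: the real JSON keys in Ari's Garden may differ.
  def to_attrs(entries) do
    Enum.map(entries, fn entry ->
      %{
        name: Map.get(entry, "name"),
        steps: Map.get(entry, "steps", [])
      }
    end)
  end
end

# Hand-written stand-in for decoded JSON (string keys, as a JSON decoder returns by default).
decoded = [%{"name" => "Ginger syrup", "steps" => ["Mix the sugar and water."]}]

IO.inspect(RecipeImport.to_attrs(decoded))
```

The resulting maps use atom keys, which is the shape you would typically hand to whatever inserts rows into the database.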

What remained a slight unknown was how I would parse the strange, cryptic syntax I invented in Ari's Garden to describe a step in a recipe. It looks like this:

"[&: 4. Simmer | 00:10:00] Mix the [#: sugar-white | sugar], [#: water | water], [#: salt | salt] and ginger slices and simmer for 10 minutes."

Each step in a recipe in the Ari's Garden JSON might have a single timer, one or more ingredient references, or both. Each of these is wrapped in square brackets, with a symbol right after the opening bracket to denote the type of element: # for an ingredient reference and & for a timer.
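To make the shape of the syntax concrete, here is a small sketch that lists the bracketed elements in a step using a regular expression. This is not how the migration script works; the module name StepSyntax and the neutral names left and right for the two sides of the pipe are my own assumptions.

```elixir
defmodule StepSyntax do
  # Returns {type, left, right} for every "[#: left | right]" or "[&: left | right]".
  def elements(step) do
    pattern = ~r/\[([#&]):\s*([^|\]]*?)\s*\|\s*([^\]]*?)\s*\]/

    pattern
    |> Regex.scan(step)
    |> Enum.map(fn [_full, symbol, left, right] ->
      type = if symbol == "#", do: :ingredient, else: :timer
      {type, left, right}
    end)
  end
end

step = "[&: 4. Simmer | 00:10:00] Mix the [#: sugar-white | sugar] and [#: water | water]."

IO.inspect(StepSyntax.elements(step))
# [{:timer, "4. Simmer", "00:10:00"}, {:ingredient, "sugar-white", "sugar"}, {:ingredient, "water", "water"}]
```

Judging by the example step, the right-hand side of an ingredient reference is the text that should survive in the cleaned output, and the right-hand side of a timer is its duration.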

I decided that because I had made this little syntax to learn about parsers using Elm's parsing tools, I should try to recreate the same thing in Elixir.

What followed worked out in the end, but it was pretty clumsy! Nonetheless, it's a harmless non-production migration script, so I thought I'd share my approach here. I also think it's interesting because I recently bought Crafting Interpreters (thanks to a book stipend from my work), which I think will cover in depth some of the things I naively tried to do based on what I remembered from the Elm parser API.

The data I needed to build up while parsing looked like this:

%{
  output: "",
  capturing: false,
  capturing_timer: false,
  capturing_ingredient: false,
  capturing_ingr_string: false,
  timer: ""
}

Only the output and the timer were of any relevance to me; everything else was a flag tracking where I was while parsing. The output just needed to be a cleaned string describing the recipe step without any syntax in it. The timer is captured as a string here, but it ultimately needed to become an Elixir map with an hour and a minute key.
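For the timer, a minimal sketch of turning a captured string such as "00:10:00" into that kind of map could look like the following. The module name TimerParse, the atom keys :hour and :minute, and the choice to return nil for anything that isn't in HH:MM:SS form are all assumptions for illustration.

```elixir
defmodule TimerParse do
  # A step without a timer has nothing to convert.
  def to_map(nil), do: nil

  # Assumes "HH:MM:SS"; the seconds are parsed past but not kept.
  def to_map(timer) when is_binary(timer) do
    with [hours, minutes, _seconds] <- String.split(timer, ":"),
         {hour, ""} <- Integer.parse(hours),
         {minute, ""} <- Integer.parse(minutes) do
      %{hour: hour, minute: minute}
    else
      _ -> nil
    end
  end
end

IO.inspect(TimerParse.to_map("00:10:00"))
# %{hour: 0, minute: 10}
```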

The actual parsing happens by reducing over the string split into individual characters. For each item we flip some of the flags on or off, which then determines whether we "capture" the item into the cleaned output, capture it into the timer, or skip it. Here's what it looks like:

  def parse_single_step(step) do
    step_list = step |> String.graphemes()

    step_data =
      Enum.reduce(
        step_list,
        %{
          output: "",
          capturing: false,
          capturing_timer: false,
          capturing_ingredient: false,
          capturing_ingr_string: false,
          timer: ""
        },
        fn item, acc ->
          case item do
            # An opening bracket starts a syntax element.
            "[" ->
              %{acc | capturing: true}

            # A closing bracket ends the element and resets every flag.
            "]" ->
              %{
                acc
                | capturing: false,
                  capturing_ingredient: false,
                  capturing_ingr_string: false,
                  capturing_timer: false
              }

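            "&" ->
              # Inside brackets "&" marks a timer; elsewhere it is ordinary text.
              if acc.capturing do
                %{acc | capturing_timer: true}
              else
                %{acc | output: acc.output <> item}
              end

            "#" ->
              # Inside brackets "#" marks an ingredient; elsewhere it is ordinary text.
              if acc.capturing do
                %{acc | capturing_ingredient: true}
              else
                %{acc | output: acc.output <> item}
              end

            "|" ->
              cond do
                # Inside an ingredient reference, "|" starts the text we keep.
                acc.capturing && acc.capturing_ingredient ->
                  %{acc | capturing_ingr_string: true}

                # Inside a timer, "|" is only a separator, so skip it.
                acc.capturing ->
                  acc

                # Outside brackets, keep it as ordinary text.
                true ->
                  %{acc | output: acc.output <> item}
              end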

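            _ ->
              if acc.capturing do
                cond do
                  # Everything inside a timer element accumulates in :timer.
                  acc.capturing_timer -> %{acc | timer: acc.timer <> item}
                  # After the "|", ingredient text goes into the output.
                  acc.capturing_ingr_string -> %{acc | output: acc.output <> item}
                  # Before the "|", the ingredient reference is skipped.
                  true -> acc
                end
              else
                # Outside brackets, ordinary text goes straight to the output.
                %{acc | output: acc.output <> item}
              end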
          end
        end
      )

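    # Trim the output and collapse the double spaces left behind where syntax was removed.
    # The timer is the last whitespace-separated token (e.g. "00:10:00"), or nil if there was none.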
    {
      String.trim(step_data.output) |> String.split("  ") |> Enum.join(" "),
      step_data.timer |> String.split() |> List.last()
    }
  end

(you can see more code from the migration script here.)

One of the funny things about working in Elixir is that data is immutable: a variable can be rebound, but the value it pointed to never changes in place. (State that changes over time lives in processes, such as an Agent, which is roughly the role atoms play in Clojure.) As a result, I knew I needed to accumulate some state with each pass over the split-up string to carry the knowledge of what had been encountered previously.
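Stripped down to a single flag, the idea of carrying state through a reduce looks something like this. It's a toy sketch rather than part of the migration script, and the module name BracketStripper is made up.

```elixir
defmodule BracketStripper do
  # Drops everything between "[" and "]", carrying an :inside flag in the accumulator.
  def strip(text) do
    text
    |> String.graphemes()
    |> Enum.reduce(%{output: "", inside: false}, fn
      "[", acc -> %{acc | inside: true}
      "]", acc -> %{acc | inside: false}
      _char, %{inside: true} = acc -> acc
      char, acc -> %{acc | output: acc.output <> char}
    end)
    |> Map.fetch!(:output)
  end
end

IO.inspect(BracketStripper.strip("Boil [hidden] water"))
# "Boil  water"
```

Each call to the anonymous function returns a new accumulator rather than changing the old one, which is how the "memory" of being inside a bracket travels from one character to the next.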

I can tell I'm interested in this problem space because something is gnawing at me that says there are better ways to do this. Crafting Interpreters waits for me on my desk, boosting up the monitor to eye-level, while I wait for the energy and gusto to take it on.

Thanks for reading!

o/

WT