A few Notes on the "Elixir in Action" book.

Chapters 1 & 2

  • The . indicator in modules does not actually create a hierarchical system of modules - it's just syntactic sugar that helps to scope naming.

  • The Kernel module is called "kernel" because it supplies common functions across all modules (and is automatically imported into all modules for convenience).

  • Aliasing lets you rename a module you want to call out to (either implicitly or explicitly):

defmodule MyModule do
  alias Geometry.Rectangle, as: Rect # << explicit
  # alias Geometry.Rectangle # << implicit: aliased as the last segment: Rectangle

  # ...

  def my_func do
    Rect.area(2, 3)
  end
end
  • Module attributes are indicated by @. They can be used as constants or can be "registered" (example: @doc and @moduledoc)

  • Similarly, @spec is a module attribute and is used by dialyzer.

  • Atoms (ex: :an_atom) tend to be very memory efficient for named constants, as a variable holding one points to an entry in the "atom table" and thus doesn't contain the atom's full text.
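A minimal sketch tying the attribute notes together (Circle and its function are made-up names for illustration):

```elixir
defmodule Circle do
  @moduledoc "Computes the area of a circle."  # registered attribute
  @pi 3.14159                                  # plain attribute used as a constant

  @doc "Area from a radius."
  @spec area(number) :: float                  # consumed by dialyzer, not checked at runtime
  def area(r), do: r * r * @pi
end

Circle.area(1)   # => 3.14159
```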

Chapter 3

Some really interesting notes here on matching and control flow. I wrote a post on things I learned about Streams.

Chapter 4 - Data Abstraction

Basic principles of data abstraction

  • A module is in charge of abstracting some data.

  • the module's functions usually expect an instance of the data abstraction as the first argument

  • modifier functions return a modified version of the abstraction (ex: String.upcase/1)

  • query functions return some other type of data (ie, String.length/1 returns info about a string)

  • On structs

    • Structs and Modules are tightly linked - a struct may exist only in a module, and a single module can define only one struct.

    • You can assert a variable is a struct: %Fraction{} = one_half

    • Structs are just maps, but you can't use fns from Enum on them unless you implement the relevant protocols.

    • You can however use fns from Map module.

  • On records:

    • records are essentially tuples.

    • they are around for historical reasons, mostly.
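The struct points above in one runnable sketch, using the book's Fraction as a stand-in (the field and function names here are assumptions):

```elixir
defmodule Fraction do
  defstruct a: nil, b: nil           # one struct per module, tied to the module name

  def new(a, b), do: %Fraction{a: a, b: b}
  def value(%Fraction{a: a, b: b}), do: a / b
end

one_half = Fraction.new(1, 2)
%Fraction{} = one_half               # asserting the variable holds a Fraction struct
Map.fetch!(one_half, :a)             # Map functions work: a struct is a map underneath
# Enum.to_list(one_half) would raise: structs don't implement Enumerable by default
```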

Regarding Polymorphism

  • Polymorphism is a runtime decision about which code to execute based on the input data.

  • The Enum module is a great example of this; it works on Lists, Maps, Sets, etc.

  • Protocols enable building polymorphic, generic fns.

  • You can think of a protocol as a contract that a data type follows.

  • A protocol is a module where you declare functions without implementing them.

  • defimpl is the syntax used to implement a specific protocol for a specific type.

Common example? Implementing String.Chars for a module (which has a struct):

defimpl String.Chars, for: TodoList do
  def to_string(_) do
    "#TodoList"
  end
end
  • defimpl can be used outside any module, making it useful for implementing a protocol for a type even if you can't modify the type's source code.

  • one of the most important protocols is Enumerable - it is perhaps one of the best demonstrations of protocol usefulness.
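The defimpl snippet above implements an existing protocol; declaring one from scratch looks like this (the Size protocol is an invented example, not from the book):

```elixir
defprotocol Size do
  @doc "Number of elements in a collection."
  def size(data)                       # declared, not implemented
end

defimpl Size, for: Map do
  def size(map), do: map_size(map)
end

defimpl Size, for: Tuple do
  def size(tuple), do: tuple_size(tuple)
end

Size.size(%{a: 1})    # dispatch is decided at runtime from the argument's type
Size.size({1, 2, 3})
```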

Chapter 5 - Concurrency Primitives

Concepts

  • Three important concepts:

    • Fault-tolerance - Minimize, isolate and recover from the effects of runtime errors.

    • Scalability - Handle load increase by adding more hardware resources without changing or redeploying code.

    • Distribution - Run your system on multiple machines so that others can take over if one machine crashes.

  • The unit of concurrency on the BEAM is a "process" (not an OS process.)

  • How the BEAM does concurrency:

    • For each CPU core available there is a scheduler

    • Quad core machine = four schedulers.

    • Each scheduler runs in its own thread.

    • The entire VM runs in a single OS process.

    • "Schedulers are in charge of the interchangeable execution of [beam] processes."

    • Beam processes are lighter than threads.

    • Beam processes are isolated; they have their own state and can receive messages from other processes that tell them to manipulate or retrieve that state.

Message passing

  • Processes can't share data so they communicate by message passing

  • Processes have a "mailbox" for holding received messages.

  • A message can be any kind of "Elixir term" - that is, anything that can be stored in a variable.

  • because processes can't share memory, a message is deep-copied when it's sent.

  • A process mailbox is a FIFO queue. The max size of the mailbox is only limited by the available memory.

  • sending a message requires having a process's identifier, or pid

  • receive is like a case expression, but it monitors the process's mailbox.
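The message-passing pieces above in one runnable sketch (the :ping/:pong message shapes are arbitrary):

```elixir
parent = self()

pid = spawn(fn ->
  receive do
    {:ping, from} -> send(from, :pong)   # receive pattern-matches on mailbox contents
  end
end)

send(pid, {:ping, parent})

result =
  receive do
    :pong -> :got_pong
  after
    1000 -> :timeout                     # receive can give up after a timeout
  end
```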

Stateful server processes

  • Servers can have state through the use of recursion.

  • When you spawn a process, you call an infinitely-recursive loop function, passing it the initial state.

  • loop then uses receive to continually handle messages; each handler can alter the state and pass the new version back into loop, so it's ready when the next client interaction arrives (ie through the use of send).

  • In a "server" module, it can be useful to think of the file being divided into client/server sections, where the client functions are the interface to working with the server and the server section has functions for handling the implementation of the interface.

    • client functions are just going to take the pid of the server and then send whatever payload is needed by the server.

    • the server functions will work behind the scenes to do whatever the client is requesting, and then finish by passing the updated/munged/whatever-you-did-to-it state back into the tail-recursive loop function.

  • It's possible to name processes, which is useful when you are only going to spawn one of them. This is done using Process.register(<pid>, :some_name) and then you can use send(:some_name, :my_message). Names may only be atoms.
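A minimal stateful server along these lines (Counter and its message shapes are made up for illustration):

```elixir
defmodule Counter do
  # Client section: interface functions that send messages to the server pid
  def start, do: spawn(fn -> loop(0) end)
  def increment(pid), do: send(pid, :increment)

  def get(pid) do
    send(pid, {:get, self()})
    receive do
      {:count, n} -> n
    end
  end

  # Server section: tail-recursive loop carrying the state
  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:get, from} ->
        send(from, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start()
Counter.increment(pid)
Counter.increment(pid)
Counter.get(pid)   # => 2
```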

Runtime Considerations

  • processes are sequential - while multiple processes can run in parallel, a single process is always sequential - either running some code or waiting for a message. So - look out for long running work.

  • Mailbox size - a process mailbox can be unlimited (up to how much memory you have) - so if messages are coming in faster than they are processed then you will cumulatively consume memory until the system crashes.

    • an example bug might be that your receive doesn't handle certain cases (or doesn't have a catch all _ case), where messages could pile up in the mailbox.

  • message payload size is something to be cognizant of - messages are deep-copied (since processes can't share memory) and while that can be reasonably fast, doing it in high volumes could be problematic.

  • The benefits of shared-nothing processes:

    • no need for complex state management - mutexes and locks and whatnot.

    • garbage collection can take place on the process level (meaning it is also very concurrent, I think.)

Chapter 6 - GenServer!

  • you can use GenServer to reduce the boilerplate of using plain server processes (infinite recursion, state management, message passing).

  • Erlang provides OTP, which is a framework of many useful tools, one of which is GenServer.

Building a generic server process

  • Let's build what GenServer does before using it.

  • A generic server is always going to have a set of common tasks. Ex: spawning a process.

  • but the implementation has to determine its own state (again we're thinking in interfaces / implementations)

  • Module names are atoms.

    • You can store it in a variable and use that var to call functions on the module.

    • using this feature we can provide callback hooks from the generic code of our generic server. Thus, the module we've stored in a variable is called a callback module.

    • For this to work we have to ensure that the module implements and exports a well defined set of functions (ie, behaviours, which will come later, I'm assuming.)

  • When you implement a GenServer, it's good to abstract away things like "handle_call" into more generic function names ("interface functions") that describe what the server process is doing.
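A compressed sketch of the generic-server-plus-callback-module idea (the book builds a similar ServerProcess; Stack here is an invented callback module):

```elixir
defmodule ServerProcess do
  # Generic part: spawning, looping, message plumbing
  def start(callback_module) do
    spawn(fn ->
      initial_state = callback_module.init()
      loop(callback_module, initial_state)
    end)
  end

  def call(server_pid, request) do
    send(server_pid, {request, self()})
    receive do
      {:response, response} -> response
    end
  end

  defp loop(callback_module, state) do
    receive do
      {request, caller} ->
        # The callback module (a module name stored in a variable) decides
        # what the state looks like and how each request is handled.
        {response, new_state} = callback_module.handle_call(request, state)
        send(caller, {:response, response})
        loop(callback_module, new_state)
    end
  end
end

defmodule Stack do
  def init, do: []
  def handle_call({:push, x}, state), do: {:ok, [x | state]}
  def handle_call(:pop, [top | rest]), do: {top, rest}
end

pid = ServerProcess.start(Stack)
ServerProcess.call(pid, {:push, 1})
ServerProcess.call(pid, :pop)   # => 1
```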

Using GenServer

Features of GenServer:

  • support for call and cast

  • customizable timeouts for call requests

  • propagation of server process crashes

  • support for distributed systems.

OTP behaviours

  • a behaviour is generic code that implements a common pattern.

  • a behaviour consists of generic logic in a behaviour module and a corresponding implementation callback module.

  • the callback module implements the contract of the behaviour module.

  • Tip: want to see what functions are in your module (especially after a use macro is used?): MyModule.__info__(:functions)

Other features

  • compile time checking

  • use handle_info for handling messages that weren't sent via call/cast (e.g. a plain send from another process).

  • use @impl GenServer to ensure that your callbacks are correctly implemented.

  • you can give your GenServer a name.

    • idiom: pass your implementation module (KeyValueStore in the examples) as the name to your GenServer (Modules are just atoms in disguise after all!)

    • this saves having to always pass the pid into your client's interface functions:

defmodule KeyValueStore do
  def start(), do: GenServer.start(__MODULE__, nil, name: __MODULE__)
  def put(key, value), do: GenServer.cast(__MODULE__, {:put, key, value})
end
  • you can stop the GenServer by invoking GenServer.stop/3

  • returning {:stop, reason, new_state} from handle_* callbacks causes GenServer to stop and run a terminate fn internally.
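Putting these pieces together, a runnable version of the KeyValueStore might look roughly like this (the get function and the callback bodies are my additions, not necessarily the book's exact code):

```elixir
defmodule KeyValueStore do
  use GenServer

  # Client interface
  def start, do: GenServer.start(__MODULE__, nil, name: __MODULE__)
  def put(key, value), do: GenServer.cast(__MODULE__, {:put, key, value})
  def get(key), do: GenServer.call(__MODULE__, {:get, key})

  # Server callbacks
  @impl GenServer
  def init(_), do: {:ok, %{}}

  @impl GenServer
  def handle_cast({:put, key, value}, state), do: {:noreply, Map.put(state, key, value)}

  @impl GenServer
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
end

{:ok, _pid} = KeyValueStore.start()
KeyValueStore.put(:name, "Bob")
KeyValueStore.get(:name)        # => "Bob"
GenServer.stop(KeyValueStore)   # stopping via the registered name
```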

The Actor Model

Erlang is an accidental implementation of the Actor model originally described by Carl Hewitt. An actor is a concurrent computational entity that encapsulates state and can communicate with other actors. When processing a single message, an actor can designate the new state that will be used when processing the next message. This is roughly similar to how GenServer-based processes work in Erlang. Note, though, that as Robert Virding (one of Erlang's co-inventors) has repeatedly stated, Erlang developers arrived at this idea on their own and learned about the existence of the Actor model much later.

There are some disagreements about whether Erlang is a proper implementation of the Actor model, and the term actor isn't used much in the Erlang community. This book doesn't use this terminology either. Still, it's worth keeping in mind that in the context of Erlang, an actor corresponds to a server process, most frequently a GenServer.

OTP compliant processes

  • when building production systems, you should avoid using plain processes started with spawn.

  • All your processes should be OTP-compliant processes. These adhere to OTP conventions and can be used in supervision trees.

  • For example, rather than spawning your own process, you could use the Task module, which is OTP-compliant, for running one-off jobs that process some input and then stop.
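A sketch of using Task for such a one-off job (the computation here is arbitrary):

```elixir
# Task.async spawns an OTP-compliant process, runs the function,
# and sends the result back to the caller.
task = Task.async(fn ->
  Enum.sum(1..100)
end)

Task.await(task)   # => 5050
```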

Chapter 7 - Building a concurrent system

This chapter mostly goes over building a todolist with mix and using concurrent practices.

  • In Elixir you can use :erlang.term_to_binary/1 for converting any term (any Elixir value: strings, tuples, maps, etc.) into binary data that can be persisted and then reloaded with :erlang.binary_to_term/1

iex(2)> %{foo: "bar"} |> :erlang.term_to_binary
<<131, 116, 0, 0, 0, 1, 100, 0, 3, 102, 111, 111, 109, 0, 0, 0, 3, 98, 97, 114>>
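And the round trip back:

```elixir
binary = :erlang.term_to_binary(%{foo: "bar"})
:erlang.binary_to_term(binary)   # => %{foo: "bar"}
```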

Possible reasons you may want to run your code in a process:

  • the code must manage a long living state

  • the code handles a kind of resource that can and should be reused, such as a TCP connection, database connection, file handle, pipe to an OS process, etc

  • a critical section of the code must be synchronized; only one process may run this code at any moment.

Chapter 8 - Fault tolerance basics

  • fault tolerance is a first-class concept in the BEAM.

  • the goal of fault tolerance is to acknowledge a failure, minimize impact, and then recover without human intervention.

  • there are three types of runtime errors - errors, exits and throws.

  • you can throw values - which allows non-local returns. It can help you escape deeply nested loops... but it's hacky. Try not to do it.

  • you can pattern match in catch.

  • you can link processes using Process.link. Or, you can just use spawn_link.

    • when one process (A) that is linked to another (B) fails, by default, B will fail as well.

    • You can instead "trap" exits so that B does not go down, but receives the exit message in its mailbox.

  • "Monitors" are a means of one process observing another; monitoring is unidirectional, and the observer receives a :DOWN message when the monitored process terminates.

    • You can use Process.monitor(target_pid) to do that.

  • Supervisors

    • a supervisor is a generic process that manages the lifecycle of other processes in a system.

    • a supervisor process can start other processes, which are considered its children.

    • using links/monitors/exit traps, a supervisor detects terminations of any child and can restart it if needed.

    • You can think of the Supervisor module in Erlang as an API wrapper around the spawn_link/trap_exit/monitor primitives.
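The link/trap/monitor mechanics above can be sketched in a few lines (the child processes and exit reasons are arbitrary):

```elixir
# Trap exits: instead of crashing along with the linked process,
# this process receives {:EXIT, pid, reason} as an ordinary message.
Process.flag(:trap_exit, true)

linked = spawn_link(fn -> exit(:boom) end)

linked_reason =
  receive do
    {:EXIT, ^linked, reason} -> reason
  end

# Monitors are one-way: the observer gets a :DOWN message
# but is never taken down itself.
pid =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

ref = Process.monitor(pid)
send(pid, :stop)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  end
```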

Chapter 9 - Isolating error effects

covers: understanding supervision trees, starting workers dynamically and "let it crash."

  • it's all about considering what will happen if a process crashes.

  • if you are using start_link, you are going to be taking down other processes when a process fails.

    • for example, if a database worker process fails, why should the cache_process server fail?

  • this kind of "one crashes them all" behavior comes from starting worker processes from within other processes.

  • note: supervisors start their children synchronously.

    • Make sure that the init function for each child doesn't take too long.

    • if it does have to take a long time, use send(self(), :real_init), for example, to perform the real work. (page 193)
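A sketch of that deferred-init idiom (the module and message names are made up):

```elixir
defmodule DeferredInit do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil)

  @impl GenServer
  def init(nil) do
    # Return quickly so the supervisor isn't blocked...
    send(self(), :real_init)
    {:ok, :not_ready}
  end

  @impl GenServer
  def handle_info(:real_init, :not_ready) do
    # ...and do the slow work here (load a file, open a connection, etc.)
    {:noreply, :ready}
  end

  @impl GenServer
  def handle_call(:status, _from, state), do: {:reply, state, state}
end

{:ok, pid} = DeferredInit.start_link(nil)
GenServer.call(pid, :status)   # :ready once :real_init has been handled
```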

Registering processes

  • what if you need to access multiple processes that can't all be registered under the __MODULE__ name when created?

  • for example, you might have a database process that uses one of many database workers (from a database pool), to make queries.

Registry.register(:my_register, {:database_worker, 1}, nil)
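For that call to work, the registry itself has to be started first, and register/3 always registers the calling process (the registry and key names here follow the snippet above but are otherwise arbitrary):

```elixir
# Start a registry (in a real app it would sit under a supervisor):
{:ok, _} = Registry.start_link(keys: :unique, name: :my_register)

# A worker registers itself under a key:
Registry.register(:my_register, {:database_worker, 1}, nil)

# Any process can then look the worker up by that key:
[{worker_pid, nil}] = Registry.lookup(:my_register, {:database_worker, 1})
worker_pid == self()   # => true
```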