Hi, I am Saša Jurić, a software developer with 10+ years of professional experience in programming of web and desktop applications using Elixir, Erlang, Ruby, JavaScript, C# and C++. I'm also the author of the upcoming Elixir in Action book. In this blog you can read about Erlang and other programming related topics. You can subscribe to the feed, follow me on Twitter or fork me on GitHub.

Actors in Erlang/Elixir

Updated Apr. 06, 2013. - ExActor library note

Introduction

The topic of today's post is an introduction to the actor model in Erlang. Actors are a more formal way of looking at Erlang concurrency, and probably the main reason why doing concurrency in Erlang is very easy, even when running many parallel units of execution.

Actors are a fairly large topic, and to present them in code, I would have to provide some introduction to Erlang syntax, which is very different from that of most modern mainstream OO languages. To avoid dealing with the "weirdness" of the Erlang language, I will instead use the Elixir language, which is kind of a more elegant/concise wrapper around Erlang. This will make the presented code extremely simple, so you should be able to follow it even without previous knowledge of Elixir or Erlang.

Elixir is a very young language, built on top of the Erlang platform. I regard it as a sort of Ruby-flavored version of Erlang. The language is very flexible and hides away much of the unnecessary noise which often occurs in typical Erlang source code. At the same time, it is semantically aligned with Erlang, and the underlying principles map 1:1 to the corresponding Erlang representation. After all, Elixir code is compiled to Erlang byte code, and can normally run in the Erlang VM, as well as cooperate with other Erlang code.

The code in Elixir will allow us to examine actors from a somewhat higher level, hiding away the mechanical details and tedium of the Erlang language. However, if you find the topic interesting, and plan to investigate it deeper, or possibly even use it in production, I suggest you first read a book on Erlang, thoroughly understand the low level workings, write some pure Erlang code, and only then possibly move to Elixir.

(Im)mutability

As a consequence of being functional languages, both Erlang and Elixir use immutable variables. Once assigned, they can't be modified. Of course, almost every program, more complex than hello world, will have to deal with a state which changes based on some external interactions (user input, tcp/http requests, ...).

The primary (though not the only) way of maintaining a mutable state in Erlang is to run a separate Erlang process. One way to start a process is to call spawn:

pid = spawn(
  # ...
)

In between parentheses will be a reference to a function, or a body of an inline anonymous function, which will run concurrently. The return value is the id of the created Erlang process, often called pid. We can use that value to send messages to that process:

pid <- message

Messages are arbitrary Elixir/Erlang terms, i.e. whatever you can put in a variable (lists, tuples, ...). Sending a message means that its value is placed in the mailbox of the receiving process, after which the sender goes on executing its own code. The receiver can obtain the next message by using the receive construct. Messages are processed in the order they are placed in the mailbox (although this behavior can be altered in code).
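To make the round trip concrete, here is a minimal sketch in present-day Elixir. Note that current Elixir versions use send/2 in place of the pid <- message form shown above. The {:echo, ...} and {:reply, ...} message shapes are made up for this illustration.

```elixir
# The process that will receive the answer (this script itself).
parent = self()

# The spawned process waits for one message and echoes its text back.
pid =
  spawn(fn ->
    receive do
      {:echo, from, text} -> send(from, {:reply, text})
    end
  end)

# Sending returns immediately; the message waits in the receiver's mailbox.
send(pid, {:echo, parent, "hello"})

# Block until the reply arrives (or give up after one second).
reply =
  receive do
    {:reply, text} -> text
  after
    1_000 -> :timeout
  end

IO.puts(reply)
```

The sender includes its own pid in the message because the receiver has no other way of knowing where to send the answer.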

When we want to maintain a continuous mutable state, we have to run an endless recursion in a separate Erlang process. The code outline looks like this:

def loop(state) do
  new_state =
    receive do
      # f/2 stands for whatever computes the next state
      message -> f(state, message)
    end

  loop(new_state)
end

The process enters the loop function with the current state, which is an arbitrary Elixir/Erlang term. It then waits for a message from some other process, and, upon receiving it, computes the new state, depending on the message content and the current state. Finally, the loop function is called recursively, effectively setting the new state in place of the old one. The next message will operate on the new state.

In Elixir/Erlang, such recursion will not cause a stack overflow, since both languages have special handling of so-called "tail calls", which are, on the byte code level, transformed into a jump/goto instruction. Consequently, this code simply runs an endless loop.
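The tail call claim is easy to check in isolation. The following sketch uses a hypothetical TailLoop module whose only job is to recurse ten million times; because the recursive call is the last thing the function does, the stack does not grow.

```elixir
defmodule TailLoop do
  # Base case: nothing left to count.
  def count_down(0), do: :done

  # The recursive call is in tail position, so it reuses the current stack frame.
  def count_down(n) when n > 0, do: count_down(n - 1)
end

result = TailLoop.count_down(10_000_000)
IO.inspect(result)
```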

Once we have such a process running, and hold its pid, we can interact with it via messages. For example:

# async send and pray
pid <- {:set, :value, "123"}

# sync call and get response
pid <- {:get, :value}
response = receive do
  reply -> reply # the receiver must send us the response
end

The {...} is a tuple, which in Elixir/Erlang is a sort of weakly typed struct. The :something represents an atom, a kind of named constant similar to a Ruby symbol.

Of course, for this code to work, handling of such messages must be implemented in the receiving process. In the previous snippet, that would be the implementation of the f(current_state, message).

To summarize: our process runs concurrently, it encapsulates a state, and we can send that process messages to modify the state, or to retrieve it. We call such a process an actor.
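Putting the pieces together, a complete low-level actor could look roughly like the sketch below. The KeyValue module and the {:response, ...} reply shape are my assumptions for the sake of the example, and the :get message carries the caller's pid so that the actor knows where to send the answer. The code uses send/2, which replaces the <- operator in current Elixir versions.

```elixir
defmodule KeyValue do
  # Starts the actor with an empty map as its initial state.
  def start, do: spawn(fn -> loop(%{}) end)

  # The endless recursion: wait for a message, compute the new state, repeat.
  defp loop(state) do
    new_state =
      receive do
        message -> f(state, message)
      end

    loop(new_state)
  end

  # {:set, key, value} stores a value and returns the updated map.
  defp f(state, {:set, key, value}), do: Map.put(state, key, value)

  # {:get, key, from} replies to the caller and leaves the state unchanged.
  defp f(state, {:get, key, from}) do
    send(from, {:response, Map.get(state, key)})
    state
  end
end

pid = KeyValue.start()

# async send and pray
send(pid, {:set, :value, "123"})

# sync call: send the request, then wait for the response
send(pid, {:get, :value, self()})

response =
  receive do
    {:response, value} -> value
  after
    1_000 -> :timeout
  end

IO.puts(response)
```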

The examples

The principles above outline the workflow of an actor on the lowest level. To do something useful with it, a fair amount of code is required. It gets even more complex if you want to write production-level code. Erlang/OTP addresses this issue by offering an abstraction called gen_server (a generic server process), which abstracts typical message passing patterns, but adds more boilerplate.

Elixir simplifies the use of gen_server, and there is an additional wrapper called genx which removes most of the duplication. On top of this, I have utilized Elixir's extensibility and built additional abstractions which allow me to write very dense, OO-ish code, and hide away most of the mechanical details.

The abstractions I wrote are quick hacks, made specifically for the purposes of this blog. I don't advise you to use them in production.
Update (06. April, 2013.): I have since modified the library and have been using it in production for some time.

The presented code will be deceptively simple, which will help us to observe actors from a somewhat higher level. However, I'd like to point out that the underlying implementation relies on the mentioned gen_server, which is in turn powered by the endless recursion and message passing mechanism presented earlier. The recursion will therefore not be coded explicitly, but under the hood it will still execute, approximately as described.

The complete code of the examples, together with build/start instructions can be found here.

Simple calculator

The first example is a simple calculator actor which supports increment/decrement operations. Let's see how we can use it:

calculator = Calculator.actor_start(0)
calculator.inc(10)
calculator.dec(5)
result = calculator.get
IO.puts result

The code is not spectacular, but it illustrates the simplicity of use. The actor is created with an initial value of 0. Then I add the value of 10, and subtract a value of 5. Finally, I retrieve the result and print it. Since calculator is an actor it works concurrently. Specifically, inc/dec operations are asynchronous, while get is obviously synchronous, since it has to return the result.

In the first line, an actor is created. Under the hood, the function actor_start will spawn a process, sending it the value 0 as an argument. The actor will use that value as its initial state, which internally means that we will enter the infinite recursion with the value of 0.

Now we can use the calculator variable to do something with the actor, for example invoke increment/decrement operations. Behind the scenes, these functions will send asynchronous messages {:inc, 10} or {:dec, 5} to the calculator process without waiting for the response (actually, these messages will be decorated by gen_server with a bit more content).

In the fourth line, the actor's state is retrieved by calling the synchronous get operation. Internally, this function sends a get message to the actor, and waits for it to respond.

As I already mentioned, actors normally process messages in the order they are received. Therefore the get message will be processed after inc and dec, although they were issued asynchronously. Consequently, in the final line, we will print the result of 5 (0 + 10 - 5).

The actor's implementation looks like this:

defmodule Calculator do
  use ExActor

  defcast inc(x), state: value do
    new_state(value + x)
  end

  defcast dec(x), state: value do
    new_state(value - x)
  end
  
  defcall get, state: value do
    value
  end
end

First, a module is defined, which is simply a collection of functions. In line 2, there is a use construct which includes additional functions and macros to support easier actor definition and manipulation. For example, defcast and defcall are macros defined in ExActor.

In line 4, the inc operation is defined. The defcast means that the operation will be defined as a cast, which in Erlang terminology means it will be asynchronous. When a client calls inc, a message will be sent, and the caller will continue its execution immediately.

The state: value means that, in the function's body, we will refer to the actor's state (the argument of an infinite recursive function) using the name value.

Between do/end is the implementation of our operation. This is the code that will run in the actor process. More specifically, the code will handle the {:inc, x} messages. In the message handler we have to compute the new value of our state and return it in the special form required by gen_server. The gen_server will reenter the infinite recursion with the new value, which, as explained earlier, changes the state of the actor.
Elixir/Erlang use implicit returns: the result of the function's last statement is its return value. The new_state function forms the response so it can satisfy gen_server requirements.

The operation dec is implemented in the same way as inc, while get has some differences. Since it must return a value to the caller, it is defined via defcall, which means it will be a synchronous operation (aka call). Here the return value is treated differently: it is sent back to the caller, while the state is not modified, which means that recursion will be reentered with the same value.
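For comparison, here is a sketch of a roughly equivalent calculator written directly against Elixir's standard GenServer behaviour, in present-day syntax. This is not the code that ExActor generates; the PlainCalculator name and the message shapes are my own choices, and the sketch only illustrates the cast/call split described above.

```elixir
defmodule PlainCalculator do
  use GenServer

  # Client side: these functions run in the caller's process.
  def start(initial), do: GenServer.start(__MODULE__, initial)
  def inc(pid, x), do: GenServer.cast(pid, {:inc, x})
  def dec(pid, x), do: GenServer.cast(pid, {:dec, x})
  def get(pid), do: GenServer.call(pid, :get)

  # Server side: these callbacks run in the actor process.
  @impl true
  def init(initial), do: {:ok, initial}

  # Casts are asynchronous: no reply, only a new state.
  @impl true
  def handle_cast({:inc, x}, value), do: {:noreply, value + x}
  def handle_cast({:dec, x}, value), do: {:noreply, value - x}

  # Calls are synchronous: reply with the value, keep the state as it is.
  @impl true
  def handle_call(:get, _from, value), do: {:reply, value, value}
end

{:ok, calculator} = PlainCalculator.start(0)
PlainCalculator.inc(calculator, 10)
PlainCalculator.dec(calculator, 5)
result = PlainCalculator.get(calculator)
IO.puts(result)
```

The amount of ceremony in this version (separate client functions, callbacks, and reply tuples) is the boilerplate that the macros are meant to hide.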


Call/cast responses

To recap, our cast operation can return new_state(something), which will set the new state. If we return a "naked" value (i.e. without wrapping it with new_state), it will be ignored, and the current state will be kept. This is usually helpful when an actor has to interact with external entities (other actors, files, networks, databases) without changing its state.

For a call, the return value is sent to the caller without state modification. If we want to modify the state as well, we can use reply(value, new_state), which means that we respond to the caller with a value and at the same time change the state to new_state.

Finally, there are some other special forms of responses. For example, one form of response can be used to stop the actor. When gen_server receives such a response, it simply does not reenter the recursion, and the process will consequently finish.
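At the gen_server level, these responses correspond to plain tuples returned from the callbacks. The sketch below uses Elixir's standard GenServer (not ExActor) and a hypothetical Stack module to show a cast that changes the state, a call that replies and changes the state at the same time, and a cast that stops the actor.

```elixir
defmodule Stack do
  use GenServer

  @impl true
  def init(items), do: {:ok, items}

  # Cast: no reply, continue with a new state.
  @impl true
  def handle_cast({:push, item}, items), do: {:noreply, [item | items]}

  # Cast: stop the actor; the recursion is not reentered.
  def handle_cast(:stop, items), do: {:stop, :normal, items}

  # Call: reply to the caller and change the state in one step.
  @impl true
  def handle_call(:pop, _from, [top | rest]), do: {:reply, top, rest}
  def handle_call(:pop, _from, []), do: {:reply, nil, []}
end

{:ok, pid} = GenServer.start(Stack, [1])

# Monitor first, so that we are notified when the process finishes.
ref = Process.monitor(pid)

GenServer.cast(pid, {:push, 2})
top = GenServer.call(pid, :pop)
GenServer.cast(pid, :stop)

exit_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end

IO.inspect({top, exit_reason})
```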


Actors vs objects

Actors and objects have some properties in common: both encapsulate state while hiding its details from their clients. The clients can manipulate the state via messages (actors) or methods (objects).

Coming from OO, it helps me to think of actors as objects. More specifically, I reason about connections between actors, i.e. how they interact with each other. The composition of actors resembles that of objects. You can make global actors, singletons, or "local" ones which are known only to a limited number of actors. However, be aware that actors are not garbage collected. They are long-running processes which will not be terminated just because no "reference" to them (their pid) exists anymore.

Unlike objects, actors are inherently concurrent, and can run in parallel. They are also completely independent and have no data in common. One actor cannot corrupt the state of another nor can a crash in one actor impact the other ones, unless explicitly specified by the programmer.
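The isolation property can be observed with two plain processes. In this sketch (present-day Elixir, made-up message names), one process crashes with an exception while the other keeps running and still answers a message afterwards. The crash is reported in the error log, but it does not affect anyone else.

```elixir
parent = self()

# A process that waits for a single ping and answers it.
survivor =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

# spawn_monitor/1 starts a process and monitors it in one step,
# so we get a :DOWN message when it terminates.
{crasher, ref} = spawn_monitor(fn -> raise "boom" end)

crashed =
  receive do
    {:DOWN, ^ref, :process, ^crasher, _reason} -> true
  after
    1_000 -> false
  end

# The other process is unaffected by the crash.
still_alive = Process.alive?(survivor)
send(survivor, {:ping, parent})

answer =
  receive do
    :pong -> :pong
  after
    1_000 -> :timeout
  end

IO.inspect({crashed, still_alive, answer})
```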

Consumer/producer

This slightly more complicated example involves two actors. The producer creates random integers and sends them to the consumer. The consumer simply prints the received values to the screen. Since both are actors, they operate concurrently.

The producer exposes one service: produce, which, when called, will produce a number and send it to the consumer. This is the code:

defmodule Producer do
  use ExActor

  defcast produce, state: consumer do
    :timer.sleep(100)
    value = :random.uniform(100)
    consumer.consume(value)
    IO.puts "produced #{value}"
  end
end

The producer sleeps for some time, which is a simulation of a long running operation, then it creates a number and passes it to the consumer.
Notice that the produce operation doesn't end with a new_state(...) statement. This means that the actor's state will not be changed by the operation. That's ok, because producer's state is a pid of the consumer, and we don't want to change that when performing the produce operation.
The weird :timer.sleep and :random.uniform calls are Elixir's way of calling Erlang functions and they demonstrate how we can easily interoperate with Erlang libraries.

This is what the consumer looks like:

defmodule Consumer do
  use ExActor

  defcast consume(value) do
    :timer.sleep(200)
    IO.puts "                consumed #{value}"
  end
end

The consumer's code is even simpler: it sleeps for some time, and then prints the received value to the screen without using the state at all (hence, no state: identifier part in the consume definition).

Notice that both produce and consume operations are defined as casts. This means that the calling processes will continue with the execution immediately after invoking them.

This is the usage example:

consumer = Consumer.actor_start
producer = Producer.actor_start(consumer)

times(5, fn(_) -> producer.produce end)

IO.puts "main process finished\n"
:timer.sleep(2000)

Nothing fancy here: we create both actors, connect them, and invoke the produce operation five times. This is the output:

main process finished

produced 45
produced 73
                consumed 45
produced 95
produced 51
                consumed 73
produced 32
                consumed 95
                consumed 51
                consumed 32

The output illustrates the asynchronous nature of cast operations. The "main" process has finished immediately, and we can also see how producer was generating values faster than the consumer was able to handle them (since producer sleeps for 100ms, and the consumer for 200ms). Finally, comparing the produced and consumed values, we can confirm that the messages are processed in the order received.

Chat backend

To provide a more complex example, I made a small sketch of how a basic chat server would be implemented. Since this article is already getting long, I'll only briefly outline the concepts. The full code can be found in the github repository together with the previous two examples.

In a typical Erlang-based chat server, each chatroom and each user is represented by an actor. In the most basic implementation, a chatroom actor's state will be the list of its users, while a user's state will be the pid (reference) of the chatroom the user is currently in. When a user wants to post a message to the room, the user's process will send the message to the chatroom process, which will in turn loop through all of its users (except the sender) and send them the message. The user process will then transport the message to the physical user via the network. To keep the example simple, I didn't implement the networking interface, but have instead simply printed the message to the screen.
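The following is not the code from the repository, only a minimal sketch of the described design, written against the standard GenServer in present-day Elixir. All module, function and message names are hypothetical. Instead of printing, each user process forwards the received message to a "sink" process, which stands in for the network connection.

```elixir
defmodule ChatRoom do
  use GenServer

  def start, do: GenServer.start(__MODULE__, [])
  def join(room, user), do: GenServer.cast(room, {:join, user})
  def post(room, sender, text), do: GenServer.cast(room, {:post, sender, text})

  # The state of the room is the list of its users (pids).
  @impl true
  def init(users), do: {:ok, users}

  @impl true
  def handle_cast({:join, user}, users), do: {:noreply, [user | users]}

  # Forward the message to everyone except the sender.
  def handle_cast({:post, sender, text}, users) do
    for user <- users, user != sender, do: send(user, {:chat, text})
    {:noreply, users}
  end
end

defmodule ChatUser do
  # A user process hands every chat message over to `sink`.
  def start(name, sink), do: spawn(fn -> loop(name, sink) end)

  defp loop(name, sink) do
    receive do
      {:chat, text} ->
        send(sink, {:delivered, name, text})
        loop(name, sink)
    end
  end
end

{:ok, room} = ChatRoom.start()
alice = ChatUser.start("alice", self())
bob = ChatUser.start("bob", self())

ChatRoom.join(room, alice)
ChatRoom.join(room, bob)
ChatRoom.post(room, alice, "hi all")

delivered =
  receive do
    {:delivered, name, text} -> {name, text}
  after
    1_000 -> :timeout
  end

IO.inspect(delivered)
```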

In such an architecture, everything runs concurrently (since chatrooms and users are actors), and yet the code is fairly simple and straightforward, not burdened with the intricacies of locking and synchronizing typically found in conventional multithreading approaches.

The concurrent property of the system means that we can use available CPU resources, and easily scale up by adding more processor power to address higher load on the system. Obviously, the scalability will depend on interactions between actors. If, for example, many actors are synchronously calling one specific actor, then that one actor can be a potential bottleneck. However, such a situation is now easier to identify, since we can analyze dependencies between our actors, discover bottlenecks, and work on resolving them.

Recap

This article presented a lot of concepts. The most important thing to remember is that an actor is an Erlang process which encapsulates some state. Clients can communicate with it by sending it messages, provided they have its pid (process id). There are two forms of communication: cast (asynchronous) and call (synchronous: the client sends a message, and the receiver sends the response back).

The concept of actors allows us to create neatly designed solutions for complex concurrent problems, and also to look at our system from a higher perspective, analyzing the dependencies between actors, or studying each one in isolation.

Teaching orthogonal programming paradigms


I didn't plan to post so soon; once a month is about as often as I can spare. However, this post from @rosettacode provoked me:

Pick three to five programming languages for teaching orthogonal programming paradigms. Which did you choose? Why? Blog and I'll share.

So, without further ado, here's my list, in the particular order:

1. C (procedural / imperative)
Why: It's ubiquitous on most platforms, it's extremely fast, and it's the lingua franca for combining multiple technologies. Most importantly: every time I try to understand the inner workings of any piece of technology, I think of what the corresponding C code would look like.

2. Ruby (OO / dynamic)
Why: It's easy, elegant and concise. Probably my favorite language I've worked with so far. Supports many, if not all OO concepts, very flexible, especially suitable for building small internal DSLs. I find it incredible how much can be done with only a few lines of very readable code.

3. SQL (declarative)
Note: only declarative parts of the language i.e. queries, updates, DML/DDL, without stored procedures, triggers and custom extensions.

Why: SQL is to me the best example of separating "what" from "how". You build powerful queries, stating what you need, and the database engine does the rest. It is incredible how data can be flexibly sliced, filtered, joined and grouped, even when it is not normalized.

4. Erlang (concurrency / functional)
Why: The language is super simple, yet extremely powerful. Its support for concurrency is beyond anything I have ever seen. In addition, it supports pattern matching and promotes functional style. Finally, I consider the platform, i.e. how the pieces of the language + VM + framework work together, to be one of the greatest masterpieces in the IT field I have ever seen.

5. Lisp (functional)
Why: It's the Latin of programming languages. Almost every language and development platform, no matter which paradigm it comes from, borrows something from it.

Erlang based server systems

One of the biggest advantages of Erlang is that you can use it to implement and run your entire server system. More specifically, you can develop the following parts of your system in Erlang:
  • web server
  • shared and/or persistent state
  • jobs and scheduled tasks
  • process monitoring and restarting
  • distributed systems running on multiple machines
In fact, distributed systems aside, you can run everything inside exactly one OS process using only a handful of OS threads. In addition, that one OS process can run an arbitrary number of independent server systems (e.g. multiple web servers).

Web server

Erlang can be used to build a completely standalone web server, serving both dynamic and static content. Even without an HTTP server such as Nginx or Apache in front of it, an Erlang web server will work efficiently on its own, and be able to handle a large number of concurrent requests. A typical Erlang web server uses one Erlang process per request. Since Erlang processes are lightweight, you can create a large number of them, so there's no fear that you will run out of available web request handlers.
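To get a feel for how cheap processes are, the sketch below (Elixir, with made-up message names) starts ten thousand of them, each standing in for a request handler, and waits until all of them have reported back. The number is an arbitrary choice for the illustration, well below the VM's default process limit.

```elixir
parent = self()
request_count = 10_000

# One process per simulated request; each reports back when it is done.
for id <- 1..request_count do
  spawn(fn -> send(parent, {:handled, id}) end)
end

# Collect the reports, counting how many handlers have finished.
handled =
  Enum.reduce(1..request_count, 0, fn _, count ->
    receive do
      {:handled, _id} -> count + 1
    after
      5_000 -> count
    end
  end)

IO.puts(handled)
```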

Additionally, Erlang's preemptive scheduler will ensure that long running requests don't block the rest of the system, regardless of whether the processing is I/O or CPU bound.

Finally, the server will usually be vertically scalable, so it will be able to use all available CPU resources, allowing you to handle increased load by adding more hardware power.

Shared and/or persistent state

In a typical server, you often need to manage state which extends beyond the context of a single request, and/or is shared among different requests, or even users, for example caches, user session data, any global server data, etc. Erlang gives you a couple of built-in mechanisms for doing this, thus eliminating the need for external components such as memcached, Redis or external databases.

A typical approach is to create one or more separate Erlang processes, which run for a long time (possibly forever) and manage state. This is often referred to as the Actor model, where an actor is a concurrent entity (in Erlang's case, a process) which encapsulates state, can receive messages from other actors and modify its state accordingly, or send parts of that state to other actors. Consequently, from your web request handler processes, you can communicate with the state-related processes, retrieving the current state or providing some new data which must be incorporated into it.

In addition, Erlang offers a fast in memory "mutable" key-value structure, called ETS, with concurrent read/write access, meaning that it can be shared and simultaneously used from multiple running processes.
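As a small illustration, the sketch below creates a public ETS table from Elixir and then writes to it and reads from it in two other processes. The table name and the keys are invented for the example; :ets.new, :ets.insert and :ets.lookup are the standard Erlang functions.

```elixir
# A :public table can be read and written by any process in the VM.
table = :ets.new(:session_cache, [:set, :public])
:ets.insert(table, {"user:1", "alice"})

# Write from another process...
writer = Task.async(fn -> :ets.insert(table, {"user:2", "bob"}) end)
true = Task.await(writer)

# ...and read from yet another one.
reader = Task.async(fn -> :ets.lookup(table, "user:2") end)
found = Task.await(reader)

IO.inspect(found)
```

The table belongs to the process which created it and disappears when that process terminates, so in a real system it would typically be owned by a long-running process.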

On top of this, Erlang comes with a NoSQL database called Mnesia, which offers typical database services such as transactions, flexible queries and indexes, and gives you full control over which data will be kept only in memory, and which will be persisted to disc. Therefore you can use Mnesia not only to keep in-memory state, but also to persist that state in order to recover after OS process or machine restarts.

Jobs and scheduled tasks

Typical servers often perform some amount of background processing, and you can implement these entirely in Erlang, without having to resort to approaches such as cron jobs, daemons, services, Resque, delayed_job, etc. When you need to do a background job, simply start a separate Erlang process and perform your task in that process constantly in an infinite loop, periodically in regular time intervals, irregularly depending on some external event, or once, depending on your needs. Again, the preemptive scheduler will ensure that CPU resources are fairly used, i.e. that background tasks don't block the rest of your system.
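A periodic job can be sketched as a process which sends itself a delayed message and reschedules it after each run. The example below uses Elixir's GenServer and Process.send_after/3. The Ticker module, the :tick message and the 10 ms interval are assumptions made for the illustration, and the "job" only counts how many times it has run.

```elixir
defmodule Ticker do
  use GenServer

  def start(interval_ms), do: GenServer.start(__MODULE__, interval_ms)
  def runs(pid), do: GenServer.call(pid, :runs)

  @impl true
  def init(interval_ms) do
    schedule(interval_ms)
    {:ok, {interval_ms, 0}}
  end

  # Each :tick performs the job and schedules the next run.
  @impl true
  def handle_info(:tick, {interval_ms, runs}) do
    # The actual background work would go here.
    schedule(interval_ms)
    {:noreply, {interval_ms, runs + 1}}
  end

  @impl true
  def handle_call(:runs, _from, {_interval_ms, runs} = state) do
    {:reply, runs, state}
  end

  defp schedule(interval_ms) do
    Process.send_after(self(), :tick, interval_ms)
  end
end

{:ok, ticker} = Ticker.start(10)
Process.sleep(100)
runs = Ticker.runs(ticker)
IO.puts("job has run #{runs} times")
```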

Process monitoring and restarting

The standard Erlang framework (OTP) provides a facility called a supervisor: an Erlang process which monitors (supervises) other processes (workers), restarting them if they crash. You can use supervisors to ensure that all parts of your system are working. In addition, Erlang has the so-called "heart" service: a separate OS process which monitors and, if required, restarts the Erlang VM. This removes the need for solutions such as monit or god.
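Here is a minimal sketch of a supervisor at work, using Elixir's Supervisor module. The Worker module is a hypothetical do-nothing process registered under its module name. We kill it on purpose and then check that a new process has been started under the same name.

```elixir
defmodule Worker do
  use GenServer

  # Registered under the module name so that we can find it after a restart.
  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, :ok}
end

{:ok, _supervisor} = Supervisor.start_link([Worker], strategy: :one_for_one)

first = Process.whereis(Worker)
ref = Process.monitor(first)

# Simulate a crash by killing the worker.
Process.exit(first, :kill)

receive do
  {:DOWN, ^ref, :process, ^first, _reason} -> :ok
after
  1_000 -> :timeout
end

# Poll briefly until the supervisor has started the replacement.
second =
  Enum.find_value(1..50, fn _ ->
    case Process.whereis(Worker) do
      pid when is_pid(pid) and pid != first ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)

IO.inspect({first, second})
```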

Distributed systems running on multiple machines

Erlang offers simple, yet powerful primitives for communication between different Erlang VM instances. Instead of combining RESTful or SOAP-based web services and implementing JSON or XML (de)serialization of your data model, you can send messages and invoke functions in a completely strongly typed manner, and in this way distribute your system over multiple machines. I've touched the topic briefly in this post, so I won't repeat myself here. I will only add that Mnesia has built-in support for replication across nodes, giving you an easy way of sharing state between multiple Erlang VM instances.

Multiple servers in one OS process

It is easy to start multiple independent Erlang applications inside a single instance of an Erlang VM, and at the same time separately deploy, stop and restart those applications. In this way, Erlang is sort of an OS inside an OS, with similar characteristics: multiple applications can run (in)dependently in an Erlang system and each application is divided into multiple (in)dependent processes. A crash of one application will not impact another (unless it explicitly depends on it). However, a malicious application can intentionally kill the other ones, consume all hardware resources, and even crash the entire system. Therefore, use this approach only for applications coming from trusted sources (e.g. in-house developed, or official ones coming from Ericsson).

Final thoughts

Implementing an entire system with one technology offers many advantages. It provides a uniform development platform and promotes code reuse, while simplifying operational tasks such as environment setup, deployment, monitoring, testing, scaling, balancing, etc. Erlang gives you that option, and in this respect it outshines every other development platform I am familiar with. In addition, none of the mentioned approaches is an improvisation or a hack. Instead, they are all mechanisms and services developed by Ericsson to fulfill exactly those needs, and they have been used in production in large systems for two decades.

Every presented Erlang-based approach should suffice for small to medium uses, and many of them will also excel under heavy load. Still, not all are full-fledged substitutes for the corresponding mainstream technologies. For example, when serving a large number of static files, I would resort to Nginx, Apache or a CDN. When dealing with large quantities of data, I would prefer a traditional (R)DBMS to Mnesia. In systems consisting of components implemented in many different technologies, I would use e.g. Redis or some message queue for data sharing. In addition, Erlang's distribution model lacks a serious security model, so it is appropriate only in a trusted environment.

Nevertheless, Erlang based approaches will often suit your needs, and when they don't, you have the option of using something else. With many other development platforms, you must turn to external technologies simply because there's no alternative provided in the platform.