Hi, I am Saša Jurić, a software developer with 10+ years of professional experience in programming of web and desktop applications using Elixir, Erlang, Ruby, JavaScript, C# and C++. I'm also the author of the upcoming Elixir in Action book. In this blog you can read about Erlang and other programming related topics. You can subscribe to the feed, follow me on Twitter or fork me on GitHub.

Immutable programming, OO style

| Comment on this post
Continuing the topic started in the previous post, today I will go a bit deeper and present some examples on how to manipulate hierarchical data using only immutable data structures. In this article I will use the object oriented approach, based on polymorphic dispatch and data + code coupling. This style is often frowned upon by functional programmers, it is usually avoided by Erlang developers, and even the creator of Elixir, José Valim, discourages this approach.

Despite the controversy surrounding it, I will use this aproach as a starting point for easy transition from mainstream OO languages to the immutable world. The pure functional approach will be discussed in the next article.

Since Elixir is often wrongfully labeled as OO language, I would like to stress that the programming style used in these examples is in no way Elixir specific. Erlang itself supports polymorphic dispatches via a lesser known feature called "tuple modules", which allows us to couple code with data. Although somewhat controversial, the feature is still here, and we can exploit it to make the code more readable.

Before starting, let me give you a slight preview of what will we be able to accomplish:

Company.new(name: "Initech").
  add_employee(Employee.new(name: "Peter Gibbons",  salary: 10000)).
  add_employee(Employee.new(name: "Michael Bolton", salary: 12000))

Although the code looks very OO-ish, no in-place mutation of a variable takes place. Instead, each function call creates a new intermediate variable which holds the modified version, which (as explained in my previous post) shares as much data as possible with the old version (assuming appropriate data structures are used).


Basic manipulation

The basic unit of code organization in Elixir is a module, a simple collection of functions. On top of this, drawing from the power of its macro system, Elixir provides records, which are nothing more than modules with some autogenerated functions, namely field readers and modifiers. Records are mainly meant to be used as pure structures which group data together. However, since they are essentially modules, we can extend them with our own functions which, when defined in a particular way, can simulate something akin to OO methods:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
defrecord Company, [
  name: nil, employees: []  # record fields
] do
  
  def add_employee(employee, this) do
    this.update_employees(fn(current_employees) ->
      [employee | current_employees]
    end)
  end
end

c0 = Company.new(name: "Initech")
c1 = c0.add_employee("Peter Gibbons")
c2 = c1.add_employee("Michael Bolton")

IO.inspect c2

We start off by defining a record and its fields. Then in line 5 a function add_employee is defined. The function explicitly states this (an instance of a record) as its last argument. Because of this (pun intended), the function can be used as a "method" (lines 13 and 14).

The code of the function is fairly simple: it invokes update_employees on this to push new employee to the top of the list. The function update_employees is generated by defrecord macro, and it follows the same style as add_employee, i.e. it explicitly expects the instance of the record as its last argument. The lambda we are passing to it takes the current employees collection and must return the modified version. The update_employee function then returns a new instance of the Company record, its employees field containing whatever we returned from the lambda. This is an immutable way of modifying things: instead of an in-place mutation, we return a modified version.

Lines 12-14 demonstrate how we can use the Company record. First we create a new instance, then we can use it to call add_employee "method", and print the final result. The explicit intermediate variables are redundant, but I used them to emphasize the workflow.

The OO dispatch works because of two reasons. First, the call to Company.new(name: "Initech") creates a tuple in the form of {Company, "Initech", []}. When calling c0.add_employee(...) Erlang will in runtime translate the call into Company.add_employee(..., c0). This runtime dispatch/translation takes place because c0 is a tuple, and its first element identifies a module where the "method" must reside. Due to dispatch mechanism, c0 will be implicitly passed as the last argument, which is why we have to explicitly state this as the last argument of the function we want to use as a method. Let me state again: this is completely Erlang mechanism, with no special enhancements from Elixir.


Complex mutations

Extending the basic example above, let's introduce a dedicated Employee record which can hold additional employee data (e.g. its salary). When being added to a company, an employee will obtain its own autogenerated id which can later be used for fast retrieval. Here is the complete code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
defrecord Employee, id: nil, name: nil, salary: nil

defrecord Company, [
  name: nil, employees: HashDict.new, autoid: 1
] do

  def add_employee(employee, this) do
    this.
      store_employee(employee.id(this.autoid)).
      update_autoid(&1 + 1)
  end
  
  def store_employee(employee, this) do
    this.
      update_employees(HashDict.put(&1, employee.id, employee))
  end

  def get_employee(id, this), do: Dict.get(this.employees, id)
end

c = Company.new(name: "Initech").
      add_employee(Employee.new(name: "Peter Gibbons",  salary: 10000)).
      add_employee(Employee.new(name: "Michael Bolton", salary: 12000))

IO.inspect c
IO.inspect c.get_employee(1)
IO.inspect c.get_employee(5)

We start off by defining an Employee record. The record doesn't implement any custom functions, but as I mentioned earlier, defrecord will autogenerate some accessors and modifiers for us.

The Company record is then defined, this time using HashDict (key-value dictionary) to store employees. It also contains a field called autoid, which will be used to automatically provide incremental ids for new employees.

To add a new employee, we must first set its id to the current value of the autoid field, store it to employees dictionary (line 9) and finally increment the autoid field of the company record (line 10). The strange looking &1 + 1 construction is a shorthand of writing a lambda: fn(x) -> x + 1 end.

Notice in lines 8-10 how calls are chained by the dot (.) operator. The return value of a function ("method") servers as an invocation point of the next call. By doing this we can avoid explicit declaration of intermediate variables.

The storing of the employee (lines 13-16) simply amounts to putting an employee record in the dictionary, using its id as the key. Retrieving of an employee is a simple id based lookup.

The usage is illustrated in lines 21-23. Notice that a single call to add_employee internally results in multiple mutations: one modification of the employee record and two modifications of the company record. This should demonstrate that complex mutations are not only possible, but actually very easy to implement with immutables.


Modifying children

Extending the example further, let's implement a support for modifications of company employees. There are a couple of ways to do this, and here I will show the generic approach. The Company record is extended with the update_employee method which can be used by clients to arbitrarily modify an existing employee:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
defrecord Company
  ...
  
  def update_employee(id, update_fun, this) do
    this.get_employee(id)
    |> this.maybe_update_employee(update_fun)
  end
  
  def maybe_update_employee(nil, _, this), do: this
  def maybe_update_employee(employee, update_fun, this) do
    update_fun.(employee) 
    |> this.store_employee
  end
end

The update_employee "method" takes an employee's id and the function which will modify it (return the new version). It first retrieves the employee (line 5) and then maybe (if the employee exists) updates it (line 6). The |> operator feeds the result of the previous call as the first argument to the next call.

In case no employee with a given id exists, the company is not modified (line 9). Otherwise, the updater function is called (line 11) and its result is fed (again via |> operator) to the earlier described store_employee "method" (line 13).

We can now use this function for example to give a 20 percent raise to the specific employee:

1
2
3
4
5
c1 = c.update_employee(1, fn(employee) ->
  employee.update_salary(&1 * 1.2)
end)

IO.inspect c1

The presented code illustrates how to deal with updates in a deep hierarchy. To modify a child, we have to go through its parent. If a child is deep in the hierarchy, we will have to go through all of its parents. This is also known as "Law of Demeter" which states that we should communicate only with our immediate neighbors.

People often complain that immutable hierarchy is hard to manipulate and modify, especially if we want to change a child somewhere deep in the tree. In Elixir, I simply don't find this to be true. If we organize our code systematically, making sure that each level in hierarchy is modeled with the corresponding module, and that updates are appropriately delegated through these modules, it should not be complicated to modify any element, regardless of its position in the tree. Of course, we cannot directly update a child deep in the hierarchy, which is a good thing since it forces us to organize our code and structure our updates, making it harder (often not possible) to resort to quick and dirty hacks.


Iterative updates

So far we have only been doing manual hierarchy modifications. Often we want to perform a series of mutations based on some input. Let's say we need to implement a functionality which uses a list of raw employees data (for example obtained from a database) to construct the corresponding Company instance.

To do this, we must run some kind of loop over employees, adding them to the Company instance, one by one. However, when working with immutables, classical loops, such as while or for, are not possible, since they rely on mutable variables. Instead (as I briefly described in my last article), we have to use either recursion or higher order functions (which are basically wrappers around recursion). The code below uses the latter approach:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
input = [
  [name: "Peter Gibbons",  salary: 10000],
  [name: "Michael Bolton", salary: 12000]
]

c = Enum.reduce(
  input,                          # enumerable
  Company.new(name: "Initech"),   # initial accumulator
  fn(employee_data, company) ->   # accumulator modifier
    company.add_employee(Employee.new(employee_data))
  end
)

IO.inspect c

A list of raw employee data is defined (lines 1-4), then a new company instance is created (line 8), and finally, each employee entry is added to that instance (line 10).

To do this, the reduce function (elsewhere known as inject or fold) is utilised, which can be used to transform anything enumerable into anything else. When calling reduce we must provide enumerable (line 7), the initial accumulator value (line 8) and accumulator modifier function (lines 9-11), which will be called by the reduce function for each element of the enumerable. The modifier receives the current element of the enumerable (first argument in line 9) and the current accumulator value (second argument in line 9). Based on these arguments it must return the new version of the accumulator (line 10), which will then be passed to the modifier in the next iteration step. The final accumulator value will be returned by the reduce function to its caller. In line 6 we store it to the variable c which is then printed in line 14.


OO dispatch downsides

As I mentioned, OO-like dynamic dispatch is often frowned upon by functional programmers. While some arguments are more philosophical, there are a couple of practical downsides. 

First, since the dispatch is resolved in runtime, it means the call will be somewhat slower. I did a quick measure test couple of months ago, and observed a slowdown of a fraction of microsecond per function call. Whether this is acceptable or not depends on the specific case. If you're doing very heavy processing, manage a large number of concurrent processes, or run your system on a limited hardware, you might find this unacceptable. For most uses, though, I don't think it is a significant problem.

For OO dispatch to work, all "methods" must be made public. I think public/private scoping is a generally useful tool to distinguish between interface and implementation of a module, and with OO dispatch this benefit is lost. In previous examples, the internal function store_employee had to be declared as public and consequently can be called by anyone.

Another problem is that OO style doesn't mix well with the pure functional style which is used in Elixir core libraries, and probably will be the preferred style of most Elixir programmers. While two styles can be combined, it doesn't work so elegantly, requiring explicit declaration of intermediate variables. Moreover, since the styles are not compatible, it will not be easy to convert OO styled complex code to pure functional, if you decide to do so sometimes in the future.

Therefore, OO style is usually not the optimal approach. Regardless, for me personally, this approach was the first time immutable programming really clicked, having been struggling with pure functional approach for more than two years. So although burdened with issues, I find the approach as a good way to dive deeper into managing changing data with immutables. Once this style is grasped, it should be much easier to understand and use pure functional approach.


Wrapping up

Hopefully, this article has somewhat dispelled a myth that managing hierarchical data is hard with immutables. The examples should get you started with some basic techniques which you can use in Elixir, but also in most mainstream OO languages.

While the presented OO approach is not very popular in FP community, I still consider it as a good way to start shifting towards more functional style, which I will discuss the next time.

Post a Comment