Hi, I am Saša Jurić, a software developer with 10+ years of professional experience in programming of web and desktop applications using Elixir, Erlang, Ruby, JavaScript, C# and C++. I'm also the author of the upcoming Elixir in Action book. In this blog you can read about Erlang and other programming related topics. You can subscribe to the feed, follow me on Twitter or fork me on GitHub.

Understanding Elixir macros, Part 5 - Reshaping the AST

Last time I presented a basic version of the deftraceable macro that allows us to write traceable functions. The final version of the macro has some remaining issues, and today we'll tackle one of those - pattern matching in arguments.

Today's exercise should demonstrate that we have to carefully consider our assumptions about the inputs our macros can receive.

The problem

As I hinted last time, the current version of deftraceable doesn't work with pattern-matched arguments. Let's demonstrate the problem:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable div(_, 0), do: :error
        end
** (CompileError) iex:5: unbound variable _

So what happened? The deftraceable macro blindly assumes that input arguments are plain variables or constants. Hence, when you call deftraceable div(a, b), do: ... the generated code will contain:

passed_args = [a, b] |> Enum.map(&inspect/1) |> Enum.join(",")

This will work as expected, but if one argument is an anonymous variable (_), then we generate the following code:

passed_args = [_, 0] |> Enum.map(&inspect/1) |> Enum.join(",")

This is obviously not correct, and therefore we get the unbound variable error.

So what's the solution? We shouldn't assume anything about input arguments. Instead, we should take each argument into a dedicated variable generated by the macro. Or to say it with code, if our macro is called with:

deftraceable fun(pattern1, pattern2, ...)

We should generate the function head:

def fun(pattern1 = arg1, pattern2 = arg2, ...)

This allows us to take argument values into our internal temp variables, and print the contents of those variables.
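To make the target shape concrete, here is a hand-written sketch of the kind of function heads we want the macro to produce. The module name HeadShapeSketch and the function name safe_div are made up for illustration, and the tracing code is left out. Each clause returns the temp variables alongside the result, just to show that they are bound:

```elixir
defmodule HeadShapeSketch do
  # Each clause keeps the caller's pattern and additionally binds the whole
  # argument to a temp variable (arg1, arg2) that can be inspected later.
  def safe_div(_ = arg1, 0 = arg2), do: {:error, [arg1, arg2]}
  def safe_div(a = arg1, b = arg2), do: {a / b, [arg1, arg2]}
end

IO.inspect(HeadShapeSketch.safe_div(5, 0))  # {:error, [5, 0]}
IO.inspect(HeadShapeSketch.safe_div(5, 2))  # {2.5, [5, 2]}
```

Note that even the clause matching on _ still gives us access to the actual value through arg1.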

The solution

So let's implement this. First, I'm going to show you the top-level sketch of the solution:

defmacro deftraceable(head, body) do
  {fun_name, args_ast} = name_and_args(head)

  # Decorates input args by adding "= argX" to each argument.
  # Also returns a list of argument names (arg1, arg2, ...)
  {arg_names, decorated_args} = decorate_args(args_ast)

  head = ??   # Replace original args with decorated ones

  quote do
    def unquote(head) do
      ... # unchanged

      # Use temp variables to make a trace message
      passed_args = unquote(arg_names) |> Enum.map(&inspect/1) |> Enum.join(",")

      ... # unchanged
    end
  end
end

First, we extract the name and args from the head (we resolved this in the previous article). Then we have to inject = argX into the args_ast and take back the modified args (which we'll put into decorated_args).

We also need the bare names of the generated variables (or more exactly their AST), since we'll use these to collect argument values. The variable arg_names will essentially contain quote do [arg1, arg2, ...] end, which can be easily injected into the tree.

So let's implement the rest. First, let's see how we can decorate arguments:

defp decorate_args(args_ast) do
  for {arg_ast, index} <- Enum.with_index(args_ast) do
    # Dynamically generate quoted identifier
    arg_name = Macro.var(:"arg#{index}", __MODULE__)

    # Generate AST for patternX = argX
    full_arg = quote do
      unquote(arg_ast) = unquote(arg_name)
    end

    {arg_name, full_arg}
  end
  |> List.unzip
  |> List.to_tuple
end


Most of the action takes place in the for comprehension. Essentially, we go through the input AST fragment of each argument and compute the temp name (quoted argX), relying on the Macro.var/2 function, which transforms an atom into a quoted variable with that name. Since Enum.with_index counts from zero, the generated names are arg0, arg1, and so on. The second argument to Macro.var/2 ensures that the variable is hygienic. Although we'll inject these variables into the caller context, the caller won't see them. In fact, a user of deftraceable can freely use the same names for local variables without interfering with the temps introduced by our macro.
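The hygiene claim is easy to check in isolation. The following is a minimal sketch with made-up module names (HygieneSketch, HygieneUser), not part of the tracer itself. The macro binds a variable called value in its own context, while the caller has a local variable with the same name:

```elixir
defmodule HygieneSketch do
  defmacro bind_hidden do
    # The variable lives in the context of HygieneSketch, not of the caller.
    hidden = Macro.var(:value, __MODULE__)

    quote do
      unquote(hidden) = :from_macro
      unquote(hidden)
    end
  end
end

defmodule HygieneUser do
  require HygieneSketch

  def run do
    value = :from_caller
    from_macro = HygieneSketch.bind_hidden()

    # The caller's value is untouched by the assignment inside the macro.
    {value, from_macro}
  end
end

IO.inspect(HygieneUser.run())  # {:from_caller, :from_macro}
```

The two variables share a name but not a context, so they don't interfere with each other.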

Finally, at the end of the comprehension we return a tuple consisting of the temp's name and the quoted full pattern (e.g. _ = arg0, or 0 = arg1). The little dance after the comprehension with unzip and to_tuple ensures that decorate_args returns the result in the form of {arg_names, decorated_args}.
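As a side note, the same result shape can be obtained with Enum.unzip/1, which turns a list of two-element tuples directly into a tuple of two lists. The sketch below is an alternative I'm assuming is equivalent for our purposes, not the version used in the rest of this article. The module name DecorateSketch is made up, and the function is public only so it can be called from outside:

```elixir
defmodule DecorateSketch do
  # Returns {arg_names, decorated_args}, both as lists of quoted expressions.
  def decorate_args(args_ast) do
    args_ast
    |> Enum.with_index()
    |> Enum.map(fn {arg_ast, index} ->
      arg_name = Macro.var(:"arg#{index}", __MODULE__)

      full_arg =
        quote do
          unquote(arg_ast) = unquote(arg_name)
        end

      {arg_name, full_arg}
    end)
    |> Enum.unzip()
  end
end

head =
  quote do
    div(_, 0)
  end

{_name, args_ast} = Macro.decompose_call(head)
{arg_names, decorated_args} = DecorateSketch.decorate_args(args_ast)

IO.puts(Macro.to_string(arg_names))
IO.puts(Macro.to_string(decorated_args))
```

A nice side effect is that an empty argument list needs no special clause, because Enum.unzip/1 returns {[], []} for an empty list.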

With the decorate_args helper ready, we can pass input arguments and get decorated ones, together with the names of the temp variables. Now we need to inject these decorated arguments into the head of the function, in place of the original arguments. In particular, we must perform the following steps:
  1. Walk recursively through the AST of the input function head.
  2. Find the place where function name and arguments are specified.
  3. Replace original (input) arguments with the AST of decorated arguments
This task can be reasonably simplified if we rely on Macro.postwalk/2 function:

defmacro deftraceable(head, body) do
  {fun_name, args_ast} = name_and_args(head)

  {arg_names, decorated_args} = decorate_args(args_ast)

  # 1. Walk recursively through the AST
  head = Macro.postwalk(
    head,

    # This lambda is called for each element in the input AST and
    # has a chance of returning alternative AST
    fn
      # 2. Pattern match the place where function name and arguments are
      # specified
      ({fun_ast, context, old_args}) when (
        fun_ast == fun_name and old_args == args_ast
      ) ->
        # 3. Replace input arguments with the AST of decorated arguments
        {fun_ast, context, decorated_args}

      # Some other element in the head AST (probably a guard)
      #   -> we just leave it unchanged
      (other) -> other
    end
  )

  ... # unchanged
end


Macro.postwalk/2 walks the AST recursively, and calls the provided lambda for each node, after all of the node's descendants have been visited. The lambda receives the AST of the element, and there we have a chance of returning something else instead of that node.
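If Macro.postwalk/2 is new to you, a tiny standalone example may help. This sketch has nothing to do with the tracer. It simply doubles every integer literal in a quoted expression and leaves all other nodes unchanged:

```elixir
ast = quote(do: 1 + 2 * 3)

doubled =
  Macro.postwalk(ast, fn
    # Replace each integer literal with its double
    n when is_integer(n) -> n * 2
    # Leave every other node as it is
    other -> other
  end)

IO.puts(Macro.to_string(doubled))  # 2 + 4 * 6

{result, _bindings} = Code.eval_quoted(doubled)
IO.inspect(result)  # 26
```

The lambda has the same two-clause structure as the one in deftraceable: one clause for the node we want to replace, and a catch-all that returns the node untouched.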

So what we do in this lambda is basically a pattern match where we're looking for the {fun_name, context, args}. As explained in part 3, this is the quoted representation of the expression some_fun(arg1, arg2, ...). Once we encounter the node that matches this pattern, we just replace input arguments with new (decorated) ones. In all other cases, we simply return the input AST, leaving the rest of the tree unchanged.

This is somewhat convoluted, but it solves our problem. Here's the final version of the trace macro:

defmodule Tracer do
  defmacro deftraceable(head, body) do
    {fun_name, args_ast} = name_and_args(head)

    {arg_names, decorated_args} = decorate_args(args_ast)

    head = Macro.postwalk(head,
      fn
        ({fun_ast, context, old_args}) when (
          fun_ast == fun_name and old_args == args_ast
        ) ->
          {fun_ast, context, decorated_args}
        (other) -> other
      end)

    quote do
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        function_name = unquote(fun_name)
        passed_args = unquote(arg_names) |> Enum.map(&inspect/1) |> Enum.join(",")

        result = unquote(body[:do])

        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end

  defp name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  defp name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end

  defp decorate_args([]), do: {[],[]}
  defp decorate_args(args_ast) do
    for {arg_ast, index} <- Enum.with_index(args_ast) do
      # dynamically generate quoted identifier
      arg_name = Macro.var(:"arg#{index}", __MODULE__)

      # generate AST for patternX = argX
      full_arg = quote do
        unquote(arg_ast) = unquote(arg_name)
      end

      {arg_name, full_arg}
    end
    |> List.unzip
    |> List.to_tuple
  end
end

Let's try it out:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable div(_, 0), do: :error
          deftraceable div(a, b), do: a/b
        end

iex(3)> Test.div(5, 2)
iex(line 6) Elixir.Test.div(5,2) = 2.5

iex(4)> Test.div(5, 0)
iex(line 5) Elixir.Test.div(5,0) = :error

As you can see, it's possible, and not extremely complicated, to get into the AST, tear it apart, and sprinkle it with some custom injected code. On the downside, the code of the resulting macro gets increasingly complex, and it becomes harder to analyze.

This concludes today's session. Next time I'm going to discuss some aspects of in-place code generation.

Understanding Elixir macros, Part 4 - Diving deeper

In the previous installment, I showed you some basic ways of analyzing the input AST and doing something about it. Today we'll take a look at some more involved AST transformations. This will mostly be a rehash of already explained techniques. The aim is to show that it's not very hard to go deeper into the AST, though the resulting code can easily become fairly complex and somewhat hacky.

Tracing function calls

In this article, we'll create a deftraceable macro that allows us to define traceable functions. A traceable function works just like a normal function, but whenever we call it, debug information is printed. Here's the idea:

defmodule Test do
  import Tracer

  deftraceable my_fun(a,b) do
    a/b
  end
end

Test.my_fun(6,2)

# => test.ex(line 4) Test.my_fun(6,2) = 3.0

This example is of course contrived. You don't need to devise such a macro, because Erlang already has very powerful tracing capabilities, and there's an Elixir wrapper available. However, the example is interesting because it will demand some deeper AST transformations and techniques.

Before starting, I'd like to mention again that you should carefully consider whether you really need such constructs. Macros such as deftraceable introduce another thing every code maintainer needs to understand. Looking at the code, it's not obvious what happens behind the scenes. If everyone devises such constructs, each Elixir project will quickly turn into a soup of custom language extensions. It will be hard even for experienced developers to understand the flow of the underlying code that heavily relies on complex macros.

All that said, there will be cases suitable for macros, so you shouldn't avoid them just because someone claims that macros are bad. For example, if we didn't have tracing facilities in Erlang, we'd need to devise some kind of a macro to help us with it (not necessarily similar to the example above, but that's another discussion), or our code would suffer from large amounts of boilerplate.

In my opinion, boilerplate is bad because the code becomes ridden with bureaucratic noise, and therefore it is harder to read and understand. Macros can certainly help in reducing cruft, but before reaching for them, consider whether you can resolve duplication with run-time constructs (functions, modules, protocols).

With that long disclaimer out of the way, let's write deftraceable. First, it's worth manually generating the corresponding code.

Let's recall the usage:

deftraceable my_fun(a,b) do
  a/b
end

The generated code should look like:

def my_fun(a, b) do
  file = __ENV__.file
  line = __ENV__.line
  module = __ENV__.module
  function_name = "my_fun"
  passed_args = [a,b] |> Enum.map(&inspect/1) |> Enum.join(",")

  result = a/b

  loc = "#{file}(line #{line})"
  call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
  IO.puts "#{loc} #{call}"

  result
end

The idea is simple. We fetch various data from the compiler environment, then compute the result, and finally print everything to the screen.

The code relies on the __ENV__ special form, which can be used to inject all sorts of compile-time information (e.g. line number and file) into the final AST. __ENV__ is a struct, and whenever you use it in the code, it is expanded at compile time to the appropriate value. Hence, wherever in the code we write __ENV__.file, the resulting bytecode will contain the (binary) string constant with the containing file name.
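Here is a minimal sketch of __ENV__ in action, outside of any macro. The module name EnvSketch is made up for the example:

```elixir
defmodule EnvSketch do
  def where do
    # All three values are fixed at compile time, at this exact spot in the code.
    {__ENV__.module, __ENV__.line, Path.basename(__ENV__.file)}
  end
end

{module, line, file} = EnvSketch.where()
IO.puts("#{inspect(module)} is defined in #{file}, and where/0 reports line #{line}")
```

No matter where EnvSketch.where/0 is called from, it reports the location where __ENV__ appears in the source, not the location of the caller.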

Now we need to build this code dynamically. Let's see the basic outline:

defmacro deftraceable(??) do
  quote do
    def unquote(head) do
      file = __ENV__.file
      line = __ENV__.line
      module = __ENV__.module
      function_name = ??
      passed_args = ?? |> Enum.map(&inspect/1) |> Enum.join(",")

      result = ??

      loc = "#{file}(line #{line})"
      call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
      IO.puts "#{loc} #{call}"

      result
    end
  end
end

Here I placed question marks (??) in places where we need to dynamically inject AST fragments, based on the input arguments. In particular, we have to deduce function name, argument names, and function body from the passed parameters.

Now, when we call a macro deftraceable my_fun(...) do ... end, the macro receives two arguments - the function head (function name and argument list) and a keyword list containing the function body. Both of these will of course be quoted.

How do I know this? I actually don't. I usually gain this knowledge by trial and error. Basically, I start by defining a macro:

defmacro deftraceable(arg1) do
  IO.inspect arg1
  nil
end

Then I try to call the macro from some test module or from the shell. If the number of arguments is wrong, an error will occur, and I'll retry by adding another argument to the macro definition. Once I get the result printed, I try to figure out what the arguments represent, and then start building the macro.

The nil at the end of the macro ensures we don't generate anything (well, we generate nil which is usually irrelevant to the caller code). This allows me to further compose fragments without injecting the code. I usually rely on IO.inspect and Macro.to_string/1 to verify intermediate results, and once I'm happy, I remove the nil part and see if the thing works.
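For illustration, this is roughly what such an exploratory session could look like once the number of arguments is right. The module name Probe is made up, and the macro deliberately generates nothing:

```elixir
defmodule Probe do
  # Exploration only: print what the macro receives, then generate nil.
  defmacro deftraceable(head, body) do
    IO.puts("head: " <> Macro.to_string(head))
    IO.inspect(body, label: "body")
    nil
  end
end

require Probe

result =
  Probe.deftraceable my_fun(a, b) do
    a / b
  end

IO.inspect(result)  # nil
```

The head is printed as readable code, while the body is shown in its raw quoted form, a keyword list with a single do key.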

In our case deftraceable receives the function head and the body. The function head will be an AST fragment in the format I described last time ({function_name, context, [arg1, arg2, ...]}).

So we need to do the following:
  • Extract function name and arguments from the quoted head
  • Inject these values into the AST we're returning from the macro
  • Inject function body into that same AST
  • Print trace info
We could use pattern matching to extract function name and arguments from this AST fragment, but as it turns out there is a helper Macro.decompose_call/1 that does exactly this. Given these steps, the final version of the macro looks like this:

defmodule Tracer do
  defmacro deftraceable(head, body) do
    # Extract function name and arguments
    {fun_name, args_ast} = Macro.decompose_call(head)

    quote do
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        # Inject function name and arguments into AST
        function_name = unquote(fun_name)
        passed_args = unquote(args_ast) |> Enum.map(&inspect/1) |> Enum.join(",")

        # Inject function body into the AST
        result = unquote(body[:do])

        # Print trace info
        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end
end
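If you want to see what Macro.decompose_call/1 hands back, it can be tried in isolation. In this sketch the head is quoted by hand, which mimics what the macro receives as its first argument:

```elixir
head = quote(do: my_fun(a, b))

{fun_name, args_ast} = Macro.decompose_call(head)

IO.inspect(fun_name)                # :my_fun
IO.puts(Macro.to_string(args_ast))  # [a, b]

# Anything that is not a call can't be decomposed
IO.inspect(Macro.decompose_call(42))  # :error
```

The function name comes back as a plain atom, and the arguments as a list of quoted expressions, which is why both can be injected with unquote in the macro above.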

Let's try it out:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable my_fun(a,b) do
            a/b
          end
        end

iex(3)> Test.my_fun(10,5)
iex(line 4) Test.my_fun(10,5) = 2.0   # trace output
2.0

It seems to be working. However, I should immediately point out that there are a couple of problems with this implementation:
  • The macro doesn't handle guards well
  • Pattern matching arguments will not always work (e.g. when using _ to match any term)
  • The macro doesn't work when dynamically generating code directly in the module.
I'll explain each of these problems one by one, starting with guards, and leaving remaining issues for future articles.

Handling guards

All problems with deftraceable stem from the fact that we're making some assumptions about the input AST. That's dangerous territory, and we must be careful to cover all cases.

For example, the macro assumes that head contains just the name and the arguments list. Consequently, deftraceable won't work if we want to define a traceable function with guards:

deftraceable my_fun(a,b) when a < b do
  a/b
end

In this case, our head (the first argument of the macro) will also contain the guard information, and Macro.decompose_call/1 will not give us the function name and arguments we expect. The solution is to detect this case, and handle it in a special way.

First, let's discover how this head is quoted:

iex(1)> quote do my_fun(a,b) when a < b end
{:when, [],
 [{:my_fun, [], [{:a, [], Elixir}, {:b, [], Elixir}]},
  {:<, [context: Elixir, import: Kernel],
   [{:a, [], Elixir}, {:b, [], Elixir}]}]}

So essentially, our guard head has the shape of {:when, _, [name_and_args, ...]}. We can rely on this to extract the name and arguments using pattern matching:

defmodule Tracer do
  ...
  defp name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  defp name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end
  ...
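To check the helper on its own, here is a small sketch with the same two clauses made public in a made-up module (HeadSketch), so they can be called directly with hand-quoted heads:

```elixir
defmodule HeadSketch do
  # Guarded head: skip the :when wrapper and look at the first element
  def name_and_args({:when, _, [short_head | _]}), do: name_and_args(short_head)

  # Plain head: just a name and arguments
  def name_and_args(short_head), do: Macro.decompose_call(short_head)
end

guarded =
  quote do
    my_fun(a, b) when a < b
  end

plain =
  quote do
    my_fun(a, b)
  end

{guarded_name, guarded_args} = HeadSketch.name_and_args(guarded)
{plain_name, plain_args} = HeadSketch.name_and_args(plain)

IO.inspect({guarded_name, length(guarded_args)})  # {:my_fun, 2}
IO.inspect({plain_name, length(plain_args)})      # {:my_fun, 2}
```

Both shapes of the head lead to the same name and the same number of arguments, so the rest of the macro doesn't need to know whether a guard was present.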

And of course, we need to call this function from the macro:

defmodule Tracer do
  ...
  defmacro deftraceable(head, body) do
    {fun_name, args_ast} = name_and_args(head)

    ... # unchanged
  end
  ...
end

As you can see, it's possible to define additional private functions and call them from your macro. After all, a macro is just a function, and when it is called, the containing module is already compiled and loaded into the VM of the compiler (otherwise, the macro couldn't run).

Here's the full version of the macro:

defmodule Tracer do
  defmacro deftraceable(head, body) do
    {fun_name, args_ast} = name_and_args(head)

    quote do
      def unquote(head) do
        file = __ENV__.file
        line = __ENV__.line
        module = __ENV__.module

        function_name = unquote(fun_name)
        passed_args = unquote(args_ast) |> Enum.map(&inspect/1) |> Enum.join(",")

        result = unquote(body[:do])

        loc = "#{file}(line #{line})"
        call = "#{module}.#{function_name}(#{passed_args}) = #{inspect result}"
        IO.puts "#{loc} #{call}"

        result
      end
    end
  end

  defp name_and_args({:when, _, [short_head | _]}) do
    name_and_args(short_head)
  end

  defp name_and_args(short_head) do
    Macro.decompose_call(short_head)
  end
end

Let's try it out:

iex(1)> defmodule Tracer do ... end

iex(2)> defmodule Test do
          import Tracer

          deftraceable my_fun(a,b) when a<b do
            a/b
          end

          deftraceable my_fun(a,b) do
            a/b
          end
        end

iex(3)> Test.my_fun(5,10)
iex(line 4) Test.my_fun(5,10) = 0.5
0.5

iex(4)> Test.my_fun(10, 5)
iex(line 7) Test.my_fun(10,5) = 2.0

The main point of this exercise was to illustrate that it's possible to deduce something from the input AST. In this example, we managed to detect and handle a function guard. Obviously, the code becomes more involved, since it relies on the internal structure of the AST. In this case, the code is relatively simple, but as you'll see in future articles, where I'll tackle remaining problems of deftraceable, things can quickly become messy.

Understanding Elixir macros, Part 3 - Getting into the AST

It's time to continue our exploration of Elixir macros. Last time I covered some essential theory, and today I'll step into less documented territory and discuss some details of the Elixir AST.

Tracing function calls

So far you have seen only basic macros that take input AST fragments and combine them together, sprinkling some additional boilerplate around and/or between input fragments. Since we don't analyze or parse the input AST, this is probably the cleanest (or the least hacky) style of macro writing, which results in fairly simple macros that are reasonably easy to understand.

However, in some cases we will need to parse input AST fragments to get some specific information. ExUnit assertions are a simple example. For instance, the expression assert 1+1 == 2+2 will fail with an error:

Assertion with == failed
code: 1+1 == 2+2
lhs:  2
rhs:  4

The macro assert accepts the entire expression 1+1 == 2+2 and is able to extract individual sub-expressions of the comparison, printing their corresponding results if the entire expression returns false. To do this, the macro code must somehow split the input AST into separate parts and compute each sub-expression separately.

In more involved cases even richer AST transformations are called for. For example, with ExActor you can write this code:

defcast inc(x), state: state, do: new_state(state + x)

which translates to roughly the following:

def inc(pid, x) do
  :gen_server.cast(pid, {:inc, x})
end

def handle_cast({:inc, x}, state) do
  {:noreply, state+x}
end


Just like assert, the defcast macro needs to dive into the input AST fragment and detect individual sub-fragments (e.g. function name, individual arguments). Then, ExActor performs an elaborate transformation, reassembling these sub-parts into more complex code.

Today, I'm going to show you some basic techniques of building such macros, and I'll continue with more complex transformations in subsequent articles. But before doing this, I should advise you to carefully consider whether your code needs to be based on macros. Though very powerful, macros have some downsides.

First, as you'll see in this series, the code can quickly become much more involved than "plain" run-time abstractions. You can quickly end up doing many nested quote/unquote calls and weird pattern matches that rely on the undocumented format of the AST.

In addition, proliferation of macros may make your client code extremely cryptic, since it will rely on custom, non-standard idioms (such as defcast from ExActor). It can become harder to reason about the code, and understand what exactly happens underneath.

On the plus side, macros can be very helpful when removing boilerplate (as the ExActor example hopefully demonstrated), and have the power of accessing information that is not available at run-time (as you should see from the assert example). Finally, since they run during compilation, macros make it possible to optimize some code by moving calculations to compile-time.

So there will definitely be cases that are suited for macros, and you shouldn't be afraid of using them. However, you shouldn't choose macros only to gain some cute DSL-ish syntax. Before reaching for macros, you should consider whether your problem can be solved efficiently in run-time, relying on "standard" language abstractions such as functions, modules, and protocols.

Discovering the AST structure

At the moment of writing this there is very little documentation on the AST structure. However, it's easy to explore and play with AST in the shell session, and this is how I usually discover the AST format.

For example, here's what a quoted reference to a variable looks like:

iex(1)> quote do my_var end
{:my_var, [], Elixir}

Here, the first element represents the name of the variable. The second element is a context keyword list that contains some metadata specific for this particular AST fragment (e.g. imports and aliases). Most often you won't be interested in context data. The third element usually represents the module where the quoting happened, and is used to ensure hygiene of quoted variables. If this element is nil then the identifier is not hygienic.
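The role of the third element is easier to see when quoting happens inside a module. The following is a small sketch with a made-up module name (CtxSketch). It compares a variable quoted in a module with one built by hand through Macro.var/2:

```elixir
defmodule CtxSketch do
  def quoted_var do
    quote do
      my_var
    end
  end
end

# Quoted inside CtxSketch, so the third element is the module itself
{name, _meta, context} = CtxSketch.quoted_var()
IO.inspect({name, context})

# Built by hand with a nil context, which means the variable is not hygienic
IO.inspect(Macro.var(:my_var, nil))
```

In the shell there is no enclosing module, which is why the examples here show Elixir in that position.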

A simple expression looks a bit more involved:

iex(2)> quote do a+b end
{:+, [context: Elixir, import: Kernel], [{:a, [], Elixir}, {:b, [], Elixir}]}

This might look scary, but it's reasonably easy to understand if I show you the higher-level pattern:

{:+, context, [ast_for_a, ast_for_b]}

In our example, ast_for_a and ast_for_b follow the shape of a variable reference you've seen earlier (e.g. {:a, [], Elixir}). More generally, quoted arguments can be arbitrarily complex, since they describe the expression of each argument. Essentially, the AST is a deeply nested structure of simple quoted expressions such as the ones I'm showing you here.

Let's take a look at a function call:

iex(3)> quote do div(5,4) end
{:div, [context: Elixir, import: Kernel], [5, 4]}

This resembles the quoted + operation, which shouldn't come as a surprise knowing that + is actually a function. In fact, all binary operators will be quoted as function calls.
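A quick way to convince yourself is to destructure both forms with the same pattern, and then call the operator as an ordinary function. This is just a sketch for the shell or a script:

```elixir
# An operator and a regular call are matched by the same three-element pattern
{op, _, [lhs, rhs]} = quote(do: 1 + 2)
{fun, _, [x, y]} = quote(do: div(5, 4))

IO.inspect({op, lhs, rhs})  # {:+, 1, 2}
IO.inspect({fun, x, y})     # {:div, 5, 4}

# + really is a function in the Kernel module
IO.inspect(apply(Kernel, op, [lhs, rhs]))  # 3
```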

Finally, let's take a look at a quoted function definition:

iex(4)> quote do def my_fun(arg1, arg2), do: :ok end
{:def, [context: Elixir, import: Kernel],
 [{:my_fun, [context: Elixir], [{:arg1, [], Elixir}, {:arg2, [], Elixir}]},
  [do: :ok]]}

While this looks scary, it can be simplified by looking at important parts. Essentially, this deep structure amounts to:

{:def, context, [fun_call, [do: body]]}

with fun_call having the structure of a function call (which you've just seen).
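The simplified pattern can be used directly to take a quoted definition apart. Here is a minimal sketch. Nothing gets defined, since the code is only quoted and never evaluated:

```elixir
ast =
  quote do
    def my_fun(arg1, arg2), do: :ok
  end

# The simplified shape from the text, used as a pattern
{:def, _context, [fun_call, [do: body]]} = ast

{name, args} = Macro.decompose_call(fun_call)

IO.inspect(name)          # :my_fun
IO.inspect(length(args))  # 2
IO.inspect(body)          # :ok
```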

As you can see, there usually is some reason and sense behind the AST. I won't go through all possible AST shapes here, but the approach to discovery is to play in iex and quote simpler forms of the expressions you're interested in. This is a bit of reverse engineering, but it's not exactly rocket science.

Writing assert macro

For a quick demonstration, let's write a simplified version of the assert macro. This is an interesting macro because it literally reinterprets the meaning of comparison operators. Normally, when you write a == b you get a boolean result. However, when this expression is given to the assert macro, a detailed output is printed if the expression evaluates to false.

I'll start simple, by supporting only == operator in the macro. To recap, when we call assert expected == required, it's the same as calling assert(expected == required), which means that our macro receives a quoted fragment that represents comparison. Let's discover the AST structure of this comparison:

iex(1)> quote do 1 == 2 end
{:==, [context: Elixir, import: Kernel], [1, 2]}

iex(2)> quote do a == b end
{:==, [context: Elixir, import: Kernel], [{:a, [], Elixir}, {:b, [], Elixir}]}

So our structure is essentially {:==, context, [quoted_lhs, quoted_rhs]}. This should not be surprising if you remember the examples shown in the previous section, where I mentioned that binary operators are quoted as two-argument function calls.

Knowing the AST shape, it's relatively simple to write the macro:

defmodule Assertions do
  defmacro assert({:==, _, [lhs, rhs]} = expr) do
    quote do
      left = unquote(lhs)
      right = unquote(rhs)

      result = (left == right)

      unless result do
        IO.puts "Assertion with == failed"
        IO.puts "code: #{unquote(Macro.to_string(expr))}"
        IO.puts "lhs: #{left}"
        IO.puts "rhs: #{right}"
      end

      result
    end
  end
end

The first interesting thing happens in line 2. Notice how we pattern match on the input expression, expecting it to conform to some structure. This is perfectly fine, since macros are functions, which means you can rely on pattern matching, guards, and even have multi-clause macros. In our case, we rely on pattern matching to take each (quoted) side of the comparison expression into corresponding variables.

Then, in the quoted code, we reinterpret the == operation by computing the left- and right-hand sides individually (lines 4 and 5), and then the entire result (line 7). Finally, if the result is false, we print detailed information (lines 9-14).

Let's try it out:

iex(1)> defmodule Assertions do ... end
iex(2)> import Assertions

iex(3)> assert 1+1 == 2+2
Assertion with == failed
code: 1 + 1 == 2 + 2
lhs: 2
rhs: 4


Generalizing the code

It's not much harder to make the code work for other operators:

defmodule Assertions do
  defmacro assert({operator, _, [lhs, rhs]} = expr)
    when operator in [:==, :<, :>, :<=, :>=, :===, :=~, :!==, :!=, :in]
  do
    quote do
      left = unquote(lhs)
      right = unquote(rhs)

      result = unquote(operator)(left, right)

      unless result do
        IO.puts "Assertion with #{unquote(operator)} failed"
        IO.puts "code: #{unquote(Macro.to_string(expr))}"
        IO.puts "lhs: #{left}"
        IO.puts "rhs: #{right}"
      end

      result
    end
  end
end


There are only a couple of changes here. First, in the pattern-match, the hard-coded :== is replaced with the operator variable (line 2).

I've also introduced (or to be honest, copy-pasted from the Elixir source) guards specifying the set of operators for which the macro works (line 3). There is a special reason for this check. Remember how I mentioned earlier that quoted a + b (and any other binary operation) has the same shape as quoted fun(a,b)? Consequently, without these guards, every two-argument function call would end up in our macro, and this is something we probably don't want. Using this guard limits the allowed inputs to known binary operators.

The interesting thing happens in line 9. Here I make a simple generic dispatch to the operator using unquote(operator)(left, right). You might think that I could have instead used left unquote(operator) right, but this wouldn't work. The reason is that the operator variable holds an atom (e.g. :==). Thus, this naive quoting would produce left :== right, which is not even valid Elixir syntax.

Keep in mind that while quoting, we don't assemble strings, but AST fragments. So when we want to generate binary operation code, we need to inject a proper AST, which (as explained earlier) is the same as a two-argument function call. Hence, we can simply generate the function call unquote(operator)(left, right).
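The dispatch trick can be tried outside of the macro. In this sketch the operator is a plain variable holding an atom, just like in the assert macro, and the operands are literals to keep things short:

```elixir
operator = :<

# Builds the same AST that quoting 1 < 2 directly would give us
ast =
  quote do
    unquote(operator)(1, 2)
  end

IO.puts(Macro.to_string(ast))  # 1 < 2

{result, _bindings} = Code.eval_quoted(ast)
IO.inspect(result)  # true
```

Although we wrote it as a function call, the resulting AST prints back as an ordinary binary operation.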

With this in mind, I'm going to finish today's session. It was a bit shorter, but slightly more complex. Next time, I'm going to dive a bit deeper into the topic of AST parsing.

Understanding Elixir macros, Part 2 - Micro theory

This is the second part of the mini-series on Elixir macros. Last time I discussed compilation phases and Elixir AST, finishing with a basic example of the trace macro. Today, I'll provide a bit more detail on macro mechanics.

This is going to involve repeating some of the stuff mentioned last time, but I think it's beneficial to understand how things work and how the final AST is built. If you grasp this, you can reason about your macro code with more confidence. This becomes important, since more involved macros will consist of many combined quote/unquote constructs which can at first seem intimidating.

Calling a macro

The most important thing to be aware of is the expansion phase. This is where the compiler calls various macros (and other code-generating constructs) to produce the final AST.

For example, a typical usage of the trace macro will look like this:

defmodule MyModule do
  require Tracer
  ...
  def some_fun(...) do
    Tracer.trace(...)
  end
end

As previously explained, the compiler starts with an AST that resembles this code. This AST is then expanded to produce the final code. Consequently, in the snippet above, the call to Tracer.trace/1 will take place in the expansion phase.

Our macro receives the input AST and must produce the output AST. The compiler will then simply replace the macro call with the AST returned from that macro. This process is incremental - a macro can return AST that will invoke some other macro (or even itself). The compiler will simply re-expand until there's nothing left to expand.
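The incremental nature of expansion can be observed with a small sketch. The ExpandDemo module and its two macros are hypothetical; Macro.expand_once/2 is the standard library function that performs a single expansion step:

```elixir
defmodule ExpandDemo do
  # Hypothetical macros: outer/1 returns a call to inner/1, so the compiler
  # has to expand twice before no macro calls remain.
  defmacro inner(x) do
    quote do: unquote(x) * 2
  end

  defmacro outer(x) do
    quote do: ExpandDemo.inner(unquote(x) + 1)
  end
end

defmodule ExpandDemoClient do
  require ExpandDemo

  # The compiler keeps expanding until there is nothing left to expand.
  def run, do: ExpandDemo.outer(3)

  # A single expansion step, rendered as source code, so that the
  # intermediate AST can be inspected.
  def one_step do
    quote(do: ExpandDemo.outer(3))
    |> Macro.expand_once(__ENV__)
    |> Macro.to_string()
  end
end

IO.inspect(ExpandDemoClient.run())
IO.puts(ExpandDemoClient.one_step())
```

After one step the code still contains a call to inner, which the compiler would expand in the next round.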

A macro call is thus our opportunity to change the meaning of the code. A typical macro will take the input AST and somehow decorate it, adding some additional code around the input.

That's exactly what we did in the trace macro. We took a quoted expression (e.g. 1+2) and spit out something like:

result = 1 + 2
Tracer.print("1 + 2", result)
result

To call the trace macro from any part of the code (including the shell), you must invoke either require Tracer or import Tracer. Why is this? There are two seemingly contradictory properties of macros:
  • A macro is Elixir code
  • A macro runs at expansion time, before the final bytecode is produced
How can Elixir code run before it is produced? It can't. To call a macro, the container module (the module where the macro is defined) must already be compiled.

Consequently, to run macros defined in the Tracer module, we must ensure that it is already compiled. In other words, we must provide some hints to the compiler about the module ordering. When we require a module, we instruct the Elixir compiler to hold the compilation of the current module until the required module is compiled and loaded into the compiler run-time (the Erlang VM instance where the compiler is running). We can only call the trace macro when the Tracer module is fully compiled and available to the compiler.

Using import has the same effect but it additionally lexically imports all exported functions and macros, making it possible to write trace instead of Tracer.trace.
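Both variants can be tried with the sketch below. The Tracer module here is a minimal stand-in written for this example, assuming the shape of the trace macro described above; the two client modules are hypothetical:

```elixir
defmodule Tracer do
  # Minimal stand-in for the trace macro: evaluates the expression,
  # prints it together with its result, and returns the result.
  defmacro trace(expression_ast) do
    string_representation = Macro.to_string(expression_ast)

    quote do
      result = unquote(expression_ast)
      Tracer.print(unquote(string_representation), result)
      result
    end
  end

  def print(string_representation, result) do
    IO.puts("Result of #{string_representation}: #{inspect(result)}")
  end
end

defmodule WithRequire do
  # require makes the macros available, but calls must be fully qualified.
  require Tracer
  def run(a, b), do: Tracer.trace(a + b)
end

defmodule WithImport do
  # import additionally allows the unqualified form.
  import Tracer
  def run(a, b), do: trace(a * b)
end

WithRequire.run(1, 2)
WithImport.run(2, 3)
```

Since Tracer is defined before both clients, it is already compiled by the time the compiler reaches require and import.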

Since macros are functions and Elixir doesn't require parentheses in function calls, we can use this syntax:

Tracer.trace 1+2

This is quite possibly the most important reason why Elixir doesn't require parentheses in function calls. Remember that most language constructs are actually macros. If parentheses were obligatory, the code we'd have to write would be noisier:

defmodule(MyModule, do:
  def(function_1, do: ...)
  def(function_2, do: ...)
)

Hygiene

As hinted in the last article, macros are by default hygienic. This means that variables introduced by a macro are its own private affair that won't interfere with the rest of the code. This is why we can safely introduce the result variable in our trace macro:

quote do
  result = unquote(expression_ast)  # result is private to this macro
  ...
end


This variable won't interfere with the code that is calling the macro. At the place where you call the trace macro, you can freely declare your own result variable, and it won't be shadowed by the result from the trace macro.
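A quick way to convince yourself is the following sketch. HygieneDemo and its double macro are hypothetical names; the macro binds its own result variable, just like trace does:

```elixir
defmodule HygieneDemo do
  # Hypothetical macro that binds a `result` variable in its quoted code.
  defmacro double(expression_ast) do
    quote do
      result = unquote(expression_ast)
      result * 2
    end
  end
end

defmodule HygieneClient do
  require HygieneDemo

  def run do
    result = :mine
    doubled = HygieneDemo.double(21)
    # The caller's `result` is untouched by the macro's `result`.
    {result, doubled}
  end
end

IO.inspect(HygieneClient.run())
```

The two variables share a name, but the compiler keeps them apart because each one is tied to the context in which it was written.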

Most of the time hygiene is exactly what you want, but there are exceptions. Sometimes, you may need to create a variable that is available to the code calling the macro. Instead of devising some contrived example, let's take a look at a real use case from the Plug library. This is how we can specify routes with the Plug router:

get "/resource1" do
  send_resp(conn, 200, ...)
end

post "/resource2" do
  send_resp(conn, 200, ...)
end

Notice how in both snippets we use the conn variable, which isn't declared anywhere in our code. This is possible because the get macro binds this variable in the generated code. You can imagine that the resulting code is something like:

defp do_match("GET", "/resource1", conn) do
  ...
end

defp do_match("POST", "/resource2", conn) do
  ...
end

Note: the real code produced by Plug is somewhat different, this is just a simplification.

This is an example of a macro introducing a variable that must not be hygienic. The variable conn is introduced by the get macro, but must be visible to the code where the macro is called.

Another example is the situation I had with ExActor. Take a look at the following example:

defmodule MyServer do
  ...
  defcall my_request(...), do: reply(result)
  ...
end

If you're familiar with GenServer then you know that the result of a call must be in the form {:reply, response, state}. However, in the snippet above, the state is not even mentioned. So how can we return a state that is never mentioned? This is possible because the defcall macro generates a hidden state variable, which is then implicitly used by the reply macro.

In both cases, a macro must create a variable that is not hygienic and must be visible beyond the macro's quoted code. For such purposes you can use the var! construct. Here's what a simple version of Plug's get macro could look like:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      # put body AST here
    end
  end
end

Notice how we use var!(conn). By doing this, we're specifying that conn is a variable that must be visible to the caller.

In the snippet above, it's not explained how the body is injected. Before doing so, you must understand a bit about arguments that macros receive.
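Before moving on, the effect of var! can be seen in isolation, without any routing involved. This is a minimal sketch; VarDemo and bind_answer are hypothetical names:

```elixir
defmodule VarDemo do
  # Hypothetical macro: binds `answer` in the caller's scope by
  # opting out of hygiene with var!.
  defmacro bind_answer do
    quote do
      var!(answer) = 42
    end
  end
end

defmodule VarDemoClient do
  require VarDemo

  def run do
    VarDemo.bind_answer()
    # `answer` was never declared here; the macro introduced it.
    answer
  end
end

IO.inspect(VarDemoClient.run())
```

Without var!, the reference to answer in run would fail to compile, because the macro's variable would stay private to the macro.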

Macro arguments

You should always keep in mind that macros are essentially Elixir functions that are invoked in the expansion phase, while the final AST is being produced. What is specific to macros is that the arguments being passed are always quoted. This is why we can call:

def my_fun do
  ...
end

Which is the same as:

def(my_fun, do: (...))

Notice how we're calling the def macro, passing my_fun even though this variable doesn't exist. This is completely fine, since we're actually passing the result of quote(do: my_fun), and quoting doesn't require that the variable exists. Internally, the def macro will receive the quoted representation which will, among other things, contain :my_fun. The def macro will use this information to generate the function with the corresponding name.
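You can look at what a macro receives with a sketch like the one below. ArgsDemo and show_ast are hypothetical names; the macro returns the quoted form it was given, escaped so that it reaches the caller as a plain value:

```elixir
defmodule ArgsDemo do
  # Hypothetical macro: instead of evaluating its argument, it hands back
  # the quoted representation it received.
  defmacro show_ast(arg) do
    Macro.escape(arg)
  end
end

defmodule ArgsDemoClient do
  require ArgsDemo

  # `my_fun` is not a variable here, yet the call compiles, because the
  # macro only ever sees the quoted representation.
  def run, do: ArgsDemo.show_ast(my_fun)
end

IO.inspect(ArgsDemoClient.run())
```

The result is a three-element tuple whose first element is :my_fun. The middle element holds metadata such as the line number, so its exact contents depend on where the call is written.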

Another thing I sort of skimmed over is the do...end block. Whenever you pass a do...end block to a macro, it is the same as passing a keyword list with a :do key.

So the call

my_macro arg1, arg2 do ... end

is the same as

my_macro(arg1, arg2, do: ...)

This is just a bit of syntactic sugar in Elixir. The parser transforms do...end into {:do, ...}.
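The equivalence is easy to check from the macro's side. In this sketch, DoDemo and block_keys are hypothetical names; the macro simply returns the keys of its last argument:

```elixir
defmodule DoDemo do
  # Hypothetical macro: returns the keys of its last argument, showing
  # what a do...end block looks like once it reaches the macro.
  defmacro block_keys(_arg, block) do
    Keyword.keys(block)
  end
end

defmodule DoDemoClient do
  require DoDemo

  def with_block do
    DoDemo.block_keys :ignored do
      :body
    end
  end

  def with_keyword do
    DoDemo.block_keys(:ignored, do: :body)
  end
end

IO.inspect(DoDemoClient.with_block())
IO.inspect(DoDemoClient.with_keyword())
```

Both calls hand the macro the same keyword list, so both functions return the same list of keys.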

Now, I've just mentioned that arguments are quoted. However, for many constants (atoms, numbers, strings), the quoted representation is exactly the same as the input value. In addition, two element tuples and lists will retain their structure when quoted. This means that quote(do: {a,b}) will give a two element tuple, with both values being of course quoted.

Let's illustrate this in a shell:

iex(1)> quote do :an_atom end
:an_atom

iex(2)> quote do "a string" end
"a string"

iex(3)> quote do 3.14 end
3.14

iex(4)> quote do {1,2} end
{1, 2}

iex(5)> quote do [1,2,3,4,5] end
[1, 2, 3, 4, 5]

In contrast, a quoted three element tuple doesn't retain its shape:

iex(6)> quote do {1,2,3} end
{:{}, [], [1, 2, 3]}

Since lists and two element tuples retain their structure when quoted, the same holds for a keyword list:

iex(7)> quote do [a: 1, b: 2] end
[a: 1, b: 2]

iex(8)> quote do [a: x, b: y] end
[a: {:x, [], Elixir}, b: {:y, [], Elixir}]

In the first example, you can see that the input keyword list is completely intact. The second example proves that complex members (such as references to x and y) are quoted. But the list still retains its shape. It is still a keyword list with keys :a and :b.
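Because the shape is retained, the regular Keyword functions work on the quoted list. A short sketch, in which the variable names are made up for the example:

```elixir
quoted = quote do: [a: x + 1, b: y]

# The outer shape is still a keyword list, so the usual functions apply.
keys = Keyword.keys(quoted)

# The values, however, are quoted expressions rather than plain values.
# Macro.to_string/1 turns such an expression back into source code.
a_source = quoted |> Keyword.fetch!(:a) |> Macro.to_string()

IO.inspect(keys)
IO.puts(a_source)
```

Here keys holds :a and :b, while a_source holds the source text of the expression stored under :a.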

Putting it together

Why is all this important? Because in the macro code, you can easily retrieve the options from the keyword list, without analyzing some convoluted AST. Let's see this in action on our oversimplified take on the get macro. Earlier, we left off with this sketch:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      # put body AST here
    end
  end
end

Remember that do...end is the same as do: ... so when we call get route do ... end, we're effectively calling get(route, do: ...). Keeping in mind that macro arguments are quoted, but also knowing that quoted keyword lists keep their shape, it's possible to retrieve the quoted body in the macro using body[:do]:

defmacro get(route, body) do
  quote do
    defp do_match("GET", unquote(route), var!(conn)) do
      unquote(body[:do])
    end
  end
end

So we simply inject the quoted input body into the body of the do_match clause we're generating.

As already mentioned, this is the purpose of a macro. It receives some AST fragments, and combines them together with the boilerplate code, to generate the final result. Ideally, when we do this, we don't care about the contents of the input AST. In our example, we simply inject the body in the generated function, without caring what is actually in that body.

It is reasonably simple to test that this macro works. Here's a bare minimum of the required code:

defmodule Plug.Router do
  # get macro removes the boilerplate from the client and ensures that
  # generated code conforms to some standard required by the generic logic
  defmacro get(route, body) do
    quote do
      defp do_match("GET", unquote(route), var!(conn)) do
        unquote(body[:do])
      end
    end
  end
end

Now we can implement a client module:

defmodule MyRouter do
  import Plug.Router

  # Generic code that relies on the multi-clause dispatch
  def match(type, route) do
    do_match(type, route, :dummy_connection)
  end

  # Using macro to minimize boilerplate
  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

And test it:

MyRouter.match("GET", "/hello") |> IO.inspect()
# {:dummy_connection, "Hi!"}

MyRouter.match("GET", "/goodbye") |> IO.inspect()
# {:dummy_connection, "Bye!"}

The important thing to notice here is the code of match/2. This is the generic code that relies on the existence of the implementation of do_match/3.

Using modules

Looking at the code above, you can see that the glue code of match/2 is developed in the client module. That's definitely far from perfect, since each client must provide a correct implementation of this function, and be aware of how the do_match function must be invoked.

It would be better if the Plug.Router abstraction could provide this implementation for us. For that purpose we can reach for the use macro, a rough equivalent of mixins in other languages.

The general idea is as follows:

defmodule ClientCode do
  # invokes the mixin
  use GenericCode, option_1: value_1, option_2: value_2, ...
end

defmodule GenericCode do
  # called when the module is used
  defmacro __using__(options) do
    # generates an AST that will be inserted in place of the use
    quote do
      ...
    end
  end
end

So the use mechanism allows us to inject some piece of code into the caller's context. This is just a replacement for something like:

defmodule ClientCode do
  require GenericCode
  GenericCode.__using__(...)
end

You can verify this by looking at the Elixir source code. This proves another point - that of incremental expansion. The use macro generates code which will call another macro. Or, to put it in fancier terms, use generates code that generates code. As mentioned earlier, the compiler will simply re-expand this until there's nothing left to be expanded.
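Here is a complete, if contrived, instance of the mechanism, including the options. The Greeter module, its :greeting option and both client modules are hypothetical and exist only for this sketch:

```elixir
defmodule Greeter do
  # Hypothetical mixin: injects a greet/0 function into the using module.
  # The options arrive quoted, but a keyword list with a string value
  # keeps its shape, so Keyword.get/3 works on it directly.
  defmacro __using__(options) do
    greeting = Keyword.get(options, :greeting, "Hello")

    quote do
      def greet, do: unquote(greeting)
    end
  end
end

defmodule EnglishClient do
  use Greeter
end

defmodule FrenchClient do
  use Greeter, greeting: "Bonjour"
end

IO.puts(EnglishClient.greet())
IO.puts(FrenchClient.greet())
```

Neither client defines greet/0 itself. The function comes from the AST returned by __using__, which is injected in place of the use call.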

Armed with this knowledge, we can move the implementation of the match function to the generic Plug.Router module:

defmodule Plug.Router do
  defmacro __using__(_options) do
    quote do
      import Plug.Router

      def match(type, route) do
        do_match(type, route, :dummy_connection)
      end
    end
  end

  defmacro get(route, body) do
    ... # This code remains the same
  end
end

This now keeps the client code very lean:

defmodule MyRouter do
  use Plug.Router

  get "/hello", do: {conn, "Hi!"}
  get "/goodbye", do: {conn, "Bye!"}
end

As mentioned, the AST generated by the __using__ macro will simply be injected in place of the use Plug.Router call. Take special note of how we do import Plug.Router from the __using__ macro. This is not strictly needed, but it allows the client to call get instead of Plug.Router.get.

So what have we gained? The boilerplate is now confined to a single place (Plug.Router). Not only does this simplify the client code, it also keeps the abstraction properly closed. The module Plug.Router ensures that whatever is generated by the get macros fits properly with the generic code of match. As clients, we simply use the module and call into the provided macros to assemble our router.

This concludes today's session. Many details are not covered, but hopefully you have a better understanding of how macros integrate with the Elixir compiler. In the next part I'll dive deeper and start exploring how we can tear apart the input AST.