At the start of the week I started reading Metaprogramming Elixir by Chris
McCord. One of the items you quickly learn is just how easy it is to
programmatically create methods that can be compiled into your application. One
of the examples in the book that is cited from this is how the Unicode.upcase
method is delegated to down to methods that are created at compile time from a
file. Here is where you can find this being used
After seeing this I began to wonder what the “performance” of matching was in
the Elixir system. So I wired up some tests to begin identifying this for myself.
To me it would seem that since the order of the methods is important; the best
you could get was linear growth in lookup time. Meanwhile if you have a more
robust data structure that can avoid doing this kind of lookup I would assume it
to be faster.
These are the two different modules I put together, building both of them from
my local dictionary of about ninety-nine thousand words:
The PatternPerformance.Matching module creates close to 100 thousand methods
where it matches literally on the word given and returns a simple true. Then
at the end it has a catch all that returns false. I’m not sure how great this
would be for production, but it serves for what I am testing. Some things to
note about this module is it took 6 minutes and 23 seconds to compile on my
Macbook Pro and the binary it generates is a little over three megabytes big.
The PatternPerformance.HashDict module creates a HashDict structure at
compile time and sets it as @words. The is_word? function defined simply
references this structure, returning is value which is true, and is given
false as the default value to return if the key is not found. Items to note
about this module is it takes about 4 seconds to compile and is 1.6 megabytes
big.
To test both of these I created the following crude module:
The PatternPerformance.Matching was about twice as fast at coming back with
the correct answer! This is not what I was suspecting, and as such I wanted to
believe perhaps something was not right with the HashDict structure so I
created two more test modules to try out the tried-and-true Erlang gb_trees
and gb_sets.
These were even slower! Although an interesting note that their “slowest”
times were way faster than the pattern matching and HashDict slowest lookup
times on average. I guess for now I am left with more questions than answers;
however for the time it appears compiling in a bunch of methods may be your best
bet when referring to a static list of values.
If you want to tinker with this yourself I threw up all of my code in a simple
mix project on my github: elixir_pattern_performance