In many applications, we want to apply some operation to all elements in a collection, and then aggregate the results into a single answer. In this module, we will have a look at the map-filter-reduce strategy, as well as the accumulate operation.
Calculating an average. Easy, right? We load `Statistics`, and call `mean`.
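As a point of reference, the easy version might look like the following minimal sketch. The vector `x` is just an illustrative input, chosen to match the one used later in this module.

```julia
using Statistics  # standard library, provides mean

x = [1, 2, 3, 4, 5, 6, 7]
avg = mean(x)  # 4.0
```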
What if we made it fun? What if we made it way, way more complicated than it
has any right to be?
Let’s take a vector $\mathbf{x}$. The average value of $\mathbf{x}$ is its sum divided by its length:
$$ \frac{x_1}{n} + \frac{x_2}{n} + \dots + \frac{x_n}{n} $$
In other words, for every element $x_i$, we want to divide it by $n$, and then
we want to reduce all of these using the sum operation. This is a task for
`map` and `reduce`.
The `map` function will take a function as argument, and apply it to every
element of a collection. For example, this is a way to use `map` to double
every entry in an array:
x = [1, 2, 3, 4]
map(e -> 2e, x)
4-element Vector{Int64}:
2
4
6
8
On the other hand, `reduce` accepts a binary operator (something requiring two
arguments), and applies it repeatedly across the elements of an array until a
single value remains. For example, the sum of an array expressed with `reduce`
is:
y = [1, 2, 4, 5]
reduce(+, y)
12
And so, we have enough to build a very crude function to get the average of an array of values:
x = [1, 2, 3, 4, 5, 6, 7]
n = length(x)
reduce(+, map(e -> e / n, x))
4.0
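The result of a reduction can depend on how the operator is grouped. The following minimal sketch uses subtraction, together with Base's `foldl` and `foldr`, which fix the grouping explicitly (left to right and right to left, respectively). The input vector `v` is an arbitrary illustration.

```julia
v = [10, 3, 2]

# foldl groups from the left: (10 - 3) - 2
left = foldl(-, v)   # 5

# foldr groups from the right: 10 - (3 - 2)
right = foldr(-, v)  # 9
```

The two answers differ, so with an operator like `-` it is safer to state the grouping than to rely on `reduce`.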
Note that the `reduce` operation has no well-defined behavior when used with an
operator that is not associative, such as subtraction. This is because
`reduce(-, [a,b,c])` can be evaluated as `a-(b-c)` or as `(a-b)-c`; the
documentation for `reduce` suggests a number of alternatives, such as `foldl`
and `foldr`, which guarantee the order of evaluation.

Another function that is often used together with `map` and `reduce` is
`filter`. The `filter` function evaluates a condition on every element of a
collection, and keeps only the elements for which the condition is true:
filter(isodd, 1:10)
5-element Vector{Int64}:
1
3
5
7
9
We can use `filter` before `map` (we want to apply an operation, but
only on some elements), or after `map` (we want to apply the operation and
see where we stand). This sequence of operations is commonly known as
map-filter-reduce, and is a very expressive way of chaining together
operations.
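Both orderings can be sketched in a few lines. The squaring function and the threshold of 20 are arbitrary choices made for illustration.

```julia
# Filter first: keep the odd numbers, square them, then add the squares
odds = filter(isodd, 1:10)                   # [1, 3, 5, 7, 9]
sum_sq = reduce(+, map(e -> e^2, odds))      # 165

# Map first: square everything, then keep only the squares above 20
big_sq = filter(>(20), map(e -> e^2, 1:10))  # [25, 36, 49, 64, 81, 100]
```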
Another related function is `accumulate`, which works much like `reduce` but
keeps every intermediate result instead of collapsing the vector to a single
element. For example, the sequence of factorials $n!$ for $n$ from 1 to 5 is
accumulate(*, 1:5)
5-element Vector{Int64}:
1
2
6
24
120
and the cumulative sum of an array is
accumulate(+, 1:5)
5-element Vector{Int64}:
1
3
6
10
15
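One way to see the relationship between the two functions is that the last element returned by `accumulate` is the value `reduce` would return. The sketch below checks this, and also compares against `cumsum` and `cumprod`, the dedicated functions Base provides for these two common cases.

```julia
running = accumulate(+, 1:5)                      # [1, 3, 6, 10, 15]
total = reduce(+, 1:5)                            # 15
same_end = last(running) == total                 # true

# Dedicated shorthands for cumulative sums and products
same_sum = cumsum(1:5) == accumulate(+, 1:5)      # true
same_prod = cumprod(1:5) == accumulate(*, 1:5)    # true
```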
Chained together, these four functions can get really powerful. For example,
we can use `accumulate` to write a logistic growth model in a single line. We
can define some parameters:
r, K, n₀ = 2.3, 1.0, 0.01
(2.3, 1.0, 0.01)
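We can now define a model that takes two arguments, as per the documentation
of `accumulate` (which you should definitely read). The first argument is the
value accumulated so far, and the second, which we ignore, is the current
element of the array: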
model = (n, _) -> n + n * r * (1 - n / K)
#5 (generic function with 1 method)
And run it for a number of steps defined by the length of the array; note that
we use the `init` keyword to “seed” the process with a value of our choice,
here the initial population size:
nt = accumulate(model, zeros(Float64, 10); init = n₀)
10-element Vector{Any}:
0.03277
0.10567109233
0.3230319312543046
0.8260012273364679
1.156564586819236
0.7400873564894737
1.1825108973734353
0.6861223097964039
1.1814468271273222
0.6883963372629648
This is a time-discrete model with no loop!
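For comparison, the same recurrence can be written as an explicit loop. This is a minimal sketch: the helper name `logistic_loop` is hypothetical and not part of the original module, and the parameters are repeated so that the block runs on its own.

```julia
# Hypothetical helper: the same recurrence as the accumulate version, as a loop
function logistic_loop(r, K, n₀, steps)
    nt = Vector{Float64}(undef, steps)
    n = n₀
    for t in 1:steps
        n = n + n * r * (1 - n / K)  # one step of discrete logistic growth
        nt[t] = n
    end
    return nt
end

r, K, n₀ = 2.3, 1.0, 0.01
step = (n, _) -> n + n * r * (1 - n / K)

via_accumulate = accumulate(step, zeros(Float64, 10); init = n₀)
via_loop = logistic_loop(r, K, n₀, 10)

# Element-wise comparison of the two trajectories
agree = all(isapprox.(via_accumulate, via_loop))
```

The loop needs a preallocated output and a counter, while the `accumulate` version only needs the update rule and a seed.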
The `map` function has a variant for arrays with more than one
dimension called `mapslices`, which works on slices of high-dimensional
arrays. It’s useful to perform, e.g., row-wise or column-wise operations on
matrices.
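A minimal sketch of column-wise and row-wise sums with `mapslices`, using a small illustrative matrix `A`:

```julia
A = [1 2; 3 4]

# dims = 1: the function receives each column, giving a 1×2 matrix of column sums
col_sums = mapslices(sum, A; dims = 1)  # [4 6]

# dims = 2: the function receives each row, giving a 2×1 matrix of row sums
row_sums = mapslices(sum, A; dims = 2)  # [3; 7]
```

The `dims` keyword names the dimension along which each slice is taken, so the output keeps that dimension with whatever length the function returns.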