# Criterion

Lately, I’ve been working on parallelizing some Haskell. I’ve been furiously coding away, but at some point you have to stop “optimizing”, and see if you’ve actually accomplished anything.

Unfortunately, Haskell is kind of lazy, which makes benchmarking difficult. You could write some huge function, run it in `main`, and time it — but your measurement will likely come out just above 0 seconds, because unless something forces the result, Haskell never actually evaluates it.
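To see the trap concretely, here's a sketch of a naive timing attempt (`fibs` is the recursive Fibonacci we'll meet below, repeated here so this compiles on its own):

```haskell
import System.CPUTime (getCPUTime)

fibs :: Int -> Int
fibs 0 = 0
fibs 1 = 1
fibs n = fibs (n - 1) + fibs (n - 2)

main :: IO ()
main = do
  start <- getCPUTime
  let result = fibs 25       -- only binds a thunk; nothing is computed yet
  end <- getCPUTime
  putStrLn $ "elapsed (ps): " ++ show (end - start)  -- likely close to zero
  print result               -- forcing the thunk here is where the real cost is paid
```

The `let` binding costs essentially nothing; the work only happens when `print` demands the value, outside our timed window.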

After some searching, it turns out that the answer is on Hackage. The Criterion package is a benchmarking library that can be used to effectively analyze performance. Using it could probably be simpler, but not by much!

## First a function:

Let’s optimize a function:

```haskell
fibs :: Int -> Int
fibs 0 = 0
fibs 1 = 1
fibs n = fibs (n - 1) + fibs (n - 2)
```

Good old Fibonacci sequence. As you likely know, this implementation isn’t very good: every call recomputes the same subproblems, so the runtime grows exponentially. It starts to get slow at around 20, and at around 40 its runtime begins to be measurable in minutes. Let’s try to optimize it:

```haskell
goodFibs :: Int -> Int
goodFibs 0 = 0
goodFibs 1 = 1
goodFibs n = goodFibs' (n, 2, 1, 0)
  where
    -- Invariant: n is fib (count - 1), nmo ("n minus one") is fib (count - 2).
    goodFibs' (to, count, n, nmo)
      | to == count = n + nmo
      | otherwise   = goodFibs' (to, count + 1, n + nmo, n)
```

You may have seen an implementation like this before. The gimmick here is that instead of using the traditional recursive definition, we just count up to the nth position of the sequence. But is it actually faster? Let’s find out.
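Before asking which one is faster, it’s worth a quick sanity check that the two definitions even agree — a sketch, with both definitions repeated so it runs on its own:

```haskell
fibs :: Int -> Int
fibs 0 = 0
fibs 1 = 1
fibs n = fibs (n - 1) + fibs (n - 2)

goodFibs :: Int -> Int
goodFibs 0 = 0
goodFibs 1 = 1
goodFibs n = goodFibs' (n, 2, 1, 0)
  where
    goodFibs' (to, count, n, nmo)
      | to == count = n + nmo
      | otherwise   = goodFibs' (to, count + 1, n + nmo, n)

-- The two implementations should agree before we bother timing them.
main :: IO ()
main = print (all (\n -> fibs n == goodFibs n) [0 .. 25])
```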

We can write a simple test bench for this. First, we need to `cabal install criterion`, and then we’re ready to go:

```haskell
import Criterion.Main

main :: IO ()
main = defaultMain
  [ bgroup "Good Fibs"
      [ bench "4"  $ nf goodFibs 4
      , bench "15" $ nf goodFibs 15
      , bench "30" $ nf goodFibs 30
      ]
  , bgroup "Fibs"
      [ bench "4"  $ nf fibs 4
      , bench "15" $ nf fibs 15
      , bench "30" $ nf fibs 30
      ]
  ]
```
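Assuming this lives in `Bench.hs` (the filename is mine), building with optimizations is worth it so you benchmark what you’d actually ship:

```shell
# Compile with optimizations, then run the benchmark binary
ghc -O2 Bench.hs -o bench
./bench
```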

There’s a fair amount going on here. Let’s walk through this function-by-function:

#### `defaultMain :: [Benchmark] -> IO ()`

This is a function that takes a list of benchmarks and runs them. This function allows you to pass arguments to Criterion to control its operation. A list of these options can be obtained by passing `--help` to the program.
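For example, assuming the compiled binary is called `bench` (the name is mine), a few of the options Criterion supports:

```shell
./bench --help                # list all available options
./bench --output report.html  # write an interactive HTML report
./bench --csv results.csv     # dump the raw measurements as CSV
./bench "Good Fibs"           # run only benchmarks whose names match a prefix
```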

#### `nf :: NFData b => (a -> b) -> a -> Benchmarkable`

This function takes a function from `a -> b` and an `a`, and evaluates the result to normal form. This solves the laziness problem: a value in normal form is completely evaluated. You could also use `whnf :: (a -> b) -> a -> Benchmarkable` to evaluate only to weak head normal form — just the outermost constructor. There are also IO versions, `nfIO` and `whnfIO`.
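The difference matters whenever the result is a lazy structure. A sketch (the function name is mine): benchmarking a list-producing function with `whnf` only forces the first `(:)` cell, so it measures almost nothing, while `nf` forces every element.

```haskell
import Criterion.Main

-- Building a list lazily: whnf stops at the outermost constructor,
-- while nf forces the entire spine and every element.
buildList :: Int -> [Int]
buildList n = map (* 2) [1 .. n]

main :: IO ()
main = defaultMain
  [ bench "whnf: outermost constructor only" $ whnf buildList 10000
  , bench "nf: the whole list"               $ nf   buildList 10000
  ]
```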

#### `bench :: String -> Benchmarkable -> Benchmark`

This function takes a string, and a `Benchmarkable`, and returns a `Benchmark`. Nothing magical going on here; this is how you make a benchmark. The string is the name of your benchmark.

#### `bgroup :: String -> [Benchmark] -> Benchmark`

This function takes a string and a list of benchmarks, and returns a `Benchmark`. This allows you to group benchmarks. As you can see, I’ve grouped 3 separate calls to `fibs` and `goodFibs` using `bgroup`. As above, the string is the name of your benchmark group.

## Now, the payoff

So, how did I do? Let’s try it out. Build your program, and execute it. You should see something like this:

```
benchmarking Good Fibs/4
time                 16.87 ns   (16.87 ns .. 16.88 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.88 ns   (16.87 ns .. 16.88 ns)
std dev              7.316 ps   (5.814 ps .. 8.984 ps)

benchmarking Good Fibs/15
time                 29.59 ns   (29.57 ns .. 29.61 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 29.62 ns   (29.61 ns .. 29.63 ns)
std dev              28.03 ps   (23.33 ps .. 33.59 ps)

benchmarking Good Fibs/30
time                 47.32 ns   (47.21 ns .. 47.43 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 47.27 ns   (47.20 ns .. 47.35 ns)
std dev              231.7 ps   (210.4 ps .. 255.3 ps)

benchmarking Fibs/4
time                 33.35 ns   (33.34 ns .. 33.37 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 33.37 ns   (33.35 ns .. 33.42 ns)
std dev              99.38 ps   (25.73 ps .. 205.0 ps)

benchmarking Fibs/15
time                 9.446 μs   (9.444 μs .. 9.448 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 9.435 μs   (9.416 μs .. 9.444 μs)
std dev              41.64 ns   (3.462 ns .. 69.91 ns)
```
As you can see, my `goodFibs` does well. Its runtime grows roughly linearly with the argument, and every invocation completes in tens of nanoseconds.

My `fibs` doesn’t do so well. `fibs 4` takes nanoseconds, `fibs 15` takes microseconds, and `fibs 30` takes milliseconds. Clearly `goodFibs` was worth the time put into it, and I’m not one of Knuth’s monsters. Yay me!