Lately, I’ve been working on parallelizing some Haskell. I’ve been furiously coding away, but at some point you have to stop “optimizing”, and see if you’ve actually accomplished anything.
Unfortunately, Haskell is kind of lazy, which makes benchmarking difficult. You could write some massive huge function, run it in
main, and time it. Your time will likely be just above 0 seconds.
After some searching, it turns out that the answer is on Hackage. The Criterion package is a benchmarking library that can be used to effectively analyze performance. Using it could probably be simpler, but not by much!
First a function:
Let’s optimize a function:
fibs :: Int -> Int fibs 0 = 0 fibs 1 = 1 fibs n = fibs (n - 1) + fibs (n - 2)
Good old Fibonacci Sequence. As you likely know, this implementation isn’t very good. It starts to get slow at around 20, and at around 40, it’s runtime begins to be measurable in minutes. Let’s try to optimize it:
Int -> Int goodFibs 0 = 0 goodFibs 1 = 1 goodFibs n = goodFibs' (n, 2, 1, 0) where goodFibs' (to, count, n, nmo) | to == count = n + nmo | otherwise = goodFibs' (to, (count + 1), (n + nmo), n)
You may have seen an implementation like this before. The gimmick here is that instead of using the traditional recursive definition, we just count up to the nth position of the sequence. But is it actually faster? Let’s find out.
We can write a simple test bench for this. First, we need to
cabal install criterion, then we are ready to go. Now, let’s write our test bench:
import Criterion.Main main :: IO () main = defaultMain [bgroup "Good Fibs" [bench "4" $ nf goodFibs 4, bench "15" $ nf goodFibs 15, bench "30" $ nf goodFibs 30], bgroup "Bad Fibs" [bench "4" $ nf fibs 4, bench "15" $ nf fibs 15, bench "30" $ nf fibs 30]]
There’s a fair amount going on here. Let’s walk through this function-by-function:
defaultMain :: [Benchmark] -> IO ()
This is a function that takes a list of benchmarks and runs them. This function allows you to pass arguments to Criterion to control its operation. A list of these options can be obtained by passing
--help to the program.
nf :: NFData b => (a -> b) -> a -> Benchmarkable
This function takes a function from
a -> b, and an
a, and evaluates the function to normal form. This solves the laziness problem as a value that is evaluated to normal form is completely evaluated. You could also have used
whnf :: (a -> b) -> a -> Benchmarkable to evaluate to weak head normal form. There are also IO versions of these functions.
bench :: String -> Benchmarkable -> Benchmark
This function takes a string, and a
Benchmarkable, and returns a
Benchmark. Nothing magical going on here; this is how you make a benchmark. The string is the name of your benchmark.
bgroup :: String -> [Benchmark] -> Benchmark
This function takes a string and a list of benchmarks, and returns a
Benchmark. This allows you to group benchmarks. As you can see, I’ve grouped 3 separate calls to
bgroup. As above, the string is the name of your benchmark group.
Now, the payoff
So, how did I do? Let’s try it out. Build your program, and execute it. You should see something like this:
benchmarking Good Fibs/4 time 16.87 ns (16.87 ns .. 16.88 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 16.88 ns (16.87 ns .. 16.88 ns) std dev 7.316 ps (5.814 ps .. 8.984 ps) benchmarking Good Fibs/15 time 29.59 ns (29.57 ns .. 29.61 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 29.62 ns (29.61 ns .. 29.63 ns) std dev 28.03 ps (23.33 ps .. 33.59 ps) benchmarking Good Fibs/30 time 47.32 ns (47.21 ns .. 47.43 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 47.27 ns (47.20 ns .. 47.35 ns) std dev 231.7 ps (210.4 ps .. 255.3 ps) benchmarking Bad Fibs/4 time 33.35 ns (33.34 ns .. 33.37 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 33.37 ns (33.35 ns .. 33.42 ns) std dev 99.38 ps (25.73 ps .. 205.0 ps) benchmarking Bad Fibs/15 time 9.446 μs (9.444 μs .. 9.448 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 9.435 μs (9.416 μs .. 9.444 μs) std dev 41.64 ns (3.462 ns .. 69.91 ns) benchmarking Bad Fibs/30 time 12.88 ms (12.86 ms .. 12.89 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 12.86 ms (12.85 ms .. 12.87 ms) std dev 23.34 μs (5.597 μs .. 49.02 μs)
As you can see, my
goodFibs does well. It grows mostly linearly with increasingly higher arguments, and all invocations take some amount of nanoseconds.
fibs doesn’t do so well.
fibs 4 takes nanoseconds,
fibs 15 takes microseconds, and
fibs 30 takes milliseconds. Clearly
goodFibs was worth the time put into it, and I’m not one of Knuth’s monsters. Yay me!