Quipu 0.2.2, a Nelder-Mead solver with a side of NaN
20 Apr 2024The main reason I created Quipu is that I needed a Nelder-Mead solver for a
real-world project. And, as I put Quipu through its paces on real-world data,
I ran into some issues, revolving around “Not a Number” floating point values,
aka NaN
.
tl;dr: the latest release of Quipu, version 0.2.2, available on nuget, should handle
NaN
values decently well, and has some minor performance improvements, too.
In this post, I will go over some of the changes I made, and why. Fixing the main issue made me realize that I didn’t know floating point numbers as well as I thought, even though I have been using them every working day for years. I will take some tangents to discuss some of my learnings.
So let’s dig in. My goal with Quipu was to implement the Nelder-Mead algorithm in F#. The purpose of Nelder-Mead is to find a numeric approximation of values that minimize a function. As an illustration, if we took the function $f(x) = x ^ 2$, we would like to know what value of $x$ produces the smallest possible value for $f(x)$, which happens to be $x = 0$ in this case.
Quipu handles that case just fine:
open Quipu.NelderMead
let f x = x ** 2.0
f
|> NelderMead.minimize
|> NelderMead.solve
val it: Solution = Optimal (0.0, [|0.0|])
So far, so good. Now, what about a function like $f(x) = \sqrt{x}$?
That function is interesting for 2 reasons:
- $f(x)$ has a minimum, for $x = 0$.
- $f(x)$ is defined only for $x \ge 0$: it is a partial function.
Sadly, the previous version of Quipu, version 0.2.1, failed to find it. It would go into an infinite loop instead.
The Weird World of Floating Point Numbers
Why is that? Assuming we are working with real numbers (as opposed to imaginary ones), from the mathematician standpoint, the function $\sqrt{x}$ is not defined for strictly negative real numbers. Negative numbers are not part of the function domain: a negative number is not a “valid input”, and so there is no output for such a number.
However, in code, we don’t work with real numbers. We use floating point numbers. As a result, this will run:
> sqrt -1.0;;
val it: float = nan
From that standpoint, nan
, or Not a Number, could be thought of as an
implicit encoding of the function domain of definition. Where a mathematician
would say “$f$ is defined only for this subset of real numbers”, with floating
point numbers, we have “if $f(x) = nan$, then $x$ is not part of the domain of
definition”.
At first glance, one would think that floating point
numbers are more or less equivalent to real numbers. However, floating point
numbers are an interesting beast. If you look up System.Double
, you will
notice a few fields:
NaN
PositiveInfinity
,NegativeInfinity
MaxValue
,MinValue
PositiveInfinity
and NegativeInfinity
are sensible, and correspond to the
extended real number line.
MaxValue
and MinValue
make sense from a practical standpoint, but are
amusing:
> let x = System.Double.MaxValue;;
val x: float = 1.797693135e+308
> x = x + 1.0;;
val it: bool = true
> x + x
val it: float = infinity
… which reminded me of this toot:
As a side note, I could not quite figure out what the rules were to go from
MaxValue
toPositiveInfinity
.
But the one that is really interesting is NaN
. If floating point numbers were
representing real numbers, then by definition NaN
could not be part of it,
because, well, NaN
is not a number. NaN
makes the whole idea of floats
representing reals a very weird can of worms. For instance, what are we saying
when we write, for instance, NaN + NaN
? How is that akin to a real number?
So NaN
is weird, but it is also convenient. Contrast floats with integers:
where 1 / 0
will throw an expensive exception, 1.0 / 0.0
will
return NaN
, fast. Speed is good - the price we have to pay for it is, it is
up to us to remember that floats are a lie that could result in NaN
potentially anywhere, and it is up to us to deal with it.
My mental picture for floats at that point is that they are not that far off an
Option
type, where NaN
is the None
case, and everything else is a
“proper number”. Overall, they propagate through float function in a way that
is similar to Option.map
: once a NaN
enters the flow of calculations, it
will flow through and remain a NaN
.
This is not entirely true. An odd tidbit of floating point trivia:
nan ** 0.0
is equal to…1.0
. While it is true that any real number raised to the power 0 is 1, whyNaN
has been included here is a little bizarre!
Anyways, after this long digression on floating point numbers, let’s go back to the original topic, the solver bug. If you are interested in the topic of floats, I enjoyed reading this section on the design rationale of IEEE 754.
The bug and the fix
So to summarize the long meandering tangent above: floating point numbers NaN
signal that a function inputs are outside of its domain of definition, and can
arise anywhere floating point calculations are involved.
In the case of the Nelder-Mead algorithm, this is something we want to take into account, in case the function we are trying to minimize is partially defined, like $f(x) = \sqrt{x}$.
We can break down how the algorithm works in a few steps:
1) It maintains a set of candidate points, the simplex,
2) It tries to replace the worst point with a better one, to move to a “more promising search area”,
3) If that fails, the points are possibly surrounding the optimum value, so it shrinks them towards the best candidate and narrows down the “search area”,
4) When the “search area” is small enough, it stops the search.
If, as in the case of $f(x) = \sqrt{x}$, the function is partially defined, we
will run into a couple of issues. We cannot reach step (3), because if we have
points around 0 (the optimum value), then some points will be negative. These
points will be outside the domain of definition (NaN
), and we won’t be able
to evaluate if that point is better or worse than others.
In other words, we need to ensure that at every step, every point in our simplex remains within the domain of definition, so we can evaluate whether it is better or worse.
This also means that our simplex will never be surrounding the optimum value. However, assuming the function is not too degenerate, it can work at finding a local minumum. Instead of finding a good area and shrinking, it will move towards a better area, until it gets too close to the boundaries of the domain, shrink, then move again closer towards the optimum, taking a smaller step, and possibly get close enough to the optimum to stop.
One interesting aspect of Nelder-Mead is how the steps are taken. Where a method like gradient descent decides on the direction and amplitude of the step based on the objective function, Nelder-Mead proceeds differently:
1) It tries a few candidate steps that depend only on the simplex. The direction and amplitude of the step does not depend on the objective function.
2) It evaluates each candidate step using the objective function.
This is helpful. It implies that all we have to do is make sure that in (2) we
never take a step outside of the domain of definition, which only requires
checking that the value of the objective function is not NaN
.
Another potential issue is making sure that the simplex itself always remains
“clean”, that is, we need to ensure that points in the simplex never contains
NaN
values. The nice thing here is that, if you look at the way the candidate
steps are computed, they are all linear combinations of the simplex points. As
adding and multiplying numbers that are not NaN
will always produce a value
that is not NaN
, all we need to do is check once that the initial values
contain no NaN
. If that is the case, then mechanically every step taken will
also result in a NaN
-free simplex.
And that is pretty much what I did. I added a pre-check to enforce a valid initial simplex, and tracked down every evaluation of the objective function, removing candidate steps that would lead outside of the domain of definition.
Does it work? I am pretty sure it will not always work (Nelder-Mead does not guarantee a global optimum anyways), but it does work on some examples, like our starting problem:
let f x = sqrt x
f
|> NelderMead.minimize
|> NelderMead.startFrom (StartingPoint.fromValue [ 1.0 ])
|> NelderMead.solve
val it: Solution = Optimal (1.490116119e-08, [|2.220446049e-16|])
That’s what I got for today! Hope you found something interesting in this post. If you are interested in the code or the specific changes I made, you can find it on GitHub. Cheers!