Maximum Likelihood estimation with Quipu, part 1
28 May 2025

Back in 2022, I wrote a post about using Maximum Likelihood Estimation with DiffSharp to analyze the reliability of a production system. Around the same time, I also started developing - and blogging about - Quipu, my F# implementation of the Nelder-Mead algorithm.
The two topics are related. Using gradient descent with DiffSharp worked fine, but wasn’t ideal. For my purposes, it was too slow, and the gradient-based approach was more complex than necessary. This led me to investigate whether a simpler maximization technique like Nelder-Mead would do the job, which in turn led me to develop Quipu.
Fast forward to today: while Quipu is still in pre-release, its core is fairly solid now, so I figured I would revisit the problem, and demonstrate how you could go about using Quipu on a Maximum Likelihood Estimation (or MLE for short) problem.
In this post, we will begin with a simple problem, to set the stage. In the next installment, we will dive into a more complex case, to illustrate why MLE can be such a powerful technique.
The setup
Imagine that you have a dataset, recording when a piece of equipment experienced failures. You are interested perhaps in simulating that piece of equipment, and therefore want to model the time elapsed between failures. As a starting point, you plot the data as a histogram, and observe something like this:
It looks like observations fall between 0 and 8, with a peak around 3.
What we would like to do is estimate a distribution that fits the data. Given the shape we are observing, a LogNormal distribution is a plausible candidate. It takes only positive values, which we would expect for durations, and its density climbs to a peak, and then decreases slowly, which is what we observe here.
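For reference, the density of a LogNormal with parameters mu and sigma is given by the standard formula (reproduced here for context):

$$f(x) = \frac{1}{x \, \sigma \sqrt{2 \pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2 \sigma^2}\right), \quad x > 0$$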
Let’s create an F# script to illustrate the process, and get the dependencies out of the way, starting by loading the libraries I will be using throughout this post:
#r "nuget: MathNet.Numerics, 5.0.0"
#r "nuget: MathNet.Numerics.FSharp, 5.0.0"
#r "nuget: Plotly.NET, 5.0.0"
#r "nuget: Quipu, 0.5.2"
First, observing data that looks like a LogNormal isn’t really a surprise: I simulated the sample data using a LogNormal and Math.NET, with arbitrarily picked parameter values of mu = 1.3 and sigma = 0.3:
open MathNet.Numerics.Random
open MathNet.Numerics.Distributions
let mu, sigma = 1.3, 0.3
let rng = MersenneTwister 42
let duration = LogNormal(mu, sigma, rng)
let sample =
    duration.Samples()
    |> Seq.take 100
    |> Array.ofSeq
The histogram above was generated using Plotly.NET:
open Plotly.NET
sample
|> Chart.Histogram
|> Chart.withXAxisStyle "Time between failures"
|> Chart.withYAxisStyle "# Observations"
|> Chart.show
The “true” distribution of that LogNormal can be plotted using its density (the function that represents the relative likelihood of observing each value), which is helpfully available in Math.NET:
[ 0.0 .. 0.1 .. 10.0 ]
|> List.map (fun t -> t, duration.Density(t))
|> Chart.Line
|> Chart.show
The problem
We happen to know the true distribution behind our sample, a LogNormal with parameters (1.3, 0.3), because we are the ones who generated that sample in the first place. However, in real life, we wouldn’t know these parameters. The problem we are interested in is figuring out what these parameters are, using just our sample observations.
Maximum Likelihood Estimation approaches that problem this way: assuming a specific distribution / shape, what is the set of parameters for that distribution that is the most likely to have produced the sample we observe?
In our case, if we think a LogNormal is a plausible candidate, what we are looking for is values for the 2 parameters of a LogNormal, mu and sigma, that maximize the likelihood of observing our sample.
Without going into the details of why the formula below works (see this link and this one for more), we can measure how likely it is that a particular probability model generated data by measuring its Log Likelihood, the sum over the sample of the log of the likelihood of each observation.
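In symbols, for a sample of observations x_1, ..., x_n and a candidate density f with parameters mu and sigma, the quantity we want is:

$$\log \mathcal{L}(\mu, \sigma) = \sum_{i=1}^{n} \log f(x_i \mid \mu, \sigma)$$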
This is a mouthful, but it isn’t that complicated. Let’s illustrate in code.
In our example, what this means is that we can measure the likelihood that a LogNormal with parameters mu and sigma produced our sample using the following function:
let logLikelihood sample (mu, sigma) =
    let distribution = LogNormal(mu, sigma)
    let density = distribution.Density
    sample
    |> Array.sumBy (fun t ->
        t
        |> density
        |> log
        )
As an example, using the actual parameters (1.3, 0.3) that we used to generate our sample, we get:
(1.3, 0.3) |> logLikelihood sample
> -151.8225702
If we try this with a different set of parameters, say (1.0, 1.0), we get:
(1.0, 1.0) |> logLikelihood sample
> -233.0546586
The log-likelihood using the “wrong” parameters (1.0, 1.0) is -233, much lower than the value we get using the “correct” parameters (1.3, 0.3), -151.
In other words, a higher log-likelihood indicates a better fit. The idea here
is to search across the possible parameters, and try to find the pair that
maximizes the log-likelihood, which should give us a good fit.
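To make that search idea concrete, here is a minimal sketch of the brute-force version: scan a coarse grid of candidate parameters, and keep the pair with the highest log-likelihood. The grid bounds, the step, and the name bestOnGrid are arbitrary choices of mine, for illustration only:

// Brute-force illustration: evaluate the log-likelihood over a coarse
// grid of (mu, sigma) candidates, and keep the best pair. Slow and
// imprecise, which is why we will use a proper solver instead.
let bestOnGrid =
    [
        for m in 0.5 .. 0.1 .. 2.0 do
            for s in 0.1 .. 0.1 .. 1.0 -> m, s
    ]
    |> List.maxBy (logLikelihood sample)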
Solution with Quipu
Finding the parameters that minimize or maximize a function is exactly the type of problem Quipu is intended to handle. Quipu takes an objective function (the function we are trying to maximize), a starting value for the parameters (optional, defaulting to 0.0), and searches by probing different directions and following the most promising ones until it reaches a solution.
Setting up the problem should be as simple as this:
open Quipu
logLikelihood sample
|> NelderMead.objective
|> NelderMead.maximize
However, this doesn’t quite work:
val it: SolverResult =
Abnormal
[|[|0.2588190451; -0.9659258263|]; [|-0.9659258263; 0.2588190451|];
[|0.7071067812; 0.7071067812|]|]
What is going on here?
The solver signals that it could not complete its search, and encountered an Abnormal situation, probing around the values (0.25, -0.96), (-0.96, 0.25) and (0.70, 0.70). This makes sense, if you happen to know that the parameter sigma of a LogNormal is expected to be positive:
LogNormal(1.0, -1.0)
> System.ArgumentException: Invalid parametrization for the distribution.
The first of the 3 values the solver is probing, (0.25, -0.96), causes an exception. What can we do?
An easy way to solve this is to add a guard in the log likelihood function, like so:
let logLikelihood sample (mu, sigma) =
    if sigma < 0.0
    then -infinity
    else
        let distribution = LogNormal(mu, sigma)
        let density = distribution.Density
        sample
        |> Array.sumBy (fun t ->
            t
            |> density
            |> log
            )
If the parameter sigma is less than 0, instead of instantiating a LogNormal and causing an exception, we simply return a log-likelihood of -infinity. As we are trying to maximize that function, any direction evaluating to -infinity will be rejected.
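As a quick sanity check, the pair of parameters that previously caused an exception is now handled gracefully:

// The guard kicks in: this evaluates to -infinity instead of throwing.
(1.0, -1.0) |> logLikelihood sample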
Let’s try again:
#time "on"
logLikelihood sample
|> NelderMead.objective
|> NelderMead.maximize
Real: 00:00:00.002, CPU: 00:00:00.010, GC gen0: 0, gen1: 0, gen2: 0
val it: SolverResult =
Successful
{ Status = Optimal
Candidate = { Arguments = [|1.31780528; 0.2951006595|]
Value = -151.6237793 }
Simplex =
[|[|1.317242263; 0.2953732612|]; [|1.31723014; 0.2948290322|];
[|1.31780528; 0.2951006595|]|] }
The solver proposes an optimal solution of (1.31, 0.29), which is pretty close to the values we used to generate that sample, (1.3, 0.3).
How close? Let’s compare the real and estimated densities:
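Here is a sketch of how that comparison chart could be produced with Plotly.NET, overlaying the “true” density and the density of a LogNormal parameterized with the estimates from the solver output above (the trace names and the name estimated are my own choices):

// Density of a LogNormal using the solver's estimated parameters.
let estimated = LogNormal(1.31780528, 0.2951006595)

[
    // "True" density, from the distribution that generated the sample.
    [ 0.0 .. 0.1 .. 10.0 ]
    |> List.map (fun t -> t, duration.Density t)
    |> Chart.Line
    |> Chart.withTraceInfo(Name = "True")
    // Density using the estimated parameters.
    [ 0.0 .. 0.1 .. 10.0 ]
    |> List.map (fun t -> t, estimated.Density t)
    |> Chart.Line
    |> Chart.withTraceInfo(Name = "Estimated")
]
|> Chart.combine
|> Chart.show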
Parting thoughts
When I originally set out to write Quipu, my goal was to have a library for solving minimization / maximization problems that was reasonably fast and easy to use. Hopefully this example illustrates the point!
One thing I don’t like at the moment is how issues are surfaced to the user. The only information I had to figure out why the solver wasn’t happy was that it encountered an Abnormal situation, with the input that caused the problem. This is about as uninformative as it gets, basically “something went wrong, figure it out”. I’ll revisit that part of the solver, to see if I can surface a more detailed diagnosis, in this case something like “the objective function threw this exception”.
Otherwise, readers familiar with statistics might be thinking “this example is a bit pointless, because the parameters of a LogNormal can be estimated easily by transforming the sample back to a Normal distribution and computing some averages” - and they would be right. Using MLE in that particular example is a bit of overkill. However, in my next post, I will keep that same setup, but go into some more interesting examples, illustrating the flexibility of Maximum Likelihood Estimation, and why I like it so much!