Maximum Likelihood estimation with Quipu, part 2
11 Jun 2025

In my previous post, I went over fitting the parameters of a Log-Normal distribution to a sample of observations, using Maximum Likelihood Estimation (MLE) and Quipu, my Nelder-Mead solver. MLE was overkill for the example I used, but today I want to illustrate some more interesting things you could do with MLE, building up from the same base setup.
Let’s do a quick recap first. I will be using the following libraries:
#r "nuget: MathNet.Numerics, 5.0.0"
#r "nuget: MathNet.Numerics.FSharp, 5.0.0"
#r "nuget: Plotly.NET, 5.0.0"
#r "nuget: Quipu, 0.5.2"
Our starting point is a sample of 100 independent observations, generated by a Log-Normal distribution with parameters Mu=1.3 and Sigma=0.3 (which describe the shape of the distribution), like so:
open MathNet.Numerics.Random
open MathNet.Numerics.Distributions

let mu, sigma = 1.3, 0.3
let rng = MersenneTwister 42
let duration = LogNormal(mu, sigma, rng)

let sample =
    duration.Samples()
    |> Seq.take 100
    |> Array.ofSeq
If we want to find a distribution that fits the data, we need a way to compare how well two candidate distributions fit the data. The likelihood function does just that: it measures how likely it is that a particular distribution could have generated a sample - the higher the number, the higher the likelihood. In practice we work with the log-likelihood, the sum of the log of the density of each observation, which is easier to handle numerically:
let logLikelihood sample distributionDensity =
    sample
    |> Array.sumBy (fun observation ->
        observation
        |> distributionDensity
        |> log
        )
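For instance, we could compare two candidate Log-Normal distributions against the sample. This is just an illustration; the candidate parameters below are arbitrary:

// illustration only: two arbitrary candidate distributions
let candidate1 = LogNormal(1.3, 0.3)
let candidate2 = LogNormal(2.0, 1.0)

// we would expect the first candidate, which matches how the sample
// was generated, to score a higher (less negative) log-likelihood
logLikelihood sample candidate1.Density
logLikelihood sample candidate2.Density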
There are many distributions we could try to fit to a sample. If we assume that a Log-Normal distribution is what generated our observations, the problem becomes simpler: out of all the parameters Mu and Sigma that correspond to a valid Log-Normal distribution, which pair is the most likely to have produced what we observe?
We can then make our log-likelihood function more specific, and use Quipu to find the parameters that maximize that function:
open Quipu

let logLikelihood sample (mu, sigma) =
    if sigma < 0.0
    then -infinity
    else
        let distribution = LogNormal(mu, sigma)
        let density = distribution.Density
        sample
        |> Array.sumBy (fun t ->
            t
            |> density
            |> log
            )

logLikelihood sample
|> NelderMead.objective
|> NelderMead.maximize
… which happens to return a solution of (1.31, 0.29), pretty close to the values we used to generate that sample, (1.3, 0.3).
Is this overkill?
I mentioned earlier that using MLE for this particular example was overkill. A Log-Normal distribution, as the name suggests, has the following property: if X follows a Log-Normal distribution, then Ln(X) follows a Normal distribution. The mean Mu and standard deviation Sigma of a Normal distribution can be estimated in a direct fashion from a sample, so we can get estimates for our Log-Normal by transforming the sample into logarithms, and using that transformed sample instead:
// convert to log
let logSample = sample |> Array.map log
// mu is the sample average of the logs
let muEstimate = logSample |> Array.average
// sigma is the sample standard deviation of the logs
let sigmaEstimate =
    let sampleSize = logSample |> Array.length
    logSample
    |> Array.sumBy (fun x ->
        pown (x - muEstimate) 2 / (float (sampleSize - 1))
        )
    |> sqrt

printfn $"%.2f{muEstimate}, %.2f{sigmaEstimate}"
1.32, 0.30
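As a side note, MathNet.Numerics also exposes this closed-form estimation directly, via LogNormal.Estimate, which we will use again a bit later:

// built-in estimator for the Log-Normal parameters
let estimated = sample |> LogNormal.Estimate
// estimated.Mu and estimated.Sigma should be very close to the values above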
This is both simpler and faster than using Quipu and MLE. So… why use MLE?
First, while we can easily estimate the parameters of a Log-Normal, this is not the case for other distributions in general. The Log-Normal is a special case, where a shortcut exists. If we thought the observations were generated by something that was not a Log-Normal, say, a Gamma or a Weibull, the MLE approach would still work. All we would need is an expression for the likelihood of an observation, that is, the density of the distribution. As an added benefit, we would also immediately get an answer to the question “is the data I am looking at more likely to be generated by a Log-Normal, or by this other distribution?” just by comparing their log likelihoods.
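As a sketch of what such a comparison might look like: the Gamma parameters below are made up for illustration (not fitted values), and logLikelihoodFor is a hypothetical helper, equivalent to the first logLikelihood function above:

// hypothetical helper: log-likelihood of the sample for a given density
let logLikelihoodFor density =
    sample |> Array.sumBy (density >> log)

// arbitrary candidate distributions, for illustration only
let logNormalCandidate = LogNormal(1.31, 0.29)
let gammaCandidate = Gamma(12.0, 3.0)

// the candidate with the larger log-likelihood fits the sample better
logLikelihoodFor logNormalCandidate.Density
logLikelihoodFor gammaCandidate.Density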
In the next section, I will follow this train of thought further, and explore what other types of questions you could ask of your data using Maximum Likelihood Estimation.
Censored observations
So far, we have looked at the straightforward estimation case, where we have a nice, clean sample of observations. Now let’s turn to more interesting things you could do with Maximum Likelihood Estimation.
Imagine for a minute that instead of observing all the data, we had only partial observations. As an example, let’s imagine that we are observing things through a device or sensor that can only record values above a certain level.
Building on our original example, let’s say we have a limit, and if the true value is below that limit, we can’t observe what the actual value was, only that there was an observation.
For instance, if that limit was 4, instead of the original sample, we would now have the following partial sample, where None denotes the case where we know there was a value, but we don’t have a record of it because it was below the limit:
let limit = 4.0

let partial =
    sample
    |> Array.map (fun obs ->
        if obs < limit
        then None
        else Some obs
        )
Instead of 100 observations, we only have 42 actual values to work with:
partial
|> Array.choose id
|> Array.length
So how could we estimate the parameters of the underlying Log-Normal?
The naive way to do this would be to ignore the fact that our dataset is censored, and work off just the clean observations, like so:
// don't do this!
partial
|> Array.choose id
|> LogNormal.Estimate
This runs just fine, and estimates the parameters to be Mu=1.60 and Sigma=0.16, pretty far off from the correct values. It is incorrect, because we are treating the sample as if it had been generated directly by a Log-Normal, but it hasn’t: it was generated by a Log-Normal and then transformed by the censoring process. Treating it as a regular Log-Normal sample has no reason to produce the correct estimates.
So what can we do? Maximum Likelihood hinges on having a function that can calculate the likelihood of any observation, so let’s see if we can write one. We have 2 cases to consider here:
If the true observation was below the limit=4, we observe None. The likelihood of observing None for a Log-Normal is therefore the probability that a value is less than 4, which we can get using the so-called cumulative distribution function.
If the true observation was above the limit, we observe Some value. The likelihood of observing value is, as before, given by the density function.
Let’s modify our likelihood function to represent that:
let logLikelihood sample (mu, sigma) =
    if sigma < 0.0
    then -infinity
    else
        let distribution = LogNormal(mu, sigma)
        let density = distribution.Density
        let cumulative = distribution.CumulativeDistribution
        let likelihood obs =
            match obs with
            | None -> cumulative limit
            | Some value -> density value
        sample
        |> Seq.sumBy (likelihood >> log)
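This function computes the log-likelihood of the full, partially censored sample. As a quick sanity check (a sketch, results not shown here), we would expect it to score the true parameters higher than the naive estimates from the previous section:

// true parameters vs. the naive estimates from LogNormal.Estimate
logLikelihood partial (1.3, 0.3)
logLikelihood partial (1.60, 0.16)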
We can then re-use the same approach as before, using Quipu to maximize the log-likelihood:
partial
|> logLikelihood
|> NelderMead.objective
|> NelderMead.maximize
Quipu estimates Mu=1.33 and Sigma=0.29. It is not as good as what we got using the full sample, which is not a surprise, given that we have only 42 clean observations instead of the initial 100. Still, this is pretty close to the true values of 1.3 and 0.3, and definitely much better than what the “incorrect” estimation gave us!
You could use the exact same technique if the observations were recorded only up to a certain limit. In that case, all we would know about a censored observation is that it exceeded the limit, so instead of using the cumulative function directly, you would compute its likelihood as 1 - cumulative(limit).
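As a sketch of what that variant could look like, assuming a hypothetical upperLimit above which values are not recorded exactly (again encoded as None), the likelihood function might become:

// hypothetical recording ceiling: values above it are censored
let upperLimit = 10.0

let logLikelihoodUpper sample (mu, sigma) =
    if sigma < 0.0
    then -infinity
    else
        let distribution = LogNormal(mu, sigma)
        let density = distribution.Density
        let cumulative = distribution.CumulativeDistribution
        let likelihood obs =
            match obs with
            // probability that the true value exceeds the upper limit
            | None -> 1.0 - cumulative upperLimit
            | Some value -> density value
        sample
        |> Seq.sumBy (likelihood >> log)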
Unknown limit
In the previous case, we assumed that we knew what the limit was. What if we didn’t? Again, we can use Maximum Likelihood; we just need to specify a function that computes the likelihood of each observation, given the model we assume.
Here, the transformation is as simple as turning the limit into a parameter of the model:
let logLikelihood sample (limit, mu, sigma) =
    if sigma < 0.0
    then -infinity
    else
        let distribution = LogNormal(mu, sigma)
        let density = distribution.Density
        let cumulative = distribution.CumulativeDistribution
        let likelihood obs =
            match obs with
            | None -> cumulative limit
            | Some value ->
                if value < limit
                then -infinity
                else density value
        sample
        |> Seq.sumBy (likelihood >> log)
Instead of assuming that we know the value of the limit, we turn it into a third argument to the likelihood function. In essence, what we are now saying is “how likely is it that the data was generated using a Log-Normal with parameters Mu and Sigma, AND this particular value of limit?”
Most of the function remains the same, with one small change here:
| Some value ->
    if value < limit
    then -infinity
    else density value
We cannot have a recorded observation below the limit. If a candidate limit sits above one of the values we actually observed, that combination of parameters is impossible, so we make its likelihood -infinity. This will force the search to avoid impossible values for limit.
And we are done! All we need to do next is search for a set of 3 values that maximize the likelihood, like so:
partial
|> logLikelihood
|> NelderMead.objective
|> NelderMead.startFrom (Start.around([ 0.0; 0.0; 1.0 ], 0.1))
|> NelderMead.maximize
Quipu proposes the following solution: limit=4.01, Mu=1.31 and Sigma=0.34, which is once again fairly close to the correct values.
I had to make one small adjustment to the Quipu setup to make this work. After setting the objective function, I direct Quipu to start the search around limit=0, Mu=0 and Sigma=1, within 0.1 of these 3 values.
The reasons are a bit technical. By default, Quipu will start its search around 0, or, if we are searching for 3 parameters, around (0, 0, 0). In most cases, starting around 0 works just fine, but this depends on the objective function. In our particular situation, it doesn’t, and causes the solver to terminate without finding a solution.
This was also a reminder that I need to revisit how Quipu surfaces solver issues to the user.
I haven’t worked out the details of what goes wrong exactly, but I believe the issue goes something like this. The default initial candidates for 3 parameters (the so-called simplex) end up being these 4 candidates (see this file for details):
0.577, -0.577, -0.577
-0.577, 0.577, -0.577
-0.577, -0.577, 0.577
0.577, 0.577, 0.577
Now, out of these 4 candidates, 2 already have “bad” starting values for Sigma (which should always be positive), at -0.577. This might not be the only problem, but we are certainly not giving the solver an easy starting point. So rather than using the defaults, we supply the solver with a starting point where none of the initial candidates is problematic, which addresses the issue. And… we are done.
Parting thoughts
I remember being introduced to Maximum Likelihood Estimation while I was still a student, and feeling very intimidated. As a result, I avoided the subject for a long time. I wish I hadn’t! I really enjoy how explicit the approach is: state what you believe the probability model to be, and find the parameters that are the most likely to have produced the observations you have.
My goal with this post was to illustrate how powerful and versatile the technique is. We started with a simple problem, estimating the parameters of a Log-Normal distribution on a sample. In this particular case, Maximum Likelihood Estimation was more complex than necessary. However, we were able to re-use essentially the same scaffold, with minor modifications, to tackle much more complicated cases like censored data. As long as our model involves distributions whose density or cumulative distribution can be evaluated, we can fit a model.
Anyways, this is where I will leave it today!