For many reasons, I am not a fan of the current hype around Large Language
Models (LLMs). However, a few months ago, I was asked to work on a project to
evaluate using LLMs for a practical use case. I figured this would be an
interesting opportunity to see for myself what worked and what didn’t, and
perhaps even change my mind on the overall usefulness of LLMs.
In this post, I will go over some of the things I found interesting.
Caveat: I have a decent knowledge of Machine Learning, but this was my first
foray into LLMs. As a result, this post should not be taken as competent
advice on the topic. It is intended as a beginner’s first impressions.
Context
The client - let’s call them ACME Corp - produces and distributes many products
all over the world. Plenty of useful information about these products, such as
inventory or shipments, is available in a database. Unfortunately, most
employees at ACME Corp have neither access to that database nor a good enough
grasp of SQL (or of the database itself) to make use of that information.
The idea, then, was to explore whether, by using LLMs, we could give users a way to
access that information, in their own language (“what is the current inventory
of sprockets model 12345 in Timbuktu”), without the hurdle of writing complex
SQL queries. And, because ACME Corp is international, “in their own language”
is meant quite literally: the question could be asked in English, as well as in
a wide range of other languages.
At a high level, we want something like this: the user asks a question in their
own language, the system translates it into a SQL query, runs it against the
database, and returns the answer.
Given the time budget on the project, we did not have the option to fine-tune a
model for our domain, so we used a “stock” LLM.
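To make the pipeline concrete, here is a minimal sketch of the idea in F#. Everything in it is an assumption for illustration: `askLLM` is a hypothetical stand-in for whatever LLM API the project actually used, and `questionToSql` shows one plausible way to prompt for SQL.

```fsharp
// Hypothetical stand-in for an actual LLM API call:
// takes a prompt, returns the model's text completion.
let askLLM (prompt: string) : string =
    failwith "call your LLM provider of choice here"

// One plausible way to turn a user question (in any language) into SQL:
// hand the model the database schema, and ask for a query, nothing else.
let questionToSql (schema: string) (question: string) =
    $"""You are given the following database schema:

{schema}

Translate the user question below into a single SQL query.
Respond with the SQL query only.

Question: {question}"""
    |> askLLM
```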
In my previous post, I went over fitting the parameters of a Log-Normal
distribution to a sample of observations, using Maximum Likelihood Estimation
(MLE) and Quipu, my Nelder-Mead solver. MLE was overkill for the example I
used, but today I want to illustrate some more interesting things you could do
with MLE, building up from the same base setup.
Let’s do a quick recap first. I will be using the following libraries:
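Based on what follows, a plausible set of references would be MathNet.Numerics for the distributions, Plotly.NET for the charts, and Quipu for the solver; take the exact package list as an assumption on my part:

```fsharp
#r "nuget: MathNet.Numerics"
#r "nuget: Plotly.NET"
#r "nuget: Quipu"

open System
open MathNet.Numerics.Distributions
open Plotly.NET
open Quipu
```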
Our starting point is a sample of 100 independent observations, generated by a
Log-Normal distribution with parameters Mu=1.3 and Sigma=0.3 (which
describe the shape of the distribution), like so:
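A minimal sketch of that generation, assuming MathNet.Numerics (the seed is arbitrary, just for reproducibility):

```fsharp
open System
open MathNet.Numerics.Distributions

let mu, sigma = 1.3, 0.3
let rng = Random 42

// 100 independent draws from a LogNormal(mu, sigma) distribution
let sample =
    Array.init 100 (fun _ -> LogNormal.Sample(rng, mu, sigma))
```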
If we want to find a distribution that fits the data, we need a way to compare
how well two candidate distributions fit the data. The likelihood function does
just that: it measures how likely it is that a particular distribution could
have generated a sample; the higher the number, the higher the likelihood:
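In practice, it is easier to work with the log-likelihood: summing log-densities behaves better numerically than multiplying many small densities together. A sketch, assuming MathNet.Numerics and the sample generated earlier:

```fsharp
open MathNet.Numerics.Distributions

// Log-likelihood of a sample under LogNormal(mu, sigma):
// the sum of the log-densities of the individual observations.
let logLikelihood (mu: float, sigma: float) (sample: float []) =
    let dist = LogNormal(mu, sigma)
    sample |> Array.sumBy (fun x -> dist.DensityLn x)
```

Because the logarithm is an increasing function, the pair (mu, sigma) with the highest log-likelihood is also the pair with the highest likelihood.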
The two topics are related. Using gradient descent with DiffSharp worked fine,
but wasn’t ideal: for my purposes, it was too slow, and the gradient-based
approach was more complex than the problem warranted. This led me to investigate whether a
simpler maximization technique like Nelder-Mead would do the job, which in turn
led me to develop Quipu.
Fast forward to today: while Quipu is still in pre-release, its core is fairly
solid now, so I figured I would revisit the problem, and demonstrate how you
could go about using Quipu on a Maximum Likelihood Estimation (or MLE for short)
problem.
In this post, we will begin with a simple problem, to set the stage. In
the next installment, we will dive into a more complex case, to illustrate why
MLE can be such a powerful technique.
The setup
Imagine that you have a dataset recording when a piece of equipment
experienced failures. Perhaps you are interested in simulating that piece of
equipment, and therefore want to model the time elapsed between failures. As a
starting point, you plot the data as a histogram, and observe something like
this:
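With Plotly.NET, such a histogram can be produced along these lines (a minimal sketch, assuming a sample array like the one generated earlier):

```fsharp
open Plotly.NET

// Histogram of the observed times between failures
Chart.Histogram(X = sample)
|> Chart.show
```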
It looks like observations fall between 0 and 8, with a peak around 3.
What we would like to do is estimate a distribution that fits the data. Given
the shape we are observing, a LogNormal distribution is a plausible
candidate. It takes only positive values, which we would expect for durations,
and its density climbs to a peak, and then decreases slowly, which is what we
observe here.
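Putting the pieces together, estimating the distribution means finding the pair (mu, sigma) that maximizes the log-likelihood of the sample. With Quipu, a sketch could look like the block below. I am going from memory on the pipeline-style API (`NelderMead.objective`, `NelderMead.maximize`), so treat the exact calls as assumptions rather than as the library’s documented surface:

```fsharp
open MathNet.Numerics.Distributions
open Quipu

// Objective: log-likelihood of the observed sample, as a function of (mu, sigma).
// Invalid parameter values (sigma <= 0) get the worst possible score.
let objective (mu: float, sigma: float) =
    if sigma <= 0.0 then -infinity
    else
        let dist = LogNormal(mu, sigma)
        sample |> Array.sumBy (fun x -> dist.DensityLn x)

// Assumed Quipu calls: set up the objective, then maximize it.
let solution =
    NelderMead.objective objective
    |> NelderMead.maximize
```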
In our last installment, I hit a roadblock. I attempted to
implement Delaunay triangulations using the Bowyer-Watson algorithm,
following this pseudo-code from Wikipedia,
and ended up with a mostly working F# implementation.
Given a list of points, the code produces a triangulation, but
occasionally the outer boundary of the triangulation is not convex,
displaying bends towards the inside, something that should never
happen for a proper Delaunay triangulation.
While I could not figure out the exact issue, by elimination I
narrowed it down a bit. My guess was that the issue was a missing,
unstated condition, probably related to the initial
super-triangle. As it turns out, my guess was correct.
The reason I know is that a kind stranger on the internet reached out
with a couple of helpful links (thank you!):
The second link in particular mentions that the Wikipedia page is
indeed missing conditions, and suggests that the initial
super-triangle should satisfy the following property to be valid:
it seems that one should rather demand that the vertices of the
super triangle have to be outside all circumcircles of any three
given points to begin with (which is hard when any three points are almost collinear)
That doesn’t look overly complicated: let’s modify our code
accordingly, and check if this fixes our problem!
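As a sanity check, that condition can be tested directly, by brute force. The sketch below is not the post’s actual code; the `Point` type and names are made up for illustration:

```fsharp
// Hypothetical representation, for illustration only.
type Point = { X: float; Y: float }

// Circumcircle of three points: center and squared radius,
// or None if the points are (nearly) collinear.
let circumCircle (a: Point) (b: Point) (c: Point) =
    let d = 2.0 * (a.X * (b.Y - c.Y) + b.X * (c.Y - a.Y) + c.X * (a.Y - b.Y))
    if abs d < 1e-12 then None
    else
        let a2 = a.X * a.X + a.Y * a.Y
        let b2 = b.X * b.X + b.Y * b.Y
        let c2 = c.X * c.X + c.Y * c.Y
        let ux = (a2 * (b.Y - c.Y) + b2 * (c.Y - a.Y) + c2 * (a.Y - b.Y)) / d
        let uy = (a2 * (c.X - b.X) + b2 * (a.X - c.X) + c2 * (b.X - a.X)) / d
        let r2 = (a.X - ux) ** 2.0 + (a.Y - uy) ** 2.0
        Some ({ X = ux; Y = uy }, r2)

// The super-triangle is valid if each of its vertices lies outside the
// circumcircle of every triple of input points. Brute force over all triples.
let isValidSuperTriangle (superVertices: Point list) (points: Point list) =
    [ for a in points do
        for b in points do
          for c in points do
            yield a, b, c ]
    |> List.forall (fun (a, b, c) ->
        match circumCircle a b c with
        | None -> true // collinear (or degenerate) triple: no circumcircle
        | Some (center, r2) ->
            superVertices
            |> List.forall (fun v ->
                (v.X - center.X) ** 2.0 + (v.Y - center.Y) ** 2.0 > r2))
```

The check enumerates every triple of points, so it is only suitable as a debugging aid, not as part of the algorithm itself.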
Over the past few weeks, I have been making slow but steady progress
implementing Delaunay triangulation with the Bowyer-Watson algorithm.
However, as I mentioned in the conclusion of my previous post, I spotted a bug,
which I hoped would be an easy fix, but so far no such luck: it has me stumped.
In this post, I will go over how I approached figuring out the problem, which
is interesting in its own right. My hope is that by talking it through I
might either get an idea, or perhaps someone else will spot what I am missing!
a Delaunay triangulation […] of a set of points in the plane subdivides their
convex hull into triangles whose circumcircles do not contain any of the points;
that is, each circumcircle has its generating points on its circumference, but
all other points in the set are outside of it. This maximizes the size of the
smallest angle in any of the triangles, and tends to avoid sliver triangles.
The Bowyer-Watson algorithm seemed straightforward to implement, so that’s
what I did. For those interested, my current code is here, and it mostly
works. Starting from a list of 20 random points, I can generate a triangulation
like so:
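The call itself has roughly this shape; `bowyerWatson` is a hypothetical stand-in for the repository’s actual entry point, reusing the `Point` record from the earlier sketch:

```fsharp
// Hypothetical stand-in for the repository's actual entry point.
let bowyerWatson (points: Point list) : (Point * Point * Point) list =
    failwith "see the repository for the real implementation"

let rng = System.Random 0
let points =
    List.init 20 (fun _ -> { X = rng.NextDouble(); Y = rng.NextDouble() })

// A triangulation: a list of triangles covering the convex hull of the points.
let triangulation = bowyerWatson points
```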