Drawing mountains: setting up Bolero

With a return to dungeon master duties, making maps has made a come back in my weekend activities, and I started revisiting an old side-project of mine, drawing mountains in a style similar to topographic maps. The aspect I am mostly interested in is not the contour lines, but rather the usage of shadows to visualize the relief.

My goal here is as follows: given a grid of altitudes describing a terrain, can I draw it and suggest the relief by rendering the shadows of mountains?

I am pretty certain that there must exist solutions to this. I am less interested in the result than in understanding lights and shadows. In other words, this project is entirely pointless, except as an exploration exercise!

In aprevious series of posts, I used SVG to draw geometric figures and found it reasonably enjoyable, so that’s what I decided to use.

One aspect of that previous project wasn’t very pleasant, though: the process of generating documents by manually running a scripts to create a html file, and opening it it the browser to see the results. So I figured I would try something different, and use this as an excuse to give Bolero, the F# WebAssembly library, a spin.

Initial setup

The setup of Bolero was completely straightforward:

  • install the template,
  • create a project, using the --minimal option,
  • start the server with dotnet watch run,
  • go to the browser, a basic elmish app is running, with hot-reload.
More...

Attempting to auto-tune RANSAC, take 2

This post is a continuation of my exploration of the RANSAC algorithm. In my previous post, I began investigating if I could auto-tune some of the input parameters. The first attempt was not a success, but gave me an idea, which will be today’s post.

As a quick recap, RANSAC is a method to estimate a model in the presence of noisy data (so-called outliers). The method requires 2 input parameters:

  • t, “A threshold value to determine data points that are fit well by the model (inlier)”,
  • d, “The number of close data points (inliers) required to assert that the model fits well to the data”.

Rather than having to specify myself these 2 arguments, I would like the model to figure out good values by itself. In my initial attempt I tried to directly estimate the proportion of inliers in the dataset. This time, I will try a different angle: what if I started from pessimistic estimates, assuming many outliers and very high noise, and progressively tightened up the estimates?

More...

Attempting to auto-tune RANSAC

In my previous posts, I looked into the RANSAC algorithm. One thing I wondered about is, could the algorithm usage be made simpler by automatically tuning some of its input parameters? That is, rather than requiring the user to enter parameters, can we derive reasonable values for these parameters from the data? In this post, I will go over my first attempt. This was not a success, in that it did not produce immediately useful results, but it gave me a better understanding of the algorithm, and the process was interesting, so that’s what this post will focus on.

In the algorithm as described in Wikipedia, 2 parameters stand out in particular:

  • t, “A threshold value to determine data points that are fit well by the model (inlier)”,
  • d, “The number of close data points (inliners) required to assert that the model fits well to the data”.

My thinking here was that, given a particular model, I should be able to derive these numbers from the data itself. t, the threshold value, is about how far predictions are from the correct value, something we can measure. d is about how many predictions should be close to their actual values overall. It is related to a different measure, the proportion of outliers. If I had, say, 20% of outliers in my sample, I would expect roughly 80% of my model predictions to be close to the correct value.

RANSAC assumes that you, the user, will input these values. What I wondered
about is, if all I had was a model and a dataset, how I would go about evaluating the value of d. Or, stated differently, could I estimate p(outliers), the proportion of outliers in a dataset?

More...

RANSAC: estimating a model using very noisy data (part 2)

In our previous installment, we set up the stage for our exploration of the RANSAC algorithm, illustrating how traditional linear regression could perform quite badly on a dataset with many suspect observations. In a nutshell, the estimation penalizes large prediction errors more heavily, and as a result, it can over compensate for a few very large errors (so called “outliers”), with a poor model overall.

This issue is particularly prevalent with “standard” linear regression, because it evaluates goodness of fit using the square of the prediction errors, which mechanically penalizes heavily larger errors. However, it applies beyond that specific case. Most estimation techniques proceed by minimizing some measurement of prediction error, and, as large errors are worse than small ones, will usually penalize large errors more heavily, introducing the same issue of sensitivity to outliers.

The RANSAC algorithm attempts to address that issue, using a few simple ideas together. In pseudo-code, slightly modified from the Wikipedia version, the key part of the algorithm goes like this:

- Take a small random subset of the data available, hoping it contains mostly 
"clean" observations, so-called inliners,
- Estimate a prediction model on that small sample,
- Use that prediction model on the entire data. If the error on an observation 
is too large, discard it as an outlier, otherwise keep the observation as a 
potential inliner,
- If the overall number of inliners is too small (i.e. we have too many 
outliers), this model is a bad fit, discard it,
- Otherwise estimate a model on the inliners, and return the result.
More...

RANSAC: estimating a model using very noisy data (part 1)

Recently, as part of a project I am working on, I had to estimate a model to make some predictions. And, as is usually the case, the data available was not very good, with a lot of suspect data points, so called “outliers”. Estimating the parameters of a model becomes tricky then, because a few data points that are very wrong can have an outsized impact on the parameters, and tilt the results of the estimation far off the correct values.

This lead me to the RANSAC algorithm, which is designed to handle that exact problem. And, in my experience, re-implementing an algorithm from scratch is a great way to really understand how it works, so that’s what I did: you can find the current code here.

In this post, I will motivate the problem first, demonstrating how outliers can wreck model estimation. In our next installment, we will go over RANSAC and see if it fares any better!

Let’s illustrate the problem on a simple example. Imagine that we have a model, where we observe a value X and a value Y, and that these 2 follow a simple relationship where

Y = 1 + 0.5 * X

That is, they form a straight line. Now imagine that we have a dataset with a sample of observations for X and Y, with 2 caveats:

  • In half the cases, the observation is completely wrong, and there is no relationship between X and Y,
  • When the observation is correct, the observation is a bit off, and the recorded values are Y = 1 + 0.5 * X + some noise.

Let’s first create a model to represent that:

type Obs = { X: float }

type Parameters = {
    Constant: float
    Slope: float
    }

let predictor (parameters: Parameters) =
    fun (obs: Obs) ->
        parameters.Constant
        + parameters.Slope * obs.X

let trueParameters = {
    Constant = 1.0
    Slope = 0.5
    }

We have observations Obs and model Parameters. The “true” model has parameters { Constant = 1.0; Slope = 0.5 }, which we can use to predict the value Y like so:

predictor trueParameters { X = 1.0 }
val it: float = 1.5
More...