Using the ALGLIB random forest with F#

The intent of this post is primarily practical. During the Kaggle Home Depot competition, we ended up using the Random Forest implementation of ALGLIB, which worked quite well for us. Taylor Wood did all the work figuring out how to use it, and I wanted to document some of its aspects, as a reminder to myself, and to provide a starting point for others who might need a Random Forest from F#.

The other reason I wanted to do this is that I have been quite interested lately in the idea of developing a DSL to specify a machine learning model, which could be fed to various algorithm implementations via simple adapters. In that context, I thought taking a look at ALGLIB and how they approached data modelling could be useful.

I won’t discuss the Random Forest algorithm itself; my goal here will be to “just use it”. To do that, I will be using the Titanic dataset from the Kaggle “Learning From Disaster” competition. I like that dataset because it’s not too big, but it hits many interesting problems: missing data, features of different types, … I will use it in two ways: for classification (as is usually the case), but also for regression.

Let’s dive into the ALGLIB random forest. The library is available as a NuGet package, alglibnet2. To use it, simply reference the assembly with #r @"alglibnet2/lib/alglibnet2.dll"; you can then immediately train a random forest, using the alglib.dfbuildrandomdecisionforest method - no need to open any namespace. The training method comes in two flavors, alglib.dfbuildrandomdecisionforest and alglib.dfbuildrandomdecisionforestx1. The first is a specialization of the second, which takes an additional argument; I will therefore work with the second, more general version.
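To make this a bit more concrete, here is a rough sketch of what a training call could look like on a tiny made-up dataset. The argument names and comments are mine, and the exact signature should be double-checked against the ALGLIB documentation; the general shape, however, is a 2D array with one row per example (features first, label in the last column), followed by a handful of configuration values.

#r @"alglibnet2/lib/alglibnet2.dll"

// each row: feature values, followed by the label in the last column
let xy =
    array2D
        [ [ 1.0; 0.0; 1.0 ]
          [ 0.0; 1.0; 0.0 ]
          [ 1.0; 1.0; 1.0 ] ]

let npoints = 3 // number of training examples
let nvars = 2 // number of features
let nclasses = 2 // 1 for regression, number of classes for classification
let ntrees = 100 // number of trees in the forest
let r = 0.5 // proportion of the sample used to grow each tree

// F# surfaces the trailing out parameters (info, forest, report) as a tuple
let info, forest, report =
    alglib.dfbuildrandomdecisionforest(xy, npoints, nvars, nclasses, ntrees, r)

// prediction: dfprocess fills an array with one probability per class
// (or a single value, for regression)
let mutable prediction = Array.zeroCreate<float> nclasses
alglib.dfprocess(forest, [| 1.0; 0.0 |], &prediction)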


Gradient Boosting using Automatic Differentiation

Today, we’ll close our exploration of Gradient Boosting. First, we looked into a simplified form of the approach, and saw how to combine weak learners into a decent predictor. Then, we implemented a very basic regression tree. Today, we will put all of this together. Instead of stumps, we will progressively fit regression trees to the residuals left by our previous model; and rather than using plain residuals, we will leverage DiffSharp, an F# automatic differentiation library, to generalize the approach to arbitrary loss functions.

I won’t go back over the whole setup here; instead, I will just recap what we have at our disposal so far. Our goal is to predict the quality of a bottle of wine, based on some of its chemical characteristics, using the Wine Quality dataset from the UCI Machine Learning repository. (Reference: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.)

Gist available here

We are using a couple of types to model our problem:

open FSharp.Data // requires a reference to the FSharp.Data assembly

// type provider over the red wine quality CSV file
type Wine = CsvProvider<"data/winequality-red.csv",";",InferRows=1500>

type Observation = Wine.Row
type Feature = Observation -> float // a feature maps an observation to a value
type Example = Observation * float // an observation, paired with the value we want to predict
type Predictor = Observation -> float // a model maps an observation to a predicted value
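To give a sense of how DiffSharp will come into play, here is a hedged sketch of computing pseudo-residuals as the negative gradient of a loss function with respect to the current prediction. It assumes DiffSharp’s AD.Float64 API (the D type and the diff operator) and reuses the Predictor and Example types above; the names squaredError and pseudoResidual are placeholders of mine.

#r @"packages/DiffSharp/lib/DiffSharp.dll" // adjust to your local DiffSharp path
open DiffSharp.AD.Float64

// squared error, written over DiffSharp's D type so it can be differentiated
let squaredError (observed:D) (predicted:D) =
    (observed - predicted) * (observed - predicted)

// the pseudo-residual for one Example is the negative gradient of the loss,
// taken with respect to the current model's prediction
let pseudoResidual (loss:D -> D -> D) (model:Predictor) ((obs, label):Example) =
    let gradient = diff (loss (D label)) (D (model obs))
    -(float gradient)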

GSD with F#, or how I ported my blog to Jekyll

One of the reasons I use F# so much is that it’s an awesome scripting language to Get Stuff Done. Case in point: this blog. I recently decided to switch from BlogEngine.NET to Jekyll, which meant porting over nearly 9 years of blog posts (about 300), extracting HTML-formatted content from SQL and converting it to markdown. After a couple of weeks of manual work, I realized that at the current cadence it would take me about a year to complete, and that by then I would probably have lost my mind out of boredom. Time for some automation with F# scripts!

(Image: “Automation”, source: xkcd)


Basic Regression Tree

In our previous installment, we began exploring Gradient Boosting, and outlined how, by combining extremely crude regression models - stumps - we could iteratively create a decent prediction model for the quality of wine bottles, using a single Feature, one of the chemical measurements we have available.

In and of itself, this is an interesting result: the approach allows us to aggregate mediocre indicators together into a predictor that is better than its individual parts. However, so far, we are using only a tiny subset of the information available. Why restrict ourselves to a single Feature, and not use all of them? And, if the approach works with something as weak as a stump, perhaps we can do better, by aggregating less trivial prediction models?

This will be our goal today: we will create a Regression Tree, which, in a future installment, we will use in place of stumps in our Boosting procedure.
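To set expectations, here is a rough sketch of the shape such a tree could take, reusing the Observation and Feature types from the Gradient Boosting posts; this is an illustration of the idea, not the final implementation.

// a regression tree is either a leaf holding a predicted value, or a split
// on a Feature at a threshold, with a sub-tree on each side of the split
type Tree =
    | Leaf of float
    | Split of Feature * float * Tree * Tree

// predicting walks down the tree, following the branch the observation falls into
let rec predict (tree:Tree) (obs:Observation) =
    match tree with
    | Leaf value -> value
    | Split (feature, threshold, under, over) ->
        if feature obs <= threshold
        then predict under obs
        else predict over obs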


Exploring Gradient Boosting

I have recently seen the term “gradient boosting” pop up quite a bit, and, as I had no idea what this was about, I got curious. Wikipedia describes Gradient Boosting as

a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.

The page contains both an outline of the algorithm and some references, so I figured, what better way to understand it than to try a simple implementation? In this post, I’ll start with a hugely simplified version, and will build it up progressively over a couple of posts.
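To give a rough preview of where this series is heading, here is a sketch of the overall boosting loop, using the Example and Predictor types from the posts above; learnWeakModel and learningRate are placeholders for pieces we will build along the way.

// each round, fit a weak learner to the residuals left by the current model,
// and add a damped version of it to the ensemble
let boost (learnWeakModel:Example seq -> Predictor) (learningRate:float) (rounds:int) (sample:Example seq) =
    let average = sample |> Seq.averageBy snd
    let initial : Predictor = fun _ -> average
    let rec learn (current:Predictor) round =
        if round = 0 then current
        else
            // residuals: what the current model gets wrong on each example
            let residuals = sample |> Seq.map (fun (obs, value) -> obs, value - current obs)
            let weakModel = learnWeakModel residuals
            let updated : Predictor = fun obs -> current obs + learningRate * weakModel obs
            learn updated (round - 1)
    learn initial rounds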