Gradient Boosting using Automatic Differentiation

Today, we’ll close our exploration of Gradient Boosting. First, we looked into a simplified form of the approach, and saw how to combine weak learners into a decent predictor. Then, we implemented a very basic regression tree. Today, we will put all of this together. Instead of stumps, we will progressively fit regression trees to the residuals left by our previous model; and rather than using plain residuals, we will leverage DiffSharp, an F# automatic differentiation library, to generalize the approach to arbitrary loss functions.

I won’t go back over the whole setup again here; instead I will just recap what we have at our disposition so far. Our goal is to predict the quality of a bottle of wine, based on some of its chemical characteristics, using the Wine Quality dataset from the UCI Machine Learning repository. (References: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.)

Gist available here

We are using a couple of types to model our problem:

type Wine = CsvProvider<"data/winequality-red.csv",";",InferRows=1500>

type Observation = Wine.Row
type Feature = Observation -> float
type Example = Observation * float
type Predictor = Observation -> float

GSD with F#, or how I ported my blog to Jekyll

One of the reasons I use F# so much is that it’s an awesome scripting language to Get Stuff Done. Case in point: this blog. I recently decided to switch from BlogEngine.NET to Jekyll, which meant porting over nearly 9 years of blog posts (about 300), extracting html-formatted content from SQL and converting it to markdown. After a couple of weeks of manual process, I realized that at the current cadence, it would take me about a year to complete, and that by then I would probably have lost my mind out of boredom. Time for some automation with F# scripts!

Automation Source: xkcd


Basic Regression Tree

In our previous installment, we began exploring Gradient Boosting, and outlined how by combining extremely crude regression models - stumps - we could iteratively create a decent prediction model for the quality of wine bottles, using one Feature, one of the chemical measurements we have available.

In and of itself, this is an interesting result: the approach allows us to aggregate mediocre indicators together into a predictor that is better than its individual parts. However, so far, we are using only a tiny subset of the information available. Why restrict ourselves to a single Feature, and not use all of them? And, if the approach works with something as weak as a stump, perhaps we can do better, by aggregating less trivial prediction models?

This will be our goal today: we will create a Regression Tree, which we will in a future installment use in place of stumps in our Boosting procedure.


Exploring Gradient Boosting

I have recently seen the term “gradient boosting” pop up quite a bit, and, as I had no idea what this was about, I got curious. Wikipedia describes Gradient Boosting as

a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.

The page contains both an outline of the algorithm, and some references, so I figured, what better way to understand it than trying a simple implementation. In this post, I’ll start with a hugely simplified version, and will build up progressively over a couple of posts.


Notes from DotNetFringe 2016

After quite some time on the road, I am finally back in San Francisco. My last stop on the way back was Portland, for the DotNetFringe conference. I’ll be totally honest: after 2 months away from home, I was a bit wiped out, and looking forward to some quiet time in my own place - not necessarily the best mindset heading to a conference. However, as it turns out, I ended up having a fantastic time there, and came back with a nice boost of energy.