GSD with F#, or how I ported my blog to Jekyll

One of the reasons I use F# so much is that it’s an awesome scripting language to Get Stuff Done. Case in point: this blog. I recently decided to switch from BlogEngine.NET to Jekyll, which meant porting over nearly 9 years of blog posts (about 300), extracting html-formatted content from SQL and converting it to markdown. After a couple of weeks of manual process, I realized that at the current cadence, it would take me about a year to complete, and that by then I would probably have lost my mind out of boredom. Time for some automation with F# scripts!

Automation Source: xkcd


Basic Regression Tree

In our previous installment, we began exploring Gradient Boosting, and outlined how by combining extremely crude regression models - stumps - we could iteratively create a decent prediction model for the quality of wine bottles, using one Feature, one of the chemical measurements we have available.

In and of itself, this is an interesting result: the approach allows us to aggregate mediocre indicators together into a predictor that is better than its individual parts. However, so far, we are using only a tiny subset of the information available. Why restrict ourselves to a single Feature, and not use all of them? And, if the approach works with something as weak as a stump, perhaps we can do better, by aggregating less trivial prediction models?

This will be our goal today: we will create a Regression Tree, which we will in a future installment use in place of stumps in our Boosting procedure.


Exploring Gradient Boosting

I have recently seen the term “gradient boosting” pop up quite a bit, and, as I had no idea what this was about, I got curious. Wikipedia describes Gradient Boosting as

a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.

The page contains both an outline of the algorithm, and some references, so I figured, what better way to understand it than trying a simple implementation. In this post, I’ll start with a hugely simplified version, and will build up progressively over a couple of posts.


Notes from DotNetFringe 2016

After quite some time on the road, I am finally back in San Francisco. My last stop on the way back was Portland, for the DotNetFringe conference. I’ll be totally honest: after 2 months away from home, I was a bit wiped out, and looking forward to some quiet time in my own place - not necessarily the best mindset heading to a conference. However, as it turns out, I ended up having a fantastic time there, and came back with a nice boost of energy.


Try MBrace hassle-free with MBrace Minimal

I have been spending quite a bit of time lately with MBrace, a wonderful library that allows you to scale data processing or run heavy work-loads on a cloud cluster, using simple F# scripts. The library is very nicely documented, and comes with a Starter Kit project that contains all you need to provision a cluster, together with many scripts illustrating various use cases.

This is great, but… if you just want to play with the library and get a sense for what it does, it might be a bit initimidating. Furthermore, not everyone has an Azure subscription ready, which creates a bit of friction. So I figured, let’s try to create the smallest possible project that would allow someone to try out MBrace, without any Azure subscription needed.