When more data won't help

The following situation happened to me a few days ago. I was working for a client on a model to predict how many customers would cancel their order the next day, using whatever data was available about the order. The model wasn’t too bad, but not quite good enough, so naturally I figured I would just get more data. The more data, the better, right?

Well, maybe. The situation brought back memories from way back, from a class in decision analysis, which considered a similar question: given a decision you have to make, what is the value of acquiring more information?

As it turns out, I ended up doing something different from my initial plan: I told my client that, while the model could be improved with more data, it wasn’t worth the effort.

More...

CNTK: études in F# (sequential models)

In my previous post, I introduced CNTK and how to use it from F#, with some comments on how the .NET API design makes it unpleasant to work with. In this post, I’ll present one direction I have been exploring to address this: building models by stacking up layers into sequential models.

Let’s start by taking a step back, and briefly explaining what a sequential model is. In our previous post, we stated that the purpose of CNTK was to learn the parameters of a Function, to minimize the error observed between known input and output data. That Function is a model, which transforms an input (what we observe) into an output (what we want to predict). The example we used was a simple linear combination, but CNTK supports arbitrarily complex models, created by combining multiple functions into a single one.

Sequential models are one specific way of combining functions into a model, and are particularly interesting in machine learning. Imagine that you are trying to recognize some pattern in an image, say, a cat. You will probably end up with a pipeline of transformations or filters, along the lines of:

Original Image: pixels -> Gray Scale -> Normalize -> Filter -> … -> 0 or 1: is it a Cat?

If you are an F# developer, this probably looks eerily familiar, reminiscent of pipelining with the |> operator:

[ 1; 2; 3; 4; 5 ] 
|> List.map grayScale 
|> List.map normalize 
|> List.map someOtherOperation
...

Can we achieve something similar with CNTK, and create models by stacking transformation layers on top of each other? Let’s give it a try.
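As a rough sketch of the stacking idea, independent of CNTK, a layer can be treated as a plain function and a sequential model as the composition of a list of layers. Everything below (the `Layer` alias, `scale`, `shift`, `rectify` and `sequential`) is a hypothetical illustration, not part of the CNTK API:

```fsharp
// A layer is just a function from one representation to the next.
type Layer = float[] -> float[]

// Hypothetical stand-ins for real transformations.
let scale (factor: float) : Layer = Array.map (fun x -> x * factor)
let shift (offset: float) : Layer = Array.map (fun x -> x + offset)
let rectify : Layer = Array.map (fun x -> max 0.0 x)

// Stack layers in order by folding function composition over the list.
let sequential (layers: Layer list) : Layer =
    layers |> List.fold (>>) id

// Layers are applied first to last, like a |> pipeline.
let model = sequential [ scale 2.0; shift (-3.0); rectify ]
let output = model [| 1.0; 2.0; 3.0 |]
```

The model is built once as a value and can then be applied to any input, which is the property we would like to keep when the layers are CNTK functions rather than plain F# functions.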

More...

Baby steps with CNTK and F#

So what have I been up to lately? Obsessing over CNTK, the Microsoft deep-learning library. Specifically, the team released a .NET API, which got me interested in exploring how usable it would be from the F# scripting environment. I have already started a repository to try out some ideas, but, before diving into that in later posts, I figured I would start with a simple introduction, to set some context.

First, what problem does CNTK solve?

Imagine that you are interested in predicting something, and that you have data available, both inputs you can observe (the features), and the values you are trying to predict (the labels). Imagine now that you have an idea of the type of relationship between the input and the output, something along the lines of:

labels ≈ function(features, parameters).
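To make that relationship concrete, here is a minimal sketch in plain F#, without CNTK. It assumes a linear model and a mean squared error; the names `predict` and `meanSquaredError` and the sample data are made up for illustration:

```fsharp
// Hypothetical model: a linear combination of the features.
let predict (parameters: float[]) (features: float[]) : float =
    Array.fold2 (fun acc p f -> acc + p * f) 0.0 parameters features

// Mean squared error between the predictions and the known labels.
let meanSquaredError (parameters: float[]) (data: (float[] * float) list) : float =
    data
    |> List.averageBy (fun (features, label) ->
        let error = predict parameters features - label
        error * error)

// Made-up observations: (features, label).
let data = [ ([| 1.0; 2.0 |], 5.0); ([| 2.0; 1.0 |], 4.0) ]

// Different parameters give different errors on the same data.
let goodFit = meanSquaredError [| 1.0; 2.0 |] data
let poorFit = meanSquaredError [| 0.0; 0.0 |] data
```

Learning then amounts to searching for the parameters that make this error as small as possible on the available data.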

More...

Notes from the San Francisco F# Dugnad

We had our first ever F# Dugnad at the San Francisco F# meetup last week! The event worked pretty well, and I figured I could share some quick notes on what we did, what worked, and what could be improved.

The origin story for this event is two-fold. First, the question of how to encourage people to start actively contributing to open source projects has been on my mind for a while. My personal experience with open source has been roughly this. I have always wanted to contribute back to projects, especially the ones that help me daily, but many small things get in the way. I clone a project, struggle for a bit (“how do I build this thing?”), and after some time, I give up. I also remember being terrified when I sent my first pull request - this is a very public process, with the risk of looking foolish in a very public way.

The second element was me coming across the wonderful Dugnad tradition in Norway.

More...

Azure Functions tip: working locally with F# Scripts

I have been working with Azure Functions quite a bit lately. As a result, with more and more functions to develop and maintain, figuring out what a sane workflow might look like has gained some urgency. This post is not particularly deep, and is intended mainly as notes on things I have been trying out to get a decent local development experience with Azure Functions using F# scripts.

First, what is the problem?

While the development experience of Azure Functions in the Azure portal is decent, given the constraints, this is clearly not acceptable for anything beyond work “in the small”. What works for a small script quickly becomes painful for larger function apps: the editor is slow, offers limited support (no Intellisense…), and the workflow ends up being essentially “try out code and hope it works”, with no source control.

What we really want is, not that: we want a decent editor, and the ability to run code locally before committing it to source control and shipping it.

More...