year ++ 2013: my crazy year with F#

’Tis still the season for yearly retrospectives, and for making foolish predictions or commitments; here is a very incomplete and disorganized review of my year 2013 with F#, and some of my take-aways for the year ahead.

2013 has been a CRAZY year for me. I used to be proud of myself when I gave one talk per quarter – this is the map of places where I gave F# presentations / Dojos this year (note that I spoke multiple times in some of these places, and some online talks are not listed…):

So yes, it’s been a crazy year. One thing I find interesting here is that most of these talks were direct requests from the community. Not so long ago, I had to knock on doors and “sell” F# talks to user groups – recently, my worry is that I won’t be able to keep up with the requests for F# presentations. In my opinion, the interest in F# is currently totally under-estimated; I was stunned at how many developers showed up for some of these, in unexpected places.

What 2013 has taught me is that the picture “the F# community is all finance in London and NYC” is rather incorrect; while these cities have the largest concentrations of F# developers, there are also incredibly passionate developers all over the place, and the interest in the language is widespread. The problem here is that the community is pretty scattered; there are many trees, but it’s easy not to notice the sparse forest (yes I am looking at you, Microsoft :)).

I believe one of the main reasons for the current surge in interest is the fantastic work happening around the F# Foundation. It’s been instrumental in shaping a consistent message around the language, providing resources, and getting the community to coalesce around the idea “F# is our language, let’s make it what we want it to be”. Huge props and thanks to Don Syme, Tomas Petricek and Philip Trelford for getting the ball rolling there – what has happened in very little time is amazing. And if you want to get involved in a language with an amazing community, where you can make a difference and help shape an ecosystem, GO TO FSHARP.ORG. NOW. We want you!

It’s been a crazy amount of fun, but I have been flirting with burnout quite a bit, too, hence my new year resolution:

As much as I would love to keep going everywhere talking about F# to anyone who expresses interest (my 2013 policy in a nutshell…), I’ll have to focus in 2014 and calm down a bit. Instead of talking everywhere (I’ll probably still come if you ask nicely and offer a couch to crash on ;) ), I want to work on scaling.

One format that has worked extremely well is hands-on Dojos: instead of a formal presentation, just get people to code together on an interesting problem, in a fun and friendly atmosphere. It’s great for bootstrapping people, and has the added benefit of being more centered on the community itself, and less on a speaker. So one of my goals this year is to begin building a library of ready-to-use Dojos, which groups can simply grab and run, without the hassle of finding speakers, which is always a bottleneck for Meetups / user groups. I plan on doing this via Community for F# (@c4fsharp), the brainchild of functional cowboy Ryan Riley. If you are interested in that project, and in general in questions around growing a local community, I’d love to hear from you!

In a similar vein, I want to spend more time in my own backyard, San Francisco and the Bay Area, and help grow a stronger, inclusive, economically viable F# community there. Lots of reasons for optimism: we already have a strong, passionate, and growing community (hello, @FoxyJackFox!), we have stable hosting for sfsharp.org at ThoughtWorks (thanks for your support and enthusiasm, Logan!), and seeing exciting companies like GitHub, Xamarin or Kaggle embrace F# is awesome. The goal for 2014 is simple: crank up the level with Dojos and talks in SF, and start an outpost in the Silicon Valley (I hear there are some developers there, too).

As an aside, I wanted to tip my hat to Bryan Hunter, whose ideas on community building have been very inspirational. Nashville is slowly becoming a hotbed of functional programmers, and seems to be the place to be for F#ers lately; and I am sure this is in no small part due to Bryan’s focus on building a community that emphasizes cross-language exchange, inclusion, empowerment, and dare I say, happiness.

What else? A big part of my year has been focused on my brand-new hashtag (vocation?) #OpenSourceMom ©. If you follow this blog, it shouldn’t come as a surprise to hear that I am very, very interested in Machine Learning and Data Science. I have been busy doing my best helping F# gain the recognition it deserves in that space (in part for selfish reasons: I think it’s a fantastic language for the job, and I want to be able to use it as much as possible), and that has led me to try and help the community work better together. It never ceases to amaze me how much high-quality code the community has produced already; at the same time, writing code alone is only fun for so long, and because we are so dispersed, good ideas go unfinished or unnoticed, which is a shame. So while I prefer writing my own code (and not writing any documentation for it), I was very excited when the F# Foundation began launching working groups, and did my best to take a backseat and just try to facilitate communication and cooperation in the area of data science. It has been a fantastic experience, and I am incredibly happy with the results, and with the opportunity this has given me to get to know, and learn a lot from, all you guys (you know who you are). Also, a tip of the hat to Keith and his “Up for Grabs” initiative, which I hope we’ll get to leverage more this year – it’s IMO a great way to channel help, as well as provide easy entry points for beginners who are interested in getting started with a new language. Oh, and this was also the year of my first pull request ever :)

Finally, one highpoint of the year was the month of December, which gave me the chance to get to know the European community better. I had the pleasure to speak at BuildStuff in Vilnius, which was a fantastic conference. Greg, Neringa and Laura put together the kind of event you know you won’t forget – great speakers, of course, but also, and perhaps more importantly, an event with a soul. So thank you guys, and everyone I had the pleasure to talk to there! Oh, and by the way registration is open for 2014, and the price is unbelievable. Go there, buy your ticket now, you’ll thank me later.

While in Europe, I figured I might as well travel around a bit; isn’t that what people do while on vacation? So I went to visit the F# communities in Paris, London and Minsk, which was a blast. Having no organized F# community in France, a country with a strong OCaml history and my place of origin, was a thorn in my side for a long time; that problem has since been solved, the Paris meetup is in very good hands, and I was thrilled to speak there. Similarly, taking the trip to Minsk to speak at that user group was awesome. It was so great to finally meet Natallie and Serguey in person, after years of online contact! And I don’t know what they put in the water in Minsk (Vitamin F, maybe?) but the talent level there is just unbelievable. And I capped that year with London, which I expected to be great, and which totally delivered.

So yes, this has been a pretty crazy year of F# for me. At the same time, this has been one of my most fun and rewarding years – all because of you, the F# community. I don’t know how to say it better, but this community just completely, utterly, massively kicks ass. Which makes me even more grateful and humbled that I got nominated F# MVP of the year for 2013. So from the bottom of my heart, thank you – even if it was a grueling year at times, you made it all worth it, and I can’t wait to see what we’ll do together this year. Happy 2014 – the Year of F#!

MVP-of-the-year


Safe Refactoring with Units of Measure

A couple of weeks ago, I had the pleasure to attend Progressive F# Tutorials in NYC. The conference was fantastic – two days of hands-on workshops, great organization by the good folks at SkillsMatter, Rickasaurus and Paul Blasucci, and a great opportunity to exchange with like-minded people, catch up with old friends and make new ones.

After some discussion with Phil Trelford, we decided it would be a lot of fun to organize a workshop around PacMan. Phil has a long history with game development, and a lot of wisdom to share on the topic. I am a total n00b as far as game programming goes, but I thought PacMan would make a fun theme to hack some AI, so I set out to refactor some of Phil’s old code, and transform it into a “coding playground” where people could tinker with how PacMan and the Ghosts behave, and make them smarter.

More...

First steps with Accord.NET SVM in F#

Recently, Cesar De Souza began moving his .NET machine learning library, Accord.NET, from Google Code to GitHub. The move is still in progress, but that motivated me to take a closer look at the library; given that it is built in C#, with an intended C# usage in mind, I wanted to see how usable it is from F#.

There is a lot in the library; as a starting point, I decided I would try out its Support Vector Machine (SVM), a classic machine learning algorithm, and run it on a classic problem, automatically recognizing hand-written digits. The dataset I will be using here is a subset of the Kaggle Digit Recognizer contest; each example in the dataset consists of a 28x28-pixel grayscale image, the result of scanning a digit written down by a human, together with the actual digit it represents. From that original dataset, I sampled 5,000 examples, which will be used to train the algorithm, and another 500 in a validation set, which we’ll use to evaluate the performance of the model on data it hasn’t “seen before”.
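For illustration, here is a minimal sketch of how such a training / validation split could be produced. This is not necessarily how the samples for this post were drawn: the `splitExamples` helper, the seed, and the stand-in data are all assumptions made for the example.

```fsharp
// Hypothetical helper (not from the post): shuffle the examples with a
// fixed seed, then cut the result into a training and a validation set.
let splitExamples (seed: int) (trainSize: int) (validSize: int) (examples: 'a []) =
    let rng = System.Random(seed)
    let shuffled = Array.copy examples
    // Fisher-Yates shuffle, performed in place on the copy.
    for i in shuffled.Length - 1 .. -1 .. 1 do
        let j = rng.Next(i + 1)
        let tmp = shuffled.[i]
        shuffled.[i] <- shuffled.[j]
        shuffled.[j] <- tmp
    let trainSet = shuffled.[.. trainSize - 1]
    let validSet = shuffled.[trainSize .. trainSize + validSize - 1]
    trainSet, validSet

// Stand-in data: plain integers instead of (digit, pixels) pairs.
let trainSet, validSet = splitExamples 42 5000 500 [| 1 .. 6000 |]
```

Because the two slices come from disjoint ranges of the same shuffled array, no example ends up in both sets, which is what makes the validation score a fair estimate on unseen data.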

More...

CSV Type Provider, now with more awesome

About a month ago, FSharp.Data released version 1.1.9, which contains some very nice improvements – you can find them listed on Gustavo Guerra’s blog. I was particularly excited by the changes made to the CSV Type Provider, because they make my life digging through datasets even simpler, but couldn’t find the time to write about it, because of my recent cross-country peregrinations.

Now that I am back, let’s talk about why this update made me so happy, with a concrete example. My latest week-end project is an F# implementation of Random Forests; as part of the process, I am trying out the algorithm on various datasets, to get a sense for potential performance problems, and dog-food my own API, the best way I know to quickly spot suckiness.

One of the problems I ran into was the representation of missing values. Most datasets don’t come clean and ready to use – usually you’ll have a few records with missing data. I opted for what seemed the most straightforward representation in F#, and decided to represent every feature value as an Option – anything can either have Some value, or None.
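To make the idea concrete, here is a small sketch of what “every value is an Option” looks like in plain F#, without any Type Provider involved. The `tryParseInt` helper and the sample column are hypothetical, made up for the example; they are not part of my Random Forest code.

```fsharp
open System

// Hypothetical helper: turn a raw CSV cell into an int option.
// An empty or non-numeric cell becomes None instead of throwing.
let tryParseInt (raw: string) : int option =
    match Int32.TryParse(raw) with
    | true, value -> Some value
    | false, _ -> None

// Made-up column of raw values, with one missing entry.
let rawAges = [| "22"; ""; "38" |]

// Every feature value is an Option: Some value, or None when missing.
let ages = rawAges |> Array.map tryParseInt

// Downstream code has to deal with the missing case explicitly,
// here by keeping only the values that are present.
let knownAges = ages |> Array.choose id
let averageAge = knownAges |> Array.averageBy float
```

The benefit is that the compiler will not let a missing value sneak through unnoticed: anything consuming `ages` has to say what happens for `None`.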

The original CSV Type Provider introduced a bit of friction there, because it inferred types “optimistically”: if the sample used contained only integers, it would create an integer, which is great in most cases, except when you want to be “pessimistic” (which is usually a safe world-view when setting expectations regarding data).

The new-and-improved CSV Type Provider fixes that, and introduces a few niceties. Case in point: the Kaggle Titanic dataset, which contains the Titanic’s passenger list. With the new version, extracting the data is as simple as this:

type DataSet = CsvProvider<"titanic.csv", 
                           Schema="PassengerId=int, Pclass->Class, Parch->ParentsOrChildren, SibSp->SiblingsOrSpouse", 
                           SafeMode=true, 
                           PreferOptionals=true>

type Passenger = DataSet.Row

This is pretty awesome. In a couple of lines, just by passing in the path to my CSV file and some (optional) schema information, I get a Passenger type:

All properties are optional

What’s neat here is that first, I immediately get a Passenger with properties – with the correct Option types, thanks to SafeMode and PreferOptionals. Then, notice in the Schema the Pclass->Class, Parch->ParentsOrChildren, SibSp->SiblingsOrSpouse bit? This renames the properties “on the fly”; instead of the pretty obscurely named Parch feature coming from the CSV file header, I get a nice and readable ParentsOrChildren property. The Type Provider even does a few more cool things, automagically; for instance, the feature “Survived”, which is encoded in the original dataset as 0 or 1, gets automatically converted to a boolean. Really nice.

And just like that, I can now use this CSV file, and send it to my (still very much in alpha version) Decision Tree classifier:

// We read the training set into an array,
// defining the Label we want to classify on:
let training =
    use data = new DataSet()
    [| for passenger in data.Data -> 
        passenger.Survived |> Categorical, // the label
        passenger |]
// We define what features should be used:
let features = [|
    "Sex", (fun (x:Passenger) -> x.Sex |> Categorical);
    "Class", (fun x -> x.Class |> Categorical); |]
// We run the classifier...
let classifier, report = createID3Classifier training features { DefaultID3Config with DetailLevel = Verbose }
// ... and display the resulting tree:
report.Value.Pretty()

… which produces the following results in the F# Interactive window:

> titanicDemo();;
├ Sex = male
│   ├ Class = 3 → False
│   ├ Class = 1 → False
│   └ Class = 2 → False
└ Sex = female
   ├ Class = 3 → False
   ├ Class = 1 → True
   └ Class = 2 → True
val it : unit = ()
>

The moral of the story here is threefold. First, it was a much better idea to be a rich lady on the Titanic, rather than a (poor) dude. Then, Type Providers are really awesome – in a couple of lines, we extracted from a CSV file a collection of Passengers, all of them statically typed, with all the benefits attached to that; in a way, this is the best of both worlds – access the data as easily as with a dynamic language, but with all the benefits of types. Finally, the F# community is just awesome – big thanks to everyone who contributed to FSharp.Data, and specifically to @ovatsus for the recent improvements to the CSV Type Provider!

You can find the full Titanic example here on GitHub


“Summer of F#” Tour

It looks like this summer will be my strangest vacation in a while – I’ll be taking an F# road trip of sorts in August, talking about F# at user groups all over the United States. How exactly this crazy plan took shape I am not quite sure in retrospect, but I am really looking forward to meeting all the local communities – this will be fun!

As of July 28th, here is the plan:

… and a few more should be added to the list soon! I’ll let you extrapolate what possible cities could be following, given the map below. Stay tuned for updates.



Huge thanks to the people who helped make this happen – I am sure I forgot some of you, sorry about that, and I’ll owe you a beer when I visit your city!

… and of course @INETA!
