Simple Markov chains in F#

A A Markov

Markov chains are a classic in probability model. They represent systems that evolve between states over time, following a random but stable process which is memoryless. The memoryless-ness is the defining characteristic of Markov processes,  and is known as the Markov property. Roughly speaking, the idea is that if you know the state of the process at time T, you know all there is to know about it – knowing where it was at time T-1 would not give you additional information on where it may be at time T+1.

While Markov models come in multiple flavors, Markov chains with finite discrete states in discrete time are particularly interesting. They describe a system which is changes between discrete states at fixed time intervals, following a transition pattern described by a transition matrix.

Let’s illustrate with a simplistic example. Imagine that you are running an Airline, AcmeAir, operating one plane. The plane goes from city to city, refueling and doing some maintenance (or whatever planes need) every time.

Each time the plane lands somewhere, it can be in three states: early, on-time, or delayed. It’s not unreasonable to think that if our plane landed late somewhere, it may be difficult to catch up with the required operations, and as a result, the likelihood of the plane landing late at its next stop is higher. We could represent this in the following transition matrix (numbers totally made up):

Current \ Next Early On-time Delayed
Early 10% 85% 5%
On-Time 10% 75% 15%
Delayed 5% 60% 35%
More...

Infinite Monkeys and their typewriters

During a recent Internet excursions, I ended up on the Infinite Monkey Theorem wiki page. The infinite monkey is a somewhat famous figure in probability; his fame comes from the following question: suppose you gave a monkey a typewriter, what’s the likelihood that, given enough time randomly typing, he would produce some noteworthy literary output, say, the complete works of Shakespeare?

Somewhat unrelatedly, this made me wonder about the following question: imagine that I had a noteworthy literary output and such a monkey – could I get my computer to distinguish these?

For the sake of experimentation, let’s say that our “tolerable page” is the following paragraph by Jorge Luis Borges:

Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus’ The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true nature of Rome, the encyclopedia Novalis would have constructed, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat’s theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented concerning Time but didn’t publish, Urizen’s books of iron, the premature epiphanies of Stephen Dedalus, which would be meaningless before a cycle of a thousand years, the Gnostic Gospel of Basilides, the song the sirens sang, the complete catalog of the Library, the proof of the inaccuracy of that catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves—shelves that obliterate the day and on which chaos lies—ever reward them with a tolerable page.

Assuming my imaginary typewriter-pounding monkey is typing each letter with equal likelihood, my first thought was that by comparison, a text written in English would have more structure and predictability – and we could use Entropy to measure that difference in structure.

More...

Optimizing some old F# code

I am digging back into the Bumblebee code base, to clean it up before talking at the New England F# user group in Boston in June. As usual for me, it’s a humbling experience to face my own code, 6 months later, or, if you are an incorrigible optimist, it’s great to see that I am so much smarter today than a few months ago…

In any case, while toying with one of the samples, I noted that performance was degrading pretty steeply as the size of the problem was increasing. Most of the action revolved around producing random shuffles of a list, so I figured it would be interesting to look into it and see where I messed up how this could be improved upon.

Here is the original code, a quick-and-dirty implementation of the Fisher-Yates shuffle:

open System

let swap fst snd i =
   if i = fst then snd else
   if i = snd then fst else
   i

let shuffle items (rng: Random) =
   let rec shuffleTo items upTo =
      match upTo with
      | 0 -> items
      | _ ->
         let fst = rng.Next(upTo)
         let shuffled = List.permute (swap fst (upTo - 1)) items
         shuffleTo shuffled (upTo - 1)
   let length = List.length items
   shuffleTo items length

[<EntryPoint>]
let main argv = 
     let test = [1..10000]
     let random = new Random()
     let shuffled = shuffle test random
     System.Console.WriteLine("done...")
     0
More...

“For Those About to Mock” presentation material

I presented “For Those About to Mock”, an introduction to Mocking for C# developers, at the San Francisco chapter of Bay .NET last week, and promised I would make the material available.

Here it is: you can download the code “For Those About to Mock” here.

Thanks for everyone who made it, it was a great crowd with lots of good questions – I had a great time!

More...

Convex Hull

For no clear reason, I got interested in Convex Hull algorithms, and decided to see how it would look in F#. First, if you wonder what a Convex Hull is, imagine that you have a set of points in a plane – say, a board – and that you planted a thumbtack on each point. Now take an elastic band, stretch it, and wrap it around the thumbtacks. The elastic band will cling to the outermost tacks, leaving some tacks untouched. The convex hull is the set of tacks that are in contact with the elastic band; it is convex, because if you take any pair of points from the original set, the segment connecting them remains inside the hull.

The picture below illustrates the idea - the blue thumbtacks define the Convex Hull; all the yellow tacks are included within the elastic band, without touching it.

Convex-Hull

There are a few algorithms around to identify the Convex Hull of a set of points in 2 dimensions; I decided to go with Andrew’s monotone chain, because of its simplicity.

The insight of the algorithm is to observe that if you start from the leftmost tack, and follow the elastic downwards, the elastic turns only clockwise, until it reaches the rightmost tack. Similarly, starting from the right upwards, only clockwise turns happen, until the rightmost tack is reached. Given that the left- and right-most tacks belong to the convex hull, the algorithm constructs the upper and lower part of the hull by progressively constructing sequences that contain only clockwise turns.

More...