# Baby steps with CNTK and F#

23 Dec 2017

So what have I been up to lately? Obsessing over CNTK, the Microsoft deep-learning library. Specifically, the team released a .NET API, which got me interested in exploring how usable this would be from the F# scripting environment. I have already started a repository to try out some ideas but, before diving into that in later posts, I figured I could start with a simple introduction, to set some context.

First, what problem does CNTK solve?

Imagine that you are interested in predicting something, and that you have data available, both inputs you can observe (the `features`), and the values you are trying to predict (the `labels`). Imagine now that you have an idea of the type of relationship between the input and the output, something along the lines of:

`labels ≈ function(features, parameters)`.

To make this more concrete, that function could be quite complex, and involve multiple layers of input transformation into the final output (“deep learning”), or it could be quite simple, for instance a traditional linear regression, something along the lines of:

`car price ≈ car years * coefficient1 + car engine size * coefficient2 + constant`.

In this particular case, we have 2 features (`car years` and `car engine size`), 1 label (`car price`), and 3 parameters (`coefficient1`, `coefficient2` and `constant`) - and we would like to find “good” values for the 3 parameters so that the predicted value is in general close to the correct value.
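To make this concrete, here is a minimal sketch of that car price model as a plain F# function, without any CNTK involved. The parameter values and the observed price are made up purely for illustration, and the names `Parameters` and `predictPrice` are hypothetical:

```fsharp
// The 3 parameters of the model. All values below are invented for illustration.
type Parameters = { Coefficient1 : float; Coefficient2 : float; Constant : float }

// car price ≈ car years * coefficient1 + car engine size * coefficient2 + constant
let predictPrice (p : Parameters) (carYears : float) (engineSize : float) =
    carYears * p.Coefficient1 + engineSize * p.Coefficient2 + p.Constant

// A guess at the parameters: is it any good?
let guess = { Coefficient1 = -1000.0; Coefficient2 = 5000.0; Constant = 10000.0 }

// Prediction for a 3 year old car with a 2.0 liter engine.
let predicted = predictPrice guess 3.0 2.0

// Squared distance between the prediction and a (made up) observed price.
let observed = 17500.0
let squaredError = (predicted - observed) ** 2.0
```

Finding “good” parameters then means searching for the `Parameters` that keep `squaredError` small across many observed cars, which is the part we would like a library to handle for us.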

The purpose of CNTK is to:

- let you specify a function connecting input and output,
- let you specify how to read example data to learn from,
- learn good parameter values from the example data,
- let you learn parameters on CPU or GPU, for large datasets and complex functions.

With that in mind, let’s take a look at a very basic example, a simple linear regression. Using CNTK here is complete overkill, and not worth the overhead; I would not use it for something that simple. Our goal here is simply to illustrate the basics of how CNTK works, from F#. In future posts, we will look into scenarios where CNTK is actually useful. As a secondary goal, I want to discuss some of the aspects that make building a nice F# API on top of the current .NET one tricky.

## Loading CNTK into the F# scripting environment

First order of business: let’s load this thing into VS Code.

CNTK has a few packages on NuGet, depending on the environment you want to run on. In our case, we will focus on a CPU-only scenario, using the CNTK.CPUOnly 2.3.1 package.

We assume that the Ionide-fsharp and Ionide-Paket extensions are installed in VS Code. Open the folder where you want to work, and run the `Paket: Init` command (`CTRL`+`SHIFT`+`P` reveals the available commands). This will create a `paket.dependencies` file in the folder, where you can now specify what packages are needed, like this:

```
framework:net46
source https://www.nuget.org/api/v2
nuget CNTK.CPUOnly
```

Run `Paket: Install` next, and let Paket do its magic and download the required packages. Once the operation completes, you should see a new folder, `packages`, with the following structure:

```
packages
  CNTK.CPUOnly
    lib
      net45
        x64
          Cntk.Core.Managed-2.3.1.dll
    support
      x64
        Debug
        Dependency
        Release

Let’s now create the script we will be working with, by adding an F# script file `CNTK.fsx` to our folder. Unfortunately, CNTK depends on a few native libraries to run properly. As a result, the setup is a bit more involved than the usual `#r "path/to/library.dll"`. We’ll follow @cdrnet’s approach to loading native libraries described here, and add to the `PATH` every folder that contains the dlls we need, so `Cntk.Core.Managed-2.3.1.dll` can find them:

Note: I put the full code used in the post on a gist here

```
open System
open System.IO

// Make the script folder visible when native libraries are resolved.
Environment.SetEnvironmentVariable("Path",
    Environment.GetEnvironmentVariable("Path") + ";" + __SOURCE_DIRECTORY__)

// Folders containing the managed and native dlls CNTK needs.
let dependencies = [
    "./packages/CNTK.CPUOnly/lib/net45/x64/"
    "./packages/CNTK.CPUOnly/support/x64/Dependency/"
    "./packages/CNTK.CPUOnly/support/x64/Dependency/Release/"
    "./packages/CNTK.CPUOnly/support/x64/Release/"
    ]

// Append each of these folders to the PATH.
dependencies
|> Seq.iter (fun dep ->
    let path = Path.Combine(__SOURCE_DIRECTORY__,dep)
    Environment.SetEnvironmentVariable("Path",
        Environment.GetEnvironmentVariable("Path") + ";" + path)
    )

#I "./packages/CNTK.CPUOnly/lib/net45/x64/"
#I "./packages/CNTK.CPUOnly/support/x64/Dependency/"
#I "./packages/CNTK.CPUOnly/support/x64/Dependency/Release/"
#I "./packages/CNTK.CPUOnly/support/x64/Release/"

#r "./packages/CNTK.CPUOnly/lib/net45/x64/Cntk.Core.Managed-2.3.1.dll"

open CNTK
```

## Creating a Function

We can now start using CNTK in our script. Let’s build a function that takes 2 floats as input, and returns a float as an output, multiplying each of the inputs by a parameter.

A core element in CNTK is the `NDShape`, for n-dimensional shape. Think of an `NDShape` as the dimensions of an n-dimensional array. A vector of size 5 would be an `NDShape` of dimension `[ 5 ]` (rank 1), a 12 x 18 image an `NDShape` `[ 12; 18 ]` (rank 2), a 10 x 10 RGB image an `NDShape` `[ 10; 10; 3 ]` (rank 3, the last dimension being the 3 color channels), and so on. In our case, the input is an array of size 2, and the output an array of size 1:

```
let inputDim = 2
let outputDim = 1
let input = Variable.InputVariable(NDShape.CreateNDShape [inputDim], DataType.Double, "input")
let output = Variable.InputVariable(NDShape.CreateNDShape [outputDim], DataType.Double, "output")
```

Which produces the following output:

```
val inputDim : int = 2
val outputDim : int = 1
val input = Variable
val output = Variable
```

Note how the numeric type of the `Variable`, `DataType.Double`, is passed in as an argument, and is not generic. Note also how the numeric types are aligned with the C# convention; that is, a `DataType.Double` is an F# `float`, and a `DataType.Float` is an F# `single` (also known as `float32`).
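Since this naming mismatch is easy to trip over, here is a small plain F# check of how the F# type names map to the underlying .NET types. It does not use CNTK at all; the value names are just for illustration:

```fsharp
// F# names on the left, .NET names on the right.
let floatIsDouble = typeof<float> = typeof<System.Double>
let singleIsSingle = typeof<single> = typeof<System.Single>
let float32IsSingle = typeof<float32> = typeof<System.Single>

// A value matching DataType.Double (an F# float, a C# double).
let asDouble : float = 3.0
// A value matching DataType.Float (an F# single / float32, a C# float).
let asSingle : single = 3.0f
```

All three comparisons evaluate to `true`: an F# `float` is a 64-bit `System.Double`, whereas what C# calls `float` is a 32-bit `System.Single`.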

We can ask a `Variable` about its shape, for instance `input.Shape`:

```
val it : NDShape = CNTK.NDShape { Dimensions = seq [2]; (* more stuff *) Rank = 1; }
```

Let’s create our `Function` now:

```
let device = DeviceDescriptor.CPUDevice

let predictor =
    let dim = input.Shape.[0]
    let weights = new Parameter(NDShape.CreateNDShape [dim], DataType.Double, 0.0, device, "weights")
    // create an intermediate Function
    let product = CNTKLib.TransposeTimes(input, weights)
    let constant = new Parameter(NDShape.CreateNDShape [ outputDim ], DataType.Double, 0.0, device, "constant")
    CNTKLib.Plus(new Variable(product), constant)
```

```
val device : DeviceDescriptor
val predictor : Function
```

A couple of comments here. Our `predictor` creates a named `Parameter`, `weights`, of dimension and type matching the input `Variable`, with values initialized at `0.0`. We multiply the input and the weights together by calling `CNTKLib.TransposeTimes`, computing `x1 * w1 + x2 * w2`, which returns a `Function`. We then create another `Parameter` for our constant, and sum them up, using `CNTKLib.Plus`.

Note how we have to explicitly convert `product` into a `Variable` in the final step, using `new Variable(product)`. `CNTKLib.Plus` (and the other functions built into `CNTKLib`) expects 2 `Variable` arguments. Unfortunately, a `Function` is not a `Variable`, and they do not derive from a common class or interface. The .NET API supports implicit conversion between these 2 types, which works well in C#, where you could just sum these up directly, like this: `CNTKLib.Plus(product, constant)`. F# doesn’t support implicit conversion, and as a result, this requires an annoying amount of explicit manual conversion to combine operations together.
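To illustrate the friction, here is a small sketch using toy types, not the actual CNTK classes. `Var` and `Fn` are hypothetical stand-ins for `Variable` and `Function`, where `Fn` defines a C#-style implicit conversion to `Var`. The `implicit` helper is a common F# idiom for invoking such a conversion by hand; I am only showing it as one possible workaround, not as something the CNTK API provides:

```fsharp
// Toy stand-ins for Variable and Function: NOT the CNTK types.
type Var = { Name : string }

type Fn =
    { Output : string }
    // The kind of conversion C# would apply silently.
    static member op_Implicit (f : Fn) : Var = { Name = f.Output }

// Generic helper: invokes whichever op_Implicit exists between two types.
let inline implicit (x : ^a) : ^b =
    ((^a or ^b) : (static member op_Implicit : ^a -> ^b) x)

// Like CNTKLib.Plus, this only accepts a Var.
let describe (v : Var) = sprintf "Var(%s)" v.Name

let product = { Output = "product" }

// `describe product` is rejected here: the conversion has to be requested explicitly.
let described = describe (implicit product)
```

With a helper along these lines, the conversion becomes a short function call instead of a constructor call, but it still has to be written out at every step where a `Function` feeds into another operation.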

Note also how we passed in `device`, a `DeviceDescriptor`, to the `Parameter` constructor. A CNTK `Function` is intended to run on a device, which must be specified. In this case, we could have omitted the device, in which case it would have defaulted to the CPU.

## Working with CNTK Functions

Now that we have a `Function` - what can we do with it?

Unsurprisingly, we can pass input to a function, and compute the resulting value. We will do that next. However, before doing that, it’s perhaps useful to put things in perspective, to understand why this isn’t as straightforward as you might expect from something named a function. Once an F# function has been instantiated, its whole purpose is to transform an input value into an output value. The intent of a CNTK `Function` is subtly different: the objective here is to take a function, and modify its `Parameters` so that when passed in some input, the output it produces is close to some desired output, the `Labels`. In other words, we want a `Function` to be “trainable”: we want to be able to pass it known input/output pairs, and adjust the function parameters to fit the data better.

With that said, let’s evaluate our `predictor` function. To do that, we will need to do 3 things:

- Supply values to fill in the “input” placeholder shape,
- Specify what values we want to observe - we might be interested in the output, but also the weights, for instance,
- Specify what device we want the function to run on.

Let’s do that:

```
open System.Collections.Generic

// Values for the "input" placeholder.
let inputValue = Value.CreateBatch(NDShape.CreateNDShape [inputDim], [| 3.0; 5.0 |], device)

let inputMap =
    let map = Dictionary<Variable,Value>()
    map.Add(input, inputValue)
    map

// What we want to observe: the prediction and both parameters.
let predictedOutput = predictor.Output

let weights =
    predictor.Parameters ()
    |> Seq.find (fun p -> p.Name = "weights")

let constant =
    predictor.Parameters ()
    |> Seq.find (fun p -> p.Name = "constant")

// null values will be filled in by Evaluate.
let outputMap =
    let map = Dictionary<Variable,Value>()
    map.Add(predictedOutput, null)
    map.Add(weights, null)
    map.Add(constant, null)
    map

predictor.Evaluate(inputMap,outputMap,device)
```

To evaluate a `Function`, we pass it the input we care about, a `Dictionary<Variable,Value>`, which we fill in with `input`, the `Variable` we defined earlier. We provide (completely arbitrarily) a value of `[3.0;5.0]` as an input value. In a similar fashion, we specify what we want to observe: the predicted value, `predictor.Output`, as well as the 2 named parameters we created, “weights” and “constant”, which we also retrieve from the `Function` itself. In this case, we set the `Value` to `null`, because we have no value to supply. Finally, we run `predictor.Evaluate`, which will take the `inputMap` and fill in the missing values in the `outputMap`.

We can now review the outputs:

```
let currentPrediction =
    outputMap.[predictedOutput].GetDenseData<float>(predictedOutput)
    |> Seq.map (fun x -> x |> Seq.toArray)
    |> Seq.toArray

let currentWeights =
    outputMap.[weights].GetDenseData<float>(weights)
    |> Seq.map (fun x -> x |> Seq.toArray)
    |> Seq.toArray

let currentConstant =
    outputMap.[constant].GetDenseData<float>(constant)
    |> Seq.map (fun x -> x |> Seq.toArray)
    |> Seq.toArray
```

This is not pretty, but… we have values.

```
val currentPrediction : float [] [] = [| [| 0.0 |] |]
val currentWeights : float [] [] = [| [| 0.0; 0.0 |] |]
val currentConstant : float [] [] = [| [| 0.0 |] |]
```

The values we get back are pretty unexciting, but at least they are what we would expect to see. Given that both weights and constant were initialized at 0.0, the function should produce a `currentPrediction` of `0.0 * 3.0 + 0.0 * 5.0 + 0.0`, which is indeed `0.0`.

Two quick notes here. First, because a value could be of any `DataType`, we have to manually specify a type when retrieving the values, as in `GetDenseData<float>`. Second, this is a very stateful model: when we fill in values for the input in the `inputMap`, we pass in the `input` instance we initially created to construct the `Function`. In a similar fashion, we are retrieving values from the instances we passed into the `outputMap`.

## Training a model

This was pretty painful. So what is our reward for that pain?

As I stated earlier, one defining feature of a `Function` is that it can be trained. What we mean by that is the following: we can take a `Function`, supply it batches of input and desired output pairs, and progressively adjust the internal `Parameter`(s) of the `Function` so that the values computed by the `Function` become close(r) to the desired output.

Let’s start with a simple illustration. Suppose for a minute that, for our input `[ 3.0; 5.0 ]`, we expected a result of `10.0`. Currently, our weights and constant are set to `0.0`. By modifying these 3 values, we should be able to tune our `predictor` to get an answer of `10.0`.

This is, of course, a silly example. There are many ways I could change the parameters to produce `10.0` - I could set the constant to `10.0`, or the second weight to `2.0`, or infinitely many other combinations. To get something meaningful, I would need many different input/output pairs. However, we’ll start with this, strictly to illustrate the mechanics involved.

Training a `Function` involves 3 elements:

- Supplying a batch of input / output pairs (features and labels),
- Defining a measure of fit, that is, how to measure if a value is close to the desired value,
- Specifying how parameters should be adjusted to improve the function.

```
// A batch containing a single input / output pair.
let batchInputValue = Value.CreateBatch(NDShape.CreateNDShape [inputDim], [| 3.0; 5.0 |], device)
let batchOutputValue = Value.CreateBatch(NDShape.CreateNDShape [outputDim], [| 10.0 |], device)

let batch =
    [
        input,batchInputValue
        output,batchOutputValue
    ]
    |> dict

// Measure of fit: squared difference between prediction and target.
let loss = CNTKLib.SquaredError(new Variable(predictor), output, "loss")
let evaluation = CNTKLib.SquaredError(new Variable(predictor), output, "evaluation")

// How parameters should be adjusted.
let learningRatePerSample = new TrainingParameterScheduleDouble(0.01, uint32 1)
let learners =
    ResizeArray<Learner>(
        [
            Learner.SGDLearner(predictor.Parameters(), learningRatePerSample)
        ]
        )

let trainer = Trainer.CreateTrainer(predictor, loss, evaluation, learners)

// Perform 10 updates on the same example.
for _ in 1 .. 10 do
    trainer.TrainMinibatch(batch, true, device) |> ignore
    trainer.PreviousMinibatchLossAverage () |> printfn "Loss: %f"
    trainer.PreviousMinibatchEvaluationAverage () |> printfn "Eval: %f"
```

First, we create a batch of input/output values (`[ 3.0; 5.0 ]` and `[ 10.0 ]`), and link them to the `input` and `output` `Variable`(s) we created. Then we define the measure we want to use to determine whether a prediction is close to the target value. In this case, we use the built-in `CNTKLib.SquaredError`, which computes the squared difference between the predicted value (`new Variable(predictor)`) and the target value (`output`). For instance, with the initial weights and constant, the predicted value will be `0.0`, and we specified that the desired value was `10.0`, so the `loss` function will evaluate to `(0.0 - 10.0)^2`, that is, `100.0` - and a perfect prediction of `10.0` would result in a loss of `0.0`. Finally, without going into much detail, we specify in `learners` which strategy to apply when updating the function parameters. In this case, we use the built-in Stochastic Gradient Descent (SGD) strategy, with a learning rate of `0.01` (how aggressively to update the parameters) and a batch size of 1, using only one input/output pair at a time when performing adjustments.
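To make the update rule a bit more tangible, here is a minimal sketch of gradient descent on that single example, in plain F# without CNTK. It assumes the textbook update, where each parameter moves against the gradient of the squared error, scaled by the learning rate; whether `Learner.SGDLearner` does exactly this internally is an assumption on my part. The names `predict`, `step`, `squaredError` and `losses` are made up for the illustration:

```fsharp
// Plain F# gradient descent on a single example; no CNTK involved.
let features = [| 3.0; 5.0 |]
let target = 10.0
let learningRate = 0.01

// prediction = x1 * w1 + x2 * w2 + constant
let predict (weights : float[]) (constant : float) =
    Array.fold2 (fun acc w x -> acc + w * x) constant weights features

let squaredError (weights, constant) =
    (predict weights constant - target) ** 2.0

// One update: move each parameter against the gradient of (prediction - target)^2.
let step (weights : float[], constant : float) =
    let error = predict weights constant - target
    let updatedWeights =
        Array.map2 (fun w x -> w - learningRate * 2.0 * error * x) weights features
    let updatedConstant = constant - learningRate * 2.0 * error
    updatedWeights, updatedConstant

// Loss observed before each of the first 10 updates, starting from all zeros.
let losses =
    ([| 0.0; 0.0 |], 0.0)
    |> Seq.unfold (fun parameters -> Some (squaredError parameters, step parameters))
    |> Seq.take 10
    |> Seq.toList
```

Under these assumptions, the loss starts at `100.0` and drops to roughly `9.0` after a single update: with this input and learning rate, each update shrinks the prediction error by a factor of `0.3`, so the squared error shrinks by `0.09` every step.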

We feed all that into a `Trainer`, and perform 10 updates (`trainer.TrainMinibatch`), using the same example input/output each time, and writing out the current value of the loss function:

```
Loss: 100.000000
Eval: 100.000000
Loss: 9.000000
Eval: 9.000000
// omitted intermediate results for brevity
Loss: 0.000000
Eval: 0.000000
Loss: 0.000000
Eval: 0.000000
```

As you can observe, the prediction error decreases rapidly, from `100.0` initially (as expected), to basically `0.0` after only 10 steps.

Let’s make this a bit more interesting, by feeding different examples to the model:

```
// The "true" model used to generate synthetic data.
let realModel (features:float[]) =
    3.0 * features.[0] - 2.0 * features.[1] + 5.0

let rng = Random(123456)

let batch () =
    // intended batch size; currently unused, each call produces a single example
    let batchSize = 32
    let features = [| rng.NextDouble(); rng.NextDouble() |]
    let labels = [| realModel features |]
    let inputValues = Value.CreateBatch(NDShape.CreateNDShape [inputDim], features, device)
    let outputValues = Value.CreateBatch(NDShape.CreateNDShape [outputDim], labels, device)
    [
        input,inputValues
        output,outputValues
    ]
    |> dict
```

Here we simply create a “true” function, `realModel`, which we use to generate synthetic data. We then modify our previous example, to feed 1,000 different examples for training:

```
#time "on"

for _ in 1 .. 1000 do
    let example = batch ()
    trainer.TrainMinibatch(example,true,device) |> ignore
    trainer.PreviousMinibatchLossAverage () |> printfn "Loss: %f"
```

On my machine, extracting the weights and constant from the `Function` after training yields `3.0019`, `-1.9978` and `4.9975` - pretty close to the correct values of `3.0`, `-2.0` and `5.0` that we used in `realModel`.


## Parting thoughts

First, I want to reiterate that the example we went through is not a good showcase of where and how to use CNTK. It is intended primarily as an illustration of CNTK’s building blocks and how they work together. For a trivial linear regression example like this one (shallow learning, if you will), you would be better served with a standard library such as Accord.NET. CNTK becomes interesting if you have a deeper, more complex model, and a larger dataset - we’ll explore this in later posts.

As a side note, my initial intent was to use real batches for the final example, passing in multiple examples at once, but for reasons I haven’t figured out yet, the code kept crashing.

My second goal was to explore the design of the current .NET API, as a preliminary step before trying to build an F#-scripting friendly layer on top of it.

In its current state, the CNTK .NET library is fairly low-level, and rather unpleasant to work with from F#. Ideally, one would like to be able to create re-usable blocks and compose them easily, along the lines of the Keras model, using a DSL to, for instance, define a network by stacking standard transformation layers on top of each other.

Such a DSL seems quite possible to achieve in F#, but requires taking into account a few design considerations. First, the choice to use implicit conversion between `Variable` and `Function` makes composition of functions in F# painful. This choice is reasonable for C#, but requires re-wrapping every `Function` into a `Variable` to string operations together on the F# side.

One aspect I am not a fan of in the library is how the `DeviceDescriptor` leaks all the way down. With the current model, I could create 2 parameters, one on CPU, one on GPU, and combine them together, which doesn’t make a lot of sense. In an ideal world, I would like to define a `Function` independently of any device, and only then decide whether I want to train that model on a CPU or a GPU.

Finally, the fact that a `Variable` or a `Function` cannot be named after it has been instantiated, as far as I can tell, introduces complications in composing blocks together. If naming were separate from instantiation, we could create a function like `named : string -> Function -> Function`, which could be inserted anywhere.
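As a rough illustration of what I have in mind, here is a sketch of that idea using a toy type rather than the CNTK `Function`. Everything in it is hypothetical: `Block` stands in for a function, `named` attaches a name after the fact, and `andThen` stacks one block on top of another:

```fsharp
// Toy stand-in for a CNTK Function: NOT the real API.
type Block = { Name : string option; Apply : float[] -> float[] }

// Hypothetical: attach a name after the fact, anywhere in a pipeline.
let named (name : string) (block : Block) = { block with Name = Some name }

// Hypothetical: stack `next` on top of `previous`.
let andThen (next : Block) (previous : Block) =
    { Name = None; Apply = previous.Apply >> next.Apply }

// Two trivial "layers".
let scale factor = { Name = None; Apply = Array.map (fun x -> factor * x) }
let shift offset = { Name = None; Apply = Array.map (fun x -> x + offset) }

// Blocks compose with the usual F# pipeline, and names can go wherever needed.
let model =
    scale 2.0
    |> named "scaled"
    |> andThen (shift 1.0)
    |> named "model"

let result = model.Apply [| 1.0; 2.0 |]
```

Because naming is just another function here, it slots into the pipeline like any other step. This is the kind of composition that is currently awkward to achieve on top of the .NET API.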

I haven’t had much time yet to dig into the data readers; so far, most of my efforts have gone into exploring possible directions to address the questions above. If you are interested, the master branch of my repository contains working, straight conversions of the C# examples published by the CNTK team; the results of my explorations can be found in the 3 branches experiment-varorfun, experiment-interpreter and experiment-stacking.

I hope you found something of interest in this post! If you have feedback or suggestions, I would be quite interested to hear about them :) In the meantime, I will keep exploring - expect more on the topic in the near future!
