Count distinct items with the CVM algorithm

I came across this post on the fediverse the other day, pointing to an interesting article explaining the CVM algorithm. I found the algorithm very intriguing, and thought I would go over it in this post, and try to understand how it works by implementing it myself.

The CVM algorithm, named after its creators, is a procedure to count the number of distinct elements in a collection. In most situations, this is not a hard problem. For example, in F#, one could write something like this:

open System

let rng = Random 42
let data = Array.init 1_000_000 (fun _ -> rng.Next(0, 1000))
data
|> Array.distinct
|> Array.length

val it: int = 1000

We create an array filled with 1 million random numbers between 0 and 999, and directly extract the distinct values, which we then count. Easy peasy.

However, imagine that perhaps your data is so large that you can’t just open it in memory, and perhaps even the distinct items you are trying to count are too large to fit in memory. How would you go about counting the distinct items in your data then?

The CVM algorithm solves that problem. In this post, we will first write a direct, naive implementation of the algorithm as presented in the paper, and try to discuss why it works. Then we’ll test it out on the same example used in the article, counting the words used in Hamlet.

More...

Study notes: Softmax function

I was thinking recently about ways to combine prediction models, which lead me to the Softmax function. This wasn’t my first encounter with it (it appears regularly in machine learning, neural networks in particular), but I never took the time to properly understand how it works. So… let’s take a look!

What is the Softmax function

The Softmax function normalizes a set of N arbitrary real numbers, and converts them into a “probability distribution” over these N values. Stated differently, given N numbers, Softmax will return N numbers, with the following properties:

  • Every output value is between 0.0 and 1.0 (a “probability”),
  • The sum of the output values equals 1.0,
  • The output values ranking is the same as the input values.

In F#, the standard Softmax function could be implemented like so:

let softmax (values: float []) =
    let exponentials = values |> Array.map exp
    let total = exponentials |> Array.sum
    exponentials |> Array.map (fun x -> x / total)
More...

Editing a list in Avalonia FuncUI

In my previous post, I took a look at handling the selected item in an Avalonia ListBox with FuncUI, so the ListBox properly reflects what item is currently selected, based on the current State. In this post, I will go into another aspect of the ListBox that gave me some trouble, handling dynamic updates to the list of items. Once again, this post is nothing particularly fancy, and is mainly intended as notes to myself so I can remember later some of the steps I took.

First, what do I mean by dynamic updates? The examples in the FuncUI docs go over displaying a list of items that do not change. However, in many real world applications, you would want to be able to change that list, in a couple of different ways:

  • adding or removing an item,
  • editing the selected item,
  • filtering the contents of the list.

While editing an item is not particularly complicated in general, and follows the standard Elmish / MVU pattern, one case that tripped me up was editing an item in a fashion that impacts how it is rendered in the list, such as changing the display name of the item. I will go over the solution I landed on, but I am not sure this is the best way to do it, so if anybody can suggest a better approach, I would be very interested in hearing about it!

Anyways, let’s dig into it, and build a simple example illustrating all of these features. The final result will look something like this, and, in case you are impatient, you can find the full code example here.

A dynamic ListBox, with add, delete, edit and filter items

We’ll start from where we left off last time, with a State that contains a collection of Items, and the currently selected item:

type Item = {
    Id: Guid
    Name: string
    }

type State = {
    Items: Item []
    SelectedItemId: Option<Guid>
    }
More...

Handling list view selection in Avalonia FuncUI

After a brief summer hiatus, I am back! I wish this pause was due to exciting vacation plans, but unfortunately, the main reason was that I had a gas leak in my apartment, which ended up disrupting my routine quite a bit. Anyways, I am looking forward to enjoying simple pleasures of life like warm showers or home cooking again hopefully soon.

Today’s post is not anything fancy. I have been working on deskop applications in F# recently, using Avalonia FuncUI, and getting the ListBox to do what I wanted it to do was a bit more involved than I expected. This post is intended mainly as notes to myself, documenting some of the details that tripped me up.

Today’s post will focus on handling selection. I intend to have a follow-up post soon, covering dynamic updates. Until that is published, you can take a look at the full code example on GitHub.

The ListBox in Avalonia FuncUI

The ListBox in Avalonia is a control intended to display a collection of items, and track which item is selected. The documentation gives a pretty good description of its basic usage in FuncUI:

ListBox.create [
    ListBox.dataItems [ "Linux"; "Mac"; "Windows" ]
    ListBox.selectedItem state.os
    ListBox.onSelectedItemChanged (fun os -> dispatch ChangeOs)
]
  • ListBox.dataItems expects a collection of Items to display, which would typically coming from the State,
  • ListBox.onSelectedItemChanged tracks changes of selection,
  • ListBox.selectedItem drives which Item should visually appear as selected in the list.

I will focus only on single-item selection in this post. Multi-selection is also supported, but I haven’t dug into that very much, because this wasn’t something I needed. The use case I am after is very basic:

  • Present a list of items to the user in a ListBox,
  • Allow the user to edit the item currently selected,
  • Highlight the item currently selected in the ListBox.

As it turns out, this was less straightforward than I expected. Let’s dig into it!

More...

Releasing Quipu version 1.0.0, a .NET Nelder-Mead solver

On February 25, 2023, I made the initial commit to Quipu. I needed a Nelder-Mead solver in .NET, and couldn’t find one, so I started writing my own. Today, I am happy to announce version 1.0.0 of Quipu!

What does it do?

Quipu takes in a function, and searches for the arguments that minimize (or maximize) the value of that function. This is a problem that arises in many areas (curve fitting, machine learning, finance, optimization, …).

Let’s demonstrate on a simple example, rather than go into a lengthy explanation. Imagine that we have a fictional factory, where we produce Widgets:

  • We sell Widgets for $12 per unit
  • Producing a Widget costs $5 per unit
  • Shipping widgets: the more Widgets we produce on a day, the further we have to ship to reach customers and sell them. Shipping n Widgets costs us $0.5 * n * n. As a result, the total transportation cost increases rapidly. Shipping 1 Widget would cost us half a dollar only, whereas 10 Widgets would cost us $50 total.

We could represent this fictional model in C# like so:

public class ProfitModel
{
    public static double ProductionCost(double volume)
    {
        return 5 * volume;
    }
    public static double TransportationCost(double volume)
    {
        return 0.5 * (volume * volume);
    }
    public static double Revenue(double volume)
    {
        return 12 * volume;
    }
    public static double Profit(double volume)
    {
        return
            Revenue(volume)
            - ProductionCost(volume)
            - TransportationCost(volume);
    }
}

How many widgets should we produce, if we wanted to maximize our daily profit?

Let’s ask Quipu:

using Quipu.CSharp;

var solverResult =
    NelderMead
        .Objective(ProfitModel.Profit)
        .Maximize();

if (solverResult.HasSolution)
{
    var solution = solverResult.Solution;
    Console.WriteLine($"Solution: {solution.Status}");
    var candidate = solution.Candidate;
    var args = candidate.Arguments;
    var value = candidate.Value;
    Console.WriteLine($"Profit({args[0]:N3}) = {value:N3}");
}

The answer we get from Quipu is:

Solution: Optimal
Profit(7.000) = 24.500
More...