Having fun with Immutable Data Structures: Implementing a Markov Chain in Scala
In this article, I will describe a simple implementation of a Markov chain using Scala. In particular, I will show how to build a purely functional data structure, without the use of any mutable state or variables.
On the way, I will introduce the usage of the builder design pattern, and I will describe a simple method that trains automatically a Markov Chain from a training set.
This article will help in understanding immutable data structures, helping to reach the right mindset since these data structures are not so common in object-oriented programming like Java or C++.
Generally speaking, immutable objects are easier/simpler to reason about. Plus they make concurrent programming easier.
The knowledge of Scala is not mandatory, you can follow also if you have only a basic Java knowledge.
What Is a Markov Chain?
Simply said, a Markov Chain is a directed graph. It is composed of nodes and directed edges, where each edge links a source node to a destination node.
Plus, each edge is labelled with a probability: this is the probability of jumping (or walking) from a current node to a next node. The only constraint is that given a node the sum of the probabilities of the outgoing edges should be 1.
For example in the above chain, if I’m currently sleeping I have the 60% probability of taking a run as next action, the 20% probability of remaining at sleep or the 20% probability of eating ice cream. The process of going from one node to the next is also called “random walk”. The probability of jumping to a particular next state depends only on the current step, the previous path does not help in choosing the next step.
This model helps to solve many mathematical problems (like queueing theory in networking), but also it has many practical applications like studying the interaction of users with a website or to implement text suggestion/autocomplete (given that you are typing “good” what is the probability that your next word will be “morning” or “evening”?).
Implementing a chain builder
The first step is to design a builder for a node of the graph:
The value of the node is a generic T value, then we have a sequence of outgoing links (made up of output state and the probability of walking on that link). Instead of modifying a private field/variable, for each call to linkedTo(to: T, prob: Probability) the copy method will return a new instance of NodeBuilder, containing the old fields plus the new one updated.
When building the chain the probability constraints are relaxed. Only when transforming a NodeBuilder to a MarkovNode I’m checking that the sum of the probability is 1, this is ensured in the toNode method. The list of outgoing states is implemented using an immutable list-like data structure (appending an element to the list will return a new instance of the list without modifying the old instance. Note that the list is not copied, just the new first item points to previous immutable old list).
The chain builder maintains the state of all the nodes of the chain:
I’m using a map for each state T we have a related NodeBuilder containing the outgoing links. The method linkTo(from: T, to: T, prob: Probability) produce a new instance of ChainBuilder containing a new immutable Map (constructed in a similar way as immutable Seq described before, the new Map is constructed pointing to the old Map data when possible, and copying only a few pointers when needed. The scala documentation says that the cost of creating a new immutable Map adding a new item to an old Map is effectively constant time, but this might depend on some assumptions such as maximum length of a vector or distribution of hash keys).
Finally, this is how the builder usage appears to the end-user:
Probabilities are represented as a rational number (Probability(3, 10) = 0.3) to avoid approximations when training the chain from a training set.
The Markov chain implementation
Finally, the Markov chain implementation is trivial. Each node T is pointing to a list of nodes, represented by these classes:
The code here is not complete, the full implementation contains also some convenient methods (like random walks) and some validation in the constructor (ensuring that each destination node is also part of the graph, avoiding dangling edges).
Training the Markov chain from a sequence of steps
I implemented also a utility method that trains a chain from a sequence of ordered steps. For each step, I’m simply counting all the occurrence of distinct next chain steps and using that count to generate the related probability of jump. For example, imagine to train a model from a given textbook:
The Markov chain will then contains for each state (a word) the probability of jumping to a next state (the next word in the phrase). I provided a random walk method that generates a random sequence of states (in this example a random English text).
This is a simple implementation, but on my (not-so-new) laptop, I can easily train a model from large books in a few seconds and then generating random phrases in English using a random walk.
Here you can find more examples and a full implementation: https://github.com/DarioBalinzo/Markov4s