Markov Chains in Oolong
Oolong uses Markov chains to randomly generate words. This is the same probability model behind many predictive services, such as the word prediction on your phone’s keyboard. The underlying concept of a Markov chain is simple: the probability of every possible next value is determined entirely by the current value. This section goes into more detail about how Oolong uses Markov chains. You can also generate words step-by-step in Oolong using the “Next Step” button if you’d like a better understanding of the process.
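To make that concrete, here is a minimal sketch in Python (an illustration, not Oolong’s actual code) of a single Markov step: the next letter is chosen using only a table of what can follow the current letter. The counts in the table match the hello/oolong example worked through below.

```python
import random

# Transition table: each letter maps to every letter observed after it,
# repeated once per occurrence. These counts come from "hello" and "oolong".
transitions = {
    "h": ["e"],            # 'h' was always followed by 'e'
    "l": ["l", "o", "o"],  # 1/3 chance of 'l', 2/3 chance of 'o'
}

def next_letter(current):
    # The Markov property: the choice depends only on the current letter.
    return random.choice(transitions[current])

print(next_letter("l"))  # 'l' about a third of the time, otherwise 'o'
```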
Example 1
First, consider a very simple case: a Markov chain built from two input words with an order of 1, meaning only the last letter is considered when deciding the next letter. In this example, let’s use the words hello and oolong. If you want to try this in Oolong, set the order to 1 and the data type to “Words”, then type hello oolong into the text input and select the “Update from Text” button. The steps work out as follows (a Python sketch of the same process follows the list):
- One word starts with an h, while the other starts with an o, so there is a 1⁄2 chance of choosing an h first and a 1⁄2 chance of choosing an o. For this example, we’ll start with h.
- Every time there’s an h, it is followed by an e, so there is a 100% chance that the next letter in our word is an e.
- Our current word is he, but since this is a Markov chain with an order of 1, we only look at the last letter. Every e is followed by an l, so the next letter in our word is l.
- There are three l’s in our input: one is followed by another l while the other two are followed by an o, so there is a 1⁄3 chance of the next letter being an l and a 2⁄3 chance of it being an o. We’ll go with the more likely option and choose o.
- Again, look at the o’s in our input. There are four: one is followed by another o, one is followed by an l, one is followed by an n, and the word ends after the last one. This gives a 1⁄4 chance of the next letter being o, a 1⁄4 chance of it being l, a 1⁄4 chance of it being n, and a 1⁄4 chance of the word ending here. Let’s choose n.
- Every n is followed by a g, so the next letter will be g.
- Every time there is a g, the word ends, so there is a 100% chance that our word will end.
- This gives us our final word: helong.
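The whole walkthrough can be reproduced with a short script. Below is a sketch of how an order-1 chain might be built and sampled; this is an illustration in Python, not Oolong’s actual implementation, and it uses None as a stand-in for “start of word” and “end of word”.

```python
import random
from collections import defaultdict

def build_chain(words):
    # Map each letter (or None, meaning "start of word") to every letter
    # observed after it; None in the lists means "the word ended here".
    chain = defaultdict(list)
    for word in words:
        previous = None
        for letter in word:
            chain[previous].append(letter)
            previous = letter
        chain[previous].append(None)
    return chain

def generate(chain):
    word, current = "", None
    while True:
        current = random.choice(chain[current])
        if current is None:  # the chain chose to end the word
            return word
        word += current

chain = build_chain(["hello", "oolong"])
print(chain["l"])       # ['l', 'o', 'o'] -> 1/3 chance of l, 2/3 chance of o
print(generate(chain))  # e.g. 'helong', 'oolo', 'hellong'
```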
Example 2
Let’s try another example, this time with an order of 3, using the words tree, eels, and reeler. Remember to set the order to 3 and the data type to “Words” if you want to step through this in Oolong, then type tree eels reeler into the text input and select “Update from Text”. Again, a code sketch follows the list.
- There is a 1⁄3 chance to start with each of t, e, or r. Let’s pick r for this example.
- The order is 3, but we only have one letter in our word so far, so we still only consider the starts of words. The only time there is an r at the start of a word, it is followed by an e, so our next letter is e. Note that we ignore the r at the end of reeler.
- Now we have two letters but an order of 3, so like the last step we still only look at the starts of words. The only time re occurs at the start of a word is in reeler, so the next letter will be an e.
- Currently, we have ree. This is three characters, so we can now look beyond the starts of words. ree occurs once in tree and once in reeler, so there is a 1⁄2 chance of the word ending and a 1⁄2 chance of the next letter being l. Let’s choose l.
- Our word is now reel, which is longer than our order of 3, so we only look at the last three letters: eel. eel occurs in eels and reeler, meaning there is a 1⁄2 chance of our next letter being an s and a 1⁄2 chance of it being an e. Let’s choose s.
- Finally, we look at the last three letters of reels, which gives us els. This only occurs at the end of eels, so our word has a 100% chance of ending.
- This gives us our final word: reels.
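The same idea extends to higher orders. The sketch below (again an illustration, not Oolong’s source) captures the rule from the walkthrough: while the word is shorter than the order, only the starts of input words are considered; once it reaches the order, the last few letters are matched anywhere.

```python
import random
from collections import defaultdict

END = None  # marks "the word can end here"

def build_chain(words, order):
    # Contexts shorter than the order are anchored to the starts of words;
    # contexts of the full order length are matched anywhere in a word.
    starts = defaultdict(list)
    middles = defaultdict(list)
    for word in words:
        for i in range(len(word) + 1):
            outcome = word[i] if i < len(word) else END
            if i < order:
                starts[word[:i]].append(outcome)
            else:
                middles[word[i - order:i]].append(outcome)
    return starts, middles

def generate(starts, middles, order):
    word = ""
    while True:
        if len(word) < order:
            outcome = random.choice(starts[word])        # match word starts
        else:
            outcome = random.choice(middles[word[-order:]])  # match anywhere
        if outcome is END:
            return word
        word += outcome

starts, middles = build_chain(["tree", "eels", "reeler"], order=3)
print(middles["eel"])  # ['s', 'e'] -> 1/2 chance each, as in the walkthrough
print(generate(starts, middles, order=3))  # e.g. 'reels', 'tree', 'eeler'
```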
Summary
These examples are brief and designed only to show the concept of Markov chains and how they can be used to randomly generate words. Oolong typically works on inputs of thousands of words, where there are many more possibilities and where working out a Markov chain by hand would be a huge amount of work.
Some other things are worth noting when using Oolong. You can never get a unique word with fewer letters than the order, since a word can only end that early by exactly matching one of the input words. Setting the order too high will result in words that are similar or identical to words in the data source, while a low order will produce very random, nonsensical words that bear little resemblance to the input. Finding a good balance may take a few tries, but starting with an order between 3 and 6 will usually produce good results.
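Reusing the build_chain and generate sketch from Example 2, you can see this trade-off for yourself; the tiny word list here just stands in for a real data source of thousands of words.

```python
words = ["tree", "eels", "reeler", "hello", "oolong"]
for order in (1, 3, 6):
    starts, middles = build_chain(words, order)
    print(order, [generate(starts, middles, order) for _ in range(5)])
# Order 1 gives very random output; at order 6 every sample is simply one
# of the input words, since no input here is longer than 6 letters.
```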