# Markov Chains in Oolong

Oolong uses Markov chains to randomly generate words. This is the same probability model that many predictive services are based on, such as the word prediction on your phone’s keyboard. The underlying concept of Markov chains is simple: the probability of every possible next value can be determined from the current value, but this section will go more into detail about how Oolong uses Markov chains. You can also generate words step-by-step in Oolong using the “Next Step” button if you’d like a better understanding of the process.

## Example 1

First, consider a very simple case: an Markov chain built using a two input words with an order of 1, meaning to only look at the last letter when deciding the next letter. In this example, let’s use the words `hello`

and `oolong`

. If you want to try this in Oolong make sure to set your order to 1 and your data type to “Words”, then type `hello oolong`

into the text input and select the “Update from Text” button.

- One word starts with an
`h`

, while the other starts with an`o`

, so there is a 1⁄2 chance of choosing an`h`

first, and a 1⁄2 chance of choosing an`o`

. For this example, we’ll start with`h`

. - Every time there’s an
`h`

, it is followed by an`e`

, so there is a 100% chance that the next letter in our word is an`e`

. - Our current word is
`he`

, but since this is a Markov chain with an order of 1 we only look at the last letter. Every`e`

is followed by an`l`

, so the next letter in our word is`l`

. - There are three
`l`

s in our input: one is followed by another`l`

while the other two are followed by an`o`

, so there is a 1⁄3 chance of the next letter being an`l`

and a 2⁄3 chance of the next letter being an`o`

. We’ll go with the more likely option and choose`o`

. - Again, there are three
`o`

s in our input: one is followed by another`o`

, one is followed by an`n`

, and the word ends after the other one. This gives a 1⁄3 chance of the next letter being`o`

, a 1⁄3 chance to be`n`

, and a 1⁄3 chance for the word to end here. Let’s choose`n`

. - Every
`n`

is followed by a`g`

, so the next letter will be`g`

. - Every time there is a
`g`

, the word ends, so there is a 100% chance that our word will end. - This gives us our final word:
`helong`

.

## Example 2

Let’s try another example, this time with an order of 3 using the words `tree`

, `eels`

, and `reeler`

. Remember to set the order to 3 and the data type to “Words” if you want to step through this in Oolong, then type `tree eels reeler`

into the text input and select “Update from Text”.

- There is a 1⁄3 chance to start with each of
`t`

,`e`

, or`r`

. Let’s pick`r`

for this example. - The order is 3, but we only have one letter in our word, so we should still only consider the starts of words. The only time there is an
`r`

at the start of a word, it is followed by an`e`

, so our next letter is`e`

. Note that we ignore the`r`

at the end of`reeler`

. - Now we have two letters, but an order of 3, so like last step we will still only look at the start of words. The only time
`re`

occurs at the start of a word is in`reeler`

, so the next letter will be an`e`

. - Currently, we have
`ree`

. This is three characters, so we can now look beyond the start of words.`ree`

occurs once in`tree`

and once in`reeler`

, so there is a 1⁄2 chance of the word ending and a 1⁄2 chance of the next letter being`l`

. Let’s choose`l`

. - Our word is now
`reel`

, which is longer than our order of 3, so we only look at the last three letters:`eel`

.`eel`

occurs in`eels`

and`reeler`

, meaning that there is a 1⁄2 chance of our next letter being an`s`

and a 1⁄2 chance of the next letter being an`e`

. Let’s choose`s`

. - Finally, we look at the last 3 letters of
`reels`

, which gives us`els`

. This only occurs at the end`eels`

, so our word has a 100% chance of ending. - This gives us our final word:
`reels`

.

## Summary

These examples are brief, and just designed to show the concept of Markov chains and how they can be used to randomly generate words. Oolong typically works on inputs of thousands of words, where there are many more possibilities and where manually working out Markov chains would be a huge amount of work.

Some other things worth noting when using Oolong are that you can never have a unique word with fewer letters than the order. Setting the order too high will result in words being either similar or identical to words in the data source, while a low order will produce very random and nonsense words have little resemblance to the input. Finding a good balance may take a few tries, but starting with and order between 3 and 6 will usually produce good results.