# Hidden Markov Model: A Comprehensive Overview (2021)

**Introduction**

You may be wondering what a Hidden Markov Model (HMM) is. Well, this model is a global branch in the world of Machine Learning. It helps solve real-life problems, including Natural Language Processing (NLP) problems, Time Series, and many more. You will be studying many concepts that fall under this topic, Hidden Markov Model. In this blog, we will explore the definition of Hidden Markov Model, applications of Hidden Markov Model, and much more.

**Hidden Markov Model (HMM)****Definition of Hidden Markov Model (HMM)****OBSERVATIONS****Terminology in HMM (Annotations)****Likelihood Computation: The Forward Algorithm****Decoding: The Viterbi Algorithm****Hidden Markov Model Advantages and Disadvantages****Applications of Hidden Markov Model (HMM)**

## 1. Hidden Markov Model (HMM)

Before delving into what the Hidden Markov Model is, let’s understand the Markov Chain.

A Markov Chain is a model or a type of random process that explains the probabilities of sequences of random variables, commonly known as states. Each of the states can take values from some set. In other words, we can explain it as the probability of being in a state, which depends on the previous state. We use the Markov Chain when we need to calculate the probability for a sequence of observable events. However, in most cases, the chain is hidden or invisible, and each state randomly generates 1 out of every *k* observations visible to us. Now, we will define the Hidden Markov Model.

## 2. Definition of Hidden Markov Model (HMM)

Here comes the definition of Hidden Markov Model*: *The Hidden Markov Model (HMM) is an analytical Model where the system being modeled is considered a Markov process with hidden or unobserved states. Machine learning and pattern recognition applications, like gesture recognition & speech handwriting, are applications of the Hidden Markov Model.

HMM, Hidden Markov Model enables us to speak about observed or visible events and hidden events in our probabilistic model. Here is an example of the weather prediction, as discussed in the Markov Chains:

## 3. OBSERVATIONS

An observation is termed as the data which is known and can be observed. The below diagram depicts the interaction between two ‘HIDDEN’ states, ‘Rainy’ and ‘Sunny’ in this case. ‘Walk’, ‘Shop’, and ‘Clean’ in the below diagram are known as data, referred to as OBSERVATIONS.

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

Markov Chain – three states (snow, rain, and sunshine)

P – the transition probability matrix

q – the initial probabilities

The above example represents the invisible Markov Chain; for instance, we are at home and cannot see the weather. However, we can feel the temperature inside the rooms at home. There will be two possible observations, hot and cold, where:

P(Hot|Snow) = 0, P(Cold|Snow) = 1

P(Hot|Rain) = 0.2, P(Cold|Rain) = 0.8

P(Hot|Sunshine) = 0.7, P(Cold|Sunshine) = 0.3

**Example:** **HMM **to compute the probability of whether we will feel cold for two consecutive days; there are 3*3=9 possibilities or options for the underlying Markov states in these two days.

P((Cold, Cold), P(Rain, Snow)) = P((Cold, Cold)|(Rain, Snow)).P(Rain, Snow) = P(Cold|Rain).P(Cold|Snow).P(Snow|Rain).P(Rain) = 0.8 . 1 . 0.1 . 0.2 = 0.016

The probability will be calculated as the sum of all the possibilities.

## 4. Terminology in HMM (Annotations)

The Hidden Markov Model annotations are listed below:

- T = length of the observation sequence
- N = number of states in the model
- M = number of observation symbols
- Q = {q0,q1,….,qN-1} = distinct states of the Markov process
- V = {0,1,….,M – 1} = set of possible observations
- A = state transition probabilities
- B = observation probability matrix
- Π = initial state distribution
- O = (O0,O1,….,OT-1) = observation sequence

**Now, with the HMM, what are some key problems to solve?**Three key problems characterize the Hidden Markov Model:

**Problem 1 (Likelihood):**Given a known HMM model, λ = (A, B) and an observation sequence O, determine the likelihood of the sequence O happening, P(O|λ).**Problem 2 (Decoding):**Given an HMM model, λ = (A, B) and an observation sequence O, determine the best or optimal hidden state sequence.**Problem 3 (Learning):**Given an observation sequence O and a set of hidden states in the HMM, learn the parameters A and B while determining the optimal model maximizing the probability of O.

## 5. Likelihood Computation: The Forward Algorithm

Our first problem is to compute or figure out the likelihood of a particular observation sequence.

**Computing Likelihood:** Given a known HMM Hidden Markov Model, λ = (A, B) and an observation sequence O, determine the likelihood of sequence O happening, P(O|λ)

Two methods or algorithms exist to calculate the likelihood, namely, Forward and Backward Algorithms. The Forward Algorithm contains three steps:

**1.** **Initialization**

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

We multiply the initial probability of state i with the emission probability b of the state, given the observable O at time 1, to calculate the first forward variable, which is a part of the above given Forward Algorithm Initialization Equation.

**2. Recursion: **Computing the forward variable of Sunny state at time 2 by adding the answers of two multiplications:

The preceding forward variable of the earlier state ‘Sunny’ = 0.24

Sunny to Sunny, transition probability = 0.8

Sunny to Clean, emission probability = 0.1

Multiplying,

0.24 * 0.8 * 0.1 = 0.0192

The previous forward variable of the previous state ‘Rainy’ = 0.12

Rainy to Sunny, transition probability = 0.4

Sunny to Clean, emission probability = 0.1

Multiplying,

0.12 * 0.4 * 0.1 = 0.0048

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

The total summation of the results into a forward variable, alpha = 0.0192 + 0.0048 = 0.024

Source link: https://medium.com/@Ayra_Lux/hidden-markov-models-part-1-the-likelihood-problem-8dd1066a784e

**3. Termination: **The probability of an observation sequence O derived from the HMM model λ, the sum of all the variables at time T.

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

The final equation will be: P(O|*λ*) = 0.0028512 + 0.0003048 = 0.003156

## 6. Decoding: The Viterbi Algorithm

The Viterbi Algorithm uses dynamic programming to solve the second problem. Dynamic programming involves breaking down a complex problem into simpler sub-problems using a recursive approach. It includes the steps of Initialization, Recursion, and Termination to find the sequence of the hidden states.

Source link: https://medium.com/@Ayra_Lux/hidden-markov-models-part-2-the-decoding-problem-c628ba474e69

**1. Initialization: **We multiply the initial probability of state i with the emission probability to observe O from state i at time t = 1.

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

In the case of Rainy State, 0.4 * 0.2 = 0.08

In the case of Sunny State, 0.6 * 0.2 = 0.12

**2. Recursion: **We find the maximum value among all the product results and assign it to the Viterbi variable. The previous Viterbi variable of the state i is multiplied with the transition probability from state i to j, times the emission probability to observation O from state j.

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

Selecting the maximum value of the two product results:

The previous Viterbi variable of the previous state, ‘Sunny’ = 0.12,

Sunny to Sunny, transition probability = 0.8

Sunny to clean, emission probability = 0.1

Multiplying,

0.12 * 0.8 * 0.1 = 0.0096

The previous Viterbi variable of the previous state, ‘Rainy’ = 0.08

Rainy to Sunny, transition probability = 0.4

Sunny to Clean, emission probability = 0.1

Multiplying,

0.08 * 0.4 * 0.1 = 0.0032

Source link: https://medium.com/@Ayra_Lux/hidden-markov-models-part-2-the-decoding-problem-c628ba474e69

**3. Termination: **The following equation depicts the probability of the complete state sequence up to point T + 1, so we find the greatest value among all Viterbi variables at time T.

0.00082944 > 0.00015552 => P = 0.00082944

**HMM Training: The Forward-Backward Algorithm**

The Forward-Backward Algorithm, also known as the Baum-Welch Algorithm, is a dynamic programming approach to tune the parameters of HMM. There are four phases in the algorithm, including the initial phase, the forward phase, the backward phase, and the update phase.

The following recursive function is calculated in the forward phase.

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

A few points worth remembering in the Forward Phase are:

- The alpha function refers to the joint probability of the observed data and state at the time k.
- The alpha function appears as a first term of the RHS equation. Therefore, it’s a recursive function.
- The second term present in the RHS is the state transition probability from A. The last term, on the other hand, is the emission probability from B.

The formula for the backward phase is given as follows:

Source link: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9

The next concept is the advantages and disadvantages of the Hidden Markov Model.

## 7. Hidden Markov Model Advantages and Disadvantages

HMM Hidden Markov Model has become a very prominent mathematical and graphical representation for appliances. Here’s an analysis of the advantages and disadvantages of Hidden Markov Model:

**Advantages**

- HMM is an analyzed probabilistic graphical model. The algorithms applied in this model are studied for approximate learning and conclusion.
- Hidden Markov Models (HMM) are said to acquire the contingency between successive measurements, as defined in the switch continuity principle.
- HMMs represent the variance of appliances’ power demands via probability distributions.

**Disadvantages**

- HMM cannot represent any dependency between the appliances. The conditional HMM can capture the dependencies, though.
- HMM does not consider the state sequence dominating any given state because of its Markovian nature.
- HMMs do not explicitly capture the time in a specified state due to their Markovian behavior. Nonetheless, the hidden semi-Markov model is responsible for capturing that kind of behavior.

Yet another question arises at this point- Where is the Hidden Markov Model used?

## 8. Applications of Hidden Markov Model (HMM)

Here, we will go through the concept of where is the Hidden Markov Model used in the different areas of Computational Biology:

**Pairwise Sequence Alignment:**Aligning two sequences based on a common similarity between them to deduce functional similarity is referred to as Pairwise Sequence Alignment. The parameters are estimated using a unique training method, and the alignment model is extended to allow multiple parameter sets, all of which get selected using HMM.**Genomic Annotation:**Computational Genomic Annotation, in general, includes structural annotation for genes and other functional elements and functional annotations for assigning functions to the predicted functional elements. The computational approach for gene recognition brings together a large amount of diverse information.

**Conclusion**

We can conclude and summarise the following points for the HMM as discussed in the above sections including, what is Hidden Markov Model (HMM), where is the Hidden Markov Model used, and others.

- The data visible to us is the observational data and not the data fetched from the states.
- Using the Forward Algorithm, we can find the conditional distribution over the hidden states.
- Using the Viterbi Algorithm, we can find the sequence of hidden states in the form of a Viterbi path.
- The forward and the backward phase formulas in the Baum-Welch algorithm reveal the expected hidden states with the help of the given observed data.

If you are interested in making a career in the Data Science domain, our 11-month in-person **Postgraduate Certificate Diploma in Data Science **course can help you immensely in becoming a successful Data Science professional.

## Also Read