Markov chains are stochastic models that describe systems transitioning between states over time. They're defined by a state space, transition probabilities, and the memoryless property, making them powerful tools for analyzing complex systems with probabilistic behavior.
This topic explores the key components and properties of Markov chains, including state space, transition probabilities, and classifications. It also delves into long-term behavior, first passage times, and various applications in fields like physics, biology, and computer science.
Definition of Markov chains
- A Markov chain is a stochastic process that models a system transitioning between different states over time, where the probability of transitioning to a particular state depends only on the current state and not on the past states
- Markov chains are widely used in various fields, including physics, biology, economics, and computer science, to model and analyze complex systems with probabilistic behavior
- The key components of a Markov chain include the state space, transition probabilities, initial state distribution, and the memoryless property
State space in Markov chains
- The state space of a Markov chain is the set of all possible states that the system can be in at any given time
- Each state represents a unique configuration or condition of the system (e.g., the number of customers in a queue or the health status of a patient)
- The state space can be discrete (finite or countably infinite) or continuous (uncountably infinite)
- Example: In a weather model, the state space could be {sunny, cloudy, rainy}
Transition probabilities
- Transition probabilities describe the likelihood of moving from one state to another in a single time step
- The transition probability from state i to state j is denoted P(X_{t+1} = j | X_t = i), where X_t represents the state at time t
- Transition probabilities are typically represented in a transition matrix P, where Pij is the probability of transitioning from state i to state j
- Example: In a two-state Markov chain with states {0, 1}, the transition matrix could be P = [[0.7, 0.3], [0.4, 0.6]], where row i gives the probabilities of moving from state i to each state
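A minimal Python sketch of these ideas (using the two-state example above; NumPy is assumed): each row of the transition matrix is a probability distribution over next states, and a single step is sampled from the row of the current state.

```python
import numpy as np

# Two-state chain {0, 1}; row i holds the probabilities of moving from state i to each state
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Every row of a transition matrix must sum to 1
assert np.allclose(P.sum(axis=1), 1.0)

rng = np.random.default_rng(seed=0)
current_state = 0
# Sample the next state from the row of P corresponding to the current state
next_state = rng.choice(len(P), p=P[current_state])
print(next_state)
```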
Initial state distribution
- The initial state distribution, denoted as π_0, represents the probability of the system starting in each state at time t = 0
- It is a probability vector that sums to 1, where π_0(i) is the probability of starting in state i
- The initial state distribution can be used to compute the probability distribution of the system at any future time step
- Example: In a three-state Markov chain, the initial state distribution could be π_0 = [0.2, 0.5, 0.3]
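To illustrate the point above about computing future distributions, here is a short sketch (the three-state transition matrix is invented for the example; the initial distribution is the one from the bullet): the distribution after t steps is π_0 P^t.

```python
import numpy as np

# Hypothetical three-state chain, with the initial distribution from the example above
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi_0 = np.array([0.2, 0.5, 0.3])

# Distribution after t steps: pi_t = pi_0 @ P^t
t = 5
pi_t = pi_0 @ np.linalg.matrix_power(P, t)
print(pi_t, pi_t.sum())  # still a probability vector: the entries sum to 1
```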
Memoryless property of Markov chains
- The memoryless property, also known as the Markov property, is a fundamental characteristic of Markov chains
- It states that the probability of transitioning to a particular state depends only on the current state and not on the history of previous states
- Mathematically, P(X_{t+1} = j | X_t = i, X_{t-1} = k, ...) = P(X_{t+1} = j | X_t = i)
- The memoryless property simplifies the analysis and computation of Markov chains by reducing the dependence on past states
Classifications of Markov chains
- Markov chains can be classified based on various properties, such as the nature of time (discrete or continuous), time-homogeneity, state space size, and structural properties like irreducibility and aperiodicity
- Understanding these classifications helps in selecting appropriate models and analysis techniques for different applications
- The classifications also impact the long-term behavior and convergence properties of Markov chains
Discrete-time vs continuous-time
- Discrete-time Markov chains (DTMCs) evolve over discrete time steps, where state transitions occur at fixed intervals (e.g., daily, weekly, or yearly)
- Continuous-time Markov chains (CTMCs) evolve over continuous time, where state transitions can occur at any time point
- DTMCs are characterized by transition probabilities, while CTMCs are characterized by transition rates
- Example: A DTMC could model the daily weather, while a CTMC could model the time until a machine failure
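A minimal sketch of how a CTMC differs in practice (the two states and rates are invented for illustration): the holding time in each state is exponentially distributed with the state's total exit rate, and the next state is chosen in proportion to the individual transition rates.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical generator (rate) matrix for states {0: working, 1: failed};
# off-diagonal entries are transition rates, the diagonal makes each row sum to 0
Q = np.array([[-0.1,  0.1],   # working -> failed at rate 0.1 per hour
              [ 2.0, -2.0]])  # failed -> working (repair) at rate 2.0 per hour

state, clock = 0, 0.0
for _ in range(5):
    exit_rate = -Q[state, state]
    clock += rng.exponential(1.0 / exit_rate)          # exponential holding time
    rates = Q[state].copy()
    rates[state] = 0.0
    state = rng.choice(len(Q), p=rates / rates.sum())  # jump to the next state
    print(f"t = {clock:.2f} h, new state = {state}")
```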
Time-homogeneous vs time-inhomogeneous
- Time-homogeneous Markov chains have transition probabilities that remain constant over time, i.e., P(X_{t+1} = j | X_t = i) is independent of t
- Time-inhomogeneous Markov chains have transition probabilities that vary with time, i.e., P(X_{t+1} = j | X_t = i) depends on t
- Time-homogeneous Markov chains are easier to analyze and have well-established theory for long-term behavior
- Example: A time-homogeneous Markov chain could model customer preferences, while a time-inhomogeneous Markov chain could model seasonal variations in sales
Finite vs infinite state space
- Markov chains with a finite state space have finitely many states (e.g., {1, 2, ..., n})
- Markov chains with an infinite state space have infinitely many states, which can be countably infinite (e.g., the non-negative integers) or uncountably infinite (e.g., the set of real numbers)
- Finite state space Markov chains are more common in applications and have simpler analysis techniques
- Example: A finite state space Markov chain could model the number of customers in a queue with limited capacity, while an infinite state space Markov chain could model an unbounded queue or the position of a particle in a continuous space
Irreducible Markov chains
- An irreducible Markov chain is one where it is possible to reach any state from any other state in a finite number of steps
- In an irreducible Markov chain, there are no absorbing states or proper closed subsets of states; the entire state space forms a single communicating class
- Irreducibility is a desirable property for many applications as it ensures that the long-term behavior of the chain is independent of the initial state
- Example: A social network where users can navigate from any user's profile to any other user's profile through a series of connections is an irreducible Markov chain
Aperiodic Markov chains
- A state in a Markov chain is aperiodic if the greatest common divisor of its possible return times (the step counts at which a return has positive probability) is 1
- A Markov chain is aperiodic if all its states are aperiodic
- Aperiodicity is important for the convergence of Markov chains to a unique stationary distribution
- Example: A Markov chain modeling the outcomes of repeated fair coin flips (heads or tails) is aperiodic, while a Markov chain modeling the position of a knight making random moves on a chessboard is periodic with period 2, since the knight alternates between light and dark squares
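As a small numerical illustration (both matrices are invented for the example), a deterministic two-state swap has period 2, so its distribution oscillates forever, while adding any probability of staying put makes the chain aperiodic and lets the distribution settle:

```python
import numpy as np

periodic  = np.array([[0.0, 1.0],   # always swap states: period 2
                      [1.0, 0.0]])
aperiodic = np.array([[0.1, 0.9],   # small chance of staying put: gcd of return times is 1
                      [0.9, 0.1]])

start = np.array([1.0, 0.0])        # begin in state 0 with certainty
for name, P in [("periodic", periodic), ("aperiodic", aperiodic)]:
    dist = start.copy()
    for _ in range(20):
        dist = dist @ P
    print(name, dist)
# The periodic chain ends at [1, 0] or [0, 1] depending on whether the step count is even or odd;
# the aperiodic chain is already close to the uniform distribution [0.5, 0.5].
```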
Long-term behavior of Markov chains
- The long-term behavior of Markov chains refers to the probability distribution of states as the number of time steps approaches infinity
- Understanding the long-term behavior is crucial for making predictions, assessing the stability of the system, and designing control strategies
- Key concepts in the long-term behavior of Markov chains include stationary distribution, convergence, and ergodicity
Stationary distribution
- A stationary distribution, denoted as π, is a probability distribution over the states that remains unchanged under the transition matrix P
- Mathematically, πP=π, where π is a row vector
- If a Markov chain has a stationary distribution, the long-term probability of being in each state converges to the stationary distribution, regardless of the initial state distribution
- Example: In a two-state Markov chain with transition matrix P = [[0.7, 0.3], [0.4, 0.6]], the stationary distribution is π ≈ [0.57, 0.43] (exactly [4/7, 3/7])
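A short sketch of how the stationary distribution might be computed numerically for the two-state example above (using NumPy): π is a left eigenvector of P with eigenvalue 1, normalized to sum to 1.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Left eigenvector of P for eigenvalue 1: solve pi P = pi via the transpose of P
eigenvalues, eigenvectors = np.linalg.eig(P.T)
idx = np.argmin(np.abs(eigenvalues - 1.0))
pi = np.real(eigenvectors[:, idx])
pi = pi / pi.sum()        # normalize to a probability vector
print(pi)                 # approximately [0.571, 0.429], i.e., [4/7, 3/7]
```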
Existence and uniqueness of stationary distribution
- Not all Markov chains have a stationary distribution, and some may have multiple stationary distributions
- For a finite-state, irreducible, and aperiodic Markov chain, there exists a unique stationary distribution
- The existence and uniqueness of the stationary distribution can be determined using the Perron-Frobenius theorem for non-negative matrices
- Example: A Markov chain modeling the weather (sunny, cloudy, rainy) with positive transition probabilities between all states has a unique stationary distribution
Convergence to stationary distribution
- If a Markov chain has a unique stationary distribution π, the probability distribution of states converges to π as the number of time steps approaches infinity
- The rate of convergence depends on the eigenvalues of the transition matrix P, in particular on the magnitude of the second-largest eigenvalue (the spectral gap)
- The convergence to the stationary distribution is important for making long-term predictions and understanding the steady-state behavior of the system
- Example: In a Markov chain modeling the spread of a disease, the stationary distribution represents the endemic equilibrium, where the disease persists at a constant level in the population
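To make the convergence claim concrete, here is a small sketch (same two-state matrix as above) that tracks the L1 distance to the stationary distribution at each step; the gap shrinks geometrically at a rate set by the second-largest eigenvalue of P.

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi_star = np.array([4/7, 3/7])   # stationary distribution of this chain

dist = np.array([1.0, 0.0])      # start entirely in state 0
for t in range(1, 11):
    dist = dist @ P
    gap = np.abs(dist - pi_star).sum()
    print(f"step {t}: L1 distance to pi_star = {gap:.6f}")
# The gap shrinks by a factor of 0.3 per step, matching the second eigenvalue of P
# (for a 2x2 stochastic matrix the eigenvalues are 1 and trace - 1 = 0.7 + 0.6 - 1 = 0.3).
```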
Ergodicity in Markov chains
- A Markov chain is ergodic if it is irreducible, aperiodic, and positive recurrent, which together guarantee a unique stationary distribution
- Ergodicity implies that the long-term behavior of the chain is independent of the initial state distribution
- In an ergodic Markov chain, the time average of a function of the states converges to the ensemble average with respect to the stationary distribution
- Example: A Markov chain modeling the stock prices of a company is ergodic if the long-term average return converges to a constant value, regardless of the initial stock price
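A minimal simulation sketch of the time-average property in the third bullet (same two-state matrix again): the fraction of steps the chain spends in each state approaches the stationary distribution, regardless of where it starts.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

state = 0
visits = np.zeros(2)
steps = 100_000
for _ in range(steps):
    visits[state] += 1
    state = rng.choice(2, p=P[state])

print(visits / steps)   # time averages, close to the stationary distribution [4/7, 3/7]
```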
First passage times
- First passage times are important quantities in Markov chain analysis that describe the time it takes for the chain to reach a specific state or set of states for the first time
- The study of first passage times helps in understanding the transient behavior of Markov chains and in designing optimal control strategies
- Key concepts related to first passage times include the definition, expected values, and absorption probabilities
Definition of first passage time
- The first passage time from state i to state j, denoted T_{ij}, is the random variable representing the number of steps it takes for the chain to reach state j for the first time, starting from state i
- Mathematically, T_{ij} = min{n ≥ 1 : X_n = j} given X_0 = i, where X_n is the state at time n
- The first passage time can be defined for a single state or a set of states
- Example: In a Markov chain modeling a game, the first passage time from the starting state to the winning state represents the number of moves required to win the game
Expected first passage times
- The expected first passage time from state i to state j, denoted E[T_{ij}], is the average number of steps it takes for the chain to reach state j for the first time, starting from state i
- Expected first passage times can be computed using the fundamental matrix of the Markov chain
- The fundamental matrix, denoted N, is defined as N = (I − Q)^{-1}, where I is the identity matrix and Q is the submatrix of the transition matrix P corresponding to the transient states
- Example: In a Markov chain modeling a queue, the expected first passage time from an empty queue to a full queue represents the average time it takes for the queue to fill up
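A sketch of the fundamental-matrix computation described above (the three-state chain is invented for illustration): to get expected first passage times into state 2, treat state 2 as absorbing, restrict P to the remaining transient states to obtain Q, and take row sums of N = (I − Q)^{-1}.

```python
import numpy as np

# Hypothetical chain on {0, 1, 2}; we want expected first passage times into state 2
P = np.array([[0.5, 0.4, 0.1],
              [0.3, 0.5, 0.2],
              [0.0, 0.0, 1.0]])   # state 2 treated as absorbing

Q = P[:2, :2]                      # transitions among the transient states {0, 1}
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix: expected visits to each transient state
expected_passage = N.sum(axis=1)   # row sums = expected steps before first reaching state 2
print(expected_passage)            # E[T_{02}] and E[T_{12}]
```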
Absorption probabilities
- Absorption probabilities are related to first passage times in Markov chains with absorbing states
- An absorbing state is a state that, once entered, cannot be left (i.e., the transition probability from the state to itself is 1)
- The absorption probability from state i to absorbing state j, denoted a_{ij}, is the probability that the chain, starting from state i, will eventually be absorbed in state j
- Absorption probabilities can be computed using the fundamental matrix N as A = NR, where A is the matrix of absorption probabilities and R is the submatrix of the transition matrix P containing the transitions from transient states to absorbing states
- Example: In a Markov chain modeling the stages of a disease, the absorption probabilities represent the likelihood of a patient reaching a terminal stage, given their current stage
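A minimal sketch of A = NR for a small gambler's-ruin chain (the setup is invented for illustration): states 0 and 3 are absorbing, states 1 and 2 are transient, and each entry of A is the probability of ending up in a particular absorbing state.

```python
import numpy as np

# Hypothetical fair gambler's ruin on {0, 1, 2, 3}: 0 and 3 absorbing, 1 and 2 transient
Q = np.array([[0.0, 0.5],    # transient-to-transient block of P (states 1 and 2)
              [0.5, 0.0]])
R = np.array([[0.5, 0.0],    # transient-to-absorbing block of P (columns: state 0, state 3)
              [0.0, 0.5]])

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
A = N @ R                          # absorption probabilities
print(A)   # [[2/3, 1/3], [1/3, 2/3]]: from state 1, absorbed at 0 with prob 2/3, at 3 with prob 1/3
```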
Applications of Markov chains
- Markov chains have a wide range of applications in various fields, including physics, biology, economics, and computer science
- They are particularly useful for modeling systems with probabilistic transitions between states and for making predictions about the long-term behavior of the system
- Some common applications of Markov chains include modeling, decision-making, and hidden state inference
Modeling with Markov chains
- Markov chains can be used to model various real-world systems, such as weather patterns, customer behavior, or the spread of diseases
- By capturing the essential features of the system in terms of states and transition probabilities, Markov chains provide a concise and tractable representation of the system dynamics
- Markov chain models can be used for simulation, prediction, and sensitivity analysis
- Example: A Markov chain can model the customer journey on an e-commerce website, with states representing different pages (home page, product page, cart, checkout) and transition probabilities capturing the likelihood of moving between pages
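A hedged sketch of such a model (the page names come from the example, but every probability is invented): one customer session is simulated until it is absorbed in checkout or exit.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

pages = ["home", "product", "cart", "checkout", "exit"]
# Hypothetical transition probabilities; "checkout" and "exit" are absorbing
P = np.array([
    [0.00, 0.60, 0.05, 0.00, 0.35],   # from home
    [0.10, 0.00, 0.40, 0.00, 0.50],   # from product
    [0.00, 0.20, 0.00, 0.50, 0.30],   # from cart
    [0.00, 0.00, 0.00, 1.00, 0.00],   # checkout (absorbing)
    [0.00, 0.00, 0.00, 0.00, 1.00],   # exit (absorbing)
])

state = 0                             # every session starts on the home page
path = [pages[state]]
while pages[state] not in ("checkout", "exit"):
    state = rng.choice(len(pages), p=P[state])
    path.append(pages[state])
print(" -> ".join(path))
```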
Markov decision processes
- Markov decision processes (MDPs) extend Markov chains by incorporating actions and rewards
- In an MDP, at each state, the decision-maker chooses an action, which influences the transition probabilities and generates a reward
- The goal is to find a policy (a mapping from states to actions) that maximizes the expected cumulative reward over a finite or infinite horizon
- MDPs are widely used in reinforcement learning, robotics, and operations research for sequential decision-making problems
- Example: An MDP can model a robot navigating a maze, with states representing the robot's position, actions representing the movement directions, and rewards capturing the goal of reaching the target while avoiding obstacles
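A compact value-iteration sketch for a made-up two-state, two-action MDP (all transition probabilities, rewards, and the discount factor are illustrative, not taken from the maze example):

```python
import numpy as np

# Hypothetical MDP: P[a][s, s'] is the transition probability under action a,
# R[a][s] is the immediate reward for taking action a in state s
P = {0: np.array([[0.9, 0.1], [0.8, 0.2]]),   # action 0: tends to stay put
     1: np.array([[0.2, 0.8], [0.1, 0.9]])}   # action 1: tends to move to state 1
R = {0: np.array([0.0, 1.0]),
     1: np.array([0.5, 2.0])}
gamma = 0.9                                   # discount factor

V = np.zeros(2)
for _ in range(200):                          # value iteration until (approximate) convergence
    Q_values = np.array([R[a] + gamma * P[a] @ V for a in (0, 1)])  # one row per action
    V = Q_values.max(axis=0)

policy = Q_values.argmax(axis=0)              # greedy policy with respect to the converged values
print(V, policy)
```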
Hidden Markov models
- Hidden Markov models (HMMs) are a class of probabilistic models where the system being modeled is assumed to be a Markov chain with unobserved (hidden) states
- In an HMM, the states are not directly observable, but they emit observations according to a probability distribution
- The goal is to infer the most likely sequence of hidden states given the observed sequence of emissions
- HMMs are commonly used in speech recognition, bioinformatics, and natural language processing for sequence analysis and pattern recognition
- Example: An HMM can model the part-of-speech tagging problem in natural language processing, where the hidden states represent the part-of-speech tags (noun, verb, adjective) and the observations are the words in a sentence
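A small Viterbi sketch for a toy tagging HMM (the two tags, two words, and every probability are invented for illustration); it recovers the most likely sequence of hidden tags for an observed word sequence.

```python
import numpy as np

tags = ["NOUN", "VERB"]
words = ["dogs", "run"]

start = np.array([0.6, 0.4])            # P(first tag)
trans = np.array([[0.3, 0.7],           # P(next tag | current tag)
                  [0.8, 0.2]])
emit = np.array([[0.7, 0.3],            # P(word | tag): rows = tags, columns = words
                 [0.2, 0.8]])

obs = [0, 1]                            # observed word indices: "dogs run"

# Viterbi in log space: delta[s] = best log-probability of any tag path ending in tag s
delta = np.log(start) + np.log(emit[:, obs[0]])
back = []
for o in obs[1:]:
    scores = delta[:, None] + np.log(trans) + np.log(emit[:, o])[None, :]
    back.append(scores.argmax(axis=0))  # best previous tag for each current tag
    delta = scores.max(axis=0)

path = [int(delta.argmax())]            # backtrack the best path
for ptr in reversed(back):
    path.append(int(ptr[path[-1]]))
path.reverse()
print([tags[i] for i in path])          # e.g. ['NOUN', 'VERB']
```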