Markov Chains
This document covers Markov chains in both the discrete-time and continuous-time models.
1. Preamble
A Markov chain is described by,
- State space $\mathcal{S}$: the set of all possible states $s$ that the chain can be in.
- Transition Kernel/Matrix $\mathbf{P}$: an $N \times N$ matrix, where $|\mathcal{S}| = N$, that stores the probability of going from state $i$ to another state $j$ in the state space. The probability is denoted by, $$p_{ij} := Pr(s_{t+1} = j | s_t = i), \quad i,j \in \mathcal{S}$$ where $t$ is the timestep counter for a *discrete* Markov chain.
$$ \mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1N}\\ p_{21} & p_{22} & \cdots & p_{2N}\\ \vdots & \vdots & \ddots & \vdots\\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{bmatrix} $$
Note that row $i$ gives the transition probabilities out of state $i$, which combined with the normalization axiom must give $\sum_{j=1}^{N} p_{ij} = 1 \hspace{1em} \forall i \in \mathcal{S}$.
A self-transition $p_{ii}$ gives the probability of remaining in the same state at the next timestep.
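As a quick illustration, here is a minimal sketch (with made-up numbers, not from the text) that builds a small $\mathbf{P}$ and checks the row-normalization axiom:

```python
import numpy as np

# Hypothetical 3-state transition matrix P; row i holds the
# probabilities p_ij of moving from state i to each state j.
P = np.array([
    [0.7, 0.2, 0.1],   # from state 0 (p_00 = 0.7 is a self-transition)
    [0.3, 0.4, 0.3],   # from state 1
    [0.0, 0.5, 0.5],   # from state 2
])

# Normalization axiom: every row must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# One-step evolution of a distribution over states: row vector times P.
p0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with probability 1
p1 = p0 @ P                     # distribution over states at t = 1
print(p1)                       # [0.7 0.2 0.1]
```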
Markov Property
The transition at timestep $t$ is independent of the history $\mathcal{H}$: it does not depend on the transitions at previous timesteps $t-1, t-2, \cdots, 0$. Formally, the probability of transitioning to state $j$ is given by,
$$ p_{ij} = Pr(s_{t+1} = j | s_t = i_t, s_{t-1} = i_{t-1}, \cdots, s_0 = i_0) = Pr(s_{t+1} = j | s_t = i_t) $$
The transition is said to be memoryless, as the equality holds for every enumeration of the trajectory $i_{t-1}, i_{t-2}, \cdots, i_0$.
Important note
We are not restricted to using only information given at the current timestep when transitioning. It is possible to preserve the Markov property by designing the state $s_t$ to include past information up to a window $W_t = [t-1, t-2, \cdots, t-|W_t|]$. One popular example is presented in the DQN paper [1], where the authors used the past three frames as part of the current state, so the DQN agent made its decisions based on a state constructed from both fresh and past information, as sketched below.
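A minimal sketch of this windowed-state construction (the observation source `get_observation` is hypothetical; the window of four mirrors DQN's frame stack):

```python
from collections import deque

WINDOW = 4                      # current observation plus the past three
frames = deque(maxlen=WINDOW)   # observations older than the window fall out

def augmented_state(obs):
    """Fold the newest observation into the window and return the
    stacked state, so s_t = (o_{t-3}, o_{t-2}, o_{t-1}, o_t)."""
    frames.append(obs)
    return tuple(frames)

# usage (hypothetical observation source):
# s_t = augmented_state(get_observation())
```

The per-frame process is not Markov on its own, but the augmented state carries the whole window, so transitions depend only on $s_t$.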
State Classifications
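As a brief standard statement of the classes that the CTMC section below reuses (with $\tau_i$ denoting the first return time to state $i$): a state $i$ is recurrent if the chain started at $i$ returns to $i$ with probability 1, $$ Pr(\tau_i < \infty | s_0 = i) = 1 $$ and transient otherwise.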
2. Discrete-Time Markov Chains (DTMCs)
Let $t = 0, 1, \cdots, T$ be the timestep with horizon $T$. The Markov chain makes one transition based on $\mathbf{P}$ at each $t$.
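A minimal simulation sketch of this loop, reusing the made-up matrix from the preamble:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_dtmc(P, s0, T):
    """Roll out a DTMC for T steps: at each timestep, draw the next
    state from row P[s_t], the transition probabilities out of s_t."""
    states = [s0]
    for _ in range(T):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states

P = np.array([[0.7, 0.2, 0.1],   # made-up 3-state matrix
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])
print(simulate_dtmc(P, s0=0, T=10))  # e.g. [0, 0, 1, 2, ...]
```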
3. Continuous-Time Markov Chains (CTMCs)
The continuous-time model builds on top of the discrete version: where defined timeslots exist in the discrete version, the continuous-time model takes the limit over the timeslot interval, so quantities are evaluated as integrals over time.
A stochastic process $X(t)$ is a CTMC if:
- Timesteps are continuous, $t \in \mathbb{R}$.
- Has state space $X(t) \in \mathcal{S}$, with $\mathcal{S}$ being a countable set ($|\mathcal{S}|$ either finite or infinite).
- Holds the Markov property, $$ Pr(X(t+s) | X(u), u \leq s) = Pr(X(t+s) | X(s)) $$
Meaning that the conditional probability depends only on the current state $X(s)$.
Assumption 1
Non-explosiveness: In any finite time interval $\delta > 0$, the chain makes only a finite number of transitions.
Definitions
Time-homogeneous CTMC: if the transition probabilities $Pr(X(t+s) | X(s))$ are independent of the time $s$, then the CTMC is time-homogeneous.
Transitions in a CTMC are defined as jumps, with $Y(k)$ being the state after $k$ jumps. The time interval between the $(k-1)^{th}$ and $k^{th}$ jumps is defined as $T_k$. $T_k$ is an exponentially distributed random variable whose rate depends only on the state $Y(k-1)$. We define the time spent in state $i$ at time $t$ as $\gamma_i(t)$, $$\gamma_i(t) := \inf\{s > 0 : X(t+s) \neq X(t) \text{ and } X(t) = i\}$$
$\gamma_i(t)$ is an exponentially distributed random variable if the CTMC is time-homogeneous. Denote by $\frac{1}{q_i}$ the mean time spent in state $i \in \mathcal{S}$.
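A minimal sketch of this jump construction, assuming the rate notation $Q_{ij}$ that Theorem 2 below uses (the rate of jumping from $i$ to $j$, with $q_i = \sum_{j \neq i} Q_{ij}$):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ctmc(Q, x0, t_end):
    """Simulate a time-homogeneous CTMC from its rate matrix: hold in
    state i for an Exp(q_i) time (mean 1/q_i), then jump to j with
    probability Q[i, j] / q_i. Assumes q_i > 0 for every state."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        rates = Q[x].copy()
        rates[x] = 0.0                   # no self-jumps
        q_i = rates.sum()
        t += rng.exponential(1.0 / q_i)  # holding time gamma_i
        if t >= t_end:
            return path
        x = rng.choice(len(Q), p=rates / q_i)
        path.append((t, x))

Q = np.array([[0.0, 1.0],   # made-up rates: leave state 0 at rate 1,
              [2.0, 0.0]])  # leave state 1 at rate 2
print(simulate_ctmc(Q, x0=0, t_end=5.0))
```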
Stationary Distribution $\pi$ of CTMCs
Theorem 1
A CTMC that has a finite state space $\mathcal{S}$ and is irreducible has a stationary distribution $\pi$, with $\lim\limits_{t \rightarrow \infty} p(t) = \pi \text{ } \forall \text{ } p(0)$. Since every initial distribution $p(0)$ converges to the same limit, this stationary distribution is unique.
The irreducibility condition alone is not enough to ensure a stationary distribution for infinite state spaces, although a stationary distribution may still exist there.
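For a finite, irreducible CTMC, $\pi$ can be computed by solving the balance equations $\pi Q = 0$ together with $\sum_i \pi_i = 1$; a minimal sketch, assuming $Q$ is the transition rate (generator) matrix referenced in Theorem 2 below, with rows summing to zero:

```python
import numpy as np

Q = np.array([[-1.0,  1.0],    # made-up 2-state generator: Q[i, j] is the
              [ 2.0, -2.0]])   # rate i -> j; diagonals make rows sum to 0

# Replace one redundant balance equation with the normalization constraint.
A = np.vstack([Q.T[:-1], np.ones(len(Q))])
b = np.zeros(len(Q)); b[-1] = 1.0
pi = np.linalg.solve(A, b)
print(pi)  # [2/3, 1/3]: more time is spent in the slower-leaving state
```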
In a CTMC, the states can be categorized as recurrent or transient (same as in DTMCs), but using different time intervals. A state $i$ is recurrent if, $$ \lim\limits_{T \rightarrow \infty} Pr \{ \tau_i < T \} = 1 $$
With the intervals $\tau_i$ and $\gamma_i$ for state $i$ defined as, $$ \tau_i := \inf \{ t > \gamma_i : X(t) = i \text{ and } X(0) = i \} $$
$$ \gamma_i := \inf \{t > 0 : X(t) \neq i \text{ and } X(0) = i \} $$
If the above condition is not satisfied, then the state $i$ is transient.
Global and Local Balance Equations
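As a brief standard statement (assumed here, with $Q$ the transition rate matrix of Theorem 2 below and $\pi$ the stationary distribution): the global balance equations equate the probability flow out of and into each state, $$ \pi_i \sum_{j \neq i} Q_{ij} = \sum_{j \neq i} \pi_j Q_{ji} \quad \forall i \in \mathcal{S} $$ while the stronger local (detailed) balance equations, which hold for reversible chains, balance the flow across each pair of states, $$ \pi_i Q_{ij} = \pi_j Q_{ji} \quad \forall i,j \in \mathcal{S} $$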
Foster-Lyapunov for CTMCs
With the same goal as in DTMCs of proving positive recurrence for a Markov chain, the Foster-Lyapunov theorem can be extended to the continuous-time domain. This gives another method of providing a sufficient condition for positive recurrence.
Theorem 2
For an irreducible, non-explosive CTMC with transition rate matrix $Q$ (where $Q_{ij}$, $j \neq i$, is the rate of jumping from state $i$ to state $j$), if there exist a function $V : \mathcal{S} \rightarrow \mathbb{R}^{+}$, constants $\epsilon > 0$ and $M < \infty$, and a finite set $\beta \subset \mathcal{S}$ such that:
$\sum\limits_{j \neq i} Q_{ij}(V(j) - V(i)) \leq - \epsilon$ if $i \in \beta^{c}$,
$\sum\limits_{j \neq i} Q_{ij} (V(j) - V(i)) \leq M$ if $i \in \beta$,
then the CTMC is positive recurrent.
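As a standard illustration (not from the original text): consider a birth-death chain with arrival rate $\lambda$ and service rate $\mu > \lambda$, i.e. $Q_{i,i+1} = \lambda$ for all $i$ and $Q_{i,i-1} = \mu$ for $i \geq 1$. Take $V(i) = i$ and $\beta = \{0\}$. For $i \in \beta^{c}$, $$ \sum_{j \neq i} Q_{ij}(V(j) - V(i)) = \lambda(+1) + \mu(-1) = \lambda - \mu \leq -\epsilon $$ with $\epsilon = \mu - \lambda > 0$, while for $i = 0$ the drift is $\lambda \leq M$. Both conditions hold with a finite $\beta$, so the chain is positive recurrent.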