# The Theory of Speculation [11/12]

The Theory of Speculation

The focus of Bachelier’s work (The Theory of Speculation) was to establish the probability law for the price fluctuations that the market admits at this instant. Indeed, while the market does not foresee fluctuations, it considers which of them are more or less probable, and this probability can be evaluated mathematically.

There are innumerable influences on the movements of stock exchange.

Stock Exchange acts upon itself, implies: its current movement is a function not only of earlier fluctuations, but also of the present market position.

The determination of these fluctuations is subject to an infinite number of factors: it is therefore impossible to expect a mathematically exact forecast.

(An excellent translation of the original paper: http://www.radio.goldseek.com/bachelier-thesis-theory-of-speculation-en.pdf)

A few important terms defined

Spot Price: The current price at which a particular security can be bought or sold at a specified time and place. (A securities futures price is the expected value of the security.)

Futures Contract: A contractual agreement to buy or sell a particular commodity or financial instrument at a pre-determined price in the future.

Option: offers the buyer the right, but not the obligation, to buy (call) or sell (put) a security or other financial asset at an agreed-upon price (the strike price) on a specific date (exercise date).

Mathematical advantage: Mathematical expectation does not provide a coefficient representing, in some sense, the intrinsic value of the game, it only logically tells whether a game will be profitable or not, Hence the concept of Mathematical advantage.

We can understand mathematical advantage from the example of a gambler’s odds as explained in the paper by Louis Bachelier: Define the mathematical advantage of a gambler as the ratio of his positive expectation and the arithmetic sum of his positive and negative expectations. Mathematical advantage varies like probability from zero to one, it is equal to 1/2 when the game is fair.

Mathematical expectation of gain / (mathematical expectation of gain + absolution value of mathematical expectation of loss).

The mathematical expectation of the speculator is zero

The General Form of probability curve

The probability that a price y be quoted at a given epoch is a function of y.

The price considered by the market as the most likely is the current true price.

Prices can vary between −xo and +∞: xo being the current absolute price.

It will be assumed that it can vary between −∞ and +∞. The probability of a spread greater than xo being considered a priori entirely negligible.

The probability of a deviation from the true price is independent of the absolute level of this price, and that the probability curve is symmetrical with respect to the true price.

Only relative prices will matter because the origin of the coordinates will always correspond to the current true price.

Probability law of Financial Product

Method 1:

• px,tdx: probability for price to be in the interval [x,x+dx) at time t
px,tdx denotes the probability that, at epoch t, the price is to be found in the elementary interval x, x + dx.

The desired probability is therefore
$p_{x,t_1}p_{z-x,t_2} dx dz$

• Markov property: pz,t1+t2dz = −∞px,t1pz−x,t2dxdz
Because at epoch t1, the price could be located in any the intervals dx between −∞ and +∞, so this is the probability of the price z being quoted at epoch t1 + t2.
• Homogeneous Gaussian process:

$p_{x,t} = {1\over 2\pi k\sqrt t}\exp\left(-{1\over 4\pi k^2 t}\right)$

Where $k=\int_0^\infty p_{x,1} dx$ is the gain at time 1.

Method 2:  Alternate derivation of the Probability Law

• conduct m Bernoulli trials in [0,t), with success rate p

Suppose that two complementary events A and B have the respective probabilities p and q = 1 – p. The probability that, on m occasions, it would produce α equal to A and m − α equal to B, is given by the expression.
${P}(\alpha; m,p) = {m\choose \alpha}p^\alpha q^{m-\alpha}$

• mathematical expectation of gain

$\sum_{h\ge0}\mbox{P}(mp+h;m,p)$
$= pq\cdot\left(\partial_{p}\sum_{h\ge0}\mbox{P}(mp+h;m,p)-\partial_{q}\sum_{h\ge0}\mbox{P}(mp+h;m,p)\right)$
$= \frac{m!}{(mp)!(mq)!}p^{mp}q^{mq}mpq$

The quantity h is called the spread.
Let us seek for the mathematical expectation for a gambler who would receive a sum equal to the spread whenever this spread be positive.
We have just seen that the probability of a spread h is the term from the expansion of (p + q)m in which the exponent of p is mp + h, and that of q is mqh.

To obtain the mathematical expectation corresponding to this term, it is necessary to multiply this probability by h. Now,

h = q(mp + h) − p(mqh)

mp + h and mqh being the exponents of p and q in a term from (p + q)m.

• Apply Stirling’s formula
$n!=e^{-n}n^n\sqrt{2\pi n}$

${E}={dh\over \sqrt{2\pi m p q}}\exp\left(-{h^2\over 2 m p q}\right)$

# Gibbs Sampling and Metropolis-Hastings Sampling [10/06]

Prepared by Aayush Uppal, Madhur Gupta and Nishchita Purushothama Patel

### Gibbs Sampling

#### Introduction to Monte Carlo Sampling

Gibbs Sampling is a Sampling technique, So the first fundamental question is, Why Sampling ?

Gibbs Sampling allows us to sample from a distribution that asymptotically follows P(π|X) (Prior information) without having to explicitly calculate the integrals. Gibbs Sampling is an instance of a Markov Chain Monte Carlo technique. Monte Carlo methods are algorithms that help you obtain a desired value by performing simulations involving probabilistic choices.

Understanding Monte Carlo technique with a simple example to estimate value of $\pi$

Draw a perfect square on the ground. Inscribe a circle in it i.e. the circle and the square are centered in exactly the same place, and the circle’s diameter has length identical to the side of the square. Now take a bag of rice, and scatter the grains uniformly at random inside the square. Finally, count the total number of grains of rice inside the circle (call that C), and inside the square (call that S). Therefore we can say that: $\frac{C}{S} \approx \frac{\pi(\frac{d}{2})^2}{d^2}$. Solving for $\pi$, we get $\pi \approx \frac{4C}{S}$

We just solved a problem by approximating the values of integrals. The more grains of rice we use, the better our approximation will be.

Understanding the Markov Chains aspect

In the above example the distribution was uniform, now we will assume that the distribution is not uniform, also we are interested in $E_{p(x)} [f(x)]$ The expected value is represented as: $E[f(z)] = \int f(z)p(z)dz$.

Conceptually, the integral in equation sums up f(z)p(z) over infinitely many values of z. But rather than touching each point in the sum exactly once, if you sample N points z(0) , z(1) , z(2) , . . . , z(N) at random from the probability density p(z), then.

$E_{p(z)}[f(z)] = \lim_{N\to\infty} \frac{1}{N}\sum \limits_{n=1}^{N} f(z^{(t)})$

Looking at this equation, it’s clear that we can get an approximate value by sampling only a finite number of times, T:

$E_{p(z)}[f(z)] \approx \frac{1}{T}\sum \limits_{t=1}^{T} f(z^{(t)})$

So, now we have a way to approximate the integral. The remaining problem is this: how do we sample $z^{(0)}, z^{(1)}, z^{(2)}, \cdots , z^{(T)}$ according to $p(z)$. For our purposes, though, the key idea is to think of z’s as points in a state space, and find ways to “walk around” the space — going from $z^{(0)}$ to $z^{(1)}$ to $z^{(2)}$ and so forth — so that the likelihood of visiting any point z is proportional to p(z). Here g is a function that makes a probabilistic choice about what state to go to next according to an explicit or implicit transition probability $P_{trans}(z^{(t +1)} | z^{(0)}, z^{(1)}, \cdots , z^{(t )})$

Now what will make it a Markov Chain Monte Carlo technique is defining things so that the next state you visit, $z^{(t+1)}$, depends only on the current state $z^{(t)}$ . i.e:

$P_{trans}(z^{(t+1)}|z^{(0)}, z^{(1)},\cdots,z^{(t)}) = P_{trans}(z^{(t+1)}|z^{(t)})$

The heart of Markov Chain Monte Carlo methods is designing g s o that the probability of visiting a state z will turn out to b e p(z), as desired. Gibbs sampling is one algorithm that meets the conditions.

#### The Gibbs sampling algorithm

Gibbs sampling is applicable in situations where Z has at least two dimensions, i.e. each point z is really z = <z1 , . . . , zk>, with k > 1. The basic idea in Gibbs sampling is that, rather than probabilistically picking the next state all at once, you make a separate probabilistic choice for each of the k dimensions, where each choice depends on the other k-1 dimensions.
$z^{(0)} := <z_1^{(0)},…,z_k^{(0)}>$
for t = 1 to T do
for i = 1 to k do
$z_i^{(t+1)} \approx P( Z_i | z_1^{(t+1)},..,z_{i-1}^{(t+1)} ,z_{i+1}^{(t)}….,,z_{k}^{(t)})$
end for
end for

The distribution we are sampling from by using the definition of conditional probability :

$P( Z_i | z_1^{(t+1)},..,z_{i-1}^{(t+1)} ,z_{i+1}^{(t)}….,,z_{k}^{(t)}) = \frac{P(z_1^{(t+1)},..,z_{i-1}^{(t+1)} ,z_{i}^{(t)},z_{i+1}^{(t)}….,,z_{k}^{(t)})}{P(z_1^{(t+1)},..,z_{i-1}^{(t+1)},z_{i+1}^{(t)}….,,z_{k}^{(t)})}$

#### Gibbs Sampler Stationary Distribution

The Gibbs sampler has been used extensively in the statistics literature. It relies on iteratively sampling from a set compatible conditional distributions and the sampler is known to converge to a Stationary distribution. The Markov Chain generated by Gibbs Sampler converges to the target density as the number of iterations become large. P(X) is the stationary distribution of the Gibbs sampler, i.e. $P(\phi) = \int P(\phi| X)P(X)dX = \int P(X, \phi) dX$. The reason is the following.

\begin{align}
\int P(\phi| X)P(X)dX = & \int d X_1 d X_2 P(\phi_1|X_2) P(\phi_2|\phi_1)P(X_1,X_2) \\
= & P(\phi_2|\phi_1) \int d X_2 P(\phi_1|X_2) \int d X_1 P(X_1,X_2) \\
= & P(\phi_2|\phi_1) \int d X_2 P(\phi_1|X_2) P(X_2) \\
= & P(\phi_2|\phi_1) P(\phi_1) \\
= & p(\phi) \\
\end{align}

#### Detailed Balance

(Principle of detailed balance) Transition probability matrix P and probability distribution $\pi$ are said to be in detailed balance, or equivalently, the principle of detailed balance is said to hold, if: $\pi_i p_{ij} = pi_j p_{ji} \forall_{i,j} \epsilon S$

Ultimately if the detailed balance equations are satisfied, it implies the markov chain is reversible and it has an invariant distribution. Gibbs sampler has detailed balance if it

scans through components in fixed order, then
scans through components in reverse order

\begin{align} P(X)P(\phi|X) = & P(X) \int d X’_1 P(X’_1 |X_2) P(\phi_2|X’_1) P(\phi_1|\phi_2)\\ = & P(X) P(\phi_1|\phi_2) \int d X’_1 P(X’_1 |X_2) P(\phi_2|X’_1)\\ = & P(X_1|X_2)P(\phi_1|\phi_2) \int d X’_1 P(X’_1) P(X_2|X’_1) P(\phi_2|X’_1)\\ = & P(\phi)P(X|\phi) {\rm~by~symmetry} \end{align}

Gibbs Algorithm is a special case of Metropolis-Hastings Algorithm,i.e Metropolis is a generalization for sampling . Metropolis sampling algorithm can draw samples from a distribution of complex random Variable $X$. $X$ could be a stochastic process or e.g. a process of formation of random network model or whole social diffusion process.

### Metropolis-Hastings Sampling

In order to work with Metropolis-Hastings algorithm our only requirement is that we should be able to write down the $P(X)$ up to some scaling factor.

The Metropolis algorithm first proposes a possible new state $X^{*}$ in the Markov chain, based on a previous state $X^{(t-1)}$, according to the proposal distribution $q(X^{*} | X^{(t-1)})$. The algorithm accepts or rejects the proposed state based on the density of the the target distribution $P(X)$ evaluated at $X^{*}$ .

One constraint of the Metropolis sampler is that the proposal distribution $q(X^{*} | X^{(t-1)})$ must be symmetric. The constraint originates from using a Markov Chain to draw samples: a necessary condition for drawing from a Markov chain’s stationary distribution is that at any given point in time t, the probability of moving from $X^{(t-1)}$ $\rightarrow$ $X^{(t)}$ must be equal to the probability of moving from $X^{(t-1)}$ $\rightarrow$ $X^{(t)}$ , a condition known as reversibility or detailed balance.

Metropolis-Hastings algorithm implements an additional correction factor \textit{c}, defined from the proposal distribution as
$$c = \frac{q(X^{t-1} | X^{(*)})}{q(X^{*} | X^{(t-1)})}$$

The correction factor adjusts the transition operator to ensure that probability of moving from $q(X^{*} | X^{(t-1)})$ is equal to the probability of moving from $q(X^{*} | X^{(t-1)})$, no matter the proposal distribution.

The Metropolis-Hastings algorithm is implemented with essentially the same procedure as the Metropolis sampler, except that the correction factor is used in the evaluation of acceptance probability $\alpha$. Specifically, to draw $M$ samples using the Metropolis-Hastings sampler:

1. set $t$ = 0, generate an initial state $X^{(0)}$ $\sim$ $\pi^{(0)}$
2. repeat until $t$ = $M$
3. set $t$ = $t$+1
4. generate a proposal state $X^{*}$ from $q(X | X^{(t-1)})$
5. calculate the proposal correction factor $c = \frac{q(X^{t-1} | X^{(*)})}{q(X^{*} | X^{(t-1)})}$
6. calculate the acceptance probability $\alpha = \min \left(1,\frac{P(X^{*})}{P(X^{(t-1})}\times c \right )$
7. draw a random number $u$ from $Unif(0,1)$
1. if $u \leq \alpha$ accept the proposal state $X^{*}$ and set $X^{(t)} = X^{*}$
2. else set $X^{(t)} = X^{(t-1)}$

Many consider the Metropolis-Hastings algorithm to be a generalization of the Metropolis algorithm. This is because when the proposal distribution is symmetric, the correction factor is equal to one, giving the transition operator for the Metropolis sampler.

#### Metropolis-Hastings Sampling — Example

mh.gamma <- function(n.sims, start, burnin, cand.sd, shape, rate) {
draws <- start
theta.update <- function(theta.cur, shape, rate) {
theta.can <- rnorm(1, mean = theta.cur, sd = cand.sd)
accept.prob <- dgamma(theta.can, shape = shape, rate = rate)/dgamma(theta.cur, shape = shape, rate = rate)
if (runif(1) <= accept.prob) theta.can else theta.cur
}
for (i in 2:n.sims) draws = c(draws, theta.update(draws[i-1], shape = shape, rate = rate))
return(draws[(burnin + 1):n.sims])
}
mh.draws <- mh.gamma(10000, start = 1, burnin = 1000, cand.sd = 2, shape = 2, rate = 4)
plot(density(mh.draws))
lines(seq(0,2,length.out = 100), dgamma(seq(0,2,length.out = 100), shape=2, rate=4), col='red')
legend('topright',lty=1, col=c('black','red'), legend=c('M-H','gamma(shape=2,rate=4)'))


#### Detailed Balance of Metropolis-Hastings algorithm

$P(\phi|X) = q(X,\phi)\alpha(X, \phi)$, if $X\ne\phi$

$P(X|X) = 1-\int d\phi q(X,\phi)\alpha(X,\phi)$

Hence
\begin{align}
& P({Y\le\phi}|X) = \int_{-\infty}^\phi dY q(X,Y)\alpha(X,Y) + 1(\phi\ge X)\left(1-\int dY q(X,Y)\alpha(X,Y)\right)\\
\Rightarrow & p(X, \phi) = {d P({Y\le\phi}|X) \over d\phi} = q(X,\phi)\alpha(X,\phi) + \delta(X-\phi)\left(1- \int dY q(X,Y)\alpha(X,Y)\right)\\
\Rightarrow & P(X)p(X,\phi) = P(X) q(X,\phi) \min \left(1,{P(\phi) q(\phi, X) \over P(X) q(X, \phi)} \right) \\
& +\delta(X-\phi)\left(P(X)-\int dY P(X) q(X,Y) \min \left(1,{P(Y) q(Y, X) \over P(X) q(X, Y)} \right) \right)\\
\Rightarrow & P(X)p(X,\phi) = \min \left(P(X) q(X,\phi),P(\phi) q(\phi, X) \right)
+\delta(X-\phi)\left(P(X)-\int\min dY \left(P(X) q(X,Y), P(Y) q(Y, X) \right) \right)\\
& =P(\phi)p(\phi, X) {\rm~by~symmetry}
\end{align}

### Gibbs Sampling from Hidden Markov Models

A hidden markov model is defined in terms of the following 3 latent states: $X(t)\in \{1,\cdots,k\}$, State transition kernel $p_{ij} = P(X_{t+1}=j|X_t=i)$ and the initial state $X_0$, and Observation model $P(Y_t|X_t)$

One simple way to sample from the hidden Markov model is to iterate through every sample $X_t$ for $t = 1,\cdots,T$. Here we fix everything else and write down the probability of $X_t$ as given below:

\begin{align*} & P(X_t | X_1,\cdots,X_{t-1},X_{t+1},\cdots,X_T,Y_1,\cdots,Y_T)\\ =& (P(X_1,\cdots,X_T,Y_1,\cdots,Y_T))⁄(\sum_{X_t} P(X_1,\cdots,X_T,Y_1,\cdots,Y_T))\\ \end{align*}

However in order to make an efficient sampling algorithm it has to satisfy the following 2 conditions:

• Sample points should diffuse quickly inside the sample space.
• The equilibrium distribution of the sample should be very similar to the distribution that is desired.

However the problem with the above sampling algorithms is that the points will diffuse slowly. This is because we fix the previous values and the next values at every state. This means although the above algorithm works, it takes infinite amount of time for it to ultimately mix well to be able to explore the whole probability space. To avoid this, we use the following algorithm:

Forward Filtering Backward Sampling Algorithm

1. We first conduct the forward sweep and extract the forward statistic $\alpha_t(X_t)= P(X_t|Y_1,\cdots,Y_t)$ for $t=1,2,\cdots$
2. Next step is the backward sampling stage. Here, we sample $X_T$ first and so on until $X_0$

\begin{align*} & P(X_{t-1}│X_t,Y_1,\cdots,Y_T)\\ =& \frac{P(X_{t-1},X_t,Y_1,\cdots,Y_T)}{P(X_t,Y_1,\cdots,Y_T)}\\ =& \frac{P(X_{t-1},Y_1,\cdots,Y_{t-1})P(X_t|X_{t-1})P(Y_t,\cdots,Y_T|X_t)}{\sum_{X_{t-1}} P(X_{t-1},Y_1,\cdots,Y_{t-1})P(X_t|X_{t-1})P(Y_t,\cdots,Y_T|X_t)}\\ =& \frac{\alpha_{t-1}(X_{t-1})P(X_t|X_{t-1})P(Y_t|X_t)\beta_t(x_t)}{\alpha_{t-1}(X_{t-1})P(X_t|X_{t-1})P(Y_t|X_t)\beta_t(x_t)} \end{align*}