# Problem Set 1 Solution

### Solution to Problem 1

1.1 There are ${6 \choose 0}=1$ event with 0 outcomes (the empty event), ${6 \choose 1}=6$ events with 1 outcome (also called atomic events), ${6 \choose 2}=15$ events with 2 outcomes, …, ${6 \choose 6}=1$ event with all 6 outcomes. The total number of events is $\sum_{i=0}^{6}{6 \choose i}=2^{6}=64$. An event with 2 outcomes is, for instance, “the outcome is either 1 or 2”.

1.2 The events generated by $X$ are $\left\{ \emptyset,\{\omega:X(\omega)=0\},\{\omega:X(\omega)=1\},\Omega\right\}$.
In other words, they are $\left\{ \emptyset,\{1,2\},\{3,4,5,6\},\{1,2,3,4,5,6\}\right\}$.

The events generated by $Y$ are $\left\{ \emptyset,\{\omega:Y(\omega)=0\},\{\omega:Y(\omega)=1\},\Omega\right\}$. In other words, they are $\left\{ \emptyset,\{2,3,5\},\{1,4,6\},\{1,2,3,4,5,6\}\right\}$.

The events generated by $X$ and $Y$ jointly are the events “generated” by the following 4 disjoint events: $A=\{\omega:X(\omega)=0\mbox{ and }Y(\omega)=0\}=\{2\}$, $B=\{\omega:X(\omega)=0\mbox{ and }Y(\omega)=1\}=\{1\}$, $C=\{\omega:X(\omega)=1\mbox{ and }Y(\omega)=0\}=\{3,5\}$ and $D=\{\omega:X(\omega)=1\mbox{ and }Y(\omega)=1\}=\{4,6\}$. There are $2^{4}=16$ of them: $\emptyset$, $A$, $B$, $C$, $D$, $A\cup B$, $A\cup C$, $A\cup D$, $B\cup C$, $B\cup D$, $C\cup D$, $A\cup B\cup C$, $A\cup B\cup D$, $A\cup C\cup D$, $B\cup C\cup D$, $\Omega$.
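The enumeration can be checked mechanically: every event in this jointly generated collection is a union of a subset of the four atoms. A small Python sketch (variable names are illustrative):

```python
from itertools import combinations

# Atoms of the sigma-algebra generated jointly by X and Y.
atoms = [frozenset({2}),      # A: X=0, Y=0
         frozenset({1}),      # B: X=0, Y=1
         frozenset({3, 5}),   # C: X=1, Y=0
         frozenset({4, 6})]   # D: X=1, Y=1

# Every event is the union of some subset of the atoms.
events = set()
for r in range(len(atoms) + 1):
    for subset in combinations(atoms, r):
        events.add(frozenset().union(*subset))

print(len(events))  # 16 = 2^4
```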

Since every outcome has an equal probability of $\frac{1}{6}$, the probability of an event can be computed from the number of outcomes it contains. For example, $P(A)=\frac{1}{6}$, $P(B)=\frac{1}{6}$, $P(C)=\frac{2}{6}$ and $P(D)=\frac{2}{6}$.

The mean of $X$ is $\mbox{E}X=0\cdot P(X=0)+1\cdot P(X=1)=\frac{4}{6}$. The variance of $X$ is $(0-\mbox{E}X)^{2}\cdot P(X=0)+(1-\mbox{E}X)^{2}\cdot P(X=1)=P(X=0)P(X=1)=\frac{8}{36}$. The mean of $Y$ is $\mbox{E}Y=0\cdot P(Y=0)+1\cdot P(Y=1)=\frac{3}{6}$. The variance of $Y$ is $\frac{9}{36}$.

The events defined by $XY$ are “generated” by $\{\omega:(XY)(\omega)=0\}$ and $\{\omega:(XY)(\omega)=1\}=\{\omega:X(\omega)=1\}\cap\{\omega:Y(\omega)=1\}=\{4,6\}$, with probability $P(\{\omega:(XY)(\omega)=1\})=P(\{4,6\})=\frac{2}{6}$. The mean of $XY$ is hence $\frac{2}{6}$ and the variance is hence $\frac{8}{36}$.
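These means and variances can be verified by direct enumeration; a minimal Python sketch using exact rational arithmetic (the dictionaries encoding $X$ and $Y$ follow the definitions above):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)                                  # fair die
X = {w: 0 if w in (1, 2) else 1 for w in outcomes}     # X = 0 on {1,2}
Y = {w: 0 if w in (2, 3, 5) else 1 for w in outcomes}  # Y = 0 on {2,3,5}

def mean(Z):
    return sum(prob * Z[w] for w in outcomes)

def var(Z):
    m = mean(Z)
    return sum(prob * (Z[w] - m) ** 2 for w in outcomes)

XY = {w: X[w] * Y[w] for w in outcomes}
print(mean(X), var(X))    # 2/3 2/9
print(mean(Y), var(Y))    # 1/2 1/4
print(mean(XY), var(XY))  # 1/3 2/9
```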

The random variables $X$ and $Y$ are in fact independent: checking all four cells, $P(X=0,Y=0)=P(\{2\})=\frac{1}{6}=\frac{2}{6}\cdot\frac{3}{6}$, $P(X=0,Y=1)=P(\{1\})=\frac{1}{6}=\frac{2}{6}\cdot\frac{3}{6}$, $P(X=1,Y=0)=P(\{3,5\})=\frac{2}{6}=\frac{4}{6}\cdot\frac{3}{6}$ and $P(X=1,Y=1)=P(\{4,6\})=\frac{2}{6}=\frac{4}{6}\cdot\frac{3}{6}$, so $P(X=x,Y=y)=P(X=x)P(Y=y)$ in every case.

1.3 Conditional expectations are random variables, and we can work out the events defined by the conditional expectations as we did previously. A conditional expectation is constant on each atom of the conditioning $\sigma$-algebra, and its value there is the average over that atom: $\left(\mbox{E}(I|{\cal F}_{X})\right)(\omega)=\mbox{E}(I\mid X=X(\omega))=\frac{1}{P(G_{\omega})}\sum_{\omega'\in G_{\omega}}I(\omega')P(\{\omega'\})$, where $G_{\omega}$ denotes the atom containing $\omega$.

\begin{eqnarray*}
 & & \begin{array}{c|cccccc}
\omega & 1 & 2 & 3 & 4 & 5 & 6\\
\hline I(\omega) & 1 & 2 & 3 & 4 & 5 & 6\\
\left(\mbox{E}(I|{\cal F}_{X})\right)(\omega) & \frac{3}{2} & \frac{3}{2} & \frac{9}{2} & \frac{9}{2} & \frac{9}{2} & \frac{9}{2}\\
\left(\mbox{E}(I|{\cal F}_{Y})\right)(\omega) & \frac{11}{3} & \frac{10}{3} & \frac{10}{3} & \frac{11}{3} & \frac{10}{3} & \frac{11}{3}\\
\left(\mbox{E}(I|{\cal F}_{X,Y})\right)(\omega) & 1 & 2 & 4 & 5 & 4 & 5
\end{array}
\end{eqnarray*}
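Because the die is fair, the conditional expectation on each atom reduces to the arithmetic mean of $I$ over that atom, which makes the values easy to verify in Python (function and variable names are illustrative):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
I = {w: w for w in outcomes}  # I(omega) = omega

def cond_exp(Z, partition):
    """E(Z | F)(omega): with equally likely outcomes, the value on each
    atom is the arithmetic mean of Z over that atom."""
    out = {}
    for atom in partition:
        avg = Fraction(sum(Z[w] for w in atom), len(atom))
        for w in atom:
            out[w] = avg
    return out

F_X  = [{1, 2}, {3, 4, 5, 6}]      # atoms generated by X
F_Y  = [{2, 3, 5}, {1, 4, 6}]      # atoms generated by Y
F_XY = [{1}, {2}, {3, 5}, {4, 6}]  # atoms generated jointly

print(cond_exp(I, F_X))   # 3/2 on {1,2}, 9/2 on {3,4,5,6}
print(cond_exp(I, F_Y))   # 10/3 on {2,3,5}, 11/3 on {1,4,6}
print(cond_exp(I, F_XY))  # 1 on {1}, 2 on {2}, 4 on {3,5}, 5 on {4,6}
```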

### Solution to Problem 2

2.1 Let us count the number of successes in a unit time when we repeatedly conduct a Bernoulli trial every $\Delta$ units of time with success probability $\lambda\Delta$ (assume $\frac{1}{\Delta}$ is an integer). The count follows a binomial distribution $B(\frac{1}{\Delta},\lambda\Delta)$. The probability of $k$ successes is $p(k;\frac{1}{\Delta},\lambda\Delta)={\frac{1}{\Delta} \choose k}\left(\lambda\Delta\right)^{k}\left(1-\lambda\Delta\right)^{\frac{1}{\Delta}-k}$. Taking the limit
\begin{eqnarray*}
\lim_{\Delta\to0}p(k;\frac{1}{\Delta},\lambda\Delta) & = & \lim_{\Delta\to0}{\frac{1}{\Delta} \choose k}\left(\lambda\Delta\right)^{k}\left(1-\lambda\Delta\right)^{\frac{1}{\Delta}-k}\\
& = & \lim_{\Delta\to0}\frac{\frac{1}{\Delta}(\frac{1}{\Delta}-1)\cdots(\frac{1}{\Delta}-k+1)\cdot\Delta^{k}}{k!}\lambda^{k}\exp(-\lambda)\mbox{, since }\left(1-\lambda\Delta\right)^{\frac{1}{\Delta}-k}\to\exp(-\lambda)\\
& = & \frac{1}{k!}\lambda^{k}\exp(-\lambda).
\end{eqnarray*}
We get a Poisson distribution with rate $\lambda$.
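The limit can be observed numerically by shrinking $\Delta$ (i.e., growing $n=\frac{1}{\Delta}$); a quick Python check with illustrative values $\lambda=2$, $k=3$:

```python
from math import comb, exp, factorial

lam, k = 2.0, 3

def binom_pmf(k, n, q):
    """Probability of k successes in n Bernoulli trials with success prob q."""
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

target = lam ** k / factorial(k) * exp(-lam)  # Poisson pmf at k
for n in (10, 100, 10_000):                   # n = 1/Delta trials per unit time
    print(n, binom_pmf(k, n, lam / n), target)
```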

2.2 Let $X$ have an exponential distribution with rate $\lambda$. In other words, $X$ has probability density function $f(x)=\lambda\exp(-\lambda x)$ and cumulative probability function $P(\{X\le x\})=1-\exp(-\lambda x)$, so that $P(\{X\ge x\})=\exp(-\lambda x)$. It follows that
\begin{eqnarray*}
P(\{X\ge s+t\}) & = & \exp(-\lambda(s+t))\\
& = & \exp(-\lambda s)\cdot\exp(-\lambda t)\\
& = & P(\{X\ge s\})\cdot P(\{X\ge t\}).
\end{eqnarray*}
Let $X$ have a geometric distribution, with cumulative probability function $P(\{X\le k\})=1-(1-p)^{k}$, so that $P(\{X>k\})=(1-p)^{k}$. It follows that
\begin{eqnarray*}
P(\{X>s+t\}) & = & (1-p)^{s+t}\\
 & = & (1-p)^{s}(1-p)^{t}\\
 & = & P(\{X>s\})\cdot P(\{X>t\}).
\end{eqnarray*}

In fact, the memoryless property can also be understood in terms of how we sample the time to the next success: the sampling process never refers to the history.
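Both survival-function identities are easy to confirm numerically; a short Python check (the rate, success probability and times are arbitrary illustrative values):

```python
from math import exp

lam, p = 0.7, 0.3
s, t = 2, 3

def surv_exp(x):
    return exp(-lam * x)  # P(X >= x) for the exponential distribution

def surv_geo(k):
    return (1 - p) ** k   # P(X > k) for the geometric distribution

print(surv_exp(s + t), surv_exp(s) * surv_exp(t))  # equal
print(surv_geo(s + t), surv_geo(s) * surv_geo(t))  # equal
```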

2.3 We first construct the Lagrange function $L=-\sum_{i=1}^{\infty}p_{i}\log p_{i}-\lambda_{0}(\sum_{i=1}^{\infty}p_{i}-1)-\lambda_{1}(\sum_{i=1}^{\infty}i\cdot p_{i}-\mu)$. The Lagrange function is differentiable in the parameters $p_{i}$. Taking the partial derivative of the Lagrange function over the parameters, we get
\begin{eqnarray*}
 & & \frac{\partial L}{\partial p_{i}}=-1-\log p_{i}-\lambda_{0}-i\lambda_{1}\stackrel{\mbox{set}}{=}0\\
 & \Rightarrow & p_{i}=\exp\left(-1-\lambda_{0}-i\lambda_{1}\right)=\exp(-1-\lambda_{0})\exp(-\lambda_{1}\cdot i).\\
 & & \sum_{i=1}^{\infty}p_{i}=\exp(-1-\lambda_{0})\sum_{i=1}^{\infty}\exp(-\lambda_{1}\cdot i)\stackrel{\mbox{set}}{=}1\\
 & \Rightarrow & \exp(-1-\lambda_{0})\frac{\exp(-\lambda_{1})}{1-\exp(-\lambda_{1})}=1\Rightarrow\exp(-1-\lambda_{0})=\exp(\lambda_{1})-1\\
 & & \sum_{i=1}^{\infty}i\cdot p_{i}=\left(\exp(\lambda_{1})-1\right)\sum_{i=1}^{\infty}i\cdot\exp(-\lambda_{1}\cdot i)\\
 & & =-\left(\exp(\lambda_{1})-1\right)\frac{\partial}{\partial\lambda_{1}}\sum_{i=1}^{\infty}\exp(-\lambda_{1}\cdot i)\\
 & & =-\left(\exp(\lambda_{1})-1\right)\frac{\partial}{\partial\lambda_{1}}\frac{\exp(-\lambda_{1})}{1-\exp(-\lambda_{1})}=\frac{1}{1-\exp(-\lambda_{1})}\stackrel{\mbox{set}}{=}\mu.
\end{eqnarray*}
Hence $p_{i}=(1-\frac{1}{\mu})^{i-1}\frac{1}{\mu}$.
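A numerical sanity check that the resulting geometric distribution satisfies both constraints (with an illustrative mean $\mu=4$ and the series truncated where the tail is negligible):

```python
mu = 4.0
# p_i = (1 - 1/mu)^(i-1) * (1/mu), truncated at i = 2000
p = [(1 - 1 / mu) ** (i - 1) * (1 / mu) for i in range(1, 2001)]
print(sum(p))                                  # ~ 1   (normalization)
print(sum(i * q for i, q in enumerate(p, 1)))  # ~ mu  (mean constraint)
```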

2.4 We first express the cumulative probability function of a Gamma distribution in terms of a Poisson distribution, $P(X\le x)=1-\sum_{i=0}^{k-1}\frac{(\lambda x)^{i}}{i!}\exp(-\lambda x)$, then take the derivative over “time” to get the probability density function: the sum telescopes, leaving $\frac{d}{dx}P(X\le x)=\frac{\lambda}{(k-1)!}(\lambda x)^{k-1}\exp(-\lambda x)$.
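The telescoping can be double-checked by differentiating the cumulative probability function numerically and comparing with the closed-form density (parameters are illustrative):

```python
from math import exp, factorial

lam, k = 1.5, 4

def gamma_cdf(x):
    """P(X <= x) = 1 - sum_{i=0}^{k-1} (lam*x)^i / i! * exp(-lam*x)."""
    return 1 - sum((lam * x) ** i / factorial(i) for i in range(k)) * exp(-lam * x)

def gamma_pdf(x):
    """lam * (lam*x)^(k-1) / (k-1)! * exp(-lam*x)."""
    return lam * (lam * x) ** (k - 1) / factorial(k - 1) * exp(-lam * x)

x, h = 2.0, 1e-6
numeric = (gamma_cdf(x + h) - gamma_cdf(x - h)) / (2 * h)  # central difference
print(numeric, gamma_pdf(x))  # agree closely
```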

### Solution to Problem 3

3.1 The moment generating function of a random variable $X$ is $\mbox{M}(t)=\mbox{E}\left(\exp(tX)\right)=\int dx\cdot p(x)\exp(t\cdot x)$. We prove the moment generating property by induction. Its $0^{\mbox{th}}$ order derivative at $t=0$ (i.e., its value at 0) is $M(0)=\int dx\cdot p(x)\exp(0\cdot x)=1$. Its $1^{\mbox{st}}$ order derivative is
\begin{eqnarray*}
\frac{\partial}{\partial t}\mbox{M}(t) & = & \frac{\partial}{\partial t}\int dx\cdot p(x)\exp(t\cdot x)\\
& = & \int dx\cdot p(x)\frac{\partial}{\partial t}\exp(t\cdot x)\mbox{ , with mild regularity condition}\\
& = & \int dx\cdot p(x)\cdot x\exp(t\cdot x).
\end{eqnarray*}
Hence $\frac{\partial}{\partial t}\mbox{M}(t)|_{t=0}=\int dx\cdot p(x)\cdot x\exp(0\cdot x)=\int dx\cdot p(x)\cdot x=\mbox{E}(X)$.
Now suppose $\frac{\partial^{n}}{\partial t^{n}}\mbox{M}(t)=\int dx\cdot p(x)\cdot x^{n}\exp(t\cdot x)$,
it follows that $\frac{\partial^{n}}{\partial t^{n}}\mbox{M}(t)|_{t=0}=\mbox{E}(X^{n})$,
and
\begin{eqnarray*}
\frac{\partial^{n+1}}{\partial t^{n+1}}\mbox{M}(t) & = & \frac{\partial}{\partial t}\left(\frac{\partial^{n}}{\partial t^{n}}\mbox{M}(t)\right)\\
& = & \frac{\partial}{\partial t}\int dx\cdot p(x)\cdot x^{n}\exp(t\cdot x)\\
& = & \int dx\cdot p(x)\cdot x^{n}\frac{\partial}{\partial t}\exp(t\cdot x)\mbox{, with mild regularity condition}\\
& = & \int dx\cdot p(x)\cdot x^{n+1}\exp(t\cdot x).
\end{eqnarray*}
Hence $\frac{\partial^{n+1}}{\partial t^{n+1}}\mbox{M}(t)|_{t=0}=\mbox{E}(X^{n+1})$.
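The moment generating property can be checked numerically with finite differences, e.g. for the exponential distribution with rate $1$, whose m.g.f. is $\frac{1}{1-t}$ and whose first two moments are $1$ and $2$:

```python
def M(t):
    """M.g.f. of the exponential distribution with rate 1 (valid for t < 1)."""
    return 1.0 / (1.0 - t)

h = 1e-3
first = (M(h) - M(-h)) / (2 * h)             # ~ M'(0)  = E(X)   = 1
second = (M(h) - 2 * M(0.0) + M(-h)) / h**2  # ~ M''(0) = E(X^2) = 2
print(first, second)
```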

Given random variable $X$ and its moment generating function $\mbox{M}_{X}(t)$, the linear transform of the random variable is $\alpha X+\beta$, where $\alpha$ and $\beta$ are real numbers (or vectors). The moment generating function by definition is
\begin{eqnarray*}
\mbox{M}_{\alpha X+\beta}(t) & = & \mbox{E}\left(\exp\left(t\cdot(\alpha X+\beta)\right)\right)\\
 & = & \mbox{E}\left(\exp(\alpha t\cdot X)\exp(t\cdot\beta)\right)\\
 & = & \mbox{E}\left(\exp(\alpha t\cdot X)\right)\cdot\exp(t\cdot\beta)\mbox{, the constant }\exp(t\cdot\beta)\mbox{ factors out of the expectation}\\
& = & \mbox{M}_{X}\left(\alpha t\right)\cdot\exp(t\cdot\beta).
\end{eqnarray*}

Given independent random variables $X_{1},\cdots,X_{n}$ with moment generating functions $\mbox{M}_{X_{1}}(t),\cdots,\mbox{M}_{X_{n}}(t)$, the moment generating function of their sum is
\begin{eqnarray*}
\mbox{M}_{\sum_{i=1}^{n}X_{i}}(t) & = & \mbox{E}\left(\exp(t\cdot\sum_{i=1}^{n}X_{i})\right)\\
& = & \mbox{E}\left(\prod_{i=1}^{n}\exp(t\cdot X_{i})\right)\\
& = & \prod_{i=1}^{n}\mbox{E}\left(\exp(t\cdot X_{i})\right)\mbox{, functions of independent r.v.s are independent}\\
& = & \prod_{i=1}^{n}\mbox{M}_{X_{i}}\left(t\right).
\end{eqnarray*}
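The product rule can be verified by direct enumeration for two independent fair dice (an illustrative example):

```python
from math import exp

def mgf_die(t):
    """M.g.f. of a single fair six-sided die."""
    return sum(exp(t * i) for i in range(1, 7)) / 6

def mgf_two_dice(t):
    """M.g.f. of the sum of two independent dice, by direct enumeration."""
    return sum(exp(t * (i + j)) for i in range(1, 7) for j in range(1, 7)) / 36

t = 0.4
print(mgf_two_dice(t), mgf_die(t) ** 2)  # equal
```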
3.2 The moment generating function of the linear transform is $\mbox{M}_{\frac{X_{i}-\mu}{\sigma}}(t)=\mbox{M}_{X_{i}}(\frac{t}{\sigma})\cdot\exp(-\frac{\mu}{\sigma}\cdot t)$.
The moment generating function of the sum is $\mbox{M}_{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_{i}-\mu}{\sigma}}(t)=\exp\left(-\frac{\sqrt{n}\mu}{\sigma}\cdot t\right)\prod_{i=1}^{n}\mbox{M}_{X_{i}}(\frac{t}{\sqrt{n}\sigma})$.

3.3
\begin{eqnarray*}
\mbox{M}_{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_{i}-\mu}{\sigma}}(t) & = & \prod_{i=1}^{n}\mbox{M}_{\frac{X_{i}-\mu}{\sigma}}(\frac{t}{\sqrt{n}})\mbox{, m.g.f. of a sum of independent r.v.s}\\
 & = & \prod_{i=1}^{n}\left(1+\frac{t}{\sqrt{n}}\cdot0+\frac{t^{2}}{2!\cdot n}\cdot1+o\left(\frac{1}{n}\right)\right)\mbox{, Taylor expansion: the standardized }X_{i}\mbox{ has mean }0\mbox{ and variance }1\\
 & = & \left(1+\frac{t^{2}}{2n}+o\left(\frac{1}{n}\right)\right)^{n},\\
\lim_{n\to\infty}\mbox{M}_{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{X_{i}-\mu}{\sigma}}(t) & = & \exp\left(\frac{t^{2}}{2}\right)\mbox{, since }\left(1+\frac{a_{n}}{n}\right)^{n}\to\exp(a)\mbox{ when }a_{n}\to a.
\end{eqnarray*}

Because the moment generating functions of the sequence of random variables converge, for every $t$, to that of the standard normal variable, and a moment generating function (when it exists in a neighborhood of $0$) uniquely determines the distribution, it follows that the sequence of random variables converges in distribution to a standard normal random variable.
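The convergence of the moment generating functions can be observed concretely, e.g. for i.i.d. Bernoulli($\frac{1}{2}$) variables (so $\mu=\sigma=\frac{1}{2}$ and $\mbox{M}_{X}(t)=\frac{1+e^{t}}{2}$), using the formulas from 3.2:

```python
from math import exp, sqrt

def mgf_std_sum(t, n):
    """M.g.f. of (1/sqrt(n)) * sum_i (X_i - mu)/sigma for i.i.d. Bernoulli(1/2),
    where mu = sigma = 1/2 and M_X(t) = (1 + e^t)/2."""
    mu, sigma = 0.5, 0.5
    return exp(-sqrt(n) * mu * t / sigma) * ((1 + exp(t / (sqrt(n) * sigma))) / 2) ** n

t = 0.8
for n in (10, 1000, 100_000):
    print(n, mgf_std_sum(t, n), exp(t * t / 2))  # approaches exp(t^2/2)
```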