# Concepts from Probability Theory [09/03/2015]

In this lecture we will review some concepts from probability theory. Since we are using probability theory, and in particular the concept of a stochastic process, to understand our complex social systems, it will help to reach some common ground on these concepts. Today we will begin to involve some math, so please bear with me. The most important thing is that we use math to do different things, to do important things. My understanding is that different people do math in different ways: there are mathematicians, there are physicists, and there are engineers.

All of them do math, but they do math in different ways. Mathematicians treat math as a string of symbols. They set up some axioms and some rules of derivation, they follow the rules, they get pages after pages of symbols, and that is it. That is how mathematicians do math. They do not care about the interpretation, because once they interpret the symbols it becomes metamathematics; it is no longer math. You may ask why they want to do that, just playing with symbols like artists play with their brushes. The reason is that by doing this they are no longer constrained by our real world; they are in a virtual world. In the real world they are constrained by forces and by physical laws, but in a virtual world defined by the axioms and the rules of derivation, they no longer have such constraints. So they gain a new freedom in their thought experiments, although it is not easy.

There are also physicists. Physicists also work with math, but in a different way. A physicist is always interested in the intuition behind the laws and theorems. Whenever they have something, they ask: "What is the intuition, and can we visualize it?" What is force, and what is acceleration? The emphasis of physicists is that they want to find the intuition; they want to use math to assist their understanding of physical laws. This is their way of working with math.

Engineers work in yet another way. Engineers solve problems with math, and this is what we do. Whenever we work with math, we build things. The most important point is that we understand how we build conclusions with math, and I will show you that way of building conclusions; there is nothing in it that is not understandable. We are most interested in using math to solve our problems.

### Sample Space, Events and Field of Events

We have heard a lot about probability and random events. Let us first establish some common ground on what a sample space, an event, and a field of events are, and what expectation and conditional expectation are.

Let us throw a die twice. What is the sample space? A sample space is the space of all outcomes ($\Omega$). One outcome from this sample space is $(1,1)$, meaning we get a one from the first throw and another one from the second throw, and so on and so forth. The size of the sample space is $6\times 6=36$.

If we continue to throw the die, we get more and more information about the sequence.

• Before we throw the die, the only events that we can talk about are the set of all possible outcomes and its complement, the empty set ($\{\emptyset, \Omega\}$).
• After we throw the die once, we get some information, namely about the first throw: we could get 1, 2, 3, 4, 5 or 6, which could be an even or odd number, or a prime or composite number. Those are all examples of events about the first throw. How large is the set of events from the first throw? It is $2^6$: an event could include 1 or exclude 1, include 2 or exclude 2, and so on. Different inclusions and exclusions give different events. If we consider whether or not an event includes 1, we have two choices; whether it includes 2, another two choices; in total we have $2^6$ different events. The set of events after the first throw includes the set before the first throw. (Verify it!)
• When we throw the die a second time, we can talk about the outcome of not only the first throw but also the second throw. The set of events after the first throw is a subset of the set of events after the second throw, which is a subset of the set of events after the third throw, and so on. (Verify it!) This is how, as we conduct more and more experiments, we gain more and more information about the system and have larger and larger sets of events.
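To make the $2^6$ count concrete, here is a small sketch in Python (not part of the lecture; the variable names are mine). It enumerates every subset of $\{1,\dots,6\}$ as an event about the first throw and checks that the trivial collection available before any throw is contained in it.

```python
# Enumerate all events about the first throw of a die.
# An event is any subset of {1,...,6}; we generate each via a bitmask.
outcomes = [1, 2, 3, 4, 5, 6]
events_after_first_throw = []
for mask in range(2 ** len(outcomes)):
    event = frozenset(o for i, o in enumerate(outcomes) if mask & (1 << i))
    events_after_first_throw.append(event)

print(len(events_after_first_throw))  # 2^6 = 64

# The trivial collection available before any throw is a subset of this one.
before = {frozenset(), frozenset(outcomes)}
print(before.issubset(set(events_after_first_throw)))  # True
```

The same bitmask trick with 36 outcomes would enumerate the (much larger) set of events after the second throw.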

The sample space is $\Omega$. An event is any subset of $\Omega$. An atomic event is an event that cannot be subdivided; a compound event is a union of atomic events; the complement of an event is the case that the event does not happen. For example, after the first throw, the atomic events are all the numbers we could read from the die, $\{1,2,3,4,5,6\}$; an even number is a compound event, and its complement is an odd number. The set of all events, which is a set of sets, forms a field. We call a collection of events a field if the following conditions are satisfied.

• The empty set $\emptyset$ should belong to the field.
• The sample space $\Omega$ should belong to the field.
• If we have two events $A$ and $B$, the union of the two events (either A or B), the intersection of the two events (both A and B), and the set difference $A\setminus B$ (A but not B) are also events. For example, one event could be that the outcome is an odd number. Another event could be that the outcome is a prime number; then we can talk about an odd prime number, an odd or prime number, and a number that is odd and not prime.
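The odd/prime example above can be checked with a few set operations (a sketch in Python, not from the lecture):

```python
# Closure of events under union, intersection and set difference,
# for one throw of a die.
omega = frozenset({1, 2, 3, 4, 5, 6})
odd = frozenset({1, 3, 5})
prime = frozenset({2, 3, 5})

union = odd | prime          # odd or prime
intersection = odd & prime   # odd prime numbers
difference = odd - prime     # odd but not prime

# All three results are subsets of omega, hence again events.
for event in (union, intersection, difference):
    assert event <= omega

print(sorted(intersection))  # [3, 5]
print(sorted(difference))    # [1]
```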

### Random Variables

Up to now, we have defined a sample space, atomic events, compound events, and a field of events. We will proceed to define random variables, that is, functions defined on a field of events.

In the experiment of throwing the die twice, the sample space is $\{1,\cdots,6\}\times \{1,\cdots,6\}$; an outcome is a tuple. We can define a function on this sample space, meaning that we assign numerical values to different outcomes. For example, for the outcome $(1,1)$, the function that is the sum of the throws gives $1+1=2$. Or we could define a characteristic function which equals 1 if the number yielded by the second throw is 3 times the number yielded by the first throw, and 0 otherwise. There are many different ways to define a function on a sample space with a finite number of outcomes.
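Both functions just mentioned are easy to write down explicitly (a Python sketch; the function names `total` and `second_is_triple` are mine):

```python
from itertools import product

# The sample space of two throws: all 36 ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

def total(outcome):
    """Random variable: the sum of the two throws."""
    first, second = outcome
    return first + second

def second_is_triple(outcome):
    """Characteristic function: 1 iff the second throw is 3x the first."""
    first, second = outcome
    return 1 if second == 3 * first else 0

print(total((1, 1)))             # 2
print(second_is_triple((2, 6)))  # 1
# Only (1,3) and (2,6) satisfy the triple condition:
print(sum(second_is_triple(w) for w in sample_space))  # 2
```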

A random variable defines a field. Let us inspect the random variable $X: \{1,2,3,4,5,6\}\to\{0,1\}$, the characteristic function of the outcome being an even number: $X(\omega)=(\omega+1) \bmod 2$. So, for $\omega$ taking 1, 2, 3, 4, 5, 6, $X(\omega)$ takes 0, 1, 0, 1, 0, 1. What is the field defined by $X$? According to our definition of a field, we have to add in the empty set $\emptyset$, the sample space $\{1,2,3,4,5,6\}$, the event $\{X=0\}=\{\omega: X(\omega)=0\}$, to which the outcomes 1, 3 and 5 belong, and the event $\{X=1\}$, which contains the outcomes 2, 4 and 6. We can verify that this is the field generated by $X$: the empty set and the whole set $\Omega$ belong to the set of sets, and for any events $A$ and $B$, $A \cup B$, $A\cap B$ and $A\setminus B$ also belong to the set.

In addition, since there are many ways to define a field on the sample space, a random variable may or may not be defined on a given field. For example, take the field of events generated by $X(\omega)$, i.e., whether $\omega$ is an odd or an even number. Is the random variable $Y(\omega)=\omega$ defined on the field generated by $X$? It is not, because the atomic events $\{1\},\dots,\{6\}$ are not events generated by $X$. The field generated by $Y$ includes, and is larger than, the field generated by $X$. Looking at the subset relationship in the other direction, the events generated by $X$, namely $\{X=0\}$, $\{X=1\}$, the whole set $\Omega$, and the empty set $\emptyset$, all belong to the set of events generated by $Y$. In this case we say that the random variable $X$ is $Y$-measurable, or that $X$ is defined on the field generated by $Y$.
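On a finite sample space, this measurability check amounts to asking whether every level set of one variable is a union of level sets of the other. A sketch in Python (the helper name `preimages` is mine):

```python
# "X is Y-measurable" iff every atom {X = x} of the field generated by X
# is a union of atoms {Y = y} of the field generated by Y.
omega = [1, 2, 3, 4, 5, 6]

def X(w):   # parity indicator: 1 on even numbers
    return (w + 1) % 2

def Y(w):   # identity
    return w

def preimages(f):
    """Atoms of the field generated by f: the level sets {f = v}."""
    return {frozenset(w for w in omega if f(w) == v) for v in set(map(f, omega))}

atoms_X = preimages(X)   # {1,3,5} and {2,4,6}
atoms_Y = preimages(Y)   # the six singletons

def measurable(atoms_coarse, atoms_fine):
    """Is each coarse atom a union of fine atoms?"""
    return all(
        atom == frozenset().union(*(a for a in atoms_fine if a <= atom))
        for atom in atoms_coarse
    )

print(measurable(atoms_X, atoms_Y))  # True:  X is Y-measurable
print(measurable(atoms_Y, atoms_X))  # False: {1} is not a union of X's atoms
```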

For another example, we can define a random variable as the sum of two throws. This sum defines the field generated by the events $\{X_1+X_2=2\},\cdots,\{X_1+X_2=12\}$. (Verify this!)

It is a little complicated to define a random variable as a function on a field rather than to directly talk about atomic events and assign a probability measure, but it is necessary if we want to deal with complicated randomness in reality. For example, the world economy, as our sample space, is too complex to be modeled. A random variable $X$ such as "whether the stock price of Apple will rise or fall tomorrow" can be much simpler and more manageable. $X$ is a map from a very complicated world into a much simpler one, and $X$ does not capture all the randomness of the world economy. To see this, we can define a random variable $Y$, "whether the funds in my retirement account are going to rise or fall tomorrow", which is plausibly independent of $X$ and not defined on the field generated by $X$. In contrast, if we confined ourselves to the values of $X$ as our sample space, we could not describe something as complex as the world economy; since $X$ would then capture all the randomness we know of, we could not define a random variable $Y$ on the sample space of values of $X$ that is independent of $X$. That is why we bother to first define a sample space $\Omega$ and then define a random variable $X$ which maps this sample space into a simpler space: we start from some very complex world and chip out something simple and manageable to study.

### Probability, Expectation & Conditional Expectation

A probability measure assigns a positive number to each atomic event in a sample space. The probabilities assigned to the atomic events should sum to 1, because we do not want to leave anything out.

Let us assign equal probability to the 36 possible outcomes of two throws of a die. The probability of each outcome is 1/36. Now that we have assigned probability to all of the atomic events, we can talk about the probability of a compound event, which is the sum of the probabilities assigned to its elements. For example, what is the probability that the sum of the two throws is 2? It is 1/36, because the only atomic event with sum 2 is $(1,1)$. Similarly, the probability that the sum of the two throws is 3 is 2/36, and the probability that the sum is 4 is 3/36.
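These probabilities can be computed by counting, exactly as described (a sketch in Python; `Fraction` keeps the arithmetic exact):

```python
from itertools import product
from fractions import Fraction

# Uniform assignment of 1/36 to each atomic outcome of two throws.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def prob_sum(s):
    """Probability of the compound event 'the two throws sum to s'."""
    return sum(p for (a, b) in outcomes if a + b == s)

print(prob_sum(2))  # 1/36: only (1, 1)
print(prob_sum(3))  # 1/18: (1, 2) and (2, 1)
print(prob_sum(4))  # 1/12: (1, 3), (2, 2), (3, 1)
```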

After we have assigned probability to the sample space, we can compute the expectation and variance of a random variable. (A random variable is a function that assigns a value to an outcome of an experiment.) What is the probability mass function of $X$, the sum of two throws, supposing that each of the 36 outcomes is equally likely? The sum of the two throws ranges from 2 to 12 (where 12 corresponds to $(6, 6)$), with each value $x$ assigned a probability that is the number of occurrences of $x$ in the following table divided by 36. The probabilities sum up to 1.

$$\begin{array}{c|cccccc} + & 1 & 2 & 3 & 4 & 5 & 6\\ \hline 1 & 2 & 3 & 4 & 5 & 6 & 7\\ 2 & 3 & 4 & 5 & 6 & 7 & 8\\ 3 & 4 & 5 & 6 & 7 & 8 & 9\\ 4 & 5 & 6 & 7 & 8 & 9 & 10\\ 5 & 6 & 7 & 8 & 9 & 10 & 11\\ 6 & 7 & 8 & 9 & 10 & 11 & 12 \end{array}$$
What is the expectation of X? The expectation of this random variable is the weighted sum over all possible outcomes of X, where the weight is the probability ($E(X)=\sum_\omega X(\omega)P(\omega)=\sum_x x\cdot p(x)$). If we sum these together, we get 7. What is the variance of X? The variance of X is the weighted sum of the squared difference between X and the expectation of X, where the weight is the probability ($\mbox{Var}(X)=\sum_x (x-E(X))^2\cdot p(x)$). We can show that $\mbox{Var}(X)=E(X^2)-(E(X))^2$.
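Summing directly over the 36 outcomes reproduces both numbers (a sketch in Python, exact with `Fraction`):

```python
from itertools import product
from fractions import Fraction

# E(X) and Var(X) for X = sum of two throws, over 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

E = sum(p * (a + b) for (a, b) in outcomes)
E2 = sum(p * (a + b) ** 2 for (a, b) in outcomes)
Var = sum(p * ((a + b) - E) ** 2 for (a, b) in outcomes)

print(E)                   # 7
print(Var)                 # 35/6
print(Var == E2 - E ** 2)  # True: Var(X) = E(X^2) - (E(X))^2
```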

Similarly, we can define a random variable $g\circ X$ by function composition, where the random variable $X$ is already a function. For example, if $X$ is the identity function $X(\omega)=\omega$, then $g\circ X(\omega)=2\omega$ for $g(x)=2x$, and $g\circ X(\omega)=\omega^2$ for $g(x)=x^2$. We can compute the expectation of the composition as $E(g\circ X) = \sum_\omega g(X(\omega))p(\omega)$, where $\omega$ ranges over the atomic events of the sample space. There is a simpler way to compute this expectation: $E(g\circ X)=\sum_x g(x) p(x)$, because the field generated by $X$ is a subfield of the field generated by all $\omega\in \Omega$.
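Both routes to $E(g\circ X)$, the sum over atomic outcomes and the sum over values of $X$ with the induced mass function, agree, as this sketch checks for one throw with $g(x)=x^2$ (not from the lecture; names are mine):

```python
from fractions import Fraction
from collections import Counter

# One throw of a die; X is the identity, g(x) = x^2.
omega = [1, 2, 3, 4, 5, 6]
p_omega = Fraction(1, 6)
g = lambda x: x ** 2

# Route 1: sum over atomic outcomes.
by_outcomes = sum(g(w) * p_omega for w in omega)

# Route 2: sum over values of X weighted by the induced pmf p(x).
pmf = Counter(omega)  # X identity, so p(x) = 1/6 for each value
by_values = sum(g(x) * Fraction(n, len(omega)) for x, n in pmf.items())

print(by_outcomes)               # 91/6
print(by_outcomes == by_values)  # True
```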

Now we talk about conditional expectation, conditional probability and conditional independence. Recall that we assigned probability 1/36 to each of the 36 outcomes. From this probability assignment it follows that the first throw and the second throw are independent. Here is the reason. We know that $P((m,n))={1\over 36}$, $P((m,\bullet))={1\over 6}$ and $P((\bullet,n))={1\over 6}$ by our setup. Hence $P((m,n))=P((m,\bullet))\times P((\bullet,n))$. This means that knowing the result of the first throw does not help us make inferences about the result of the second throw. That is, the first throw and the second throw are independent under our assignment of 1/36 to the atomic events.

For another example, conditioned on the sum of the two throws being 3, the probability that the first throw yields 1 is 1/2, and the probability that it yields 2 is also 1/2. The reason is that there are only two outcomes with sum 3, namely $(1,2)$ and $(2,1)$.
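Conditioning here just means restricting to the outcomes in the conditioning event and renormalizing (a Python sketch, not from the lecture):

```python
from itertools import product
from fractions import Fraction

# P(first throw = 1 | sum of throws = 3), by restriction and renormalizing.
outcomes = list(product(range(1, 7), repeat=2))
sum_is_3 = [(a, b) for (a, b) in outcomes if a + b == 3]

p_cond = Fraction(sum(1 for (a, b) in sum_is_3 if a == 1), len(sum_is_3))

print(sum_is_3)  # [(1, 2), (2, 1)]
print(p_cond)    # 1/2
```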

Below are several important properties of expectation and variance:

• Expectation of a linear combination Let $X_i$ be random variables, then $\bf E(a_0+a_1 X_1+\dots + a_n X_n) = a_0 + a_1\bf E(X_1) + \dots + a_n \bf E(X_n)$. (Why?)
• Expectation of an independent product Let $X$ and $Y$ be independent random variables, then $\bf E(X Y) = \bf E(X)\bf E(Y)$. (Why?)
• Law of total expectation Let $X$ be an integrable random variable (i.e., $\bf E(|X|)<\infty$) and $Y$ be a random variable, then $\bf E(X) = \bf E_Y(\bf E_{X|Y}(X|Y))$. (Why?)
• Variance of a linear combination Let $X_i$ be independent random variables, then $\bf{Var}(a_0+a_1 X_1+\dots + a_n X_n) = a_1^2\bf{Var}(X_1) + \dots + a_n^2 \bf{Var}(X_n)$; the constant shifts the mean but not the variance. (Why?)
• $\bf{Var}(X) = \bf E(X^2) - {(\bf E(X))}^2$
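These identities can be verified numerically on our two-throw example, where $X_1$ and $X_2$ (the two throws) are independent under the uniform 1/36 assignment (a Python sketch; the helper names are mine):

```python
from itertools import product
from fractions import Fraction

# Exact expectation and variance over the 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(f):
    return sum(p * f(a, b) for (a, b) in outcomes)

def Var(f):
    return E(lambda a, b: f(a, b) ** 2) - E(f) ** 2

X1 = lambda a, b: a
X2 = lambda a, b: b

# Linearity: E(2 + 3*X1 + 4*X2) = 2 + 3*E(X1) + 4*E(X2)
print(E(lambda a, b: 2 + 3 * a + 4 * b) == 2 + 3 * E(X1) + 4 * E(X2))  # True

# Independent product: E(X1 * X2) = E(X1) * E(X2)
print(E(lambda a, b: a * b) == E(X1) * E(X2))  # True

# Variance of a linear combination of independent variables:
# the constant drops out and the coefficients are squared.
print(Var(lambda a, b: 2 + 3 * a + 4 * b) == 9 * Var(X1) + 16 * Var(X2))  # True
```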