Prepared by Hao Fan, Manu Kothegal Shivakumar and Pritika Mehta

In this lecture, we will specific talk about how can we combine agent-based modeling on one hand and vehicle tracking data on other hand to make inference about the states of the roads dynamically.

The resources for us to understand our cities are not only the publication of papers or the books, more importantly are the models, computational tools, and visualization tools. Listed as follow are the open source simulation software.

• MATSim
• TRANSIMS — Transportation Analysis Simulation System
• DynaMIT & MITSIMLab — DynaMIT & MITSIMLab is developed by the civil engineering department at MIT, supported by the department of transportation.
• UrbanSim — UrbanSim is a tool to simulate the migration of people within a city and to simulate the interaction of different industries as well as people. It is pretty useful for people working with cities and working on social dynamics at large-scale.
• SimCity open source — SimCity is a famous computer game, and there are several versions of it. The first version of it was open-sourced, and there is also a java version.

If we work with real world cities, there can be two major problems. The first one is how can we combine those tools to make real time prediction about what will happen, the second one is how can we bridge the gap between the data that we can collect and the data that we will really find useful, and justify that our bridge can lead to convincing conclusions about our cities. In such cases, we need visualization tools to help display our models.

• OpenLayers — OpenLayers is a set of JavaScript to bring tiles of a map from map server.
• MapServer — MapServer is read in the geographical features such as roads, rivers, the boundaries of countries or administrative areas at the server side and then serve those tiles.
• Google Analytics — Google Analytics can help generate all types of dynamic charts about many different things.
• Google Maps API — After we get simulation and visualization tools, the next step is to find useful data. Below are some resources of data that can help us understand the cities.

Census Data

• Demographic & Health Surveys: The Demographics & Health Surveys contains data set created by people from the US about many different countries, it includes information such as the latitudes and longitudes of the different villages of the survey is and surveys such as the annual income, the average education, the birthrates and the death rate and other things.
• US Cenesus Bureau: US Cenesus Bureau contains comprehensive US census data, includes road segments, household statistics, travel patterns and so on.
• Senegal: National Agency of Statistics and Demography (ANSD) ANSD is data for development collaboration, contains it contains additional information about epidemics, employment and many other things.
• Open government initiative

Street Map & Yellow Pages

• Open Street Map: Open Street Map data is especially good in the US and in Europe, but it is also good at many other parts of the world.
• Yellow Pages

Call Detail Records

Data for Development of Ivory Coast and Senegal contains flow of population and the interaction of population in different countries. It can also used to track the spreading of epidemic diseases at country site, or make predictions about the economics.

Vehicle Tracking

• Niagara Frontiers Transportation Authority Developer Tools
• MTA Bus Time: There are many city governments developed some web API for people to extract the locations of the individual vehicles of different public transportation fleet, above are the web API of transportation at Buffalo and New York City.
• NYCT&L Taxi Data: NYCT&L Taxi Data is taxi data with information of latitudes and longitudes of all yellow cabs at Manhattan for over three years at a temporal resolution of one sample per minute. It can also help us understand whether there is a traffic, accident, or other things happening in the city.
• Cabspotting — Cabspotting is a data set that tracks the vehicles at San Francisco.
• mPat sample data
• Mobile Millennium

Mobile Millennium is a data set about the dynamics in the San Francisco Bay Area, including San Francisco, San Jose, Berkeley and many other cities.

In a paper “A survey of results on mobile phone datasets analysis” from Vincent D. Blondel, Adeline Decuyper and Gautier Krings, some advances made recently in the study of mobile phone datasets are reviewed.

Models of the behavior captured by mobile phone data sets

• Diffusion of information and diseases
• Trajectory as Levy flight, exploration vs exploration, very few habitats, predictability of next location, amount of commuting between two counties, neighborhood signatures revealed by calling patterns, movement and geographic information propagation in response to emergency
• Probability of making a phone call, duration of a phone vs distance between the two ends of this phone call, social/ethnic/historical partitioning revealed by phone calls, economic development vs diversity
• Duration of being in phone-contact, burstiness of inter phone call intervals
• Power law degree distribution, clustering/small world (maximizing utility), symmetry, community detection/ dynamics of small and large communities, inferring gender and age from call pattern

Applications of mobile phone data sets

• Urban sensing (o/d matrix, flow of people, classifying trips)
• Tracking diseases
• Viral marketing

In the Transportation Modeling, we want to identify a transportation simulator as event driven model and assign probability to different several paths of this event driven model. Based on this type of ideas if we identify the observation as the observation of the system, then we can think of simulator as hidden Markov process (although this kind of hidden Markov process is pretty complicated) and make inference about what’s happening about this hidden Markov process in terms of the observations.

In order to do the simulation and inference, several things are needed.

• Network file: information about roads The road network makes it a really a network. The nodes represent the road intersections, the edges are either the locations (such as buildings) or the road segments. There are road segments from one intersection to another intersection, and at these intersection a vehicle needs to make a decision whether it should go through one branch or another.
• Plans/population file: trips of all individuals between home, work, school etc. by car, walk, bus etc. Things actually determines the dynamic is the plans of people, the individuals. Such as their locations, places they go to on a typical day, the characteristics of the trips, and the times they plan those trips. The plans of the populations determine the dynamics because from the plans and the populations we can find out the state transition matrix
• Facility file: open time, close time and capacity of buildings
• Configuration file: choice to simulate traffic on a road, replan trips, score trips, generate output, etc.

The paper “A large-scale agent-based traffic microsimulation based on queue model” from Nurhan Cetin, Adrian Burri and Kai Nagela provide us a specific case study in terms of how the events are scheduled and how the events are registered against the timer and how we up to date the state of the road segments as well as the states of the agents in response to the different individual events.

• Each road has length, free speed and capacity.
• If number of cars ahead is less than capacity, a car can move to the next segment with free speed.
• Otherwise, the car wait (at a queue) until all cars move out.

Another paper “Generating complete all-day activity plans with genetic algorithms” from David Charypar and Kai Nagel provide us a dynamic model in terms of different people choose different trips every day.

• Change trip starting time
• Change route
• Change mode (car, bus, walk)
• Change destination

In this model, people change routes, change the travel mode and change the destination, and the road segments responds to the same demands in different ways at different times according to weather and many other things.

In event driven model we define set of events and events will change the state of the system in different ways. If a sample path of events with corresponding time is given we can find the probabilities.

$p\circ l_1\to p\circ l_2$
This represents one event where person p goes from one location $l_1$ to another location $l_2$.
The state of the system is defined in terms of the location of a person at time t and the number of vehicles on road segment A at time t.
$X^{(p)}_t$ is location of person $p$ in time t
$X^{(l)}_t$ is number of vehicles on road $l$ at time t

If road segment has a lot of vehicles then this vehicle will stay on the road segment A for longer time. If a road segment has a lot of vehicles then the driver might decide to go off to one branch instead of another branch. So the state transition matrix is actually a function of the state of this system. Anyway, we can identify this system in terms of just one event and then we can get the states of the event and from the states we can proceed to get the probability measure of different sample paths.

One way we can calculate probabilities is by using mean field approximation, a projection based method. Based on the model that we defined we will be able to find out the probability for us to make a transition from location $l_1$ to another location $l_2$, conditioned on our observations.

The posterior probability of event $pl_1\to pl_2$ and latent states $x_{t-1,t}^{(l)}$ , $x_{t-1,t}^{(p)}$ conditioned on observations is

\begin{align*}
& P(pl_{1}\to pl_{2},\{x_{t-1,t}^{(l)}:l\},\{x_{t-1,t}^{(p)},p\}|\mbox{obs})\\
& =\prod_{l}\alpha_{t-1}^{(l)}(x_{t-1}^{(l)})\cdot\prod_{p’}\alpha_{t-1}^{(p’)}(x_{t-1}^{(p’)})\\
& \cdot p_{l_{1},l_{2}}\cdot\delta(x_{t-1}^{(p)}=l_{1})\\
& \cdot\beta_{t}^{(l_{1})}(x_{t}^{(l_{1})})\delta(x_{t}^{(l_{1})}=x_{t-1}^{(l_{1})}-1)P(\mbox{obs}_{t}^{(l_{1})}|x_{t}^{(l_{1})})\\
& \cdot\beta_{t}^{(l_{2})}(x_{t}^{(l_{2})})\delta(x_{t}^{(l_{2})}=x_{t-1}^{(l_{2})}+1)P(\mbox{obs}_{t}^{(l_{2})}|x_{t}^{(l_{2})})\\
& \cdot\prod_{l\ne l_{1},l_{2}}\beta_{t}^{(l)}(x_{t}^{(l)})\delta(x_{t}^{(l)}=x_{t-1}^{(l)})P(\mbox{obs}_{t}^{(l)}|x_{t}^{(l)})\\
& \cdot\beta_{t}^{(p)}(x_{t}^{(p)})\delta(x_{t}^{(p)}=l_{2})P(\mbox{obs}_{t}^{(p)}|x_{t}^{(p)})\prod_{p’\ne p}\beta_{t-1}^{(p’)}(x_{t}^{(p’)})\delta(x_{t}^{(p’)}=x_{t-1}^{(p’)})P(\mbox{obs}_{t}^{(p’)}|x_{t}^{(p’)}).
\end{align*}

We can also estimate the probability that we do not have any event in a $\epsilon$ time (very small fraction of time)
The posterior probability of no event and latent states $x_{t-1,t}^{(l)}$ , $x_{t-1,t}^{(p)}$ conditioned on observations is
\begin{align*}
& P(\emptyset,\{x_{t-1,t}^{(l)}:l\},\{x_{t-1,t}^{(p)},p\}|\mbox{obs})\\
& =\prod_{l}\alpha_{t-1}^{(l)}(x_{t-1}^{(l)})\cdot\prod_{p’}\alpha_{t-1}^{(p’)}(x_{t-1}^{(p’)})\\
& \cdot\left(1-\sum_{p’:\mbox{person}}\sum_{\mbox{event }p’l_{1}\to p’l_{2}}p_{l_{1},l_{2}}\cdot\delta(x_{t-1}^{(p’)}=l_{1})\right)\\
& \cdot\prod_{l}\beta_{t}^{(l)}(x_{t}^{(l)})\delta(x_{t}^{(l)}=x_{t-1}^{(l)})P(\mbox{obs}_{t}^{(l)}|x_{t}^{(l)})\cdot\prod_{p’}\beta_{t-1}^{(p’)}(x_{t}^{(p’)})\delta(x_{t}^{(p’)}=x_{t-1}^{(p’)})P(\mbox{obs}_{t}^{(p’)}|x_{t}^{(p’)}).
\end{align*}

The state space is Combinatorial state space and it is big. For example, if we want to model San Francisco we normally have about sixty thousand road segments and if we want the simulation about a city to be realistic we need to have about three hundred thousand vehicles which is approximately 1% of the vehicles running in San Francisco for a whole day. The state space is going to be very big so we have to either use Markov Chain Monte Carlo or Variational method.

There are two ways to make inference using variational method. One is through mean field approximation meaning that the vehicles move around and the road segments change state according to mean field effects. This means that we will project all other things, all other states into just this one individual or one road segment. The problem becomes more tractable.

# Opinion Dynamics Case study: Tracking Contact-Induced disease at individual level with mobile phones [10/22]

Prepared by Ankit Sharma, Karthik Kiran and Mohit Arora

The idea is to combine our theories and models with big data sets to obtain inference about them and perform interesting analysis.

Cell phones have become an important platform for the understanding of social dynamics and influence, because of their pervasiveness, sensing capabilities, and computational power. Many applications have emerged in recent years in mobile health, mobile banking, location based services, media democracy, and social movements. With these new capabilities, we can potentially be able to identify exact points and times of infection for diseases, determine who most influences us to gain weight or become healthier, know exactly how information flows among employees and productivity emerges in our work spaces, and understand how rumors spread.

There are many data sets that track communities which are collected from mobile phones. There are many applications to track user behavior and generate very large data sets. The privacy of the user has to be taken care of according to the standards.

These data sets are inter related with one another forming a network of data sets.

Some of the Open source datasets are:

• Nokia Mobile Dataset – Sensors embedded in mobile phones and other wireless devices are used to collect large quantities of continuous data pertaining to the behavior of individuals and social networks.
• Reality Commons – Contains the dynamic of several communities of about 100 people each which were collected in the MIT Human Dynamics Lab.
• MERL (Mitsubishi Electronic Research Lab)Motion detector data set – Data set collected with printed circuit boards worn by the researchers of the MERL for nearly a year which gives the traces in the life of a research laboratory. There were also anchor nodes installed at different corners of a floor. From those kinds of things we can use localization algorithm to find out the exact locations of the workers, for the workers to finish the task, so where have they been, with which other persons they have been interacting and observations of similar kinds.
• Dartmouth Student Life Dataset – The whole Student Life dataset is in one big file. This contains all the sensor data, EMA (ecological momentary assessments) data, survey responses and educational data.
• Telecom Italia Big Data Challenge – The dataset provides information about the telecommunication activity over different places.
• CRAWDAD – Crawdad is a community resource for archiving wireless data at Dartmouth containing many such data sets about networks. Those sensor networks may not necessarily be collected with mobile phones, could be collected another such as embedded devices or printer circuit boards. The data set includes data reflecting interaction among students, students in dormitories, workers etc.

In the data set where mobile phones collect data based on blue tooth proximity, mobile phones can scan another phone within the range which is an indication of proximity. Information about nearby access points within range of the mobile phones are also retrieved in the process. By knowing the location of wifi access points, the location of people can be determined. By knowing the building in which the wifi access point is present, the activity of students can be determined.

Above is a snapshot of blue tooth proximity in the data set. People are grouped into clusters and there are multiple clusters. This is a snapshot of proximity in the data among a group of 100 people who know one another.

The second plot shows the locations of students in one week. The y axis is indexed by hour of a week. The x axis is indexed by different locations representing different wifi access points.

It can be seen that the thick yellow line represents the time when the students are in the dormitory and the other yellow lines represent the classrooms. This gives an overall picture of behavior of students and their interaction.

From the above plots, it can be seen that there is evidence about diffusion in the network. Relationship between activity, proximity and friendship can be obtained.

For each person, distribution of the different access points is plot. There are multiple distributions obtained when data is obtained for many people and the correlation of this distribution gives the behavior and other characteristics of people.

In the third plot, ordering of correlation of non friends and correlation of friends are plotted against each other. What we see here is that when the correlation of friends becomes about 40, the correlation of non-friends is still about 20%. This means that friends have higher correlation about their visit of location. This means that if we know that two persons are friends then we could be able to predict the behavior of one person from the behavior of another person. This means that from the sensor data we can actually be able to predict the behavior of people through their friendship.

In the second plot, the aerobic activity of person A and person B are plotted. From the plot it can be seen that, if A and B are friends then number of aerobic exercises per week of A can be determined by number of aerobic exercises per week of B as they as correlated. So from this dataset, behaviors of friend pairs like physical activity can also be determined. Friends converge on type of behavior.

Another piece of information we have is result from cold and flu surveys. Daily reports are take n from students about cold and flu. First plot is report of fever and second is for running nose (common cold).

Red mark indicates that the student has caught fever/cold on the particular day represented by X-axis.

The count of students catching fever is much lesser than students catching cold. Black marks indicate not reported data which the students might have not been able to report on some days.

What we can do from this data is to how can we track this gradient of cold and flu at the individual level from those symptoms report and from the proximity network. The report generated is for a smaller population and task is to generate models which can infer from this data and apply the results to a larger population.

In order to do that infectious susceptible model can be used.

Agent-based Susceptible-Infectious-Susceptible model:

$$S+I\to 2I \mbox{ with rate } \beta\\ I\to S \mbox{ with rate } \gamma$$

This model has two events. One event is called infection and another event is called recovery. Here, S represents a susceptible individual and I represent an infectious individual. Following are the softwares that can be used to simulate an epidemic spreading dynamics:

1. Influenza Epidemics Simulation Model: This is actually a model to simulate the spreading of epidemic Influenza over the globe. The input data includes the number of passengers go from one airport to another airport globally, as well as the local movement on highways and on railroads.
2. Spatial Temporal Epidemiological Modeler (STEM): It is developed in Java. It also has the capability to simulate the dynamics of epidemics globally but most of the examples are at the city level.

SIS dynamics:

Input:

• $G=(N,E) \mbox{: dynamic network} \\ N=\{1,\dots,\mbox{number of people}\} \mbox{: people} \\ E=\{(n_1,n_2,t):n_1\mbox{ is near }n_2\mbox{ at time }t\} \mbox{ : “nearby” relation}$
• $\alpha\sim\mbox{beta}(a,b) \mbox{: probability of infection from outside,} \\ \beta\sim\mbox{beta}(a’,b’) \mbox{: probability of infection from within the network,} \\ \gamma\sim\mbox(a”,b”) \mbox{: probability of recovery.}$

After we have defined the probability for each sample path, we will be able to make inferences.

• $P(Y|X):$ how symptoms are dependent on common cold and flu state.

The output of this is in terms of a matrix, X and Y. X is the latent state. X can take two states, which is either zero, meaning a person is susceptible, and one meaning a person is infected or infectious.

Output:

• A matrix structure $\{X_{n,t},Y_{n,t}:n,t\}$ indexed by time t and node n. Latent state $X_{n,t}$ of node n at time t is either 0 (susceptible) or 1 (infected). Symptom $Y_{n,t}$ is probabilistically dependent on the state $X_{n,t}$.
• A collection of events R drives change in state.

We also have symptoms, which is a vector. Include whether a person has fever, has runny nose, has diarrhea, is stressed, is sad or something like that.

Now we want to understand how this infection spreads. This simulation model works in the following way.

• Sample parameters according to beta distributions. $\alpha\sim\mbox{beta}(a,b) , \beta\sim\mbox{beta}(a’,b’) and \gamma\sim\mbox(a”,b”)$
• $X_{n,1}=0,\forall n\in N$ : all people are susceptible at t=1

After we fix those three parameters, we work with a Markov process:

This means that we first sample the state of all people at time 1. After we have the state of people at the beginning of our simulation, we iterate over time. At each time we will randomly sample the probability for infectious people to be recovered with the parameter gamma. If we have an edge that connects an infectious person and a susceptible person, we will sample the probability of this infectious person to infect this susceptible person. Also, for any susceptible person in this system, we will sample the probability for another person outside of the system to infect this susceptible person in the system.

Probability Measure Defined by Agent-based SIS Model:

$\begin{eqnarray} && P\left({X_{n,t},Y_{n,t}:n,t},\alpha,\beta,\gamma\right)\\ &=&P(\alpha)P(\beta)P(\gamma)\prod_{n,t}P(X_{n,t+1}|\{X_{n’,t}:n’\},\alpha,\beta,\gamma)\prod_{n,t}P(Y_{n,t}|X_{n,t})\\ &=&P(\alpha)P(\beta)P(\gamma)\prod_{n,t}\gamma^{1_{X_{n,t}=1\wedge X_{n,t+1}=0}}\cdot (1-\gamma)^{1_{X_{n,t}=1\wedge X_{n,t+1}=1}}\cdot \\ &&\left[\alpha+\beta\cdot\sum_{(n’,n,t)\in E}X_{n’,t}\right]^{1_{X_{n,t}=0\wedge X_{n,t+1}=1}}\cdot \left[1-\alpha-\beta\cdot\sum_{(n’,n,t)\in E}X_{n’,t}\right]^{1_{X_{n,t}=0\wedge X_{n,t+1}=0}}\cdot\\ &&\prod_{n,t}P(Y_{n,t}|X_{n,t})\\ \end{eqnarray}$

We can observe the four different types of events here:
1. If a person at time T is infectious and time T+1 is susceptible. This is our recovery event (gamma).
2. The probability for an infectious person to be still infectious, is consequently one minus gamma.
3. If a person at time T is zero and at time T+1 is one, this is an event of infection.
4. The total probability of infection from another person within the system and not infected, which means that we have state zero at time T and state zero at time T+1 is one minus this number.

Gibbs-Sampling of Agent-based SIS Model:

$\begin{eqnarray} X_{n,t+1}|\{X,Y\}\backslash X_{n,t+1};\alpha,\beta,\gamma &\sim& \mbox{Bernoulli}\left(\frac{P(X_{n,t+1}=1)}{\sum_{x=0,1}P(X_{n,t+1}=x)}\right)\\ &&\mbox{where }P(X_{n,t+1}=1)=P(X|\alpha,\beta,\gamma)P(Y|X)\\ R_{n,t}&\sim&\mbox{Categorical}\left(\frac{\alpha,\beta,\dots,\beta}{\alpha+\beta\sum\limits _{n’}1_{(n’,n)\in E_{t}\cap X_{n’,t}=1}}\right)\\ \alpha &\sim& \mbox{Beta}(a+\sum_{n,t}1_{\{R_{n,t}=1\}},b+\sum_{n,t:X_{n,t}=0}1-\sum_{n,t}1_{\{R_{n,t}=1\}})\\ \beta &\sim& \mbox{Beta}(a’+\sum_{n,t}1_{\{R_{n,t}>1\}},b’+\sum_{n,t:X_{n,t}=0;n’}1_{(n’,n)\in E_{t}\cap X_{n’,t}=1}-\sum_{n,t}1_{\{R_{n,t}>1\}})\\ \gamma &\sim& \mbox{Beta}(a”+\sum_{n,t:X_{n,t}=1}1_{\{X_{n,t+1}=0\}},b”+\sum_{n,t:X_{n,t}=1}1-\sum_{n,t:X_{n,t}=1}1_{\{X_{n,t+1}=0\}}). \end{eqnarray}$

The Gibbs sampling works if we know the probability distribution of one variable condition and condition of other variables. The variables in our system include alpha, beta, gamma, which are the rates of events. We also have x which is the state of the individuals. What we do is we iterate over all persons and all times and we sample those x conditions on everything else. First we sample alpha, then we sample beta, and then gamma. Then we come back to sample all of those x. Theoretically, after we have iterated this sampling process for long enough time, we will be able to get the distribution of not only the rates of the system, but also the states of the individuals and the joint states.

Variational Inference with Agent-based SIS Model

The challenge here is that we are coping with a system with a huge number of states. In order to make the problem tractable we need to use some variational influence method. The variational influence method that we have used is called expectation propagation.

$\begin{eqnarray} \alpha(X_{n,t+1}=1) &=& \alpha(X_{n,t}=0)\cdot (a+b\sum_{(n’,n,t)\in E}\gamma(X_{n’,t}))\cdot P(\mbox{obs}|X_{n,t+1}=1)\\ &&+\alpha(X_{n,t}=1)\cdot(1-c)\cdot P(\mbox{obs}|X_{n,t+1}=1)\\ \\ \alpha(X_{n,t+1}=0) &=& \alpha(X_{n,t}=0)\cdot (1-a-b\sum_{(n’,n,t)\in E}\gamma(X_{n’,t}))\cdot P(\mbox{obs}|X_{n,t+1}==0)\\ &&+\alpha(X_{n,t}=1)\cdot c\cdot P(\mbox{obs}|X_{n,t+1}=0) \end{eqnarray}$

The probability alpha, such that the probability for a person n at time T+1 to be infectious which is 1 equals the probability for this person at the previous time step to be susceptible times the probability for this person to be infected between time T and T+1. Another probability, that this person is infectious at time T and this person hasn’t been recovered between time T and time T+1 times the probability observation.

Similarly we can iteratively estimate the probability for a person to be susceptible at time T+1 from the probability of the four statistics, for this person to be infectious or susceptible at time T.

Validation of Agent-based SIS Model to Fit Data:

The next thing that we want to do is to validate, to make the case that this model is applicable to the data and the dynamics as we hypothesized. The way to make the case that the model is applicable to the data is through hypothesis testing. Here are the three observations:

First one shows that the infection is related to the network. This means that if I have a lot of friends, who contacts infected today then I will have a higher probability of being infected tomorrow. This is done through a permutation test, i.e.,permute the symptoms of the individuals within a system. This makes the network independent.

The next plot is of people reported certain symptoms and the number of friends with symptoms with its log probability.

Last plot has the probability distribution of recovery versus the theoretical probability, assuming exponential distribution. This means that this hypothesis also works. If it doesn’t work, the probability for us to see those kinds of plots will be very low.

Performance of Agent-based SIS Model

Now we will predict symptoms from dynamic network and symptoms from “probes” by comparing our estimation with “ground truth”.

The black line here is the ground truth and the red line here is the result of the algorithm. The blue line here is the result of scaling. It turns out that scaling does not work very well. This is saying that by combining the model with the data set we can actually be able to estimate the infectious population quite well.

Application to real world:

In summary, in this lecture we discussed a specific example of how we can combine a specific model with real world data set and to make predictions about it. Once we are able to do that, then we can tweak the network and  be able to direct the infection or information flow in the ways that we want to maximize the utility of the network. Similarly we can do many other things, for example, to combine the simulation models about transportation with transporting data set and to apply the behavior data, how people move around in a city and how people interact with one another and combine this type of data with the systems dynamic model about the economic development of a city and talk about how we can tune the behavior and interaction of people to increase the productivity of the city.

# Exponential Random Graph Model [10/13]

Le Yang, Harshith Kumar Ramadev and Karthik Kakarla Nagendra Prakash. October 13, 2015

### Introduction to Dynamic Networks

With these dynamic networks, what we can do?

1. Summarizing: What are the characteristics of a dynamic network data? For example, the triangle closure property says if A and B are friends and B and C are friends, then A and C are friends. This property can be summarized from different dataset including some dataset mentioned above.
2. Modeling: how to modeling the dynamic networks? There are two ways: a) from the perspective of economists and social scientists; b) from the perspective of physicists. Both perspectives will be discussed later in details.
3. Prediction: One important objective of research on dynamic network is to predict what will happen in the future. For example, in political relationship dataset, we want to know if one nation will support another nation or fight with it.

Network formation with game theory node/agent based modeling (economists and social scientists)

In this perspective, each node in network represents the individual people. Each people has utility function which they want to maximize. For example, people want to maximize their financial benefit or productivity. More importantly, the utility function is related with not only the resources of the individual but also the properties of the other edge through the edges. For example, people may cost money in forming or maintaining a link while gains profit in having connections to others.

One question is how can the individual agents build edges and get rid of edges to maximize the utilities of themselves. Once we have utility function for each individual, we can think of what network structure will emerge as a equilibrium of this game and what dynamic do we have in network evolution. From the overall perspective, another question is whether or not we have a way to design overall policy over the dynamic network so that the overall utility function can be maximized.

Network formation with tie based modeling (physicists)

First model is random graph model. In this model, a network have parameters N and P which means there are $n \choose 2$ potential edges and each of them might appear with probability P. Obviously, the average number of edges is $N \times P$. Some questions we may ask about this random graph is as follows:

• What is the average path length from one node to another node?
• What is the average distance?
• Whether there will be a lot of small clusters or only few large clusters?

Second model is so-called small world model. First we construct a rigor model where every node is connected with its full neighbors. Then we randomly rewire those edges from one node to any other randomly sampled node in the world. One interesting feature of this kind of random graph is the shortest path from any node to any other node is going to be the logarithm of the size of the world, i.e. the number of nodes in this network.

The third model is preferential attachment model constructed by Barabasi to talk about the fat tail distribution of the edges. The fat tail distribution means some people have a lot of connection and most of the people have very few connection. His idea is to start from some kernel and then grow the network by adding in new nodes. At each time a new node is added, it will establish connection with existing nodes according to the number of degrees. Obviously, rich get richer.

### Markov networks exhibiting Maximum Entropy property

The underlying idea in this part of the discussion is to form a dynamic network, which primarily means constructing a network by adding new edges or removing existing edges. Here we only consider the current state of the system and details about the previous system states are not taken into consideration. This process of starting from an initial state and transforming the network by adding/deleting edges to reach a state of equilibrium/asymptotic distribution demonstrates the principle of Markov networks exhibiting Maximum Entropy property.

When the features/sufficient statistics are known and also the average value of some functions of random variables are given, the probability distribution that best depicts the current state of the system would be having the largest entropy. This statement again is based on Markov property. The corresponding equation that shows this relation is given below:

Maximize $H(X) = – \int f(x)logf(x)d(x)$,

Subject to: $E(f_i(X)) = a_i$ for $i =1,\dots,n$.

Given below are the set of well-known family of probability distributions and their properties:

 Distribution Parameters θ Natural parameters η Base distribution h(x) Sufficient statistic T(x) Log-partition A(θ) Bernoulli p log (p/(1−p)) 1 x -log(1-p) Binomial p log (p/(1−p)) nCx x -nlog(1-p) Poisson λ log λ 1/x! x λ Exponential λ log λ 1 x -log(λ)

Here, we can brief about the Partition function i.e., $A(\theta)$. This function describes the statistical properties of the system in equilibrium state or asymptotic state. Also, we can understand the importance of base distribution i.e., $h(x)$ which gains significance when we are approximating a probability distribution with any other probability distribution of exponential family.

The equation that describes the relation of main statistics, sufficient statistics, Base function and Partition function on a given probability distribution is shown below:

$f_X(x;\theta)=h(x)exp(\eta(\theta)⋅T(x)−A(\theta))$

The directive graph with $n$ nodes and $n \choose 2$ edges which is used to analyze the data about social networks and other real world systems constitute the Exponential Random Graph Model (ERG’s). The edges are tied with average values of features/statistics.

The underlying question in the concept of ERG’s is, “What is the probability distribution of different graphs according to exponential distribution defined by exponential graph model?”.

The answer for this question can be represented by an equation as follows:

$P(X=x|\theta)=P_{\theta}(x)=c \times exp(\theta_1 z_1(x)+….+\theta_p z_p(x))$

The maximum entropy distribution of $p$ features/statistics/invariants is represented by $E(Z_1(x))….. E(Z_p(x))$.

Note that the variable $c$ in the above equation represents the “Exponential of partition function”.

The Sufficient statistic for a family of probability distributions is defined as the statistic defined for a probability distribution which gives all the information as given by a sample from which this statistic was calculated.

Below are some of the examples of scenarios and their sufficient statistics:

One tied scenarios:

1. a) Reciprocity – If A and B are friends, then B and A are also friends.
$M(x)=\sum_{i<j} x_{ij}*x_{ji}$
2. Birds of same feather flock together.
3. Bernoulli – Measured by density of edges. $\sum_{ij} x_{ij}/ \sum_{ij} 1$

Multiple tied scenarios:

1. Triangles – If there is an edge between points i and j, j and k; then there is also an edge between points i and k. The sufficient statistic for the number of triangles is illustrated as follows: $T(x)=\sum_{i<j<k} x_{ij}*x_{jk}*x_{ki}$.
2. Number of k-stars: the sufficient statistic for k-star scenario is given by:$S_k (x)=\sum_{i \in N} {x_{i+} \choose k}$, where $x_{i+} = \sum_{j} x_{ij}$.

In general, given a sufficient statistic and probability distribution, result obtained is a exponential random graph (Static Version). i.e., we will proceed to sample different random graphs according to the distribution defined by the features.

Dynamic ERG’s involve dynamic changes to the graph based only on the current state of the graph to attain the equilibrium state. Hence, we can track the dynamics of evolution of network based on limited observations about the network.

We can use the sampling algorithm to define the probability distribution of any sample path in terms of set of events. The sampling algorithm is as follows:

An ERG is represented by a $M \times M$ matrix, where a value of 0 in a cell shows that there is no edge between two vertices and a value of 1 shows that there is an edge between two vertices.

$X(t_m)$ to $X(t_{m+1})$ for $m=0,\dots,M$.

1. Sample next event time according to exponential distribution and update time t←t+Δt, where Δt ~ Exponential(ρ).

X is a function of time and the way to sample this graph is to first sample the next event according to the exponential distribution (maximum entropy property). The time of the event is calculated based on time at which first success event is observed in any experiment.

1. Sample directed pair $(i,j)$ uniformly and update $X_{ij}$ according to Gibbs Sampling.

Each pair can have two states – Edge or no edge. Using Gibbs sampling, we find the probability of having a edge on all (N-1) edges. This probability will be an exponential distribution. Taking the logarithm of this result will fetch us the linear form equation:

$P(X_{ij} = 1 | X \setminus X_{ij}) = \frac{P(X =\Delta^{+}_{ij}x)}{P(X =\Delta^{+}_{ij}x) + P(X =\Delta^{-}_{ij}x)}$

$log \frac{P(X_{ij} = 1 | X \setminus X_{ij})}{P(X_{ij} = 0| X \setminus X_{ij})} = \sum_k \theta_k (z_k(\Delta^{+}_{ij}x) -z_k(\Delta^{-}_{ij}x))$

where $\delta^{+}_{ij}z_k(x) = z_k(\Delta^{+}_{ij}x) -z_k(\Delta^{-}_{ij}x$ is called change statistic and  ‘+’ indicates there exists an edge and ‘-‘ indicates no edge between two vertices.

The equilibrium can be achieved using the Gibbs sampling approach of sampling path estimation.

Also, given the aggregate statistics such as number of people taking different courses and the time they spend in offices every day, we can perform the parameter estimation to each of these sufficient statistics, which will be explained below.

### Parameter estimation

Note that we already have a way to sample either static exponential random graph model or a dynamic exponential random graph model. So let us now discuss how we can estimate the parameters of those models. First let us try to sample a dynamic ERGM. This approach is called Geyer-Thompson approach which is based on Importance sampling.

1. Suppose that we have observations about the edges, we would like to sample the parameters in such a way that the observations are matched.
2. Sample $x(1),…,x(M)$ according to $θ~= (θ~1,…,θ~p)$. We sample this dynamic network model by first making $M$ samples of $X$ according to some initial parameters, where each $X$ is a matrix i.e. each X is an exponential random graph. Here the initial parameters are the ‘p’ features or ‘p’ sufficient statistics of $\theta$.
3. Once we have the initial sample, we iteratively improve the parameters according to the Newton’s method (Newton-Raphson method).
4. Now take another sample of x and repeat this process until z is maximized. After we have a good enough estimation of theta then we will repeat this process, normally we will need to take another sample because we have many local minima and maxima.

Details about the 3rd step: Newton’s method is a process to find the roots of functions whose graph crosses or just touches the x-axis. Consider a curve f(x) which is shown below. We would like to find the root of f(x) i.e. find the value of x for which f(x) will be equal to 0. We first assign a value to x say $x = x_0$ which is sufficiently close to the root. We then apply first order approximation of $f(x)$ at x = x We get a new value of x say $x_1$. Now we apply the first order approximation of f(x) at $x = x_1$, to find $x_2$. This procedure continues until you converge to a single value which is the root of the function. This is illustrated in the below example. Note that The Newton-Raphson method is just the multidimensional equivalent of this one dimensional Newton’s method.

Example:

1. We initially consider xo to be 6 and apply first order approximation of f(x) at x = 6. We will get the value of x1 as 3.33.
2. Now we apply first order approximation of f(x) at x = 3.33. We get x2 to be equal to 2.27.
3. Now we apply first order approximation to f(x) at x = 2.27. This will give us the value of x3 to be 2.01.
4. Now we apply first order approximation to f(x) at x = 2.01. This will give us the value of x4 to be 2.00. Now if you continue further you will get the same value of x which is 2.00. You can say that the value of x is converged to 2.00 which is the root of f(x).

Let us understand the intuition behind this method: One of the simplest functions one can deal with is a linear function: $f(x)=mx+b$. In particular, if you want the root of a linear function, it’s quite easily figured: $x=−bm$. Now, it is well-known that the tangent line of a function is the “best” linear approximation of a function in the vicinity of its point of tangency:

The first idea of the Newton-Raphson method is that, since it is easy to find the root of a linear function, we pretend that our complicated function is a line, and then find the root of a line, with the hope that the line’s crossing is an excellent approximation to the root we actually need.

Mathematically, if we have the tangent line of $f(x)$ at $x=a$, where ‘a’ is the “starting point”: $f(x) \approx f(a) + f′(a)(x−a)=0$. This is obtained by the Taylor series of the function $f(x)$ by considering only the first order terms. If we want equation in terms of $x$, then $x=a−\frac{f(a)}{f′(a)}$. Let’s call this $x_1$.

As you can see, the blue point corresponding to the approximation is a bit far off, which brings us to the second idea of Newton-Raphson: go for multiple iterations until you the reach the converged point.

As you can see, the new blue point is much nearer to the red point. Mathematically, this corresponds to finding the root of the new tangent line at $x = x_1$: $x_2 = x_1−\frac{f(x_1)}{f′(x_1)}$

This can be written in the form of a matrix. We can keep playing this game (with $x_2, x_3,\cdots, x_n$), up until the point that we find a value where the quantity $f(x_n)\over f′(x_n)$ is “tiny”. We then say that we have converged to an approximation of the root. That is the essence of Newton-Raphson.

The reason we first take a sample is to be able to estimate the sufficient statistics. This is given by $E_\theta(z(x)) \approx [\sum w^{(i)}* z(x^{(i)})]$ for $i = 1, 2, 3, \cdots,m$

This is how we estimate the sufficient statistics. Here $w(m)$: weight of $x_m$ and θ takes value of $\theta^{(g−1)}$ and

$$\begin{eqnarray*} w^{(m)} &=& \exp\left(\sum_{i=1}^p(\theta_i-\tilde\theta_i)z_i(x^{(m)})\right) \bigg / \sum_{m=1}^M \exp\left(\sum_{i=1}^p(\theta_i-\tilde\theta_i)z_i(x^{(m)})\right)\\ D(\theta) &=& \sum_m w^{(m)}z(x^{(m)})z(x^{(m)})^\mbox{T}-\left[\sum_m w^{(m)}z(x^{(m)})\right]\cdot\left[\sum_m w^{(m)}z(x^{(m)})\right]^\mbox{T}\\ \theta^{(g)} &=& \theta^{(g-1)}-D(\theta^{(g-1)})^{-1}\left[\sum_{m=1}^M w^{(m)}z(x^{(m)})-z(x_\mbox{obs})\right]\end{eqnarray*}$$

Here instead of taking the partial derivative of ‘f’ over ‘x’, what we have done is we have used the difference which approximates to the partial derivative. The equations above correspond to the difference between weighted values of the sufficient statistics which should be close to the estimated value of this one under this new sample. By these equations we find the next theta that is closer to the sufficient statistics that we are given. By iterating this we hope that in the end we will be able to get some theta that makes the expected value the empirical average to be close to this given average of the statistics.

Now let us see another approach for parameter estimation which is called Robins-Monro approach based on Metropolis-Hastings sampling. Given below are the steps followed in this approach.

1. We make a sample of X, and then for the sample, we compute the mean value of the features and the statistics of those features.
2. We iteratively sample the next exponential random graph and estimate the next parameters. We continue this process until it converges.
3. We will proceed with several sub-phases and in each sub-phase we use smaller and smaller steps so that we do not over shoot the actual solution.
4. Finally, we need to check the convergence which is estimated in terms of the deviation. Since standard deviations are pretty hard to estimate due to the lack of parameters, we sample a set of X according to our last estimation of θ, then from the sample we estimate standard deviation of different features and thereby the sufficient statistics.

### Goodness of Fit

After estimation of the parameters and estimation of our sample, it is important to check if they are a good fit to our model. In case of social networks, the properties you require your model to have are very dynamic in nature i.e. today you may require your model to fit certain properties and tomorrow you may require it to fit some other properties. Therefore whenever we have a model, we need to consider the goodness of fit. There are two ways to talk about goodness of fit.

1. One way is to talk about a common statistic. Suppose we have quantile-quantile plot for one distribution v/s another distribution and if these plots are within a narrow band, then we say it is a goodness of fit.
2. Another way to work is that, suppose the statistic has a Gaussian distribution, then we can perform chi-square tests and talk about the Gaussian distribution.

### Statnet package in R

As long as the network is static and not dynamic, all the methods we have discussed so far like sampling, parameter estimation and testing goodness of fit could be implemented using the statnet package in R.

Let us see how we can track or estimate dynamic networks. For example, let us consider the Florentine marriage data set which indicates the marriage relationships among different families. This data set has a set of attributes like wealth, population etc. which describe all the families resulting in different covariance for each of them. Edges are formed between two families if there is marriage relationship between them.

As discussed earlier, it is important for us to understand the marriage network and to sample this network. But the only thing we know are the edges that are formed. Considering the random graph model, the only thing we need to know is the number of edges so that we can sample it and fit the model. After we fit the model, we compute the p-value to check whether our model is statistically significant enough.

One of the ways to improve this model is through triangle inequality. Accordingly, if family A form marriage with family B and family B forms marriage with family C, then according to the principle of triangle inequality, family A should most likely form marriage with family C. But if you see the graph of the network, it is obvious that this principle does not hold true here as there is no triangle closure in case of some of the nodes. So we can proceed with model fitting. The below graph with the corresponding R code show the plot of the network in which the size of the node is proportional the wealth of the corresponding families. We find that the families with similar wealth are more likely to be form edges between them.

<pre>plot(flomarriage, vertex.cex=flomarriage %v% ‘wealth’ / 25, main=’Florentine marriage by wealth’)</pre>

We can include this factor in our model so that now model will be a function of both edges and the covariance of the wealth of the families. If you compare the p-value of the previous model (considering the triangle closure principle) with the p-value of this model, we find that p-value has improved from 0.79 to 0.027. We can infer that although this model is not statistically very significant but still there is some truth behind the principle – “Birds of same feather flock together”.

We can also sample the dynamic network by using the Gibbs sampling algorithm where we sample one edge and fix all other edges. We can simulate this network using a model that involves the density of edges as well as wealth and birds of the feather coefficient into consideration. These graphs are only sample graphs and will be different from the original network. We can see that the edges in the simulated graphs do not coincide with the edges in the original graph. The below graphs correspond to the 10 simulations of Florentine marriage network which are computed using the R code below.

flomodel.wealth.sim <- simulate(flomodel.wealth, nsim=10)
for(I in 1: length(flomodel.wealth.sim)) plot(flomodel.wealth.sim[[i]], vertex.cex=flomarriage %v% 'wealth' / 25, main='Florentine marriage simulaion')

In order to talk about the goodness of fit we can think of different statistics and then we can compare the different statistics in the target network with the different statistics in the original network and we see whether those are correct. The goodness of fit computation in R and the corresponding results are given below.

flomodel.wealth.gof <- gof(flomodel.wealth~degree + esp + distance)
plot(flomodel.wealth.gof)

Here we are fitting the family wealth with the degree of the edges and plus some other things. Note that we have many families. Thus we will have a distribution of the wealth of these families based on our graph. We can compare the distributions with the observed distributions. From these results, we can infer that this model is pretty good in simulating the dynamic network we have.

# Temporal Networks [10/15]

##### Maira Saboia, Venkatesh Yerneni and Vinayak Karuppasamy -October 15, 2015

Social Evolution Network (matrix)

This lecture will be about examples of Temporal Networks. The first network mentioned here uses social evolution data set. The matrix depicted in the Figure 1 was drawn using this data set, which is about to a group of undergraduate students in a small student dormitory at the MIT Campus. Surveys were conducted at different times, almost at the end of every month. The data selected include locations (WiFi access points) and proximity data. Each matrix in the Figure 1 represents a survey on a certain relationship at a certain time. Each column is indexed by somebody who filed out the survey, and the rows correspond to the people asked about.

From the aforementioned matrix we can see that close friends is a subjective decision; socializing with means spending at least two hours per week and so forth. And since that time was the time of presidential election, we also asked about whether people discussed politics. This is a very typical kind of social network. If we look at those matrix it is possible to see clusters. If we look over time, we find out that, for example, some clusters began pretty small and they grow in size and so on and so forth.

Social Evolution Network (Graph) : Friendship Network (Left – 2008.12, Right – 2009.03)

Figure 2 is a second view of this same network. Those are still the undergraduate students from freshman to senior, and there are also several graduate tutors. In this student dormitory there are four floors, and each floor is separated by a firewall in the middle into two parts, and there is passage in the firewall of the third floor. People on both sides of the firewall could communicate with people on other side.

Students organized in terms of the living sectors. The living sectors are represented with different colors. We can see that there are a lot of interactions between the green people and the yellow people. We also have numbers, which represent different schools years. We can see, for example, the freshman go together three months after the semester began. If we look at the network again, after another three months, although the freshman are still there they get mingled into other group of people in this undergraduate dormitory. There are also other changes.

Suppose that we can ideally ask a survey about the relationships every day. Suppose we have this ground truth, then the question that we can ask are: what is the mechanism for people to form friends and for people to dissolve their relationship? Whether or not we can interpolate the snapshots of the dynamic networks using an event trigger model. And whether or not the interpolation of the networks in between the observations grew with what, supposing that we have a ground truth.

Another question is, if we think about it in terms of KL divergence, how well are our estimations of the snapshots of the dynamic networks between our service? We can also ask what are the different graph rates or different weight that we will use? and what are the sufficient statistics that corresponds to the formation of this kind of network?

Product Space Network CA Hidalgo, B Klinger, A-L Barabasi, R Hausmann. “The Product Space Conditions the Development of Nations” Science (2007) 317: 482-487

Another network, which is also very interesting, is the Product Space Network. The origin of this network is the discussion about the rich people club. The idea is that imagine that we have a forest, and in this forest we have many trees, there are high branches and low branches in the trees, and we have monkeys. The idea is that there are monkeys on the high branches and there are monkeys at the low branches. But normally it is pretty hard for monkeys from low branches to climb up to the high branches. High branch monkeys do not want to go down to the low branches either.

In terms of the GDP (Gross Domestic Product) of different nations, and the richness of different nations, we have the same kind of problem. The idea is why some nations are richer than some other nations. This is because the monkeys of the high branches need to give a hand to the monkeys at the low branches to climb up.

We can observe that we have different industrial sectors and the different industrial sectors are related with one another in the sense that, for example, if I manufacture chips then I will have better access with to software engineering. For example, it takes longer for people to grow vegetables to be able to write code. The reason is that there is big gap between the people who grow vegetables to the people who write code.

It is possible to correlate different industrial sectors and then if a nation has import or export of both type of industry, then the distance is closer. If a country either import or export one kind of thing, or another kind of thing, but never both, then distance between those two is longer.

The red lines represent lines that are pretty close, and this not according to the geometric distance. The red lines are pretty close, and the blue lines are a little further and the yellow lines are even further.

For the countries who have already mastered the production of some products, then this country could very easily navigate from those products to the nearby products. But it will take longer time to navigate from those products that are further away.

The Coauthorship network represents a collaboration network. Each node in this graph represents a person, and the nodes are colored according to the person field of study. The chances of collaboration are represented by the distance between the nodes. On observing this graph, we see that people in the same area with similar colors, collaborate with each other. This can be seen in the graph that all nodes with the same color are clustered together, which makes complete sense that people working in the same field tend to collaborate on research ideas.

Coauthorship Network, M. E. J. Newman. “Coauthorship networks and patterns of scientific collaboration”. PNAS 2004 101

However from a new observation of this graph can be seen that there are chances for collaboration between nodes of different colours depending on their distance from each other. For example, we can see from the graph that nodes in statistical physics are closer to nodes in ecology, and ecology nodes are also close to nodes in nodes in agent based modeling. This can be true as people in ecology also do multi-agent modeling. This kind of modelling can give interesting results like, what will the next paper be about? Or what kind of collaborators can we have for a particular paper? These insights can be used to maximize our benefits in terms of game theory and modern perspective.

One of the many things that such networks are used for is trying to predict their evolution overtime, by looking at it snapshot by snapshot. Interpolating the snapshots will give us a prediction about the growth of the network and also explain the parameters of their growth. These networks also give a lot of information about their asymptotic properties, like what would be the properties of the network if it were infinitely large. An interesting property of this network in this regard is that, while keeping the density of the network constant if we keep on increasing the number of edges, we will get a connected network, i.e. the network will consist of only one component, which is a surprising property of this network.

There are many other properties, which is why these networks are important. Reasoning about these properties give useful insights. If we let the size of the network to go to infinity, meaning we are dealing with a network of internet, that is basically one component. Another network of a similar type is the small world network. This network tells us that the degrees of separation between any two people in the world is 6, i.e. any two people in the world are connected with atmost 6 people in between. We have a regular connected network which is like an abstraction. That for example that people are connected with one another through their physical distances. A person has a higher probability of being connected to then people within, for example, 10 miles away from himself, or in a social structure a person is more likely to be connected with another person working in the same organization.

This kind of regularly connected network, or regular graphs of any dimension are just abstractions of such ideas. The way to form this idea is to randomly rewire the edges. Take all edges from one person to another person in a grid. And then randomly rewire this into another node randomly sampled in a whole network. In this manner, given any two nodes, we will use those randomly rewired edges to go very quickly from one place to another, and use the regular edges, the green edges to navigate at the nearby locations, and that is one way to quickly get from one person to another person. The average distance between any two people grows as the logarithm of network size. It is the size of our social network.

The interesting thing is, those kinds of things and those kinds of properties and other properties that where mentioned can be either found out analytically if there is an analytical solution, or can be found through simulation. What we have here is this nice property about average path length, and also in this kind of network we have very large cluster coefficient. Meaning that if A knows B, and B knows C, then A knows C. In this world we have the nodes of forming different clusters.

Erdős–Rényi model

Watts-Strogatz model

Barabasi-Albert model model

What we will talk next is about this “Game Theory” perspective of a dynamic random network. We have a group of people and each person has certain properties, which can be considered as their resources. They might choose to be connected to another person or not connect to another person, or dissolve a previous edge. For each connection there is a cost to maintain this connection. So for each connection there will be a gain, the gain comes from sharing resources and sharing information; there will also be a cost, the cost comes from maintaining the edge. Each person in this network can only know the other people around. The idea is how can different people dynamically choose to connect to other people or not connect to other people over time, and whether or not there will be an equilibrium in the end.

#### Players, Networks, Chains and Components

What we have in this type of network is that we’ll have a set of players which are the nodes, and also we’ll have a set of networks and each network is connected to each other through a chain. This is the formulation of this kind of network:

• Let $N=\{1,…,n\}$ be a set of players
•  Let $\{g:g\subset g^N\}$ be a set of networks, also called graphs, where $g^N=\{ij:i,j \in N,i \neq j\}$ is the complete network and ij is a link
• Let$N(g)={i:\exists j, s.t.\ ij \in g}$ be players involved in at least one link, $n(g)$ be the cardinality of $N(g)$
•  A chain $i_1,i_2,…,in$ in $g$ connecting $i_1$ and in if ${i_1,i_2,…,i_n}\subseteq N(g)$ and ${i_1i_2,i_2i_3,…,i_{n-1}i_n}\subseteq g$
• A nonempty network $g’$ is a component of $g$ if
$i \in N(g’)$ and $j \in N(g’) \Rightarrow a$ connecting $i$ and $j$, and
$(no two players would benefit by forming a new link) • A sequence of adjacent networks$g1=g,g2,…,gK=g’$is an improving path from$g$to$g’$if for any$kgk+1=gk-ij$for some$ij$and$Yi(gk-ij)>Yi(gk)$,(one link is deleted and at least one involved player strictly benefits from deletion) or$gk+1=gk+ij$for some$ij$and$Yi(gk+ij)>Yi(gk)$and$Yj(gk+ij)\geq Yj(gk)$(one link is added and one involved player strictly benefit and the other doesn’t lose) • A set of network$C$is a cycle if there exists an improving path from any$g\in C$to any$g’\in CC$is a maximal cycle if it is not a proper subset of a cycle$C$is a closed cycle if no network$g\in C$leads to an improved network not in$C$. The question is whether or not they will reach one specific absorbing state, meaning that at that state nothing will change. In other words, they will end up with some kind of equilibrium in the sense that the statistics will forget the previous configuration a long time ago and it is something that has a detailed balance. Those are the two different considerations that we have. Similar to what we have coped with in a Markov chain we might end up with a cycle in a sense that we end up with some periodic Markov chain, we go from A to B from B to C and from C to A. Those are the different configurations that we have in terms of network game. One example of this network game (Trading Game) looks like following: Trading Game We have four players, each player can have two goods$(0,1)$or$(1,0)$with probability is$1/2$. The utility of each player is$U(x,y)=x*y$, for x and y amount of different goods (expected utility per player is$1/8$,$1/6$and$3/16$for components$1, 2$and$3$players). The players will tend to either form an edge or dissolve an edge in order to maximize the utilities of themselves. Trade can only happen along the network and the cost of a link is$5/96$per player. We can play this game and we will find out that this network form a cycle, the players will repeatedly try to optimize the utility function, but in the end the greedy choice of the players will just bring the network dynamics in a circular fashion. So this is one example of the network game. Another example of the network game is the Co-author network. In this network each player$i$has$n_i$project. Suppose that each player spends$1/n_i$time on each project equally. Each player can collaborate with another player on exactly one project. It depends on how the player choose to build an edge or to dissolve an edge. When one player has collaboration with another player there is a synergy (utility of collaboration) which is expressed in terms of$u_i(g)=\sum_{j:ij\in g} 1/n_i + 1/n_{j}+1/n_{i}n_j$. The idea of this network is: what is the best way for players to choose the number of projects and to choose the connections of this network in order to maximized the utility or the productivity of different users? One thing that we know is that for any type of network game, either the game will end up with one absorbing state or will end up with a cycle of many absorbing state. Pairwise Stable Network vs. Closed Cycle of Networks Claim: for any$v$and$Y$there exists at least one pairwise stable network or closed cycle of networks. Proof: • Starting network g • Either leads to a pairwise stable network or • Or leads to a cycle C. • We can expand C into a maximal cycle, • Which is closed cycle if no pairwise stable network exists. This one says that for any network valuation function,$v$(the utility of this network), there exists at least one pairwise stable network. Which is just one state, or a closed circle of networks. Players in the network continues to add an edge or remove an edge, and then form a cycle. The idea to show this is actually pretty simple. It starts from a random network and then a player goes one step. Either this state is absorbing state, meaning that nothing will change or this specific network is not absorbing state. So if it not absorbing state we’ll go to another network in this graph. Since there is only a finite number of network configurations in this plot, we will end up with going back to some state. Either the first state or some state in the middle. If we go to some state in the middle, we ultimately get a cycle from the different types of configurations. This is the prove of this one. # Opinion Dynamics, Culture Dynamics & Language Dynamics [10/20] Prepared by Jing Zhang, Bikram Kashyap and Buddhika Laknath Semage This lecture talks about diffusion in social networks. The type of tools that we’re using is simulation and some mathematics. Those are the related literature, the most impactful literature that clipped on diffusion. ### Models and Tools -Claudio Castellano. Santo Fortunato and Vittorio Loreto (2009), Statistical Physics of Social Dynamics, Reviews of Modern Physics 81, 591-646. -Thomas Liggett. [Interacting particle systems](http://catalog.lib.buffalo.edu/vufind/Record/000681209). Vol. 276. Springer Science & Business Media, 2012. -Seth Tisue, and Uri Wilensky. “[Netlogo](https://ccl.northwestern.edu/netlogo/): A simple environment for modeling complexity.” International conference on complex systems. 2004. – N. A. Christakis, & J. H. Fowler (2007). The spread of obesity in a large social network over Centola, D. 2010. The Spread of Behavior in an Online Social Network Experiment. science 329(5996):1194 -Stephen Eubank, et al. “Modeling disease outbreaks in realistic urban social networks.” Nature 429.6988 (2004): 180-184. The first one is a summary of all kinds of models about diffusion. Models about diffusion can explain the formation of language, for example, culture and other type of things. The second one is about the interacting particle systems and the idea is that if we have just one particle and if each particle can take two states, and if we have a large system of many of those not interacting particles, then the system is pretty easy to understand, because it’s just the product dynamics of the individual particles, but if we impose some interaction among those particles, then, the system becomes harder to understand,. Also, because this type of problem maybe very hard, especially if we cope with a smaller system. If we cope up with a larger system, then we could think of what would be the equilibrium behavior and how many equilibrium states there will be. If we let the system evolve from arbitrary initial state, then how many final states there will be and how many attracting states? Those are the attracting states, so given initial state, how can we know what will be the final state of the system up to the evolution? The third one is a simulator. NetLogo is agent-based stimulator. The configuration of NetLogo is a space, a two-world space with grid, so we have cells in our matrix and the cells in the matrix can interact with the neighbors. We can define the neighbors in different ways. They can interact with the neighbors in various ways that can be defined with a program language. Also, we have many practical work on real-world data set and this one, the first one by Christakis and Fowler is perhaps the work that makes people begin to realize that this kind of diffusion behavior actually exists in our real-world network. It is on data set collected about the medical records collected in the New England area in Massachusetts, and it is about more than 30 years of data sets. It’s a data set about patients. Each time a patient visited the clinics, we have a recording about the height, weight, and on the behavior of the patients. Since the patients recommend their relatives and their friends to join the medical network, so we’d have a network, and as we get a huge amount of data then, this is the situation we can talk about science because we have a huge amount of data then. This means that we have either central limit theorem or theorem of mass deviation, so the patterns that we can find may be some pattern that can be generalizable. This is a first, a practical study about the social diffusion. The last study about social diffusion is about the behavior. From the previous studies, we know that the behaviors and opinions will diffuse in real-world networks. One opinion is that the more of my neighbors, the more of my friends take certain opinions, then I will have higher probability of take this opinion. Another competitive idea is that the probability for me to take opinion does not depend on the number of friends that take opinion. It is only based on whether or not I am in the influence of opinions. ### Voter Model –$N=\{1,\dots,n\}$are nodes –$E\subseteq 2^N$are edges –$X_i(0)=\pm 1$for$i\in N$Interate over$t$: Choose j randomly from$\{j: (i,j)\in E\}$with probability$1 \big / \sum_{(i,j)\in E} 1$, and$X_i(t+1)\leftarrow X_j(t)$The first model that we want to talk about is what people call a voter model. A voter model works in the following way. We have a node and we have an edges on those nodes. We could have N times N directed edges, and each edge could exist or not exist. This is why we have 2 to the N number of potential edges. Each node could take one of two states, either +1 or -1, so they are voting. They are either voting for +1 or voting for -1. This is the set-up of our work. We have nodes and we have edges, and the nodes could take either +1 or -1, and the dynamics of this voter model works in the following way: we iterate through time, at any time, we randomly pick a node and after we random pick a note, we randomly pick one of his neighbor, it could be any neighbor, uniformly, and then, we change our opinion to the opinion of our neighbor. These kind of model captures the behavior that a person, the opinions, the randomness spread from one person to another person. #### Modifications: – Adding “zealots” who do not change opinions – 3 States per agent, leftists, centrists and rightists interactions involve only centrists leftists and rightists do not talk to each other. – Introducing memory — Changing opinion only after encouraging opposite opinion$\ge r$times Modifications of these models could be done to capture a more advanced behavior include, for example, we have a certain amount of zealots in a system and those zealots do not change opinions. This could be true, right? The idea is how does the zealot affect the behavior of this network? Another modification could be that we have, instead of two states for each node, we have three states. We could have each agent, each node could be a leftist or a rightist, or a centrist. The leftist could bring a centrist to the left side and the rightist could bring the centrist to the right side, but the leftist never interact with the rightist, and the rightist never act with the leftist. Again, the idea is what will be the dynamics of this model. Those are the ways that have some impact. This means that those ways have been adopted by many persons in their studies. Another way is for us to introduce our memory. This means that we only change, a person in the network only changes his or her opinion after he or she has already heard of this opinion for many times, heard of either +1 or -1. In the example that we mention, it could be that a certain amount of people have already told him about the new opinion. The questions that we want to ask in such kind of models as, for example, whether or not all people in this system will agree on on a certain stand meaning, whether the system will cover still either all +1 or all -1, or something. This is one thing that we could ask. If it does not happen, the next question is whether or not this system will ultimately get to some stationary state after some time, or this system will be evolving randomly or purely adequate. This is the second question that we could ask. The third question that we could ask is that whether this system will become more, will evolve, will form more order or less order will be more chaotic or less chaotic. Those are all very good questions that we could ask. If we think about the book on interacting particle systems, the hardest thing is first, how can we formulate this problem mathematically? After we formulate this program mathematically, how can we solve those problems? Those problems, although they sound pretty easy and intuitive, but they are actually pretty hard. Constantly, the voter model, what we found is that starting from a chaotic state. The system will evolve and it ultimately reaches some state with some stationary, with some order there. This is a voting model written by NetLogo [http://ccl.northwestern.edu/netlogo/models/Voting], and so we can actually conduct some experiments. If we evolve this voter model, will ultimately gets to some stationary state and in different scenarios we’ll get different conclusions, and in some scenarios, we will be able to get a consensus in this system. Another model that we want to talk about is called this bounded confidence model. Again, we have a system, a network of nodes or people. In this network, each people could have opinion, and this opinion is a number randomly distributed between zero and 1. Bounded Confidence Models (http://ccl.northwestern.edu/netlogo/models/community/bc) People only talk to those with similar opinions –$N=\{1,\dots,n\}$are nodes –$E\subseteq 2^N$are edges –$X_i(0) = x_i(0) \in [0,1]$Interate over$t$: Choose j randomly from$\{j: (i,j)\in E\}$if$|x_i(t)-x_j(t)|\ge\epsilon$nothing happens at time$t+1$, otherwise –$X_i(t+1) = X_i(t)+\mu\cdot \left(x_j(t)-x_i(t)\right)$–$X_j(t+1) = X_j(t)+\mu\cdot \left(x_i(t)-x_j(t)\right)$Properties of Bounded Confidence Models Number of clusters is approximately$1 / (2\epsilon)$, because at stationarity, people in different clusters cannot talk to each other This model evolves in the following way. At each time we iterate over time$T$, at each time$T,$we randomly choose a pair of nodes,$I $and$ J,$and if the opinions of$ I$and$J$is greater than some threshold then, it will not influence one another. However, if the opinions of those two people are within some SigmaXL, then they will change their opinion, convert their opinion according to the opinion of other people. This is the bounded confidence model. Again, the question that we want to ask about this bounded confidence model is that what will be the distribution of the opinions ultimately in this system? Those are mathematical problems, but the first is, to start thinking about this model is to first proceed with some stimulation and find out what will be the result. NetLogo User Community Models [http://ccl.northwestern.edu/netlogo/models/community/bc]. At each time step, we randomly select two people among all peoples, and if the opinions of those two people are too far away, then, those people do not change opinions, but if the opinions of those people are close enough, then their opinions will move closer to each other. Then, this means that if the opinions of any two persons are close enough, then, their opinions will converge. So, as time goes on, there will be certain attractors in the system, and those attractors will collect together the nearby opinions. That is why we ultimately get some polarized opinions in this system. Majority Rule Models A group of$r$agents is selected at random, and all agents in the group take the majority opinion.$\Rightarrow$System could have concensus$E(X)=\pm 1$, unstable state and metastable state. [Models based on Social Impact Theory] (http://ccl.northwestern.edu/netlogo/models/community/Sitsim) –$N=\{1,\dots,n\}$are nodes –$E\subseteq 2^N$are edges –$X_i(0)=\pm 1$for$i\in N$Agents change opinions according to the opinions of distance, persuasiveness, and supportiveness of other agents. The other kind of model about the opinions is that, again, here, we have the agents. They are in a network, so they have distances, and the distances could be defined in various ways such as the number of hops, one node to another node, or if the network, the distance is defined in terms of distance, then we can find out a circle and find out the groups within the circle. What we do in this majority role model is that we have a system of people and each people could take one of two opinions, either plus or minus. Each two people have a distance. At each time step, we will random pick a person, and then, randomly pick a radius among this person. This is the group that we want to consider. Then, we change the opinions of all of those persons into the majority of their opinions. This is how this model works. Of course, we can also work with a more complicated model, and the beauty of a simple model is that a simple model will be more generalizable to other situations, and if we have a lot of constraints about a model, then the generalizability of the model is limited. Based on the social impact theory, an agent change opinions according to opinions of the distance such as in the majority role or could also be based on the persuasiveness of the other people, or based on the supportiveness of the other agents. The agent could persuade another agent. An agent could pursue another agent to change opinion or could support the same opinion of another agent. This is the idea. In a social network, we normally have. It’s normally a small world, cluster network. We have clusters and in the clusters, people are cohesively, tightly connected, and between the clusters, we have wear ties. There are also gateway people and the gateway people tends to propagate the ideas from one cluster to another cluster. As social impact theory actually captures this type of dynamics. Another book which I haven’t listed here because it’s just a pop science book. I think it’s called Structural Hole. It’s also a pretty interesting book about the dynamics of social networks and the diffusion of information and ideas within a social network. We have talked about the opinion dynamics. The next topic that I want to talk about is culture dynamics. Culture dynamics is actually not very different from opinion dynamics. The only difference is that in opinion dynamics, each person can have one dimension which is opinion about something. It could be the + or 1, it could be a range, but in culture dynamics, we have a culture is composed of multiple opinions. We could have opinion one to opinion N, and each opinion could take values from a range. It could be the +1, -1 could be within a range. ### Cultural Dynamics – What are the microscopic mechanisms that drive the formation of cultural domains? – What is the ultimate fate of diversity? Is it bound to persist, or all differences will eventually disappear? – What is the role of social network structure? In cultural dynamics models, we examine the evolution interaction of different cultural domains and groups and try to reason their eventual fate. Whether the cultures will combine, or if they will consolidate. What factors are responsible. What are the roles of social network structure? Axelrod Model [http://ccl.northwestern.edu/netlogo/models/community/Dissemination%20of%20Culture] Individuals in a network have –$F$features$\left(\sigma_1,\sigma_2,\dots,\sigma_F\right)$, –$q$traits per feature$\sigma_f=0,1,\dots,q$for$f=1,\dots,F$At each step #. an individual$i$and its neighbour$j$are selected, #. their similarity is estimated$w_{i,j}={1\over F}\sum_{f=1}^F 1_{\sigma_f(i),\sigma_f(j)}$social influence: tendency for individuals to become more similar when they interact #. with probability$w_{i,j}$,$\sigma_f(j)\leftarrow\sigma_f(i)$for$f$s.t. ($\sigma_f(i)\ne\sigma_f(j)$) homophily: tendency for the likes to interact more Axelrod, R., 1997, J. Conflict Resolut. 41(2), 203. Properties of Axelrod Model starting from disordered initial condition, the system will evolved into – the ordered state, in which all individuals share the same traits for all features, or – one of frozen states, with coexistence of different cultural regions. – System will more likely to evolve into many cultural regions if each feature has more traits. At each step, an individual i and its neighbor j are selected, and their similarity is calculated using$w_{i,j}={1\over F}\sum_{f=1}^F 1_{\sigma_f(i),\sigma_f(j)}$. That is we are calculating the similarities based on agreement of the features. If feature f of individuals i and j agrees, then their similarity weight increases, and if the feature f is different, then it has no contribution. Now, with probability wi,j, we change the features of individual j by the features of individual i. Set$\sigma_f(i)=\sigma_f(j)$, with probability$w_{i,j}$for all f such that$\sigma_f(i)\ne\sigma_f(j)$. Repeated application of this algorithm can be used to simulate the evolution of culture groups. Starting with a random distribution of features and the traits, we can run the algorithm and end up with three different kind of evolution: ● All converge into one culture, with same features. ● There will different cultural regions coexisting, in a frozen state. There will be clusters of features for the different cultural regions. ● If each feature has a lot of traits, convergence becomes hard. So we will end up with lot of small clusters, that is, cultural regions. If we try to use tweets to simulate the system, with a small number of tweets per culture, the system will consolidate the people in the culture. With a high number of tweets, this becomes much harder. In a system of people, we have many features, and many tweets for each features, that is, the tweets can be considered traits. With high count of tweets, we end up in the third situation as described above. It is easy to simulate the system programmatically. But getting an mathematical solution is hard. There are entire groups of people working on finding mathematical answers. ### Flock Dynamics Each particle$i$is characterized by its – position$x_i$, – direction$\theta$, and – constant modulus of speed$v_0$At each time step,$\theta_i(t+\Delta t)=\theta(t) _{S(i)}+\xi$, where$\xi\sim U(-\eta,\eta)$particle$i$assumes average direction of particles nearby, with some random perturbation A. Czirok, A.-L. Barabasi, and T. Vicsek, 1999, Phys. Rev. Lett. 82(1), 209 This kind on dynamics model is based on movement of birds and other animals like fish that moves in large groups. What we observe is birds moving in flocks, where a huge group moves together. Each particle/member has 3 properties: ● Position$x_i$● DIrection$\theta$● Constant speed modulus$v_0$At each time step, each particle assumes average direction of nearby particles, with some random deviation. With the passage of time, it is quite obvious that we will end up with clusters of particles moving in the same direction. Also if the particles are too fast, and the neighbourhood radius is small, then the interaction between particles will be very weak, and clusters will not form. If the particles are fast, and the neighbourhood is not to small, we can end up with just a single large cluster. NetLogo has a simulator where we can play with different parameters and simulate the system. The flock model might be too complex to be analytically analyzed, and there may not be any solid work on the analytical solution of the ultimate behaviour of the flock. Flock Dynamics Simulation http://ccl.northwestern.edu/netlogo/models/Flocking Properties of Flock Dynamics Order parameter$\Phi={1\over N}\left|\sum_{j=1}^N \vec v_j\right|$varies with noise level$\eta$:$\Phi(\eta)=\begin{cases}
\left[\frac{\eta_{c}(\rho)-\eta}{\eta_{c}(\rho)}\right]^{\beta} & \eta\eta_{c}(\rho)
\end{cases}$where –$\rho$is the density of particles –$\eta_c$is threshold noise for system to evolve into an ordered state The order parameter$\Phi$varies with the noise$\eta$. If the noise level is over a threshold noise level$\eta_c$, then we will have chaotic behaviour, and the flocks will not form. If the noise is less than the critical noise level, then we will end up with consolidated behaviour. The lower the noise, the more coordinated the members in the flock will be. ### Language Dynamics – Processes explaining the emergence, change, evolution, interactions and extinction of languages Sociobiological approach: Successful communicators, enjoying a selective advantage, are more likely to reproduce than worse communicators. Sociocultural approach: Agents can select better strategies exploiting cultural choice and direct feedback from the system. Language Dynamics model is an attempt at exploring behaviors of human languages. With this model it’s possible to simulate events such as formation of languages, evolution and the outcome. There are two schools of thoughts when it comes to language dynamics. 1. Sociobiological approach This approach takes the position that successful communicators enjoy a selective advantage because they can better communicate with each other and they can better coordinate their behavior. It’s based on the hypothesis that social behavior has resulted from evolution and attempts to explain and examine social behavior within that context [1][3]. Applying the phenomena of natural selection in evolution theory, successful communicators are more likely to reproduce than their less successful communicators and become the more prominent trait. 2. Sociocultural approach This approach is based on the concept that human activities take place in cultural contexts, are mediated by language and other symbol systems, and can be best understood when investigated in their historical development [2]. Because of the interdependence of social and individual processes, individual agents can set up better strategies by exploiting culture choice and feedback from the system. #### Naming Game There is a vocabulary of words. Initially all agents in a network have empty memory. Iterate over the following steps until convergence. 1. A pair of neighbor agents are chosen, one as speaker and the other as listener. 2. If the speaker has empty memory, he will generate a new word from the vocabulary. Speaker randomly selects a word from his memory and tell it to the listener. 3. If the listener has the word, he will remove all other words and keep only this one. Otherwise, the listener will remember this word. Baronchelli, A., Felici, M., Loreto, V., Caglioti, E. & Steels, L. Sharp transition towards shared vocabularies in multi-agent systems. J. Stat. Mech. 2006, P06014 (2006). Naming game is a quite well-known interpretation of the language dynamics. It’s a model able to account for the emergence of a shared vocabulary of form-meaning associations through social/cultural learning [4]. It works in the following way. There’s a vocabulary of words, a network and nodes, which are the agents. Initially, the agents have an empty memory. Whenever they encounter something, they will name them and will communicate the name with other agents. At each time, we will randomly choose a pair of neighboring agents as the pair of agents who needs to communicate with each other. We specify one agent as a speaker and another agent as a listener. If the speaker has an empty memory, he would generate a new word from the vocabulary. And if the listener has the word in his vocabulary, then the speaker and the listener have both consolidated the word. The interaction is considered a success, and both agents delete from their inventories all their names except for the agreed name. But if the name chosen by the speaker is not in the memory of the listener, he will think of the word as a new word, and will add the word to his memory (ie: remember the word). Properties of Naming Game Starting from random choice of words by networked individuals, global consensus will eventually emerge. –$N_w(t)$, total number of words in the population –$N_d(t)$, total number of different words in the population –$S(t)$, success rate Now that we have established the rules of the game, let’s look at the dynamics. What has been observed in this model is that during the exploitation stage, the total number of active words in the system will first increase. Then it will decrease and the total number of different words will also decrease. But the successful rate of this system will gradually increase. #### Evolution Language Game –$n$objects –$m$words production matrix$p_{i,j}$probability for speaker to use word$j$when seeing object$i$comprehension matrix$q_{i,j}$probability for lisener to associate word$j$to object$i$language$L=(P,Q)$fitness$F(L_1,L_2)={1\over 2}\sum_{i,j} p_{i,j}^{(1)}q_{j,i}^{(2)}+p_{i,j}^{(2)}q_{j,i}^{(1)}$associate matrix$a_{i,j}$number of times parent associate object$i$with word$j$–$p_{i,j}=a_{i,j}/\sum_l a_{i,l}$–$q_{i,j}=a_{i,j}/\sum_l a_{l,j}$In order for a set of agents in a network to form a language, the set of agents should choose some strategy to be able to think in an understandable way through the above naming game. This is the idea behind evolution of language game. There are N objects and M words. We want to use the M words to represent the N objects. There are two matrices. 1. Production matrix – This is at the speaker’s side. Each row represents an object and each column represent a word. Whenever we see an object, we have a distribution of the words, which represents the likelihood of the speaker to choose certain words. An entry in this matrix –$p_{i,j}$¬ represents the probability for speaker to use word j when explaining the object i. 2. Comprehension matrix – This is at the listener’s side. Whenever a listener gets a word, he will map the word to a set of objects in the system. An entry in the matrix –$q_{i,j}$represents the probability for the listener to associate word j to object i. The fitness of this system will be represented as the probability for a pair of listener and the speaker to understand one another in both ways.$F(L_1,L_2) = \frac{1}{2}\Sigma_{i,j} p_{i,q}^{(1)}q_{j,i}^{(2)} + p_{i,q}^{(2)}q_{j,i}^{(1)}\$

The first term is the average success rate for person J to understand person I. The second term is the average success rate for person I to understand person J. The combination of two terms gives the fitness of this system.

This system evolves in a way that the parents will have kids in a genetic model and the kids will take the production matrix or the comprehension matrix from the parents with some random permutation. In the end, our goal is to evolve this language so that the rate — the fitness of this system increases.

### References

[1] https://en.wikipedia.org/wiki/Sociobiology
[2] http://webpages.charter.net/schmolze1/vygotsky/johnsteiner.html
[3] http://plato.stanford.edu/entries/sociobiology/
[4] http://openaccess.city.ac.uk/2659/1/In-depth%20analysis%20of%20the%20Naming%20Game%20dynamics.pdf