Prepared by Hao Fan, Manu Kothegal Shivakumar and Pritika Mehta

In this lecture, we will talk specifically about how we can combine agent-based modeling on the one hand and vehicle tracking data on the other to make dynamic inferences about the states of the roads.

Our resources for understanding cities include not only papers and books but, more importantly, models, computational tools, and visualization tools. The following are open source simulation packages:

- MATSim
- TRANSIMS (Transportation Analysis Simulation System)
- DynaMIT & MITSIMLab: DynaMIT and MITSIMLab were developed by the civil engineering department at MIT with support from the Department of Transportation.
- UrbanSim: UrbanSim simulates the migration of people within a city and the interactions among different industries and people. It is quite useful for people who work on cities and on large-scale social dynamics.
- SimCity open source: SimCity is a famous computer game with several versions. The first version was released as open source, and there is also a Java version.

When we work with real-world cities, we face two major problems. The first is how to combine these tools to make real-time predictions about what will happen. The second is how to bridge the gap between the data we can collect and the data we would actually find useful, and how to justify that this bridge leads to convincing conclusions about our cities. In such cases, we need visualization tools to help display our models, for example:

- OpenLayers: OpenLayers is a JavaScript library that fetches map tiles from a map server and displays them in the browser.
- MapServer: MapServer reads in geographical features, such as roads, rivers, and the boundaries of countries or administrative areas, on the server side and then serves them as map tiles.
- Google Analytics: Google Analytics can generate many types of dynamic charts about many different things.
- Google Maps API

After we have simulation and visualization tools, the next step is to find useful data. Below are some data resources that can help us understand cities.

**Census Data**

- Demographic & Health Surveys: The Demographic and Health Surveys contain datasets, created by people from the US, about many different countries. They include the latitudes and longitudes of the surveyed villages as well as survey results such as annual income, average education, birth rates, death rates, and more.
- US Census Bureau: The US Census Bureau provides comprehensive US census data, including road segments, household statistics, travel patterns, and so on.
- Senegal: National Agency of Statistics and Demography (ANSD): ANSD is part of the Data for Development collaboration and provides additional information about epidemics, employment, and many other things.
- Open government initiative

**Street Map & Yellow Pages**

- OpenStreetMap: OpenStreetMap data is especially good in the US and Europe, and it is also good in many other parts of the world.
- Yellow Pages

**Call Detail Records**

The Data for Development (D4D) datasets for Ivory Coast and Senegal capture the flow of people and the interactions between populations in these countries. They can also be used to track the spread of epidemic diseases at the country scale or to make economic predictions.

**Vehicle Tracking**

- Niagara Frontier Transportation Authority Developer Tools
- MTA Bus Time: Many city governments have developed web APIs that let people extract the locations of individual vehicles in their public transportation fleets. Niagara Frontier Transportation Authority Developer Tools and MTA Bus Time are the transit APIs for Buffalo and New York City, respectively.
- NYCT&L Taxi Data: NYCT&L Taxi Data contains the latitudes and longitudes of all yellow cabs in Manhattan over more than three years, at a temporal resolution of one sample per minute. It can also help us determine whether there is congestion, an accident, or something else happening in the city.
- Cabspotting: Cabspotting is a dataset that tracks taxis in San Francisco.
- mPat sample data
- Mobile Millennium

Mobile Millennium is a dataset about traffic dynamics in the San Francisco Bay Area, including San Francisco, San Jose, Berkeley, and many other cities.
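Per-minute GPS fixes like those in the taxi data can be turned into speeds, which is one simple way to spot slow traffic. The Python sketch below assumes each trace is a list of (latitude, longitude) pairs taken exactly 60 seconds apart and uses the haversine great-circle distance; the function names and the 10 km/h threshold are illustrative assumptions, not part of any of the datasets above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_kmh(samples, interval_s=60.0):
    """Speed over each interval between consecutive GPS fixes `interval_s` seconds apart."""
    return [haversine_m(*p, *q) / interval_s * 3.6 for p, q in zip(samples, samples[1:])]


def slow_intervals(samples, threshold_kmh=10.0):
    """Indices of the intervals in which the cab moved slower than the threshold."""
    return [i for i, v in enumerate(speeds_kmh(samples)) if v < threshold_kmh]


trace = [(40.75, -73.99), (40.75, -73.99), (40.76, -73.99)]  # made-up Manhattan fixes
print(slow_intervals(trace))
```

A slow interval can also mean the cab is parked or waiting for a passenger, so a single cab's trace is only weak evidence of congestion on its own.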

The paper “A survey of results on mobile phone datasets analysis” by Vincent D. Blondel, Adeline Decuyper, and Gautier Krings reviews recent advances in the study of mobile phone datasets.

**Models of the behavior captured by mobile phone data sets**

- Diffusion of information and diseases
- Trajectories as Lévy flights, exploration vs. preferential return, a small number of habitual locations, predictability of the next location, amount of commuting between two counties, neighborhood signatures revealed by calling patterns, movement and geographic propagation of information in response to emergencies
- Probability of making a phone call, duration of a phone call vs. distance between its two ends, social/ethnic/historical partitioning revealed by phone calls, economic development vs. diversity
- Duration of phone contact, burstiness of inter-call intervals
- Power-law degree distribution, clustering/small world (maximizing utility), symmetry, community detection / dynamics of small and large communities, inferring gender and age from call patterns

**Applications of mobile phone data sets**

- Urban sensing (O/D (origin-destination) matrix, flow of people, classifying trips)
- Tracking diseases
- Viral marketing

In transportation modeling, we want to treat a transportation simulator as an event-driven model and assign probabilities to the different sample paths of this model. Following this idea, if we regard the data we collect as observations of the system, we can think of the simulator as a hidden Markov process (albeit a complicated one) and infer what is happening in this hidden Markov process from the observations.

To do the simulation and inference, we need several inputs:

- **Network file: information about roads.** The road network really is a network. The nodes represent road intersections, and the edges are either road segments or locations (such as buildings). Road segments run from one intersection to another, and at each intersection a vehicle has to decide which branch to take.
- **Plans/population file: trips of all individuals between home, work, school, etc. by car, walk, bus, etc.** What actually determines the dynamics is the plans of the individuals: their locations, the places they go to on a typical day, the characteristics of their trips, and the times at which they plan those trips. The plans determine the dynamics because from the plans and the population we can derive the state transition matrix.
- **Facility file: open time, close time and capacity of buildings.**
- **Configuration file: choice to simulate traffic on a road, replan trips, score trips, generate output, etc.**
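As a rough picture of what these four inputs contain, here is a Python sketch of them as plain data classes. The class and field names are hypothetical and do not follow MATSim's actual file formats; the helper `branches_at` lists the road segments a vehicle can choose from when it reaches an intersection.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Network file: a road segment between two intersections (nodes)."""
    link_id: str
    from_node: str
    to_node: str
    length_m: float
    free_speed_mps: float
    capacity: int


@dataclass
class Activity:
    """One stop in a plan, e.g. home, work or school."""
    kind: str
    link_id: str        # road segment where the activity takes place
    end_time_s: float   # when the person plans to leave for the next trip


@dataclass
class Plan:
    """Plans/population file: one person's day."""
    person_id: str
    activities: list[Activity]
    modes: list[str]    # one mode ("car", "walk", "bus", ...) per trip


@dataclass
class Facility:
    """Facility file: opening hours and capacity of a building."""
    facility_id: str
    open_s: float
    close_s: float
    capacity: int


@dataclass
class Config:
    """Configuration file: which simulation steps to run."""
    simulate_traffic: bool = True
    replan_trips: bool = True
    score_trips: bool = True
    output_dir: str = "output"


def branches_at(links: list[Link], node: str) -> list[str]:
    """Road segments a vehicle can choose from when it reaches `node`."""
    return [link.link_id for link in links if link.from_node == node]


links = [
    Link("a", "n1", "n2", 500.0, 13.9, 60),
    Link("b", "n2", "n3", 300.0, 8.3, 40),
    Link("c", "n2", "n4", 700.0, 22.2, 90),
]
plan = Plan("p1", [Activity("home", "a", 8 * 3600), Activity("work", "c", 17 * 3600)], ["car"])
print(branches_at(links, "n2"))  # ['b', 'c']
```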

The paper “A large-scale agent-based traffic microsimulation based on queue model” by Nurhan Cetin, Adrian Burri, and Kai Nagel provides a specific case study of how events are scheduled, how they are registered against the timer, and how we update the states of the road segments and of the agents in response to individual events.

- Each road has a length, a free speed, and a capacity.
- If the number of cars ahead is less than the capacity, a car can move to the next segment at free speed.
- Otherwise, the car waits in a queue until the cars ahead have moved out.
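The three rules above can be sketched as a tiny time-stepped simulator in Python. It is a simplification under stated assumptions, not the paper's implementation: each road is a first-in-first-out queue, a car must spend at least the free-flow travel time (length / free speed) on a road, and the car at the head of a queue moves on only when the next road on its route holds fewer cars than its capacity; cars behind a blocked car wait too. The paper's queue model also limits how many cars can leave a road per unit time (flow capacity), which is omitted here. `Road` and `simulate` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Road:
    length: float      # metres
    free_speed: float  # metres per second
    capacity: int      # maximum number of cars on the road at once

    def travel_time(self) -> float:
        return self.length / self.free_speed


def simulate(roads, routes, t_end, dt=1.0):
    """Return the time each car leaves the network (None if still driving)."""
    # One FIFO queue per road; entries are (car, index on route, earliest exit time).
    queues = {name: deque() for name in roads}
    for car, route in routes.items():
        queues[route[0]].append((car, 0, roads[route[0]].travel_time()))
    finished = {car: None for car in routes}
    t = 0.0
    while t <= t_end:
        for queue in queues.values():
            # Only the head of a queue may leave; everyone behind it waits.
            while queue and queue[0][2] <= t:
                car, hop, _ = queue[0]
                route = routes[car]
                if hop + 1 == len(route):          # last road: leave the network
                    queue.popleft()
                    finished[car] = t
                    continue
                nxt = route[hop + 1]
                if len(queues[nxt]) >= roads[nxt].capacity:
                    break                          # next road is full: wait in the queue
                queue.popleft()
                queues[nxt].append((car, hop + 1, t + roads[nxt].travel_time()))
        t += dt
    return finished


roads = {"A": Road(100.0, 10.0, 5), "B": Road(50.0, 10.0, 1)}
routes = {"car1": ["A", "B"], "car2": ["A", "B"]}
print(simulate(roads, routes, t_end=30))  # {'car1': 15.0, 'car2': 21.0}
```

Road B holds only one car, so `car2` reaches the end of A at t = 10 but has to queue there until `car1` leaves B at t = 15.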

Another paper, “Generating complete all-day activity plans with genetic algorithms” by David Charypar and Kai Nagel, provides a dynamic model of how different people choose different trips every day:

- Change trip starting time
- Change route
- Change mode (car, bus, walk)
- Change destination

In this model, people change their routes, travel modes, and destinations, and the road segments respond to the same demand in different ways at different times, depending on the weather and many other factors.
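The four replanning moves listed above can be sketched as random mutations of a single trip. This is only the mutation step, assuming a simplified trip record; it is not Charypar and Nagel's genetic algorithm, which also scores plans, selects among them, and recombines them. The `Trip` fields, the choice sets, and the `mutate` function are hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Trip:
    start_s: int              # departure time, seconds after midnight
    route: tuple[str, ...]    # sequence of road-segment ids
    mode: str                 # "car", "bus" or "walk"
    destination: str


# Hypothetical choice sets; a real model would derive them from the network,
# the facility file and a router, and routes would depend on the destination.
MODES = ["car", "bus", "walk"]
ROUTES = [("a", "b"), ("a", "c", "d"), ("e", "d")]
DESTINATIONS = ["work", "school", "shop"]
SHIFTS_S = [-1800, -900, 900, 1800]  # candidate departure-time shifts


def mutate(trip: Trip, rng: random.Random) -> Trip:
    """Return a copy of `trip` with exactly one of the four replanning moves applied."""
    move = rng.choice(["time", "route", "mode", "destination"])
    if move == "time":
        return replace(trip, start_s=trip.start_s + rng.choice(SHIFTS_S))
    if move == "route":
        return replace(trip, route=rng.choice([r for r in ROUTES if r != trip.route]))
    if move == "mode":
        return replace(trip, mode=rng.choice([m for m in MODES if m != trip.mode]))
    return replace(trip, destination=rng.choice(
        [d for d in DESTINATIONS if d != trip.destination]))


rng = random.Random(42)
trip = Trip(8 * 3600, ("a", "b"), "car", "work")
for _ in range(3):
    trip = mutate(trip, rng)
    print(trip)
```

Freezing the dataclass and using `dataclasses.replace` keeps the original plan intact, which matters when several mutated variants of the same plan are kept and scored side by side.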

In an event-driven model, we define a set of events, each of which changes the state of the system in a specific way. Given a sample path of events with their corresponding times, we can compute its probability.

$p\circ l_1\to p\circ l_2$

This represents one event in which person $p$ moves from location $l_1$ to another location $l_2$.

The state of the system is defined by the location of each person at time $t$ and the number of vehicles on each road segment at time $t$:

$X^{(p)}_t$ is the location of person $p$ at time $t$.

$X^{(l)}_t$ is the number of vehicles on road segment $l$ at time $t$.
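To make the event-driven view concrete, here is a minimal Python sketch that scores one discrete-time sample path. It assumes (as a simplification, not the lecture's code) that time is split into small steps of length $\epsilon$, that at most one event happens per step, that person $p$ at $l_1$ moves to $l_2$ with a fixed per-step probability $p_{l_1,l_2}$, and that "no event" has probability one minus the sum over all currently enabled events, mirroring the $\delta(x_{t-1}^{(p)}=l_1)$ and $1-\sum$ factors in the posterior formulas below. The function name `path_probability` is hypothetical.

```python
def path_probability(start, p_move, path):
    """Probability of a discrete-time sample path of events.

    start:  dict person -> initial location
    p_move: dict (l1, l2) -> per-step probability p_{l1,l2} that a person at
            l1 moves to l2 (assumed small enough that the sum over all enabled
            events stays below 1)
    path:   one entry per time step, either None (no event) or a tuple
            (person, l1, l2) for the event "person goes from l1 to l2"
    """
    loc = dict(start)
    prob = 1.0
    for step in path:
        if step is None:
            # No event: 1 minus the probability of every event enabled right now.
            enabled = sum(p for here in loc.values()
                          for (l1, _), p in p_move.items() if l1 == here)
            prob *= 1.0 - enabled
        else:
            person, l1, l2 = step
            if loc[person] != l1:
                return 0.0  # event not enabled: delta(x_{t-1}^{(p)} = l1) is 0
            prob *= p_move.get((l1, l2), 0.0)
            loc[person] = l2  # the event changes the state of the system
    return prob


start = {"p1": "home", "p2": "home"}
p_move = {("home", "road"): 0.1, ("road", "work"): 0.2}
path = [None, ("p1", "home", "road"), ("p1", "road", "work")]
print(path_probability(start, p_move, path))  # 0.8 * 0.1 * 0.2, about 0.016
```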

If a road segment has many vehicles, a vehicle on it will stay there longer, and a driver approaching it may decide to take one branch instead of another. So the state transition matrix is actually a function of the state of the system. Nevertheless, we can describe the system one event at a time: each event determines the next state, and from the sequence of states we can obtain the probability measure over different sample paths.
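The dependence of transitions on the state can be illustrated with a small Python sketch of branch choice at an intersection. Both ingredients are assumptions swapped in for illustration, not part of the lecture's model: travel time on each branch follows the standard BPR (Bureau of Public Roads) link performance function $t = t_0\,(1 + 0.15\,(v/c)^4)$, and drivers pick a branch with multinomial-logit probabilities using a hypothetical sensitivity `theta`.

```python
import math


def bpr_travel_time(t_free, volume, capacity, alpha=0.15, beta=4):
    """BPR link performance function: travel time grows with volume/capacity."""
    return t_free * (1 + alpha * (volume / capacity) ** beta)


def branch_probabilities(branches, theta=0.1):
    """Logit choice among branches given the current state of each branch.

    branches: dict name -> (free-flow travel time [s], vehicles on it now, capacity)
    theta:    hypothetical sensitivity of drivers to travel time [1/s]
    """
    times = {b: bpr_travel_time(*state) for b, state in branches.items()}
    t_min = min(times.values())  # subtract the minimum for numerical stability
    weights = {b: math.exp(-theta * (t - t_min)) for b, t in times.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


# Same free-flow time, but the right branch currently holds more cars than its capacity.
probs = branch_probabilities({"left": (60.0, 10, 50), "right": (60.0, 60, 50)})
print(probs)
```

Because these probabilities change whenever the vehicle counts change, the transition probabilities of the whole system depend on its current state.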

One way to calculate these probabilities is the mean field approximation, a projection-based method. Based on the model defined above, we can find the probability of a transition from location $l_1$ to location $l_2$, conditioned on our observations.

The posterior probability of the event $pl_1\to pl_2$ and the latent states $x_{t-1,t}^{(l)}$, $x_{t-1,t}^{(p)}$, conditioned on the observations, is

\begin{align*}
& P(pl_{1}\to pl_{2},\{x_{t-1,t}^{(l)}:l\},\{x_{t-1,t}^{(p)}:p\}|\mbox{obs})\\
& \propto\prod_{l}\alpha_{t-1}^{(l)}(x_{t-1}^{(l)})\cdot\prod_{p'}\alpha_{t-1}^{(p')}(x_{t-1}^{(p')})\\
& \cdot p_{l_{1},l_{2}}\cdot\delta(x_{t-1}^{(p)}=l_{1})\\
& \cdot\beta_{t}^{(l_{1})}(x_{t}^{(l_{1})})\delta(x_{t}^{(l_{1})}=x_{t-1}^{(l_{1})}-1)P(\mbox{obs}_{t}^{(l_{1})}|x_{t}^{(l_{1})})\\
& \cdot\beta_{t}^{(l_{2})}(x_{t}^{(l_{2})})\delta(x_{t}^{(l_{2})}=x_{t-1}^{(l_{2})}+1)P(\mbox{obs}_{t}^{(l_{2})}|x_{t}^{(l_{2})})\\
& \cdot\prod_{l\ne l_{1},l_{2}}\beta_{t}^{(l)}(x_{t}^{(l)})\delta(x_{t}^{(l)}=x_{t-1}^{(l)})P(\mbox{obs}_{t}^{(l)}|x_{t}^{(l)})\\
& \cdot\beta_{t}^{(p)}(x_{t}^{(p)})\delta(x_{t}^{(p)}=l_{2})P(\mbox{obs}_{t}^{(p)}|x_{t}^{(p)})\\
& \cdot\prod_{p'\ne p}\beta_{t}^{(p')}(x_{t}^{(p')})\delta(x_{t}^{(p')}=x_{t-1}^{(p')})P(\mbox{obs}_{t}^{(p')}|x_{t}^{(p')}).
\end{align*}

Similarly, we can compute the probability that no event occurs during a short time interval $\epsilon$.

The posterior probability of no event and the latent states $x_{t-1,t}^{(l)}$, $x_{t-1,t}^{(p)}$, conditioned on the observations, is

\begin{align*}
& P(\emptyset,\{x_{t-1,t}^{(l)}:l\},\{x_{t-1,t}^{(p)}:p\}|\mbox{obs})\\
& \propto\prod_{l}\alpha_{t-1}^{(l)}(x_{t-1}^{(l)})\cdot\prod_{p'}\alpha_{t-1}^{(p')}(x_{t-1}^{(p')})\\
& \cdot\left(1-\sum_{p':\mbox{person}}\sum_{\mbox{event }p'l_{1}\to p'l_{2}}p_{l_{1},l_{2}}\cdot\delta(x_{t-1}^{(p')}=l_{1})\right)\\
& \cdot\prod_{l}\beta_{t}^{(l)}(x_{t}^{(l)})\delta(x_{t}^{(l)}=x_{t-1}^{(l)})P(\mbox{obs}_{t}^{(l)}|x_{t}^{(l)})\\
& \cdot\prod_{p'}\beta_{t}^{(p')}(x_{t}^{(p')})\delta(x_{t}^{(p')}=x_{t-1}^{(p')})P(\mbox{obs}_{t}^{(p')}|x_{t}^{(p')}).
\end{align*}
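Because every factor in the event and no-event posteriors factorizes over roads and persons, summing them over all latent configurations reduces to per-component sums. The Python sketch below uses this to turn the unnormalized weights into a posterior distribution over "which event, if any, happened between $t-1$ and $t$". It assumes the forward messages $\alpha_{t-1}$, backward messages $\beta_t$ and observation likelihoods are already available as small dictionaries over each component's states (computing them is not shown), that locations $l_1, l_2$ are road segments, and that road and person ids are distinct. The function `event_posterior` and its argument layout are hypothetical.

```python
import math


def event_posterior(alpha, beta, lik, roads, persons, p_move):
    """Posterior over 'no event' (key None) and every event (p, l1, l2).

    alpha[c][x]: alpha_{t-1}^{(c)}(x); beta[c][x]: beta_t^{(c)}(x);
    lik[c][x]:   P(obs_t^{(c)} | x) for each component c in roads + persons.
    Road states are vehicle counts; person states are road ids.
    p_move[(l1, l2)] = p_{l1,l2}, with l1 != l2.
    """
    comps = list(roads) + list(persons)

    def keep(c, x):  # component c stays in state x from t-1 to t
        return alpha[c].get(x, 0.0) * beta[c].get(x, 0.0) * lik[c].get(x, 0.0)

    def shift(c, d):  # road c's count changes by d: delta(x_t = x_{t-1} + d)
        return sum(a * beta[c].get(x + d, 0.0) * lik[c].get(x + d, 0.0)
                   for x, a in alpha[c].items())

    Z = {c: sum(keep(c, x) for x in alpha[c]) for c in comps}

    def rest(*excluded):  # product of Z over components the term does not touch
        return math.prod(Z[c] for c in comps if c not in excluded)

    weights = {None: rest()}  # the "1" inside (1 - sum ...)
    for p in persons:
        for (l1, l2), rate in p_move.items():
            # subtract the enabled-event term of the no-event formula
            weights[None] -= rate * keep(p, l1) * rest(p)
            # event formula: person p moves l1 -> l2, road l1 loses a car, l2 gains one
            weights[(p, l1, l2)] = (rate * alpha[p].get(l1, 0.0)
                                    * beta[p].get(l2, 0.0) * lik[p].get(l2, 0.0)
                                    * shift(l1, -1) * shift(l2, +1)
                                    * rest(p, l1, l2))
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


roads, persons = ["A", "B"], ["p"]
ones = {0: 1.0, 1: 1.0, 2: 1.0}
alpha = {"A": {0: 0.0, 1: 1.0, 2: 0.0},   # one car on A at t-1
         "B": {0: 1.0, 1: 0.0, 2: 0.0},   # B empty at t-1
         "p": {"A": 1.0, "B": 0.0}}       # person p is on A
beta = {"A": ones, "B": ones, "p": {"A": 1.0, "B": 1.0}}  # no future evidence
lik = {"A": ones, "B": {0: 0.1, 1: 0.9, 2: 0.1},          # sensor: B now has 1 car
       "p": {"A": 1.0, "B": 1.0}}
post = event_posterior(alpha, beta, lik, roads, persons,
                       {("A", "B"): 0.2, ("B", "A"): 0.2})
print(post)  # move A -> B gets 0.18 / 0.26, "no event" gets 0.08 / 0.26
```

At city scale the products of per-component sums would underflow, so a practical version would work with log-weights.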

The state space is combinatorial, and it is large. For example, a model of San Francisco typically has about sixty thousand road segments, and for the simulation to be realistic we need about three hundred thousand vehicles, which is approximately 1% of the vehicles running in San Francisco over a whole day. Because the state space is so large, we have to use either Markov chain Monte Carlo or variational methods.
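As a rough illustration of the size, assume (a simplification not made in the lecture) that each of the $V = 300{,}000$ vehicles can be on any of the $L = 60{,}000$ road segments independently. Counting only vehicle locations gives

\begin{align*}
|\mathcal{X}| & = L^{V}=60000^{300000}=10^{300000\cdot\log_{10}60000}\\
 & \approx10^{300000\times4.778}\approx10^{1.43\times10^{6}},
\end{align*}

a number with about 1.4 million digits, far too many states to enumerate exactly.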

There are two ways to make inferences using variational methods. One is the mean field approximation, in which vehicles move around and road segments change state according to mean field effects. That is, we project all other states onto each individual vehicle or road segment, which makes the problem more tractable.
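The projection idea can be sketched in Python for the simplest possible case: $N$ identical vehicles, each choosing between parallel roads. Instead of tracking the joint state of all vehicles, each vehicle sees only the expected number of *other* vehicles on each road, and its choice probabilities are recomputed from that expectation until they stop changing (a damped fixed-point iteration). The linear delay function, the logit choice with sensitivity `theta`, and the damping factor are illustrative assumptions, not the lecture's model, and `mean_field_choice` is a hypothetical name.

```python
import math


def mean_field_choice(n_vehicles, roads, theta=0.1, damping=0.1, iters=500):
    """Mean-field route choice for identical vehicles.

    roads: dict name -> (free-flow travel time [s], capacity)
    Returns the probability q[r] that any one vehicle uses road r.
    """
    q = {r: 1.0 / len(roads) for r in roads}  # start from a uniform choice
    for _ in range(iters):
        # Project all other vehicles onto one number per road: expected occupancy.
        occupancy = {r: (n_vehicles - 1) * q[r] for r in roads}
        # Hypothetical linear delay: travel time grows with occupancy / capacity.
        times = {r: t0 * (1 + occupancy[r] / cap) for r, (t0, cap) in roads.items()}
        t_min = min(times.values())
        w = {r: math.exp(-theta * (t - t_min)) for r, t in times.items()}
        total = sum(w.values())
        # Damped update keeps the iteration from oscillating.
        q = {r: (1 - damping) * q[r] + damping * w[r] / total for r in roads}
    return q


print(mean_field_choice(100, {"fast": (60.0, 50), "slow": (90.0, 50)}))
```

The faster road attracts a clear majority of vehicles but not all of them: as its expected occupancy grows, its expected delay rises until the two choices balance.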