Data acquisition and preprocessing

Our data consist of over 7 billion calls in three countries over a period of six months. To filter out both marketing callers and misdialed numbers, we build a social network from only those links where at least one call was placed in each direction. The result is three social networks that together include 25 million nodes and over 100 million links.
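The reciprocity filter described above can be sketched in a few lines of Python. This is only an illustration: the function name `reciprocal_links` and the representation of call records as (caller, callee) pairs are assumptions, not the format of the real logs.

```python
def reciprocal_links(calls):
    """Keep only pairs of users who called each other at least once.

    `calls` is an iterable of (caller, callee) pairs (an assumed format).
    Returns a set of undirected links, each stored as a sorted tuple.
    """
    # Collapse repeated calls into a set of directed pairs, ignoring self-calls.
    directed = {(caller, callee) for caller, callee in calls if caller != callee}

    links = set()
    for caller, callee in directed:
        # A link survives only if the reverse direction was also observed.
        if (callee, caller) in directed:
            links.add((min(caller, callee), max(caller, callee)))
    return links


# Toy data: "spam" calls people but is never called back, so it is dropped.
calls = [("a", "b"), ("b", "a"), ("a", "c"), ("spam", "a"), ("spam", "b")]
links = reciprocal_links(calls)
```

In this toy example only the link between "a" and "b" survives, since every other pair has calls in one direction only.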

Apart from the social interactions, we also compute a location for each user in the network. Mobile phones work by transmitting information to nearby towers, which forward the call to the tower nearest to the destination phone. You have probably seen mobile towers either on top of buildings (as in this picture) or on structures similar to power line towers.

Logs of this process let us place users near a particular antenna, which gives a rough location for them. In our experiments, for France and Portugal we place each user at the tower they used most during the observation period (this could be work, home, or somewhere else, depending on the user). For Spain, users are located by their billing zip code (we used the GeoNames database to map zip codes to coordinates). Overall, we placed our users in more than 26,000 different locations, with a spatial resolution below 1 km in urban areas and below 10 km elsewhere.
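As a rough illustration of the most-used-tower rule, the sketch below counts tower usage per user and keeps the most frequent one. The (user, tower) event format and the name `most_used_tower` are assumptions made for this example, and ties are broken arbitrarily by first occurrence.

```python
from collections import Counter, defaultdict


def most_used_tower(events):
    """Map each user to the tower that appears most often in their events.

    `events` is an iterable of (user, tower) pairs (an assumed log format).
    """
    counts = defaultdict(Counter)
    for user, tower in events:
        counts[user][tower] += 1
    # most_common(1) returns a list with the single most frequent (tower, count).
    return {user: towers.most_common(1)[0][0] for user, towers in counts.items()}


# Toy log: u1 is seen twice at t1 and once at t2; u2 is seen only at t3.
events = [("u1", "t1"), ("u1", "t2"), ("u1", "t1"), ("u2", "t3")]
home_tower = most_used_tower(events)
```

Each tower identifier would then be mapped to the coordinates of that tower to obtain the user's location.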

Anonymization

Our decentralized algorithms allow your browser to compute routes in large social networks just by querying the server for the friends of a particular node at each hop (by contrast, computing the optimal distance would take over 10 GB of RAM in most cases). However, if you sniff this app's traffic, you'll find that all routes arrive already computed by the server. The reason is that we need to ensure the privacy of the records we hold.
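To give a feel for what hop-by-hop routing looks like, here is a minimal sketch of one possible decentralized strategy: greedy geographic forwarding, where each node passes the message to the friend located closest to the target. The text above does not say which algorithms the app actually uses, so treat this purely as an assumed example. The callable `get_friends` stands in for the per-hop server query, and `location`, `greedy_route` and `haversine_km` are hypothetical names.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def greedy_route(source, target, get_friends, location, max_hops=50):
    """Forward hop by hop to the unvisited friend closest to the target.

    Only the friends of the current node are requested at each hop, so the
    full graph never has to be held in memory. Returns the route as a list,
    or None if the message gets stuck or exceeds max_hops.
    """
    route = [source]
    current = source
    while current != target and len(route) <= max_hops:
        friends = get_friends(current)  # stands in for one server query
        if target in friends:
            route.append(target)
            return route
        candidates = [f for f in friends if f not in route]
        if not candidates:
            return None  # dead end: every friend was already visited
        current = min(candidates,
                      key=lambda f: haversine_km(location[f], location[target]))
        route.append(current)
    return route if current == target else None


# Toy network: one user per city, with approximate city coordinates.
location = {"lisbon": (38.72, -9.14), "porto": (41.15, -8.61),
            "madrid": (40.42, -3.70), "barcelona": (41.39, 2.17),
            "paris": (48.86, 2.35)}
friends = {"lisbon": ["porto", "madrid"], "porto": ["lisbon"],
           "madrid": ["lisbon", "barcelona"],
           "barcelona": ["madrid", "paris"], "paris": ["barcelona"]}
route = greedy_route("lisbon", "paris", friends.__getitem__, location)
```

A greedy route is generally longer than the optimal one, but it needs only local information at each step, which is what makes it cheap enough to run in a browser.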

To ensure our data remain anonymous, we hash nodes independently in every route, then use this hash to choose a first name and a surname from those popular in the user's region, choosing gender at random. That's why you are likely to find some Rodrigo Sousas in Lisbon, some Pau Garcías in Barcelona, and some Pierre Bernards in Paris. We also add random noise to locations, although we always keep the original city of each user.
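The per-route hashing step could look roughly like the sketch below. Everything specific here is an assumption for illustration: the keyed hash (HMAC-SHA256), the `SECRET_KEY` value, the tiny name lists, and the function name `pseudonym` are not taken from the real system. Mixing first names of both genders in one list is one simple way to make the displayed gender independent of the real user.

```python
import hashlib
import hmac

# Hypothetical secret and name lists; a real deployment would use a private
# key and much longer lists of regionally popular names.
SECRET_KEY = b"replace-with-a-private-key"
NAMES = {
    "Lisbon": {"first": ["Rodrigo", "Maria", "Joana"], "last": ["Sousa", "Silva"]},
    "Paris": {"first": ["Pierre", "Camille", "Marie"], "last": ["Bernard", "Martin"]},
}


def pseudonym(node_id, route_id, region, names=NAMES):
    """Pick a display name for a node that is stable only within one route."""
    # Including the route id means the same node gets an unrelated hash
    # in every route, so names cannot be linked across routes.
    message = f"{route_id}:{node_id}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).digest()
    pool = names[region]
    # Use separate bytes of the digest for the first name and the surname.
    first = pool["first"][int.from_bytes(digest[:4], "big") % len(pool["first"])]
    last = pool["last"][int.from_bytes(digest[4:8], "big") % len(pool["last"])]
    return f"{first} {last}"


name_in_route_1 = pseudonym("user-42", "route-1", "Lisbon")
name_in_route_2 = pseudonym("user-42", "route-2", "Lisbon")
```

Within a single route the same node always maps to the same name, so the route reads consistently, while across routes the names are unrelated.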

So... what's real, in the end? Basically, if you find a route in which a user A in city X forwards the message to a user B in city Y, that means there is at least one person in X who is friends with someone from Y. That is real, and so is anything you can build on top of it.