Why Finding Bacon?

First of all you maybe have wondered why the name of this app is Finding Bacon, maybe it sounds delicious (or not), but probably it doesn't sound too related to social networks.

The reason is that some years before Facebook or Twitter appeared or mobile phone records started to be analyzed from a network persperctive, a group of scientists analyzed a social network for which a lot of accurate data was available: the actors collaborations gathered by IMDB. With those data a network was build so that two actors were linked if they worked together in at least one movie. Results proved network diameter (how many hops are there at maximun between any two nodes) was short, and the authors published a website named The oracle of Bacon allowing users to find the shortest path between Kevin Bacon and any other Hollywood actor.

As you can imagine now, this app is a tribute to that early project. Times evolved, and while they studied connections between tens of thousands, we're handling tens of millions, which makes crucial to find a path to any node without global information on the network structure.

Six degrees of separation

In the 1960's, social psychologist Stanley Milgram carried out a series of experiments asking participants in Omaha, Nebraska to send a letter a stranger in Boston, Massachusetts [1-2]. The rules of the experiment stated that the letter must be passed to an acquaintance of the participants. In short, Milgram was trying to measure the diameter of the US social network, picking Nebraska as a place geographically and demographically distant from Boston. Surprisingly, a fraction of these letters successfully reached their intended target and did so in a median number of six hops, giving rise to the popular expression six degrees of separation.

This finding was the first evidence that humans build networks with non trivial structure. Although it is true the simplest random graph with N nodes presents a diameter O(logN), social networks are not random at all. For example, if you mention me two of your friends, I can fairly assume they know each other (at least the probability they know each other is orders of magnitude larger than for two random people in the world). This phenomenon can be measured as the fraction of triangles in the network, the so-called clustering coefficient. It's not straightforward to think of a network with a large number of triangles and a logarithmic diameter at the same time. In fact it was not until the end of the last millenium that two scientist, Watts and Strogatz, proposed a model of how a network could have both features simultaneusly [3].

Navigating without a map

Have you ever tried to get to somewhere without a map? It's a rather difficult task, however it is possible. With a general idea we can manage to follow indications. For example, if you're in New York and you're headed Cambridge, MA, it's enough if you know Cambridge is near Boston and you first follow indications to Boston. However, road network was designed having navigability in mind, specifically most of signals point to strategic well-known places.

In same cases, the very structure of the network allows navigability. Think of Manhattan, for example. As you can see in the figure a person (let's name her Alice) may want to go from 55 of 9th to 44 of 5th. To reach there, she has to take a decision in each intersection. The first decision is choosing A, B, or C. As long as Alice can calculate some distance from those intersections to the target (it doesn't need to be accurate, just to know if the distance is longer or shorter that where she's now), she will make a wise choice if she tries to reduce the distance, this is choosing A or B but never C.

However, although this structure allows everyone to find a path to any destination, those paths are not really short. In a NxN grid, the average path length would be N/2 which for a large N is much bigger than O(logN). To overcome this we can employ the solution proposed by Watts and Strogazt previously cited, which would consist of digging some tunnels between two random points in the grid.

In this case, it's obvious than if Alice has a map of the city available, she can dramatically reduce the number of intersections she need to cross to reach her target. Actually it can be proven than by adding a few of these shortcuts in a NxN lattice, the short diameter O(logN) emerges.

But still, if Alice doesn't have a map, and she just tries to reduce her distance to the target step by step, she will never take advantage of the shortcuts. In fact, theoretical results by Kleinberg proved the addition of shortcuts need to be much smarter to allow people without a map to find short paths [4-6]. If we think of a continous model where the probability of adding a shortcut between two intersections separated by a distance r is proportional to 1/rβ for any β>0, results prove that only for β=2 decentralized search (people without map) will be efficient.

If we look back now to the Milgram's experiment, we can agree that even more shocking that the mere existence shortest paths in the US social network is the fact that people were able to find them. This is a remarkable achivement since in this case is quite obvious no one had a general map.

Collective knowledge solves a complex problem

Having theoretically established that only a very particular network structure allows decentralized search strategies to succeed, next question was to gather information on how humans make such good decisions while forwarding messages in Milgram-like experiments. This question was answered in a repetition of the Milgram experiment in 2003 by Dodds, Muhamad and Watts [7]. This time they used e-mail and they also asked participants which was the selection criteria for the next recipient. Next figure summarizes their findings.

Results show people used geographical criteria in the begining, and then switched to other social factors, like field of work, education or family. Later on, these strategies have been tried in real world network built from elecronic communication records. Adamic and Ada tried the performance of different strategies to find a person within the staff of a HP labs in California (a few hundred nodes) [8], while Liben-Nowell et al. tried to reproduce them a national scale but data resolution only allowed them to test if it was possible to reach the right city [9].

Having learned from all these previous experiencies, we have systematically tested in unprecedented scales the performance of different decentralized search strategies, and in the meanwhile discovered very interesting new properties of the social network. A deeper understanding of the human communication on this macrooscopic scale will lead to the improvement of communication system and to more accurate models of information spreading. We invite you to read about our findings in the full version of the paper.

References

  1. S. Milgram, The small world problem, Psychology today 2, 60–67 (1967)
  2. J. Travers, S. Milgram, An experimental study of the small world problem, Sociometry 32, 425–443 (1969)
  3. D. Watts, S. Strogatz, Collective dynamics of “small-world” networks, Nature 393, 440–442 (1998)
  4. J. Kleinberg, in Proceedings of the thirty-second annual ACM symposium on Theory of computing, (2000), pp. 163–170
  5. J. Kleinberg, in Proceedings of the International Congress of Mathematicians: Madrid, August 22-30, 2006: invited lectures, (2006), pp. 1019–1044
  6. J. M. Kleinberg, Navigation in a small world, Nature 406, 845 (2000).
  7. P. S. Dodds, R. Muhamad, D. J. Watts, An experimental study of search in global social networks, Science 301, 827–829 (2003)
  8. L. Adamic, E. Adar, How to search a social network, Social Networks 27, 187–203 (2005)
  9. D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, A. Tomkins, Geographic routing in social networks, Proceedings of the National Academy of Sciences of the United States of America 102, 11623–11628 (2005)