Using Metadata to Find Paul Revere

The state of the universe.
June 10 2013 4:22 PM

Using Metadata to Find Paul Revere

“Social Networke Analysis,” a small encroachment on freedom, identifies terrorists in the Colonies.

(Continued from Page 1)

Rather than relying on tables, we can make a picture of the relationship between the groups, using the number of shared members as an index of the strength of the link between the seditious groups. Here’s what that looks like.

130610_SCI_RelationshipChart1

And, of course, we can also do that for the links between the people, using our 254x254 “Person by Person” table. Here is what that looks like.

130610_SCI_MetadataNewChart

What a nice picture! The analytical engine has arranged everyone neatly, picking out clusters of individuals and also showing both peripheral individuals and—more intriguingly—people who seem to bridge various groups in ways that might perhaps be relevant to national security. Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.

Advertisement

Once again, I remind you that I know nothing of Mr Revere, or his conversations, or his habits or beliefs, his writings (if he has any), or his personal life. All I know is this bit of metadata, based on membership in some organizations. And yet my analytical engine, on the basis of absolutely the most elementary of operations in Social Networke Analysis, seems to have picked him out of our 254 names as being of unusual interest. We do not have to stop here, with just a picture. Now that we have used our simple “Person by Event” table to generate a “Person by Person” matrix, we can do things like calculate centrality scores, or figure out whether there are cliques, or investigate other patterns. For example, we could calculate a betweenness centrality measure for everyone in our matrix, which is roughly the number of “shortest paths” between any two people in our network that pass through the person of interest. It is a way of asking, “If I have to get from Person A to Person Z, how likely is it that the quickest way is through Person X?” Here are the top betweenness scores for our list of suspected terrorists:

130610_SCI_Table4

Perhaps I should not say terrorists so rashly. But you can see how tempting it is. Anyway, look—there he is again, this Mr Revere! Very interesting. There are fancier ways to measure importance in a network besides this one. There is something called eigenvector centrality, which my friends in Natural Philosophy tell me is a bit of mathematics unlikely ever to have any practical application in the wider world. You can think of it as a measure of centrality weighted by one’s connection to other central people. Here are our top scorers on that measure:

130610_SCI_Table5

Here our Mr Revere appears to score highly alongside a few other persons of interest. And for one last demonstration, a calculation of Bonacich Power Centrality, another more sophisticated measure. Here the lower score indicates a more central location.

130610_SCI_Table6

And here again, Mr Revere—along with Messrs Urann, Proctor, and Barber—appear towards the top of our list.

So there you have it. From a table of membership in different groups we have gotten a picture of a kind of social networke between individuals, a sense of the degree of connection between organizations, and some strong hints of who the key players are in this world. And all this—all of it!—from the merest sliver of metadata about a single modality of relationship between people. I do not wish to overstep the remit of my memorandum, but I must ask you to imagine what might be possible if we were but able to collect information on very many more people and also synthesize information from different kinds of ties between people! For the simple methods I have described are quite generalizable in these ways, and their capability only becomes more apparent as the size and scope of the information they are given increases. We would not need to know what was being whispered between individuals, only that they were connected in various ways. The analytical engine would do the rest! I daresay the shape of the real structure of social relations would emerge from our calculations gradually, first in outline only, but eventually with ever-increasing clarity and, at last, in beautiful detail—like a great, silent ship coming out of the gray New England fog.

I admit that, in addition to the possibilities for finding something interesting, there may also be the prospect of discovering suggestive but ultimately incorrect or misleading patterns. But I feel this problem would surely be greatly ameliorated by more and better metadata. At the present time, alas, the technology required to automatically collect the required information is beyond our capacity. But I say again, if a mere scribe such as I—one who knows nearly nothing—can use the very simplest of these methods to pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine, then just think what weapons we might wield in the defense of liberty one or two centuries from now.

This article originally appeared on Kieran Healy’s blog.

Read more on Slate about the NSA’s secret snooping programs.