Behavioral modeling, ever finer grained
Not too long ago, NPR’s Morning Edition had me on to discuss the near-term prospects for “behavioral marketing.” Inevitably, in a five-minute radio segment, you’re going to wind up focusing primarily on something ready to hand and easy for the mass audience to wrap their heads around – in this case, Yahoo!’s introduction of so-called “Smart Ads.”
As we know, though, such ads are only a small part of a much larger discussion on the use, wisdom and reliability of behavioral modeling. Here’s the “script” that I ginned up for myself, mostly to clarify my own thoughts on the topic before going on the air and having to represent these issues fairly to a very large, non-specialist audience.
For years, marketing firms like Claritas have organized consumers into ZIP-code-derived clusters. Their argument, more or less, is that “you are where you live” – that where you live is a reasonably accurate predictor of your behavior as a consumer. What kind of beer you buy, what sorts of magazines you read, and so on.
The Claritas clusters themselves make great magazine fodder. They have cutesy names like “Money and Brains,” “Rustic Living,” and “The Affluentials,” and they’ve been pretty widely reported on. In fact, I think at one point there was even a deck of cards featuring the 66 PRIZM clusters.
It hardly needs to be said that these are pretty reductive descriptions. The Claritas description for the “Difficult Times” pattern, for example, describes the residents of such areas as “very low-income families [who] buy video games, dine at fast-food chicken restaurants, and [use] non-prescription cough syrup.” Whether or not this bears any resemblance to any actual American neighborhood, the pernicious thing is that the clients paying good money for the PRIZM dataset certainly believe it does. And they act on that belief. They build their marketing and advertising campaigns around that belief.
And not surprisingly, people tend to buy and consume the things they’re offered. It’s a vicious cycle.
This is all problematic enough already. My own concern is that, as digital information technology pervades more and more of everyday life – as not merely mobile phones but iPods and Nike+ shoes and digital artifacts of all sorts are networked, and transmit a rich variety of data points relating to each person’s location and current activity – we give away enough to build some vastly improved models of personal behavior.
Location in itself, it turns out, is enough to build some pretty interesting pictures with, given a long enough timeline, and, of course, your mobile phone is giving away your position all the time whether you’re using it or not. (To cite just one fairly recent example, the FBI used this method to locate the body of Kelly Nolan, a University of Wisconsin student who went missing last summer.)
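Just how revealing raw location can be is easy to illustrate. Here is a minimal Python sketch — the tower IDs, hours, and ping counts are all invented for illustration, and real carrier logs are vastly denser — showing how a phone's most frequent location by hour of day exposes a likely home and workplace:

```python
from collections import Counter

# Hypothetical log of (hour_of_day, cell_tower_id) pings from one phone.
pings = (
    [(h, "tower_A") for h in [0, 1, 2, 3, 4, 5, 6, 22, 23]] * 5  # nights
    + [(h, "tower_B") for h in [9, 10, 11, 14, 15, 16]] * 5       # workdays
    + [(12, "tower_C")] * 3                                        # lunch spot
)

def modal_tower(pings, hours):
    """Most frequently seen tower during the given hours."""
    counts = Counter(tower for hour, tower in pings if hour in hours)
    return counts.most_common(1)[0][0]

likely_home = modal_tower(pings, hours=range(0, 7))   # overnight hours
likely_work = modal_tower(pings, hours=range(9, 17))  # business hours

print(likely_home)  # tower_A
print(likely_work)  # tower_B
```

No names, no account numbers — just timestamps and positions, and the two most sensitive facts about a person's daily routine fall out of a frequency count.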
Just how full a picture of activity can be built up from such data? I’ve gotten a much better idea of the possibilities from listening to an MIT researcher named Nathan Eagle describe his work. Nathan calls his project “reality mining”: given the ability to install a few lines of Java code on your mobile phone, he claims to be able to reconstruct some pretty high-level phenomena.
Even without knowing anybody’s name at the start, given a large enough mobile data set, he can build very detailed models of social networks and activity patterns. All he’s starting with is anonymous patterns of mobile-phone use, and he can essentially tell you who you are, who you hang out with, what you’re likely to be doing at any given time.
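To make the flavor of that inference concrete, here's a toy Python sketch — entirely my own illustration, not Eagle's actual pipeline — of how repeated co-presence in anonymized logs is enough to surface a social graph. Every identifier below is invented:

```python
from collections import Counter
from itertools import combinations

# Anonymized observations: (user_id, tower_id, hour). No names anywhere.
observations = [
    ("u1", "t5", 20), ("u2", "t5", 20),  # u1 and u2 together, evening
    ("u1", "t5", 21), ("u2", "t5", 21),
    ("u1", "t9", 20), ("u3", "t9", 22),  # u1 and u3 never co-present
    ("u2", "t5", 19), ("u1", "t5", 19),
]

# Group users by (tower, hour) slot, then count pairwise co-occurrences.
by_slot = {}
for user, tower, hour in observations:
    by_slot.setdefault((tower, hour), set()).add(user)

co_presence = Counter()
for users in by_slot.values():
    for pair in combinations(sorted(users), 2):
        co_presence[pair] += 1

# Pairs seen together repeatedly are inferred social ties.
ties = [pair for pair, n in co_presence.items() if n >= 2]
print(ties)  # [('u1', 'u2')]
```

Scale that up from eight observations to months of continuous data across a whole campus, and "who you hang out with" stops being private in any meaningful sense, names or no names.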
There are two reasons that this is even scarier than it sounds.
The first is that Nathan only has data from your phone. Imagine now that there’s someone in a similar position, but with less benign intentions. And they’re able, in building their models, to draw not merely from the mobile-activity dataset, but from information stored in databases strewn across the Web. Height and weight information, health history information, histories of contributions to political candidates, records of what you’ve downloaded, book purchase and video rental histories – imagine being able to pull all of these sources together into one query, and to build behavioral models on that one query in real time. (I have some friends whose company, Metaweb, is building a “database of databases” that will essentially allow you to do just that.)
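The mechanics of that kind of aggregation are not exotic. Here's a hedged Python sketch — the databases, keys, and field names are all invented, and this is not Metaweb's actual design — of how records that share any common identifier collapse into a single composite profile:

```python
# Hypothetical, separately-held databases, each fairly harmless on its own.
health_db = {"jdoe@example.com": {"height_cm": 178, "allergy": "penicillin"}}
donations_db = {"jdoe@example.com": {"candidate": "Smith", "amount": 250}}
rentals_db = {"jdoe@example.com": {"last_rental": "Brazil"}}

def merge_profiles(key, *databases):
    """Fuse every record sharing a key into one composite profile."""
    profile = {}
    for db in databases:
        profile.update(db.get(key, {}))
    return profile

profile = merge_profiles("jdoe@example.com", health_db, donations_db, rentals_db)
print(profile)
```

The point is that nothing here requires sophistication: one shared key and a dictionary merge, and three unremarkable databases become one distinctly remarkable dossier.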
And then consider that it doesn’t even take that level of sophistication to tie people back to their online behavior. Here’s an example of what I mean. As part of an academic outreach program not too long ago, AOL released a block of twenty million search queries. They had, of course, very conscientiously scrubbed all potentially identifying details from this stack of requests before releasing them to the academic community. There was ostensibly no way to tie the requests back to any given person. But through the sheer application of good journalistic research practice – time on the phone, lots of good ol’ shoe leather – New York Times reporters were able to correlate and cross-reference these innocent, trivial search strings until, out of everyone in the world, exactly one best candidate remained to have produced them. And when they called her up and asked her if she was the originator, she confirmed it. Of course, she was flabbergasted. Who wouldn’t be?
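What those reporters did by hand amounts, in the abstract, to set intersection. Here's a toy Python sketch — with an invented four-person population, nothing drawn from the actual AOL data — of how a handful of attributes leaked by "anonymous" searches narrows the field to a single person:

```python
# Invented population. Each "anonymous" search string leaks an attribute:
# a search for local services leaks a town, another leaks an age range,
# another leaks pet ownership, and so on.
population = [
    {"name": "A", "town": "Lilburn", "age_band": "60s", "dog": True},
    {"name": "B", "town": "Lilburn", "age_band": "30s", "dog": True},
    {"name": "C", "town": "Atlanta", "age_band": "60s", "dog": False},
    {"name": "D", "town": "Lilburn", "age_band": "60s", "dog": False},
]

# Attributes recoverable from the query log.
clues = {"town": "Lilburn", "age_band": "60s", "dog": True}

# Each clue filters the candidate pool; their intersection does the rest.
candidates = [
    person for person in population
    if all(person[attr] == value for attr, value in clues.items())
]
print([person["name"] for person in candidates])  # ['A']
```

Each clue on its own matches plenty of people; it's the conjunction that's lethal. Three or four weakly-identifying facts, intersected, routinely suffice to isolate one individual from millions.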
The point I take away from Foucault’s discussion of the Panopticon is that it isn’t necessarily important whether we’re actually under surveillance at all times and in all places or not. No: it’s enough for us to believe that we’re under surveillance at all times and in all places, to internalize this belief, to get us to change our behavior. To be “docilized.” And what Nathan Eagle’s work and the efforts of the Times reporters suggest to me is that any sufficiently interested party even now has access to datasets large enough not merely to model my current behavior to a reasonably high degree of resolution, but to be able to make meaningful predictions about my future choices. And if anything ever was, that’s docilizing.
It’s precisely this that worries me about the next-generation equivalents of the Claritas clusters: their superficial gloss of scientism, empiricism and pinpoint accuracy, and the sense they so easily give us that we are not merely knowable-in-principle, but actually known. Again, my concern is not so much whether “reality mining” à la Eagle actually says anything meaningful about people, but whether the people using it believe it does. So I think we’re in for some pretty scary times.