Behavioral modeling, ever finer grained

Not too long ago, NPR’s Morning Edition had me on to discuss the near-term prospects for “behavioral marketing.” Inevitably, in a five-minute radio segment, you’re going to wind up focusing primarily on something ready to hand and easy for the mass audience to wrap their heads around – in this case, Yahoo!’s introduction of so-called “Smart Ads.”

As we know, though, such ads are only a small part of a much larger discussion on the use, wisdom and reliability of behavioral modeling. Here’s the “script” that I ginned up for myself, mostly to clarify my own thoughts on the topic before going on the air and having to represent these issues fairly to a very large, non-specialist audience.

For years, marketing firms like Claritas have organized consumers into ZIP-code-derived clusters. Their argument, more or less, is that “you are where you live” – that where you live is a reasonably accurate predictor of your behavior as a consumer. What kind of beer you buy, what sorts of magazines you read, and so on.

The Claritas clusters themselves make great magazine fodder. They have cutesy names like “Money and Brains,” “Rustic Living,” and “The Affluentials,” and they’ve been pretty widely reported on. In fact, I think at one point there was even a deck of cards featuring the 66 PRIZM clusters.

It hardly needs to be said that these are pretty reductive descriptions. The Claritas description for the “Difficult Times” pattern, for example, describes the residents of such areas as “very low-income families [who] buy video games, dine at fast-food chicken restaurants, and [use] non-prescription cough syrup.” Whether or not this bears any resemblance to any actual American neighborhood, the pernicious thing is that the clients paying good money for the PRIZM dataset certainly believe it does. And they act on that belief. They build their marketing and advertising campaigns around that belief.

And not surprisingly, people tend to buy and consume the things they’re offered. It’s a vicious cycle.

This is all problematic enough already. My own concern is that, as digital information technology pervades more and more of everyday life – as not merely mobile phones but iPods and Nike+ shoes and digital artifacts of all sorts are networked, and transmit a rich variety of data points relating to each person’s location and current activity – we give away enough to build some vastly improved models of personal behavior.

Location in itself, it turns out, is enough to build some pretty interesting pictures with, given a long enough timeline. And, of course, your mobile phone is giving away your position all the time, whether you’re using it or not. (To cite just one fairly recent example, the FBI used this method to locate the body of Kelly Nolan, a University of Wisconsin student who went missing last summer.)

Just how full a picture of activity can be built up from such data? I’ve gotten a much better idea of the possibilities from listening to an MIT researcher named Nathan Eagle describe his work. Nathan calls his project “reality mining”: given the ability to install a few lines of Java code on your mobile phone, he claims to be able to reconstruct some pretty high-level phenomena.

Even without knowing anybody’s name at the start, given a large enough mobile data set, he can build very detailed models of social networks and activity patterns. All he’s starting with is anonymous patterns of mobile-phone use, and he can essentially tell you who you are, who you hang out with, what you’re likely to be doing at any given time.
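To make the general idea concrete, here is a minimal sketch of how social ties might be guessed at from nothing but pseudonymous location records. To be clear, this is not Nathan’s actual method, which I haven’t seen the code for. The device IDs, tower names, observations and the `min_meetings` threshold are all invented for illustration, and the only assumption is that two devices repeatedly seen at the same tower in the same hour probably belong to people who know each other.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, made-up observations: (pseudonymous device id, hour slot, cell tower).
# No names appear anywhere in the input.
observations = [
    ("dev_a", 9, "tower_1"), ("dev_b", 9, "tower_1"), ("dev_c", 9, "tower_2"),
    ("dev_a", 13, "tower_3"), ("dev_b", 13, "tower_3"), ("dev_c", 13, "tower_1"),
    ("dev_a", 20, "tower_4"), ("dev_b", 20, "tower_4"), ("dev_c", 20, "tower_4"),
]


def co_location_counts(obs):
    """Count how often each pair of devices shares a tower in the same hour slot."""
    present = {}
    for device, slot, tower in obs:
        present.setdefault((slot, tower), set()).add(device)

    pairs = Counter()
    for devices in present.values():
        for a, b in combinations(sorted(devices), 2):
            pairs[(a, b)] += 1
    return pairs


def likely_ties(obs, min_meetings=2):
    """Keep only pairs seen together at least `min_meetings` times (assumed threshold)."""
    counts = co_location_counts(obs)
    return sorted(pair for pair, n in counts.items() if n >= min_meetings)


print(likely_ties(observations))
```

On this toy data only the pair `dev_a` and `dev_b` clears the threshold: they turn up together three times, while `dev_c` overlaps with each of them just once. A real system would obviously need far more data and far more care, but the shape of the inference is about this simple.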

There are two reasons that this is even scarier than it sounds.

The first is that Nathan only has data from your phone. Imagine now that there’s someone in a similar position, but with less benign intentions. And they’re able, in building their models, to draw not merely from the mobile-activity dataset, but from information stored in databases strewn across the Web. Height and weight information, health history information, histories of contributions to political candidates, records of what you’ve downloaded, book purchase and video rental histories – imagine being able to pull all of these sources together into one query, and to build behavioral models on that one query in real time. (I have some friends whose company, Metaweb, is building a “database of databases” that will essentially allow you to do just that.)

And then consider that it doesn’t even take that level of sophistication to tie people back to their online behavior. Here’s an example of what I mean by that. As part of an academic outreach program not too long ago, AOL released a block of 20,000,000 search queries. They had of course very conscientiously scrubbed all potentially identifying details from this stack of requests before releasing them to the academic community. There was ostensibly no way that you’d be able to tie the requests back to any given person. But through the sheer application of good journalistic research practice – time on the phone, lots of good ol’ shoe leather – New York Times reporters were able to correlate and cross-reference these innocent, trivial search strings until in the whole world, there was one best candidate to have produced them. And when they called her up and asked her if she was the originator, she confirmed it. Of course, she was flabbergasted. Who wouldn’t be?
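The narrowing-down process is easy to sketch. What follows is a toy illustration under stated assumptions, not a reconstruction of what the Times reporters did: the `directory` records, the field names and the `clues` are all hypothetical, standing in for whatever public records and inferred details an investigator might actually have to hand.

```python
# Hypothetical public directory; every person and attribute here is invented.
directory = [
    {"name": "Person A", "town": "Springfield", "age_band": "60s", "has_dog": True},
    {"name": "Person B", "town": "Springfield", "age_band": "60s", "has_dog": False},
    {"name": "Person C", "town": "Springfield", "age_band": "30s", "has_dog": True},
    {"name": "Person D", "town": "Shelbyville", "age_band": "60s", "has_dog": True},
]

# Clues someone might infer from one pseudonymous user's search strings (assumed).
clues = [("town", "Springfield"), ("age_band", "60s"), ("has_dog", True)]


def narrow(people, clues):
    """Apply clues one at a time, recording how many candidates survive each step."""
    remaining = list(people)
    history = [len(remaining)]
    for field, value in clues:
        remaining = [p for p in remaining if p[field] == value]
        history.append(len(remaining))
    return remaining, history


candidates, history = narrow(directory, clues)
print(history)
print([p["name"] for p in candidates])
```

Each clue is trivial on its own, yet the candidate pool here shrinks from four to three to two to one. No single field identifies anyone; the combination does.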

The point I take away from Foucault’s discussion of the Panopticon is that it isn’t necessarily important whether we’re actually under surveillance at all times and in all places or not. No: it’s enough for us to believe that we’re under surveillance at all times and in all places, to internalize this belief, to get us to change our behavior. To be “docilized.” And what Nathan Eagle’s work and the efforts of the Times reporters suggest to me is that any sufficiently interested party even now has access to datasets large enough not merely to model my current behavior to a reasonably high degree of resolution, but to be able to make meaningful predictions about my future choices. And if anything ever was, that’s docilizing.

It’s precisely this that worries me about the next-generation equivalents of the Claritas clusters: their superficial gloss of scientism, empiricism and pinpoint accuracy, and the sense they so easily give us that we are not merely knowable-in-principle, but actually known. Again, my concern is not so much whether “reality mining” à la Eagle actually says anything meaningful about people as whether the people using it think it does. So I think we’re in for some pretty scary times.

2 responses to “Behavioral modeling, ever finer grained”

  1. Christopher Fahey says:

    Great essay, Adam.

    When the NSA data mining stories came out a few years ago, I remember that the media continually said the information was “anonymous”. I wrote about how this is manifestly false, insofar as connecting an “anonymous” entry to a real name is almost always child’s play, as the NYTimes investigation showed. Can you imagine how much easier the Times’ story would have been for folks at the NSA, who have access to so many more databases than the Times does? Or a marketing company? Or, Christ, a telecom company?

    The problem has several angles: First, the public lacks the technical understanding of how databases work, and how data can be cross-referenced. Second, we lack the imagination to envision how this data might be used to harm us. Even if we are able to imagine truly nefarious uses such as rounding up Muslims or emptying bank accounts, we find it a little harder to imagine the less egregious, but still evil, potential abuses of the data, such as bombarding us with junk mail or increasing taxes in certain ZIP codes. Finally, despite the multitude of conspiracy theorists in our country, we generally don’t think that big corporations, or the government, could possibly be so evil (which is kinda true, in a way, insofar as the people doing the deeds don’t think trading your personal data is all that evil, either).

    I’ve always found the PRIZM stuff pretty funny, however. I can’t tell if you’re skeptical of that data’s effectiveness for marketers, but my gut tells me that for the right purposes, it could be really useful. For example, if I run a company selling real estate in Aspen, I would browse the demographic groups and select only those that match my idea of what Aspen real estate buyers look like, and then send expensive brochures only to those ZIP codes that include those groups. Similarly, if I am selling bottled Lourdes holy water, trying to influence people to vote for Mitt Romney, or trying to get people to subscribe to a wrestling magazine, I think the PRIZM data could be really useful. Whether or not it’s a good thing to expose demographic segments exclusively to products that fit the stereotype of what those groups would be interested in (e.g., offer The Economist to 10021 but not to 11207) is another question.

Trackbacks / Pingbacks

  1. Adam Crowe - 22 November 2007
