On non-convergent folksonomies
Inspired by Nurri’s Insa talk a few weeks back, I’ve been giving renewed thought to issues of classification, categorization, taxonomy and sorting of late. I know it’s not everybody’s cup of tea, but it’s still a question of enduring interest for me, personally and occasionally even professionally: how do people decide what category a given object belongs to? What constitutes centrality to a category? And why are some things still more than others “boundary objects” that seem to different observers to have entirely divergent characteristics or essences?
These are intractable questions even when applied to discrete material objects, but they spiral dizzyingly out of control when the “object” in question is something squishy…like, say, a blog. If you want an illustration of this, there’s no better place to turn than the social bookmarking service del.icio.us. Here’s a perfect example, one that’ll likely be familiar enough to you that you’ll grok my point easily, whatever your interest or lack of same in broader questions of taxonomy:
All of the following 386 people bookmarked what is essentially the same page to del.icio.us, the home page of my old v-2.org site:
– 42 who bookmarked
– 212 who bookmarked
– 67 who bookmarked
– and 65 who bookmarked
You’d think that with fairly robust samples to work from, all groups would agree on what the site was about. This turns out to be largely the case:
– the first group thought the site was about design, architecture, usability;
– the second (and by far the largest) group thought it concerned design, blog, usability;
– the third and fourth both characterized it using the words design, architecture, blog.
There’s obviously a high degree of overlap here: “design” appears atop all four lists, and, indeed, the site’s creator regards that as a perfectly accurate description. But what has always fascinated me, especially with such relatively generous sample sizes, is that there should be any variation at all between groups. Descriptors such as “theory” appear in some but not all of the groups, and there are even outliers like “web2.0”, which appear under only one heading. Why is it that all six people who (bizarrely, in my personal opinion) tagged the site “web2.0” chose to save the http://v-2.org/ URL? What accounts for this?
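To put a rough number on that overlap, here is a minimal Python sketch that compares the four top-three tag lists quoted above. The group labels (group_1 through group_4) are placeholders of mine, in the order the groups are listed, and Jaccard similarity is just one convenient way to measure agreement between two tag sets; nothing here comes from del.icio.us itself.

```python
from itertools import combinations

# Top-three tags per group, exactly as quoted in the post.
# The group names are placeholder labels (an assumption), not URLs.
top_tags = {
    "group_1": {"design", "architecture", "usability"},
    "group_2": {"design", "blog", "usability"},
    "group_3": {"design", "architecture", "blog"},
    "group_4": {"design", "architecture", "blog"},
}


def jaccard(a, b):
    """Fraction of tags two groups share: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


# Tags that every single group agrees on.
shared = set.intersection(*top_tags.values())

# Similarity score for each pair of groups.
pairwise = {
    (g, h): jaccard(top_tags[g], top_tags[h])
    for g, h in combinations(top_tags, 2)
}

print("shared by all groups:", sorted(shared))
for (g, h), score in pairwise.items():
    print(f"{g} vs {h}: {score:.2f}")
```

On these lists only “design” is common to all four groups, and apart from the identical third and fourth lists, any two groups share just two of the four distinct tags between them.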
Time seems to play some role. Bookmarks for http://www.v-2.org/ (the group of 212) go back a full eighteen months earlier than any of the other variants, all of which start up in April or May of ’04. (This is another mystery to me, because as far as I’m aware all of those URLs returned the identical home page for the entire period under consideration.) The earlier bookmarkers were far more likely to characterize the site as a “blog” or as relating to “IA”, and that seems to make sense – I talked about those things a lot more back in the day. So a logical first-pass guess might be that bookmarkers on del.icio.us are accurately tracking the site’s content as it changed over time.
On the other hand, though, none of the cohort who bookmarked http://v-2.org/index.php (i.e. preponderantly, more recently) thought the site had to do with “everyware”, during a period when it began to focus on just that – and not a single one of them was apparently “inspired” either, despite the appearance of that descriptor under two of the other headings, once fairly close to the top. There’s no logic that I could discern that might account for this.
This is just a single example, but you can find other good ones if you poke around del.icio.us some. In my ignorance, I almost want to assert this as a general, if loose, principle of all such bottom-up taxonomies: there is something operating that looks an awful lot like sensitive dependence on initial conditions, and different subgroups of a larger cohort – though apparently homogeneous in composition – will eventually diverge significantly in their characterization of a given object.
People who know a lot more than me about classification and taxonomy are, of course, invited to blow this pet theory of mine to tiny little chunks. I’m still curious, though, what might produce such a divergent spread of descriptions for what remained essentially the same object at all times in question.