Folksonomy for applied analysis and market action


Folksonomy has been framed almost exclusively in terms of folk classification and retrieval. Yet it also has rich potential for applied web analytics aimed at improving sales conversion and influence, and this applied analytic view can help information retrieval as well.

Folksonomy can be thought of as a free-form survey in which users are asked to tag (classify) web pages (and other information objects) using one or more one-word descriptors. An easy example is Flickr, where people do this all the time with pictures. Another is del.icio.us, where people do this with bookmarks. You can later use a tag to recall everything you gave that tag. See a picture of a baby and tag it “cute” and “baby”. See a picture of a pretty girl and tag it “cute” and “girl”. Later, retrieve everything you tagged “cute”, and you'll get both the picture of the baby and the picture of the girl.
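The personal retrieval loop just described amounts to an inverted index from tags to items. The sketch below is a minimal illustration under assumed inputs: a single user's `(item, tag)` pairs with hypothetical file names. It is not how Flickr or del.icio.us actually store tags.

```python
from collections import defaultdict


def build_tag_index(taggings):
    """Map each tag to the set of items it was applied to.

    `taggings` is an iterable of (item, tag) pairs for one user.
    Tags are lower-cased so "Cute" and "cute" retrieve the same items.
    """
    index = defaultdict(set)
    for item, tag in taggings:
        index[tag.lower()].add(item)
    return index


# Hypothetical data mirroring the baby/girl pictures in the text.
my_tags = [
    ("baby.jpg", "cute"), ("baby.jpg", "baby"),
    ("girl.jpg", "cute"), ("girl.jpg", "girl"),
]
index = build_tag_index(my_tags)
print(sorted(index["cute"]))  # ['baby.jpg', 'girl.jpg']
```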

It's clear to anyone with a marketing or business background that this type of activity could be useful for both market research and market action, as Steve Rubel first suggested over a month ago. However, folksonomy seems to be viewed mainly as a classification tool for individuals or cohesive groups. This view of folksonomy is currently shaping what is easily possible with folksonomy tools. It impedes the richer uses of folksonomic data that marketers, or any applied data analyst, would want, uses that could even benefit those who want to use folksonomy for retrieval. In this post, I'll lay out the “data analytic” view of folksonomic data. I'll then suggest a few changes to how current tools such as del.icio.us and Flickr let you access the data they have already collected, changes that could dramatically increase the data's usefulness both for applied data analysts and for those focused on retrieval.

The retrieval view of folksonomy

Thomas Vander Wal, who coined the term folksonomy, chiefly views folksonomy as a personal classification and retrieval system (I tag a picture “cute”; I later retrieve “cute” pictures). In an initial take on the implications of this individual behavior, he implicitly relies on the fact that people in well-established cultures will tend to tag (label) the same well-understood (and in some sense already culturally classified) objects similarly. From this assumption he arrives at the observation that tags will follow a power law distribution. Simply put, in a power law distribution there is one very dominant tag that a high percentage of people use, and the percentage of people using each subsequent tag drops off rapidly. For example, for a picture of a girl, a very high percentage of people will use the term “girl”, with progressively smaller percentages using other tags such as “blond” and “little”. If the power law (with its assumptions) indeed holds, there is hope that the aggregate of everybody's folksonomic classification can become a basis for the more formal classification systems used in computerized information retrieval.
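To make the power law claim concrete: under an idealized power law, the count of the r-th most common tag is roughly proportional to 1/r^a, so log(count) falls on a straight line in log(rank) with slope -a. The sketch below ranks one item's tags and estimates a with a simple least-squares fit on the log-log values. The tag counts are invented for illustration, and this crude fit is only a way to eyeball the shape; it is not Vander Wal's method, and rigorous power law fitting uses more careful estimators.

```python
import math
from collections import Counter


def rank_frequency(tags):
    """Return (tag, count) pairs sorted from most to least frequent."""
    return Counter(tags).most_common()


def power_law_exponent(counts):
    """Estimate a in count ~ C / rank**a by least squares on log-log values.

    `counts` must be sorted in descending order and contain at least two
    entries; the fitted slope of log(count) vs. log(rank) is returned negated.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Hypothetical taggings of one picture of a girl (counts follow 120/rank).
tags = ["girl"] * 120 + ["blond"] * 60 + ["little"] * 40 + ["cute"] * 30
ranked = rank_frequency(tags)
print(ranked[0])  # ('girl', 120)
print(round(power_law_exponent([c for _, c in ranked]), 3))  # 1.0
```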

The problem with the power law view is that it requires cultural homogeneity in how an object is labeled; I'll call this “the homogeneity assumption”. Some consider the homogeneity assumption a dumbing down of classification to the lowest common denominator, which is both personally repugnant to them and makes unique retrieval difficult. For instance, Bush might be universally tagged “president”, but there are lots of presidents. Since there is less agreement on the other tags (for instance, a 51%–49% split on “good” vs. “bad”), developing a useful retrieval classification from folksonomy alone is bound to be hard.

More importantly, the homogeneity assumption may simply not hold in many cases. For instance, what are the tags for Martin Luther King, Malcolm X, or Ho Chi Minh? There may be little cultural agreement on controversial items. Further, how do you tag new items? Consider the “This Old House” segment where the experts sit around trying to classify previously unseen household implements. For those who have not seen the show, they often come up with completely different and hilarious conceptualizations. With folksonomy, controversial and novel items may be unclassifiable because there is not a broad enough consensus on how they should be tagged, even if a lot of people are tagging them.
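One rough way to spot items where the homogeneity assumption breaks down is to measure what share of all taggings the single most common tag captures. The sketch below does that and applies a cutoff; the 0.5 threshold, the function names, and the sample tags are all hypothetical choices for illustration, not an established rule.

```python
from collections import Counter


def dominant_share(tags):
    """Fraction of taggings that use the single most common tag (0.0 if none)."""
    if not tags:
        return 0.0
    top_count = Counter(tags).most_common(1)[0][1]
    return top_count / len(tags)


def has_consensus(tags, threshold=0.5):
    """Hypothetical rule: call an item classifiable only if one tag
    accounts for more than `threshold` of all taggings."""
    return dominant_share(tags) > threshold


# Hypothetical tag lists for a well-understood item and a controversial one.
familiar = ["girl"] * 7 + ["blond"] * 2 + ["little"]
controversial = ["hero"] * 4 + ["radical"] * 3 + ["leader"] * 3

print(has_consensus(familiar))       # True  (top tag share 0.7)
print(has_consensus(controversial))  # False (top tag share 0.4)
```

Note that a 51%–49% split would still pass this particular cutoff, which is one reason a single threshold is only a starting point.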

[Update: A side conversation with Thomas Vander Wal over email has convinced me that he intends to account for what I described in the last two paragraphs as part of the long-tail phenomenon, which he mentions in the post I cited. The idea behind the long tail is that, over a large enough group, enough people will use the less frequent tags that those tags become sizable and worth considering as potential subgroups. If the long tail becomes large enough that there is no dominant tag or set of tags to begin with, we are essentially talking about the same thing. Let me point out, though, that the semantics of these issues have become rather twisted in folksonomy discussions. Most people who talk about power law distributions are making the point that some items are winner-take-all. Even when they consider the “long tail”, they are talking about a distribution in which the most frequent items dominate. Personally, I think it is better to simply say that the power law distribution does not always hold and to focus on the specific cases. But Thomas is right to point out that the “long tail” terminology has become quite commonly used, and if you understand its subtleties the way I just described, we're in agreement. Caveat lector.]

The data analytic view to the rescue

In many applied disciplines, such as marketing or the behavioral sciences in general, the above situation looks more like an opportunity than an insurmountable problem for general classification. The fact that different portions of the population perceive the same item very differently means they belong in different segments. I now have a well-defined subgroup to which I can tailor my message and get a better conversion rate. The folksonomic data could help me build a profile of that segment and tailor my message accordingly. That's a good thing.
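As a starting point for that kind of segmentation, the sketch below groups users by the tag they applied to one item. It assumes records in a hypothetical `(url, user_id, tag)` form with made-up URLs and IDs; a real segment profile would of course draw on many items, not one.

```python
from collections import defaultdict


def segment_by_tag(records, item):
    """Group user IDs by the tag each applied to a single item.

    `records` is an iterable of (url, user_id, tag) triples; only rows
    whose url equals `item` are considered.
    """
    segments = defaultdict(set)
    for url, user, tag in records:
        if url == item:
            segments[tag].add(user)
    return dict(segments)


# Hypothetical records: three users tag the same speech differently.
records = [
    ("http://example.com/speech", "u1", "inspiring"),
    ("http://example.com/speech", "u2", "inspiring"),
    ("http://example.com/speech", "u3", "divisive"),
    ("http://example.com/other", "u4", "inspiring"),
]
segments = segment_by_tag(records, "http://example.com/speech")
print(segments)  # e.g. {'inspiring': {'u1', 'u2'}, 'divisive': {'u3'}}
```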

A less venal use of the data would be to exploit systematic differences in different groups' tagging to create a dynamic map between their classification systems, which could then be used for retrieval. I have previously developed that idea here, and it has been taken up here and here.
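A very simple version of such a map is a co-occurrence crosswalk: for every item tagged by members of both groups, count how often each group-A tag appears alongside each group-B tag, then map every A tag to its most frequent B counterpart. The sketch below assumes a hypothetical `group_of` assignment of users to groups A and B, and the expert/lay vocabulary in the sample data is invented purely for illustration; it is not the specific method from the earlier posts.

```python
from collections import Counter, defaultdict


def tag_crosswalk(records, group_of):
    """Map each group-A tag to the group-B tag it most often shares an item with.

    records:  iterable of (url, user_id, tag) triples
    group_of: dict of user_id -> "A" or "B" (a hypothetical segmentation);
              users not in the dict are ignored
    """
    tags_by_item = defaultdict(lambda: {"A": set(), "B": set()})
    for url, user, tag in records:
        group = group_of.get(user)
        if group in ("A", "B"):
            tags_by_item[url][group].add(tag)

    # Count A-tag/B-tag pairs that land on the same item.
    cooccur = defaultdict(Counter)
    for groups in tags_by_item.values():
        for a_tag in groups["A"]:
            for b_tag in groups["B"]:
                cooccur[a_tag][b_tag] += 1

    return {a: counts.most_common(1)[0][0] for a, counts in cooccur.items()}


# Hypothetical data: experts (A) and lay users (B) describing the same pages.
records = [
    ("url1", "doc1", "myocardial_infarction"), ("url1", "pat1", "heart_attack"),
    ("url2", "doc2", "myocardial_infarction"), ("url2", "pat2", "heart_attack"),
    ("url2", "pat3", "chest_pain"),
    ("url3", "doc1", "hypertension"), ("url3", "pat1", "high_blood_pressure"),
]
group_of = {"doc1": "A", "doc2": "A", "pat1": "B", "pat2": "B", "pat3": "B"}
crosswalk = tag_crosswalk(records, group_of)
print(crosswalk["myocardial_infarction"])  # heart_attack
```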

Finally, information about how different people are tagging items could be interesting to me if I am trying to get a new idea accepted. In this post, I commented on Richard MacManus's efforts to get the term “Web 2.0” more widely used. He noted that it was not showing up often in del.icio.us tags (which he painstakingly analyzed by hand). He then noted that certain key people did seem to be using the term. Information about communication patterns within a tight group, such as who is looking at whose folksonomy, can lead to a plan for where best to leverage your efforts.
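With per-user tagging records in hand, the by-hand analysis described above reduces to a small query: who has used a given term at all, and which of a watch list of key people are among them. The record format, user IDs, and watch list below are hypothetical.

```python
def term_adopters(records, term, key_people=()):
    """Return (all users who used `term`, the subset of them in key_people).

    `records` is an iterable of (url, user_id, tag) triples; the tag match
    is case-insensitive.
    """
    wanted = term.lower()
    users = {user for _url, user, tag in records if tag.lower() == wanted}
    return users, users & set(key_people)


# Hypothetical records and watch list.
records = [
    ("http://example.com/a", "key1", "web2.0"),
    ("http://example.com/a", "u7", "Web2.0"),
    ("http://example.com/b", "u8", "ajax"),
    ("http://example.com/b", "key2", "ajax"),
]
users, key_users = term_adopters(records, "web2.0", key_people={"key1", "key2"})
print(sorted(users), sorted(key_users))  # ['key1', 'u7'] ['key1']
```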

How current services need to be fixed

Ideally, one would be able to easily obtain a view of a folksonomy showing the different tags that different individuals apply to the same items (as identified by URL). Specifically, each record would contain the item identifier (URL), the individual's identifier (which could be an anonymous ID), and the tag. Currently, del.icio.us gives you a summary of the tags applied to a URL and of the people bookmarking it. To get the data I want, you would need to look at each person individually and see how they tagged the item. This might require hundreds of queries for popular items, a level of use that might get your del.icio.us account deactivated. Flickr apparently does not provide tags except in its HTML rendition.
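Here is a sketch of the desired record format: one `(url, user_id, tag)` row per tagging, flattened from per-user bookmarks and written out as CSV. The nested input shape `{user_id: {url: [tags]}}` is an assumption for illustration; it is not the export format of del.icio.us, Flickr, or any other service.

```python
import csv
import io
from typing import NamedTuple


class Tagging(NamedTuple):
    url: str      # item identifier
    user_id: str  # may be an anonymous ID
    tag: str


def flatten(bookmarks_by_user):
    """Turn {user_id: {url: [tags]}} into one Tagging per (url, user, tag)."""
    rows = []
    for user_id, bookmarks in bookmarks_by_user.items():
        for url, tags in bookmarks.items():
            for tag in tags:
                rows.append(Tagging(url, user_id, tag))
    return rows


def to_csv(rows):
    """Serialize taggings as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(Tagging._fields)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical input for two anonymous users.
rows = flatten({
    "anon1": {"http://example.com/": ["web2.0", "blogs"]},
    "anon2": {"http://example.com/": ["web2.0"]},
})
print(to_csv(rows))
```

A flat table like this is what makes the per-item, per-user analyses in this post a matter of simple grouping rather than hundreds of individual page lookups.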

Del.icio.us does allow you to retrieve all of the items with a specific tag, a bow to the homogeneity assumption.

Another desirable feature would be information about who is following (subscribing to or just browsing) whose tags. This information is readily available to the system operators, but it may or may not raise privacy concerns. It would allow me to study influence patterns in tagging (who notices items first; whose tags tend to be replicated). Even anonymized data would be useful here, because we might be able to infer general heuristics (such as “he who tags first is the one whose tags stick”).
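If timestamps were available, a heuristic like “he who tags first is the one whose tags stick” could be checked directly. The sketch below computes, across items, the fraction whose most common tag was among the first tagger's tags. The timestamped record format and sample data are hypothetical, and a real study would need to control for obvious confounds such as popular tags being popular regardless of who used them first.

```python
from collections import Counter, defaultdict


def first_tagger_sticks(records):
    """Fraction of items whose most common tag was used by the item's first tagger.

    records: iterable of (url, user_id, tag, timestamp) with comparable
    timestamps (e.g. ints or datetimes).
    """
    by_item = defaultdict(list)
    for url, user, tag, ts in records:
        by_item[url].append((ts, user, tag))
    if not by_item:
        return 0.0

    hits = 0
    for events in by_item.values():
        events.sort()  # earliest tagging first
        first_user = events[0][1]
        first_tags = {tag for _ts, user, tag in events if user == first_user}
        top_tag = Counter(tag for _ts, _user, tag in events).most_common(1)[0][0]
        hits += top_tag in first_tags
    return hits / len(by_item)


# Hypothetical timestamped records for two items.
records = [
    ("item1", "alice", "web2.0", 1), ("item1", "bob", "web2.0", 2),
    ("item1", "carol", "web2.0", 3), ("item1", "dave", "ajax", 4),
    ("item2", "bob", "misc", 1), ("item2", "alice", "design", 2),
    ("item2", "carol", "design", 3),
]
print(first_tagger_sticks(records))  # 0.5
```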

Conclusion

From its inception, folksonomy has been framed almost exclusively in terms of folk classification and retrieval. However, some of its foundational assumptions, particularly the “homogeneity assumption”, appear not to cover a number of commonly occurring cases. I've tried to propose an analytic view that can help overcome these limitations by making use of additional information contained in the folksonomy.

Use of folksonomy may currently appear limited to the digital elite. The two most commented-on implementations, Technorati and del.icio.us, are used by the elite. Flickr, the popular photo-sharing site, is a step toward popularization. Recently, Ben Hammersley announced that the Guardian's new Sunday paper site would include folksonomy, and folksonomy is built into The Port's new community blog product. Like Flickr, both of these offerings are targeted at a general audience.

As folksonomy reaches the more general population, it has rich potential as an ongoing data source for marketers and information architects alike.

[Update: A previous version of this post seemed to cast aspersions on information architects (IAs) in general by labeling the retrieval focus as the IA view and then critiquing it. That was not my conscious intention, so I removed the passages that led to that interpretation.]


About this Entry

This page contains a single entry by Bud published on February 23, 2005 4:06 PM.
