Tagging strategies

Dave’s earlier posts sparked some good conversation about tagging. Here is my proposal for how tagging could work on the new version of the site. This proposal isn’t necessarily what we will do; I’m putting it out there to get feedback from the community about whether it’s the right approach.

First, an overview. There are two ways to approach tagging:

  • Folksonomy: all the users use their own tagging schemes. There are tools to let users discover tags already in use.
  • Ontology: the owners of the site describe exactly what tags people can use, and expect people to use them.

Our goals are also twofold:

  • To help readers of science blogs more easily find the content they are looking for, and
  • To do so without imposing constraints on the authors of science blogs

I believe that a folksonomy is the best way to meet both goals: it imposes no constraints on authors, and, if things are done right, many of the tags should start to converge on their own. My suspicion is that if we specified a strict list of tags, users would not want to use them.

But how do we turn the folksonomy chaos into something useful? We will maintain a database of tags. Each tag’s entry in the database will have (at a minimum; this can be expanded later):

  • Name of tag (e.g., “tamarin”)
  • List of synonymous tags (“Saguinus”, maybe “tamarind” if we want to support common mistakes)
  • List of children tags (“cotton top tamarin”, “cotton top”, “Saguinus oedipus”, etc.; may be very long)
  • List of parent tags (“New World monkeys” — may be multiple)
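
To make the shape of an entry concrete, here is a minimal sketch in Python. The class name `Tag` and its field names are my own placeholders for illustration, not a settled schema, and a real version would live in a database rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One entry in the tag database (hypothetical field names)."""
    name: str
    synonyms: set[str] = field(default_factory=set)   # other spellings or names for the same thing
    children: set[str] = field(default_factory=set)   # narrower tags
    parents: set[str] = field(default_factory=set)    # broader tags; there may be several


# The example entry from the list above.
tamarin = Tag(
    name="tamarin",
    synonyms={"Saguinus", "tamarind"},
    children={"cotton top", "Saguinus oedipus"},
    parents={"New World monkeys"},
)
```

All three relationship fields default to empty, which matches the plan of starting with a flat list of tags and adding relationships over time.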

Bloggers may tag with any of the synonymous tags. Let’s say we do decide to support mistakes. Someone may tag “tamarin” or “tamarind”. Those are different tags, but our system understands that they are synonymous.

Someone searches for “tamarin.” They get a list of posts tagged with either “tamarin” or any of the synonymous tags (so “tamarind” or “Saguinus”).
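
Here is a rough sketch of how that lookup might work, assuming we store each group of synonyms under one canonical name. The sample posts, the `SYNONYMS` table, and the function names are all made up for illustration.

```python
# Hypothetical data: one synonym group and a few tagged posts.
SYNONYMS = {"tamarin": {"tamarind", "Saguinus"}}

POSTS = [
    {"title": "Tamarin social structure", "tags": {"tamarin"}},
    {"title": "Saguinus in the wild", "tags": {"Saguinus"}},
    {"title": "Quark flavors", "tags": {"charm"}},
]


def normalize(tag):
    """Compare tags case-insensitively and ignore stray whitespace."""
    return tag.strip().lower()


def build_index(groups):
    """Map every synonym (and the canonical name itself) to the canonical name."""
    index = {}
    for canonical, synonyms in groups.items():
        for name in synonyms | {canonical}:
            index[normalize(name)] = normalize(canonical)
    return index


def search(query, posts, index):
    """Return titles of posts tagged with the query or any of its synonyms."""
    wanted = index.get(normalize(query), normalize(query))
    return [
        post["title"]
        for post in posts
        if any(index.get(normalize(t), normalize(t)) == wanted for t in post["tags"])
    ]


INDEX = build_index(SYNONYMS)
```

With this data, `search("tamarin", POSTS, INDEX)` and `search("tamarind", POSTS, INDEX)` both return the two tamarin posts and leave out the quark post. Tags that have no entry in the index simply match themselves, so the search still works before anyone has entered any relationships.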

So what are some problems which might arise?

What if one tag is used for two entirely separate things?

A physics blogger uses “charm” to describe a kind of quark. An anthropologist uses “charm” to describe something used medicinally by a tribe of primitive people. A user searching for “charm” will get both.

I submit that this isn’t a huge problem. It isn’t going to happen all that often. When it does, in almost all cases, the user will be able to refine their search to say “I am only interested in ‘charm’ tags used on blogs with a ‘physics’ theme.” It will be annoying to the people who want to see what the parent/children tags are for “charm,” because they’ll get a weird mix of physics and anthropology subjects. But I think it is not going to happen often enough to really be annoying (and it is better than the alternative of trying too hard to control things).
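
The refinement I have in mind could be as simple as an optional filter on the blog's theme. This is only a sketch; the `theme` field and the sample posts are assumptions for the sake of the example.

```python
# Hypothetical posts: each carries its tags plus the theme of the blog it came from.
POSTS = [
    {"title": "Charm quark decays", "tags": {"charm"}, "theme": "physics"},
    {"title": "Healing charms", "tags": {"charm"}, "theme": "anthropology"},
]


def search_by_tag(posts, tag, theme=None):
    """Return titles of posts with the given tag, optionally limited to one theme."""
    return [
        post["title"]
        for post in posts
        if tag in post["tags"] and (theme is None or post["theme"] == theme)
    ]
```

Calling `search_by_tag(POSTS, "charm")` returns both posts, while `search_by_tag(POSTS, "charm", theme="physics")` returns only the quark post.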

Sounds like a lot of work to input parent/children/synonym relationships!

Yes. We will have to start with no relationships at all — just a big flat list of tags. Eventually, each subject area will have one or more curators who help manage it. Part of their jobs may be to input relationships for tags in their areas. We will have to make a user interface to make this very easy. Perhaps we will build a user interface to allow users to suggest the addition of new relationships, as well.

The point is that we can do this very gradually. The system will start working immediately, and then be improved with time.

What about brand new tags (“pepsi-gate” vs “pepsigate”)? How can curators possibly keep up with that?

In that case, I believe that the crowd will start to converge, if a) we provide incentives to use the same tags — “if you use the most popular tags, your post will be more discoverable and you’ll get more readers” — and b) we make it very easy for bloggers to find out what the relevant tags are.

Of course, we will provide a list of available tags, organized for readability once we have parent/child relationships. Additionally, we will need a tool to provide tagging suggestions to bloggers while they are writing blog posts. Again, that can be something to do a little ways down the road.

We can also provide a page on the site which offers lists of the currently most popular tags, maybe even the most popular new tags. If it’s clear to someone that they are about to browse “pepsigate” posts, then if they want to write a followup, they are likely to remember that that’s the tag they are responding to, and tag their post appropriately.
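
Building such a list is mostly a matter of counting. A minimal sketch, assuming we can get at the set of tags on each post (the sample data below is invented):

```python
from collections import Counter

# Hypothetical input: the set of tags on each recent post.
RECENT_POST_TAGS = [
    {"pepsigate", "blogging"},
    {"pepsigate"},
    {"pepsigate", "blogging"},
    {"pepsi-gate"},
]


def most_popular_tags(post_tags, limit=10):
    """Count how many posts use each tag and return the most used ones first."""
    counts = Counter(tag for tags in post_tags for tag in tags)
    return counts.most_common(limit)
```

Here `most_popular_tags(RECENT_POST_TAGS, limit=2)` puts “pepsigate” (3 posts) ahead of “blogging” (2 posts), and the lone “pepsi-gate” falls off the list. That is exactly the nudge toward the common spelling we are hoping for.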

Won’t this list of tags become so long that any tool which auto-suggests tags to users will become too slow to use?

This problem can be at least partly alleviated by letting users specify that they are only interested in tag suggestions from particular categories. Once parent/child relationships are in place in the tag database, tag suggestions can be filtered that way. We can also learn from other tools that offer auto-complete over large spaces to see how they solve this problem.
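
As one illustration of how this could stay fast, here is a sketch that keeps a sorted list of tags per top-level category and uses binary search to jump straight to the tags matching what the blogger has typed. The categories and tags are invented, and this is just one of several possible approaches, not a decision.

```python
import bisect

# Hypothetical data: tags grouped by top-level category, each list kept sorted.
SORTED_TAGS = {
    "biology": sorted(["peptide", "primate", "tamarin"]),
    "physics": sorted(["charm", "particle", "photon"]),
}


def suggest(prefix, sorted_tags, categories=None, limit=5):
    """Suggest tags starting with prefix, optionally only from chosen categories."""
    prefix = prefix.lower()
    chosen = categories if categories is not None else list(sorted_tags)
    matches = []
    for category in chosen:
        tags = sorted_tags.get(category, [])
        # Binary search finds the first tag that could match the prefix.
        start = bisect.bisect_left(tags, prefix)
        for i in range(start, len(tags)):
            if not tags[i].startswith(prefix):
                break  # the list is sorted, so no later tag can match
            matches.append(tags[i])
    return sorted(matches)[:limit]
```

Typing “p” with no filter suggests tags from both categories; passing `categories=["physics"]` narrows the suggestions to “particle” and “photon”.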

Have folksonomies been successfully used in the past? What are good examples?

Obviously, Flickr is the best example of a site which has completely user-generated tagging. Their mission is somewhat different from ours, though! Do you have examples of folksonomies that work or that have failed?

This post is intended to start discussion, so please, weigh in! What do you think about this approach to handling the huge number and variety of tags in use on science blogs? Is it clear, and do you have questions?

10 comments on “Tagging strategies”

  1. I think I’m missing a major point – are these tags applied by the authors when they’re writing their original posts or can the community tag posts that are aggregated in the system? Both?

    I like the idea of allowing all-comers for tags but also having gardeners to mark relationships. I guess the proof is in the implementation and trial.

  2. Jessica Hekman says:

    The primary intention is for these posts to be tagged by their authors when they are writing them. Community tagging might be something we’d address at some point, and I would think that this proposal would support that as well, but it isn’t what I was intending to discuss.

    I, too, am wondering about how this would shake out in the Real World. I guess you never know for sure until you try!

  3. Dave Munger says:

    Thanks for writing this up, Jessica.

    I guess a couple of folksonomies that I think *sort of* work but don’t do a great job are Twitter and Slashdot. Sometimes Twitter works great, when a hashtag catches on, but they’re inconsistently applied, and some of the best posts about conferences end up not getting tagged.

    On Slashdot, I believe any user can add a tag to any story, and some of them are pretty good, but if you click on a tag like, say “energy,” it seems to me that maybe half the posts that come up are relevant. It makes me wonder whether I’m missing a lot of relevant posts that weren’t properly tagged as well.

    I like the idea of curators establishing relationships between tags. It does make me wonder if we at least want to come up with a set of basic parent tags — maybe 50 or so that are the highest-level tags possible, which are accessible in a list somewhere, so readers know where to start when looking for information — I find tag clouds to be very limiting in this regard. I’m not necessarily looking for the most popular tag, I’m looking for a particular level of tag that encompasses many others. Does that make sense?

    It’s possible that this list of highest-level tags could still be generated via some sort of folksonomy, but I guess what I’m looking for is for some way for readers to easily grasp the structure of the information. Maybe we could come up with an entirely new graphical way to show the relationships between tags.

    Also, see Don Sawtelle’s comment on my post about curation for more ideas about tagging.

  4. Jessica Hekman says:

    Yes, I think having some high level tags to provide structure is a great idea. It seems like if we are going to provide a list of “themes” for people to choose from to categorize their blogs (I am trying not to use the word “category” since it sometimes means the same thing as “tag”!), those “themes” could also serve as high level groupings for the tag cloud.

    WinnowTag is intriguing! It sounds like it could help a lot with providing some structure (or at least tools to let people identify tags/posts). I think that’s definitely worth checking out when we get to that point.

  5. I agree that some superordinate parent list (perhaps the very same list as the “themes”) ought to be created from the top-down, and the “children” tags could be attached to them from the bottom-up. Perhaps in the same way that we have area-specific “editors” at RB.org, we could implement a set of some 20 area-specific “curators” here, to facilitate that process. Perhaps after the first year, the tag-taxonomy (tagsonomy?) would be stable enough that it could be maintained by a smaller group of individuals (e.g. fewer than 10).

  6. Jessica Hekman says:

    Yes. We have already been discussing the usefulness of having curators, and hoping people will be interested in that! They might do other things besides wrangle tags — like pick out posts to highlight, filter out spam, generally do work to keep their area lively.

  7. Dave Munger says:

    Jessica: One more concern about tagging. What do we do when (inevitably), some users don’t tag their posts? We could use categories as tags, but that may not work very well because people use categories in very different ways — I think Mark said there were something like 15,000 different categories on his Science 3.0 aggregator.

    We could also assign (or allow users to assign) default tags to blogs (e.g. tag Pharyngula as “biology” and/or “atheism”). But obviously that’s not as helpful as post-specific tags. Maybe if a post was only default-tagged it would display lower in a set of results? Maybe we could subtract a day from the date if a post was default-tagged instead of user-tagged. There could be a similar (smaller?) penalty if a post was category-tagged.

    There’s also the possibility of allowing users to tag posts, but I think that’s probably not going to be a high-priority feature.

  8. Jessica Hekman says:

    I want to make sure we have our terms straight before answering this (because I think I am misusing the term “category”). A “tag” is something that you use to mark a subject for a particular post. Is a “category” not used exactly the same way? Can you give an example of when someone would “categorize” a blog post in a way that wouldn’t be like a tag?

    I think that if a blog really has no tags (be they “tags” or “categories”) then posts to that blog will just not come up when someone does a tag search. That doesn’t mean that that blog is inaccessible. It will show up when people browse through blogs by theme — so in your example, someone may want to look for all “atheist” blogs. But if they search for specific posts tagged “atheist,” and that “atheist” blog used no kinds of tagging on its individual posts, no posts would show up.

    Does that make sense? And is it a big problem? Actually, I would be curious to know what other people think, too. Can people give examples of science blogs that don’t use tags (or categories)?

    I feel that tagging is an important way to communicate what a post is about in a machine-readable way. My hope is that if we make tags more useful than they currently are, more people will use them, and this problem will be solved. I could be wrong, but as I keep saying, you never know until you try. I am willing to be told that this is too big a problem to just set aside like that, though!

  9. Dave Munger says:

    To me the difference between a tag and a category is that a category is blog-specific, while a tag is not — it’s designed for a much wider audience. So while I, on my blog, might want to differentiate between color perception and face perception, in a broader context, it’s all perception. Or maybe it’s all just psychology.

    A category is a way for individual bloggers to divide their blog into manageable chunks. A tag is a way a blogger describes their blog’s content to the rest of the world.

  10. Jessica Hekman says:

    OK, makes sense. So your original question was “if a user doesn’t use tags, can we use categories instead?” I think we had originally been thinking yes, we would use categories. Maybe we need to do some work to research how people are using categories and make the decision about whether they make sense to consider interchangeably with tags.

    At any rate it should not be hard to add categories later, if we want to start with just tags and see how far that gets us.
