Building a better aggregator: Goals, Tagging

The ScienceBlogging site you see now was always intended to be a temporary solution. What we really need is a site that not only aggregates blog posts, but also allows users to classify them, search them, highlight their favorites, point their friends to them, and do many other things we haven’t even imagined yet.

Behind the scenes, Bora, Anton, Jessica, Mark, and I have been discussing how to do that, but we realized that limiting the discussion to just ourselves is depriving us of a valuable resource: The people who’ll be using and contributing to the new site.

So, over the next few days, I’ll be offering some thoughts about how to proceed and inviting your comments. Our plan is to have at least a partially functional, working prototype of the new site by the ScienceOnline conference in January 2011. Let’s get that started right now by discussing the goals for the site.

Goals
Here are the goals we came up with for the site:

  • To be a central site where scientists, media, other experts, and laypeople see what scientific topics are being discussed on blogs, in real time
  • To be a resource for locating past discussions
  • To promote science blogging and other online discussion of science
  • To promote scientific accuracy and avoid pseudoscience and crackpottery
  • To be encyclopedic and inclusive
  • To be searchable and filterable
  • To have a system (or multiple systems) for highlighting discussions and posts that are especially topical / high quality
  • To have a means of removing or hiding posts that are not scientific (e.g. vacation photos, political rants unrelated to science, etc.)
  • To be multilingual
  • To be open source / open access

Should anything be added, changed, or removed?

Tags
One of the first considerations will be how to keep track of all this information, and a huge key to that will be classifying it. That’s why we think it will be essential to have a unified tagging system in place. If bloggers don’t select their primary tags from a central list, then it will be difficult for users to find posts on the topics that interest them. On the other hand, if bloggers must visit our site to choose primary categories, then usage will suffer. We can allow bloggers to set default tags for their posts using their registration page, but there should be some way to override those settings for individual posts, still using our list of preferred tags.
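As a rough sketch of how default tags with a per-post override might work: the preferred-tag list, its example values, and the function name `resolve_tags` below are all hypothetical, invented for illustration rather than taken from any existing site code.

```python
# Hypothetical central list of preferred tags (illustrative values only).
PREFERRED_TAGS = {"primatology", "cognitive psychology", "neuroscience"}


def resolve_tags(default_tags, post_tags=None):
    """Return the primary tags for one post.

    default_tags: tags the blogger chose on their registration page.
    post_tags: optional per-post override; None means "use the defaults".
    Only tags on the preferred list are kept, in their original order.
    """
    chosen = default_tags if post_tags is None else post_tags
    return [tag for tag in chosen if tag in PREFERRED_TAGS]


# A post with no override falls back to the blogger's defaults.
print(resolve_tags(["cognitive psychology"]))
# A post with an override uses it, minus anything not on the central list.
print(resolve_tags(["cognitive psychology"], ["primatology", "my band"]))
```

The override replaces the defaults rather than adding to them, which is only one possible design; merging the two lists would be just as easy.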

Could we create a WordPress plugin for this? Or adapt an existing plugin? What about other blogging platforms? What about templates that don’t support tags? One possibility is using a bookmarklet, which would be platform neutral but not ideal. Any other ideas on how to implement a tagging system?

That’s just the first bit — there’s a lot more to discuss, but we thought this would be a good way to get the conversation started. So please, let us know what you think in the comments.

11 comments on “Building a better aggregator: Goals, Tagging”

  1. There’s also the issue that some use categories and others use tags, depending on the way a certain blogger has organized his or her blog or on the platform.

    Perhaps there might be a way to “register” with scienceblogging.org (similar to RB.org) – and the blogger then chooses how he/she wants scienceblogging to aggregate his/her posts: by category, or by tag.

    Then, he/she can assign his/her unique tags to match up with the scienceblogging universal tags. For example, I can take my unique categories “rhesus monkey” “chimpanzee” “bonobo” “cotton-top tamarin” and “howler monkey” and match them to the scienceblogging universal tag “primatology.” That would reduce redundancy on people’s own blogs (e.g. if I’ve already got 3 dozen categories, I’d rather not suddenly need 3 dozen more to make it compatible), and would allow some coherence on scienceblogging.org.

  2. Dave Munger says:

    That’s a really neat idea, Jason. Does anyone know if there’s a standard XML format for categories? How difficult would it be for a central aggregator to read categories off a blog?

  3. Mary Canady says:

    Hi–great idea. Perhaps this is a big can of worms, but regarding tagging, is there a way to tie into the Linked Data project?

    http://linkeddata.org/

    May be a naive question, I am not a semantic web expert, but unifying science blogging posts with research databases and publications, etc. using the same tags would be incredibly powerful.

    Mary

  4. The more I think about registering blogs with scienceblogging.org, the more I like it. There could be some measure of control (similar to RB.org) – but also, this way even if the blogger did not want to go through the trouble of matching tags or categories or whatever, he or she could still assign the blog as a whole with one or more categories.

    In this way, if a user wanted to find all the psychology blogs about animals, they could see my blog in a list, with perhaps links to the 3-4 most recent posts.

    But if they wanted to find *posts* about cognition and dogs, they’d have to filter by scienceblogging universal categories.

    Perhaps the way to organize this is to allow the user to filter by “theme” (associated with a blog as a whole) or by tag or category (associated with individual posts). Or both at the same time (“all posts about primates but only from psychology blogs and not from biology blogs”)

  5. Jessica Hekman says:

    Mary: really good point. I don’t know about the others, but I at least hadn’t considered the possible usefulness of this site to connect research blogging posts with the articles that they are about. Obviously, bloggers usually link to the article, but it would be really interesting to go the other way — given an article, what posts are about it? If people use researchblogging.org’s tagging system for marking up the URL of the article, it shouldn’t be too hard to pull out the article’s information when we are crawling a post anyways. Again, not to speak for the group: I think it’s a great idea, but not something we’d be able to get to in the first pass of the site. They can chime in if they disagree with me!

    Jason: Yes, exactly. People should be able to put together their own joint feeds for all posts about psychology blogs on the fly — it would be silly to try to compete with RSS aggregators like Google Reader, of course, but I have been imagining that someone who has just discovered the joys of reading psych blogs could come here and get a dose of all the recent stuff in that area, then decide what they want to subscribe to long term. One obvious application is for MSM reporters who want to know what’s being talked about in a particular area. I had been using the word “category” to mean the theme of a blog, but you are right, that term is already in use to mean something that is used like a tag is used, so I like your “theme” terminology better.

  6. Mary Canady says:

    Hi Jessica–I don’t fully understand what you’re saying, maybe we’re on the same page. Basically what I meant was, if someone’s talking about, for example, MAP Kinase in a post, it would be great if this were interrelated to all other posts on that entity. Using the unified tags from linkeddata.org would seem to be the way.

    Mary

  7. KBHC says:

    “To have a means of removing or hiding posts that are not scientific (e.g. vacation photos, political rants unrelated to science, etc.)”

    This is the only one that worries me. If there was a way for the authors of the posts to do this only, that would be ok. But again, we need to think about how we define scienceblogging. Scienceblogging isn’t the same as researchblogging. There are very important contributions to scienceblogging that, by some definitions, aren’t scientific, like a father blogging about being up all night with his sick daughter and almost missing a grant deadline, or a grad student describing an incident of sexism during her fieldwork. I don’t want to exclude posts about the life of science or process of science, and so I don’t know that I really want people to be able to hide or remove posts.

  8. Dave Munger says:

    KBHC: I agree, all of this stuff is ultimately related to science, but I still think if someone writes a post about their favorite band, there should be some way for users to filter that kind of thing out—or to display absolutely everything, if they prefer.

    I wonder if there’s a better way to phrase this goal to indicate that we wouldn’t be making a judgment about what people are blogging about, but that we do want to make it possible for users to focus in on the topics they are interested in. Maybe some of the other goals (like “to be searchable and filterable”) cover that and we just don’t need this one.

  9. Jessica Hekman says:

    Dave, Anton, Mark, Bora, and I had some discussion offline. I said this, and Dave suggested I post it here. This is in reference to Jason’s idea about mapping tags from user tags to a central list:

    > …I think it is worth at least thinking about a way to get users
    > to use the same tags. The other problem about mapping tags is that it
    > means we have to stay really involved, and I really think one use of
    > tags is for current conversations (e.g. Pepsigate). I hate the idea of
    > our having to stay on top of brand new tags like that from all over the
    > blogosphere! It would be great if we could find some sort of tool to
    > allow people to come to a consensus. Obviously it would not be perfect,
    > but none of these solutions are perfect.
    >
    > I’ll ruminate on it. Something to let people who are about to tag
    > something ask “what are recent tags? what are tags in these categories?”
    > etc. A Firefox plugin? A WordPress plugin?

    Since I wrote that, I had some thoughts about how such a system would work. I’ll try to write that up and post it in the next day or so.

  10. My hesitation with this is that some users (I assume I’m not the only one) have designed the organization of their categories or tags in a very explicit and methodical way to be useful to the readers. Some users will have several years’ worth of posts already organized in such a way. To require that users add a handful of new tags in order to tap into the scienceblogging system could be unwieldy. And changing the names of those tags/categories would invalidate hundreds of links to those categories around the blogosphere.

    Even if the use of tags were more for current topics (like Pepsigate) than long-term topics (like primatology), you still run into the problem of matching. Some people use “pepsigate” “pepsi-gate” “pepsipocalypse” “#pepsigate” “scibloxymoron” “#scibloxymoron” etc etc.

  11. Dave Munger says:

    Jason,

    I agree that those are some potential problems. I don’t think anyone is suggesting that bloggers abandon their carefully-crafted categories, though. What we’re talking about is an additional layer of tagging beyond that.

    Generally the tags for v 2.0 would be more general than the specific tags on your blog. So “change blindness,” “representational momentum,” and “color perception” on a blog might all be under “cognitive psychology” on v 2.0. Maybe you’d specify “cognitive psychology” as one of your default tags, so you wouldn’t need to tag most of your posts at all for them to be properly tagged on v 2.0.

    I think ideally you wouldn’t need to maintain a list of tags on your blog at all — we would provide some sort of plugin or separate web page with a list of suggested tags, and you’d just pick the one or two that applied. If you didn’t see a tag you like, you’d type in your own, and if enough people used that tag, then maybe we could come up with a way to add it to the list.

    I’m not sure what to do about the “pepsigate” “pepsi-gate” “pepsipocalypse” “#pepsigate” “scibloxymoron” “#scibloxymoron” problem, though. Maybe there could be a list of recent tags and a list of most common tags over the long term, and people could choose from both? Any other ideas?
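
The tag-variant problem and the "recent tags plus most common tags" idea raised in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `normalize` and `tag_suggestions`, the normalization rule (lowercase, then drop "#", hyphens, and spaces), and the sample data are all hypothetical.

```python
from collections import Counter


def normalize(tag):
    """Lowercase a tag and drop '#', hyphens, and spaces (an assumed rule)."""
    return "".join(ch for ch in tag.lower() if ch not in "#- ")


def tag_suggestions(tags_oldest_first, recent_n=3, common_n=3):
    """Return (recent, most_common) lists of normalized tags.

    tags_oldest_first: every tag use seen so far, oldest first.
    """
    normalized = [normalize(t) for t in tags_oldest_first]

    # Walk from newest to oldest, keeping the first sighting of each tag.
    recent = []
    for tag in reversed(normalized):
        if len(recent) >= recent_n:
            break
        if tag not in recent:
            recent.append(tag)

    # Count every use to find the most common tags over the long term.
    most_common = [tag for tag, _ in Counter(normalized).most_common(common_n)]
    return recent, most_common


# Hypothetical tag history, oldest first.
used = ["pepsigate", "primatology", "pepsi-gate", "#pepsigate", "scibloxymoron"]
recent, common = tag_suggestions(used)
print(recent)
print(common)
```

A rule like this merges "pepsigate", "pepsi-gate", and "#pepsigate", but it cannot know that "pepsipocalypse" or "scibloxymoron" refer to the same event. Those would still need a hand-maintained alias list or some way for bloggers to agree on one tag.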
