Saturday, 5 March 2005

Technorati tags: "related tags", tag spamming, etc

[Added 10 April 2005: It looks like Technorati have officially launched their related tags now, see this post.]

Related Tags

I've just noticed that Technorati have now introduced a "Related Tags" feature, though I can't yet find anything on their site mentioning it.

After you search Technorati for a tag (for how, see e.g. my intro to Technorati tags), in your search results under "X posts from Y blogs match this tag" there's now a line that says:
Related Tags: a, b, c, d
(where a, b, c, and d are just clickable links to similar tags).

I mentioned the difficulty of finding tags on related subjects towards the end of my previous post (in the "Any downside..." section) - e.g. a search for "Humour" won't find things tagged with "Humor"; and I made the point that in my view Technorati needs some kind of thesaurus of synonyms or the like. It's good to know they've obviously been thinking along similar lines.

What I'd really be interested to find out is how those lists are produced. Is there a human-maintained Technorati "thesaurus" of synonyms behind the scenes, which gets looked up when you search for a tag? Or are Technorati running some whizzy software which looks at how people tag their own blog posts, pics and bookmarks, and decides that if one person tags an item with a, b, c and d, then those tags must be related? It could then take into account which tags other people use together on their own items too, to come up with some kind of average, weighted by "popular vote", as to which tags are related to each other.
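To make that second guess concrete, here is a minimal sketch in Python of what such a "popular vote" might look like. To be clear, this is only my illustration of the idea, not Technorati's actual method, which I don't know. The function name `related_tags`, the sample authors and the sample tags are all made up for the example. It counts how many different people have used two tags on the same item, and lists the tags with the most "votes" first.

```python
from collections import defaultdict
from itertools import permutations


def related_tags(tagged_items, top_n=3):
    """Guess related tags from co-occurrence (hypothetical helper).

    tagged_items is an iterable of (author, tags) pairs, one per tagged
    post, picture or bookmark.
    """
    # voters[a][b] = set of authors who put tags a and b on the same item
    voters = defaultdict(lambda: defaultdict(set))
    for author, tags in tagged_items:
        normalized = {tag.lower() for tag in tags}
        for a, b in permutations(normalized, 2):
            voters[a][b].add(author)

    related = {}
    for tag, others in voters.items():
        # Most "votes" first; ties broken alphabetically so output is stable
        ranked = sorted(others, key=lambda t: (-len(others[t]), t))
        related[tag] = ranked[:top_n]
    return related


# Made-up sample data, purely for illustration
items = [
    ("alice", {"funny", "humor", "politics"}),
    ("bob", {"funny", "humour", "politics"}),
    ("carol", {"funny", "humor"}),
    ("alice", {"funny", "politics"}),
]
result = related_tags(items)
print(result["funny"])  # ['humor', 'politics', 'humour']
```

Counting distinct authors rather than raw uses is an assumption on my part: it stops one prolific tagger from outvoting everybody else, which seems closer to the spirit of a "popular vote".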

I suspect it may be something like the latter, because the list of related tags is variable. If you have an idle moment or two, it can be fun doing random searches and seeing what comes up as related tags.

For instance, when searching for the tag "Funny", the related tags are said to be "News, Politics, random, Humor, Humour, Pictures, Blog, Music, Web, Gaming." I kinda like the association between "Politics" and "Funny"!

What's even more interesting is that if you search for one of the tags in the "Related Tags" list, you won't necessarily get back the other words that were on the same list (as you might expect with a pure thesaurus lookup system).

So, the tag for "Categories" gives you as related tags Tags, wordpress, Coding, Plugins, Wanted, Weblog Technology, PHP. But the tag for "Tags" brings up as related tags, not "Categories", wordpress, Coding, etc - but technorati, Del.icio.us, folksonomy, Blogging, Blog, pivot, Taggerati.
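A lopsided result like that is at least consistent with a popularity-ranked list that gets cut off after the top few entries. Here's a small Python sketch of the effect. The counts are invented for illustration (they are not Technorati data), and `top_related` is a hypothetical helper, not anything Technorati has published.

```python
# Invented co-occurrence counts: how often each pair of tags appears together
cooccurrence = {
    frozenset({"categories", "tags"}): 5,
    frozenset({"categories", "wordpress"}): 4,
    frozenset({"tags", "technorati"}): 40,
    frozenset({"tags", "folksonomy"}): 30,
    frozenset({"tags", "del.icio.us"}): 25,
}


def top_related(tag, counts, top_n=2):
    """Return the top_n tags most often paired with tag (hypothetical helper)."""
    scores = {}
    for pair, n in counts.items():
        if tag in pair:
            (other,) = pair - {tag}  # the other member of the pair
            scores[other] = n
    # Highest count first; ties broken alphabetically
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


print(top_related("categories", cooccurrence))  # ['tags', 'wordpress']
print(top_related("tags", cooccurrence))        # ['technorati', 'folksonomy']
```

The underlying counts are perfectly symmetric, but "Tags" is a busy tag with plenty of stronger partners, so "Categories" drops off the end of its list, while "Tags" is still the strongest partner that "Categories" has.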

It's all early days still, of course, but if I'm right in my guess as to how Technorati are coming up with those related tags lists, this may be the start of the next stage of development in folksonomies and tagging: namely, synonyms generated automatically by looking at what people consider to be synonyms, weighted according to the number of people who make the same word associations (or maybe combined with some other kind of weighting, who knows?). Fascinating stuff.

Tagging old posts; tag spamming

While I'm on Technorati, just a warning note: I've still not heard back from Technorati on why my original attempts at using their tags failed. But from correspondence about the possibility of tagging old posts (if I have the energy and time!), and how to then get those tagged posts re-indexed, I've found out that (as you'd expect) Technorati do have some kind of mechanism in place to pick out possible tag spammers, or at least link spammers.

And while I don't know how that works, if you have lots of Technorati tags on the same page (e.g. your main blog page), you do run the risk of being tagged (if you'll forgive the pun) as a link spammer.

Now I know Technorati can't give away the details of how they suss out link spam, or else the spammers could use that to circumvent their system, but all the same I personally would like some rough guidelines from them as to what we legit bloggers should or shouldn't do, in very general terms, to avoid being considered spammers.

The only thing I know for sure is, the fewer tags per page, the less likely you are to be blacklisted.

Which is not very good news for people like me who, trying to get around the "lack of synonyms" issue, tag with singular and plural variations of the same word plus related words in order to increase the likelihood of people finding the right information.

Old posts - final note: do NOT try to tag old posts and then ping Technorati with the exact URL of each post or of your archive directory, in an attempt to get them to re-index your newly-tagged old posts. They say this will mess up your blog listing on Technorati, as their spider may then think each old post page is a separate blog...
