Friday, 12 January 2007

Technorati bug: multiple word tags - should you be tagging differently?






[TWICE UPDATED: important, see the end]

Either Technorati's tagging system or their tag pages (and tag page searches) can't handle multiple word tags properly.

Is this a bug?

They should be able to cope with multiple-word tags, i.e. tags that consist of more than one word (like "A Consuming Experience").

Their tags help page says:
"Please note that two word tags should be joined by a "+". For example:
<a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a>
<a href="http://technorati.com/tag/[tagname]+[tagname]" rel="tag">[tagname tagname]</a>
<a href="http://technorati.com/tag/global+warming" rel="tag">global warming</a>"

The official page on the rel-tags microformat also confirms this.

So far, so good. (And the Magical Sheep tagger for Blogger users therefore follows that format in constructing multiple word tags).
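To make the format concrete, here's a minimal sketch in Python of how a tagging tool might build such a link. This is not how Magical Sheep or Technorati actually do it; the function name `tag_link` and the constant `TAG_BASE` are made up for illustration, and the base URL is simply the one from the help page quoted above.

```python
from html import escape
from urllib.parse import quote_plus

# Base URL as quoted from the Technorati help page; the name is hypothetical.
TAG_BASE = "http://technorati.com/tag/"


def tag_link(tag: str) -> str:
    """Return an <a rel="tag"> link for a tag, joining its words with '+'."""
    # quote_plus turns spaces into '+' and percent-encodes other unsafe characters.
    href = TAG_BASE + quote_plus(tag)
    return f'<a href="{escape(href)}" rel="tag">{escape(tag)}</a>'


print(tag_link("global warming"))
# <a href="http://technorati.com/tag/global+warming" rel="tag">global warming</a>
```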

But it doesn't actually work properly when you go to Technorati's tag pages, or try to search for tags.

Let's take an example.

If you want to tag a post with "Brian Robertson" you would use the tag <a href="http://technorati.com/tag/Brian+Robertson" rel="tag">Brian Robertson</a>

Go to the tag page you get by clicking on that tag (screenshot below; click on the pic to enlarge it, or click the link itself if you prefer).

At the time of writing anyway, you'll see the Technorati tag page says there are 7 posts with that tag. It's exactly the same webpage you'd get if you did a search in tags on Technorati and entered as your search term Brian Robertson.


Now, search in tags on Technorati for "Brian Robertson", and this time put quotes around the words:


You only get 4 posts this time. Look at the address bar, and it says http://www.technorati.com/tag/%22Brian+Robertson%22 (%22 is just what quotation marks get turned into in URLs).
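If you want to check the %22 business for yourself, Python's standard library does the same translation. This is just an illustration of URL encoding in general, not of anything specific to Technorati:

```python
from urllib.parse import quote_plus, unquote_plus

# Encode the quoted phrase the way it appears in the address bar.
encoded = quote_plus('"Brian Robertson"')
print(encoded)                # %22Brian+Robertson%22

# Decoding reverses it: %22 becomes a quotation mark, + becomes a space.
print(unquote_plus(encoded))  # "Brian Robertson"
```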

What on earth's going on? You should be getting exactly the same number of posts, indeed exactly the same posts, returned in both cases - i.e. all the posts that have been tagged Brian Robertson, quotes or no quotes.

The answer is, it's down to how Technorati have set up their search syntax.

Yep. You guessed it. Their system is such that if you search for:
Brian Robertson
or
Brian+Robertson

it does exactly the same thing, which is an "AND" search for Brian AND Robertson.

In other words, it finds all posts which have a tag with the word Brian in it AND which also have a tag with the word Robertson in it. Here's a prime example: one post has been tagged with "brian urlacher", and also with "tyna robertson", and that's why it appeared on that tag page.

That's no good of course for anyone who wants to find only posts that have been tagged with the full phrase Brian Robertson (rather than posts tagged "brian urlacher" and "tyna robertson", or indeed - another real example - with "brian springer" and "pat robertson").

If you want to find a multiple word tag exactly as it is, you will have to insert quotes around the phrase you're searching for, i.e. use "Brian Robertson" with the quotation marks - or else you'll get extra hits that aren't really true hits at all.
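Here's a rough sketch in Python of the difference between the two kinds of search. I don't know how Technorati's search is really implemented, so treat this purely as a model of the behaviour described above; the post names, the `posts` dictionary and both function names are invented for the example.

```python
# Hypothetical posts and their tags, modelled on the real examples above.
posts = {
    "post-1": ["brian robertson"],
    "post-2": ["brian urlacher", "tyna robertson"],
    "post-3": ["brian springer", "pat robertson"],
    "post-4": ["pat robertson"],
}


def and_search(query, posts):
    """Match posts where every query word appears in at least one tag."""
    words = query.lower().split()
    return [
        name
        for name, tags in posts.items()
        if all(any(word in tag.split() for tag in tags) for word in words)
    ]


def phrase_search(query, posts):
    """Match only posts that carry the whole phrase as a single tag."""
    phrase = query.lower()
    return [name for name, tags in posts.items() if phrase in tags]


print(and_search("Brian Robertson", posts))     # ['post-1', 'post-2', 'post-3']
print(phrase_search("Brian Robertson", posts))  # ['post-1']
```

The "AND" version drags in post-2 and post-3 even though neither is tagged Brian Robertson, which is exactly the sort of false hit seen on the tag page.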

Tag searching

For tag searchers, the moral is this:

When you're doing a tag search on Technorati, or just entering the URL to go straight to the tag page you want, and you're interested in a multi-word tag, you must put quotes around the search words (or %22 in place of each quote in the URL, though I think quotes typed into the address bar of common browsers do get translated properly).

That's the only way to find exactly what you're looking for. Otherwise it may get confusing!

Bloggers and tagging

But what if you're tagging your posts? What should you do?

Well, you've been doing nothing wrong in using the + symbol to join the words of tags that consist of two or more words. That is after all what both Technorati and the rel="tag" microformat spec say. It's absolutely correct to do that.

I think the ideal solution really is that Technorati should fix their searching syntax so that a multiple word search term entered with spaces between is automatically interpreted as a phrase search with quotes. And that the + symbol between words in URLs going to their tag pages, as well as in their tag searches, should be treated in the same way.

In other words (see my previous post on the search syntax) they should sort it out so that searches (and corresponding URLs) for
A B
and for
A+B
are treated in the same way exactly as a search for
"A B"
is at the moment.

And, for those who do want to be able to do proper "AND" searches, they should introduce the ability to use search terms like
A AND B
and ensure that using that syntax will perform that function.
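To spell out what I'm suggesting, here's a small Python sketch of how a query could be interpreted under those rules. It's only my proposal written as code, not anything Technorati have implemented, and the function name `interpret` is made up.

```python
def interpret(query: str):
    """Classify a query as ("and", terms) or ("phrase", [phrase]) under the proposed rules."""
    q = query.strip()
    if " AND " in q:
        # Explicit AND: each side becomes a separate term.
        terms = [term.strip().strip('"').lower() for term in q.split(" AND ")]
        return ("and", terms)
    # Everything else is a phrase: drop surrounding quotes, treat '+' as a space.
    phrase = q.strip('"').replace("+", " ")
    return ("phrase", [" ".join(phrase.lower().split())])


for q in ["A B", "A+B", '"A B"', "A AND B"]:
    print(q, "->", interpret(q))
```

The first three queries all come out as the same phrase search, and only the last one is treated as an "AND" search.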

Are Technorati really going to do that, when after nearly 2 years they still haven't even fixed the bug where properly-tagged posts don't show up on their tag pages at all? (which I've been banging on about for ages, as Zo and others have noticed!)

I doubt it, somehow.

So, should we be tagging our posts with %22 for multiple word tags? E.g. using this format, where the extra bits are the %22 at each end of the tag in the URL:
<a href="http://technorati.com/tag/%22Brian+Robertson%22" rel="tag">Brian Robertson</a>

UPDATED AGAIN: That ain't what the rel-tag microformat says to do, and the answer is NO, you should NOT do it if you want your posts to be tagged properly. Sure, on Technorati anyway, it will enable people who click on the tag to find the exact tag page they want, listing posts where the exact words appear in the same tag in the same order, not some random posts where the words happen to appear in different tags. But really, what's the point of that if your post doesn't appear on Technorati's tag pages at all, because the use of quotes is not recognised by the microformat?

Putting quotes in tags will muck up how your posts appear on the tag pages or tag searches of Technorati as well as other blogosphere search engines. This post has been crawled by Technorati yet its tags haven't been picked up by Technorati AT ALL. However, a repeat post of this post, which was virtually the same (with a different intro, "[Deliberate repost...", and some minor text edits) BUT, crucially, had NO quotes around the tags, DID get picked up on Technorati's tag pages - see this screenshot:


So DON'T try using quotes round your tags or they won't get picked up properly as tags (which makes sense, as per the rel-tag spec which doesn't require or allow for quotes round multiple word tags). Basically it looks like Technorati should be fixing their searching in the way I suggested above, or something similar, and we'll all have to live with multiple word tag links on Technorati doing "and" searches on tags instead of a proper phrase search in the meantime. But I wouldn't hold my breath.
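One way to see why quotes are a problem: under rel-tag, the tag is taken from the last part of the link's URL, not from the link text. Here's a minimal Python sketch of that extraction, assuming (as the help page quoted earlier implies) that + stands for a space. The function name `tag_from_href` is made up, and I can't say whether this is the actual reason Technorati dropped the quoted tags; it only shows that the quotes become part of the tag itself.

```python
from urllib.parse import urlparse, unquote_plus


def tag_from_href(href: str) -> str:
    """Return the tag named by a rel-tag link: the decoded last path segment."""
    last_segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: '+' is decoded as a space, as in the two-word examples above.
    return unquote_plus(last_segment)


print(tag_from_href("http://technorati.com/tag/Brian+Robertson"))
# Brian Robertson
print(tag_from_href("http://technorati.com/tag/%22Brian+Robertson%22"))
# "Brian Robertson"
```

With the %22 version, the tag comes out with literal quotation marks in it, so it's a different tag from plain Brian Robertson.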

(With thanks to Rev. Brian Robertson for spotting this issue, and yes I've deliberately used his name for the examples in honour of his discovery!)

4 comments:

Ashish Mohta said...

I won't call it a bug. Google search engines work differently, while Technorati works on single words. So making it a formed URL is the only option... that's what you showed.
We have to understand how it works rather than saying it is a bug

Improbulus said...

Thanks for the comment Ashish.

To be honest I deliberately used "bug" in the title to try to draw Technorati's attention and hopefully get them to read this post.

My point is that SOMEthing is wrong. After further investigation (see the start of this post) it's evident that the problem is how Technorati have set up their tag pages and tag searches. If they change things so their searching works as suggested in my post, then the problem is solved, for everyone.

And I've found that adding the quotes does muck up your post getting tagged on Technorati and elsewhere (as I suspected might be a possibility). So no one should be doing that.

Unless Technorati fix their tag searching, better to get tagged properly and live with picking up irrelevant posts with "and" rather than "exact phrase" tags, than not to get tagged at all.

Dave Mitchell said...

Ashish,

Thanks for the post. That is great.

Dave

David Bruce said...

I just found this... googled it actually.

The date of this post is almost exactly a year ago: any updates since then?

I want to tag my blog posts and want them to show up in technorati

so am I to understand that we should be adding a plus sign to connect more than one tag?

Frederick+Internet+Advertising

and not

"Frederick Internet Advertising"

is this still true?

Great post BTW
I'll be subscribing
got a feedburner email delivery to go with this great info?