Tuesday, 28 March 2006

Copyright: Creative Commons licences effective in court






If you use a Creative Commons copyright licence for your blog, like I do, you may be pleased to hear that various courts have recently recognised CC licences. You can see the "Some rights reserved" notice and CC logo at the bottom of each of my blog pages.

Amsterdam, Netherlands court upholds Flickr photos CC licence

Blogger and podcaster Adam Curry posted photos on the popular photo-sharing website Flickr under a Creative Commons Attribution-Noncommercial-Sharealike license (the most common one, I think - it's certainly what I use for my own blog). That means non-commercial use of those photos is allowed as long as the author is credited. A Dutch tabloid however published a story about his children which reproduced some of his Flickr photos without his consent, and when he sued for breach of copyright and privacy infringement in the Netherlands, he won on the copyright front.

The judge didn't think much of the tabloid's argument that, because the Flickr website said "this photo is public", they thought it was OK to use the photos without permission, or of their claim that they hadn't bothered checking the link to the CC licence conditions because it wasn't obvious. The judge said all they had to do was click on the symbol by the "Some rights reserved" notice and then they'd have seen a summary of the licence, and if they weren't sure about it they should have asked his permission. (Via Out-law; that article has links at the end to his blog post and CC Canada's fuller info on the case plus the Dutch judgement too. The main CC blog also has a post about this.)

So, if you have photos anywhere - could be on Flickr, could be on your own blog etc - which you've licensed under a clear CC licence, you can rest easy that in the Netherlands at least your rights will be enforced (and there's hopefully no reason why courts elsewhere shouldn't enforce them either).

Spain - Spanish court says CC-licensed music isn't subject to collecting society fees

Then when the Spanish collecting society Sociedad General de Autores y Editores sued the owner of a bar in Badajoz, Spain which played CC-licensed music, claiming licence fees for public performance of music managed by the collecting society, the society lost.

A bit strange that they'd sue, given that (as the owner successfully showed the judge) the music he was using wasn't managed by the society - the music performed was licensed under CC licences which allowed "public display". (Via Creative Commons blog which links to the Spanish judgement and says "This case shows that there is more music that can be enjoyed and played publicly than that which is managed by the collecting societies".)

So songwriters and composers who want more exposure for their music can also use CC licences. However of course in this case presumably they didn't get any money direct for their music being played in the Spanish bar because the CC licences they used didn't require payment in that sort of situation (sorry, I'm not sure exactly what variety of licence was used in that case, my Spanish is limited to Si! Or is that Italian...?).

As you'll know more and more CC-licensed music is being played on podcasts e.g. by Uwe Hermann or made available for download, so it should be perfectly legal to listen to those podcasts and download those MP3s without risking record companies and the like trying to sue you, your children or your granny.

I plan to post more about copyright and your blog at some point.



Monday, 27 March 2006

Next blog queue traffic theft: Blogger wake up, it's nextsplog abuse






It's bad enough that the Net is too full of spam blogs or splogs, often hosted on Google's own free Blogspot (though we can now flag most of them one way or another in order to report them to Blogger). Supposedly Blogger will delete splogs if enough people flag them, but reportedly they've been slow to do that, even with very obvious splogs.

Now, yet another iniquitous way to abuse Blogger's system has arisen, which involves stealing traffic away from bloggers who use Blogspot (yes, that could include you if your blog is on Blogspot) - but oddly enough Blogger don't seem to have picked up on it, or as it's been reported to them, maybe they don't care?

The nextsplog traffic stealing spam or scam - abuse of the Next Blog queue

I'm talking about Nextsplog "spam", the automated diversion or hijacking of the "Next blog" queue, which someone very sharp going by the name "Nextsplog" (who also coined the term) has spotted and documented on his or her blog Nextsplog.blogspot.com, as mentioned on the Blogger forum.

Basically, the aim of this form of misuse of Blogger is to fool systems into thinking that the blog concerned is constantly being updated, even though it's not a real update but some nonsense text which is automatically posted (presumably by a script) every say 30 seconds, thus pinging weblogs.com and updating the blog's newsfeed. (By "systems" I mean Blogger, weblogs.com and blog search engines like Technorati and Google's Blog Search, and indeed all other software systems, engines or bots which rely on weblogs.com and the like or blog newsfeeds to tell them that a blog has been updated.)

Well, so what, you may think? In itself that doesn't do much, granted, but the first element of the trickery involved is this: the blog which is constantly updated contains nothing of substance itself, it's what Nextsplog calls a "shell" blog, but what it does (if Javascript is turned on in the visitor's browser) is to redirect visitors to another blog with a very similar URL (so you probably won't notice the sneaky redirect, and you won't see the nonsense text either, unless you turn off Javascript or you look at the nextsplog's newsfeed direct - it'll be the blog's URL with atom.xml tacked on at the end). It is that other, second, blog which contains some real content and, in particular, Google's Adsense or other ads.

The second element is this. You know the Next Blog button in the Blogger navbar, the bar which you're supposed to have along the top of your blog pages if your blog is hosted on Blogger's free Blogspot? Do you know how Blogger's system decides where to take you to when you click on the Next Blog button? Well, it picks the destination blog randomly from, you guessed it, the most recently updated blogs. If a blog appears to be constantly updated every 30 seconds (even if it isn't really), then it will be kept near the top of the Next Blog queue, and Next Blog surfers will regularly be taken to that blog - and thence redirected to the "real" blog.
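To get a feel for why constant updating pays off, here's a rough simulation of the queue behaviour just described. It's only a sketch under my own assumptions, not Blogger's actual algorithm: I assume the queue is simply the 20 most recently updated blogs, that 500 honest blogs each post about once a day, and that one shell blog "updates" every 30 seconds. All the names and numbers are made up for illustration.

```python
import heapq
import random


def shell_share(num_honest=500, queue_size=20, clicks=2000,
                shell_interval=30, honest_interval=86_400, seed=0):
    """Return the fraction of simulated Next Blog clicks landing on the shell blog.

    Assumed model (not Blogger's real one): at the moment of each click, every
    blog has an "age" in seconds since its last update, the queue holds the
    queue_size freshest blogs, and the destination is picked at random from it.
    """
    rng = random.Random(seed)
    shell_id = -1  # honest blogs are numbered 0..num_honest-1
    shell_hits = 0
    for _ in range(clicks):
        ages = [(rng.uniform(0, honest_interval), blog) for blog in range(num_honest)]
        ages.append((rng.uniform(0, shell_interval), shell_id))
        queue = heapq.nsmallest(queue_size, ages)  # the freshest blogs
        _, chosen = rng.choice(queue)
        if chosen == shell_id:
            shell_hits += 1
    return shell_hits / clicks


if __name__ == "__main__":
    share = shell_share()
    honest_average = (1 - share) / 500
    print(f"shell blog share of clicks: {share:.2%}")
    print(f"average honest blog share:  {honest_average:.2%}")
```

Under these assumptions the shell is practically always in the queue, so it picks up a share of clicks many times larger than an honest blog's, purely because of the fake updates.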

The scammer identified by Nextsplog seems to have set up several "matched pairs" of this kind on Blogspot: one's a fake blog that's constantly being updated, the other's a real blog which gets the benefit of the Next Blog traffic redirected from the fake blog.

For full technical details see the clear explanation at Nextsplog.blogspot.com, which also gives the URLs of the culprit nextsplogs so you can see it all in action for yourself.

The latest news is that, possibly because Nextsplog has spotted this scam, the shell blogs have since the first report by Nextsplog begun trying to hide what they're doing - if someone goes to a particular shell the first time they'll be redirected to the real blog of the matched pair, but it sneakily adds a cookie to their computer that lasts for 1.5 days, so if the same person happens to come back later and get the same shell via the Next Blog button, it reads from the cookie that they've already been there recently, and redirects them to a random next blog instead of the paired blog. That way, people aren't likely to notice what's going on as the same person won't keep being redirected to the same blog and wonder what's up.

What's wrong with that?

Many would call this sort of thing traffic stealing. It's a trick which diverts Web traffic - heaps of traffic - to the second blog of the pair (in the matched pair instance), away from Blogspot blogs which do properly belong in the Next Blog queue because they've genuinely been updated by the posting of real content of substance.

It's puzzling that Blogger's system doesn't seem capable of catching this sort of thing - why doesn't the updating of the same blog like clockwork every 30 seconds or whatever sound alarm bells with them?
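Spotting clockwork updating doesn't look hard in principle. Here's a minimal sketch of the kind of check I have in mind, working from a list of update times in seconds. I have no idea what Blogger actually runs, and the function name and thresholds (at least 20 updates, average gap of two minutes or less, almost no variation between gaps) are just my guesses.

```python
from statistics import mean, pstdev


def looks_like_clockwork(timestamps, min_updates=20, max_mean_gap=120.0, max_jitter=0.05):
    """Return True if a blog's update times look machine-generated.

    timestamps: update times in seconds (any order).
    A blog is flagged when it has plenty of updates, they are close together
    on average, and the gaps between them barely vary.
    """
    if len(timestamps) < min_updates:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0 or average_gap > max_mean_gap:
        return False
    # Relative spread of the gaps: 0 means perfectly regular posting.
    return pstdev(gaps) / average_gap <= max_jitter


if __name__ == "__main__":
    every_30_seconds = [i * 30 for i in range(100)]
    print(looks_like_clockwork(every_30_seconds))
```

A real person posting in a burst would have short but irregular gaps, so the jitter test keeps them from being flagged.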

While the extra calls on Blogger/Blogspot resources through the constant updating may seem relatively minor given that every second there are tons of blogs on Blogspot which are being updated, it's still a misuse of their resources and the Next Blog queue, and surely it must be against Blogger's TOS or terms of service - those blogs are getting a helluva lot more traffic than they would have from sheer merit alone.

The fake blogs ought to be deleted as an obvious abuse of the TOS and, if there's any justice, even though the second blogs of the pairs are apparently legit and don't breach the TOS in themselves, surely their owner has breached the TOS by setting up those matched pairs, and they ought to be deleted as a deterrent against abuses of this kind.

The point of all this is presumably money - all that extra traffic means lots more clicks on the ads displayed on the second blogs, through sheer weight of numbers, from people who wouldn't normally visit those blogs or even be sent there by the Next Blog mechanism. A nice little earner for the person running those matched pairs. And, of course, a nice little earner for Google too, as they get a cut of what Adsense advertisers pay: is that why they've not done anything about it?

It may not be click fraud in the strict sense of someone doing automated clicking on their own ads, but click fraud is a pretty hot topic these days, and if Google care about their reputation then they should realise this is pretty close to the line if not over it - and they ought to do something about it. Even if they don't care about the click "fraud", don't they care about the abuse of Blogger resources? If they don't quickly and firmly scotch this sort of thing, then everyone will be jumping on the nextsplog bandwagon just so their blogs don't get "left behind", and the end result might be the same as before the scam started - a level playing field for most blogs, only with a lot more load on Blogger's servers (and indeed ping services), quite unnecessarily. Surely Google can't want that?

What can you do?

If you're a Blogspot user yourself, you may well be annoyed that this trick is diverting away people who might otherwise be sent to your blog and indeed your ads (and Nextsplog seems particularly incensed that the person responsible for this scam rather hypocritically claims to be against spam and purports to provide anti-spam advice on his blog!). Even if you don't use Blogspot, you may still be concerned about the blatant traffic stealing - you may well feel that visitors should be going to a blog because its content is useful, interesting or fun, not because the blog owner has effectively hijacked the Next Blog queue through an automated process.

So, what can you do about this? There are a few options:
  • Contact Blogger direct to report this - select Report a TOS violation, give them the URLs of both halves of each matched pair (not just the nextsplogs), and tell them about the abuse of their services and violation of their TOS, or at the very least direct them to the Nextsplog.blogspot.com summary of the situation and tell them you agree!
  • Contact Adsense also to report it, using the method suggested by Nextsplog - "Just go to that [nextsplog's] page and click on the "Ads by Google" (or "Ads by Goooooogle") link. Then on the page that comes up scroll down and click on the "Send Google your thoughts on the ads you just saw " link. From there just fill out the information (email optional) and there you go."
  • Disable Javascript in your browser, go to the fake blog half of the matched pair, and, now that you won't get redirected to the second blog, flag the fake blog to report it to Google/Blogger (if the navbar or flag has been hidden by the nextsplogger, use this Greasemonkey script to get it back or use this bookmarklet to flag it). Repeat for the other nextsplogs before you turn Javascript back on.
How to disable Javascript (sorry, I'm not up on Safari):
Internet Explorer 6 - Tools, Internet Options, Security tab, select Internet Web content zone, Custom Level, scroll down to the Scripting bit, under Active scripting select Disable (and to enable Javascript later, just choose Enable again), OK, Yes and OK.
Firefox 1.5 - Tools, Options, Content tab, untick Enable Javascript (and you can obviously tick it later on to re-enable), OK
Opera 8 - Tools, Preferences, Advanced tab, select Content on the left, untick Enable Javascript and OK.
  • Spread the word, post about this, get others to flag nextsplogs too.
  • If you feel it appropriate, flag the apparently "legitimate" second blogs of the matched pairs, too.
  • If you spot other nextsplogs in future, do the same again (the Nextsplog post, including the comments on it, has more info on how you can spot this trick being played - e.g. hearing two clicks because of the redirection).
  • If you can't beat 'em, join 'em? Note that I say this mainly tongue in cheek, just to live up to my "slightly wicked" name... If Google/Blogger won't listen to genuine reports of abuse, might they wake up if lots and lots of other bloggers start using the same trick too, and their resources really start taking a hit? But if you do that, you risk getting your own blog deleted if and when they eventually decide to take notice...!
We'll see what happens, I guess. But it really is a sneaky scam, and I hope Google put a stop to it.



Saturday, 25 March 2006

Blogspot splogs: flag bookmarklet now automatically finds blog ID






Hot on the heels of my post yesterday about a free bookmarklet to let you flag splogs on Blogger's Blogspot which have hidden the navbar or flag and which don't get picked up by the Magical Sheep Greasemonkey Blogspot Navbar Restorer script, I'm pleased to be able to tell you that Kirk has stepped in yet again to improve and enhance it.

The favelet now finds the offending spam blog's Blogger blog ID for you automatically in most cases, and fills it in the popup. Just hit OK to flag the splog with Blogger. But even if it can't find the blog ID, you can still search for it via view source and fill it in manually.

Details are in my previous post, which I've now updated.



Friday, 24 March 2006

Blogger: bookmarklet - flag Blogspot splogs hiding navbar or flag






[Updated 25 March 2006:]

Previously I posted about the Magical Sheep Firefox/Greasemonkey Blogspot Navbar Restorer script to get the Blogger navbar back so you could flag a spam blog (or "splog") hosted on Blogger's free Blogspot.com if it had cunningly hidden the navbar or flag, or made it non-clickable. Apart from being a nuisance to the rest of us, spam blogs hosted on Blogspot are in violation of Blogger's terms of service or TOS, which prohibits the use of Blogspot blogs for spamming, so Blogger will delete Blogspot splogs if they know about them.

Limitations of the Magical Sheep Blogspot Navbar Restorer script

Now that script only works if you are surfing using Firefox and Greasemonkey (though they're free, and Firefox beats Internet Explorer hands down in most cases), and of course you have to install the script.

It also only addresses certain methods of concealing the navbar or flag.

What if you're surfing using IE and come across a splog on Blogspot? Or what if the splog uses some sneaky method to delete or hide the navbar/flag which hasn't been picked up by the script?

Kirk and I have been discussing this, and it's impossible to anticipate all the ways to conceal or disable the Blogger navbar or flag that spammers could come up with (so please do let us know if you come across any that the script doesn't work for).

But - whatever method the spammers use - hopefully they won't be able to hide their blog ID.

The complementary solution: Magical Sheep Flag Blog bookmarklet

So, here's how to flag splogs that the Blogspot Navbar Restorer script can't reach - we've produced a free Magical Sheep "Flag blog" bookmarklet which you can save to your Favorites or drag to your Bookmarks Toolbar (more info and instructions on how to use bookmarklets) - quick and easy to do.

I won't post the favelet here because Blogger frequently messes up bookmarklet links with Javascript when I republish this blog. Instead, see this Improbulus bookmarklets page - it's no. 3. It works with Internet Explorer and Firefox, but sorry not with Opera. Don't have a Mac so if anyone can confirm that it works with Safari I'd appreciate it.

How to use the Magical Sheep Flag Blog favelet

After you've saved the bookmarklet, when you are viewing (via the browser you've saved the bookmarklet to) a Blogspot splog that you want to flag, just click the favelet link in your Favorites or Bookmarks, and it will automatically pick up the blog's ID and fill it in for you in the popup. Just hit OK to send a flag report to Blogger.

If for any reason it can't find the blog ID automatically, the popup line for the ID will be empty - just view source for that splog (menu View, Source in IE; View, Page source in Firefox), look for the blog ID by searching in the source for the text "blogID" without the quotes - it'll be the number after "blogID=" - and copy that blog's blog ID number. Then, in the popup just paste the blog ID number you just copied, and Roberta's your auntie!
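If you'd rather not hunt through the source by eye, the manual search boils down to "find the number after blogID=". Here's a small sketch of that in Python. It isn't the bookmarklet's code, and the sample link in it is invented purely to show the pattern.

```python
import re

# The number that follows "blogID=" in the page source.
BLOG_ID_PATTERN = re.compile(r"blogID=(\d+)")


def find_blog_id(page_source):
    """Return the first blog ID found in the page source, or None if there isn't one."""
    match = BLOG_ID_PATTERN.search(page_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up snippet of page source, for illustration only.
    sample = '<a href="http://www.blogger.com/example?blogID=12345678">link</a>'
    print(find_blog_id(sample))
```

If it returns None, the splog has managed to hide its blog ID from this simple search and you'd need to look more closely.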

You don't even have to have saved this bookmarklet to use it - if you prefer, just bookmark my Improbulus favelets page link. Then whenever you want to flag a Blogspot splog, find its blog ID via view source and note it or copy it, go to my bookmarklets page, click the "Flag blog" link on that page, fill in the popup with the blog ID you've noted and hit OK.



Thursday, 23 March 2006

Blogger: you can't modify the navbar






It's finally official. While notoriously (and oddly) reluctant to say anything comprehensible about their navbar whenever they're asked the question, Blogger have at last come right out and said clearly, though in the context of defending themselves on a different front, that "We consider it a violation of the terms [i.e. Blogger's TOS or terms of service] to modify the Blogger navbar" (the bar you get across the top of each page of blogs hosted on Blogspot.com with the Search this Blog, Next Blog etc buttons).

If you violate their terms of service, the conditions under which they allow you to host your blog for free on Blogspot.com using Blogger, then they have the right to kick you off and delete your blog. So - don't mess with the Blogger navbar if you're on Blogspot.

(If a blog hides the flag or makes parts of the navbar not clickable, the Magical Sheep team have produced a Firefox/Greasemonkey Blogspot Navbar Restorer script for surfers who want to get the navbar features back and flag 'em anyway!)

A pedant might point out they've only referred to modifying the navbar ("modifying" would include at the very least getting rid of the flag button, or plastering a Make Poverty History or other banner over it, I think): they've not actually said you're not allowed to hide or get rid of the navbar altogether. However, that seems a bit nitpicky to me - if their clearly stated policy now is that even changing or tweaking the navbar is a no no, surely concealing it completely through CSS or script trickery would be considered a violation of their TOS too. (The Blogspot Navbar Restorer script also works in most cases to restore the navbar if completely hidden, by the way).

So I'll be changing my previous post about the navbar to add a note about this confirmation.



Tuesday, 21 March 2006

Blogspot spam blogs: restoring Blogger navbar or flag






[Added 29 March 2006:] For a simple bookmarklet or favelet you can click to flag a Blogspot spam blog that you're viewing (e.g. if the script below doesn't work to bring back the flag), see this post.

Previously I'd blogged about spam blogs ("splogs") hosted on Blogger's free Blogspot, which cunningly hide the flag in Blogger's navbar (that you see along the top of the page on Blogspot blogs) so that you can't report the blog to Blogger as spam. I wrote a simple Greasemonkey script for the free Firefox browser to restore the flag in the case of one specific way used to get rid of the flag.

But of course there are many ways for splogs to stop people from reporting them, by hiding not just the flag but the entire navbar itself, etc - which is in fact against Blogger's TOS (terms of service), and could in itself expose the blog to deletion by Blogger.

Well now the incomparable Kirk has greatly enhanced that Greasemonkey script so that it restores the navbar/flag for 6 hiding methods seen, as Kirk puts it, "in the wild":

1) Flag button hidden with script
2) Navbar hidden with CSS
3) Navbar hidden with noembed tags
4) Navbar flag/"Next Blog" links overlaid by banners or other content, so you can't click on them (like the "Make Poverty History" banner) - the banner is still there but you'll now be able to click on Next Blog
5) Navbar commented out
6) Navbar hidden with noscript tags.
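To make a few of these hiding methods concrete, here's a rough Python sketch that checks a page's source for four of them (CSS, noembed, commented out, noscript). It is not Kirk's Greasemonkey script, which is Javascript and does a great deal more, and it doesn't try to cover the script-hidden flag or the overlaid banners. The navbar id "b-navbar" is an assumption on my part, so it's a parameter you can change.

```python
import re


def detect_hiding_methods(html, navbar_id="b-navbar"):
    """Return the names of the hiding methods that appear to be used in html.

    navbar_id is an assumed id for the navbar element; pass a different one
    if the page you're checking uses something else.
    """
    marker = re.escape(navbar_id)
    checks = {
        # Navbar markup sitting inside an HTML comment.
        "commented out": rf"<!--(?:(?!-->).)*{marker}(?:(?!-->).)*-->",
        # Navbar markup wrapped in tags that browsers normally don't render.
        "noembed tags": rf"<noembed\b[^>]*>(?:(?!</noembed>).)*{marker}",
        "noscript tags": rf"<noscript\b[^>]*>(?:(?!</noscript>).)*{marker}",
        # A style rule that switches the navbar off.
        "CSS": rf"#{marker}\s*\{{[^}}]*(?:display\s*:\s*none|visibility\s*:\s*hidden)",
    }
    flags = re.IGNORECASE | re.DOTALL
    return [name for name, pattern in checks.items() if re.search(pattern, html, flags)]


if __name__ == "__main__":
    page = '<style>#b-navbar { display: none; }</style><div id="b-navbar"></div>'
    print(detect_hiding_methods(page))
```

Regular expressions are a blunt tool for HTML, so treat this as a quick first check rather than anything reliable.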

Magical Sheep Blogspot Navbar Restorer

So, I now give you the Magical Sheep Blogspot Navbar Restorer script, version 0.7 beta (direct link to script).

(How to install Greasemonkey and its user scripts. NB - if you were using my original Blogspot flag restorer script mentioned in my previous post, it's best to uninstall it first in Firefox (menu Tools, Manage User Scripts, click on the name of the Blogspot Flag Restorer script in the list on the left, then click the Uninstall button and OK) before you install this version).

The script restores the flag or navbar in all the situations listed above. Of course it can't deal with every single sneaky method used by spammers to hide the navbar, but it addresses the ones which are the most common at the moment (March 2006). If you come across any other naughty hiding tricks by spammers or have any other feedback or comments, do let Kirk and me know, and the script will be updated to deal with them if possible.

This script could be useful not just to flag spam blogs but also to go to the Next Blog, e.g. if you're surfing Blogspot blogs and want to move on to the Next Blog, but can't because the current blog has removed or hidden the navbar.

The script shouldn't interfere with any Webpages other than loading back in the navbar or flag, but if you have a blog-related problem you can always try disabling Greasemonkey in Firefox (menu Tools, Extensions, scroll to find Greasemonkey in the list, rightclick on it, choose Disable), just in case... though that's to do with Greasemonkey generally, rather than this particular script. We don't think there is anything in this script which would mess up a page, and it hasn't yet done that or crashed Firefox during the time we've been testing it privately, but you never know - so if you use the script please note that you do so at your own risk.

Notes

Red border. When this script restores the flag button which has been hidden by a script, or when it adds the whole navbar back after it's been hidden, you will see a red border around the flag or navbar to show that the blog has tried to hide the flag or navbar. In rare cases the navbar may be restored without the red indicator border, but most likely this is due to coding errors rather than mischief.

Formatting. Sometimes the restored navbar may have some "gaps" or funky formatting. This is usually due to sloppy coding in relation to the original blog (some people leave out the closing </style> tag which keeps the imported Blogger CSS from functioning). In those cases, the Navbar Restorer script reloads just the minimum CSS to get the bar back in place, but nothing more on the formatting/looks front.

Exclusions. If you're using the script but don't want it to restore the navbar on certain blogs, you can add those blogs to the exclude list as with other Greasemonkey scripts. (To do that, in Firefox go to the Tools menu, Manage User Scripts, click on Blogspot Navbar Restorer in the list on the left, then on the right by the box headed Excluded pages click the Add button and type in the URL of the blog you want to exclude, and repeat the Add process for each blog you want to exclude.)

Trying it out. If you want to test this script out, try looking at these blogs before installing the script (or look at them in Internet Explorer, where the navbar or flag will still be hidden), and then again after installation. Note though that these splogs may not be there forever as no doubt people will now be flagging them!
Script hiding of flag:
Flag 1
Flag 2
CSS hiding of navbar:
CSS 1
There are noembed ones and Make Poverty History banner ones but most aren't spam so I won't list any, I'm sure you can find a few.

So you can now go forth and flag more splogs!



Wednesday, 15 March 2006

Blogger, MSN etc: which blogging system's most popular?






Which blogging platforms are currently the most popular? What percentage of bloggers use which system? There don't seem to be many recent surveys or stats trying to measure the comparative popularity or usage of the different blogging systems, though perhaps I've just missed them.

Recently I found a paper, SVMs for the Blogosphere: Blog Identification and Splog Detection by Pranam Kolari, Tim Finin and Anupam Joshi, on the identification of blogs and how to distinguish them from non-blogs - not necessarily as easy as it might sound especially if you're trying to automate the process and have software rather than humans do it.

It concentrated on the detection of spam blogs, but part of it dealt with how (as part of their research) they identified blogs hosted on what they found were the most used blogging software or blogging services: Blogger's Blogspot, Microsoft's MSN and the like (mainly for the purpose of then excluding those blogs from their study). They carried out this analysis based on the websites returned from random searches on leading blogosphere search engine Technorati, from which they collected about half a million "live" blog home pages. The study was conducted in May to August 2005 and the paper is copyright 2006, so the data seems relatively up to date. They also regularly monitored Weblogs.com for data on blog updates, to confirm that the Technorati data was indeed a good sampling of the blogosphere, and the results from the about 5 million blog home pages they collected in this way generally matched in the relative order of blog hosting popularity though not in their exact position.

Based on the Technorati queries part of their study, it seems that the most commonly-used blogging platforms are:

blogspot 44%
msn 23%
livejournal 8%
aol 1%
splinder 1%
20six 1%
typepad 1%
blog 1%
fc2 1%
hatena 1%

Note that the method used in the SVMs paper only looks at the relative number of blog hosts, gleaned from the domain names, rather than blogging software. A lot of Blogger users publish to their own domains/servers instead of using Blogspot.com, and similarly many Wordpress and Movable Type etc users have their own domains, so just going by the domain name will leave those users out of the equation. However, I still think it's useful as a rough guide. Certainly, these results reassure me that if I want to keep producing posts which are helpful to the greatest number of bloggers, then continuing to focus mainly on Blogger is fine as clearly Blogger still has the biggest user base.
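For anyone curious what "going by the domain name" might look like in practice, here's a minimal Python sketch that tallies blog URLs by host. It isn't the method from the paper, just my illustration of the general idea; the list of host domains and the sample URLs are assumptions I've made up, and anything on its own domain ends up lumped into an "other" bucket, which is exactly the limitation mentioned above.

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative list of hosting domains (an assumption, not taken from the paper).
KNOWN_HOSTS = ("blogspot.com", "spaces.msn.com", "livejournal.com", "typepad.com")


def host_of(url):
    """Map a blog URL to a known hosting domain, or to a catch-all bucket."""
    hostname = (urlparse(url).hostname or "").lower()
    for host in KNOWN_HOSTS:
        if hostname == host or hostname.endswith("." + host):
            return host
    return "other/own domain"


def host_shares(urls):
    """Return each host's percentage share of the given blog URLs."""
    counts = Counter(host_of(url) for url in urls)
    total = sum(counts.values())
    return {host: 100 * count / total for host, count in counts.items()}


if __name__ == "__main__":
    sample_urls = [
        "http://a.blogspot.com/",
        "http://b.blogspot.com/2006/03/post.html",
        "http://c.livejournal.com/",
        "http://www.example.org/blog/",
    ]
    print(host_shares(sample_urls))
```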

I also found an analysis by Elise Bauer carried out back in February 2005, where she looked at how many sites Google reported as linking to sites on Blogspot.com, Livejournal.com etc, what she called "Google Share". It's obviously out of date now, but still useful as a comparison. Again, it looks only at the domain names, so the same caveats apply.

In February 2005, according to this method, the top two weblog tools were Blogger and Live Journal, with (quite some way behind) Diaryland next. So it's clear that MSN has taken over Livejournal's no. 2 spot in the blogosphere, in just a few months (remember, the SVMs study ended in August 2005). It would certainly be interesting to compare the results if she carries out a similar analysis again soon.

I'd love to see more accurate statistics, though, pulled out just for the purpose of analysing the relative "market share" of the various blogging systems (looking at meta "generator" tags of blogs, for instance, which would be a more accurate way of identifying the blogging software used than looking at their domains). It would be very interesting to track the changes in the comparative popularity of the different blogging platforms over time, and see if there are any broad trends.
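Reading the generator tag is simple enough to sketch using nothing but Python's standard library. This is only an illustration: it assumes you already have the page's HTML as a string, the sample page is invented, and of course not every blog includes a generator tag (or tells the truth in it).

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag seen."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)  # the parser lowercases attribute names for us
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def find_generator(html):
    """Return the generator string declared in the page, or None if there isn't one."""
    finder = GeneratorFinder()
    finder.feed(html)
    return finder.generator


if __name__ == "__main__":
    # Made-up page, for illustration only.
    page = '<html><head><meta name="generator" content="Blogger"></head></html>'
    print(find_generator(page))
```

Tallying the results across a large sample of blogs would then give the kind of "market share" figures I'm after.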

Someone with more expertise than me should be able to employ Technorati's API to dig out that kind of information - perhaps Technorati CEO David Sifry could include this sort of data in a future edition of his authoritative "state of the blogosphere" updates?



Monday, 13 March 2006

Technorati tag problems: they're on it; but do report your problems






An update on the occasional problems with the tag pages of blogosphere search engine Technorati, which I've posted about before, and which led me to try an experiment whose interesting but curious results are posted here.

Just to recap briefly - even when you've tagged your posts properly, sometimes a particular post won't show up on Technorati's tag pages for any of the tags you've used. The problem seems to be specific to the post - if it won't show up on one Technorati tag page, it won't show up on any of them; yet other posts before or after the "problem" post will show up fine, and the post still gets tagged properly on e.g. rival blogosphere search engine Icerocket.

The last time I had this problem, I decided to try to figure out what was going wrong, or at least what it was about my post which Technorati's system didn't like. I found that it was definitely certain content or code which was doing it consistently, at least in the case of that one post, but I still don't know what the common factor is amongst all the posts that don't get onto Technorati's tag pages, though I assume there must be one; and I don't know why it is that Technorati's system doesn't like that content, as it was just some text and links and a couple of images, nothing out of the ordinary - no complex HTML or Javascript etc.

Thanks to Kent Newsome, Welcome to Wallyworld and Successful Blog and Solar Dweller, amongst others, for mentioning my post and helping to draw attention to the issue.

And some good news - Dave Sifry, Technorati's CEO, emailed me about my test results, to say:
This is an AMAZING bug report. Wow, thanks. We're on top of it and will be in touch if there's any questions. This issue has been puzzling us for a while - I hope that your post and excellent data helps our engineers to get to the bottom of the issue quickly...

...Thanks for the FANTASTIC bug report, it gave us a lot to work with.

...We continue to debug and develop to fix this stuff...

So, it's great to know that they're on it. Clearly the issue has been puzzling them as well as us - and to help them to get to the bottom of it quickly, may I suggest that if anyone else suffers problems with certain of their tagged posts not appearing on Technorati's tag pages (but appearing on say Icerocket's), you should report the bug to Technorati, giving them a link to the problem post's permalink.

If you have time and want to try the same thing as me, i.e. break your problem post down into separate bits and repost each individual section to see exactly which section Technorati's system doesn't like, and again submit an error report about the problem section to Technorati, that might also help speed up resolution of the problem - I think they want to sort it out as much as we do!



Sunday, 12 March 2006

Blogger: show only excerpts from long posts






Shay wanted to know about how to manually cut certain long posts on Blogger, with a "read more.." or similarly-named link (possibly with definable link text) to the full-text post (i.e. the permalink of the post page or item page).

As Shay says, Blogger do suggest a way to implement expandable post summaries on their help page, but unfortunately that adds a "read more" link to all your posts, whether you've actually cut them or not. If you want to be selective about it and show some posts in full, but show other, longer, posts only in summary form (i.e. just excerpts from the post, like the first few words or paragraphs), with a link your readers can then click to see the whole post, how do you do it?

Well, other people have already worked it out so, rather than re-invent the wheel, I'll just point you to The Little Master's post on expandable posts, which links to a couple of possible methods. (Also, if you want the "read more..." link to appear in the middle of the post, as TLM did, take a look at this post, and the solution suggested by Kirk a.k.a. redryder52, all hail to him as usual!) You can see it in action on TLM's blog, e.g. in his January 2006 archive (the Intelligence Quotients post).
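To make the "selective" idea concrete, here's a minimal sketch in Python. It is not Blogger template code and not the method used in any of the posts linked above; it just shows the logic. It assumes a hypothetical marker, `<!--cut-->`, which you'd type by hand into only those posts you want cut, and the function and parameter names are made up for illustration.

```python
CUT_MARKER = "<!--cut-->"  # hypothetical marker, typed by hand into long posts only


def render_for_index(post_html: str, permalink: str, link_text: str = "read more...") -> str:
    """Return what the main page should show for one post.

    Posts without the marker are shown in full, with no link added.
    Posts with the marker are shown up to the marker, plus a link to the permalink.
    """
    if CUT_MARKER not in post_html:
        return post_html
    # Keep only the text before the first marker and tidy trailing whitespace.
    excerpt = post_html.split(CUT_MARKER, 1)[0].rstrip()
    return f'{excerpt} <a href="{permalink}">{link_text}</a>'
```

The point of checking for the marker first is that uncut posts never get a stray "read more" link, which is the complaint about the method on Blogger's help page.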

I don't employ any of those methods myself. I thought about it a while back when I was starting to blog, as I was conscious my posts were often long, but I decided that it was more convenient for my readers to be able to see the full post. There's always the scroll bar... If however anyone disagrees and would prefer me to use the "cut" method for my longer posts, please let me know.



Friday, 10 March 2006

Google Page Creator beta: free webspace more than webpage creation






UPDATE: want to play with Page Creator? You can get it for free if you sign up for Gmail (which anyone can do), and follow the steps in this post to sign in. Make sure you create a Gmail account with the username you want to show in the URL as your Page Creator Googlepages.com URL will be http://theGmailUsernameYouCreated.googlepages.com.

As many people will be aware, recently Google Labs introduced in beta a new service, Google Page Creator. Here's my initial review.

In summary, so far it's really no great shakes as a webpage creator, but it does give Gmail account holders 100MB of free webspace which you can use to host your files, including file types like MP3 or other music/audio files which you couldn't upload to your regular blog server (e.g. Blogger users will know that Blogspot only allows uploading of webpages and pics). To be fair, it's only in very early beta, so we'll see how it develops.

Key links

What's Google Page Creator?

From Google's own summary, it's "a new product that makes creating your own web pages as easy as creating a document in a word processor. Google Page Creator is a free tool that lets you create web pages right in your browser and publish them to the web with one click. There's no software to download and no web designer to hire. The pages you create are hosted on Google servers and are available at http://yourgmailusername.googlepages.com for the world to see."

In other words, it's a free WYSIWYG (what you see is what you get) web editor or website publishing tool where you can design and publish your own web pages to a Google-hosted server. No knowledge of HTML is necessary. Plus, they provide 100MB of webspace, and it all automatically gets indexed by Google for searching, as Googlepages sites seem to come with automatic Sitemaps.

Here's my test homepage (opens in a new window). And some screenshots -

Page Manager



Create a page

When you click the green Create a new page block above, you get this - the title you give the page here appears in the URL of the page, so if you entered say Technoratifaves your page would be at http://improbulus.googlepages.com/technoratifaves:

Editing a page




The toolbar on the left lets you access the obvious things like formatting (bold, italics), structure like headings, etc.

It also has a button to let you add an image from your own computer or elsewhere on the Web (and even drag it round the page as you can on Kirk's post, which is kinda fun):


You can add a link, and even upload your own file to link to:


And there's more.


See the Edit HTML link at the bottom left, above? That lets you edit the HTML source. You have to click on a "field" i.e. section of the page first (will non-techies really understand "field"?! Why not "box"?) before clicking Edit HTML to edit the text within the "field" - so you can't change the layout, only edit the HTML within e.g. the left sidebar box, the center box, right sidebar box etc. This is what you get when you click that link:


It seems to let you do a reasonable amount, e.g. I copied and pasted the HTML from my post on Technorati favorites, and a bit of tidying up was needed (I didn't do it all) but it worked - see http://improbulus.googlepages.com/technoratifaves (I even managed to include code in a textarea, which doesn't work well in Blogger if you have Convert line breaks set to Yes. And I could get the text to highlight when you click in the textarea, something I can't do in Blogger because of the Convert line breaks issue.)

Unfortunately, there are clearly limitations at the moment. I couldn't get embed or bgsound to work for sound/audio files, for instance. And forget proper Javascript or full CSS.

There's even a way to lock editing, so if one person is editing it someone else who logs in is warned that it's being edited, but you can break the lock (so I don't know who would win any battle of the breaks!):

Changing the look

This is the overall design, colour etc. There are 41 templates at the moment, but you can't edit them at all. And you can only change each page individually, not all pages across the site at once, see below on Bugs.

Changing the layout

This is just the columns, sidebars etc. Again you can't change the positioning, only the type. As you can see there are only 4 styles of layout to choose from at the moment. And yet again you can only change the layout one page at a time, manually for each page.


Limits

You can only use Page Creator if you have a Gmail account (Google Mail, in the UK) - which means you need an invite from someone with an existing account, or if you live in (currently) Australia, Indonesia, Malaysia, New Zealand, the Philippines, Singapore, Thailand, Turkey or the USA, you can get a Gmail invite from Google if you're willing to part with your mobile number (a clever way for Google to collect numbers "for upcoming Google mobile services like secure password recovery and SMS alerts" and no doubt other future plans for worldwide mobile domination!).
UPDATE: see this post on how to get to Page Creator after you've signed up for Gmail.

Plus, a few days after Page Creator was announced, Google closed it to new users - the page has been saying "Due to heavy demand, we are unable to offer new accounts for today. If you'd like to be added to our waiting list, please enter your email address." (Never mind "today", it's been like that every day since then! And if it's anything like the waiting list for Gmail, then you may as well effectively forget it. At least with Gmail I got an invite, but there seems to be no way to get a Page Creator account now if you didn't sign in shortly after they released it.)

Other restrictions or limitations:
  • The URL is based on your Gmail username as you'll have noticed. If you have more than one Gmail account, use the one whose username you're happy to have public and which fits in best with the intended theme of your website
  • 100 MB of webspace max - seems to take lots of filetypes so, e.g., could be used as free hosting for MP3 files (legal only, of course!)
  • 10 MB max for each uploaded file (according to the Google Group members) and up to 100 files can be uploaded (you can hide uploaded files too)
  • No ftp, only uploads or deletions of files via your browser
  • Number of pages limit? I've seen some mention of this on the Google Group but I've been able to get more than 4 pages myself.
  • Possible bandwidth limit? Don't know.
  • HTML editing limits - as mentioned above, I couldn't get embed/bgsound for audio files to work, there's no proper Javascript or CSS (or e.g. password protection of individual pages). There may be other limits on what other code it will accept, which I haven't noticed yet.
  • Can't control the folder/file structure much, though you can to some extent by naming your . All uploaded files seem to be dumped at the root. New pages seem to go into their own folder, or are they redirecting it somehow? E.g. http://improbulus.googlepages.com/page2 works, but http://improbulus.googlepages.com/page2/ doesn't, nor does http://improbulus.googlepages.com/page2.html
  • Generally, there's very little you can do in terms of changing the finer detail of the look, layout etc, never mind the HTML.
  • No keyboard shortcuts, unlike Gmail - but I have a thing about wanting hotkeys for everything conceivably computer-related.
  • No "across the whole site" automation functions for batch editing of several pages at once, see Thoughts below.
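If you're planning to use the webspace for hosting files, the three numeric limits above can be checked before you start uploading. This is only a rough sketch in Python: the per-file and file-count figures are the ones reported by Google Group members, treating a megabyte as 1024 x 1024 bytes is my assumption, and `check_uploads` is a made-up helper, not anything Google provides.

```python
from typing import Dict, List

MB = 1024 * 1024             # assumption: Google may count megabytes differently
MAX_TOTAL_BYTES = 100 * MB   # 100 MB of webspace in total
MAX_FILE_BYTES = 10 * MB     # per-file limit, as reported on the Google Group
MAX_FILES = 100              # file-count limit, also as reported there


def check_uploads(files: Dict[str, int]) -> List[str]:
    """Return a list of problems for a planned set of uploads (name -> size in bytes)."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} is over the per-file limit")
    total = sum(files.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes is over the webspace limit")
    return problems
```

An empty list means the planned uploads fit within all three limits as stated here.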

Bugs - Edit HTML link, etc

The most obvious thing I've noticed so far is that the Edit HTML link (at the bottom left of the Edit page) is hidden from view initially, with no visible scrollbar for the window.

In Internet Explorer, trying to page down with the keyboard does nothing at all. I have to hide my browser toolbars and indeed taskbar in order to see that link because of the lack of a scrollbar.

In Firefox you can page down with the keyboard to see that link (despite the lack of a scrollbar), unless you have virtually no toolbars in your browser, in which case the link is just about visible.

Is this a subtle way for Google to get people to use Firefox instead of IE, I wonder? Seriously, Google have added a Help item on this since I first noticed the issue, and guess what, it's a "feature" not a bug - their rather unhelpful though hopefully temporary answer is "please set your screen resolution to 1152 X 864 or higher." Not with my eyesight I'm not! Why can't they let us scroll it?

Plus, at the moment you have to manually edit the sidebar for every single page, individually - yes, that's a pain - and the same for the design or look of a page, which is only applied to the current page - you're supposed to be able to change the default page look but there ain't no such box on my Site settings page (just Site name, Site URL and a box to tick to warn of adult content if appropriate). The ability to select all pages isn't much help yet, as you can't do much with them as a group but delete or publish/unpublish.

Thoughts

It's early days yet for this new service, I haven't had the chance to have a proper play, but I have to say that Page Creator is very, very basic - and I do mean very, very basic. Did I say it's very, very basic? It's almost primitive. It's clearly aimed at complete novices who have no technical knowledge whatsoever (e.g. the Help includes as an FAQ "What's a site?"), and as such I think it's too frustratingly limited at the moment. I hope they won't sacrifice control for users for the sake of apparent (note I said "apparent"!) ease of use for non-techies - I firmly believe it's perfectly possible to make things user-friendly for beginners while allowing more power users access to more advanced features, e.g. in this case editing the full source including CSS direct, ability to organise and manage files/folders properly, etc.

In developing Page Creator, did Google build on their experiences with Blogger, which after all is just a specialised means of publishing to the Web? It's hard to say - this is so different. If they did, unfortunately it doesn't show. If they didn't (and they do admit it's not compatible with Blogger - or indeed Adsense, Google Video, Picasa...), they really should draw on the expertise of the Blogger team. Never mind asking for user suggestions, surely they would benefit from cross-fertilisation internally, by adopting many of the features of Blogger, e.g. the ability to have a sidebar whose content is the same across all pages. So please, please Google, bring in the Blogger features that facilitate and automate web publishing; you have a readymade goldmine there. They do say they are hoping to make Page Creator compatible with Blogger etc in future, but there's "compatible" and there's true integration...

It's interesting that users can upload files though - as I touched on earlier, it's a good place to host files that aren't photos, which you want to link to from your Blogspot blog or other blog. But files get dumped at root level only, you can't put them somewhere else. I've uploaded as a test the Magical Sheep Greasemonkey Technorati tag creator script, for instance (covered in this post).

I really like the idea of being able to create webpages from within my browser - the Web browser is increasing in power and usefulness, and that to me was the main missing link (maybe the need to build that functionality has resulted in the divergence from Blogger?). Clearly we are getting closer and closer to Web browser as computer desktop replacement, and given Google's strengths and probable future direction it's no surprise they're focusing on this (and no doubt an Adsense option will be added for sites hosted on Googlepages, given their business model).

But, Page Creator as a direct competitor to Microsoft's Frontpage as this MSNBC article suggested? They couldn't have meant that, surely. Page Creator has got a very long way to go before it can do anything more than help a non-techie put up a single home page (e.g. when you get to more than one page, the lack of a template that can be applied across all pages for content - like a common sidebar, as I said above - plus the lack of finer control over layout/looks - would make it a bit of a nightmare to manage). I suspect at some point the user will be able to drag round the webpage to change the margins, etc, but no doubt that will be a while in coming.

So, my thoughts on this are pretty much the same as on Google Reader in its early days - it's not very useful for a website yet, unless you're happy with the most basic of websites, well realistically a single webpage, only. But to me it's worth it just to get 100MB of free webspace for uploading any type of file, and as a place where I can host forms, textareas etc, and anything else that might possibly risk killing my blog's chances of getting onto Technorati's tag pages, with apparently unlimited bandwidth.



London bloggers' meet, 21 March 2006?






I see there's another attempt to organise a London bloggers' meet for Tuesday 21 March 7 pm.

Not sure if I can go, but I'll do my bit to spread the word by posting this.



Wednesday, 8 March 2006

Copyfighters London, Sunday 19 March 2006 open event






I've blogged about the London Copyfighters' Drunken Brunch and Talking Shop before.

Their last event at the Stanhope Centre will still be hosted by Cory Doctorow, but then Suw Charman of the Open Rights Group will continue to host the brunches as picnics in Hyde Park starting in April, and will be looking for an indoor home for the events come the autumn.

In honour of this last Copyfighters' event at the Stanhope it will be open to the general public, so please feel free to spread the word to anyone you think might be interested (though unfortunately I don't think I'll be able to make it on this occasion, myself).

It will be co-sponsored by the Electronic Frontier Foundation, Open Rights Group, the Foundation for Free Information Infrastructure and the Open Knowledge Forum Network.

Here are the details.

When: Sunday, March 19. Food from 11AM to 1PM; Speakers' Corner excursion 1PM-~2PM.

Where: Stanhope Centre (Stanhope Centre, Stanhope House, Stanhope Place, London W2 2HH)

Directions: The nearest underground station is Marble Arch. If you are at the Marble Arch tube station, walk West on the North side of the street. The street is Oxford Street as you exit the tube but it immediately becomes Bayswater Road. Keep walking due West for 2 blocks (using the pedestrian underpass to cross under Edgware Road). Hyde Park will be on your left or to the South as you walk. Then take the first right after you exit the pedestrian underpass, which is Stanhope Place. It is about 75 yards from where you exit the pedestrian underpass. Walk north on Stanhope Place approximately 50 feet to the first set of steps on the block. You will see a sign for Stanhope Centre. Walk up the stairs and ring the bottom bell, which is marked Stanhope Centre.



Technorati tag pages problem: my test results






I've been looking into the problem I and others have experienced where properly-tagged posts don't appear on the appropriate tag pages of the leading blogosphere search engine Technorati, as previously mentioned.

How widespread are these problems?

I even tried doing a survey a while back: only about 80 people responded, but, for those interested, the key results to date are here (opens in a new window - 'scuse the look and the need to scroll but that's the only one-column template on offer and I needed a one-column to fit and don't have time to tinker further with that - talk about lack of choice on Google Pages!). (I won't even try to include the graphs here lest the images or iframes stop this post from being tagged. You never know...).

Now 80 doesn't seem many but it does indicate that the problem isn't unknown, and that more people have experienced the problem than have bothered to report it to Technorati. (If you want to vent, the poll is still open, see the end of this post.)

Just over a week ago, when yet another post didn't get displayed properly on the correct Technorati tag pages (e.g. the Technorati Improbulus tag page, though the post is on another blog search engine IceRocket's Improbulus tag page), I decided to investigate further, as mentioned in my previous post. Technorati's CEO David Sifry had commented there that Technorati were going to get to the bottom of this, which is very welcome news.

To recap, I think there are two aspects that need investigation here. Which bit of the affected post is triggering the fault in Technorati's system? And which part of Technorati's system is it that's going wrong?

The post: what doesn't Technorati like about the post content?

As outlined in my previous post, I can think of several possibilities (apart from the "valid XHTML" point which, as I explained in the previous post, I don't think is an issue in the case of my blog, but I tested it anyway):
  • length of post
  • lots of code in the post, displayed as such
  • lots of HTML other than links/images, e.g. forms, iframes
  • a combination of the previous, or something else I haven't thought of!

My experiment

The problem post "Technorati: favorite blogs; help others add your blog, and thoughts on Technorati Favorites" was long and had lots of code, including a form and iframe. So I split it into different sections and posted each section separately (with some normal posts in between) to see what happened. (I should have given the posts more distinctive titles, but there we are - the end of the first paragraph of each test post does summarise what bits it contains.)

All those individual posts had mostly the same tags. For speed I mainly checked the Technorati tag page for the Improbulus tag (my meblogging tag), but I also cross-checked other tag pages, e.g. for "A Consuming Experience", and the results were the same.

Here are screenshots of part of the Technorati Improbulus tag page:



and of my list of actual posts:


- from which you'll see that clearly some of the test posts are not on Technorati's tag pages.

Now, just to break down the various test posts by content type (you can doublecheck them on the Improbulus tag page if you want to):

A. Post with text, links, images including buttons, code: Technorati: favorite blogs; help others add your blog - OK

B. Post with text, iframe and code for iframe: Technorati: favorite blogs; show your Technorati favorites on your blog - OK

C. Post with text, links, icons, one URL as code without link, and form: Technorati favorite blogs: benefits for readers of blogs - PROBLEM

D. Post with text, links and images (bugs, issues, thoughts): Technorati favorites: bugs, issues and thoughts - OK

So the problem seemed to be with C. I suspected it was the form, perhaps because the input tags weren't closed (so it was not strictly valid XHTML). Technorati have been telling people (e.g. in their help and via their staff) that having valid XHTML will help your posts get properly indexed by their spider; in fact that seems to have been their main response over the months to people who have asked for support on this very issue.

Therefore I reposted C again, having first tweaked the form so the input tags were closed (and therefore more valid XHTML, just in case - despite my personal view - that was the source of the problem) - but still, no go.
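For anyone who wants to make the same tweak without editing each tag by hand, here's a rough Python sketch that self-closes `<input>` tags. It's a regular-expression shortcut rather than a proper XHTML tool, `close_input_tags` is a name I've made up, and it will trip over a ">" inside an attribute value. Bear in mind that, as described above, closing the tags made no difference to the tag pages problem in my case.

```python
import re

# An <input ...> tag, with or without a trailing slash before the closing bracket.
_INPUT_TAG = re.compile(r"<input\b([^>]*?)\s*/?>", re.IGNORECASE)


def close_input_tags(html: str) -> str:
    """Rewrite every <input ...> tag in the self-closed form <input ... />."""
    return _INPUT_TAG.sub(r"<input\1 />", html)
```

Running it on a form that's already closed properly leaves the tags as they are, apart from normalising the spacing before the slash.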

Next, I tried breaking C down further into two separate bits - just the form (with a few links), and the rest. Guess what? The post with just the form was fine! It was the post with the rest of the content of C that wouldn't show up on the tag pages. I didn't expect that, because other problem posts I've had in the past have often contained forms and I was really wondering if that was it. Just to be sure, I tried posting C without the form, again. Same thing - a no show.

Finally, on the XHTML validation front (yet again), my template has some warnings, but then those are common to all my posts including A, B and D which did get picked up. So ignoring validation issues with the template, the only thing left wrong with the body of that post is that I didn't include "alt" attributes for the images. I reposted that post (C without the form), this time with blank alt attributes (on the basis that another post with blank alt attributes for images did get displayed properly on Technorati's tag pages). And that post was again missing in action from the Improbulus tag page (and indeed other relevant tag pages e.g. Consuming Experience), even though a subsequent post (on ID cards) did show up. So it can't be the XHTML validity (or rather invalidity) of the main body of the post. I'd also mention that unlike other people, lately I've had no problems with pinging Technorati - I checked and they correctly showed when my blog was last updated for that particular post; they just didn't show that post on their tag pages.
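The blank alt attributes can be added in bulk too. Again this is only a sketch in Python using a regular expression, with a made-up function name, and it leaves alone any image that already has an alt attribute. It's crude: an attribute such as data-alt would be mistaken for alt, for instance.

```python
import re

# Matches an <img ...> tag only if no alt attribute appears anywhere inside it.
_IMG_WITHOUT_ALT = re.compile(r"<img\b(?![^>]*\balt\s*=)([^>]*?)\s*/?>", re.IGNORECASE)


def add_blank_alt(html: str) -> str:
    """Give every <img> tag that lacks an alt attribute a blank one."""
    return _IMG_WITHOUT_ALT.sub(r'<img\1 alt="" />', html)
```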

Now, I'm completely stumped. Whatever Technorati's system doesn't like is, in my case at least, clearly something to do with the problem C bit - whenever I post ANYthing containing that bit (whether the original post, the extract I've called C, C with tweaked form, C without any form, C with no form and with alt attributes for the images), that post just doesn't appear on Technorati's tag pages. Whereas all the other sections of last week's post (A, B and D above) displayed fine when posted separately. (Someone with lots of time could break the problem bit of C down further into paragraphs and post those separately too, then break those down further, in order to pin down exactly which bit of C it is that Technorati's system chokes on - but that someone will not be me...!)
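For that someone with lots of time, the breaking-down can be done by halving rather than paragraph by paragraph. Here's a minimal sketch in Python, resting on two assumptions: that exactly one section is at fault, and that any post containing it fails (which is what my tests suggest). The `appears_on_tag_page` callback is hypothetical; in real life it means "post these sections as a test post, wait, and look at the tag page yourself".

```python
from typing import Callable, Sequence


def find_problem_section(
    sections: Sequence[str],
    appears_on_tag_page: Callable[[Sequence[str]], bool],
) -> int:
    """Return the index of the one section that keeps a post off the tag pages.

    Assumes exactly one section is at fault and that any post containing it fails.
    """
    lo, hi = 0, len(sections)  # the culprit is somewhere in sections[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if appears_on_tag_page(sections[lo:mid]):
            lo = mid  # first half showed up fine, so the culprit is in the second half
        else:
            hi = mid  # first half went missing, so the culprit is in there
    return lo
```

Each call to the callback is one test post, so a section of 16 paragraphs would need about 4 reposts instead of 16.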

What I'm puzzled about is, that section doesn't contain anything out of the ordinary; it's just text, some links, a couple of images. Why should that be a problem? I really have no idea. Well, are there maybe certain words their system doesn't like? I don't think they consciously deploy a censor (though I did briefly wonder, in the case of my long post about female sexuality which didn't get picked up on their tag pages!). If they did have a censorship mechanism it would surely filter out the post for all purposes (like when using their standard search), not just on their tag pages; and besides, from what I've heard David Sifry is not the sort of man who would brook censorship on Technorati.

Technorati: what's going wrong?

This part is really for Technorati to figure out, of course, but it seems to me that when a post isn't on the right tag pages on Technorati, the possibilities are:
  • No Technorati Crawl - Technorati's spider is skipping that post somehow
  • No Post/Tag Association - the post is indexed (and you can find it on a simple full text search on Technorati), but it's not being associated on Technorati's system with some or all of the right tags (e.g. it's not been stored in their tags database in the right place or at all, depending on how Technorati do it)
  • "Recovery" Issue - the post is indexed, it's associated with the right tags, but when you go to the relevant tag page(s), whatever's behind the scenes is not fetching back the right information
  • Tag Pages Wrong - the post is indexed, it's associated with the right tags, when you go to the relevant tag page(s) the right info is returned, but whatever is responsible for displaying the tag page just isn't showing it properly
  • Something Else - again, a combination of some of the above, or something I haven't thought of.

Whatever the problem is, in my case at least, I'm sure it's not "No Technorati Crawl" because my posts which are missing from the tag pages do show up on the basic search results pages on doing a search (e.g. this search, which I've made a bit more complex just to pick out some of those test posts of mine that aren't on the tag pages but clearly can be found just by searching).
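The elimination above can be written down as a tiny Python sketch. It only encodes what an outsider can actually observe (is the post found by a search, is it on the tag page), so it can rule out "No Technorati Crawl" but can't tell the other possibilities apart. The function name is made up and the labels are just the ones from my list.

```python
from typing import List

CANDIDATES = [
    "No Technorati Crawl",
    "No Post/Tag Association",
    '"Recovery" Issue',
    "Tag Pages Wrong",
    "Something Else",
]


def remaining_candidates(found_by_search: bool, on_tag_page: bool) -> List[str]:
    """List the explanations still possible, given what can be seen from outside."""
    if on_tag_page:
        return []  # the post is where it should be, nothing to explain
    if found_by_search:
        # The post has been indexed, so the spider did not skip it.
        return [c for c in CANDIDATES if c != "No Technorati Crawl"]
    return list(CANDIDATES)
```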

Conclusions

So, it's over to Technorati now. I know that for many people the tag pages issue may be different from mine, and following the guidance given by Technorati (including on validation and the rel="bookmark" point) may help get their blogs picked up by Technorati, or else contacting Technorati support (by the way, their new customer support specialist Janice Myint also has her own blog which offers unofficial Technorati help, as I saw via this Blogher post). But, in the case of my blog, I think you'll agree from all the above that my issue has to be something else entirely.

I really don't think there's anything more useful that I can do, having established that it's not the form or missing alt attributes in images, but that there is some consistency in what things their system doesn't like. It's down to Technorati to work out what that could be, and why, and hopefully fix this ongoing issue - not just in the case of the problem C extract but hopefully also in the case of other posts, from whatever blog.

And I hope if it's something to do with the content of the post or underlying code for a post, which certainly seems to be an issue from my tests, that they will either sort the issue out internally or else share with us what that thing is, so that we can avoid including in our future posts anything that could result in our posts not appearing on Technorati's tag pages.

This issue has been plaguing Technorati continually for over a year now to my knowledge, i.e. ever since they pioneered blog posts tagging. While Technorati have their fans (and I am in fact one of them, despite the tag pages problems), it's clear that people have been getting Technoratty (if you'll forgive the pun) for some time now - even latterly, there is still dissatisfaction with Technorati: see e.g. this Blogher post and comments on it, or this post - and for the sake of their continued credibility I hope Technorati will get to the bottom of this issue soon.

Update 13 March 2006: After I posted these results Dave Sifry the Technorati CEO emailed me to say they're on it - see this post; if people regularly report this problem to Technorati when they encounter it, it might help them fix it faster.

