A Consuming Experience

Thoughts on my experiences as a consumer of products, services, people (well maybe not that last one...), from reviews to raves, rants and random thoughts - concentrating on technology, gadgets, software, product usability, consumer issues, customer service. Including some introductory guides and tips on various subjects (like blogging!) which stumped me until I figured them out. And the occasional ever so slightly naughty observation.



Blogger feeds, Google crawls - Sitemap test over, y'all please come back now!

Thursday, May 31, 2007




My attempt at a new sitemap has "taken" as much as it could, so the experiment's over and I've deleted my "don't come if you don't want a slow-loading page" post. Google's picking up the same number of URLs from the home page as before, no more.

I should have checked more thoroughly first. The only Blogger feed Google Webmaster Central will accept for their Webmaster tools sitemaps (quickie on Google Sitemaps for those unfamiliar with 'em) is the atom.xml feed, because it's the only one in the root directory of files stored by Google for Blogger blogs.

Kirk pointed out that New Blogger's base atom.xml feed shows just 25 entries anyway - regardless of how many posts you've set to display on your main blog home page, whether fewer than 25 or more than that. This is different from Old Blogger, where the feed showed exactly the number of posts you'd set to show on your home page. And boy, do I have some editing of old posts to do, as many of my posts mentioned the importance of this point.

Plus, that base feed is ordered by "date updated" too, not "date published" (for more on Blogger feed ordering and sorting, see Kirk's post for the full lowdown). Unfortunately you can't change that when you add the feed URL to Webmaster Central to use it as a sitemap, even though you can change it in e.g. the feed URL that you give to services like Feedburner.
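For anyone who wants to see what "changing it in the feed URL" looks like, here's a rough Python sketch that builds a feed URL with an explicit sort order and size. It assumes the New Blogger `/feeds/posts/default` path and the `orderby` and `max-results` query parameters; the blog address is made up, and the function name `blogger_feed_url` is just mine for illustration.

```python
from urllib.parse import urlencode


def blogger_feed_url(blog_host, orderby="published", max_results=25):
    """Build a Blogger posts feed URL with explicit ordering and size.

    Assumption: the feed lives at /feeds/posts/default and accepts the
    'orderby' and 'max-results' query parameters.
    """
    query = urlencode({"orderby": orderby, "max-results": max_results})
    return f"http://{blog_host}/feeds/posts/default?{query}"


# Hypothetical blog address, purely for illustration.
print(blogger_feed_url("example.blogspot.com"))
```

A URL like that is the sort of thing you could hand to a feed service, but as noted above it's no use for Webmaster Central, which only takes the plain atom.xml feed.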

This means that the base Blogger feed shows the 25 latest posts that you've published or updated - yes, including old posts that you've just edited - rather than your 25 newest posts.
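To make the difference concrete, here's a tiny Python sketch with three invented posts and a limit of 2 instead of 25. The post titles, dates and the helper name `latest` are all made up for illustration; it only shows how sorting by "updated" lets a freshly edited old post push a genuinely newer one out of the list.

```python
from datetime import date

# Invented example posts: one old post that was edited very recently.
posts = [
    {"title": "Old post, just edited",
     "published": date(2006, 1, 10), "updated": date(2007, 5, 30)},
    {"title": "Newest post",
     "published": date(2007, 5, 29), "updated": date(2007, 5, 29)},
    {"title": "Older post, untouched",
     "published": date(2007, 5, 1), "updated": date(2007, 5, 1)},
]


def latest(posts, key, limit):
    """Return the titles of the 'limit' most recent posts by the given date field."""
    ordered = sorted(posts, key=lambda p: p[key], reverse=True)
    return [p["title"] for p in ordered[:limit]]


by_updated = latest(posts, "updated", 2)      # what the base feed does
by_published = latest(posts, "published", 2)  # what I actually wanted

print(by_updated)
print(by_published)
```

Sorted by "updated", the edited old post takes a slot and "Older post, untouched" drops off; sorted by "published", the two newest posts are listed.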

So the feed only pointed the Googlebot to my last 25 updated posts, not the universe of posts I've put up since my domain name change. Wail. However, Kirk also pointed out, quite rightly, that using a feed as a sitemap isn't really appropriate for blogs, and hopefully all the pages on the blog will get picked up by Google eventually as they link to each other and the Googlebot follows links. It's just going to take longer than I'd hoped for Google to crawl and index my post-domain name change posts. Bear with me as I may well publish a post with those links, just to help it on its way.

And I hope my sudden 30% drop in daily visitors since the domain name change was just a blip with Google rather than the result of the change. Quite depressing, as I'd managed to build up a PageRank of 6 before the change; now it's 0, at least until the next Google public update. Unfortunately for me, my domain name change came at the wrong time, so it never got picked up in the last update around the end of April or beginning of May, I gather. Ah well, c'est la vie.

I'll be posting on my trials and tribulations following the domain name change, with tips and pitfalls to avoid, of course, once I've actually dug myself out of the current pits!


Google News: news source - bug, or deliberate?

Wednesday, April 11, 2007




Google News is no longer in beta but it is, oddly, forcing "news source" searches on users.

You can do a "source" search by clicking "Advanced news search" and typing the source you want to limit the search to in the "News Source" box, outlined in red below - e.g. CNN or New York Times:


Or you can type "source:Reuters" (or whatever) in the main search box along with your search term:



But what if you don't want to do a search confined to, say, news agency Reuters as the news source? What if, instead, you want to search for news on Google News about Reuters?

Well, I've just noticed that Google News won't let you. It automatically fills in "source:Reuters" (or "+Reuters") if you just type in Reuters as a search term, e.g. if you wanted to check the news reports about the news agency Reuters setting up shop in the Second Life virtual world in late 2006.

If you type in the Google News search box: Reuters "Second Life":

and click Search News, you get this search results page:

In other words, Google News seems to assume that "Reuters" has to be a "news source", rather than the possible subject of any news, and makes it into one whether you like it or not - changing the search to source:Reuters "Second Life" (or +Reuters "Second Life").

Interestingly, it doesn't do this when you search the Google News Archives, where you get this on searching: Reuters "second life"


As you can see, the search results from the archives are from a mix of sources, not exclusively from Reuters itself.

Google News behaves in exactly the same way if you try to use e.g. "CNN" as a search term rather than a news source: search for CNN shooting and, on the results page, it's been turned into "source:CNN shooting":


I've not tried other searches yet with the names of other news corporations or press agencies, but I wonder why Google decided to do that. What's going on? Is it a bug? Is it deliberate? Did they come to some sort of deal with various news agencies?

It doesn't do it if you search e.g. "Financial Times" or "Times" or "Guardian" or indeed "New York Times" or "Washington Post" (odd that those wouldn't be deemed worthy news sources! Especially as the New York Times is listed as an example news source on the Advanced news search page). But it does change "BBC" to "source:bbc_news". Even more strangely, if you try "Bloomberg", it doesn't change it to source:Bloomberg in the search box, but it does say on the results page (just above the results listings) "Search news source Bloomberg" - yet the results clearly include reports from other sources e.g. the International Herald Tribune:


OK, I've done me share of typing names of random news sources to see what Google News does to them; if anyone else wants to sit there trying different names, feel free - but please do let me know what Google News does or doesn't "convert" into a news source! I can't find anything about this peculiarity in the Google News help or just on generally searching.
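For anyone keeping score, the conversions I've seen so far can be jotted down as a little Python lookup table. To be clear, this is only a model of what I observed in the search box, not how Google News actually works; the table `OBSERVED_REWRITES`, the function `rewrite_query` and the "only check the first word" rule are my own simplifications.

```python
# Only the conversions observed in this post; anything else is left alone.
OBSERVED_REWRITES = {
    "reuters": "source:Reuters",
    "cnn": "source:CNN",
    "bbc": "source:bbc_news",
}


def rewrite_query(query):
    """Mimic the observed behaviour: a leading news-source name becomes a source: operator.

    Simplification (my assumption): only the first word of the query is checked.
    """
    first, _, rest = query.partition(" ")
    replacement = OBSERVED_REWRITES.get(first.lower())
    if replacement is None:
        return query  # e.g. "Guardian" or "Financial Times" were not converted
    return f"{replacement} {rest}".strip()


print(rewrite_query('Reuters "Second Life"'))
print(rewrite_query("CNN shooting"))
print(rewrite_query("Financial Times"))
```

If you find other names that get converted (or don't), they'd just be extra rows in that table.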

So, are we destined never to be able to read news reports on Google News about a news corporation (e.g. any takeover of CNN) - unless they're written by that corporation itself, or enough time has passed that it'll be old archived news?? That's the cynical view, of course...

Hopefully there's nothing Big Brotherly behind this. Sort of a variation on quis custodiet - who then will report on the reporters? How will we be able to find news about the news sources? etc etc. (Yeah, I know, before anyone suggests it: not on Google News!).

But still, I think it's an interesting quirk, if a little frustrating for the user who wants to search for news about a news source.
