Wednesday, 31 August 2005

Your blog and Google Sitemaps: summary, and note on feeds






In my earlier post I described how blogs or websites that have newsfeeds can ping Google after adding new posts or webpages, to ask Google to re-index the updated site. (Though how soon Google respond is another matter! Still, it has to be better than waiting for their bot to get round to your site as part of its usual re-crawls, which can be rather random.)

Just to summarise here, for those who don't want to plough through the long version in the previous post [updated 28 April 2006 as Google have changed the way you do this and also permitted users to verify through a meta tag]:

1. Make sure your site is being indexed by Google already (no point pinging them if it isn't!). Submit your blog URL to Google if not, and then be patient - do the occasional search to check.

2. Assuming your blog is already on Google, sign in to Google Sitemaps using your blog Gmail address (or register with Google for a Google Account if you haven't yet, then sign in).

3. Under Add Site, enter the URL of your blog and click OK. Against your blog URL, click Add a Sitemap, and choose the type Add General Web Sitemap. In the list that appears, tick all the boxes:
  • I have created a Sitemap in a supported format.
  • I have uploaded my Sitemap to the highest-level directory to which I have access.
  • My Sitemap URL is:
and in the box below that, enter the URL of your blog's feed (NOT the main blog URL), then click Add Web Sitemap. (For example my feed URL is http://consumingexperience.blogspot.com/atom.xml and yours should be similar, i.e. your blog URL with /atom.xml added to it, if you're on Blogger). The Sitemap Status column should now say Pending.

Check back by logging in from time to time; the Sitemap Status column should change from Pending to OK which means they've, theoretically at least, re-crawled your blog.

[Edited 1 Sept 2005:] After it's done the first re-crawl, under your site name there's a link labelled "Verify" which asks you to upload a verification file to your server. If you can't upload anything (which is the case with us Blogspot users), don't worry about it. Verification just gives you access to more detailed Google stats; the lack of it won't stop Google from re-indexing your blog at all - see the Sitemaps help (but consider lobbying Blogger (the Suggest New Feature box at the bottom) and also Google Sitemaps to allow what we need - tell 'em you want Blogspot users to be able to have full Sitemaps functionality including verification!) [Added 28 April 2006:] As from 26 April 2006 you can now verify your Blogspot blog - see this post for a howto and why you might want to verify.

4. In future, when you update your blog or site, ping Google Sitemaps. Other ways are mentioned in my previous post, all just variations on the same thing, but the easiest way by far is to use this form - fill in your blog feed's URL (NOT main blog URL), then hit the button. Bookmark the resulting page or save it as a favorite, then next time you want to ping Google, just click on the saved bookmark/favorite.
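For anyone curious what such a bookmark boils down to, here is a minimal sketch in Python. It builds a Blogger feed URL from a blog URL (by adding /atom.xml, as in step 3) and then builds a ping URL with the feed URL percent-encoded in it. Note the assumptions: the ping endpoint below is the historical Google Sitemaps address as I understand it, it is not taken from the form above, and it may no longer be live; the function names are just made up for illustration.

```python
from urllib.parse import quote

# ASSUMPTION: historical Google Sitemaps ping endpoint, not confirmed by this post
# and possibly no longer in service. Swap in whatever address the ping form uses.
PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping"


def blogger_feed_url(blog_url: str) -> str:
    """Return the Blogger feed URL: the blog URL with /atom.xml added."""
    return blog_url.rstrip("/") + "/atom.xml"


def ping_url(feed_url: str) -> str:
    """Build the ping URL, percent-encoding the feed URL as the 'sitemap' parameter."""
    return PING_ENDPOINT + "?sitemap=" + quote(feed_url, safe="")


if __name__ == "__main__":
    feed = blogger_feed_url("http://consumingexperience.blogspot.com/")
    print(feed)            # the feed URL (NOT the main blog URL)
    print(ping_url(feed))  # the address you would bookmark and revisit to ping
```

The sketch only builds the address; it deliberately makes no network request, so visiting the printed URL in a browser (or bookmarking it) is still the actual ping.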

Checking your feed

I just wanted to add a note about the importance of checking your site feed settings. Google takes your feed file (the XML file) to use for its sitemap. This means what gets re-crawled on your site or blog will depend on what's in your feed.

So - you should make sure your feed file is being created in the first place, if you don't already make use of your feed! In Blogger it's Settings, Site Feed, and make sure Publish Site Feed is set to Yes (it should be by default anyway - Blogger normally create the feed file for you whether or not you make use of it).

Also, Google will only notice the URLs of posts which are listed in your site feed. If the feed shows only the last 5 posts, and you've just written 15, the earliest 10 may not get picked up. To fix this, you need to change the settings so that your feed reflects the number of posts that will be most appropriate for your blog, e.g. 15 in this case, and leave it like that until Google have re-crawled your blog. It's easiest to pick a number that will work for how much and how often you post on average, and then leave it alone. (If you've not come across feeds as used in your blog yet, I plan to write an introduction sometime, when I can get to it! For now, if you're on Blogger at least, hopefully this explanation will be enough for present purposes).
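If you want to see exactly which post URLs your feed is listing, a rough sketch like the following can help. It reads an Atom feed and pulls out the link of each entry, ignoring XML namespaces so that it should cope with different Atom versions. The sample feed and its example.blogspot.com URLs are hypothetical, purely for illustration; in practice you would paste in the contents of your own atom.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down Atom feed used only for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example blog</title>
  <entry>
    <title>Second post</title>
    <link rel="alternate" type="text/html"
          href="http://example.blogspot.com/2005/08/second-post.html"/>
  </entry>
  <entry>
    <title>First post</title>
    <link rel="alternate" type="text/html"
          href="http://example.blogspot.com/2005/08/first-post.html"/>
  </entry>
</feed>
"""


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def post_urls(feed_xml: str) -> list:
    """Return the post URL of each entry in the feed, in feed order."""
    root = ET.fromstring(feed_xml)
    urls = []
    for entry in root:
        if local_name(entry.tag) != "entry":
            continue
        for child in entry:
            # A link with no rel attribute is treated as the post's own URL.
            if local_name(child.tag) == "link" and child.get("rel", "alternate") == "alternate":
                urls.append(child.get("href"))
                break
    return urls


if __name__ == "__main__":
    urls = post_urls(SAMPLE_FEED)
    posts_written = 15  # how many new posts you want picked up
    print(len(urls), "posts listed in the feed")
    if len(urls) < posts_written:
        print(posts_written - len(urls), "posts are not in the feed, so may be missed")
    for url in urls:
        print(url)
```

If the count is lower than the number of posts you've just written, that's the cue to raise the number of posts shown in the feed before pinging.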

There should be a setting somewhere to tweak your feed. On Blogger, it's not on your site feed page - instead go to Settings then the Formatting tab. It's the setting for Show X posts on the main page. This seems also to dictate the number of posts shown in your site feed. So you can adjust the number there.



15 comments:

corpodibacco said...

Hi,
first of all, I linked to you for the Blogday! And that does it with the compliments :)

I tried Google Sitemaps following your instructions and everything went well.
But as a Blogspot user, I don't seem to be able to complete the 'verify' procedure!
As a matter of fact, after Google crawled my sitemap a small 'verify' link appeared after the website address.
When I followed it, Google asked me to manually upload a file with a coded name to my website, which obviously I couldn't do as a Blogspot user!
It's ironic, given that Blogspot is them.
Any suggestions?

Improbulus said...

In case it wasn't clear, I've amended the original post above to add a note about verification - hope this helps.

Anonymous said...

Does this work with an RSS feed (xml)? I thought sitemap was meant to be a more traditional site map?

Joshua said...

http://sitemap.xmlecho.org
This site seems to do a good job of creating a google sitemap and even helps with sending it to google and setting it up on your website.

Improbulus said...

Anon, yes Google Sitemaps isn't the same as a traditional site map of a blog or website, as such. What Google Sitemaps does, as I explained in my previous post, is to enable webmasters and bloggers to inform Google's crawlers when their sites have been updated, and which pages on their blog or site Google should crawl and index. In that sense therefore there is a sitemap involved, but the map is for the benefit of Google, to tell them which pages on your site to index.

Because of lack of integration within Google, it's impossible for Blogspot users to submit a proper full sitemap to Google - because that involves your uploading a sitemap (in the special Google sense) to the top level of the server hosting your blog.

Blogspot users aren't allowed to do that; we can only upload posts and templates and pics, nothing else, and we can't control the directory to which we can upload files on the Blogspot servers.

Therefore if Blogspot users want Google to crawl our changed pages, we have to use an alternative way which is accepted by Google, which is to submit our feed URL, and that's then treated by Google as a "sitemap" for its crawling purposes (and indeed it does "map" out your last few new posts, those visible on the main page of your blog).

See my previous post for fuller details. Hope this is clear now. The feed IS the map.

Improbulus said...

Joshua, thanks for the comment.

Unfortunately Blogspot users can't upload a proper sitemap to the Blogspot servers hosting our blogs, no matter how well created the sitemap (e.g. using the tool you mentioned) - we're stuck with using our feeds for now.

Unless you know something I don't?

Paulo a Pe said...

Followed your idiots' guide to adding a sitemap - but clearly I am an Uber-idiot!! My blog is on blogspot, and following the instructions I obtained the site feed. However the blog address is http://www.xxxxxx.blogspot.com but the site feed comes up as http://XXXXXX.blogspot.com/atom.xml - ie omitting the 'www' and adding the atom.xml. Then when I try to add the site feed url on the add sitemap function, it comes up with an error saying the site url (you say site feed url) must be in the format http://www etc. If I then add the www in before the correct site feed url, it accepts it and designates the site map in the 'pending' page as atom.xml only. Horribly confusing explanation, but did it make sense? After waiting with the thing pending for 20 minutes or so, it comes up with Errors(10) in red, which on listing are of the form
"Error Detail
URL not allowed (Line 17) with URL http://xxxxxxx.blogspot.com/2006/05/frolic-in-forest-and-floral-grandeur.html This url is not allowed for a Sitemap at this location. More" When I hit 'more' for the explanation, it bangs on about levels of sitemap etc and loses me. How do I find out what level the sitemap or the blog is at?
Regards,
Paulo

Anonymous said...

Why is it that one's blog should be already indexed before pinging Google sitemap?

Improbulus said...

Because if Google aren't indexing your blog at all, pinging them won't help - they've emphasised that adding a sitemap won't necessarily help get your blog or site on their radar any faster ("Please note that submitting a Sitemap doesn't guarantee that all pages of your site will be crawled or included in our search results" - see this page).

Improbulus said...

Paulo, I'm not sure why that's happening with you, I've not had that problem with the blogs I've tried. For the SITE url you should enter your blogspot url, eg http://consumingexperience.blogspot.com but for the site FEED url you should enter the atom.xml one e.g. http://consumingexperience.blogspot.com/atom.xml

hackaback said...

Well I'm trying sitemapdoc.com to create a sitemap for my site http://theblogwhichsaysall.blogspot.com. But it has indexed nearly 400 pages till now

Improbulus said...

Hackaback, thanks for the comment. And did it work?

Of course you could alternatively submit your blog's feed as a sitemap - if you're on New Blogger, you can get a feed of the entire blog not just the most recent posts.

Anonymous said...

Hi. u hv been gr8 in explaining things here. may b i am one super idiot here. I am nu to blogging and this web world of jargons. My goal is to submit a sitemap for my blog(new).
Now I dont understnad the directory or anything else. I have tried feedburner added that feed to my blog. now wat?(since u said feed is the sitemap). so now shud I paste the feed address http://feeds.feedburner.com/Dxxxx into the google sitemap url??
well I tried that but it gave me an error saying that "The Sitemap must be located at http://xxxx.blogspot.com/. To add a Sitemap at http://xxxx.blogspot.com/http://feeds.feedburner.com/, first add that site to your account and then click the Add a Sitemap link beside it."

but then thats exactly what i have been doing. I have verified my site and clicked on the "Add sitemap" link and was doing this.
I am utterly confused. Can u please help!!!
Thanks a ton!!

Improbulus said...

Anon, you don't need Feedburner to produce a feed for a sitemap - your blog should already have a feed, Feedburner just makes it a different kind of feed.

If you use Blogspot as your blogging service, your feed URL will be http://yourblogname.blogspot.com/atom.xml, just submitting that URL should work for sitemaps.

For more on feed URL locations and feeds generally, please see my intro to feeds.

Rafiq Raja said...

Wonderful explanation. Thanks, I have registered to Google now with your help.