Thursday, 19 April 2007

Googlepages.com files aren't private!






Did you know that generally anyone can see all webpages, photos and other files on googlepages.com sites, even files not linked to from any webpages? All they need to know is your base URL i.e. yoursitename.googlepages.com, and they can sneak a look under your kilt.

Anyone with a Gmail account can have free webspace on Googlepages.com via Google Page Creator (which is itself at pages.google.com though it's also accessible via www.googlepages.com). I wouldn't be surprised if lots of people use it to store files (like pics) which they mean to keep private to themselves, or to those to whom they've chosen to reveal the direct URL of the picture or whatever it is they've uploaded. Well, those files are in fact open for anyone to access, if they know how.

How come people can view all files on Page Creator sites? Because for every single GPC site, Google automatically creates and updates a basic sitemap, an XML file which lists all the files on that site (yes, even files you've uploaded separately yourself and not created or edited using Google's Page Creator webpage editor). Anyone can view that sitemap just by going to a standard URL: yoursitename.googlepages.com/sitemap.xml.
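To make that concrete, here's a rough sketch of how little work it takes to turn a sitemap into a plain list of file URLs. The sample XML is made up for illustration and assumes the file follows the usual sitemap layout (a urlset containing url/loc entries); the real googlepages.com sitemap may use a different namespace or carry extra fields, which is why the function only looks at the local tag name. The function name list_sitemap_urls is my own, not anything from Google.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the general shape of a sitemap file (an assumption,
# not a copy of a real googlepages.com sitemap).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://yoursitename.googlepages.com/home</loc></url>
  <url><loc>http://yoursitename.googlepages.com/holiday-photo.jpg</loc></url>
</urlset>
"""


def list_sitemap_urls(xml_text):
    """Return the text of every <loc> element, whatever namespace it is in."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


if __name__ == "__main__":
    for url in list_sitemap_urls(SAMPLE_SITEMAP):
        print(url)
```

Note that the unlinked holiday-photo.jpg in the sample shows up in the list just like the home page does, which is the whole problem.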

How can people know your sitename? Because by default, when you get a GPC account, it uses your Gmail username or login for your Googlepages site name. You can create other sitenames but I bet most people will generally use the original one, especially as it wasn't possible to get alternative site names when GPC was first launched; you had to use your Gmail user name, which wasn't necessarily good for privacy, but if you wanted Google webspace then you were stuck with it. So if someone knows your Gmail user name, they can view any files you have on GPC using your Gmail ID as your site name.

How to view any Googlepages.com files

Now, that sitemap.xml is pretty ugly and user-unfriendly to view in a web browser. Which is not surprising, as Web browsers are not generally set up to optimise viewing of XML files.

Fortunately, or maybe not so fortunately for some, Gilles Rasigade has produced GPExplorer (or Google Pages Explorer), a clever Google Gadget which clearly displays the files on any Googlepages.com site by making use of its sitemap.xml file. It's aimed at people wanting to manage their own Google Pages, but can of course be used to look at all the files on any site whose main Googlepages URL you know.

For instance, in Firefox just enter any Googlepages URL in full (including the .googlepages.com) in the form box below, which makes use of GPExplorer, and hit Take a peek! to see the site's files (in a new window or tab), e.g. improbulus.googlepages.com:




(Note I said Firefox, because this doesn't work in Internet Explorer, not even IE7, no surprise. Works fine in Opera 9.20. Sorry, no idea about Safari. You'd have thought that with IE7 Microsoft would finally have caught on that it's not necessarily a good thing to insist on doing things differently from every other browser in the universe, but nope. And it still doesn't support stuff one would hope it would by now, like :before, but - again, nope.)

As you can see, GPExplorer brings up a list of the files on the site, on the left; all you then have to do is click on a filename in the list to view the contents of the file on the right (if it's a webpage or image):


And of course you can rightclick a link on the left to open it in a new tab, etc.

If you want, it can show you the files from more than one GPC site, which you can then switch between easily. Just enter the URLs of the different sites but separate them with a | (no spaces) - e.g. improbulus.googlepages.com|phydeauxredux.googlepages.com:
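In case the | format isn't obvious, here's a tiny sketch of how a multi-site entry like that could be split into one sitemap address per site. This is not GPExplorer's actual code, just my guess at the idea; the function name sitemap_urls_for and the http:// prefix are assumptions of mine.

```python
def sitemap_urls_for(sites_field):
    """Split 'a.googlepages.com|b.googlepages.com' into per-site sitemap URLs."""
    # Drop stray whitespace and empty entries (e.g. from a trailing "|").
    sites = [name.strip() for name in sites_field.split("|")]
    return ["http://" + name + "/sitemap.xml" for name in sites if name]


if __name__ == "__main__":
    field = "improbulus.googlepages.com|phydeauxredux.googlepages.com"
    for url in sitemap_urls_for(field):
        print(url)
```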


Bottom line - how to protect your privacy?

If you're worried about the privacy or security of certain web pages, images or other files, don't store them on Googlepages.com, or you could be exposing your privates to anyone who wants to take a peek. Best to upload them somewhere else (even a Gmail account using GSpace aka Gmail Space, for instance).

Different, unguessable sitename?

If you have to keep your private files on GPC, don't use your Gmail email user name, maybe get yourself another sitename and use that. But even though you could make up a long obscure name and not give the URL to anyone, don't forget that nothing is ever truly 100% secure and if anyone is really determined they could e.g. get robots to try different combos of random characters fast. Unlikely that they'd do that and hit on your particular site, but you never know.
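To put some rough numbers on "unlikely", here's a back-of-the-envelope sketch of how many site names a robot would have to try. It assumes site names are made only of lowercase letters and digits (36 possible characters) and that the robot manages a fixed number of guesses per second; both are my assumptions for illustration, not anything Google has documented.

```python
import string

# Assumed alphabet: lowercase letters plus digits, i.e. 36 characters.
ALPHABET = string.ascii_lowercase + string.digits

SECONDS_PER_DAY = 86400


def candidate_names(length):
    """Number of possible site names of exactly this length."""
    return len(ALPHABET) ** length


def days_to_try_all(length, guesses_per_second):
    """Days needed to try every name of this length at a steady guess rate."""
    return candidate_names(length) / guesses_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for length in (6, 12):
        print(length, candidate_names(length), days_to_try_all(length, 1000))
```

Under those assumptions a 6-character name could be exhausted in about 25 days at 1,000 guesses a second, while a 12-character name would take something on the order of a hundred million years, so length matters far more than obscurity.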

Hide your site?

It would be good if Google offered an option to turn off the automatic sitemap creation for those who want it (or allowed you to upload your own sitemap overriding theirs), perhaps. At the moment, I don't think that's possible.

Now you can hide your site (e.g. to reduce publicising your Gmail address) via site settings:



Hiding your GPC site is meant to hide it from web search engines and stop them crawling and indexing your site. But it also blocks access to your site's automatic sitemap.xml file. Some people may want to use this option.

However, one problem is that, as far as I can see, hiding your site also blocks, within seconds, all general Web browser access to all your site files, yes even those you want to remain public, even those whose direct URLs are known. If I try to go to the direct URL of a file I've uploaded to GPC after I've hidden the site, then I just get a "File not found" 404 error. Even when I'm logged in to Page Creator.

You can still log in to Page Creator and access and edit your site from there etc - but it seems to me that if you've hidden the site, the only use then for Google Pages is just to upload and download files (i.e. as a sort of file host). And if you want to share certain files with a limited circle so that they can download the files too, they can't unless you give them your GPC login and password. That isn't generally a good idea, especially as it'll give them access to your Gmail account email and your Google Account too.

So hiding your site is a bit of a blunt instrument. There's also a problem with Internet Explorer (at least IE7), in that if you tick Hide this site and then leave that page, when you go back to Site Settings you may find that it hasn't "taken" and the box has unticked itself, so you may have to do it several times and keep checking to see that it's worked. No surprise, again.

(Discovered by the inimitable Kirk, but of course.)

2 comments:

Efendi said...

very useful article ! especially the peeking tool ;)

Improbulus said...

Thanks Efendi! Ah, so you're one of the people who's peeked at my files, are you? I hope you found it.. :D