Tuesday, 8 March 2005

Auto-translating Webpages/blogs: code






As requested after my previous post on “automatic” translation, this post sets out the code you can include in your Web page or blog template so that visitors can, without leaving your site or blog, translate your Webpages or blog posts from English into German, Spanish, French, Italian, Portuguese, Japanese, Korean or Chinese (using Google's language translation tools). I can't get it to display nicely and still work, so it's very long horizontally and cuts across the sidebar, but copy/paste should be fine. (Must try to get a textarea working properly in Blogger!)

I’m very glad I introduced this on my own blog. My logs show that at least one or two people have used it every day since. It’s all about making a blog or Website accessible to a wider range of readers.

WARNING: Blogger keeps doing weird things to the code I included below, with extra "amp;amp;amp;" appearing after ampersands, and "SPECIAL REMOVE" stuff creeping in within the Korean, Japanese and Chinese code. Please remove them if they crop up... sorry, can't sort it, it keeps recurring...

Javascript version

I produced the Javascript version with some help from the inestimable redryder52. Any mistakes are of course mine alone. (There's also an HTML version, see later.) Here's the Javascript to enable translation of the Webpage on which the code appears:
<div style="border-style:none; font-size: 1; text-align: center">
<script type="text/javascript">
//By Improbulus, http://consumingexperience.blogspot.com/
//licensed under Creative Commons License
//http://creativecommons.org/licenses/by-nc-sa/2.0/
//with thanks to redryder52, http://truckspy.blogspot.com/
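//encode the page address so any "?", "&" or "#" in it survives inside the Google URL
var Location = encodeURIComponent(document.location.href);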
document.write (
//German starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cde&hl=de&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">Deutsch</a>'
+' | '+ //spacer and divider
//German ends here
//Spanish starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Ces&hl=es&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">Español</a>'
+' | '+ //spacer and divider
//Spanish ends here
//French starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cfr&hl=fr&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">Français</a>'
+' | '+ //spacer and divider
//French ends here
//Italian starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cit&hl=it&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">Italiano</a>'
+' | '+ //spacer and divider
//Italian ends here
//Portuguese starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cpt&hl=pt&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">Português</a>'
+' | '+ //spacer and divider
//Portuguese ends here
//Japanese starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cja&hl=ja&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">日本語</a>'
+' | '+ //spacer and divider
//Japanese ends here
//Korean starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cko&hl=ko&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">한국어</a>'
+' | '+ //spacer and divider
//Korean ends here
//Chinese starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Czh-CN&hl=zh&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">汉语</a>'
//Chinese ends here
);
</script>
</div>

Instructions

To offer translation into those languages, all you need to do is copy and paste the code above into your HTML or blog template; you don't have to do anything else, except (if you wish) use standard HTML/CSS to style it to fit the design of your own site or blog, tweak the spacing before and after by using <br> or <p> tags, etc. (You can delete or amend the <div> and </div> lines if desired; they're just to set the style I decided to use for the translation links in my own blog: small font, centered etc. I use “1” (instead of “10px”) for font-size where the code is at the top of a page instead of under the title of a particular post, for instance).

Blogger.com users can use that block of code for their main page, archive pages or item pages - it will work, as is, on all of them (if you're new to Blogger and haven't figured out conditional tags yet, read this). A good place to put it is just before the "<Blogger>" tags for the appropriate pages in the template. I've done that and as you can see this inserts the language translation links just under the blog description on my main page and my archive pages e.g. for January 2005.

Deleting or changing a language

If you don't want to offer translation into any of the languages listed, delete the section for that language (the section of code for a particular language is indicated by the note e.g. "//Japanese starts here" on the line just before that section, and "//Japanese ends here" on the line just after it). So, if I don't want to offer French, I'd delete these lines:

//French starts here
'<a href="http://translate.google.com/translate?u='+Location+'&langpair=en%7Cfr&hl=fr&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">Français</a>'
//French ends here


Notice that you should then also delete the "+' | '+" line just before or just after the deleted section, as appropriate, because (as indicated by the "//spacer and divider" comment) that's what produces the dividing " | " between the names of the languages in the displayed list. Make sure the list doesn't end with a dangling "+' | '+", or the script will fail with a syntax error. (If you don't like "|" as a divider, replace the " | " between the single quotes with anything you like, e.g. just "+' '+" for a blank space.)
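If juggling the dividers by hand gets fiddly, one alternative is to keep the languages in a list and let the script place the dividers itself. The following is only a rough sketch, not the code above: `LANGUAGES` and `translationLinks` are names made up for this illustration, and the URL format is simply copied from the links in this post. Removing a language then means deleting one entry from the list, and changing the divider means changing one argument.

```javascript
// Hypothetical restructuring of the script above (names are illustrative only).
// Each entry: the target language code, the "hl" value used in the post's links,
// and the text shown for the link.
var LANGUAGES = [
  { code: "de", hl: "de", label: "Deutsch" },
  { code: "es", hl: "es", label: "Español" },
  { code: "fr", hl: "fr", label: "Français" },
  { code: "it", hl: "it", label: "Italiano" },
  { code: "pt", hl: "pt", label: "Português" },
  { code: "ja", hl: "ja", label: "日本語" },
  { code: "ko", hl: "ko", label: "한국어" },
  { code: "zh-CN", hl: "zh", label: "汉语" }
];

// Builds one link per language and joins them with the divider,
// so no divider is ever left dangling at the start or end.
function translationLinks(pageUrl, languages, divider) {
  var links = [];
  for (var i = 0; i < languages.length; i++) {
    var lang = languages[i];
    links.push(
      '<a href="http://translate.google.com/translate?u=' +
      encodeURIComponent(pageUrl) +
      '&langpair=en%7C' + lang.code +
      '&hl=' + lang.hl +
      '&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools" target="_blank">' +
      lang.label + '</a>'
    );
  }
  return links.join(divider);
}

// In a Web page this writes the links where the script appears;
// outside a browser (no "document") nothing is written.
if (typeof document !== "undefined") {
  document.write(translationLinks(document.location.href, LANGUAGES, " | "));
}
```

To drop French, for example, you would delete just the `fr` line from `LANGUAGES`.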

To offer translations from languages other than English, I suggest you go to Google's language translation tools page, try doing some translations of Website or blog pages manually from and to the desired languages, and then view what is in the address bar of the results (it will look similar to parts of the above code). You can then use what’s there to adapt the code.
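The pattern in those address-bar URLs can be sketched as a small function. This is only an illustration: `buildTranslateUrl` and its parameter names are made up here, and the reading of `langpair` (source language, then "|", then target language) and `hl` (the language of Google's own interface) is inferred from the links in this post rather than from any Google documentation. Whether Google actually supports a given pair is something to check manually first, as described above.

```javascript
// Illustrative helper (hypothetical name): builds a translation link URL in the
// same shape as the links in this post.
function buildTranslateUrl(pageUrl, fromLang, toLang) {
  // Assumption: "hl" is the short form of the target language,
  // as in the post's Chinese link, which pairs zh-CN with hl=zh.
  var interfaceLang = toLang.split("-")[0];
  return "http://translate.google.com/translate" +
    "?u=" + encodeURIComponent(pageUrl) +
    // encodeURIComponent turns the "|" separator into %7C
    "&langpair=" + encodeURIComponent(fromLang + "|" + toLang) +
    "&hl=" + interfaceLang +
    "&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools";
}

// Example: a link for a French page to be read in English
// (assuming that pair is one Google offers).
var exampleUrl = buildTranslateUrl("http://example.com/", "fr", "en");
```

Compare the result with what the address bar shows after a manual translation; if they differ, trust the address bar.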

New window

The translation opens in a new browser window. If you want it to open in the same window, obviously you can delete each "target="_blank"" in the code.

HTML version

If you prefer to avoid Javascript (e.g. for speed of loading reasons), you can still offer translations of your home Web or blog page using HTML. In fact that's what I've done for my own blog, reserving the Javascript for my archive pages only. (I've just mentioned the Javascript version first in this post as it's the simplest to implement on any Webpage or blog.)

For HTML only, try the following code, changing "<$BlogURL$>" to the URL of your site or blog (e.g. “http://consumingexperience.blogspot.com/” in my case – without any quotation marks), or if you're on Blogger you can just use this code as is (again, just before the <Blogger> tag is a good place):

<!-- By Improbulus, http://consumingexperience.blogspot.com/ licensed under Creative Commons License http://creativecommons.org/licenses/by-nc-sa/2.0/ -->
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cde&hl=de&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Deutsch</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Ces&hl=es&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Español</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cfr&hl=fr&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Français</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cit&hl=it&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Italiano</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cpt&hl=pt&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Português</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cja&hl=ja&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">日本語</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cko&hl=ko&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">한국어</a> |
<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Czh-CN&hl=zh&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">汉语</a>

(You'll have to style it yourself, something goes weird with the closing /div tag in Blogger - or just use the same div tags as I included around the Javascript version).

This will work the same as the Javascript version above, except that it won't work on your archive pages or on Webpages other than your main page - it will only translate the page whose URL is inserted into the code. You could manually change the URLs in the code to that of the page where you've added the code, but it can be a pain to do that for every one of your site pages, which is why the Javascript version is more useful - it automatically fills in the URL of whatever page you're on.

Also the HTML version won't work on Blogger archive pages (unless someone else can figure out a way?) - you'll have to use the Javascript version for those pages.

Again, you can style it as you wish and delete languages (after the explanation for the Javascript version, it should be clear which chunks to delete).

Twist for Blogger and other blogs

For pages with more than one post, e.g. the main blog page and archive pages, you can offer translations of just the individual post.

This could be useful especially if, as with my blog, the page is long (which means that the last section of the full page may not get translated, see the next section below).

To do this (for Blogger.com blogs), just add the above HTML code to your template, but change "<$BlogURL$>", every time it appears in the code, to "<$BlogItemPermalinkURL$>". (This makes use of the special <$BlogItemPermalinkURL$> Blogger template tag, which pulls in the URLs of individual post pages. You may well be able to adapt the code for non-Blogger blogs but I'm not familiar with other platforms - [Added 13 March 2004:] Nick Chase has since provided copy and paste code for Movable Type). So for example the bit for Italian which reads

<a target="_blank" href="http://translate.google.com/translate?u=<$BlogURL$>&langpair=en%7Cit&hl=it&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Italiano</a>

would be changed to read

<a target="_blank" href="http://translate.google.com/translate?u=<$BlogItemPermalinkURL$>&langpair=en%7Cit&hl=it&ie=UTF-8&ie=UTF-8&oe=UTF-8&prev=%2Flanguage_tools">Italiano</a>


- and so on for the other languages.

The code should be inserted between the <Blogger> and </Blogger> tags for your main page or archive page (or main/archive page, depending on how your template is set up). On my blog I have it just after the Blogger template tags for post title and date used in my template (the <$BlogItemTitle$> and <$BlogDateHeaderDate$> tags) – your template may be different.

As mentioned before, you can style it as you wish, delete the bits for any languages you don't want, and delete each "target="_blank"" if you prefer the translation to open in the same window.

I include that code on my archive pages even though the archive page (which I've tweaked in my blog to just list the titles of my posts for that month, rather than the full posts) can be translated as one - because it lets people go straight to the translation of a post they're interested in, without having to open the English version of the post first.

Limitations

The code uses Google's translation tools, so their restrictions apply. It’s only a word-for-word translation, so don’t expect perfection; it’s just to give a general idea (and some words, particularly slang, will remain untranslated).

Also it only translates a limited number of characters. As mentioned in my previous post, you can get round that by manually using Google to translate the remaining untranslated bit, breaking it into chunks and feeding one chunk at a time into Google's tool if the untranslated bit is itself too long.
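The chunking step can be sketched in a few lines of Javascript. This is an illustration only: `splitIntoChunks` is a made-up name, and `maxLength` is a placeholder you would have to set yourself, since the actual character limit of Google's tool isn't stated here. The sketch breaks at spaces where it can, so words aren't cut in half.

```javascript
// Illustrative helper (hypothetical): splits text into pieces of at most
// maxLength characters, preferring to break at a space.
function splitIntoChunks(text, maxLength) {
  if (maxLength < 1) {
    throw new Error("maxLength must be at least 1");
  }
  var chunks = [];
  var remaining = text;
  while (remaining.length > maxLength) {
    // Last space at or before the limit, so the chunk never exceeds maxLength
    var breakAt = remaining.lastIndexOf(" ", maxLength);
    if (breakAt <= 0) {
      breakAt = maxLength; // no usable space: cut mid-word as a last resort
    }
    chunks.push(remaining.slice(0, breakAt));
    // Drop the whitespace at the break so the next chunk starts with a word
    remaining = remaining.slice(breakAt).replace(/^\s+/, "");
  }
  if (remaining.length > 0) {
    chunks.push(remaining);
  }
  return chunks;
}
```

Each returned chunk can then be pasted into Google's tool one at a time.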

Also, of course, if Google change the way their translation tool works, all bets will be off, the code above may stop working, and we'll all then have to figure out a way to use the new version...



8 comments:

Xavier Ashe said...

Thanks a lot! This was exactly what I was looking for!

Improbulus said...

Glad to help, Xavier. Just watch the extra stuff Blogger keeps inserting into the code - change any &amp;amp;#38; to just a single & and delete all the squiggle "special remove" stuff if it's crept in. Doesn't seem to be there today, thank goodness! It varies..

The Little Master™ said...

just a bit of fun

http://www.degraeve.com/translator.php

:o)

Improbulus said...

Fun indeed TLM, thanks for the link! I believe I've heard of similar sites before. Good way of passing the time..

baby(*-*) said...

hi there,
I've managed to add the code to my HTML. But I could only see the Chinese characters when I clicked the preview button. After I saved it, I couldn't see what I wanted in Internet Explorer.
Please advise. Thanks a lot..
You can send me an email to advise.
cindyluv01@gmail.com

Improbulus said...

Sorry for the delay in replying, baby. If it works in preview but not afterwards, one basic thing to check: did you save the template change AND republish your blog, and then refresh the page view in Internet Explorer (hit F5)? If that doesn't work I'm not sure what else to suggest.

hikawac said...

I wonder if anyone has written code for translating Chinese into English...

--so far
i know there are people using alta vista(babel fish) to do it, but the translation doesn't quite match the meaning of the source language...

Improbulus said...

Hikawac, you can use exactly the same sort of code as I did in the post above, with tweaks, to translate from Chinese to English.

Let me know if you need the tweaks.

However the issue is that with ANY automatic word for word translation tool, it will never exactly match the meaning of the source language. We still need bilingual human beings to do that properly!