XMLHttpRequest and character encoding

XMLHttpRequest retrieves content over HTTP, just like a regular page request from a web browser does.

The result is available in two variants:
The responseXML field holds a parsed DOM tree, if the retrieved source was well-formed XML.
The responseText field holds the raw source, basically a JavaScript string.

With current Firefox versions (1.5.x) this responseText string is always treated as UTF-8, regardless of the charset encoding sent by the originating web server. Thus valid ISO-8859-1 characters end up as illegible garbage in the resulting JavaScript string.
This can be a problem, for instance, with Greasemonkey scripts targeted at a server that uses something other than UTF-8 as its encoding.
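
To illustrate the kind of garbage this produces, here is a minimal sketch that decodes the same bytes once as UTF-8 and once as ISO-8859-1. It assumes a Node.js environment (for Buffer) rather than the browser, and the helper name decodeBytes and the sample word are made up for the illustration; it does not reproduce Firefox's internals, only the general effect of picking the wrong charset.

```javascript
// The word "café" as an ISO-8859-1 server would send it: é is the single byte 0xE9.
var latin1Bytes = [0x63, 0x61, 0x66, 0xE9];

// Hypothetical helper: decode raw bytes using the given encoding (Node.js Buffer).
function decodeBytes(bytes, encoding) {
    return Buffer.from(bytes).toString(encoding);
}

// Decoded with the wrong charset: 0xE9 alone is not valid UTF-8,
// so the decoder substitutes a replacement character for it.
var wrong = decodeBytes(latin1Bytes, "utf8");

// Decoded with the charset the server actually used.
var right = decodeBytes(latin1Bytes, "latin1");

console.log(wrong === "caf\u00E9"); // false
console.log(right === "caf\u00E9"); // true
```

The bytes are identical in both cases; only the charset assumed by the decoder differs.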

There is a solution, along the lines of Bugzilla entry Bug 337434 (XMLHttpRequest mangles binary data): call the overrideMimeType method on the XMLHttpRequest object after opening the request and before sending it.

Code example:

// XHR implementation
// overrideMimeType is available on Mozilla's native XHR
function requestPage(src, func) {
    var xhr = new window.XMLHttpRequest();
    // func is called on every readyState change and receives the xhr object
    xhr.onreadystatechange = function() { func(xhr); };
    xhr.open("GET", src);
    // this fixes the content type glitch: the response is decoded as ISO-8859-1
    xhr.overrideMimeType("text/html; charset=ISO-8859-1");
    xhr.send(null);
}

This solves the problem for regular XMLHttpRequest calls, though of course without the cross-domain permissions that Greasemonkey's GM_xmlhttpRequest method offers. It would be nice if overrideMimeType could be exposed in Greasemonkey as well!
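
Since overrideMimeType is specific to Mozilla's native XHR, a script that may run elsewhere could guard the call. The following is only a sketch: the helper name applyCharsetOverride is invented here, and the fallback behaviour (silently skipping the override) is an assumption about what a script would want.

```javascript
// Hypothetical helper: apply a MIME type/charset override only if the
// given request object supports overrideMimeType.
// Returns true if the override was applied, false otherwise.
function applyCharsetOverride(xhr, mimeType) {
    if (xhr && typeof xhr.overrideMimeType === "function") {
        xhr.overrideMimeType(mimeType);
        return true;
    }
    return false;
}
```

Inside a function like requestPage, the direct call would then become applyCharsetOverride(xhr, "text/html; charset=ISO-8859-1"), placed between xhr.open(...) and xhr.send(...).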
