Firefox 'funny' characters ?

**DimPrawn** · 22 November 2010, 19:57

You need Microsoft 'innit?

**bogeyman** · 22 November 2010, 20:02

You on a Mac Platypus?

I see the same thing on FF and Chrome (OS X 10.6.4).

I see the same thing in FF on Win XP under VMWare too.

Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).

The main problem is that non-ASCII characters should be escaped or represented as entities (e.g. &rdquo; &lsquo; etc.).
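A minimal sketch of that idea in Python, assuming the text is already correctly decoded before it is escaped. The helper name `to_entities` is made up for illustration. Each non-ASCII character becomes a named entity where the standard library knows one, otherwise a numeric reference, so the output is plain ASCII and should display the same whichever charset the page declares.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # named entity, e.g. &rsquo;
        else:
            out.append(f"&#{cp};")                  # numeric reference as a fallback
    return "".join(out)


headline = "\u2018Funny\u2019 characters"   # curly single quotes around Funny
print(to_entities(headline))                # &lsquo;Funny&rsquo; characters
```

This sketch deliberately leaves `&`, `<` and `>` alone; real output code would escape those as well.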

**Platypus** · 22 November 2010, 20:35

Originally posted by bogeyman

You on a Mac Platypus?

I see the same thing on FF and Chrome (OS X 10.6.4).

See the same thing in FF on Win XP under VMWare too.

I'm on Win XP SP3, native (not VM) running FF 3.6.12
But I've been seeing this for years on FF.
And I just had a quick peek using IE8 - same thing!

I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured.

**Platypus** · 22 November 2010, 20:39

Originally posted by bogeyman

Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).

The main problem is that non-ASCII characters should be escaped or represented as entities (e.g. &rdquo; &lsquo; etc.).

... so does this mean that the webpage is in error?

EDIT: and furthermore, if it is, why don't the people who create such pages immediately see the error?

This very page is indeed ISO-8859-1

**bogeyman** · 22 November 2010, 20:44

Originally posted by Platypus

I'm on Win XP SP3, native (not VM) running FF 3.6.12
But I've been seeing this for years on FF.
And I just had a quick peek using IE8 - same thing!

I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured.

What it basically comes down to is that the text content has characters that are not part of the common character set.

The funny accented A's are just fancy curly opening/closing single or double quotes in this case.

It's CUK's content management editor at fault, I think. It should translate non-standard characters into HTML entities.

That doesn't seem to be happening for some reason.

It's not a fault with your browser or anything.
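To illustrate how a curly quote can end up as a funny accented A, here is a small Python sketch. It assumes one common cause, which this thread does not confirm for the sidebar: the quote is stored as UTF-8 bytes but displayed as if each byte were a Windows-1252 character (browsers generally treat a page labelled ISO-8859-1 as Windows-1252).

```python
original = "\u2019"                      # right single curly quote
utf8_bytes = original.encode("utf-8")    # three bytes: e2 80 99

# Decode each byte as its own Windows-1252 character, as a browser would
# for a page declared as ISO-8859-1.
garbled = utf8_bytes.decode("cp1252")
print(garbled)                           # an accented a followed by two symbols

# Reversing the wrong step recovers the quote, as long as no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)              # True
```

One character turning into three is the usual sign of this kind of mix-up.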

**NickFitz** · 22 November 2010, 22:46

The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1
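A rough Python sketch of that suspected chain. It assumes "ISO-8859-1 with curly quotes thrown in" means Windows-1252 bytes, and the sample headline is invented. What the real parser and database do with bad bytes may differ, so this shows only that the information can be lost along the way, not the exact symptom.

```python
# Curly quotes as single Windows-1252 bytes (0x91 and 0x92).
raw = "\u2018Budget\u2019".encode("cp1252")

# Step 1: a strict UTF-8 parser rejects these bytes outright.
try:
    raw.decode("utf-8")
    parsed_ok = True
except UnicodeDecodeError:
    parsed_ok = False

# Step 2: a lenient parser substitutes U+FFFD for each bad byte.
lenient = raw.decode("utf-8", errors="replace")

# Step 3: storing that as ISO-8859-1 cannot represent U+FFFD, so it becomes "?".
stored = lenient.encode("iso-8859-1", errors="replace")

# Decoding with the encoding the bytes were really written in keeps the quotes.
correct = raw.decode("cp1252")
print(parsed_ok, stored, correct)
```

Once the quotes have been replaced in step 2 or 3, no later conversion can bring them back.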

I'll see about getting it fixed

**bogeyman** · 22 November 2010, 23:40

Originally posted by NickFitz

The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1

I'll see about getting it fixed

Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?

**OwlHoot** · 22 November 2010, 23:49

On many web sites that host news articles these will have trundled through several steps, being parsed and converted at each hop. So there's a fair chance some developer along the line will assume text is UTF-8 when it isn't, or vice versa. One often sees munged characters even on sites like the BBC and the Telegraph. (Well, no surprise with the last, as they've probably sacked most of their developers, but you'd expect the BBC to be a bit more savvy.)

**NickFitz** · 23 November 2010, 00:47

Originally posted by bogeyman

Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?

Unfortunately, it's too late by the time it gets to the point where it makes sense to use HTML entities. The way it's set up at the moment is that the news is entered into the main site CMS, which saves a copy of the headlines as an XML file on the forum server (as well as shoving the stories into the main site database, of course). My vBulletin plugin checks that file's last modification date as and when, and if it's been updated it parses the XML and shoves the headlines into the forum database, ready to be displayed in the sidebar.
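The flow described above could be sketched roughly like this in Python. This is not the actual vBulletin plugin (which would be PHP); the function name, the `<headline>` element, the table layout and the use of SQLite are all assumptions made for illustration.

```python
import os
import sqlite3
import xml.etree.ElementTree as ET


def refresh_headlines(xml_path, db, last_seen_mtime):
    """Reload headlines if the XML file changed; return the mtime now on record."""
    mtime = os.path.getmtime(xml_path)
    if mtime <= last_seen_mtime:
        return last_seen_mtime            # file unchanged, nothing to do

    # The parser honours the encoding named in the XML declaration, so a
    # wrong declaration corrupts the text at this step.
    tree = ET.parse(xml_path)
    titles = [el.text or "" for el in tree.getroot().iter("headline")]

    with db:                              # commit the swap as one transaction
        db.execute("CREATE TABLE IF NOT EXISTS headlines (title TEXT)")
        db.execute("DELETE FROM headlines")
        db.executemany(
            "INSERT INTO headlines (title) VALUES (?)",
            [(t,) for t in titles],
        )
    return mtime
```

Each of the three steps (writing the file, parsing it, and writing to the database) has its own idea of the encoding, which is why a mismatch at any one of them is enough to mangle the quotes.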

It's only at display time that it makes sense to replace oddball characters with entities, and by then it's too late, as the characters got screwed up either when the file was created, when it was parsed, or when the forum database was updated - my current best guess is the parsing, but I need to confirm that.

The good news is that the main site CMS is soon to be upgraded to a system that's UTF-8 from end to end, so that should make it easier to sort things out.
