
Firefox 'funny' characters ?


    On CUK today, the sidebar on the RHS CUK News says

    Interview and travel costs ‘prohibitive for jobless’

    I see funny characters, as I do on many websites where there are pound signs or single quotes.

    Can some clever person like Nick Fitz tell me how to stop this?

    EDIT: for clarity, the 'funny' character I see is a capital A with a circumflex (hat)
    Last edited by Platypus; 22 November 2010, 19:55.

    #2
    You need Microsoft 'innit?

      #3
      You on a Mac Platypus?

      I see the same thing on FF and Chrome (OS X 10.6.4).

      I see the same thing in FF on Win XP under VMWare too.

      Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).

      The main problem is that non-ASCII characters should be escaped or represented as entities (e.g. &rdquo;, &lsquo; etc.).
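
The charset mismatch described above can be reproduced in a few lines of Python. This is only a sketch: it assumes the page bytes are really UTF-8 while the page is declared as ISO-8859-1 (which browsers decode as windows-1252), and the sample string is made up rather than taken from the CUK page.

```python
# Hypothetical sample text; not taken from the actual page source.
text = "£100 ‘quoted’"

# Assumption: the server sends UTF-8 bytes...
raw = text.encode("utf-8")

# ...but the page is declared as ISO-8859-1, which browsers decode as windows-1252.
misread = raw.decode("cp1252")

# Decoding with the charset that was actually used recovers the text.
correct = raw.decode("utf-8")
```

Under these assumptions, `misread` starts with a capital A with a circumflex in front of the pound sign, which matches the symptom reported in the first post.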
      Last edited by bogeyman; 22 November 2010, 20:36.

      You've come right out the other side of the forest of irony and ended up in the desert of wrong.

        #4
        Originally posted by bogeyman
        You on a Mac Platypus?

        I see the same thing on FF and Chrome (OS X 10.6.4).

        See the same thing in FF on Win XP under VMWare too.
        I'm on Win XP SP3, native (not VM) running FF 3.6.12
        But I've been seeing this for years on FF.
        And I just had a quick peek using IE8 - same thing!

        I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured.

          #5
          Originally posted by bogeyman
          Could be because the page is declared as charset=ISO-8859-1 (ISO LATIN 1) instead of charset=UTF-8 (Unicode).

          The main problem is that non-ASCII characters should be escaped or represented as entities (e.g. &rdquo;, &lsquo; etc.).
          ... so does this mean that the webpage is in error?

          EDIT: and furthermore, if it is, why don't the people who create such pages immediately see the error?


          This very page is indeed ISO-8859-1
          Last edited by Platypus; 22 November 2010, 20:42.

            #6
            Originally posted by Platypus
            I'm on Win XP SP3, native (not VM) running FF 3.6.12
            But I've been seeing this for years on FF.
            And I just had a quick peek using IE8 - same thing!

            I tried to chase this down once, and read lots of forum posts about character sets, but the replying geeks were so busy trying to out-geek each other with what-ifs and wherefores that any useful information (i.e. a simple fix) was completely obscured.
            What it basically comes down to is that the text content has characters that are not part of the common character set.

            The funny accented A's are just fancy curly opening/closing single or double quotes in this case.

            It's CUK's content management editor at fault, I think. It should translate non-standard characters into HTML entities.

            That doesn't seem to be happening for some reason.

            It's not a fault with your browser or anything.
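
As a rough illustration of translating non-standard characters into entities, here is a minimal Python sketch. The helper name `to_html_entities` is made up for this example, and it emits numeric character references (such as `&#8216;`) rather than named entities; it is not how the CUK content management system works.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


headline = "Interview and travel costs ‘prohibitive for jobless’"
escaped = to_html_entities(headline)
# escaped == "Interview and travel costs &#8216;prohibitive for jobless&#8217;"
```

The output is pure ASCII, so it displays the same whichever charset the page declares. It only helps if the text was decoded correctly before this step.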

              #7
              The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1.

              I'll see about getting it fixed.
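
One chain of events that would put a stray capital A with a circumflex in front of each curly quote can be sketched in Python. This is a guess at a mechanism for illustration, not a confirmed diagnosis, and its order differs from the one suggested above: it assumes the source bytes are really windows-1252, are decoded as ISO-8859-1, are re-encoded as UTF-8, and are finally displayed as windows-1252.

```python
# Assumption: the CMS emits windows-1252 bytes but labels them ISO-8859-1.
source_bytes = "‘prohibitive’".encode("cp1252")  # curly quotes become 0x91 / 0x92

# Step 1: a parser trusts the label and decodes as ISO-8859-1,
# turning 0x91 / 0x92 into invisible C1 control characters.
parsed = source_bytes.decode("latin-1")

# Step 2: the text is written out again as UTF-8 (0x91 becomes 0xC2 0x91).
stored = parsed.encode("utf-8")

# Step 3: the page is served as ISO-8859-1, which browsers read as windows-1252.
displayed = stored.decode("cp1252")
# displayed == "Â‘prohibitiveÂ’"
```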

                #8
                Originally posted by NickFitz
                The headlines in the sidebar come from the content management system for the main site, but the character encoding is getting mucked up for things like curly quotes: I think it's coming from over there as ISO-8859-1 but with curly quotes thrown in, then being parsed as UTF-8, then being stuck in a database configured to use ISO-8859-1.

                I'll see about getting it fixed.
                Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?

                  #9
                  On many web sites that host news articles these will have trundled through several steps, being parsed and converted at each hop. So there's a fair chance some developer along the line will assume text is UTF-8 when it isn't, or vice versa. One often sees munged characters even on sites like the BBC and the Telegraph. (Well, no surprise with the last, as they've probably sacked most of their developers, but you'd expect the BBC to be a bit more savvy.)
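
A common defensive pattern at each hop is to stop assuming and try a strict decode first. The sketch below is a heuristic under stated assumptions, not a guaranteed fix: the function name `decode_bytes` is made up, and the choice of windows-1252 as the fallback is an assumption about where the text came from.

```python
def decode_bytes(raw: bytes) -> str:
    """Try strict UTF-8 first; fall back to windows-1252 if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" guards against the few byte values windows-1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")
```

For example, `decode_bytes(b"\x91hi\x92")` falls back to windows-1252 and yields the text with curly quotes, while valid UTF-8 input is returned unchanged.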

                    #10
                    Originally posted by bogeyman
                    Good on yer Nick, but shouldn't these characters be converted to HTML entities (&ldquo; etc.) at some point, before they hit the browser? The character encoding and code-page wouldn't matter then, would it?
                    Unfortunately, it's too late by the time it gets to the point where it makes sense to use HTML entities. The way it's set up at the moment is that the news is entered into the main site CMS, which saves a copy of the headlines as an XML file on the forum server (as well as shoving the stories into the main site database, of course). My vBulletin plugin checks that file's last modification date as and when, and if it's been updated it parses the XML and shoves the headlines into the forum database, ready to be displayed in the sidebar.

                    It's only at display time that it makes sense to replace oddball characters with entities, and by then it's too late, as the characters got screwed up either when the file was created, when it was parsed, or when the forum database was updated - my current best guess is the parsing, but I need to confirm that.
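
The check-then-parse step described above might look roughly like this in Python. The real plugin runs inside vBulletin, so this is only an illustration: the function name, the `headline` element name and the file layout are all hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def load_headlines_if_updated(path, last_mtime):
    """Return (headlines, mtime); headlines is None when the file has not changed."""
    mtime = os.path.getmtime(path)
    if mtime <= last_mtime:
        return None, last_mtime
    # ElementTree decodes the bytes using the encoding named in the XML declaration,
    # so a declaration that does not match the real bytes corrupts the text here.
    root = ET.parse(path).getroot()
    # "headline" is a hypothetical element name chosen for this sketch.
    headlines = [el.text or "" for el in root.iter("headline")]
    return headlines, mtime
```

The comment in the middle marks the parsing step, which is where the current best guess above places the corruption: if the declared encoding is wrong, every later step inherits the damage.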

                    The good news is that the main site CMS is soon to be upgraded to a system that's UTF-8 from end to end, so that should make it easier to sort things out.
