
Encoding (non)issues

Posted: Fri Apr 06, 2007 12:40 am
by Chris Vogel
Although this forum declares its character set as ISO 8859-1, it instead seems to be using windows-1252, which is basically ISO 8859-1 with most of the C1 control characters (bytes 0x80 to 0x9F) replaced with displayable ones. Below are the displayable characters that are part of windows-1252 but not ISO 8859-1:

€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ

Those characters should be escaped in the source code, but they are not. Documents should not contain characters outside their character sets, although I’ll admit that, since this is a common error, most browsers will just treat ISO 8859-1 as windows-1252 anyway. I’m just being pedantic, I suppose. :P
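
To reproduce the list above, here is a minimal Python sketch. It assumes Python's built-in `cp1252` and `latin-1` codecs are acceptable stand-ins for windows-1252 and ISO 8859-1; the variable names are only illustrative.

```python
import unicodedata

# Bytes 0x80-0x9F are C1 control characters in ISO 8859-1.
C1_RANGE = bytes(range(0x80, 0xA0))

# Python's "cp1252" codec leaves five of these bytes undefined
# (0x81, 0x8D, 0x8F, 0x90, 0x9D), so errors="ignore" drops them.
displayable = C1_RANGE.decode("cp1252", errors="ignore")

# "latin-1" is Python's name for ISO 8859-1. Every byte decodes,
# but all 32 come out as invisible control characters (category "Cc").
controls = C1_RANGE.decode("latin-1")
all_controls = all(unicodedata.category(c) == "Cc" for c in controls)

print(displayable)
print(len(displayable), "displayable characters")  # 27 displayable characters
print(all_controls)  # True
```

Browsers may handle the five undefined bytes differently from Python's codec, so treat the count as a property of this codec rather than of every windows-1252 implementation.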

Characters outside both ISO 8859-1 and windows-1252 seem to be escaped as expected (歯ブラシ, for example).
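
As a rough illustration of what "escaped" means here, the sketch below replaces anything the declared character set cannot encode with a numeric character reference. `escape_for_charset` is a hypothetical helper, not something the forum software is known to use; it relies on Python's standard `xmlcharrefreplace` error handler.

```python
def escape_for_charset(text: str, charset: str = "latin-1") -> str:
    """Replace characters the charset cannot encode with &#NNNN; references."""
    encoded = text.encode(charset, errors="xmlcharrefreplace")
    return encoded.decode(charset)


print(escape_for_charset("歯ブラシ"))    # &#27503;&#12502;&#12521;&#12471;
print(escape_for_charset("I’ll admit"))  # I&#8217;ll admit
print(escape_for_charset("café"))        # café (already in ISO 8859-1)
```

Run against ISO 8859-1, this would also escape the curly quotes and dashes from windows-1252, which is the step the forum appears to skip.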

Archived topic from Anythingforums, old topic ID:3179, old post ID:58518


Posted: Fri Apr 06, 2007 8:26 am
by michaelk1993adg
what?


Archived topic from Anythingforums, old topic ID:3179, old post ID:58540


Posted: Fri Apr 06, 2007 12:58 pm
by Red Squirrel
I noticed a lot of sites have screwed up encoding. I have yet to figure out how to make Firefox force it to iso-8859. The problem is only visible on pages with French characters; they'll appear as a diamond.

Pictures are also screwed up on this board, have no idea what causes that one.

Archived topic from Anythingforums, old topic ID:3179, old post ID:58548


Posted: Sat Apr 07, 2007 12:34 am
by Chris Vogel
Red Squirrel wrote: I have yet to figure out how to make Firefox force it to iso-8859.
As a Web author or a Web user? If you want your pages to be served as ISO 8859-1, you would make sure Content-Type: text/html; charset=iso-8859-1 is in your HTTP headers and <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> is in your HTML, but that character set is usually the default anyway. If you want to force a different character set on a page you’re browsing in Firefox, you would go to View → Character Encoding → Western (ISO-8859-1).
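
For the author side, here is a small Python sketch of keeping the three pieces in agreement: the HTTP header, the `<meta>` element, and the bytes actually sent. `build_response` and `declared_charset` are hypothetical helpers for illustration only; the header parsing uses the standard library's `email.message.Message`.

```python
from email.message import Message


def build_response(body_text: str, charset: str = "iso-8859-1"):
    """Return (headers, body_bytes) that all agree on one charset."""
    headers = {"Content-Type": f"text/html; charset={charset}"}
    meta = f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
    html = f"<html><head>{meta}</head><body>{body_text}</body></html>"
    # Encoding with the declared charset raises UnicodeEncodeError if the
    # page contains characters that charset cannot represent.
    return headers, html.encode(charset)


def declared_charset(content_type: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


headers, body = build_response("café")
print(declared_charset(headers["Content-Type"]))  # iso-8859-1
```

Encoding strictly with the declared charset is a deliberate choice in this sketch: a page containing curly quotes fails loudly instead of being served with a mismatched declaration.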

I don’t know of a way to turn Firefox’s error correction off and treat ISO 8859-1 like, well, ISO 8859-1 instead of windows-1252, but that shouldn’t matter since the ISO 8859-1 characters that don’t correspond to the ones in windows-1252 (the C1 control characters at 0x80 to 0x9F) can’t be used in HTML anyway. The problem arises when you use the new displayable characters in windows-1252 but serve the document as ISO 8859-1 and then rely on browsers to treat it as windows-1252 despite having declared it as something else. Browsers are not obligated to do that, and using those code points in ISO 8859-1 will make your HTML invalid.
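
The mismatch can be seen directly in a short Python sketch. It again assumes the built-in `cp1252` and `latin-1` codecs as stand-ins for the two character sets; the variable names are illustrative.

```python
# Curly quotes saved by a windows-1252 editor become bytes 0x93 and 0x94.
raw = "“quoted”".encode("cp1252")
print(raw)  # b'\x93quoted\x94'

# A strict ISO 8859-1 reading turns those bytes into C1 control characters.
strict = raw.decode("latin-1")
print(repr(strict))  # '\x93quoted\x94'

# The lenient reading that browsers usually apply recovers the quotes.
lenient = raw.decode("cp1252")
print(lenient)  # “quoted”
```

The same bytes give two different documents depending on which reading the browser picks, which is why the page only works as long as the browser ignores the declared character set.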
Red Squirrel wrote: The problem is only visible on pages with French characters; they'll appear as a diamond.
Do you have an example? ISO 8859-1 mostly supports French, but it lacks single guillemets, the OE/oe ligature, the capital Y with diaeresis, and the euro sign. (Of course, those are available via HTML entities.)
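
A quick way to check that list is to try encoding each character, as in the Python sketch below. `missing_from_latin1` and the sample string are hypothetical and only for illustration; the entity names come from the standard library's `html.entities` table.

```python
from html.entities import codepoint2name


def missing_from_latin1(text: str) -> list:
    """Return the characters in text that ISO 8859-1 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


sample = "éèêëàâçîïôùûüÿ«»‹›ŒœŸ€"
missing = missing_from_latin1(sample)
print(missing)  # ['‹', '›', 'Œ', 'œ', 'Ÿ', '€']

# Named HTML entities for the characters that did not fit.
entities = [f"&{codepoint2name[ord(ch)]};" for ch in missing]
print(entities)  # ['&lsaquo;', '&rsaquo;', '&OElig;', '&oelig;', '&Yuml;', '&euro;']
```

The accented letters, the lowercase y with diaeresis, and the double guillemets all encode without trouble, so only the six characters named above need entities.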
Red Squirrel wrote: Pictures are also screwed up on this board, have no idea what causes that one.
Yeah, it just happened out of the blue. Maybe it was something we said? :P

Now I’ve come to a dilemma: Do I stop trying to be so typographically correct and dump my curly quotes and dashes, or do I ignore the problem?

Archived topic from Anythingforums, old topic ID:3179, old post ID:58561