Date: Sun, 22 Apr 2012 13:06:42 +0200
From: Polytropon <freebsd@edvax.de>
To: Matthew Seaman <m.seaman@infracaninophile.co.uk>
Cc: freebsd-questions@freebsd.org
Subject: Re: converting UTF-8 to HTML
Message-ID: <20120422130642.cb5b09c2.freebsd@edvax.de>
In-Reply-To: <4F93E159.7020807@infracaninophile.co.uk>
References: <20120421055823.GA6788@tinyCurrent> <4F9253D7.7010609@locolomo.org> <4F9278A2.1020301@locolomo.org> <alpine.BSF.2.00.1204210909450.5338@abbf.6qbyyneqvnyhc.pbz> <4F93CC95.5050209@locolomo.org> <4F93E159.7020807@infracaninophile.co.uk>
On Sun, 22 Apr 2012 11:45:45 +0100, Matthew Seaman wrote:
> On 22/04/2012 10:17, Erik Nørgaard wrote:
> > UTF-8 is variable width, ascii characters are stored as single bytes (not
> > sure about iso-8859-1) while other characters are stored as two byte chars.
>
> ascii uses the low 128 values that you can assign to an unsigned char,
> ie. those where the high-order bit is zero.
>
> iso-8859-1 and the various other iso-8859-X character sets fill in the
> remaining 128 characters with various other glyphs useful in latin
> alphabets, so it's still one char per glyph. Other alphabets (greek,
> cyrillic, etc) have similar one byte-per glyph encodings. But you have
> to know what the encoding is to display the content correctly, and it is
> difficult to mix chunks of text in different encodings in the same document.

How about the "extended ASCII character set" that has a mixture of
"non-US glyphs" and semi-graphic symbols?

	http://asciiset.com/extended.gif

If I remember correctly, this default layout isn't tied to a specific
encoding, or is it? Accessing the set as seen in the picture allows
using "special characters" from many languages, such as German umlauts
and eszett, Greek gamma and phi, Danish o-slash, Swedish a-ring and
even the yen symbol. It also has the nice semi-graphic symbols to draw
boxes and backgrounds, as well as card suit symbols and the "lazy L".
Of course, there are no Arabic or Chinese letters in there, so it can
be seen as a "Roman-derived language" centrism (targeting Europe and
America in the first place).

All of them are natively supported by graphics cards when running in
text mode, if my assumption is correct. So this "extended set of
capabilities" is still the minimal common functionality that one can
rely on. (FreeBSD remaps some of the characters in text mode to display
the semi-graphic mouse pointer, so the full set cannot be used all the
time.)
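The byte-level points in this exchange (ASCII in the low 128 values, one byte per glyph in ISO-8859-1, variable width in UTF-8, and the same high byte meaning different glyphs depending on the encoding) can be checked with a short Python sketch. It is not from the thread. It assumes that the pictured "extended ASCII" chart corresponds to IBM code page 437, which Python ships as the "cp437" codec; that identification is an assumption. UTF-8 in fact uses one to four bytes per character, so some characters (such as the euro sign) take three. The last helper shows one common way to turn non-ASCII text into HTML numeric character references, matching the thread's subject; the helper names are hypothetical.

```python
# Sketch only: compares ASCII, ISO-8859-1 and UTF-8 byte counts, and shows how
# one byte above 0x7F decodes differently depending on the assumed encoding.
# Assumption: the "extended ASCII" chart is IBM code page 437 (Python "cp437").


def byte_lengths(text):
    """Return {encoding: byte count}, or None where the encoding can't represent text."""
    result = {}
    for enc in ("ascii", "iso-8859-1", "utf-8"):
        try:
            result[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            result[enc] = None  # character not present in this character set
    return result


def same_byte_two_ways(value):
    """Decode one byte (0..255) as ISO-8859-1 and as code page 437."""
    raw = bytes([value])
    return raw.decode("iso-8859-1"), raw.decode("cp437")


def to_html_ascii(text):
    """Replace every non-ASCII character with an HTML numeric reference like &#248;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    for word in ("Erik", "Nørgaard", "Straße", "Γφ", "€"):
        print(word, byte_lengths(word))
    for value in (0xE1, 0xE2, 0xE8):
        print(hex(value), same_byte_two_ways(value))
    print(to_html_ascii("Nørgaard"))
```

Running it shows UTF-8 spending two bytes on "ø" and three on "€", while byte 0xE1 reads as "á" in ISO-8859-1 but as "ß" in code page 437 (and 0xE2 and 0xE8 as "Γ" and "Φ"), which is why the encoding has to be known to display the content correctly.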
-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...