Date: Sun, 22 Apr 2012 13:06:42 +0200
From: Polytropon
To: Matthew Seaman
Cc: freebsd-questions@freebsd.org
Subject: Re: converting UTF-8 to HTML
Message-Id: <20120422130642.cb5b09c2.freebsd@edvax.de>
In-Reply-To: <4F93E159.7020807@infracaninophile.co.uk>

On Sun, 22 Apr 2012 11:45:45 +0100, Matthew Seaman wrote:
> On 22/04/2012 10:17, Erik Nørgaard wrote:
> > UTF-8 is variable width, ascii characters are stored as single bytes (not
> > sure about iso-8859-1) while other characters are stored as two byte chars.
>
> ascii uses the low 128 values that you can assign to an unsigned char,
> ie. those where the high-order bit is zero.
>
> iso-8859-1 and the various other iso-8859-X character sets fill in the
> remaining 128 characters with various other glyphs useful in latin
> alphabets, so it's still one char per glyph.  Other alphabets (greek,
> cyrillic, etc) have similar one byte-per glyph encodings.  But you have
> to know what the encoding is to display the content correctly, and it is
> difficult to mix chunks of text in different encodings in the same document.

How about the "extended ASCII character set" that has a mixture
of "non-US glyphs" and semi-graphic symbols?

	http://asciiset.com/extended.gif

This default layout isn't tied to a specific encoding, if I
remember correctly, or is it?

Using the set as shown in the picture gives access to "special
characters" from many languages, such as German umlauts and
eszett, Greek gamma and phi, Danish o-slash, Swedish a-circle
and even the yen symbol. It also has the nice semi-graphic
symbols for drawing boxes and backgrounds, as well as card deck
symbols and the "lazy L". Of course, there are no Arabic or
Chinese letters in there, so it can be seen as centered on
"Roman-derived languages" (targeting Europe and America in the
first place).

All of them are natively supported by graphics cards when running
in text mode, if my assumption is correct. So this "extended set
of capabilities" is still the minimal common functionality that
one can rely on. (FreeBSD remaps some of the characters in text
mode to display the semi-graphic mouse pointer, so the full set
cannot be used all the time.)

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
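
As a small illustration of the quoted point that you have to know the encoding to display content correctly, here is a Python sketch that is not part of the original message. It decodes one byte value under three single-byte encodings and then counts how many bytes UTF-8 needs for a few characters. The byte 0xF8, the codec list, and the helper names `decode_all` and `utf8_length` are illustrative assumptions. Code page 437 only stands in for an "extended ASCII" table here; the thread does not say which code page the linked picture shows.

```python
# Sketch: one byte value, several meanings, depending on the assumed encoding.
# The byte 0xF8 and the list of codecs are illustrative choices, not taken
# from the thread.

RAW = bytes([0xF8])

# Single-byte encodings: each maps 0xF8 to exactly one character.
SINGLE_BYTE_CODECS = ["iso8859-1", "iso8859-7", "cp437"]


def decode_all(raw, codecs):
    """Return {codec_name: decoded_text} for the given bytes."""
    return {name: raw.decode(name) for name in codecs}


def utf8_length(ch):
    """Number of bytes UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))


decoded = decode_all(RAW, SINGLE_BYTE_CODECS)
for name, text in decoded.items():
    # Print code points rather than glyphs so the output does not depend
    # on the terminal's own encoding.
    print(f"{name:10s} 0x{RAW[0]:02X} -> U+{ord(text):04X}")

# UTF-8 is variable width: ASCII stays at one byte, other characters
# need two to four bytes.
for ch in ["a", "\u00f8", "\u20ac"]:  # 'a', o-slash, euro sign
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) in UTF-8")
```

The same byte comes out as three different characters, which is why a document mixing such encodings cannot be displayed correctly without knowing which one applies to which part.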