From owner-freebsd-questions@FreeBSD.ORG  Sun Apr 22 11:06:50 2012
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 33D29106566B
	for <freebsd-questions@freebsd.org>;
	Sun, 22 Apr 2012 11:06:50 +0000 (UTC)
	(envelope-from freebsd@edvax.de)
Received: from mx02.qsc.de (mx02.qsc.de [213.148.130.14])
	by mx1.freebsd.org (Postfix) with ESMTP id E181D8FC43
	for <freebsd-questions@freebsd.org>;
	Sun, 22 Apr 2012 11:06:49 +0000 (UTC)
Received: from r56.edvax.de (port-92-195-124-250.dynamic.qsc.de
	[92.195.124.250]) by mx02.qsc.de (Postfix) with ESMTP id 57D591E923;
	Sun, 22 Apr 2012 13:06:43 +0200 (CEST)
Received: from r56.edvax.de (localhost [127.0.0.1])
	by r56.edvax.de (8.14.5/8.14.5) with SMTP id q3MB6gou010707;
	Sun, 22 Apr 2012 13:06:42 +0200 (CEST)
	(envelope-from freebsd@edvax.de)
Date: Sun, 22 Apr 2012 13:06:42 +0200
From: Polytropon <freebsd@edvax.de>
To: Matthew Seaman <m.seaman@infracaninophile.co.uk>
Message-Id: <20120422130642.cb5b09c2.freebsd@edvax.de>
In-Reply-To: <4F93E159.7020807@infracaninophile.co.uk>
References: <20120421055823.GA6788@tinyCurrent> <4F9253D7.7010609@locolomo.org>
	<4F9278A2.1020301@locolomo.org>
	<alpine.BSF.2.00.1204210909450.5338@abbf.6qbyyneqvnyhc.pbz>
	<4F93CC95.5050209@locolomo.org>
	<4F93E159.7020807@infracaninophile.co.uk>
Organization: EDVAX
X-Mailer: Sylpheed 3.1.1 (GTK+ 2.24.5; i386-portbld-freebsd8.2)
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-questions@freebsd.org
Subject: Re: converting UTF-8 to HTML
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Polytropon <freebsd@edvax.de>
List-Id: User questions <freebsd-questions.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-questions>
List-Post: <mailto:freebsd-questions@freebsd.org>
List-Help: <mailto:freebsd-questions-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-questions>, 
	<mailto:freebsd-questions-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 22 Apr 2012 11:06:50 -0000

On Sun, 22 Apr 2012 11:45:45 +0100, Matthew Seaman wrote:
> On 22/04/2012 10:17, Erik Nørgaard wrote:
> > UTF-8 is variable width, ascii characters are stored as single bytes (not
> > sure about iso-8859-1) while other characters are stored as two byte chars.
>
> ascii uses the low 128 values that you can assign to an unsigned char,
> ie. those where the high-order bit is zero.
>
> iso-8859-1 and the various other iso-8859-X character sets fill in the
> remaining 128 characters with various other glyphs useful in latin
> alphabets, so it's still one char per glyph.  Other alphabets (greek,
> cyrillic, etc) have similar one byte-per glyph encodings. But you have
> to know what the encoding is to display the content correctly, and it is
> difficult to mix chunks of text in different encodings in the same document.
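
A minimal Python sketch of the quoted point, using the byte 0xF8 as an
arbitrary example (the choice of byte and of the three ISO-8859 variants
is an assumption made for illustration, not something from the thread).
It shows that the same byte means different things depending on the
charset you assume, that it is not valid ASCII at all, and that UTF-8
needs a varying number of bytes per character:

```python
# One byte, three single-byte encodings: the glyph depends on the charset.
raw = bytes([0xF8])

for codec in ("iso8859-1", "iso8859-7", "iso8859-5"):
    print(codec, "->", raw.decode(codec))

# ASCII only defines the low 128 values, so a byte with the
# high-order bit set is rejected by the decoder.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("valid ASCII:", ascii_ok)

# UTF-8 is variable width: one byte for ASCII characters,
# two or more bytes for everything else.
for ch in ("A", "ø", "€"):
    print(repr(ch), len(ch.encode("utf-8")), "byte(s)")
```

Without knowing which charset was intended, there is no way to tell
from the byte alone which of the three glyphs was meant.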

How about the "extended ASCII character set" that has a mixture
of "non-US glyphs" and semi-graphic symbols?

	http://asciiset.com/extended.gif

This default layout isn't tied to a specific encoding, if I
remember correctly, or is it? Accessing the set as seen in the
picture allows using "special characters" from many languages,
such as German umlauts and eszett, Greek gamma and phi,
Danish o-slash, Swedish a-circle and even the yen symbol.
It also has the nice semi-graphic symbols to draw boxes and
backgrounds, as well as card deck symbols or the "lazy L".
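
One way to check which byte yields which glyph is sketched below. It
rests on the assumption that the chart corresponds to IBM code page 437;
the thread does not confirm that, and the byte values and descriptive
names are picked here purely for illustration. Python ships a cp437
codec, so the lookup is a one-liner per byte:

```python
# Assumption: the "extended ASCII" chart is IBM code page 437.
# Keys are byte values, values are informal names chosen for this example.
samples = {
    0x81: "u-umlaut",
    0xE1: "eszett (the codec maps this slot to the sharp s)",
    0xE2: "Greek capital gamma",
    0xE8: "Greek capital phi",
    0x86: "a-circle",
    0x9D: "yen sign",
    0xC9: "double-line box corner",
    0xB0: "light shade for backgrounds",
}

# Decode each single byte with the cp437 codec.
decoded = {b: bytes([b]).decode("cp437") for b in samples}

for b, name in samples.items():
    print(f"0x{b:02X}  {decoded[b]}  {name}")
```

If the chart turns out to follow a different code page, only the codec
name needs to change; the approach stays the same.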

Of course, there are no Arabic or Chinese letters in there,
so it can be seen as a "Roman-derived language" centrism
(targeting Europe and America in the first place). All of
these glyphs are natively supported by graphics cards when
running in text mode, if my assumption is correct. So this
"extended set of capabilities" is still the minimum common
functionality that one can rely on.

(FreeBSD remaps some of the characters in text mode to display
the semi-graphic mouse pointer, so the full set cannot be
used all the time.)


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...