From: Erik Nørgaard
Date: Sun, 22 Apr 2012 14:56:14 +0200
To: freebsd-questions@freebsd.org
Subject: Re: converting UTF-8 to HTML

On 22/04/2012 13:06, Polytropon wrote:
> How about the "extended ASCII character set" that has a mixture
> of "non-US glyphs" and semi-graphic symbols?
>
> http://asciiset.com/extended.gif

I can't even write my name in that character set. As long as there are multiple character sets, you will have the problem of some characters being shown wrong.
This is nothing particular to UTF-8; you have the same problem even when choosing among the 10+ different ISO-8859 variants. The only thing that UTF-8 introduces is variable-byte-length characters, so you can't equate the number of bytes with the number of characters.

Cheers, Erik
-- 
M: +34 666 334 818 T: +34 915 211 157
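
Note: the byte-versus-character point above can be illustrated with a minimal sketch, assuming Python 3 and its standard codecs. The sample string (the sender's surname) and all variable names are chosen here for illustration and are not part of the original message.

```python
# Sample text containing one non-ASCII character (U+00F8, "o with stroke").
# The escape is used so the result does not depend on the source file encoding.
name = "N\u00f8rgaard"

utf8_bytes = name.encode("utf-8")         # the non-ASCII letter takes two bytes
latin1_bytes = name.encode("iso-8859-1")  # every character takes one byte

print(len(name))          # 8 characters
print(len(utf8_bytes))    # 9 bytes
print(len(latin1_bytes))  # 8 bytes

# Reading UTF-8 bytes as if they were ISO-8859-1 shows the wrong characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)  # NÃ¸rgaard

# The same mismatch exists between two single-byte ISO-8859 charsets:
# the byte for U+00F8 in ISO-8859-1 means a different letter in ISO-8859-5.
cyrillic_view = latin1_bytes.decode("iso-8859-5")
print(cyrillic_view == name)  # False
```

The character count stays at 8 in both encodings; only the byte count differs, which is why byte length cannot stand in for character length once UTF-8 is involved.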