From owner-freebsd-hackers@FreeBSD.ORG  Tue Apr 28 09:25:38 2009
Message-ID: <49F6C7A1.6070708@FreeBSD.org>
Date: Tue, 28 Apr 2009 11:08:49 +0200
From: Gabor Kovesdan <gabor@FreeBSD.org>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
References: <aa9f273a8313c6436e76fa9f5d587ef4.squirrel@webmail.kovesdan.org>	<20090427183836.GA10793@zim.MIT.EDU>
	<49F5FE45.2090101@freebsd.org>	<20090427193326.GA7654@britannica.bec.de>
	<20090427194904.GA11137@zim.MIT.EDU>
In-Reply-To: <20090427194904.GA11137@zim.MIT.EDU>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Subject: Re: SoC 2009: BSD-licensed libiconv in base system

David Schultz wrote:
> On Mon, Apr 27, 2009, Joerg Sonnenberger wrote:
>   
>> On Mon, Apr 27, 2009 at 11:49:41AM -0700, Tim Kientzle wrote:
>>     
>>> David Schultz wrote:
>>>       
>>>> ... whether it would make more sense to standardize on something like
>>>> UCS-4 for the internal representation.
>>>>         
>>> YES.  Without this, wchar_t is useless.
>>>       
>> I strongly disagree. Everything can be represented as UCS-4 is a bad
>> assumption, but something Americans and Europeans naturally don't have
>> to care about.
>>     
>
> ...but isn't this moot at present because there are no
> widely-accepted encodings that include characters that
> aren't supported by UCS-4? Citrus doesn't seem to support
> any such encodings in any case.
>   
Citrus uses UCS-4 as its internal encoding, just like the other
BSD-licensed iconv library. That is a barrier to supporting encodings
whose characters UCS-4 cannot represent.
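
To make the pivot idea concrete, here is a minimal sketch in Python. It
is not Citrus code; the helper names to_ucs4, from_ucs4 and convert are
made up for illustration, and Python's "utf-32-be" codec stands in for
UCS-4. Every conversion goes source -> UCS-4 -> destination, so a
character with no UCS-4 code point cannot pass through.

```python
def to_ucs4(data: bytes, src: str) -> bytes:
    """Decode bytes in the source encoding into the UCS-4 pivot form."""
    # "utf-32-be" yields 4 bytes per character, big-endian, without a BOM.
    return data.decode(src).encode("utf-32-be")


def from_ucs4(pivot: bytes, dst: str) -> bytes:
    """Encode the UCS-4 pivot form into the destination encoding."""
    return pivot.decode("utf-32-be").encode(dst)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two encodings by going through the pivot."""
    return from_ucs4(to_ucs4(data, src), dst)


if __name__ == "__main__":
    latin2 = b"G\xe1bor"                 # "Gabor" with a-acute, ISO-8859-2
    pivot = to_ucs4(latin2, "iso-8859-2")
    print(len(pivot))                    # 5 characters, 4 bytes each
    print(convert(latin2, "iso-8859-2", "utf-8"))
```

The appeal of this design is that N encodings need only 2N converters
(to and from the pivot) rather than one per pair; the cost is the
limitation described above.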
> If this ever really becomes an issue, we could always stuff
> locale-dependent encodings into unused UCS-4 code pages.
> However, it doesn't seem worthwhile to deliberately burden
> programmers over concerns that are presently, and for the
> foreseeable future, hypothetical.
>   
I'm not a Unicode expert, but isn't the purpose of the periodic
revisions of the standard to cover more and more human languages? We
could just support the latest Unicode standard and let the Unicode
working groups map new characters into unused code points. The
Latin-based, Cyrillic, Devanagari and CJK encodings are well supported,
I think. I don't know much about CJK encodings, though, so I can't say
whether all of the thousands of ideographs are supported. But I'd say
the most significant languages used on the Internet are supported; the
rest might have other problems...

[OFF]
There may be small, poor countries with their own writing system, but
their writing system is probably unsupported because starvation,
poverty and the lack of water and electricity are more serious problems
there. My ex-girlfriend is working in Nepal in a cooperation program
(it's a kind of scholarship) and she told me that they only have
electricity for 8 hours a day, 4 during the night and 4 during the day.
There are no sidewalks for pedestrians, so they walk among the cars on
the street, and the pollution is extremely high. Yet even this
country's encoding is supported. What I am trying to say is that
countries with unsupported languages probably won't care much about
character encodings if they hardly have any computers... I can only
hope that their living conditions will improve and that their languages
will be supported. I can also hope that the Unicode people will focus
more on these countries instead of wasting time on fictional languages
from fairy tales... [1]
I'll probably visit her in Nepal in January; it will be an interesting
experience. I'll see whether I can help the IT world there with
anything.
[ON]

Another thing to consider: are all of our utilities wchar-clean? What
about library functions? (regex surely is not.) Do we lack any
important utility or library? (We still lack iconv and gettext, and
what else...?) What about standards, like the C99 wchar functions? Is
anything missing? What about POSIX, if it has anything related?
Personally, I think these questions are more important than support
for some extremely rare languages. It's worth considering how to deal
with those later, but the basic problems need a higher priority.
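
As a rough illustration of what "wchar-clean" means here, the sketch
below (Python rather than C, with made-up helper names) contrasts a
byte-oriented truncation with a character-oriented one. It is only an
assumption about the kind of bug in question, not taken from any
particular FreeBSD utility.

```python
def truncate_bytes(text: str, width: int, encoding: str = "utf-8") -> bytes:
    """Byte-oriented truncation: may cut a multibyte character in half."""
    return text.encode(encoding)[:width]


def truncate_chars(text: str, width: int, encoding: str = "utf-8") -> bytes:
    """Character-oriented truncation: always cuts on a character boundary."""
    return text[:width].encode(encoding)


def is_valid(data: bytes, encoding: str = "utf-8") -> bool:
    """Report whether the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    word = "\u00e1rv\u00edz"   # 5 characters, 7 bytes in UTF-8
    print(is_valid(truncate_bytes(word, 5)))   # cut lands inside i-acute
    print(is_valid(truncate_chars(word, 4)))
```

A utility that measures width in bytes behaves like truncate_bytes and
can emit broken output in a multibyte locale; a wchar-clean one behaves
like truncate_chars.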


[1] http://en.wikipedia.org/wiki/Tengwar#Unicode


Cheers,

-- 
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org