From: Gabor Kovesdan
Date: Tue, 28 Apr 2009 11:08:49 +0200
To: freebsd-hackers@freebsd.org
Subject: Re: SoC 2009: BSD-licensed libiconv in base system
Message-ID: <49F6C7A1.6070708@FreeBSD.org>
In-Reply-To: <20090427194904.GA11137@zim.MIT.EDU>
References: <20090427183836.GA10793@zim.MIT.EDU> <49F5FE45.2090101@freebsd.org> <20090427193326.GA7654@britannica.bec.de> <20090427194904.GA11137@zim.MIT.EDU>

David Schultz wrote:
> On Mon, Apr 27, 2009, Joerg Sonnenberger wrote:
>
>> On Mon, Apr 27, 2009 at 11:49:41AM -0700, Tim Kientzle wrote:
>>
>>> David Schultz wrote:
>>>
>>>> ... whether it would make more sense to standardize on something like
>>>> UCS-4 for the internal representation.
>>>>
>>> YES. Without this, wchar_t is useless.
>>>
>> I strongly disagree. Everything can be represented as UCS-4 is a bad
>> assumption, but something Americans and Europeans naturally don't have
>> to care about.
>>
>
> ...but isn't this moot at present because there are no
> widely-accepted encodings that include characters that
> aren't supported by UCS-4? Citrus doesn't seem to support
> any such encodings in any case.
>
Citrus uses UCS-4 as its internal encoding, just like the other
BSD-licensed iconv library. This is a barrier to supporting encodings
that contain characters UCS-4 cannot represent.

> If this ever really becomes an issue, we could always stuff
> locale-dependent encodings into unused UCS-4 code pages.
> However, it doesn't seem worthwhile to deliberately burden
> programmers over concerns that are presently, and for the
> foreseeable future, hypothetical.
>
I'm not a Unicode expert, but isn't the purpose of the periodic
revisions of the standard to cover more and more human languages? We
could just support the latest Unicode standard and let the Unicode
working groups map new characters into unused code points. The
Latin-based, Cyrillic, Devanagari and CJK encodings are well supported,
I think. I don't know much about CJK encodings, though, so I can't say
whether all of the thousands of ideographs are supported. But I'd say
the most significant languages used on the Internet are supported; the
rest might have other problems...

[OFF]
There may be small, poor countries with their own writing systems, but
those writing systems are probably unsupported because starvation,
poverty and the lack of water and electricity are more serious problems
there.
My ex-girlfriend is working in Nepal in a cooperation program (it's
a kind of scholarship) and she told me that they only have electricity
for 8 hours a day, 4 during the night and 4 during the day. There are no
sidewalks for pedestrians, so they walk along the street with the cars,
and the pollution is extremely high. Even this country's encoding is
supported. What I am trying to say is that countries with unsupported
languages probably won't care much about character encodings if they
rarely have computers... I can only hope that their living conditions
will improve and that their languages will be supported. I can also hope
that the Unicode people will focus more on these countries instead of
wasting time on fictional languages from fairy tales... [1]
I'll probably visit her in Nepal in January; it will be an interesting
experience. I'll check whether I can help the IT world there with
anything.
[ON]

Another idea to consider: are all of our utilities wchar-clean? What
about library functions? (regex surely is not.) Do we lack any important
utility or library? (We still lack iconv and gettext, and what else...?)
What about standards, like the C99 wchar functions? Is anything missing?
What about POSIX, if it has anything related?
Personally, I think these are more important questions than support
for some extremely rare languages. It's worth considering how to deal
with them later, but the basic problems need higher priority.

[1] http://en.wikipedia.org/wiki/Tengwar#Unicode

Cheers,

-- 
Gabor Kovesdan
FreeBSD Volunteer

EMAIL: gabor@FreeBSD.org .:|:. gabor@kovesdan.org
WEB:   http://people.FreeBSD.org/~gabor .:|:. http://kovesdan.org