From owner-freebsd-current@FreeBSD.ORG Tue Aug 26 09:40:29 2008
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 049F2106564A
    for ; Tue, 26 Aug 2008 09:40:29 +0000 (UTC)
    (envelope-from svavar@kjarrval.is)
Received: from ey-out-2122.google.com (ey-out-2122.google.com [74.125.78.24])
    by mx1.freebsd.org (Postfix) with ESMTP id 9D1708FC1C
    for ; Tue, 26 Aug 2008 09:40:28 +0000 (UTC)
    (envelope-from svavar@kjarrval.is)
Received: by ey-out-2122.google.com with SMTP id 6so232362eyi.7
    for ; Tue, 26 Aug 2008 02:40:27 -0700 (PDT)
Received: by 10.210.21.6 with SMTP id 6mr8136060ebu.111.1219743627209;
    Tue, 26 Aug 2008 02:40:27 -0700 (PDT)
Received: from ?10.0.0.20? ( [194.144.25.21])
    by mx.google.com with ESMTPS id k9sm34993157nfh.23.2008.08.26.02.40.25
    (version=TLSv1/SSLv3 cipher=RC4-MD5);
    Tue, 26 Aug 2008 02:40:26 -0700 (PDT)
Message-ID: <48B3CF6F.5020202@kjarrval.is>
Date: Tue, 26 Aug 2008 09:39:59 +0000
From: =?UTF-8?B?U3ZhdmFyIEzDunRoZXJzc29u?=
User-Agent: Thunderbird 2.0.0.16 (Windows/20080708)
MIME-Version: 1.0
To: Tim Kientzle
References: <3cb459ed0808221700w335b0906g6901d8b8bec4dad9@mail.gmail.com>
    <200808241415.31812.mitchell@wyatt672earp.force9.co.uk>
    <6a7033710808241239p1cbdc7adwd4f87814b428b10b@mail.gmail.com>
    <3cb459ed0808241958v552eafejf7841f0f9993928e@mail.gmail.com>
    <48B28B8D.9030305@kjarrval.is>
    <3cb459ed0808250621s28a1b825u1cc16939951bb157@mail.gmail.com>
    <48B336D8.2030300@kjarrval.is>
    <3cb459ed0808251656l5716ee51y5bddf34fb8809b0c@mail.gmail.com>
    <48B3544B.4020601@kjarrval.is>
    <48B38895.9040000@freebsd.org>
In-Reply-To: <48B38895.9040000@freebsd.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-current@freebsd.org
Subject: Re: Unicode-based FreeBSD
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 26 Aug 2008 09:40:29 -0000

Tim Kientzle wrote:
>> Going to UTF-8 might fix some of the character issues
>> but we would be in the same shoes when it comes to characters
>> which are in -16 and -32 but not in -8.
>
> You need to read the Unicode/ISO10646 standards again;
> you do not understand them.

You are right, I do not understand them. As I mentioned, I am not a
Unicode expert and I have never claimed to be one.

>
> There are no characters in UTF-32 that are not in UTF-8.
>
> UTF-32, UTF-16, and UTF-8 all use exactly the same characters.
>
> UTF-8 encodes Unicode characters from U+000000 to U+10FFFF, using 1 to
> 4 bytes per character.
>
> UTF-16 encodes Unicode characters from U+000000 to U+10FFFF, using 2
> to 4 bytes per character.
>
> UTF-32 encodes Unicode characters from U+000000 to U+10FFFF, using 4
> bytes per character.
>
> Practically speaking, UTF-8 is a bit more convenient for file
> storage and transmission (including terminal support), UTF-16
> or UTF-32 can be slightly more convenient for internal
> string manipulation. But all three encodings use exactly
> the same characters.
>
> Tim Kientzle

I cannot confirm that you are 100% right because I am not an expert in
Unicode. However, after some reading, I can see that there is no
"character loss" from using one Unicode encoding form rather than
another. Therefore, I stand corrected on that issue.

I still think there should be support for UTF-16 and UTF-32 in FreeBSD
in general, but that is outside the scope of this topic (Unicode in
syscons).

Tz-Huan Huang wrote:
> How do you define ``support''?
>
> If you mean software-level support, vim supports UTF-16, firefox
> supports UTF-16/UTF-32, perl supports UTF-16/UTF-32, etc.
>
> If you mean system-level support, there are two cases:
>
> 1. The system internal text representation is still in UTF-8, just add
>    UTF-16/32 support for terminal, stdin/stdout/stderr, etc. I think
>    it's not so hard (I might be wrong because I don't know terminal at
>    all) but I don't see any reason to set locale to UTF-16 or UTF-32.
>
> 2. The system internal text representation is changed to UTF-16 or
>    UTF-32. This is another story and I have no comment on it.
>
>

By support I meant full handling of Unicode characters, which covers
both 1 and 2. However, given what I learned above, I think it is better
if the internal handling continues to be done in UTF-8.

Með kveðju / With regards,
Svavar Kjarrval (svavar@kjarrval.is)
s. 863-9900
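
The point quoted above, that UTF-8, UTF-16 and UTF-32 cover the same
characters and differ only in how many bytes each one takes, can be
checked with a minimal Python sketch. The sample characters and the
helper names (encoded_lengths, round_trips) are assumptions chosen for
illustration and are not taken from the thread; the little-endian codec
variants are used only so that no byte-order mark is counted.

```python
# Sample code points (illustrative choice): ASCII, Latin-1 range,
# elsewhere in the Basic Multilingual Plane, and beyond the BMP.
SAMPLES = ["A", "\u00f0", "\u20ac", "\U0001F600"]

# The "-le" codecs are used so that no byte-order mark is prepended.
CODECS = {"utf-8": "utf-8", "utf-16": "utf-16-le", "utf-32": "utf-32-le"}


def encoded_lengths(ch):
    """Return the number of bytes one character takes in each encoding."""
    return {name: len(ch.encode(codec)) for name, codec in CODECS.items()}


def round_trips(ch):
    """True if the character survives encode/decode in all three encodings."""
    return all(ch.encode(codec).decode(codec) == ch for codec in CODECS.values())


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_lengths(ch), "round-trips:", round_trips(ch))
```

Every sample round-trips through all three encodings; only the byte
counts differ (for example, a character outside the BMP takes 4 bytes
in each of them).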