From: Svavar Lúthersson
Date: Mon, 25 Aug 2008 10:38:05 +0000
To: Alexander Churanov
Cc: freebsd-current@freebsd.org
Subject: Re: Unicode-based FreeBSD
Message-ID: <48B28B8D.9030305@kjarrval.is>
In-Reply-To: <3cb459ed0808241958v552eafejf7841f0f9993928e@mail.gmail.com>
List-Id: Discussions about the use of FreeBSD-current

Alexander Churanov wrote:
> 2008/8/24 Tz-Huan Huang
>
>> I'm a Chinese living in Taiwan and I am probably sure that Unicode is larger
>> than any other Chinese character sets (including traditional and simplified
>> Chinese). The UTF-8 support in FreeBSD/Xorg is good enough for me.
>> I can read/type all Unicode 4.0 characters (including CJKV extension A/B)
>> in Firefox or any gtk/qt programs if I have the needed font; I can produce
>> documents with any Unicode characters by LaTeX+CJK package.
>> It's much better than MS IE and Word because IE and Word only support
>> Unicode 2.0 (or maybe 3.0, I'm not so sure).
>>
>> There are two reasons to use any character sets other than UTF-8:
>> 1. compatibility for old programs/services or other OS.
>> 2. the old man wrote the document when Unicode was not so popular and
>> newbies read the old document.
>>
>> UTF-8 is more and more popular in Chinese, at least in Taiwan.
>> Almost everything works well in my daily jobs (of course under the X).
>> The major missing part is the kiconv UTF-8 support -- currently the kiconv
>> doesn't support more than two bytes character conversion so there
>> is no UTF-8 support for Chinese (most Chinese characters are 3-byte or
>> more). I should mount msdosfs/cd9660 in zh_TW.Big5 and convert the
>> filename to UTF-8 by lint or screen.
>>
>> IMHO, If I need Chinese support, I'll go into X. I have no reason to use
>> Chinese under console even if I can read/type in Chinese. I prefer Firefox
>> rather than w3m or links. :-)
>>
>> Regards,
>> Tz-Huan
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
> Tz-Huan,
>
> Working with Chinese text is the hard part of my solution (described in full
> in freebsd-current@freebsd.org).
> In brief it's about moving FreeBSD to UTF-8
> completely and making syscons map UTF-8 to selected 8-bit charset for
> displaying (a failsafe solution). It seems that this makes syscons somewhat
> more usable for some people, but not for people from East Asia, am I right?
>
> I was thinking of how to make working with Chinese filenames possible under
> syscons, but the help of a native speaker/writer would help much, because I
> know only basic facts about that matter.
>
> I see two alternatives of displaying unicode code points that do not fit
> into selected 8-bit display charset:
>
> 1) Substituting with some character, like '?'. This is very affordable
> solution, but makes inconvenient working with files having names that do
> not fit into selected charset.
>
> 2) Substituting with encoded code point value like "#1234;". This is more
> complex solution, if correct backspacing and things like that are required.
> I am not ready to implement it.
>
> In any case, it would be nice to have some "magic" implemented: if copying
> a text with substituted code points and then pasting it would cause the
> original UTF-8 sequence to be inserted.
>
> For all folks I'd like to explain again that I'm not discussing correct
> rendering of non-latin scripts. It's not possible to render Devanagari in
> character mode. And approach that Linux console takes is partial. The cost
> of full solution is like X, freetype, freebidi and so on.
>
> Tz-Huan, could you comment on the proposed solution? From your point of
> view, are proposed changes in syscons useful?
>
> Again, this does not affect X, Firefox, etc, but would make possible to have
> the whole system using UTF-8 out of box.
>
> Alexander Churanov

Hi. First time poster so be nice, please.
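As a rough illustration of the two fallback options in the quoted proposal, here is a minimal Python sketch. It is not syscons code and says nothing about how syscons would implement this. The function names `render_for_8bit_console` and `restore_codepoints` are made up for the example, ISO-8859-1 is just one possible display charset, and reading "#1234;" as a hexadecimal code point is an assumption, since the proposal does not say whether the digits are hex or decimal.

```python
import re


def render_for_8bit_console(text, charset="iso-8859-1", mode="question"):
    """Return text with every character the charset cannot hold replaced.

    mode="question":  option 1, substitute a plain '?'.
    mode="codepoint": option 2, substitute '#XXXX;' (hex code point;
                      the hex reading is an assumption of this sketch).
    """
    out = []
    for ch in text:
        try:
            ch.encode(charset)  # raises if the charset has no such character
        except UnicodeEncodeError:
            out.append("?" if mode == "question" else "#%04X;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# Matches the '#XXXX;' escapes produced by mode="codepoint".
_ESCAPE = re.compile(r"#([0-9A-F]{4,6});")


def restore_codepoints(shown):
    """Turn '#XXXX;' escapes back into the characters they stand for."""
    def _decode(match):
        value = int(match.group(1), 16)
        # Leave the text alone if the number is not a valid code point.
        return chr(value) if value <= 0x10FFFF else match.group(0)
    return _ESCAPE.sub(_decode, shown)


if __name__ == "__main__":
    name = "þorn-中.txt"
    print(render_for_8bit_console(name))
    shown = render_for_8bit_console(name, mode="codepoint")
    print(shown)
    print(restore_codepoints(shown) == name)
```

With option 1 the original name cannot be recovered from what is displayed, while option 2 can be reversed, which is roughly the copy-and-paste "magic" mentioned in the quote. The sketch also shows one weakness of option 2: a file whose name literally contains text like "#4E2D;" would be misread on the way back, so a real design would need an unambiguous escape in addition to the backspacing concerns raised above.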
I am not an expert in Unicode, but I am Icelandic and need to manage filenames that contain some "special characters" of the Latin alphabet, such as á, ð, é, í, ó, ú, þ, æ and ö. Even though these characters are defined in ISO-8859-1 and -15, they cannot be typed directly (by default) in the FreeBSD console (this also applies to Debian Linux). This means that, under certain circumstances, I have to resort to other programs if I want to rename, copy or delete files whose names contain these characters. I find this annoying and would gladly support ideas to fix it, especially those that enable one to type and display most Unicode characters in the console.

On to my point...

My suggestion is to go as far as possible with the proposed solution. There should be UTF-32, UTF-16 and UTF-8 support, and the first-mentioned should be the primary encoding with the others as fallback. I think enabling only UTF-8 is not going far enough, and therefore I do not support Churanov's idea of obscuring non-displayable characters with other symbols. We should take future needs into account and have UTF-32 and UTF-16 solutions already available instead of having to take on yet another rewrite when they are needed.

I do not know much about other system components, but the same should apply to them. I think there should be total Unicode support in FreeBSD as soon as possible. It might take some time for the ISO-8859 charsets to die out, but that should not delay the transition to Unicode. But again, I am not the most knowledgeable about Unicode.

I suggest that the FreeBSD Foundation set a policy stating that every new component must have Unicode support (if applicable), and that it support projects to bring in Unicode support where there is none or only partial support. In FreeBSD 9 or 10, we might start seeing pretty reliable Unicode support and maybe, just maybe, be able to migrate everything from other charsets to Unicode and keep it as the default.
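For reference when comparing the three encodings named above: UTF-8, UTF-16 and UTF-32 are encoding forms of the same Unicode code space, and they differ in how many bytes each code point takes. The short Python sketch below only measures those sizes for a few sample characters. The helper name `encoded_sizes` is made up for this example, and the little-endian codec variants are used only so that no byte order mark is counted.

```python
def encoded_sizes(ch):
    """Return the number of bytes one character takes in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-le")),  # -le: no byte order mark
        "UTF-32": len(ch.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Icelandic letters, a common Chinese character, and U+20000 from
    # CJK Extension B (outside the Basic Multilingual Plane).
    for ch in "áðþæ中\U00020000":
        print("U+%04X" % ord(ch), encoded_sizes(ch))
```

The Icelandic letters listed earlier take two bytes in UTF-8, and a typical Chinese character takes three, which matches the remark quoted at the top of this thread that most Chinese characters are 3-byte or more.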
This is a chicken-and-egg problem. FreeBSD does not have good enough Unicode support because hardly anybody uses it, and hardly anybody uses it because the support is not good enough. To break out of this circle, we need to acknowledge the problem and solve it.

Með kveðju / With regards,
Svavar Kjarrval (svavar@kjarrval.is)
s. 863-9900