From: Svavar Lúthersson
Date: Mon, 25 Aug 2008 22:48:56 +0000
To: Alexander Churanov
Cc: freebsd-current@freebsd.org
Subject: Re: Unicode-based FreeBSD
Message-ID: <48B336D8.2030300@kjarrval.is>
In-Reply-To: <3cb459ed0808250621s28a1b825u1cc16939951bb157@mail.gmail.com>

Alexander Churanov wrote:
> Svavar,
>
> You have to type "special characters" that are high-bit characters of
> ISO-8859-1 and -15. I have to type Cyrillic characters that are high-bit
> characters of koi8-r. But I am able to do this. Did you try "keymap" and
> "scrnmap" settings of "rc.conf"? I am not sure, but your issue looks like
> misconfiguration.
>
> Then, about UTFs. All three forms encode THE SAME set of code points and
> from user's perspective there is no great difference. However, UTF-8 is
> interoperable with ASCII and this fact makes many old applications work
> without modification. I've already posted information about my experience of
> using vipw with UTF-8 on FreeBSD 6.2 having LANG=ru_RU.KOI8-R to the list.
>
> The actual drawback of my solution is that a person will not be able to read
> and type Icelandic and Russian text simultaneously in syscons console. And
> that ideas of obscuring output are attempts to provide some way to
> manipulate files with, say, Russian names on a PC tuned for Icelandic text.
>
> Please note, that I DO NOT propagandize syscons character mode as a device
> for working correctly with multilingual texts. For some scripts, for
> example, Devanagari, syscons will NEVER work unless it is extended to
> something like X, freetype, freebidi and many other tools working together.
> Please, note that you can start working in true multilingual environment
> right now, using, for example, X+KDE (kate and konsole) and switching them
> to UTF-8. This will work.
>
> What I am trying to discuss is just making syscons working correctly if the
> whole system is switched to UTF-8. This will not affect X and KDE, but
> standard syscons FreeBSD console will fail to work correctly. Mainly the
> ideas are:
>
> 1) Make switching everything to UTF-8 possible.
>
> 2) Either map non-ASCII characters to 128-chars subset of full unicode range
>
> Or encode them to sequences of ASCII chars.
>
> Or mix these approaches.
>
> To my mind this should result in the following abilities:
>
> 1) To work in graphical environment without restrictions. (this is what you
> have right now)
>
> 2) To read and type some filenames (that contains only characters that are
> mappable to 8-bit font) in a natural way. (this is also possible now, but
> with 8-bit LANG, not UTF-8)
>
> 3) To read and type filenames that contain characters that do not fit in
> current 8-bit screenmap, possibly in an unnatural way.
>
> The latter would help if you are in Iceland and see a Chinese filename. I
> want engineers that do technical support of systems to be able to delete or
> rename such files even in single-user mode. I think that typing something
> like "#1234;#4321;" instead of actual hieroglyph is affordable price.
>
> I'm just trying to be realistic and provide doable solution. I leave plans
> of rewriting every bit of software to others. And I even think that latter
> is not required, since syscons console is probably not heavily used now.
>
> Alexander Churanov

Hi.

Again, I am not an expert in Unicode and I am not suggesting that I know everything about FreeBSD. It is unfortunate that Alexander misunderstood me in some instances.

First, I mentioned that the Icelandic characters cannot be typed by default in the console. This also applies when I configure the installer to use the Icelandic keyboard and it adds keymap="icelandic.iso.acc" to my rc.conf. The Icelandic alphabet works in editors like pico, but I have not found another editor that actually displays the characters correctly. I checked edit, vi and vipw to be sure. It might be a configuration problem, or a lack of configuration, but it is better for the user experience if it works out-of-the-box.
It should be enough to configure it in one place and it should work "everywhere".

The primary problem with character support in syscons is displaying specialised characters on the screen/tty. When I use the special Icelandic characters in UTF-8, each character is displayed as "??", which is very confusing when there are two or more in a row. One step towards fixing that would be to enable syscons to display the correct symbols. If I press tab, it shows the symbol code for the characters. The problem is worse when the filename begins with a character I cannot write in the console, and I think it becomes even worse when displaying Cyrillic characters.

How can I know if a hieroglyph corresponds to a specific character code? It could be offered as an alternative method of writing filenames. There is no easy solution to the "tech support problem", though. The drawback of your solutions is too great and I do not think it should be carried out in the way you suggest.

Of course there are certain problems with changing filenames between languages like Russian and Icelandic, since a normal keyboard only has about 100 keys and cannot possibly contain all the characters in the Unicode specification. That should not, however, stop me from reading the filenames in the language they were written in. As for writing characters in other languages, the "Windows approach" is a step in the right direction: it lets me change the input language and therefore type characters I could not otherwise type with the Icelandic keyboard. If the characters are translated to Unicode, it should not matter what keyboard layout is used. As for how it would be carried out in FreeBSD, I will leave that up to the developers.

This is why I am suggesting that the system should be moved directly to UTF-32. If it is moved to UTF-8 and there is a need in the future for UTF-16 or -32, the conversion process has to start again.

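To make the encoding forms discussed above concrete, here is a minimal Python sketch. It shows how many bytes one character takes in UTF-8, UTF-16 and UTF-32, and what a UTF-8 string looks like on a display that only understands 7-bit ASCII. The function names are invented for this illustration, and the idea that the console prints one "?" per byte it cannot map is only an assumption about where the "??" comes from, not a description of how syscons is implemented.

```python
def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    # The "-le" variants are used so that no byte order mark is counted.
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))


def as_seen_by_ascii_terminal(text):
    """Approximate an ASCII-only display: every non-ASCII byte becomes '?'.

    Assumption for illustration only: one '?' per unmappable byte.
    """
    raw = text.encode("utf-8")
    return raw.decode("ascii", errors="replace").replace("\ufffd", "?")


if __name__ == "__main__":
    # Latin, Icelandic, Cyrillic and Chinese sample characters.
    for ch in ("a", "þ", "Ж", "中"):
        print(ch, encoded_sizes(ch))

    # Plain ASCII text is byte-for-byte identical in UTF-8.
    print("abc".encode("utf-8") == "abc".encode("ascii"))

    # Each Icelandic letter here is two bytes in UTF-8.
    print(as_seen_by_ascii_terminal("það"))
```

All three forms encode the same code points; they differ in how many bytes each character takes and in whether plain ASCII text stays unchanged, which only UTF-8 guarantees.
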
As I mentioned in my previous reply, program writers do not write Unicode-compatible programs because there is almost no Unicode support, and the FreeBSD developers see little reason to speed up the Unicode implementation because so few programs are Unicode compatible. Therefore I think that FreeBSD should adopt a Unicode support policy, move straight to UTF-32 and make it the FreeBSD default. I am not pretending that this project will be easy, painless or quick, but it is better done sooner rather than later.

Said policy could begin by announcing an active plan for Unicode support and suggesting that every new FreeBSD project should support Unicode. At the same time it should suggest the same to other developers who write software for FreeBSD. When the time is right, or after further steps, the FreeBSD Foundation should announce that after version X, Unicode will be the default charset. At that time, the software which has Unicode support will (I hope) work flawlessly with Unicode characters.

Once UTF-32 is fully supported in FreeBSD, the developers could wait for the end of the support cycle of the first version with full UTF-32 support and then make it the default in the versions to come. That way backward compatibility would be good for all supported versions of FreeBSD.

Með kveðju / With regards,
Svavar Kjarrval (svavar@kjarrval.is)
s. 863-9900