From owner-freebsd-hackers Thu Jan 19 17:44:58 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id RAA11325 for hackers-outgoing; Thu, 19 Jan 1995 17:44:58 -0800
Received: from netcom4.netcom.com (bakul@netcom4.netcom.com [192.100.81.107]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id RAA11313 for ; Thu, 19 Jan 1995 17:44:55 -0800
Received: from localhost by netcom4.netcom.com (8.6.9/Netcom) id RAA23680; Thu, 19 Jan 1995 17:43:55 -0800
Message-Id: <199501200143.RAA23680@netcom4.netcom.com>
To: Kaleb Keithley
cc: freebsd-hackers@freefall.cdrom.com
Subject: Re: Internationalization (was Re: CVS stuff)
In-reply-to: Your message of "Wed, 18 Jan 95 20:32:47 EST." <9501190132.AA19622@fedora.x.org>
Date: Thu, 19 Jan 95 17:43:53 -0800
From: Bakul Shah
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> >I'd rather see support for *inputting* and *displaying*
> >other languages first.

> You're using X aren't you? This is all built into X and has been since
> R5. Well, X still doesn't do bidirectional or vertical text very well.

I am using X but, as you later point out, it does not provide complete
support. Also, IMHO this should be available outside of X (perhaps
limited to displaying fixed width glyphs). Input/output methods support
needs to be factored out so that one doesn't have to drag around all
of X.

> But before you can use what's built into X you need good locale support
> built into the C runtime and/or OS.

To my inexpert eyes what is done in Plan 9 in this area seems like a
perfectly reasonable way to extend the libraries/OS. Plan 9 uses UTF-8
(invented(?) by Ken Thompson). It is an 8-bit encoding of UNICODE which
is ASCII compatible. Non-ASCII chars use multi-byte sequences. It may
be easier to extend tools like grep/sed/perl etc. to understand UTF-8.
(Also, by definition, all ASCII data is UTF-8 compatible!).
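
To make the multi-byte scheme concrete, here is a rough sketch of a
UTF-8 decoder in C. It is only an illustration, not Plan 9's library
code: the names utf8_decode and utf8_count are made up for this example,
and it assumes the present-day limit of four bytes per character with
strict rejection of malformed input. The lead byte announces the
sequence length, every following byte carries six payload bits, and
bytes below 0x80 pass through as plain ASCII.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n bytes left).
 * Stores the code point in *cp and returns the number of bytes used,
 * or 0 if the input is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                  /* ASCII: one byte, unchanged */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                       /* stray continuation or bad lead byte */
    }

    if (n < len)
        return 0;                       /* sequence cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)      /* continuation must be 10xxxxxx */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min)                        /* overlong encoding */
        return 0;
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                       /* out of range or surrogate */
    *cp = c;
    return len;
}

/* Hypothetical helper: count characters (code points) in n bytes.
 * Returns (size_t)-1 if the buffer is not valid UTF-8. */
size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t count = 0, used;
    uint32_t cp;

    while (n > 0) {
        used = utf8_decode(s, n, &cp);
        if (used == 0)
            return (size_t)-1;
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

For instance, the six bytes "na\xC3\xAFve" hold five characters, so
utf8_count reports 5 where strlen would report 6; a pure ASCII buffer
gives the same answer either way.
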
What you lose with UTF-8 is random access: if A is an array of chars,
A[i] is not the ith UNICODE char, because of the multibyte encoding.
If this is a real problem, one can decode UTF-8 to UTF-16 or UNICODE
and use short/long for the in-core representation of each char. I also
think that a text processing app. will typically have some higher
level structure for indexing, so this is not a great problem.

To repeat, I am not an expert -- there may be better solutions. It is
just that UTF-8 would satisfy my needs. I am sure Terry Lambert can say
a lot more about this internationalization issue :-) (and I actually
agree with him for the most part).

Bakul