From owner-freebsd-hackers  Thu Jan 19 17:44:58 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id RAA11325 for hackers-outgoing; Thu, 19 Jan 1995 17:44:58 -0800
Received: from netcom4.netcom.com (bakul@netcom4.netcom.com [192.100.81.107]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id RAA11313 for <freebsd-hackers@freefall.cdrom.com>; Thu, 19 Jan 1995 17:44:55 -0800
Received: from localhost by netcom4.netcom.com (8.6.9/Netcom)
	id RAA23680; Thu, 19 Jan 1995 17:43:55 -0800
Message-Id: <199501200143.RAA23680@netcom4.netcom.com>
To: Kaleb Keithley <kaleb@x.org>
cc: freebsd-hackers@freefall.cdrom.com
Subject: Re: Internationalization (was Re: CVS stuff) 
In-reply-to: Your message of "Wed, 18 Jan 95 20:32:47 EST."
             <9501190132.AA19622@fedora.x.org> 
Date: Thu, 19 Jan 95 17:43:53 -0800
From: Bakul Shah <bakul@netcom.com>
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> >I'd rather see support for *inputting* and *displaying*
> >other languages first.

> You're using X aren't you? This is all built into X and has been since
> R5. Well, X still doesn't do bidirectional or vertical text very well.

I am using X but, as you later point out, it does not
provide complete support.  Also, IMHO this should be
available outside of X (perhaps limited to displaying fixed
width glyphs).  Input/output method support needs to be
factored out so that one doesn't have to drag around all of
X.

> But before you can use what's built into X you need good locale support
> built into the C runtime and/or OS.

To my inexpert eyes what is done in Plan 9 in this area
seems like a perfectly reasonable way to extend the
libraries/OS.  Plan 9 uses UTF-8 (invented(?) by Ken
Thompson).  It is an 8-bit encoding of UNICODE which is
ASCII compatible.  Non-ASCII chars use multi-byte sequences.
It may be easier to extend tools like grep/sed/perl etc. to
understand UTF-8.  (Also, by definition, all ASCII data is
valid UTF-8!)  What you lose in UTF-8 is random access:
if A is an array of chars, A[i] is not the ith UNICODE char
due to the multibyte encoding.  If this is a real problem,
one can decode UTF-8 to UTF-16 or UNICODE and use
short/long for the in-core representation of each char.  I also
think that in a text processing app. one will typically have
some higher level structure for indexing so this is not a
great problem.
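
A minimal sketch of that decode step, to make the trade-off
concrete.  It assumes the modern 4-byte definition of UTF-8
(sequences up to U+10FFFF) rather than whatever Plan 9 shipped,
and the names utf8_decode_one and utf8_to_codepoints are made up
for illustration; they are not from any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s, with n bytes available.
 * Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the input is truncated or malformed.
 */
static size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {              /* plain ASCII: one byte */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                   /* stray continuation or invalid lead byte */
    }
    if (n < len)
        return 0;                   /* sequence runs past the end of the buffer */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/*
 * Decode n bytes of UTF-8 from src into out[], which holds up to cap
 * code points.  Returns the number of code points written, or
 * (size_t)-1 if the input is malformed or out[] is too small.
 */
static size_t utf8_to_codepoints(const char *src, size_t n,
                                 uint32_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t i = 0, count = 0;

    assert(src != NULL && out != NULL);
    while (i < n) {
        uint32_t cp;
        size_t used = utf8_decode_one(s + i, n - i, &cp);

        if (used == 0 || count == cap)
            return (size_t)-1;
        out[count++] = cp;
        i += used;
    }
    return count;
}
```

After the call, out[i] really is the ith character, at the cost
of four bytes per character in core.  The on-disk and on-the-wire
form can stay UTF-8, so ASCII files are untouched.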

To repeat, I am not an expert -- there may be better
solutions.  It is just that UTF-8 would satisfy my needs.  I
am sure Terry Lambert can say a lot more about this
internationalization issue :-)  (and I actually agree with
him for the most part).

Bakul