Date:      Fri, 7 Apr 2000 07:13:22 -0400
From:      Anatoly Vorobey <mellon@pobox.com>
To:        Kazutaka YOKOTA <yokota@zodiac.mech.utsunomiya-u.ac.jp>
Cc:        hackers@freebsd.org
Subject:   Re: Unicode on FreeBSD
Message-ID:  <20000407071322.A29764@sasami.jurai.net>
In-Reply-To: <200004070313.MAA03369@zodiac.mech.utsunomiya-u.ac.jp>; from yokota@zodiac.mech.utsunomiya-u.ac.jp on Fri, Apr 07, 2000 at 12:13:14PM +0900
References:  <20000405173037.A460@sasami.jurai.net> <200004070313.MAA03369@zodiac.mech.utsunomiya-u.ac.jp>

I'm glad we are discussing specific technical issues now. Perhaps
we should move this discussion to freebsd-i18n once it's created?

You, Kazutaka YOKOTA, were spotted writing this on Fri, Apr 07, 2000 at 12:13:14PM +0900:
> 
> >I have suggested adding Unicode support in the keyboard driver and the
> >vga driver (more precisely, vga and syscons). As a result of such changes:
> >
> >a) keymap files would map keycodes to the desired Unicode values rather
> >than 8-bit values depending on a particular encoding, which should
> >greatly simplify /usr/share/syscons/keymaps and let applications
> >that so desire obtain Unicode input directly;
> 
> As you are well aware, the keyboard driver (and the keyboard-related part
> of syscons) has no knowledge about the character code generated via the
> keymap.  Thus, we will need little or no modification to handle
> Unicode-based keymaps.

Well, new code must be written to translate Unicode values produced by
the (modified) keyboard driver back into 8bit for normal userland 
applications. This code would use the same encoding table that syscons
would use to translate 8bit output to Unicode before displaying it.
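To make the idea concrete, here is a minimal C sketch of such a shared table. It is not existing syscons code; `struct enc_table` and the three functions are hypothetical. The table gives the Unicode value for each of the 256 byte values. Output goes byte -> Unicode by direct indexing, and keyboard input goes Unicode -> byte by a reverse search of the same table, so the two directions cannot disagree.

```c
#include <stdint.h>

/* Hypothetical encoding table: entry b is the Unicode code point that
   8-bit character b stands for in the currently selected encoding. */
struct enc_table {
    uint32_t to_unicode[256];
};

/* Output path: byte written by an 8-bit application -> Unicode for display. */
static uint32_t
enc_byte_to_unicode(const struct enc_table *t, uint8_t b)
{
    return t->to_unicode[b];
}

/* Input path: Unicode value from the keyboard driver -> byte for an
   ordinary application.  Returns -1 if the encoding has no such byte. */
static int
enc_unicode_to_byte(const struct enc_table *t, uint32_t uc)
{
    for (int b = 0; b < 256; b++)
        if (t->to_unicode[b] == uc)
            return b;
    return -1;
}

/* Example table: ISO 8859-1, where every byte value equals its code point. */
static void
enc_init_latin1(struct enc_table *t)
{
    for (int b = 0; b < 256; b++)
        t->to_unicode[b] = (uint32_t)b;
}
```

A real implementation would probably keep a reverse index rather than scanning 256 entries per keystroke; the linear scan just keeps the sketch short.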

Moreover, a way should be provided for userland applications to receive
Unicode input directly should they want it. One solution is
to simply add another mode (the ks_mode member of the atkbd_state structure)
that would return Unicode codes directly.
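A rough C sketch of the extra-mode idea follows. The mode names `KMODE_XLATE` and `KMODE_UNICODE`, the `KBD_NOCHAR` marker and both functions are hypothetical. They are not the driver's actual ks_mode constants. The 8-bit translation is simplified to ISO 8859-1 so the example stays self-contained.

```c
#include <stdint.h>

/* Hypothetical keyboard modes, not the real ks_mode values. */
enum kbd_mode {
    KMODE_XLATE,    /* existing behaviour: deliver 8-bit characters */
    KMODE_UNICODE   /* proposed: deliver the keymap's Unicode value as-is */
};

#define KBD_NOCHAR (-1)   /* hypothetical "nothing to deliver" marker */

/* Simplified 8-bit translation assuming ISO 8859-1 (code point == byte
   for U+0000..U+00FF); a driver would consult the loaded encoding table. */
static int32_t
unicode_to_8bit(uint32_t uc)
{
    return uc <= 0xFF ? (int32_t)uc : KBD_NOCHAR;
}

/* What a read from the keyboard would yield for one keystroke. */
static int32_t
kbd_deliver(enum kbd_mode mode, uint32_t uc)
{
    switch (mode) {
    case KMODE_UNICODE:
        return (int32_t)uc;          /* application asked for Unicode */
    case KMODE_XLATE:
    default:
        return unicode_to_8bit(uc);  /* ordinary 8-bit application */
    }
}
```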

> >b) font files would map Unicode chars, rather than encoding-dependent
> >chars, to glyphs. That would greatly simplify /usr/share/syscons/fonts,
> >get rid of a huge amount of redundant information there, and allow
> >creation of unified font files describing many languages at once.
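Here is a sketch of what a Unicode-keyed font might look like once loaded. A hypothetical `struct ufont` holds glyphs sorted by code point and is searched with the standard `bsearch()`. The 8x16 cell size and all names are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define GLYPH_HEIGHT 16   /* assumed 8x16 cell, one byte per scan line */

/* Hypothetical font entry: one glyph keyed by its code point. */
struct glyph {
    uint32_t code;                  /* Unicode code point */
    uint8_t  bits[GLYPH_HEIGHT];    /* 8 pixels per row */
};

/* Hypothetical loaded font: entries sorted by ascending code point. */
struct ufont {
    const struct glyph *glyphs;
    size_t              count;
};

static int
glyph_cmp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t c = ((const struct glyph *)elem)->code;
    return (k > c) - (k < c);
}

/* Return the glyph for a code point, or NULL if the font lacks it. */
static const struct glyph *
ufont_lookup(const struct ufont *f, uint32_t code)
{
    return bsearch(&code, f->glyphs, f->count, sizeof(f->glyphs[0]),
        glyph_cmp);
}
```

Because the key is the code point rather than a byte in some particular encoding, one such font could carry Latin, Greek and Hebrew glyphs side by side without per-encoding copies.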
> 
> Um, well, we may be able to use a unified font file for many
> languages.  But, do not expect that we will be able to create a single
> font file which will be suitable for ALL languages.

You are right. I won't expect that.

> >c) vga code would be changed to allow 512-character hardware fonts in
> >text modes, which will suffice to hold several languages at once. Moreover,
> 
> The pcvt driver already uses 512 chars.

True text modes create an additional problem to consider: given some
(Unicode) font files loaded into the kernel and a limited supply (512 minus
128, i.e. 384) of available character slots, which glyphs should be loaded
into the VGA font table? In other words, which glyphs are more important
than the others? One solution is to let userland dictate this, but this
isn't completely satisfying, because userland would then have to provide
the kernel with two additional control structures: an encoding table for
8bit<-->Unicode translation and a mapping table for Unicode->512-char
translation, the latter also being irrelevant for the raster modes.
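Here is one possible policy, sketched in C purely for discussion. Slots 0-127 stay fixed (assumed here to hold ASCII, which is my reading of "512 minus 128"). The remaining 384 slots are handed out first come, first served, as characters are first displayed. `struct slot_map`, `slot_map_get()` and the '?' fallback are hypothetical. Uploading the glyph bits to VGA memory is omitted.

```c
#include <stdint.h>

#define VGA_SLOTS     512
#define FIXED_SLOTS   128   /* assumed: slots 0..127 permanently hold ASCII */
#define FALLBACK_SLOT '?'   /* hypothetical glyph shown when slots run out */

/* Hypothetical Unicode -> hardware-slot map for one 512-char VGA font. */
struct slot_map {
    uint32_t code[VGA_SLOTS];   /* code point loaded into each slot */
    int      used;              /* slots in use, including the fixed ones */
};

static void
slot_map_init(struct slot_map *m)
{
    for (int i = 0; i < FIXED_SLOTS; i++)
        m->code[i] = (uint32_t)i;   /* ASCII occupies its own slot */
    m->used = FIXED_SLOTS;
}

/* Slot to use for a code point: reuse an existing assignment, else take
   the next free slot; once all 384 free slots are gone, fall back. */
static int
slot_map_get(struct slot_map *m, uint32_t uc)
{
    if (uc < FIXED_SLOTS)
        return (int)uc;
    for (int i = FIXED_SLOTS; i < m->used; i++)
        if (m->code[i] == uc)
            return i;
    if (m->used < VGA_SLOTS) {
        m->code[m->used] = uc;
        return m->used++;
    }
    return FALLBACK_SLOT;
}
```

The weakness of this policy shows in the code. Nothing is ever freed, and evicting a slot would change every character already on screen that uses it. That is exactly the open question of which glyphs matter more.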

I'll look into how Linux people handled this issue.

> >in raster modes (which are pseudo-text modes -- graphic modes with
> >fast text rendering) any amount of Unicode glyphs could be displayed
> >at once. 
> 
> If we intend to display any languages at once in the console, the
> raster mode is the only solution.  I agree.  But, we need a fair
> amount of knowledge about the language/script we are dealing with, in
> order to display its text correctly.

Let's try to enumerate the issues we will run into here. After all, the
design of a new font file format depends crucially on that. We need to
reach a conclusion on what is and isn't realistic to provide on a
fixed-width console. For instance, I would love to be able to handle
bidirectional output and Hebrew diacritics, but I am not at all sure
this is realistic to provide.
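To gauge the diacritics part, here is a C sketch (names hypothetical) of the minimum a fixed-width console would need. It treats a combining mark as taking zero cells, to be drawn over the preceding character. The range table is a deliberately small subset: the Combining Diacritical Marks block plus the Hebrew points and cantillation marks. It is not a complete Unicode width table, and it does not attempt bidirectional reordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Console cells a code point occupies: 0 for a combining mark that is
   overlaid on the previous cell, 1 for everything else. */
static int
cell_width(uint32_t uc)
{
    static const struct { uint32_t lo, hi; } marks[] = {
        { 0x0300, 0x036F },   /* Combining Diacritical Marks */
        { 0x0591, 0x05BD },   /* Hebrew cantillation marks and points */
        { 0x05BF, 0x05BF },   /* HEBREW POINT RAFE */
        { 0x05C1, 0x05C2 },   /* HEBREW POINT SHIN DOT, SIN DOT */
        { 0x05C4, 0x05C5 },   /* HEBREW MARK UPPER DOT, LOWER DOT */
        { 0x05C7, 0x05C7 },   /* HEBREW POINT QAMATS QATAN */
    };
    for (size_t i = 0; i < sizeof(marks) / sizeof(marks[0]); i++)
        if (uc >= marks[i].lo && uc <= marks[i].hi)
            return 0;
    return 1;
}

/* Total cells needed for a run of code points. */
static size_t
text_cells(const uint32_t *s, size_t n)
{
    size_t cells = 0;
    for (size_t i = 0; i < n; i++)
        cells += (size_t)cell_width(s[i]);
    return cells;
}
```

For example, shin + shin dot + qamats + lamed is four code points but only two cells. The font format would therefore need glyphs for marks that can be composited onto a base glyph.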

> >UTF-8 may play the role of
> >one such particular table, which will in the future allow an easy way
> >to modify userland applications to support UTF-8 if desired.
> 
> Multilingual text processing in the userland is a completely different
> issue which, I think, should be discussed separately.

I agree, but what I'm talking about here is enabling (future) userland
multilingual processing, not what it should be or how it should be done.
What I mean is that the encoding table format should be more flexible
than "one byte <--> one UCS-2 code", because that will not allow
simple and easy UTF-8 translation in the future, should we want that.
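The point can be illustrated with a standard UTF-8 encoder (a minimal C sketch; `utf8_encode` is a hypothetical name). With code points limited to U+10FFFF, UTF-8 turns one code point into one to four bytes. A table whose entries are always "one byte <--> one UCS-2 code" therefore cannot describe UTF-8. The translation has to produce byte sequences of varying length.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into buf (at least 4 bytes long).
   Returns the number of bytes written, or 0 for surrogates and values
   above U+10FFFF, which are not valid Unicode scalar values. */
static size_t
utf8_encode(uint32_t uc, unsigned char *buf)
{
    if (uc < 0x80) {
        buf[0] = (unsigned char)uc;
        return 1;
    }
    if (uc < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (uc >> 6));
        buf[1] = (unsigned char)(0x80 | (uc & 0x3F));
        return 2;
    }
    if (uc >= 0xD800 && uc <= 0xDFFF)
        return 0;
    if (uc < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (uc >> 12));
        buf[1] = (unsigned char)(0x80 | ((uc >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (uc & 0x3F));
        return 3;
    }
    if (uc <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (uc >> 18));
        buf[1] = (unsigned char)(0x80 | ((uc >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((uc >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (uc & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, HEBREW LETTER ALEF (U+05D0) becomes the two bytes 0xD7 0x90, so a single table entry cannot hold its UTF-8 form as one byte.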

> We need more discussion to design a reasonable implementation
> (compromise :-) which does not make lives of some people difficult by
> imposing a single rigid scheme.

Great, let's have this discussion right here and now ;)

> Unicode, as it stands now, does not seem to be THE solution which
> addresses all the issues/problems/complexities of the languages in the
> world...  It can be viewed/used as a tool, though.

I agree with that completely.

-- 
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton

