Date:      Sat, 30 Aug 2008 10:39:02 +0200
From:      Marcus von Appen <mva@sysfault.org>
To:        freebsd-current@freebsd.org
Subject:   Re: Unicode-based FreeBSD
Message-ID:  <20080830083901.GA2183@medusa.sysfault.org>
In-Reply-To: <3cb459ed0808291708l581422c1pdb2e3cb2913ecaa7@mail.gmail.com>
References:  <3cb459ed0808250952j572dfc35j2feb852a73de5ace@mail.gmail.com> <200808281718.m7SHISGL067492@lurza.secnetix.de> <3cb459ed0808290636r5eb389c8y6d4aafae1b8001cf@mail.gmail.com> <e7db6d980808291134h50bb71aaud1f4b8da7f61a174@mail.gmail.com> <3cb459ed0808291708l581422c1pdb2e3cb2913ecaa7@mail.gmail.com>

On Sat, Aug 30, 2008, Alexander Churanov wrote:

[...]
> Sample schedule:
>
> 1) Implement support for UTF-8 in syscons with mapping UTF-8 to 256
> characters for rendering. The rest of characters will not be displayed
> correctly, however proper backspacing, character counts and other things
> like that will work. This is clearly better than what we have now.

I wonder how backspacing will be implemented for complex scripts such
as the Indic ones or Arabic, where two codepoints are resolved to one
logical (and usually visible) character. Backspacing and character
counting, for example, have to operate on one or the other of the
Unicode character sets involved. For Arabic, the display would use the
representation form for the visible codepoints, but character
manipulation (backspacing, deletions) has to operate on the other one.
Bidi layouts, as you mentioned before, are completely out of that scope.
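
To illustrate the backspacing problem, here is a minimal sketch in Python.
It is not syscons code; the function name backspace() is hypothetical, and
treating "one base character plus its trailing marks" as one visible
character is a simplification, not the full Unicode grapheme cluster rules.

```python
import unicodedata

def backspace(text):
    """Drop the last base character together with the marks attached to it.

    Assumption: any codepoint in a Mark category (Mn, Mc, Me) belongs to
    the base character before it. Real grapheme segmentation is richer.
    """
    end = len(text)
    # Step back over trailing marks (vowel signs, harakat, accents).
    while end > 0 and unicodedata.category(text[end - 1]).startswith("M"):
        end -= 1
    # Remove the base character itself, if there is one.
    return text[:max(end - 1, 0)]

# Devanagari NA followed by VOWEL SIGN I: two codepoints, one visible unit.
line = "a\u0928\u093F"
naive = line[:-1]          # removes only the vowel sign
aware = backspace(line)    # removes the whole visible unit
```

A conjunct such as KA + VIRAMA + SSA would still be split by this sketch,
which shows how quickly the simple rule runs out.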

In my opinion that means that, for codepoints which are not rendered,
either the internal Unicode set is used (for Arabic this would be
form 1) or the user-visible one (form 2). In either case, backspacing
might appear broken to the user.
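
The mismatch between the two forms can be shown with Python's unicodedata
module. This is only a sketch: it assumes that "form 1" corresponds to the
logical Arabic letters and "form 2" to the Arabic presentation-form
codepoints, which is an interpretation of the paragraph above.

```python
import unicodedata

# One presentation-form codepoint: ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM.
visible = "\uFEFB"

# NFKC normalization maps it back to the logical letters LAM and ALEF.
logical = unicodedata.normalize("NFKC", visible)

visible_count = len(visible)   # what the user sees as one character
logical_count = len(logical)   # what the text buffer would hold
codepoints = [f"U+{ord(c):04X}" for c in logical]
```

A single backspace that removes one codepoint therefore behaves differently
depending on which of the two representations it works on.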

> 2) Proceed with design discussion and cost estimation for implementing
> rendering more than 256 glyphs at a time. This will only affect rendering.

In theory that's no big deal. There are good and solid solutions for
processing UTF-8 codepoints (libutf8 comes to mind). What scares me
off is the implementation cost for the fonts. Using 8xXX fonts for CJK
will hardly be possible, as you would not be able to make out the
glyphs. Creating e.g. 13x13 CJK fonts is possible without major
legibility problems. For the CJK range I'd promote wide-character
fonts, as that greatly simplifies the layout of the ASCII or KOI8-R
ranges contained within them.
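
As a rough illustration of the wide-character idea, the sketch below counts
console cells per character. The two-cells-for-wide rule and the names
cell_width() and columns() are assumptions for this example, not anything
syscons provides.

```python
import unicodedata

def cell_width(ch):
    """Return how many console cells one codepoint would occupy."""
    # Non-spacing and enclosing marks sit on top of the previous cell.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # East Asian Wide / Fullwidth characters take a double-width cell.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def columns(text):
    """Total number of cells needed to display text on one line."""
    return sum(cell_width(ch) for ch in text)

ascii_cols = columns("abc")
cjk_cols = columns("\u65E5\u672C\u8A9E")   # three CJK ideographs
mixed_cols = columns("a\u65E5")
```

With a fixed cell grid like this, ASCII and KOI8-R glyphs keep their single
cell and only the CJK glyphs need the double-width bitmap.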

Creating a useful CJK font, however, will mean implementing at least
1000-2000 characters ;-).

Rendering complex RTL scripts (Arabic, Hebrew) is another issue. Even
if RTL layout is left aside, they will still need proper bidi
support; otherwise they are useless in my opinion. For Arabic, the
representation form mapping is simple, and rendering the characters
should be possible. I'm not sure about the font widths syscons
supports, but 8-pixel-wide console fonts seem inappropriate to me for
Arabic, Indic or Thai character sets. Syscons gurus: can that be
tweaked easily?
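
For the Arabic form mapping, a simplified sketch in Python follows. It
derives the table from the Unicode decomposition data for the Arabic
Presentation Forms-B block and picks the isolated, initial, medial or final
form from the neighbouring letters. The names build_forms() and shape() are
hypothetical, and the sketch ignores ligatures, harakat between letters and
bidi reordering.

```python
import unicodedata

def build_forms():
    """Map each base letter to its {form name: presentation glyph} table."""
    forms = {}
    for cp in range(0xFE70, 0xFEFD):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) != 2:  # unassigned, or a multi-codepoint sequence
            continue
        kind = parts[0].strip("<>")      # isolated, initial, medial, final
        base = chr(int(parts[1], 16))    # the logical letter
        forms.setdefault(base, {})[kind] = chr(cp)
    return forms

FORMS = build_forms()

def shape(text):
    """Replace logical Arabic letters by context-dependent glyphs."""
    out = []
    for i, ch in enumerate(text):
        table = FORMS.get(ch)
        if table is None:                # not a letter we know: keep as is
            out.append(ch)
            continue
        prev = FORMS.get(text[i - 1], {}) if i > 0 else {}
        nxt = FORMS.get(text[i + 1], {}) if i + 1 < len(text) else {}
        # A letter with an initial form can connect to the letter after it.
        joins_prev = "final" in table and "initial" in prev
        joins_next = "initial" in table and "final" in nxt
        if joins_prev and joins_next:
            kind = "medial"
        elif joins_prev:
            kind = "final"
        elif joins_next:
            kind = "initial"
        else:
            kind = "isolated"
        out.append(table.get(kind, ch))
    return "".join(out)

shaped = shape("\u0628\u0627\u0628")     # BEH, ALEF, BEH in logical order
```

The output is still in logical order; placing the glyphs right to left on
screen is the separate bidi step mentioned above.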

Regards
Marcus



