Date: Sat, 30 Aug 2008 10:39:02 +0200
From: Marcus von Appen <mva@sysfault.org>
To: freebsd-current@freebsd.org
Subject: Re: Unicode-based FreeBSD
Message-ID: <20080830083901.GA2183@medusa.sysfault.org>
In-Reply-To: <3cb459ed0808291708l581422c1pdb2e3cb2913ecaa7@mail.gmail.com>
References: <3cb459ed0808250952j572dfc35j2feb852a73de5ace@mail.gmail.com> <200808281718.m7SHISGL067492@lurza.secnetix.de> <3cb459ed0808290636r5eb389c8y6d4aafae1b8001cf@mail.gmail.com> <e7db6d980808291134h50bb71aaud1f4b8da7f61a174@mail.gmail.com> <3cb459ed0808291708l581422c1pdb2e3cb2913ecaa7@mail.gmail.com>

On Sat, Aug 30, 2008, Alexander Churanov wrote:
[...]
> Sample schedule:
>
> 1) Implement support for UTF-8 in syscons with mapping UTF-8 to 256
> characters for rendering. The rest of characters will not be displayed
> correctly, however proper backspacing, character counts and other things
> like that will work. This is clearly better than what we have now.

I wonder how backspacing will be implemented for complex scripts such as
the Indic ones or Arabic, where two code points are resolved to one
logical (and usually visible) character. Backspacing and character
counting, for example, have to operate on one or the other of the
specified Unicode character sets. For Arabic this would be the
presentation forms for the visible code points, but character
manipulation (backspacing, deletions) has to operate on the other one.
Bidi layouts, as you mentioned before, are completely out of that scope.

In my opinion that means that, for code points which are not rendered,
either the internal Unicode set is used (for Arabic this would be form 1)
or the user-visible one (form 2). In either case backspacing might
appear broken to the user.

> 2) Proceed with design discussion and cost estimation for implementing
> rendering more than 256 glyphs at a time. This will only affect rendering.

In theory that's no big deal. There are good and solid solutions for
processing UTF-8 code points (libutf8 comes to mind). What scares me off
is the implementation cost for the fonts. Using 8xXX fonts for CJK will
hardly be possible, as you would not be able to make out the glyphs.
Creating e.g. 13x13 CJK fonts is possible without major implications for
legibility. For the CJK range I'd promote wide-character fonts, as that
greatly simplifies the layout of the ASCII or KOI8-R ranges contained
within them. Creating a useful CJK font, however, will mostly mean
implementing at least 1000-2000 characters ;-).

Rendering complex RTL scripts (Arabic, Hebrew) is another issue. Even if
RTL layout is left aside, they will still need proper bidi support;
otherwise they are useless in my opinion. For Arabic, the presentation
form mapping is simple and rendering the characters should be possible.

I'm not sure about the widths supported in syscons, but 8-pixel-wide
console fonts seem inappropriate to me for Arabic, Indic or Thai
character sets. Syscons gurus: can that be tweaked easily?

Regards
Marcus
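
The backspacing concern raised in this message can be made concrete with a minimal sketch in C. It contrasts deleting one code point with deleting one user-visible character from a UTF-8 buffer. All function names here are hypothetical and belong neither to syscons nor to libutf8. The two combining-mark ranges are only a rough stand-in for real grapheme-cluster rules, and the decoder assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset at which the last code point in buf[0..len) starts. */
static size_t
utf8_prev_start(const unsigned char *buf, size_t len)
{
	size_t i = len;

	if (i == 0)
		return (0);
	i--;
	/* Step back over continuation bytes (10xxxxxx), at most three. */
	while (i > 0 && len - i < 4 && (buf[i] & 0xC0) == 0x80)
		i--;
	return (i);
}

/* Decode one sequence of n bytes; returns U+FFFD for unexpected lead bytes. */
static uint32_t
utf8_decode(const unsigned char *p, size_t n)
{
	uint32_t cp;
	size_t k;

	if (n == 0)
		return (0xFFFD);
	if (p[0] < 0x80)
		return (p[0]);
	if (n == 2 && (p[0] & 0xE0) == 0xC0)
		cp = p[0] & 0x1F;
	else if (n == 3 && (p[0] & 0xF0) == 0xE0)
		cp = p[0] & 0x0F;
	else if (n == 4 && (p[0] & 0xF8) == 0xF0)
		cp = p[0] & 0x07;
	else
		return (0xFFFD);
	for (k = 1; k < n; k++)
		cp = (cp << 6) | (p[k] & 0x3F);
	return (cp);
}

/* Assumption: only these two ranges count as combining marks. */
static int
is_combining(uint32_t cp)
{
	return ((cp >= 0x0300 && cp <= 0x036F) ||	/* Combining Diacritical Marks */
	    (cp >= 0x064B && cp <= 0x065F));		/* Arabic vowel marks */
}

/* New buffer length after deleting exactly one code point. */
size_t
backspace_codepoint(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	return (utf8_prev_start(buf, len));
}

/* New length after deleting a base character plus its trailing marks. */
size_t
backspace_cluster(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	while (len > 0) {
		size_t start = utf8_prev_start(buf, len);
		uint32_t cp = utf8_decode(buf + start, len - start);

		len = start;
		if (!is_combining(cp))
			break;
	}
	return (len);
}
```

For the byte sequence of Arabic beh followed by fatha (D8 A8 D9 8E), `backspace_codepoint` leaves the bare beh in the buffer, while `backspace_cluster` removes both code points. Which of the two behaviours a user expects is exactly the open question above, and neither handles presentation forms or Indic conjuncts.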