Date: Thu, 01 Feb 2018 19:51:15 -0800
From: Bakul Shah <bakul@bitblocks.com>
To: Farhan Khan <khanzf@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: Printing UTF-8 characters
Message-ID: <20180202035130.C51F8156E80B@mail.bitblocks.com>
In-Reply-To: Your message of "Thu, 01 Feb 2018 10:42:36 -0500." <CAFd4kYB_eU00Z5nBzp-iNGuELN4cy_ADGABb-boq4Fvn-a0XMg@mail.gmail.com>
References: <CAFd4kYD_Q9Y84LvCGELVodt%2B30KM_KzNzoLOzudZm9kaLqGPaQ@mail.gmail.com> <20180201072831.GA2239@c720-r314251> <CAFd4kYB_eU00Z5nBzp-iNGuELN4cy_ADGABb-boq4Fvn-a0XMg@mail.gmail.com>
On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan <khanzf@gmail.com> wrote:
> Sorry, that was a poorly phrased question on my part. Let me try again.
> I am trying to make text align in columns in a terminal. My
> understanding is that characters above 0x7E are 3 bytes in length. A
> modern terminal will render that as either a single question-mark or
> the character itself, making terminal column alignment easy. But how
> would an older terminal display a 3-byte character? I am worried that
> would render as 3 question marks and throw off column alignment. If
> so, is there a proper way to perform alignment for both newer and
> older terminals?

UTF-8 can use up to 4 bytes to encode a Unicode code point, depending on
the script. For what you want, you can use OpenOffice-like programs that
understand Unicode and can do complex text layout. Normal terminal
programs typically use monospace (fixed-width) fonts and are simply not
capable of what you want. The assumption that one char means one
rectangular cell on the screen is too deeply woven into them. For Indic
languages in particular this just doesn't work: you may have N Unicode
code points, each of which requires 3 bytes, that all together map to a
single glyph.
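To make the distinction between bytes, code points, and screen cells concrete, here is a rough Python sketch. The helper names display_width and pad_to_columns are made up for this illustration, and the width rule is an assumption: code points in the Mn, Me, and Cf categories count as 0 cells, East Asian wide and fullwidth characters as 2, and everything else as 1. That is only an approximation. It does not handle control characters, and it will not give a reliable answer for Indic conjuncts, since what a given terminal and font actually draw may differ.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal cells `text` occupies (approximation only)."""
    width = 0
    for ch in text:
        # Non-spacing marks, enclosing marks, and format characters are
        # assumed to take no cell of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian wide and fullwidth characters are assumed to take two cells.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def pad_to_columns(text, columns):
    """Right-pad `text` with spaces up to an estimated width of `columns` cells."""
    return text + " " * max(0, columns - display_width(text))


# Print UTF-8 byte count, code point count, and estimated cell width per sample.
for sample in ("A", "\u00e9", "\u20ac", "\U0001F600", "e\u0301", "\u65e5\u672c"):
    print(repr(sample), len(sample.encode("utf-8")), len(sample), display_width(sample))
```

Padding on the estimated width rather than on len() of the string or of its UTF-8 encoding is what keeps columns lined up when the estimate matches the terminal. Where the terminal disagrees with the estimate, as it may for complex scripts, the columns will still drift.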