Date: 18 Oct 2020 11:48:37 -0400
From: "John Levine" <johnl@iecc.com>
To: freebsd-questions@freebsd.org
Cc: naddy@mips.inka.de
Subject: Re: printf(1) and UTF-8 multi-byte chars
Message-ID: <20201018154838.49CBC239CEDF@ary.qy>
In-Reply-To: <slrnroo8n9.1iu4.naddy@lorvorc.mips.inka.de>
In article <slrnroo8n9.1iu4.naddy@lorvorc.mips.inka.de> you write:
>On 2020-10-17, Matthias Apitz <guru@unixarea.de> wrote:
>
>> This means the output of printf(1) is byte oriented and not
>> character oriented.
>
>This conforms to POSIX.

I don't think there is any useful middle ground between counting bytes and full Unicode typesetting. Some Unicode characters are half- or double-width, particularly in East Asian languages, and many combine with adjacent characters depending on context. For example, the character ö can be the single precomposed character U+00F6, which is two UTF-8 bytes, or a lower case o (U+006F) followed by a combining diaeresis (U+0308), which is three UTF-8 bytes, but it is one column wide either way.
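To make the ö example concrete, here is a small Python sketch (not printf(1) itself) that counts code points and UTF-8 bytes for both spellings. The helper names `describe` and `byte_padded` are hypothetical, and `byte_padded` only approximates a byte-counted field width such as `%-3s` by padding the encoded bytes with `bytes.ljust`; it does not compute terminal column width.

```python
import unicodedata

# U+00F6 LATIN SMALL LETTER O WITH DIAERESIS (precomposed form).
precomposed = "\u00f6"
# U+006F LATIN SMALL LETTER O followed by U+0308 COMBINING DIAERESIS.
decomposed = "o\u0308"


def describe(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


def byte_padded(s: str, width: int) -> bytes:
    """Left-justify s in a field of `width` bytes (hypothetical helper).

    This mimics a byte-oriented field width: padding is decided by the
    length of the UTF-8 encoding, not by how many columns s occupies.
    """
    return s.encode("utf-8").ljust(width)


if __name__ == "__main__":
    # Both spellings are canonically equivalent under Unicode normalization.
    assert unicodedata.normalize("NFC", decomposed) == precomposed
    assert unicodedata.normalize("NFD", precomposed) == decomposed

    for name, s in (("precomposed", precomposed), ("decomposed", decomposed)):
        chars, nbytes = describe(s)
        field = byte_padded(s, 3).decode("utf-8")
        # Same visible glyph, but a byte-counted width pads them differently.
        print(f"{name}: {chars} code point(s), {nbytes} byte(s), [{field}]")
```

Both strings display as one column, yet the byte-counted padding gives the precomposed form one trailing space and the decomposed form none, so columns stop lining up. Handling this properly needs normalization plus per-character width data (e.g. `unicodedata.east_asian_width`), which is the "full Unicode typesetting" end of the spectrum.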