Date: 18 Oct 2020 14:05:46 -0400
From: "John R. Levine" <johnl@iecc.com>
To: "Steve O'Hara-Smith" <steve@sohara.org>
Cc: freebsd-questions@freebsd.org, naddy@mips.inka.de
Subject: Re: printf(1) and UTF-8 multi-byte chars
Message-ID: <3c62a326-887f-4f4e-dbb2-56666f7571a0@iecc.com>
In-Reply-To: <20201018182309.490ff752536eae2092533c5a@sohara.org>
References: <slrnroo8n9.1iu4.naddy@lorvorc.mips.inka.de> <20201018154838.49CBC239CEDF@ary.qy> <20201018182309.490ff752536eae2092533c5a@sohara.org>
> There are good reasons for using all three levels, here are some:
>
> Bytes: Content length headers, malloc calls - storage related

Sure.

> Glyphs: Truncation, apparent length, sorting - appearance related

Not so much. I suppose it's preferable to truncate at a glyph boundary, but sorting UTF-8 bytes gives you the same order as sorting the glyphs, and for useful sorting you need to deal with issues like normalized forms and case folding. Not sure what use apparent length would be since the number of glyphs tells you neither the number of visible characters nor how wide they are.

> Unicode Characters: UTF-8/16/32 conversions - encoding related

That and a lot of composition and display issues.

Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
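
For anyone who wants to poke at the points above, here is a small Python sketch. It is only an illustration under stated assumptions: the helper names (byte_len, code_points, approx_visible) and the word list are made up for this example, and approx_visible is a rough stand-in that skips combining marks rather than doing real grapheme segmentation. It shows that sorting by raw UTF-8 bytes matches sorting by code point, and that byte count, code point count, and apparent length can all differ for the same text.

```python
import unicodedata

def byte_len(s: str) -> int:
    """Storage size: number of bytes in the UTF-8 encoding."""
    return len(s.encode("utf-8"))

def code_points(s: str) -> int:
    """Number of Unicode code points (what len() counts in Python 3)."""
    return len(s)

def approx_visible(s: str) -> int:
    """Rough apparent length: code points that are not combining marks.
    Assumption: this is NOT full grapheme cluster segmentation."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

# Hypothetical sample data, mixing 1-, 2-, and 4-byte sequences.
words = ["zebra", "\u00e9clair", "apple", "\u03a9mega", "\U0001F600", "e\u0301clair"]

# Python compares str values code point by code point.
by_code_point = sorted(words)
# Comparing the encoded bytes gives the same order, because UTF-8
# preserves code point order under bytewise comparison.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
same_order = by_code_point == by_utf8_bytes

# The same visible letter in two normalization forms.
nfc = "\u00e9"    # precomposed e-acute
nfd = "e\u0301"   # 'e' followed by a combining acute accent

for label, s in (("NFC", nfc), ("NFD", nfd)):
    print(label, byte_len(s), code_points(s), approx_visible(s))

print("byte sort == code point sort:", same_order)
print("equal before normalizing:", nfc == nfd)
print("equal after NFC:", unicodedata.normalize("NFC", nfd) == nfc)
```

The two forms of the accented letter compare unequal and sort apart until they are normalized, which is why a bytewise or code point sort is not by itself a useful sort for people.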