Date: Tue, 03 Jul 2018 02:09:56 +0900 (JST)
From: Hiroki Sato <hrs@FreeBSD.org>
To: daichigoto@icloud.com
Cc: lists@eitanadler.com, daichi@freebsd.org, gnn@FreeBSD.org, cem@freebsd.org, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r335836 - head/usr.bin/top
Message-ID: <20180703.020956.859981414196673670.hrs@allbsd.org>
In-Reply-To: <459BD898-8072-426E-A968-96C1382AC616@icloud.com>
References: <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH+Di800kJ3w@mail.gmail.com> <20180702.155529.1102410939281120947.hrs@allbsd.org> <459BD898-8072-426E-A968-96C1382AC616@icloud.com>

後藤大地 <daichigoto@icloud.com> wrote
  in <459BD898-8072-426E-A968-96C1382AC616@icloud.com>:

da> > On 2018/07/02 15:55, Hiroki Sato <hrs@FreeBSD.org> wrote:
da> >
da> > Eitan Adler <lists@eitanadler.com> wrote
da> >   in <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH+Di800kJ3w@mail.gmail.com>:
da> >
da> > li> On 1 July 2018 at 10:08, Conrad Meyer <cem@freebsd.org> wrote:
da> > li> > Hi Daichi,
da> > li> >
da> > li> > I don't think code to decode UTF-8 belongs in top(1).  I don't know
da> > li> > what the goal of this routine is, but I doubt this is the right way to
da> > li> > accomplish it.
da> > li>
da> > li> For the record, I agree.  This is why I didn't click "accept" on the
da> > li> revision.  I don't fully oppose leaving it in top(1) for now as we work
da> > li> out the API, but long term it's the wrong place.
da> > li>
da> > li> https://reviews.freebsd.org/D16058 is the review.
da> >
da> >  I strongly object to this kind of encoding-specific routine.  Please
da> >  back it out.  The problem is that top(1) does not support multibyte
da> >  encoding in functions for printing, and using C99 wide/multibyte
da> >  character manipulation API such as iswprint(3) is the way to solve
da> >  it.  Doing getenv("LANG") and assuming an encoding based on it is a
da> >  very bad practice to internationalize software.
da> >
da> > -- Hiroki
da>
da> I respect what you mean.
da>
da> Once I back out, I will begin implementing it in a different way.
da> Please advise which function should be used for implementation
da> (iswprint (3) and what other functions should be used?)

 Roughly speaking, the POSIX/XPG/C99 I18N model requires the following
 steps:

 1. Call setlocale(LC_ALL, "") first.

 2. Use mbs<->wcs and/or mb<->wc conversion functions in C95/C99 to
    manipulate characters and strings depending on what you want to
    do.
    The printable() function should use mbtowc(3) and iswprint(3),
    for example.  And wcslen(3) should be used instead of strlen()
    to determine the number of characters to be printed.

    Note that if an mbs->wcs or mb->wc conversion fails with EILSEQ at
    some point, some of the characters are invalid for printing.
    This can happen because command-line parameters in top(1) are not
    always encoded in the encoding specified by LC_CTYPE or LANG.
    Such characters should also be handled as non-printable.

    However, to make matters worse, a process does not always use
    the same locale as top(1).  A process invoked with
    LANG=ja_JP.eucJP may have EUC-JP characters in its ARGV array even
    if top(1) is run by another user whose LANG is en_US.UTF-8.  You
    have to determine which locale should be used before doing the
    mb->wc conversion.  It is not so simple.

 3. Print the multibyte characters by using the strvisx(3) family,
    which supports multibyte characters, or the swprintf(3) family if
    you want to format wide characters directly.  Note that the buffer
    length for strvisx(3) must be calculated by using MB_LEN_MAX.

 I recommend learning about I18N by reading the following documents,
 since this involves an I18N programming model, not just a matter of
 which function should be used.  While they are quite old and contain
 system-specific topics, they are still useful for understanding the
 general overview of how XPG4 and the relevant C95/C99 APIs work:

 [1] Developer's Guide to Internationalization (801-6660)
     https://docs.oracle.com/cd/E19457-01/801-6660/801-6660.pdf

 [2] Software Internationalization Guide (526225-002)
     https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02131936

 [3] ISO/IEC 9899:TC2 draft (p.204, Sec.
     7.11 Localization)
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

 [4] Internationalization Guide, Version 2
     ISBN: 978-0133535419

-- Hiroki
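
 For illustration only, steps 1 and 2 above might be sketched as
 follows.  This is not the top(1) source: the helper name
 printable_mbs_len() is hypothetical, it uses mbrtowc(3) (the
 restartable variant of mbtowc(3)) rather than mbtowc(3) itself, and
 it simply assumes the string is encoded in the current LC_CTYPE
 locale, so it does not address the question of which locale another
 process's ARGV was written in.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not part of top(1).
 *
 * Walk the multibyte string s using the current LC_CTYPE locale.
 * Returns the number of characters if every character decodes
 * successfully and is printable according to iswprint(3).
 * Returns -1 if a conversion fails (EILSEQ or a truncated sequence)
 * or if any character is non-printable.
 */
static long
printable_mbs_len(const char *s)
{
	mbstate_t st;
	size_t left = strlen(s);
	long nchars = 0;

	/* Start from the initial conversion state. */
	memset(&st, 0, sizeof(st));

	while (left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, s, left, &st);

		/* (size_t)-1: invalid sequence; (size_t)-2: incomplete one. */
		if (n == (size_t)-1 || n == (size_t)-2)
			return (-1);
		/* A decoded NUL cannot occur within strlen(s) bytes; stop anyway. */
		if (n == 0)
			break;
		/* Treat anything the locale does not consider printable as a failure. */
		if (!iswprint((wint_t)wc))
			return (-1);

		s += n;
		left -= n;
		nchars++;
	}
	return (nchars);
}
```

 The caller is expected to run setlocale(LC_ALL, "") once at startup
 before using the helper; without that call the program stays in the
 "C" locale.  A return value of -1 corresponds to the "handle as
 non-printable" case described above.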