Date: Mon, 2 Jul 2018 13:03:33 -0600
From: Warner Losh <imp@bsdimp.com>
To: Hiroki Sato <hrs@freebsd.org>
Cc: daichigoto@icloud.com, Eitan Adler <lists@eitanadler.com>, daichi@freebsd.org, gnn@freebsd.org, cem@freebsd.org, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r335836 - head/usr.bin/top
Message-ID: <CANCZdfqjEQdj03NeZAeP1igUbZ7DjMFdgjZbkXzMXTod513DrQ@mail.gmail.com>
In-Reply-To: <20180703.020956.859981414196673670.hrs@allbsd.org>
References: <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH+Di800kJ3w@mail.gmail.com> <20180702.155529.1102410939281120947.hrs@allbsd.org> <459BD898-8072-426E-A968-96C1382AC616@icloud.com> <20180703.020956.859981414196673670.hrs@allbsd.org>

Sato-san,

Sorry for the top post, but your message would make an excellent intro
to i18n in one of our developer guides.

Warner

On Mon, Jul 2, 2018, 11:13 AM Hiroki Sato <hrs@freebsd.org> wrote:

> 後藤大地 <daichigoto@icloud.com> wrote
>   in <459BD898-8072-426E-A968-96C1382AC616@icloud.com>:
>
> da> > On 2018/07/02 15:55, Hiroki Sato <hrs@FreeBSD.org> wrote:
> da> >
> da> > Eitan Adler <lists@eitanadler.com> wrote
> da> >   in <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH+Di800kJ3w@mail.gmail.com>:
> da> >
> da> > li> On 1 July 2018 at 10:08, Conrad Meyer <cem@freebsd.org> wrote:
> da> > li> > Hi Daichi,
> da> > li> >
> da> > li> > I don't think code to decode UTF-8 belongs in top(1).  I don't know
> da> > li> > what the goal of this routine is, but I doubt this is the right way to
> da> > li> > accomplish it.
> da> > li>
> da> > li> For the record, I agree.  This is why I didn't click "accept" on the
> da> > li> revision.  I don't fully oppose leaving it in top(1) for now as we work
> da> > li> out the API, but long term it's the wrong place.
> da> > li>
> da> > li> https://reviews.freebsd.org/D16058 is the review.
> da> >
> da> > I strongly object to this kind of encoding-specific routine.  Please
> da> > back it out.  The problem is that top(1) does not support multibyte
> da> > encoding in functions for printing, and using the C99 wide/multibyte
> da> > character manipulation API such as iswprint(3) is the way to solve
> da> > it.  Doing getenv("LANG") and assuming an encoding based on it is a
> da> > very bad practice for internationalizing software.
> da> >
> da> > -- Hiroki
> da>
> da> I respect what you mean.
> da>
> da> Once I back it out, I will begin implementing it in a different way.
> da> Please advise which functions should be used for the implementation
> da> (iswprint(3), and what other functions should be used?)
>
> Roughly speaking, the POSIX/XPG/C99 I18N model requires the following
> steps:
>
> 1. Call setlocale(LC_ALL, "") first.
>
> 2. Use the mbs<->wcs and/or mb<->wc conversion functions in C95/C99 to
>    manipulate characters and strings depending on what you want to
>    do.  The printable() function should use mbtowc(3) and
>    iswprint(3), for example.  And wcslen(3), not strlen(), should be
>    used to determine the length in characters of the string to be
>    printed.
>
>    Note that if an mbs->wcs or mb->wc conversion fails with EILSEQ at
>    some point, some of the characters are invalid for printing.
>    This can happen because command-line parameters in top(1) are not
>    always encoded in the encoding specified in LC_CTYPE or LANG.
>    Such characters should also be handled as non-printable.  However,
>    to make matters worse, each process does not always use the same
>    single locale as top(1).  A process invoked with LANG=ja_JP.eucJP
>    may have EUC-JP characters in its ARGV array even if top(1) is run
>    by another user whose LANG is en_US.UTF-8.  You have to determine
>    which locale should be used before doing the mb->wc conversion.
>    It is not so simple.
>
> 3. Print the multibyte characters by using the strvisx(3) family,
>    which supports multibyte characters, or the swprintf(3) family if
>    you want to format wide characters directly.  Note that the buffer
>    length for strvisx(3) must be calculated by using MB_LEN_MAX.
>
> I recommend that you learn about I18N by reading the following
> documents, since this involves an I18N programming model, not just a
> matter of which function should be used.
> While they are quite old
> and contain system-specific topics, they are still useful for
> understanding the general overview of how XPG4 and the relevant
> C95/C99 APIs work:
>
> [1] Developer's Guide to Internationalization (801-6660)
>     https://docs.oracle.com/cd/E19457-01/801-6660/801-6660.pdf
>
> [2] Software Internationalization Guide (526225-002)
>     https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02131936
>
> [3] ISO/IEC 9899:TC2 draft (p.204, Sec. 7.11 Localization)
>     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
>
> [4] Internationalization Guide, Version 2
>     ISBN: 978-0133535419
>
> -- Hiroki
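
To illustrate steps 1 and 2 of the quoted message, here is a minimal
sketch in C.  It is not the code that went into top(1).  The helper name
sanitize_printable() and the use of '?' as the replacement character are
hypothetical, chosen only for this example.  It uses mbrtowc(3), the
restartable variant of the mbtowc(3) mentioned above, so that decoding
can resynchronize after an invalid sequence.  It assumes the caller has
already run setlocale(LC_ALL, ""), and it does nothing about the harder
problem of a process whose arguments are in a different encoding from
the one top(1) runs under.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of top(1)): copy src into dst, keeping
 * every character that is valid and printable in the current LC_CTYPE
 * locale and replacing everything else with '?'.  Output is truncated
 * at a character boundary if dst is too small.  Returns the number of
 * characters (not bytes) placed in dst, counting each '?' as one.
 */
size_t
sanitize_printable(char *dst, size_t dstsize, const char *src)
{
	mbstate_t st;
	size_t srclen, in = 0, out = 0, nchars = 0;

	assert(dst != NULL && src != NULL);
	if (dstsize == 0)
		return (0);
	srclen = strlen(src);
	memset(&st, 0, sizeof(st));

	while (in < srclen) {
		wchar_t wc;
		size_t n, need;
		int ok;

		n = mbrtowc(&wc, src + in, srclen - in, &st);
		if (n == 0)
			break;
		if (n == (size_t)-1 || n == (size_t)-2) {
			/*
			 * Invalid (EILSEQ) or incomplete sequence: treat a
			 * single byte as non-printable and reset the
			 * conversion state before continuing.
			 */
			memset(&st, 0, sizeof(st));
			n = 1;
			ok = 0;
		} else {
			ok = iswprint((wint_t)wc) != 0;
		}

		need = ok ? n : 1;
		if (out + need >= dstsize)
			break;		/* keep room for the terminator */
		if (ok)
			memcpy(dst + out, src + in, n);
		else
			dst[out] = '?';
		out += need;
		in += n;
		nchars++;
	}
	dst[out] = '\0';
	return (nchars);
}
```

A caller would invoke setlocale(LC_ALL, "") once at startup and then
pass each command string through sanitize_printable() before printing
it.  The return value is a character count of the kind wcslen(3) would
give, not a byte count from strlen(3).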