Date:      Mon, 2 Jul 2018 13:03:33 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Hiroki Sato <hrs@freebsd.org>
Cc:        daichigoto@icloud.com, Eitan Adler <lists@eitanadler.com>, daichi@freebsd.org,  gnn@freebsd.org, cem@freebsd.org, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r335836 - head/usr.bin/top
Message-ID:  <CANCZdfqjEQdj03NeZAeP1igUbZ7DjMFdgjZbkXzMXTod513DrQ@mail.gmail.com>
In-Reply-To: <20180703.020956.859981414196673670.hrs@allbsd.org>
References:  <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH%2BDi800kJ3w@mail.gmail.com> <20180702.155529.1102410939281120947.hrs@allbsd.org> <459BD898-8072-426E-A968-96C1382AC616@icloud.com> <20180703.020956.859981414196673670.hrs@allbsd.org>

Sato-san

Sorry for the top post, but your message would make an excellent intro to
i18n in one of our developer guides.

Warner

On Mon, Jul 2, 2018, 11:13 AM Hiroki Sato <hrs@freebsd.org> wrote:

> 後藤大地 <daichigoto@icloud.com> wrote
>   in <459BD898-8072-426E-A968-96C1382AC616@icloud.com>:
>
> da>
> da>
> da> > On 2018/07/02 15:55, Hiroki Sato <hrs@FreeBSD.org> wrote:
> da> >
> da> > Eitan Adler <lists@eitanadler.com> wrote
> da> >  in <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH+Di800kJ3w@mail.gmail.com>:
> da> >
> da> > li> On 1 July 2018 at 10:08, Conrad Meyer <cem@freebsd.org> wrote:
> da> > li> > Hi Daichi,
> da> > li> >
> da> > li> >
> da> > li> >
> da> > li> > I don't think code to decode UTF-8 belongs in top(1).  I don't
> da> > li> > know what the goal of this routine is, but I doubt this is the
> da> > li> > right way to accomplish it.
> da> > li>
> da> > li> For the record, I agree. This is why I didn't click "accept" on
> da> > li> the revision. I don't fully oppose leaving it in top(1) for now
> da> > li> as we work out the API, but long term it's the wrong place.
> da> > li>
> da> > li> https://reviews.freebsd.org/D16058 is the review.
> da> >
> da> > I strongly object to this kind of encoding-specific routine.  Please
> da> > back it out.  The problem is that top(1) does not support multibyte
> da> > encodings in its printing functions, and using the C99
> da> > wide/multibyte character manipulation API, such as iswprint(3), is
> da> > the way to solve it.  Calling getenv("LANG") and assuming an
> da> > encoding based on it is a very bad practice for internationalizing
> da> > software.
> da> >
> da> > -- Hiroki
> da>
> da> I respect your point.
> da>
> da> Once I back it out, I will begin implementing it in a different way.
> da> Please advise which functions should be used for the implementation
> da> (iswprint(3), and which others?)
>
>  Roughly speaking, the POSIX/XPG/C99 I18N model requires the following
>  steps:
>
>  1. Call setlocale(LC_ALL, "") first.
>
>  2. Use the mbs<->wcs and/or mb<->wc conversion functions in C95/C99 to
>     manipulate characters and strings, depending on what you want to
>     do.  The printable() function should use mbtowc(3) and
>     iswprint(3), for example.  And wcslen(3) should be used instead of
>     strlen() to determine the number of characters to be printed.
>
>     Note that if an mbs->wcs or mb->wc conversion fails with EILSEQ at
>     some point, some of the characters are invalid for printing.
>     This can happen because command-line parameters in top(1) are not
>     always encoded in the encoding specified in LC_CTYPE or LANG.  Such
>     characters should also be handled as non-printable.  However, to
>     make matters worse, each process does not always use the same
>     locale as top(1).  A process invoked with LANG=ja_JP.eucJP may have
>     EUC-JP characters in its ARGV array even if top(1) is run by another
>     user whose LANG is en_US.UTF-8.  You have to determine which locale
>     should be used before doing the mb->wc conversion.  It is not so
>     simple.
>
>  3. Print the multibyte characters by using the strvisx(3) family, which
>     supports multibyte characters, or the swprintf(3) family if you want
>     to format wide characters directly.  Note that the buffer length for
>     strvisx(3) must be calculated by using MB_LEN_MAX.
>
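
For readers following along, steps 1 and 2 above might be sketched
roughly as below.  This is only an illustration under stated
assumptions, not the code proposed for top(1): the names i18n_init()
and sanitize_mb() are hypothetical, it uses the restartable mbrtowc(3)
rather than mbtowc(3), it replaces anything non-printable or
undecodable with '?', and it assumes a stateless encoding such as
UTF-8 or EUC-JP.  It does not attempt the harder problem of guessing
which locale another process used for its arguments.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Step 1 (hypothetical helper): adopt the locale named by the environment.
 * Returns 0 on success, -1 if the requested locale is not available.
 */
int
i18n_init(void)
{
	return (setlocale(LC_ALL, "") != NULL ? 0 : -1);
}

/*
 * Step 2 (hypothetical helper): copy src into dst, replacing every
 * non-printable character and every undecodable byte with '?'.
 * dst is always NUL-terminated; output is truncated if it does not fit.
 * Returns the number of characters (not bytes) stored in dst.
 */
size_t
sanitize_mb(const char *src, char *dst, size_t dstlen)
{
	mbstate_t st;
	size_t left, nchars = 0, out = 0;
	const char *p = src;

	assert(src != NULL && dst != NULL && dstlen > 0);
	memset(&st, 0, sizeof(st));
	left = strlen(src);

	while (left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, p, left, &st);
		size_t need;
		int ok;

		if (n == 0)
			break;		/* decoded a NUL: end of string */
		if (n == (size_t)-1 || n == (size_t)-2) {
			/* EILSEQ or truncated sequence: treat one byte as
			 * non-printable and restart from a clean state. */
			memset(&st, 0, sizeof(st));
			n = 1;
			ok = 0;
		} else {
			ok = iswprint((wint_t)wc) != 0;
		}

		need = ok ? n : 1;
		if (out + need >= dstlen)
			break;		/* keep room for the terminating NUL */
		if (ok)
			memcpy(dst + out, p, n);
		else
			dst[out] = '?';
		out += need;
		nchars++;
		p += n;
		left -= n;
	}
	dst[out] = '\0';
	return (nchars);
}
```

A caller would run i18n_init() once at startup and then pass each
string through sanitize_mb() before printing it.  The return value is
a character count, which is the quantity wcslen(3) would report after
a full conversion, and it can be smaller than strlen() of the same
string in a multibyte locale.
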
>  I recommend that you learn about I18N by reading the following
>  documents, since this involves an I18N programming model, not just a
>  matter of which function should be used.  While they are quite old
>  and contain system-specific topics, they are still useful for
>  understanding the general overview of how XPG4 and the relevant
>  C95/C99 APIs work:
>
>  [1] Developer's Guide to Internationalization (801-6660)
>      https://docs.oracle.com/cd/E19457-01/801-6660/801-6660.pdf
>
>  [2] Software Internationalization Guide (526225-002)
>      https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02131936
>
>  [3] ISO/IEC 9899:TC2 draft (p.204, Sec. 7.11 Localization)
>      http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
>
>  [4] Internationalization Guide, Version 2
>      ISBN: 978-0133535419
>
> -- Hiroki
>


