Date: Tue, 03 Jul 2018 02:09:56 +0900 (JST)
Message-Id: <20180703.020956.859981414196673670.hrs@allbsd.org>
To: daichigoto@icloud.com
Cc: lists@eitanadler.com, daichi@freebsd.org, gnn@FreeBSD.org,
        cem@freebsd.org, src-committers@freebsd.org, svn-src-all@freebsd.org,
        svn-src-head@freebsd.org
Subject: Re: svn commit: r335836 - head/usr.bin/top
From: Hiroki Sato
In-Reply-To: <459BD898-8072-426E-A968-96C1382AC616@icloud.com>
References: <20180702.155529.1102410939281120947.hrs@allbsd.org>
        <459BD898-8072-426E-A968-96C1382AC616@icloud.com>

後藤大地 wrote
  in <459BD898-8072-426E-A968-96C1382AC616@icloud.com>:

da>
da> > On 2018/07/02 at 15:55, Hiroki Sato wrote:
da> >
da> > Eitan Adler wrote:
da> >
da> > li> On 1 July 2018 at 10:08, Conrad Meyer wrote:
da> > li> > Hi Daichi,
da> > li> >
da> > li> > I don't think code to decode UTF-8 belongs in top(1). I don't know
da> > li> > what the goal of this routine is, but I doubt this is the right way to
da> > li> > accomplish it.
da> > li>
da> > li> For the record, I agree. This is why I didn't click "accept" on the
da> > li> revision. I don't fully oppose leaving it in top(1) for now as we work
da> > li> out the API, but long term its the wrong place.
da> > li>
da> > li> https://reviews.freebsd.org/D16058 is the review.
da> >
da> > I strongly object this kind of encoding-specific routine. Please
da> > back out it. The problem is that top(1) does not support multibyte
da> > encoding in functions for printing, and using C99 wide/multibyte
da> > character manipulation API such as iswprint(3) is the way to solve
da> > it. Doing getenv("LANG") and assuming an encoding based on it is a
da> > very bad practice to internationalize software.
da> >
da> > -- Hiroki
da>
da> I respect what you mean.
da>
da> Once I back out, I will begin implementing it in a different way.
da> Please advise which function should be used for implementation
da> (iswprint (3) and what other functions should be used?)

 Roughly speaking, the POSIX/XPG/C99 I18N model requires the following
 steps:

 1. Call setlocale(LC_ALL, "") first.

 2. Use the mbs<->wcs and/or mb<->wc conversion functions in C95/C99
    to manipulate characters and strings, depending on what you want
    to do. The printable() function should use mbtowc(3) and
    iswprint(3), for example. wcslen(3), not strlen(3), should be
    used to determine the number of characters to be printed.

    Note that if an mbs->wcs or mb->wc conversion fails with EILSEQ
    at some point, some of the characters are invalid for printing.
    This can happen because command-line parameters shown by top(1)
    are not always encoded in the encoding specified by LC_CTYPE or
    LANG. Such characters should also be handled as non-printable.

    To make matters worse, the processes being displayed do not
    necessarily use the same locale as top(1) itself. A process
    invoked with LANG=ja_JP.eucJP may have EUC-JP characters in its
    ARGV array even if top(1) is run by another user whose LANG is
    en_US.UTF-8.
    You have to determine which locale should be used before doing
    the mb->wc conversion, and that is not simple.

 3. Print the multibyte characters by using the strvisx(3) family,
    which supports multibyte characters, or the swprintf(3) family if
    you want to format wide characters directly. Note that the
    buffer length for strvisx(3) must be calculated by using
    MB_LEN_MAX.

 I recommend that you learn about I18N by reading the following
 documents, since this involves an I18N programming model and is not
 just a matter of which function should be used. While they are quite
 old and contain system-specific topics, they are still useful for
 understanding the general overview of how XPG4 and the relevant
 C95/C99 APIs work:

 [1] Developer's Guide to Internationalization (801-6660)
     https://docs.oracle.com/cd/E19457-01/801-6660/801-6660.pdf

 [2] Software Internationalization Guide (526225-002)
     https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02131936

 [3] ISO/IEC 9899:TC2 draft (p.204, Sec. 7.11 Localization)
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

 [4] Internationalization Guide, Version 2
     ISBN: 978-0133535419

-- Hiroki
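
 For steps 1 and 2 above, a minimal sketch might look like the
 following. It is only an illustration, not the top(1) code:
 sanitize_mbs() is a hypothetical helper name, the '?' replacement
 character is an arbitrary choice, and mbrtowc(3) (the restartable
 variant of mbtowc(3)) is used so that the conversion state can be
 reset after an invalid sequence. It assumes the caller has already
 selected a locale with setlocale(3), and it does nothing about the
 problem of processes that use a locale different from top(1)'s.

```c
#include <assert.h>
#include <locale.h>	/* setlocale(3), to be called by the caller */
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: copy src to dst, keeping every multibyte
 * character that iswprint(3) accepts in the current LC_CTYPE locale
 * and replacing everything else (including bytes that fail to
 * convert, e.g. with EILSEQ) with a single '?'.
 *
 * dst is always NUL-terminated; output is truncated if dstsize is
 * too small.  Returns the number of characters (not bytes) written.
 */
size_t
sanitize_mbs(const char *src, char *dst, size_t dstsize)
{
	mbstate_t st;
	size_t left, out = 0, nchars = 0;

	assert(src != NULL && dst != NULL && dstsize > 0);
	memset(&st, 0, sizeof(st));
	left = strlen(src);

	while (left > 0) {
		wchar_t wc = L'\0';
		size_t n = mbrtowc(&wc, src, left, &st);
		int ok = 1;

		if (n == (size_t)-1 || n == (size_t)-2 || n == 0) {
			/* Invalid or incomplete sequence: skip one byte. */
			memset(&st, 0, sizeof(st));
			n = 1;
			ok = 0;
		}

		if (ok && iswprint((wint_t)wc)) {
			if (out + n >= dstsize)
				break;		/* no room left */
			memcpy(dst + out, src, n);
			out += n;
		} else {
			if (out + 1 >= dstsize)
				break;
			dst[out++] = '?';
		}
		nchars++;
		src += n;
		left -= n;
	}
	dst[out] = '\0';
	return (nchars);
}
```

 A caller would run setlocale(LC_ALL, "") once at startup and then
 pass each string through the helper before printing it. The return
 value counts characters, which is what wcslen(3) would report for
 the converted string; if the number of terminal columns matters,
 wcwidth(3) on each converted character would be the function to look
 at instead.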