From: Warner Losh
Date: Mon, 2 Jul 2018 13:03:33 -0600
Subject: Re: svn commit: r335836 - head/usr.bin/top
To: Hiroki Sato
Cc: daichigoto@icloud.com, Eitan Adler, daichi@freebsd.org,
    gnn@freebsd.org, cem@freebsd.org, src-committers@freebsd.org,
    svn-src-all@freebsd.org, svn-src-head@freebsd.org
In-Reply-To: <20180703.020956.859981414196673670.hrs@allbsd.org>

Sato-san

Sorry for the top post, but your message would make an excellent intro to
i18n in one of our developer guides.

Warner

On Mon, Jul 2, 2018, 11:13 AM Hiroki Sato wrote:

> 後藤大地 wrote
>   in <459BD898-8072-426E-A968-96C1382AC616@icloud.com>:
>
> da> > On 2018/07/02 15:55, Hiroki Sato wrote:
> da> >
> da> > Eitan Adler wrote
> da> >   in n9ZJweJH+Di800kJ3w@mail.gmail.com>:
> da> >
> da> > li> On 1 July 2018 at 10:08, Conrad Meyer wrote:
> da> > li> > Hi Daichi,
> da> > li> >
> da> > li> > I don't think code to decode UTF-8 belongs in top(1).  I don't know
> da> > li> > what the goal of this routine is, but I doubt this is the right way to
> da> > li> > accomplish it.
> da> > li>
> da> > li> For the record, I agree.  This is why I didn't click "accept" on the
> da> > li> revision.  I don't fully oppose leaving it in top(1) for now as we work
> da> > li> out the API, but long term it's the wrong place.
> da> > li>
> da> > li> https://reviews.freebsd.org/D16058 is the review.
> da> >
> da> >  I strongly object to this kind of encoding-specific routine.  Please
> da> >  back it out.  The problem is that top(1) does not support multibyte
> da> >  encodings in its printing functions, and using the C99 wide/multibyte
> da> >  character manipulation API such as iswprint(3) is the way to solve
> da> >  it.  Doing getenv("LANG") and assuming an encoding based on it is a
> da> >  very bad practice for internationalizing software.
> da> >
> da> > -- Hiroki
> da>
> da> I respect what you mean.
> da>
> da> Once I back out, I will begin implementing it in a different way.
> da> Please advise which functions should be used for the implementation
> da> (iswprint(3), and what other functions should be used?)
>
>  Roughly speaking, the POSIX/XPG/C99 I18N model requires the following
>  steps:
>
>  1. Call setlocale(LC_ALL, "") first.
>
>  2. Use mbs<->wcs and/or mb<->wc conversion functions in C95/C99 to
>     manipulate characters and strings depending on what you want to do.
>     The printable() function should use mbtowc(3) and iswprint(3), for
>     example.  And wcslen(3) should be used instead of strlen() to
>     determine the number of characters to be printed.
>
>     Note that if an mbs->wcs or mb->wc conversion fails with EILSEQ at
>     some point, some of the characters are invalid for printing.
>     This can happen because command-line parameters in top(1) are not
>     always encoded in the encoding specified by LC_CTYPE or LANG.  Such
>     characters should also be handled as non-printable.  However, to
>     make matters worse, each process does not always use the same
>     single locale as top(1).  A process invoked with LANG=ja_JP.eucJP
>     may have EUC-JP characters in its ARGV array even if top(1) is run
>     by another user whose LANG is en_US.UTF-8.  You have to determine
>     which locale should be used before doing the mb->wc conversion.
>     It is not so simple.
>
>  3. Print the multibyte characters by using the strvisx(3) family, which
>     supports multibyte characters, or the swprintf(3) family if you want
>     to format wide characters directly.  Note that the buffer length for
>     strvisx(3) must be calculated by using MB_LEN_MAX.
>
>  I recommend learning about I18N by reading the following documents,
>  since this involves an I18N programming model, not just a matter of
>  which function should be used.  While they are quite old and contain
>  system-specific topics, they are still useful for getting a general
>  overview of how XPG4 and the relevant C95/C99 APIs work:
>
>  [1] Developer's Guide to Internationalization (801-6660)
>      https://docs.oracle.com/cd/E19457-01/801-6660/801-6660.pdf
>
>  [2] Software Internationalization Guide (526225-002)
>      https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02131936
>
>  [3] ISO/IEC 9899:TC2 draft (p.204, Sec. 7.11 Localization)
>      http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
>
>  [4] Internationalization Guide, Version 2
>      ISBN: 978-0133535419
>
> -- Hiroki
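
Steps 1 and 2 of the quoted outline can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code that
went into top(1). The names init_locale() and sanitize_mb() are
hypothetical, and the sketch uses mbrtowc(3), the restartable sibling of
the mbtowc(3) named above, so that the conversion state can be reset after
an EILSEQ failure. It decodes everything in the locale of the calling
process, so it does nothing about the harder problem raised above of
arguments written in another process's locale.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Step 1 (hypothetical helper): adopt the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG).  Returns nonzero on success.
 */
int
init_locale(void)
{
	return (setlocale(LC_ALL, "") != NULL);
}

/*
 * Step 2 (hypothetical helper, not the top(1) code): copy src into dst,
 * replacing each character that iswprint(3) rejects, and each byte that
 * does not decode in the current LC_CTYPE encoding, with '?'.
 * Returns the number of characters (not bytes) stored, excluding the NUL.
 */
size_t
sanitize_mb(char *dst, size_t dstsize, const char *src)
{
	mbstate_t st;
	size_t left = strlen(src), out = 0, nchars = 0;

	if (dstsize == 0)
		return (0);
	memset(&st, 0, sizeof(st));
	while (left > 0) {
		wchar_t wc;
		size_t n = mbrtowc(&wc, src, left, &st);
		int ok = 0;

		if (n == (size_t)-1 || n == (size_t)-2) {
			/* EILSEQ or incomplete sequence: drop one byte, restart. */
			memset(&st, 0, sizeof(st));
			n = 1;
		} else if (n == 0) {
			break;		/* decoded a NUL; treat as end of string */
		} else {
			ok = iswprint((wint_t)wc) != 0;
		}

		size_t need = ok ? n : 1;	/* bytes this character adds to dst */
		if (out + need >= dstsize)
			break;		/* keep room for the terminating NUL */
		if (ok)
			memcpy(dst + out, src, n);
		else
			dst[out] = '?';
		out += need;
		src += n;
		left -= n;
		nchars++;
	}
	dst[out] = '\0';
	return (nchars);
}
```

A caller would run init_locale() once at startup and then pass each
command-line string through sanitize_mb() before printing it. The return
value is a character count, which corresponds to the wcslen(3) point in
step 2; it is not a count of terminal columns.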