Date: Wed, 04 Jul 2018 10:42:52 +0900 (JST)
From: Hiroki Sato <hrs@FreeBSD.org>
To: jilles@stack.nl
Cc: daichigoto@icloud.com, lists@eitanadler.com, daichi@freebsd.org, gnn@FreeBSD.org, cem@freebsd.org, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r335836 - head/usr.bin/top
Message-ID: <20180704.104252.1616889858955681927.hrs@allbsd.org>
In-Reply-To: <20180703211002.GA11832@stack.nl>
References: <459BD898-8072-426E-A968-96C1382AC616@icloud.com> <20180703.020956.859981414196673670.hrs@allbsd.org> <20180703211002.GA11832@stack.nl>
----Security_Multipart(Wed_Jul__4_10_42_52_2018_164)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Jilles Tjoelker <jilles@stack.nl> wrote
  in <20180703211002.GA11832@stack.nl>:

ji> > 3. Print the multibyte characters by using strvisx(3) family, which
ji> > supports multibyte character, or swprintf(3) family if you want to
ji> > format wide characters directly. Note that buffer length for
ji> > strvisx(3) must be calculated by using MB_LEN_MAX.
ji>
ji> In this case, calling setlocale() and then using strvisx() seems the
ji> right solution. If locales differ across processes this may result in
ji> mojibake but that cannot really be helped. Even analyzing other
ji> processes' locale variables is not fully reliable, since strings may be
ji> incorrectly encoded even in the process's real locale, environment
ji> variables cannot be read across users and the environment block may be
ji> overwritten by a program.
ji>
ji> In general, although using conversion to wide characters allows users a
ji> lot of flexibility, I don't think it is the best in all situations:
ji>
ji> * The result of mbstowcs() is a UTF-32 string which consumes a lot of
ji> memory. A loop with mbrtowc() may also be slow. Many operations can be
ji> done directly on UTF-8 strings with no or little additional complexity
ji> compared to byte strings.
ji>
ji> * If there is an invalid multibyte character, there is little
ji> flexibility to handle this usefully and securely, since so little is
ji> known about the encoding. The best handling may depend on the context.
ji>
ji> Therefore, in /bin/sh, I have only implemented multibyte support for
ji> UTF-8. All other encodings have bytes treated as characters.
ji>
ji> However, I do agree that getenv("LANG") is bad. Instead, setlocale()
ji> should be used. After that, nl_langinfo(CODESET) can be called and the
ji> result compared to "UTF-8".
 Yes, I agree that mb->wc conversion is not always the best approach,
 and that using strvisx() for cmdbuf, not only for argv, is enough in
 this case. I thought it was difficult to avoid iswprint() because I
 was not sure of the goal of r335836, and it looked to me as if it
 aimed to keep the original printable() function. And as you
 mentioned, it may not be worth trying to correctly detect or support
 the locales of other processes, either.

 Probably one of the simplest ways would be to rely on
 LC_CTYPE + strvisx() and to document how top(1) handles multibyte
 characters in the manual page.

-- Hiroki

----Security_Multipart(Wed_Jul__4_10_42_52_2018_164)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iEYEABECAAYFAls8JhwACgkQTyzT2CeTzy1IeQCaAodTCzM9gOB5rqO81+Gy24Q1
O60AnRmFR2/cYK0ov6a3d5Tma6vk/zff
=MhXt
-----END PGP SIGNATURE-----

----Security_Multipart(Wed_Jul__4_10_42_52_2018_164)----
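
For reference, the locale check discussed in this thread (setlocale()
followed by nl_langinfo(CODESET) compared with "UTF-8") could look roughly
like the sketch below. The function names init_ctype_locale() and
codeset_is_utf8() are made up for illustration and are not taken from
top(1) or r335836. The strvisx() call itself is left out, since it is
BSD-specific and its buffer sizing is a separate question.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: adopt LC_CTYPE from the environment
 * (LC_ALL, LC_CTYPE, LANG) instead of reading getenv("LANG") directly.
 * Returns 1 on success, 0 if the requested locale is unavailable,
 * in which case the previous locale stays in effect.
 */
int
init_ctype_locale(void)
{
	return (setlocale(LC_CTYPE, "") != NULL);
}

/*
 * Hypothetical helper: report whether the current LC_CTYPE codeset
 * is UTF-8. Only meaningful after setlocale() has been called.
 */
int
codeset_is_utf8(void)
{
	const char *cs = nl_langinfo(CODESET);

	return (cs != NULL && strcmp(cs, "UTF-8") == 0);
}
```

A caller would run init_ctype_locale() once at startup and consult
codeset_is_utf8() when deciding how to treat the bytes of a command line.
If setlocale() fails, the process stays in its previous locale (normally
"C"), so the check would typically report a non-UTF-8 codeset.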