Date: Thu, 08 Nov 2018 13:08:41 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 232374] /bin/sh can not handle ja_JP.eucJP character code
Message-ID: <bug-232374-227-u7FGt8z9A5@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-232374-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-232374-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232374

Yuichiro NAITO <naito.yuichiro@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |naito.yuichiro@gmail.com

--- Comment #2 from Yuichiro NAITO <naito.yuichiro@gmail.com> ---
In my investigation, the main cause of this problem is that the read_char()
function doesn't retry read(2) from STDIN when mbrtowc(3) returns -2.

In lib/libedit/read.c, we can see the following code, which retries only when
the CHARSET_IS_UTF8 flag is set.

```
	switch (ct_mbrtowc(cp, cbuf, cbp)) {
<snip>
	case (size_t)-2:
		/*
		 * We don't support other multibyte charsets.
		 * The second condition shouldn't happen
		 * and is here merely for additional safety.
		 */
		if ((el->el_flags & CHARSET_IS_UTF8) == 0 ||
		    cbp >= MB_LEN_MAX) {
			errno = EILSEQ;
			*cp = L'\0';
			return -1;
		}
		/* Incomplete sequence, read another byte. */
		goto again;
```

Of course, the CHARSET_IS_UTF8 flag is not set in an eucJP environment.
When I remove the CHARSET_IS_UTF8 flag check, /bin/sh can read eucJP input.

I found another problem after removing the CHARSET_IS_UTF8 flag check:
command history miscalculates the length of eucJP characters, because the
ct_enc_width() function in chartype.c doesn't understand any charset other
than UTF-8. After I rewrote ct_enc_width() to use wctomb(3), the command
history problem was fixed.

With these two changes, we don't need the CHARSET_IS_UTF8 flag any more.
The CHARSET_IS_UTF8 flag controls the NARROW_HISTORY flag, and the
NARROW_HISTORY flag is used only in the HIST_FUN definition.

```
#ifdef WIDECHAR
#define HIST_FUN(el, fn, arg) \
    (((el)->el_flags & NARROW_HISTORY) ? hist_convert(el, fn, arg) : \
	HIST_FUN_INTERNAL(el, fn, arg))
#else
#define HIST_FUN(el, fn, arg) HIST_FUN_INTERNAL(el, fn, arg)
#endif
```

In a WIDECHAR environment, hist_convert() should always be called, because
hist_convert() is a multibyte-aware function.

I have opened a new differential on Phabricator that contains all of my fixes.
https://reviews.freebsd.org/D17903

I believe my fix solves this problem and doesn't affect charsets other than
eucJP. Please review my code.

Hirabayashi-san:
Could you please try my patch from Phabricator and check whether this problem
is fixed? I don't think /bin/sh itself is wrong.
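
For readers who want to see the two ideas from this comment in isolation, here
is a minimal sketch in plain C. It is not the code from D17903 and not libedit
code. The names byte_source, next_byte(), read_wchar(), enc_width() and
enc_strlen() are hypothetical and exist only for this illustration. It assumes
the caller has already selected a locale with setlocale(3). read_wchar() keeps
reading bytes while mbrtowc(3) returns (size_t)-2, whatever the charset, and
enc_width() asks wctomb(3) for the byte length of a character instead of
assuming UTF-8.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <locale.h>	/* setlocale(), to be called by the program first */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical in-memory stand-in for read(2) on STDIN. */
struct byte_source {
	const unsigned char *data;
	size_t len;
	size_t pos;
};

/* Return the next byte, or -1 at end of input. */
static int
next_byte(struct byte_source *src)
{
	return src->pos < src->len ? src->data[src->pos++] : -1;
}

/*
 * Read one wide character in the current LC_CTYPE encoding.
 * Returns 1 on success, 0 at end of input, -1 on error (errno = EILSEQ).
 */
static int
read_wchar(struct byte_source *src, wchar_t *out)
{
	char buf[MB_LEN_MAX];
	size_t n = 0;

	assert(src != NULL && out != NULL);
	for (;;) {
		mbstate_t st;
		int c = next_byte(src);

		if (c < 0) {
			if (n == 0)
				return 0;
			errno = EILSEQ;	/* input ended mid-character */
			return -1;
		}
		buf[n++] = (char)c;
		memset(&st, 0, sizeof(st));	/* re-parse buf from its start */
		switch (mbrtowc(out, buf, n, &st)) {
		case (size_t)-1:	/* invalid sequence */
			errno = EILSEQ;
			return -1;
		case (size_t)-2:	/* incomplete: read another byte */
			if (n >= MB_LEN_MAX) {
				errno = EILSEQ;
				return -1;
			}
			continue;
		default:
			return 1;
		}
	}
}

/* Bytes needed to encode wc in the current locale, or -1 if impossible. */
static int
enc_width(wchar_t wc)
{
	char buf[MB_LEN_MAX];

	(void)wctomb(NULL, L'\0');	/* reset any shift state */
	return wctomb(buf, wc);
}

/* Encoded byte length of a wide string, or (size_t)-1 on failure. */
static size_t
enc_strlen(const wchar_t *s)
{
	size_t total = 0;

	assert(s != NULL);
	for (; *s != L'\0'; s++) {
		int w = enc_width(*s);

		if (w < 0)
			return (size_t)-1;
		total += (size_t)w;
	}
	return total;
}
```

A program would call setlocale(LC_CTYPE, "") once at startup and could then
use read_wchar() and enc_width() unchanged under ja_JP.eucJP or a UTF-8
locale, provided that locale is installed on the system.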