Date:      Thu, 08 Nov 2018 13:08:41 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 232374] /bin/sh can not handle ja_JP.eucJP character code
Message-ID:  <bug-232374-227-u7FGt8z9A5@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-232374-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-232374-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232374

Yuichiro NAITO <naito.yuichiro@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |naito.yuichiro@gmail.com

--- Comment #2 from Yuichiro NAITO <naito.yuichiro@gmail.com> ---
In my investigation, the main cause of this problem is that the read_char()
function does not retry read(2) from STDIN when mbrtowc(3) returns (size_t)-2.
In lib/libedit/read.c, we can see the following code, which retries only when
the CHARSET_IS_UTF8 flag is set.

```
                switch (ct_mbrtowc(cp, cbuf, cbp)) {
<snip>
                case (size_t)-2:
                       /*
                        * We don't support other multibyte charsets.
                        * The second condition shouldn't happen
                        * and is here merely for additional safety.
                        */
                       if ((el->el_flags & CHARSET_IS_UTF8) == 0 ||
                           cbp >= MB_LEN_MAX) {
                               errno = EILSEQ;
                               *cp = L'\0';
                               return -1;
                       }
                        /* Incomplete sequence, read another byte. */
                        goto again;
```

Of course, the CHARSET_IS_UTF8 flag is not set in an eucJP environment.
If the CHARSET_IS_UTF8 flag check is removed, /bin/sh can read eucJP input.
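
The retry behaviour described here can be sketched outside libedit. This is
only an illustration under stated assumptions, not the code from read.c or
from the patch: decode_one() is a hypothetical helper, and the input comes
from a memory buffer instead of read(2). It hands mbrtowc(3) one more byte
each time the result is (size_t)-2, whatever the locale's charset is.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of libedit): decode one wide character
 * from src, giving mbrtowc(3) one additional byte per attempt, the way
 * a terminal reader receives its input.
 * Returns the number of bytes consumed, or -1 with errno set to EILSEQ.
 */
static int
decode_one(const char *src, size_t len, wchar_t *out)
{
	char cbuf[MB_LEN_MAX];
	size_t cbp = 0;
	mbstate_t st;

	assert(src != NULL && out != NULL);

	while (cbp < len && cbp < MB_LEN_MAX) {
		cbuf[cbp] = src[cbp];
		cbp++;
		/* Restart the conversion from the first buffered byte. */
		memset(&st, 0, sizeof(st));
		switch (mbrtowc(out, cbuf, cbp, &st)) {
		case (size_t)-1:
			/* Invalid sequence in the current locale. */
			errno = EILSEQ;
			*out = L'\0';
			return -1;
		case (size_t)-2:
			/* Incomplete sequence, take another byte. */
			continue;
		default:
			return (int)cbp;
		}
	}
	/* Ran out of input or buffer space before the character completed. */
	errno = EILSEQ;
	*out = L'\0';
	return -1;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that mbrtowc(3)
follows the user's charset (ja_JP.eucJP, UTF-8, and so on); without that call
the conversion uses the "C" locale.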

Removing the CHARSET_IS_UTF8 flag check exposes another problem:
command history miscalculates the length of eucJP characters,
because the ct_enc_width() function in chartype.c does not understand any
charset other than UTF-8.

After I rewrote ct_enc_width() to use wctomb(3), the command history problem
was fixed.
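
As a rough idea of what a wctomb(3)-based width calculation looks like, here
is a minimal sketch. It is not the code from the patch: enc_width_sketch() is
a hypothetical name, and treating an unrepresentable character as width 0 is
an assumption made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical stand-in for a locale-aware width function: return the
 * number of bytes wc occupies in the current locale's encoding, or 0
 * if wc cannot be represented in that encoding.
 */
static size_t
enc_width_sketch(wchar_t wc)
{
	char buf[MB_LEN_MAX];
	int n;

	/* Reset any shift state kept by wctomb(3). */
	(void)wctomb(NULL, L'\0');

	n = wctomb(buf, wc);
	assert(n <= MB_LEN_MAX);
	return n > 0 ? (size_t)n : 0;
}
```

Because wctomb(3) follows LC_CTYPE, the same function would report the eucJP
byte length in a ja_JP.eucJP locale and the UTF-8 byte length in a UTF-8
locale, with no charset-specific branches.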

With these two changes, we no longer need the CHARSET_IS_UTF8 flag.
The CHARSET_IS_UTF8 flag controls the NARROW_HISTORY flag, and the
NARROW_HISTORY flag is used only in the HIST_FUN definition.

```
#ifdef WIDECHAR
#define HIST_FUN(el, fn, arg) \
    (((el)->el_flags & NARROW_HISTORY) ? hist_convert(el, fn, arg) : \
        HIST_FUN_INTERNAL(el, fn, arg))
#else
#define HIST_FUN(el, fn, arg) HIST_FUN_INTERNAL(el, fn, arg)
#endif
```

In a WIDECHAR environment, hist_convert() should always be called,
because hist_convert() is a multibyte-aware function.

I opened a new differential on Phabricator that contains all of my fixes.

  https://reviews.freebsd.org/D17903

I believe my fix solves this problem and does not affect charsets other than
eucJP.
Please review my code.

Hirabayashi-san:
 Could you please try my patch from Phabricator and check whether this problem
is fixed?
 I don't think /bin/sh itself is at fault.

-- 
You are receiving this mail because:
You are the assignee for the bug.


