Date:      Thu, 30 Nov 2023 04:56:15 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 275444] isprint() library function returns wrong when LC_CTYPE is ja_JP.SJIS (tcsh aborts by this)
Message-ID:  <bug-275444-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275444

            Bug ID: 275444
           Summary: isprint() library function returns wrong when LC_CTYPE
                    is ja_JP.SJIS (tcsh aborts by this)
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: uratan@miomio.jp
 Attachment #246681
         mime type: text/plain

Created attachment 246681
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246681&action=edit
Test C code to confirm the problem

(I have just found the same problem reported in bug #264299, but I am
reporting it here anyway.)

When LC_CTYPE is ja_JP.SJIS, the isprint() library function returns
0 (not printable) for both '\' (0x5c) and '~' (0x7e).
Probably because of this problem, tcsh/csh dumps core
on a simple "printenv" command.

See the test results below.
 (The test C code is attached to this report.)
  +-----------------------------------------------------
  |% setenv LC_CTYPE ja_JP.eucJP
  |% ./z-test-isprint
  |isalnum('3') is 1
  |isalnum('B') is 1
  |isalnum('\') is 0
  |isalnum('~') is 0
  | isascii('3') is 1
  | isascii('B') is 1
  | isascii('\') is 1
  | isascii('~') is 1
  |isprint('3') is 1
  |isprint('B') is 1
  |isprint('\') is 1           <===
  |isprint('~') is 1           <===
  |% csh
  |% printenv
 -- omitted --
  |EDITOR=vim
  |LC_CTYPE=ja_JP.eucJP
  |EXINIT=source ~/.exrc
  |PAGER=jless
  |% exit
  |exit
  |
  |% setenv LC_CTYPE ja_JP.SJIS
  |% ./z-test-isprint
  |isalnum('3') is 1
  |isalnum('B') is 1
  |isalnum('\') is 0
  |isalnum('~') is 0
  | isascii('3') is 1
  | isascii('B') is 1
  | isascii('\') is 1
  | isascii('~') is 1
  |isprint('3') is 1
  |isprint('B') is 1
  |isprint('\') is 0           <===
  |isprint('~') is 0           <===
  |% csh
  |% printenv
 -- omitted --
  |EDITOR=vim
  |LC_CTYPE=ja_JP.SJIS
  |Segmentation fault (core dumped)       <===
  |% ls
  |csh.core         z-test-isprint*
  |typescript       z-test-isprint.c
  +-----------------------------------------------------

 - * - * -

From here on, I describe the mechanism, referring to these files by short names.
  /usr/share/locale/ja_JP.eucJP/LC_CTYPE                --> F1.eucJP/LC_CTYPE
  /usr/share/locale/ja_JP.SJIS/LC_CTYPE                 --> F2.SJIS/LC_CTYPE
  /usr/src/share/ctypedef/ja_JP.eucJP.src               --> F3.ja_JP.eucJP.src
  /usr/src/tools/tools/locale/etc/final-maps/map.eucJP  --> F4.map.eucJP
  /usr/src/tools/tools/locale/etc/final-maps/map.SJIS   --> F5.map.SJIS
  /usr/src/tools/tools/locale/etc/final-maps/widths.txt --> F6.widths.txt

  In /usr/src/share/ctypedef/,
  F1.eucJP/LC_CTYPE is built from:
                    F6.widths.txt, F4.map.eucJP, F3.ja_JP.eucJP.src
  and F2.SJIS/LC_CTYPE is built from:
                    F6.widths.txt, F5.map.SJIS,  F3.ja_JP.eucJP.src

  F4.map.eucJP has these mappings for the characters in question:
    +-------------------------------
    |<REVERSE_SOLIDUS> \x5c
    |<TILDE> \x7e
    +-------------------------------
  and F5.map.SJIS has these:
    +-------------------------------
    |<YEN_SIGN> \x5c
    |<OVERLINE> \x7e
    +-------------------------------
   (the other characters below \x7f have the same names in both maps)

  F3.ja_JP.eucJP.src is reused to build both LC_CTYPE files.
  It lists <REVERSE_SOLIDUS> and <TILDE> in several classes (including 'print'),
  but it does not list <YEN_SIGN> or <OVERLINE> in any class.
  See the summary of F3.ja_JP.eucJP.src below.
       +-------------------------------------------
     1 |# Warning: Do not edit. This file is automatically extracted from the
     2 |# tools in /usr/src/tools/tools/locale. The data is obtained from the
     3 |# CLDR project, obtained from http://cldr.unicode.org/
     4 |# -----------------------------------------------------------------------------
     5 |comment_char *
     6 |escape_char /
     7 |LC_CTYPE
     8 |*************
     9 |
    10 |upper   <A>;/
    11 |        <B>;/
   240 |lower   <a>;/
   478 |alpha   <CARON>;/
 12872 |space   <tab>;/
 12880 |cntrl   <NULL>;/
 12914 |graph   <EXCLAMATION_MARK>;/
 12932 |        <three>;/
 12947 |        <B>;/
 12973 |        <REVERSE_SOLIDUS>;/                <===
 13007 |        <TILDE>;/                          <===
 26017 |print   <space>;/
 26036 |        <three>;/
 26051 |        <B>;/
 26077 |        <REVERSE_SOLIDUS>;/                <===
 26111 |        <TILDE>;/                          <===
 39122 |punct   <EXCLAMATION_MARK>;/
 39140 |        <REVERSE_SOLIDUS>;/                <===
 39207 |digit   <zero>;/
 39210 |        <three>;/
 39218 |xdigit  <zero>;/
 39221 |        <three>;/
 39229 |        <B>;/
 39241 |blank   <tab>;/
 39245 |toupper (<a>,<A>);/
 39246 |        (<b>,<B>);/
 39474 |tolower (<A>,<a>);/
 39475 |        (<B>,<b>);/
 39703 |END LC_CTYPE
       +-------------------------------------------

  So the characters at 0x5c and 0x7e are not classified as printable in
  F2.SJIS/LC_CTYPE, and I think this is why isprint() with LC_CTYPE=ja_JP.SJIS
  returns the wrong result for them.

 - * - * -

WORKAROUND (quick hack)

  Build a new F2.SJIS/LC_CTYPE after renaming these character names
  in F5.map.SJIS:
    <YEN_SIGN> --> <REVERSE_SOLIDUS>
    <OVERLINE> --> <TILDE>

WORKAROUND (proper fix)

  Obtain a proper ja_JP.SJIS.src file from somewhere and use it
  to build F2.SJIS/LC_CTYPE.

 - * - * -

Confirming from the tcsh side

  I confirmed this problem from the tcsh side; see the code below.
  It is the function xputchar() in /usr/src/contrib/tcsh/sh.print.c.
  From line 167 on, xputchar() outputs an un-printable character
  in "\nnn" (octal) format.
     +-----------------------------------------------------
 144 |void
 145 |xputchar(int c)
 146 |{
 147 |    int     atr;
 148 |
 149 |    atr = c & ATTRIBUTES & TRIM;
 150 |    c &= CHAR | QUOTE;
 151 |    if (!output_raw && (c & QUOTE) == 0) {
 152 |        if (iscntrl(c) && (ASC(c) < 0x80 || MB_CUR_MAX == 1)) {
 153 |            if (c != '\t' && c != '\n'
 154 |#ifdef COLORCAT
 155 |                && !(adrof(STRcolorcat) && c == CTL_ESC('\033'))
 156 |#endif
 157 |                && (xlate_cr || c != '\r'))
 158 |            {
 159 |                xputchar('^' | atr);
 160 |                if (c == CTL_ESC('\177'))
 161 |                    c = '?';
 162 |                else
 163 |                    /* Note: for IS_ASCII, this compiles to: c = c | 0100 */
 164 |                    c = CTL_ESC(ASC(c)|0100);
 165 |            }
 166 |        }
 167 |        else if (!isprint(c) && (ASC(c) < 0x80 || MB_CUR_MAX == 1)) {
 168 |            xputchar('\\' | atr);
 169 |            xputchar((((c >> 6) & 7) + '0') | atr);
 170 |            xputchar((((c >> 3) & 7) + '0') | atr);
 171 |            c = (c & 7) + '0';
 172 |        }
 173 |        (void) putraw(c | atr);
 174 |    }
 175 |    else {
 176 |        c &= TRIM;
 177 |        if (haderr ? (didfds ? is2atty : isdiagatty) :
 178 |            (didfds ? is1atty : isoutatty))
 179 |            SetAttributes(c | atr);
 180 |        (void) putpure(c);
 181 |    }
 182 |    if (lbuffed && (c & CHAR) == '\n')
 183 |        flush();
 184 |}
     +-----------------------------------------------------

  The trigger was the '~' character in my environment variable EXINIT,
  which is (wrongly) detected as un-printable.
  So xputchar() first outputs a '\' by calling itself recursively.
  In the child call, the '\' is also (wrongly) detected as un-printable,
  so it calls itself again, and again, and again...

  As a result, the recursion never terminates
  and continues until the stack overflows.
  The '\' character should never be un-printable for xputchar().

 - * - * -

p.s.
  I am satisfied with my quick hack for now...

  It seems that character names like <TILDE> are used only
  within F3.ja_JP.eucJP.src, F4.map.eucJP and F5.map.SJIS,
  and that they are not included in F1.eucJP/LC_CTYPE
  or F2.SJIS/LC_CTYPE,
  so my quick hack may be a complete solution if that is true...

  Also, I think it is very natural for the isXXXX() functions
  in any ja_JP.* locale to return the same result for character codes
  from 0x00 to 0x7f regardless of the LC_CTYPE setting...
    (regardless of the appearance of the character/font)

-- 
You are receiving this mail because:
You are the assignee for the bug.


