From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 275444] isprint() library function returns wrong when LC_CTYPE is ja_JP.SJIS (tcsh aborts by this)
Date: Thu, 30 Nov 2023 04:56:15 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275444

            Bug ID: 275444
           Summary: isprint() library function returns wrong when LC_CTYPE is
                    ja_JP.SJIS (tcsh aborts by this)
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: uratan@miomio.jp

Attachment #246681 text/plain
         mime type:

Created attachment 246681
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246681&action=edit
a test C code to confirm the problem

(I found the same problem reported in bug #264299 just now, but I am
reporting it anyway.)

When LC_CTYPE is ja_JP.SJIS, the isprint() library function returns 0
(not printable) for both '\' (0x5c) and '~' (0x7e).

Probably because of this, tcsh/csh aborts with a core dump on a simple
"printenv" command.

See the test results below (the test C code is attached to this report).

+-----------------------------------------------------
|% setenv LC_CTYPE ja_JP.eucJP
|% ./z-test-isprint
|isalnum('3') is 1
|isalnum('B') is 1
|isalnum('\') is 0
|isalnum('~') is 0
| isascii('3') is 1
| isascii('B') is 1
| isascii('\') is 1
| isascii('~') is 1
|isprint('3') is 1
|isprint('B') is 1
|isprint('\') is 1      <===
|isprint('~') is 1      <===
|% csh
|% printenv
|  -- omitted --
|EDITOR=vim
|LC_CTYPE=ja_JP.eucJP
|EXINIT=source ~/.exrc
|PAGER=jless
|% exit
|exit
|
|% setenv LC_CTYPE ja_JP.SJIS
|% ./z-test-isprint
|isalnum('3') is 1
|isalnum('B') is 1
|isalnum('\') is 0
|isalnum('~') is 0
| isascii('3') is 1
| isascii('B') is 1
| isascii('\') is 1
| isascii('~') is 1
|isprint('3') is 1
|isprint('B') is 1
|isprint('\') is 0      <===
|isprint('~') is 0      <===
|% csh
|% printenv
|  -- omitted --
|EDITOR=vim
|LC_CTYPE=ja_JP.SJIS
|Segmentation fault (core dumped)      <===
|% ls
|csh.core        z-test-isprint*
|typescript      z-test-isprint.c
+-----------------------------------------------------

- * - * -

From here on I describe the mechanism, referring to these files by short
names.

  /usr/share/locale/ja_JP.eucJP/LC_CTYPE                 --> F1.eucJP/LC_CTYPE
  /usr/share/locale/ja_JP.SJIS/LC_CTYPE                  --> F2.SJIS/LC_CTYPE
  /usr/src/share/ctypedef/ja_JP.eucJP.src                --> F3.ja_JP.eucJP.src
  /usr/src/tools/tools/locale/etc/final-maps/map.eucJP   --> F4.map.eucJP
  /usr/src/tools/tools/locale/etc/final-maps/map.SJIS    --> F5.map.SJIS
  /usr/src/tools/tools/locale/etc/final-maps/widths.txt  --> F6.widths.txt

In /usr/src/share/ctypedef/,
  F1.eucJP/LC_CTYPE is made from: F6.widths.txt, F4.map.eucJP, F3.ja_JP.eucJP.src
  and F2.SJIS/LC_CTYPE is made from: F6.widths.txt, F5.map.SJIS, F3.ja_JP.eucJP.src

F4.map.eucJP has these mappings for the characters in question:
+-------------------------------
|                \x5c
|                \x7e
+-------------------------------
and F5.map.SJIS has these:
+-------------------------------
|                \x5c
|                \x7e
+-------------------------------
That is, the two maps give \x5c and \x7e different char names
(the other chars below \x7f have the same names in both maps).

F3.ja_JP.eucJP.src is reused to build both LC_CTYPE files. It lists the
names that F4.map.eucJP gives to \x5c and \x7e in several sections
(including 'print'), but the names that F5.map.SJIS gives them do not
appear in any section. See the summary of F3.ja_JP.eucJP.src below.

+-------------------------------------------
    1 |# Warning: Do not edit. This file is automatically extracted from the
    2 |# tools in /usr/src/tools/tools/locale.  The data is obtained from the
    3 |# CLDR project, obtained from http://cldr.unicode.org/
    4 |# -----------------------------------------------------------------------------
    5 |comment_char *
    6 |escape_char /
    7 |LC_CTYPE
    8 |*************
    9 |
   10 |upper   ;/
   11 |        ;/
  240 |lower   ;/
  478 |alpha   ;/
12872 |space   ;/
12880 |cntrl   ;/
12914 |graph   ;/
12932 |        ;/
12947 |        ;/
12973 |        ;/      <===
13007 |        ;/      <===
26017 |print   ;/
26036 |        ;/
26051 |        ;/
26077 |        ;/      <===
26111 |        ;/      <===
39122 |punct   ;/
39140 |        ;/      <===
39207 |digit   ;/
39210 |        ;/
39218 |xdigit  ;/
39221 |        ;/
39229 |        ;/
39241 |blank   ;/
39245 |toupper (,);/
39246 |        (,);/
39474 |tolower (,);/
39475 |        (,);/
39703 |END LC_CTYPE
+-------------------------------------------

So '\' and '~' are not classified as printable in F2.SJIS/LC_CTYPE, and I
think this is why isprint() returns the wrong result with
LC_CTYPE=ja_JP.SJIS.

- * - * -

WORKAROUND (quick hack)
  Build a new F2.SJIS/LC_CTYPE after renaming the char names in
  F5.map.SJIS as below, so that the names for \x5c and \x7e match the
  ones F3.ja_JP.eucJP.src uses.
     -->
     -->

WORKAROUND (proper fix)
  Obtain a proper ja_JP.SJIS.src from somewhere and use it to build
  F2.SJIS/LC_CTYPE.

- * - * -

Confirming from the tcsh side

I also confirmed this problem from the tcsh side; see the code below.
It is the function xputchar() in /usr/src/contrib/tcsh/sh.print.c.
Starting at line 167, xputchar() outputs an unprintable char in "\nnn"
format.
+-----------------------------------------------------
  144 |void
  145 |xputchar(int c)
  146 |{
  147 |    int     atr;
  148 |
  149 |    atr = c & ATTRIBUTES & TRIM;
  150 |    c &= CHAR | QUOTE;
  151 |    if (!output_raw && (c & QUOTE) == 0) {
  152 |        if (iscntrl(c) && (ASC(c) < 0x80 || MB_CUR_MAX == 1)) {
  153 |            if (c != '\t' && c != '\n'
  154 |#ifdef COLORCAT
  155 |                && !(adrof(STRcolorcat) && c == CTL_ESC('\033'))
  156 |#endif
  157 |                && (xlate_cr || c != '\r'))
  158 |            {
  159 |                xputchar('^' | atr);
  160 |                if (c == CTL_ESC('\177'))
  161 |                    c = '?';
  162 |                else
  163 |                    /* Note: for IS_ASCII, this compiles to: c = c | 0100 */
  164 |                    c = CTL_ESC(ASC(c)|0100);
  165 |            }
  166 |        }
  167 |        else if (!isprint(c) && (ASC(c) < 0x80 || MB_CUR_MAX == 1)) {
  168 |            xputchar('\\' | atr);
  169 |            xputchar((((c >> 6) & 7) + '0') | atr);
  170 |            xputchar((((c >> 3) & 7) + '0') | atr);
  171 |            c = (c & 7) + '0';
  172 |        }
  173 |        (void) putraw(c | atr);
  174 |    }
  175 |    else {
  176 |        c &= TRIM;
  177 |        if (haderr ? (didfds ? is2atty : isdiagatty) :
  178 |            (didfds ? is1atty : isoutatty))
  179 |            SetAttributes(c | atr);
  180 |        (void) putpure(c);
  181 |    }
  182 |    if (lbuffed && (c & CHAR) == '\n')
  183 |        flush();
  184 |}
+-----------------------------------------------------

The trigger was the '~' char in my environment variable EXINIT, which is
(wrongly) detected as unprintable. xputchar() therefore first emits '\'
by calling itself recursively (line 168). In that child call the '\' char
is also wrongly detected as unprintable, so it calls itself once more,
and again, and again... The result is infinite recursion that continues
until the stack overflows.

The '\' char must never be unprintable for xputchar().

- * - * -

p.s.
I am satisfied enough with my quick hack for now...

It seems that the scope of these char names is closed within
F3.ja_JP.eucJP.src, F4.map.eucJP and F5.map.SJIS, and that the names
themselves are not included in F1.eucJP/LC_CTYPE and F2.SJIS/LC_CTYPE.
If that is true, my quick hack may be the complete solution...

I also think it is only natural that the isXXXX() functions in any
ja_JP.* environment return the same result for char codes 0x00 to 0x7f
regardless of the LC_CTYPE configuration...
(regardless of the appearance of the char/font)
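
Two points in this report can be checked with a small sketch: that
isprint() accepts every code from 0x20 to 0x7e under a given LC_CTYPE,
and that the escape branch of xputchar() recurses without bound once '\'
is classified as unprintable. The code below is NOT the attached
z-test-isprint.c (which is not reproduced here). All four function names
are hypothetical, the two stand-in classifiers only imitate the behaviour
described above, and the depth cap `limit` is an artificial addition so
that the runaway case terminates.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the attached test program.
 * Selects LC_CTYPE `name` and counts the codes in 0x20..0x7e that
 * isprint() rejects.  Returns -1 if the locale cannot be selected.
 */
int
count_unprintable_ascii(const char *name)
{
    int bad = 0;

    assert(name != NULL);
    if (setlocale(LC_CTYPE, name) == NULL)
        return (-1);
    for (int c = 0x20; c <= 0x7e; c++)
        if (!isprint(c))
            bad++;
    return (bad);
}

/* Stand-in classifier imitating the faulty table: '\' and '~' rejected. */
int
faulty_printable(int c)
{
    return (c >= 0x20 && c <= 0x7e && c != '\\' && c != '~');
}

/* Stand-in classifier with the behaviour expected for 0x20..0x7e. */
int
sane_printable(int c)
{
    return (c >= 0x20 && c <= 0x7e);
}

/*
 * Simplified model of lines 167-170 of xputchar(): for an unprintable
 * char, '\' and the first two octal digits are emitted through recursive
 * calls.  `limit` is an assumption of this sketch; the real function has
 * no such cap.  Returns the deepest recursion level reached.
 */
int
model_xputchar(int c, int (*printable)(int), int depth, int limit)
{
    int deepest = depth;

    if (depth >= limit)
        return (depth);
    if (!printable(c)) {
        const int out[3] = {
            '\\', ((c >> 6) & 7) + '0', ((c >> 3) & 7) + '0'
        };
        for (int i = 0; i < 3; i++) {
            int d = model_xputchar(out[i], printable, depth + 1, limit);
            if (d > deepest)
                deepest = d;
        }
    }
    return (deepest);
}
```

Calling count_unprintable_ascii() with "ja_JP.eucJP" and "ja_JP.SJIS" on
an affected system should show whether the two locales disagree in the
ASCII range. With faulty_printable, model_xputchar('~', ..., 0, 64) runs
into the cap, because every '\' triggers another '\'; with sane_printable
it never leaves depth 0.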