From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 225692] iswprint() wrong for some FULL WIDTH characters in UTF-8 locale
Date: Mon, 08 Oct 2018 12:34:13 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: bin
X-Bugzilla-Version: CURRENT
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: yuripv@yuripv.net
X-Bugzilla-Status: New

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225692

--- Comment #15 from Yuri Pankov ---
Properly fixing U+FF08 and other full width characters is more involved.

A bit of background: we have a common ctype definitions file for all UTF-8
locales, src/share/ctypedef/en_US.UTF-8.src (all other locales symlink to the
resulting /usr/share/locale/en_US.UTF-8/LC_CTYPE).
src/share/ctypedef/en_US.UTF-8.src is in turn assembled from
src/tools/tools/locale/etc/common.UTF-8.src and
src/tools/tools/locale/etc/manual-input.UTF-8.
src/tools/tools/locale/etc/common.UTF-8.src is built using
src/tools/tools/locale/tools/utf8-rollup.pl, which contains the character
ranges belonging to different locales and (among other things) checks the
LC_CTYPE section of the corresponding .UTF-8.src file for the character class
to be used. If a character is not defined there, it doesn't get into
common.UTF-8.src, which is exactly the case here, as the character is not
found in ja_JP.UTF-8.src and ko_KR.UTF-8.src.

TL;DR: all such characters not defined anywhere in the *.UTF-8.src files from
CLDR need to be added to manual-input.UTF-8.

CLDR v34 is really close to being released, but I strongly doubt the new
*.UTF-8.src files will define the full width characters we are missing. To
address the issue for the release, I propose looking up these characters in
the UTF-8.src we had in src/share/mklocale in pre-11.x times, and adding them
to manual-input.UTF-8.

The longer term solution would be asking the CLDR guys about a way to build a
complete ctype map while building POSIX locale data files and, if that's not
possible, going over the entire UTF-8.src contents and adding the missing bits
to manual-input.UTF-8.