From: "Andrey A. Chernov"
Date: Sat, 13 Oct 2007 16:28:22 +0000 (UTC)
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/include _ctype.h ctype.h wctype.h src/lib/libc/locale Symbol.map big5.c euc.c gb18030.c gb2312.c gbk.c isctype.c iswctype.c mskanji.c none.c setrunelocale.c utf8.c
Message-Id: <200710131628.l9DGSMd5032024@repoman.freebsd.org>
X-FreeBSD-CVS-Branch: HEAD

ache        2007-10-13 16:28:22 UTC

  FreeBSD src repository

  Modified files:
    include              _ctype.h ctype.h wctype.h
    lib/libc/locale      Symbol.map big5.c euc.c gb18030.c gb2312.c
                         gbk.c isctype.c iswctype.c mskanji.c none.c
                         setrunelocale.c utf8.c
  Log:
  The problem is: currently our single byte ctype(3) functions are broken
  for wide character locales in the argument range >= 0x80 - they may
  return false positives.

  Example 1: for the UTF-8 locale we currently have:
  iswspace(0xA0)==1 and isspace(0xA0)==1
  (because iswspace() and isspace() are the same code)
  but must have
  iswspace(0xA0)==1 and isspace(0xA0)==0
  (because there is no such single byte character, nor any other in the
  range 0x80..0xff, in the UTF-8 locale; it keeps only ASCII in the
  single byte range because our internal wchar_t representation for
  UTF-8 is UCS-4).

  Example 2: for all wide character locales isalpha(arg) may return
  false positives when arg > 0xFF (it must return 0), because
  iswalpha() and isalpha() are the same code.

  This change addresses the issue by separating the single byte and
  wide ctype code, and also fixes iswascii() (currently iswascii() is
  broken for arguments > 0xFF).

  This change is 100% binary compatible with old binaries.

  Reviewed by:    i18n@

  Revision  Changes    Path
  1.31      +34 -1     src/include/_ctype.h
  1.29      +21 -21    src/include/ctype.h
  1.14      +1 -1      src/include/wctype.h
  1.4       +5 -0      src/lib/libc/locale/Symbol.map
  1.18      +3 -0      src/lib/libc/locale/big5.c
  1.22      +3 -0      src/lib/libc/locale/euc.c
  1.8       +3 -0      src/lib/libc/locale/gb18030.c
  1.10      +3 -0      src/lib/libc/locale/gb2312.c
  1.14      +3 -0      src/lib/libc/locale/gbk.c
  1.11      +19 -19    src/lib/libc/locale/isctype.c
  1.8       +1 -1      src/lib/libc/locale/iswctype.c
  1.18      +3 -0      src/lib/libc/locale/mskanji.c
  1.15      +6 -1      src/lib/libc/locale/none.c
  1.47      +5 -0      src/lib/libc/locale/setrunelocale.c
  1.15      +3 -0      src/lib/libc/locale/utf8.c
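
The separation of single byte and wide classification described in the log can be illustrated with a minimal sketch. It is not the code from this commit: every name in it (`sketch_locale`, `sb_limit`, `sketch_sb_istype`, `sketch_wc_istype`, the `SKETCH_*` constants) is hypothetical, and the 256-entry table with only a few entries filled in is an assumption made to keep the example small. The idea shown is that the single byte lookup rejects any argument at or above a per-locale limit (128 for a UTF-8-like locale, where only ASCII is valid as a single byte), while the wide lookup keeps using the full table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none are taken from the commit. */
#define SKETCH_TABLE_SIZE 256
#define SKETCH_SPACE 0x1UL
#define SKETCH_ALPHA 0x2UL

struct sketch_locale {
	/*
	 * Arguments >= sb_limit are not valid single byte characters.
	 * Assumed values: 128 for a UTF-8-like locale, 256 for an
	 * 8-bit locale.
	 */
	int sb_limit;
	/* Class bits indexed by wide character value. */
	unsigned long types[SKETCH_TABLE_SIZE];
};

/* Fill in a few entries, enough to reproduce Example 1. */
static void
sketch_init(struct sketch_locale *l, int sb_limit)
{
	assert(l != NULL);
	for (size_t i = 0; i < SKETCH_TABLE_SIZE; i++)
		l->types[i] = 0;
	l->sb_limit = sb_limit;
	l->types[' '] = SKETCH_SPACE;
	l->types['\t'] = SKETCH_SPACE;
	l->types[0xA0] = SKETCH_SPACE;	/* the 0xA0 case from Example 1 */
	for (int c = 'a'; c <= 'z'; c++)
		l->types[c] = SKETCH_ALPHA;
}

/* Wide lookup: any value covered by the table may match. */
static int
sketch_wc_istype(const struct sketch_locale *l, long wc, unsigned long mask)
{
	assert(l != NULL);
	if (wc < 0 || wc >= SKETCH_TABLE_SIZE)
		return 0;	/* a real libc would consult further tables */
	return (l->types[wc] & mask) != 0;
}

/* Single byte lookup: reject everything at or above the locale's limit. */
static int
sketch_sb_istype(const struct sketch_locale *l, int c, unsigned long mask)
{
	assert(l != NULL);
	if (c < 0 || c >= l->sb_limit)
		return 0;
	return (l->types[c] & mask) != 0;
}
```

With `sketch_init(&loc, 128)`, the wide lookup for 0xA0 with `SKETCH_SPACE` reports a match while the single byte lookup does not, which is the outcome Example 1 asks for. Any single byte lookup with an argument above 0xFF returns 0, as in Example 2.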