From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 09:06:50 2007
Delivered-To: i18n@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 4E28716A417;
    Sun, 16 Sep 2007 09:06:50 +0000 (UTC)
    (envelope-from ache@nagual.pp.ru)
Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69])
    by mx1.freebsd.org (Postfix) with ESMTP id 90F7313C45E;
    Sun, 16 Sep 2007 09:06:49 +0000 (UTC)
    (envelope-from ache@nagual.pp.ru)
Received: from nagual.pp.ru (ache@localhost [127.0.0.1])
    by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8G8sXqs008993;
    Sun, 16 Sep 2007 12:54:33 +0400 (MSD)
    (envelope-from ache@nagual.pp.ru)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default;
    t=1189932873; bh=4WOVofvy4VQD1wBrmU4WTi71arSVvHcJpnPT9UV
    zw+Q=; l=1230; h=Date:From:To:Cc:Subject:Message-ID:
    Mail-Followup-To:References:MIME-Version:Content-Type:
    Content-Disposition:In-Reply-To:User-Agent; b=skZ2M0d4Lc6uYrXZyndf
    ME67ViQhlgWDM9NjMOnn60+/v2tssW75lYftyIkMOjOW7VCjlfhN4MSw37dD2LEObic
    4gAVLNKLY6qE6etHEM8wgH6IyFkCbwXUWgUvKTwJ3jo7yQ83OP37UsI6EHqJUZb3/MY
    Vx1+VKZ2vSnX1IM00=
Received: (from ache@localhost)
    by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8G8sX8b008992;
    Sun, 16 Sep 2007 12:54:33 +0400 (MSD)
    (envelope-from ache)
Date: Sun, 16 Sep 2007 12:54:33 +0400
From: Andrey Chernov
To: Petr Hroudny
Message-ID: <20070916085432.GA8884@nagual.pp.ru>
Mail-Followup-To: Andrey Chernov, Petr Hroudny,
    freebsd-gnats-submit@FreeBSD.ORG, jkoshy@freebsd.org, perky@FreeBSD.org,
    i18n@freebsd.org
References: <200709150908.l8F981jj075109@www.freebsd.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200709150908.l8F981jj075109@www.freebsd.org>
User-Agent: Mutt/1.5.16 (2007-06-09)
Cc: jkoshy@FreeBSD.ORG, freebsd-gnats-submit@FreeBSD.ORG, perky@FreeBSD.ORG,
    i18n@FreeBSD.ORG
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD Internationalization Effort
X-List-Received-Date: Sun, 16 Sep 2007 09:06:50 -0000

On Sat, Sep 15, 2007 at 09:08:01AM +0000, Petr Hroudny wrote:
>
> >Number:         116363
> >Category:       gnu
> >Synopsis:       isspace broken for UTF-8 locales
> >Confidential:   no
> >Severity:       non-critical
> >Priority:       medium
> >Responsible:    freebsd-bugs
> >State:          open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
> >Closed-Date:
> >Last-Modified:
> >Originator:     Petr Hroudny
> >Release:        6-stable, 7-current
> >Organization:
> >Environment:
> >Description:
> In UTF-8 locales, isspace(0xA0) returns 1 which is wrong.
>
> In UTF-8, 0xA0 could only be the second or third byte of multibyte character, but never a space.
>
> As a consequence, operations like str.upper() and/or str.split() are broken, when
> UTF-8 character with 0xA0 byte is encountered.

It seems that our UTF-8.src is completely wrong: it holds plain Unicode
code points rather than UTF-8 byte sequences, whose multibyte values
should start only with C2-DF, E0-EF or F0-F4 (as stated in
http://en.wikipedia.org/wiki/UTF-8, for example).

Can anybody write a replacement for it?

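The byte-level claim above can be checked with a short Python sketch. It says nothing about how FreeBSD's libc or mklocale work internally; the helper names `is_valid_lead`, `is_continuation` and `utf8_token` are hypothetical, and the lead-byte ranges are simply the ones quoted in the message (C2-DF, E0-EF, F0-F4).

```python
# Sketch: why byte 0xA0 cannot stand alone as a character in UTF-8.
# Helper names are hypothetical; only the Python standard library is used.

def is_valid_lead(byte: int) -> bool:
    """True if `byte` may start a multibyte UTF-8 sequence (C2-DF, E0-EF, F0-F4)."""
    return 0xC2 <= byte <= 0xDF or 0xE0 <= byte <= 0xEF or 0xF0 <= byte <= 0xF4

def is_continuation(byte: int) -> bool:
    """True if `byte` has the bit pattern 10xxxxxx, so it can only follow a lead byte."""
    return byte & 0xC0 == 0x80

def utf8_token(code_point: int) -> str:
    """Render a code point as the hex of its UTF-8 bytes, e.g. U+00A0 -> '0xc2a0'."""
    return "0x" + chr(code_point).encode("utf-8").hex()

print(is_continuation(0xA0))  # True: 0xA0 is a trailing byte
print(is_valid_lead(0xA0))    # False: it can never start a sequence
print(utf8_token(0x00A0))     # 0xc2a0 (NO-BREAK SPACE)
print(utf8_token(0x0100))     # 0xc480 (first code point of Latin Extended-A)
```

The two hex tokens printed at the end have the same form as the values used in the UTF-8.src attachment posted later in this thread.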
-- 
http://ache.pp.ru/

From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 10:34:03 2007
Delivered-To: i18n@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 872D116A417;
    Sun, 16 Sep 2007 10:34:03 +0000 (UTC)
    (envelope-from ache@nagual.pp.ru)
Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69])
    by mx1.freebsd.org (Postfix) with ESMTP id 29A8A13C457;
    Sun, 16 Sep 2007 10:34:00 +0000 (UTC)
    (envelope-from ache@nagual.pp.ru)
Received: from nagual.pp.ru (ache@localhost [127.0.0.1])
    by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8GAXxpw001877;
    Sun, 16 Sep 2007 14:33:59 +0400 (MSD)
    (envelope-from ache@nagual.pp.ru)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default;
    t=1189938839; bh=8nH8cbBEJ0FpGJ4khf2Fk+QlfLo7qLLxW9ANw9L
    OIrc=; l=88311; h=Date:From:To:Cc:Subject:Message-ID:
    Mail-Followup-To:References:MIME-Version:Content-Type:
    Content-Disposition:In-Reply-To:User-Agent; b=AzY/oxCFsmxUIq7E5DB8
    uuYeTjq9X4JBOqjVNY0kFAR5wkmo3fvzuMGFQaORz57nfuwI83BypB4QA8Ecq4iL+n3
    vg1m+K4GGRn65Cw3VEaWrZKsNrhAmi0PaYS40xrteHyZ0liZ+YMP9/UweZnvvCNLHFu
    1rGoJsMvAvlvcE+4k=
Received: (from ache@localhost)
    by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8GAXwgR001876;
    Sun, 16 Sep 2007 14:33:58 +0400 (MSD)
    (envelope-from ache)
Date: Sun, 16 Sep 2007 14:33:58 +0400
From: Andrey Chernov
To: freebsd-bugs@FreeBSD.ORG
Message-ID: <20070916103357.GA1691@nagual.pp.ru>
Mail-Followup-To: Andrey Chernov, freebsd-bugs@FreeBSD.ORG,
    jkoshy@freebsd.org, perky@FreeBSD.org, i18n@freebsd.org,
    petr.hroudny@gmail.com
References: <200709160910.l8G9A6ts050905@freefall.freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="oyUTqETQ0mS9luUI"
Content-Disposition: inline
In-Reply-To: <200709160910.l8G9A6ts050905@freefall.freebsd.org>
User-Agent: Mutt/1.5.16 (2007-06-09)
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: petr.hroudny@gmail.com, jkoshy@FreeBSD.ORG, perky@FreeBSD.ORG,
    i18n@FreeBSD.ORG
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD Internationalization Effort
X-List-Received-Date: Sun, 16 Sep 2007 10:34:03 -0000

--oyUTqETQ0mS9luUI
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Sep 16, 2007 at 09:10:06AM +0000, Andrey Chernov wrote:
> Can anybody write replacement for it?

Here is a replacement attached, autoconverted to UTF-8 with a Perl
script; please check!

-- 
http://ache.pp.ru/

--oyUTqETQ0mS9luUI--

From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 12:35:11 2007
Delivered-To: i18n@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 554B416A47E;
    Sun, 16 Sep 2007 12:35:11 +0000 (UTC)
    (envelope-from ache@nagual.pp.ru)
Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69])
    by mx1.freebsd.org (Postfix) with ESMTP id 07E1D13C45B;
    Sun, 16 Sep 2007 12:35:09 +0000 (UTC)
    (envelope-from ache@nagual.pp.ru)
Received: from nagual.pp.ru (ache@localhost [127.0.0.1])
    by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8GCZ81d004783;
    Sun, 16 Sep 2007 16:35:08 +0400 (MSD)
    (envelope-from ache@nagual.pp.ru)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default;
    t=1189946108; bh=fXFnKAodv/cf8PKsD7OgR/HAQQIMvenCe9jycad
    3Tv8=; l=83537; h=Date:From:To:Subject:Message-ID:Mail-Followup-To:
    References:MIME-Version:Content-Type:Content-Disposition:
    In-Reply-To:User-Agent; b=QHOvbNJV2psSB2pdpnS1VPdDHiRnnbf0mJp0vSZt
    rKOpWyV6wXRECZH02iaOR4OUkRcMOxt3ciD0FkvojCYhrLBdOOL9hrhOe08xuUNoD8h
    jTbvN0EwV8Ug0tL3tEgaXBFXh2vMXYWekGH/ZSdyASo0F+KeMSAiYvOXJL5VRqoc=
Received: (from ache@localhost)
    by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8GCZ8Fm004782;
    Sun, 16 Sep 2007 16:35:08 +0400 (MSD)
    (envelope-from ache)
Date: Sun, 16 Sep 2007 16:35:08 +0400
From: Andrey Chernov
To:
    freebsd-bugs@FreeBSD.ORG, jkoshy@FreeBSD.ORG, perky@FreeBSD.ORG,
    i18n@FreeBSD.ORG, petr.hroudny@gmail.com
Message-ID: <20070916123508.GA4724@nagual.pp.ru>
Mail-Followup-To: Andrey Chernov, freebsd-bugs@FreeBSD.ORG,
    jkoshy@freebsd.org, perky@FreeBSD.org, i18n@freebsd.org,
    petr.hroudny@gmail.com
References: <200709160910.l8G9A6ts050905@freefall.freebsd.org>
    <20070916103357.GA1691@nagual.pp.ru>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="ikeVEW9yuYc//A+q"
Content-Disposition: inline
In-Reply-To: <20070916103357.GA1691@nagual.pp.ru>
User-Agent: Mutt/1.5.16 (2007-06-09)
Subject: Re: gnu/116363: isspace broken for UTF-8 locales
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: FreeBSD Internationalization Effort
X-List-Received-Date: Sun, 16 Sep 2007 12:35:11 -0000

--ikeVEW9yuYc//A+q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Sep 16, 2007 at 02:33:58PM +0400, Andrey Chernov wrote:
> On Sun, Sep 16, 2007 at 09:10:06AM +0000, Andrey Chernov wrote:
> > Can anybody write replacement for it?
>
> Here is replacement attached, autoconverted to UTF-8 with perl script,
> please check!

Sorry, the high range was not converted properly; here is a revised
version attached.

-- 
http://ache.pp.ru/

--ikeVEW9yuYc//A+q
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="UTF-8.src"

/*
 * Unicode 3.2 ctypes table
 *
 * Generated by Hye-Shik Chang
 *
 * $FreeBSD: src/share/mklocale/UTF-8.src,v 1.2 2006/07/28 06:10:33 jkoshy Exp $
 */

/*
 * UCD(Unicode Character Database) Terms of Use
 *
 * Disclaimer
 *
 * The Unicode Character Database is provided as is by Unicode, Inc. No claims
 * are made as to fitness for any particular purpose. No warranties of any kind
 * are expressed or implied. The recipient agrees to determine applicability of
 * information provided. If this file has been purchased on magnetic or optical
 * media from Unicode, Inc., the sole remedy for any claim will be exchange of
 * defective media within 90 days of receipt.
 *
 * This disclaimer is applicable for all other data files accompanying the
 * Unicode Character Database, some of which have been compiled by the Unicode
 * Consortium, and some of which have been supplied by other sources.
 *
 * Limitations on Rights to Redistribute This Data
 *
 * Recipient is granted the right to make copies in any form for internal
 * distribution and to freely use the information supplied in the creation of
 * products supporting the UnicodeTM Standard. The files in the Unicode
 * Character Database can be redistributed to third parties or other
 * organizations (whether for profit or not) as long as this notice and the
 * disclaimer notice are retained. Information can be extracted from these
 * files and used in documentation or programs, as long as there is an
 * accompanying notice indicating the source.
 */

ENCODING    "UTF-8"
VARIABLE    Unicode 3.2 Character Types

/*
 * U+0000 - U+007F : Basic Latin
 */

ALPHA     'A' - 'Z' 'a' - 'z'
CONTROL   0x00 - 0x1f 0x7f
DIGIT     '0' - '9'
GRAPH     0x21 - 0x7e
LOWER     'a' - 'z'
PUNCT     0x21 - 0x2f 0x3a - 0x40 0x5b - 0x60 0x7b - 0x7e
SPACE     0x09 - 0x0d 0x20
UPPER     'A' - 'Z'
XDIGIT    '0' - '9' 'A' - 'F' 'a' - 'f'
BLANK     0x09 0x0b 0x20
PRINT     0x20 - 0x7e
SWIDTH1   0x20 - 0x7e
MAPUPPER  < 'a' - 'z' : 'A' >
MAPLOWER  < 'A' - 'Z' : 'a' >
TODIGIT   < '0' - '9' : 0x00 >
TODIGIT   < 'A' - 'F' : 10 > < 'a' - 'f' : 10 >

/*
 * U+0080 - U+00FF : Latin-1 Supplement
 */

ALPHA     0xc2aa 0xc2b5 0xc2ba 0xc380 - 0xc396 0xc398 - 0xc3b6
ALPHA     0xc3b8 - 0xc3bf
CONTROL   0xc280 - 0xc29f
GRAPH     0xc2a1 - 0xc3bf
LOWER     0xc2aa 0xc2b5 0xc2ba 0xc39f - 0xc3b6 0xc3b8 - 0xc3bf
PUNCT     0xc2a1 - 0xc2a9 0xc2ab - 0xc2b1 0xc2b4 0xc2b6 - 0xc2b8
PUNCT     0xc2bb 0xc2bf 0xc397 0xc3b7
SPACE     0xc285 0xc2a0
UPPER     0xc380 - 0xc396 0xc398 - 0xc39e
BLANK     0xc2a0
PRINT     0xc2a0 - 0xc3bf
SPECIAL   0xc2b2 0xc2b3 0xc2b9 0xc2bc - 0xc2be
SWIDTH1   0xc2a0 - 0xc3bf
MAPUPPER  < 0xc2b5 0xce9c >
MAPUPPER  < 0xc3a0 - 0xc3b6 : 0xc380 >
MAPUPPER  < 0xc3b8 - 0xc3be : 0xc398 >
MAPUPPER  < 0xc3bf 0xc5b8 >
MAPLOWER  < 0xc380 - 0xc396 : 0xc3a0 >
MAPLOWER  < 0xc398 - 0xc39e : 0xc3b8 >

/*
 * U+0100 - U+017F : Latin Extended-A
 */

ALPHA     0xc480 - 0xc5bf
GRAPH     0xc480 - 0xc5bf
LOWER     0xc481 0xc483 0xc485 0xc487 0xc489 0xc48b 0xc48d
LOWER     0xc48f 0xc491 0xc493 0xc495 0xc497 0xc499 0xc49b
LOWER     0xc49d 0xc49f 0xc4a1 0xc4a3 0xc4a5 0xc4a7 0xc4a9
LOWER     0xc4ab 0xc4ad 0xc4af 0xc4b1 0xc4b3 0xc4b5 0xc4b7 0xc4b8
LOWER     0xc4ba 0xc4bc 0xc4be 0xc580 0xc582 0xc584 0xc586
LOWER     0xc588 0xc589 0xc58b 0xc58d 0xc58f 0xc591 0xc593
LOWER     0xc595 0xc597 0xc599 0xc59b 0xc59d 0xc59f 0xc5a1
LOWER     0xc5a3 0xc5a5 0xc5a7 0xc5a9 0xc5ab 0xc5ad 0xc5af
LOWER     0xc5b1 0xc5b3 0xc5b5 0xc5b7 0xc5ba 0xc5bc 0xc5be 0xc5bf
UPPER     0xc480 0xc482 0xc484 0xc486 0xc488 0xc48a 0xc48c
UPPER     0xc48e 0xc490 0xc492 0xc494 0xc496 0xc498 0xc49a
UPPER     0xc49c 0xc49e 0xc4a0 0xc4a2 0xc4a4 0xc4a6 0xc4a8
UPPER     0xc4aa 0xc4ac 0xc4ae
0xc4b0 0xc4b2 0xc4b4 0xc4b6 UPPER 0xc4b9 0xc4bb 0xc4bd 0xc4bf 0xc581 0xc583 0xc585 UPPER 0xc587 0xc58a 0xc58c 0xc58e 0xc590 0xc592 0xc594 UPPER 0xc596 0xc598 0xc59a 0xc59c 0xc59e 0xc5a0 0xc5a2 UPPER 0xc5a4 0xc5a6 0xc5a8 0xc5aa 0xc5ac 0xc5ae 0xc5b0 UPPER 0xc5b2 0xc5b4 0xc5b6 0xc5b8 0xc5b9 0xc5bb 0xc5bd PRINT 0xc480 - 0xc5bf SWIDTH1 0xc480 - 0xc5bf MAPUPPER < 0xc481 0xc480 > MAPUPPER < 0xc483 0xc482 > MAPUPPER < 0xc485 0xc484 > MAPUPPER < 0xc487 0xc486 > MAPUPPER < 0xc489 0xc488 > MAPUPPER < 0xc48b 0xc48a > MAPUPPER < 0xc48d 0xc48c > MAPUPPER < 0xc48f 0xc48e > MAPUPPER < 0xc491 0xc490 > MAPUPPER < 0xc493 0xc492 > MAPUPPER < 0xc495 0xc494 > MAPUPPER < 0xc497 0xc496 > MAPUPPER < 0xc499 0xc498 > MAPUPPER < 0xc49b 0xc49a > MAPUPPER < 0xc49d 0xc49c > MAPUPPER < 0xc49f 0xc49e > MAPUPPER < 0xc4a1 0xc4a0 > MAPUPPER < 0xc4a3 0xc4a2 > MAPUPPER < 0xc4a5 0xc4a4 > MAPUPPER < 0xc4a7 0xc4a6 > MAPUPPER < 0xc4a9 0xc4a8 > MAPUPPER < 0xc4ab 0xc4aa > MAPUPPER < 0xc4ad 0xc4ac > MAPUPPER < 0xc4af 0xc4ae > MAPUPPER < 0xc4b1 'I' > MAPUPPER < 0xc4b3 0xc4b2 > MAPUPPER < 0xc4b5 0xc4b4 > MAPUPPER < 0xc4b7 0xc4b6 > MAPUPPER < 0xc4ba 0xc4b9 > MAPUPPER < 0xc4bc 0xc4bb > MAPUPPER < 0xc4be 0xc4bd > MAPUPPER < 0xc580 0xc4bf > MAPUPPER < 0xc582 0xc581 > MAPUPPER < 0xc584 0xc583 > MAPUPPER < 0xc586 0xc585 > MAPUPPER < 0xc588 0xc587 > MAPUPPER < 0xc58b 0xc58a > MAPUPPER < 0xc58d 0xc58c > MAPUPPER < 0xc58f 0xc58e > MAPUPPER < 0xc591 0xc590 > MAPUPPER < 0xc593 0xc592 > MAPUPPER < 0xc595 0xc594 > MAPUPPER < 0xc597 0xc596 > MAPUPPER < 0xc599 0xc598 > MAPUPPER < 0xc59b 0xc59a > MAPUPPER < 0xc59d 0xc59c > MAPUPPER < 0xc59f 0xc59e > MAPUPPER < 0xc5a1 0xc5a0 > MAPUPPER < 0xc5a3 0xc5a2 > MAPUPPER < 0xc5a5 0xc5a4 > MAPUPPER < 0xc5a7 0xc5a6 > MAPUPPER < 0xc5a9 0xc5a8 > MAPUPPER < 0xc5ab 0xc5aa > MAPUPPER < 0xc5ad 0xc5ac > MAPUPPER < 0xc5af 0xc5ae > MAPUPPER < 0xc5b1 0xc5b0 > MAPUPPER < 0xc5b3 0xc5b2 > MAPUPPER < 0xc5b5 0xc5b4 > MAPUPPER < 0xc5b7 0xc5b6 > MAPUPPER < 0xc5ba 0xc5b9 > MAPUPPER < 0xc5bc 0xc5bb > 
MAPUPPER < 0xc5be 0xc5bd > MAPUPPER < 0xc5bf 'S' > MAPLOWER < 0xc480 0xc481 > MAPLOWER < 0xc482 0xc483 > MAPLOWER < 0xc484 0xc485 > MAPLOWER < 0xc486 0xc487 > MAPLOWER < 0xc488 0xc489 > MAPLOWER < 0xc48a 0xc48b > MAPLOWER < 0xc48c 0xc48d > MAPLOWER < 0xc48e 0xc48f > MAPLOWER < 0xc490 0xc491 > MAPLOWER < 0xc492 0xc493 > MAPLOWER < 0xc494 0xc495 > MAPLOWER < 0xc496 0xc497 > MAPLOWER < 0xc498 0xc499 > MAPLOWER < 0xc49a 0xc49b > MAPLOWER < 0xc49c 0xc49d > MAPLOWER < 0xc49e 0xc49f > MAPLOWER < 0xc4a0 0xc4a1 > MAPLOWER < 0xc4a2 0xc4a3 > MAPLOWER < 0xc4a4 0xc4a5 > MAPLOWER < 0xc4a6 0xc4a7 > MAPLOWER < 0xc4a8 0xc4a9 > MAPLOWER < 0xc4aa 0xc4ab > MAPLOWER < 0xc4ac 0xc4ad > MAPLOWER < 0xc4ae 0xc4af > MAPLOWER < 0xc4b0 'i' > MAPLOWER < 0xc4b2 0xc4b3 > MAPLOWER < 0xc4b4 0xc4b5 > MAPLOWER < 0xc4b6 0xc4b7 > MAPLOWER < 0xc4b9 0xc4ba > MAPLOWER < 0xc4bb 0xc4bc > MAPLOWER < 0xc4bd 0xc4be > MAPLOWER < 0xc4bf 0xc580 > MAPLOWER < 0xc581 0xc582 > MAPLOWER < 0xc583 0xc584 > MAPLOWER < 0xc585 0xc586 > MAPLOWER < 0xc587 0xc588 > MAPLOWER < 0xc58a 0xc58b > MAPLOWER < 0xc58c 0xc58d > MAPLOWER < 0xc58e 0xc58f > MAPLOWER < 0xc590 0xc591 > MAPLOWER < 0xc592 0xc593 > MAPLOWER < 0xc594 0xc595 > MAPLOWER < 0xc596 0xc597 > MAPLOWER < 0xc598 0xc599 > MAPLOWER < 0xc59a 0xc59b > MAPLOWER < 0xc59c 0xc59d > MAPLOWER < 0xc59e 0xc59f > MAPLOWER < 0xc5a0 0xc5a1 > MAPLOWER < 0xc5a2 0xc5a3 > MAPLOWER < 0xc5a4 0xc5a5 > MAPLOWER < 0xc5a6 0xc5a7 > MAPLOWER < 0xc5a8 0xc5a9 > MAPLOWER < 0xc5aa 0xc5ab > MAPLOWER < 0xc5ac 0xc5ad > MAPLOWER < 0xc5ae 0xc5af > MAPLOWER < 0xc5b0 0xc5b1 > MAPLOWER < 0xc5b2 0xc5b3 > MAPLOWER < 0xc5b4 0xc5b5 > MAPLOWER < 0xc5b6 0xc5b7 > MAPLOWER < 0xc5b8 0xc3bf > MAPLOWER < 0xc5b9 0xc5ba > MAPLOWER < 0xc5bb 0xc5bc > MAPLOWER < 0xc5bd 0xc5be > /* * U+0180 - U+024F : Latin Extended-B */ ALPHA 0xc680 - 0xc6ba 0xc6bc - 0xc6bf 0xc784 - 0xc8a0 0xc8a2 - 0xc8b3 GRAPH 0xc680 - 0xc8a0 0xc8a2 - 0xc8b3 LOWER 0xc680 0xc683 0xc685 0xc688 0xc68c 0xc68d 0xc692 LOWER 0xc695 0xc699 - 0xc69b 0xc69e 0xc6a1 
0xc6a3 0xc6a5 LOWER 0xc6a8 0xc6aa 0xc6ab 0xc6ad 0xc6b0 0xc6b4 0xc6b6 LOWER 0xc6b9 0xc6ba 0xc6bd - 0xc6bf 0xc786 0xc789 0xc78c LOWER 0xc78e 0xc790 0xc792 0xc794 0xc796 0xc798 0xc79a LOWER 0xc79c 0xc79d 0xc79f 0xc7a1 0xc7a3 0xc7a5 0xc7a7 LOWER 0xc7a9 0xc7ab 0xc7ad 0xc7af 0xc7b0 0xc7b3 0xc7b5 LOWER 0xc7b9 0xc7bb 0xc7bd 0xc7bf 0xc881 0xc883 0xc885 LOWER 0xc887 0xc889 0xc88b 0xc88d 0xc88f 0xc891 0xc893 LOWER 0xc895 0xc897 0xc899 0xc89b 0xc89d 0xc89f 0xc8a3 LOWER 0xc8a5 0xc8a7 0xc8a9 0xc8ab 0xc8ad 0xc8af 0xc8b1 LOWER 0xc8b3 UPPER 0xc681 0xc682 0xc684 0xc686 0xc687 0xc689 - 0xc68b UPPER 0xc68e - 0xc691 0xc693 0xc694 0xc696 - 0xc698 0xc69c 0xc69d UPPER 0xc69f 0xc6a0 0xc6a2 0xc6a4 0xc6a6 0xc6a7 0xc6a9 UPPER 0xc6ac 0xc6ae 0xc6af 0xc6b1 - 0xc6b3 0xc6b5 0xc6b7 0xc6b8 UPPER 0xc6bc 0xc784 0xc787 0xc78a 0xc78d 0xc78f 0xc791 UPPER 0xc793 0xc795 0xc797 0xc799 0xc79b 0xc79e 0xc7a0 UPPER 0xc7a2 0xc7a4 0xc7a6 0xc7a8 0xc7aa 0xc7ac 0xc7ae UPPER 0xc7b1 0xc7b4 0xc7b6 - 0xc7b8 0xc7ba 0xc7bc 0xc7be UPPER 0xc880 0xc882 0xc884 0xc886 0xc888 0xc88a 0xc88c UPPER 0xc88e 0xc890 0xc892 0xc894 0xc896 0xc898 0xc89a UPPER 0xc89c 0xc89e 0xc8a0 0xc8a2 0xc8a4 0xc8a6 0xc8a8 UPPER 0xc8aa 0xc8ac 0xc8ae 0xc8b0 0xc8b2 PRINT 0xc680 - 0xc8a0 0xc8a2 - 0xc8b3 SWIDTH1 0xc680 - 0xc8a0 0xc8a2 - 0xc8b3 MAPUPPER < 0xc683 0xc682 > MAPUPPER < 0xc685 0xc684 > MAPUPPER < 0xc688 0xc687 > MAPUPPER < 0xc68c 0xc68b > MAPUPPER < 0xc692 0xc691 > MAPUPPER < 0xc695 0xc7b6 > MAPUPPER < 0xc699 0xc698 > MAPUPPER < 0xc69e 0xc8a0 > MAPUPPER < 0xc6a1 0xc6a0 > MAPUPPER < 0xc6a3 0xc6a2 > MAPUPPER < 0xc6a5 0xc6a4 > MAPUPPER < 0xc6a8 0xc6a7 > MAPUPPER < 0xc6ad 0xc6ac > MAPUPPER < 0xc6b0 0xc6af > MAPUPPER < 0xc6b4 0xc6b3 > MAPUPPER < 0xc6b6 0xc6b5 > MAPUPPER < 0xc6b9 0xc6b8 > MAPUPPER < 0xc6bd 0xc6bc > MAPUPPER < 0xc6bf 0xc7b7 > MAPUPPER < 0xc785 0xc784 > MAPUPPER < 0xc786 0xc784 > MAPUPPER < 0xc788 0xc787 > MAPUPPER < 0xc789 0xc787 > MAPUPPER < 0xc78b 0xc78a > MAPUPPER < 0xc78c 0xc78a > MAPUPPER < 0xc78e 0xc78d > MAPUPPER < 0xc790 0xc78f 
> MAPUPPER < 0xc792 0xc791 > MAPUPPER < 0xc794 0xc793 > MAPUPPER < 0xc796 0xc795 > MAPUPPER < 0xc798 0xc797 > MAPUPPER < 0xc79a 0xc799 > MAPUPPER < 0xc79c 0xc79b > MAPUPPER < 0xc79d 0xc68e > MAPUPPER < 0xc79f 0xc79e > MAPUPPER < 0xc7a1 0xc7a0 > MAPUPPER < 0xc7a3 0xc7a2 > MAPUPPER < 0xc7a5 0xc7a4 > MAPUPPER < 0xc7a7 0xc7a6 > MAPUPPER < 0xc7a9 0xc7a8 > MAPUPPER < 0xc7ab 0xc7aa > MAPUPPER < 0xc7ad 0xc7ac > MAPUPPER < 0xc7af 0xc7ae > MAPUPPER < 0xc7b2 0xc7b1 > MAPUPPER < 0xc7b3 0xc7b1 > MAPUPPER < 0xc7b5 0xc7b4 > MAPUPPER < 0xc7b9 0xc7b8 > MAPUPPER < 0xc7bb 0xc7ba > MAPUPPER < 0xc7bd 0xc7bc > MAPUPPER < 0xc7bf 0xc7be > MAPUPPER < 0xc881 0xc880 > MAPUPPER < 0xc883 0xc882 > MAPUPPER < 0xc885 0xc884 > MAPUPPER < 0xc887 0xc886 > MAPUPPER < 0xc889 0xc888 > MAPUPPER < 0xc88b 0xc88a > MAPUPPER < 0xc88d 0xc88c > MAPUPPER < 0xc88f 0xc88e > MAPUPPER < 0xc891 0xc890 > MAPUPPER < 0xc893 0xc892 > MAPUPPER < 0xc895 0xc894 > MAPUPPER < 0xc897 0xc896 > MAPUPPER < 0xc899 0xc898 > MAPUPPER < 0xc89b 0xc89a > MAPUPPER < 0xc89d 0xc89c > MAPUPPER < 0xc89f 0xc89e > MAPUPPER < 0xc8a3 0xc8a2 > MAPUPPER < 0xc8a5 0xc8a4 > MAPUPPER < 0xc8a7 0xc8a6 > MAPUPPER < 0xc8a9 0xc8a8 > MAPUPPER < 0xc8ab 0xc8aa > MAPUPPER < 0xc8ad 0xc8ac > MAPUPPER < 0xc8af 0xc8ae > MAPUPPER < 0xc8b1 0xc8b0 > MAPUPPER < 0xc8b3 0xc8b2 > MAPLOWER < 0xc681 0xc993 > MAPLOWER < 0xc682 0xc683 > MAPLOWER < 0xc684 0xc685 > MAPLOWER < 0xc686 0xc994 > MAPLOWER < 0xc687 0xc688 > MAPLOWER < 0xc689 - 0xc68a : 0xc996 > MAPLOWER < 0xc68b 0xc68c > MAPLOWER < 0xc68e 0xc79d > MAPLOWER < 0xc68f 0xc999 > MAPLOWER < 0xc690 0xc99b > MAPLOWER < 0xc691 0xc692 > MAPLOWER < 0xc693 0xc9a0 > MAPLOWER < 0xc694 0xc9a3 > MAPLOWER < 0xc696 0xc9a9 > MAPLOWER < 0xc697 0xc9a8 > MAPLOWER < 0xc698 0xc699 > MAPLOWER < 0xc69c 0xc9af > MAPLOWER < 0xc69d 0xc9b2 > MAPLOWER < 0xc69f 0xc9b5 > MAPLOWER < 0xc6a0 0xc6a1 > MAPLOWER < 0xc6a2 0xc6a3 > MAPLOWER < 0xc6a4 0xc6a5 > MAPLOWER < 0xc6a6 0xca80 > MAPLOWER < 0xc6a7 0xc6a8 > MAPLOWER < 0xc6a9 0xca83 > MAPLOWER < 
0xc6ac 0xc6ad > MAPLOWER < 0xc6ae 0xca88 > MAPLOWER < 0xc6af 0xc6b0 > MAPLOWER < 0xc6b1 - 0xc6b2 : 0xca8a > MAPLOWER < 0xc6b3 0xc6b4 > MAPLOWER < 0xc6b5 0xc6b6 > MAPLOWER < 0xc6b7 0xca92 > MAPLOWER < 0xc6b8 0xc6b9 > MAPLOWER < 0xc6bc 0xc6bd > MAPLOWER < 0xc784 0xc786 > MAPLOWER < 0xc785 0xc786 > MAPLOWER < 0xc787 0xc789 > MAPLOWER < 0xc788 0xc789 > MAPLOWER < 0xc78a 0xc78c > MAPLOWER < 0xc78b 0xc78c > MAPLOWER < 0xc78d 0xc78e > MAPLOWER < 0xc78f 0xc790 > MAPLOWER < 0xc791 0xc792 > MAPLOWER < 0xc793 0xc794 > MAPLOWER < 0xc795 0xc796 > MAPLOWER < 0xc797 0xc798 > MAPLOWER < 0xc799 0xc79a > MAPLOWER < 0xc79b 0xc79c > MAPLOWER < 0xc79e 0xc79f > MAPLOWER < 0xc7a0 0xc7a1 > MAPLOWER < 0xc7a2 0xc7a3 > MAPLOWER < 0xc7a4 0xc7a5 > MAPLOWER < 0xc7a6 0xc7a7 > MAPLOWER < 0xc7a8 0xc7a9 > MAPLOWER < 0xc7aa 0xc7ab > MAPLOWER < 0xc7ac 0xc7ad > MAPLOWER < 0xc7ae 0xc7af > MAPLOWER < 0xc7b1 0xc7b3 > MAPLOWER < 0xc7b2 0xc7b3 > MAPLOWER < 0xc7b4 0xc7b5 > MAPLOWER < 0xc7b6 0xc695 > MAPLOWER < 0xc7b7 0xc6bf > MAPLOWER < 0xc7b8 0xc7b9 > MAPLOWER < 0xc7ba 0xc7bb > MAPLOWER < 0xc7bc 0xc7bd > MAPLOWER < 0xc7be 0xc7bf > MAPLOWER < 0xc880 0xc881 > MAPLOWER < 0xc882 0xc883 > MAPLOWER < 0xc884 0xc885 > MAPLOWER < 0xc886 0xc887 > MAPLOWER < 0xc888 0xc889 > MAPLOWER < 0xc88a 0xc88b > MAPLOWER < 0xc88c 0xc88d > MAPLOWER < 0xc88e 0xc88f > MAPLOWER < 0xc890 0xc891 > MAPLOWER < 0xc892 0xc893 > MAPLOWER < 0xc894 0xc895 > MAPLOWER < 0xc896 0xc897 > MAPLOWER < 0xc898 0xc899 > MAPLOWER < 0xc89a 0xc89b > MAPLOWER < 0xc89c 0xc89d > MAPLOWER < 0xc89e 0xc89f > MAPLOWER < 0xc8a0 0xc69e > MAPLOWER < 0xc8a2 0xc8a3 > MAPLOWER < 0xc8a4 0xc8a5 > MAPLOWER < 0xc8a6 0xc8a7 > MAPLOWER < 0xc8a8 0xc8a9 > MAPLOWER < 0xc8aa 0xc8ab > MAPLOWER < 0xc8ac 0xc8ad > MAPLOWER < 0xc8ae 0xc8af > MAPLOWER < 0xc8b0 0xc8b1 > MAPLOWER < 0xc8b2 0xc8b3 > /* * U+0250 - U+02AF : IPA Extensions */ ALPHA 0xc990 - 0xcaad GRAPH 0xc990 - 0xcaad LOWER 0xc990 - 0xcaad PRINT 0xc990 - 0xcaad SWIDTH1 0xc990 - 0xcaad MAPUPPER < 0xc993 0xc681 > MAPUPPER < 
0xc994 0xc686 > MAPUPPER < 0xc996 - 0xc997 : 0xc689 > MAPUPPER < 0xc999 0xc68f > MAPUPPER < 0xc99b 0xc690 > MAPUPPER < 0xc9a0 0xc693 > MAPUPPER < 0xc9a3 0xc694 > MAPUPPER < 0xc9a8 0xc697 > MAPUPPER < 0xc9a9 0xc696 > MAPUPPER < 0xc9af 0xc69c > MAPUPPER < 0xc9b2 0xc69d > MAPUPPER < 0xc9b5 0xc69f > MAPUPPER < 0xca80 0xc6a6 > MAPUPPER < 0xca83 0xc6a9 > MAPUPPER < 0xca88 0xc6ae > MAPUPPER < 0xca8a - 0xca8b : 0xc6b1 > MAPUPPER < 0xca92 0xc6b7 > /* * U+02B0 - U+02FF : Spacing Modifier Letters */ GRAPH 0xcab0 - 0xcbae PUNCT 0xcab9 0xcaba 0xcb82 - 0xcb8f 0xcb92 - 0xcb9f 0xcba5 - 0xcbad PRINT 0xcab0 - 0xcbae SWIDTH1 0xcab0 - 0xcbae /* * U+0300 - U+036F : Combining Diacritical Marks */ GRAPH 0xcc80 - 0xcd8e 0xcd90 - 0xcdaf PRINT 0xcc80 - 0xcd8e 0xcd90 - 0xcdaf SWIDTH0 0xcc80 - 0xcd8e 0xcd90 - 0xcdaf MAPUPPER < 0xcd85 0xce99 > /* * U+0370 - U+03FF : Greek and Coptic */ ALPHA 0xce86 0xce88 - 0xce8a 0xce8c 0xce8e - 0xcea1 0xcea3 - 0xcf8e ALPHA 0xcf90 - 0xcfb5 GRAPH 0xcdb4 0xcdb5 0xcdba 0xcdbe 0xce84 - 0xce8a 0xce8c GRAPH 0xce8e - 0xcea1 0xcea3 - 0xcf8e 0xcf90 - 0xcfb6 LOWER 0xce90 0xceac - 0xcf8e 0xcf90 0xcf91 0xcf95 - 0xcf97 LOWER 0xcf99 0xcf9b 0xcf9d 0xcf9f 0xcfa1 0xcfa3 0xcfa5 LOWER 0xcfa7 0xcfa9 0xcfab 0xcfad 0xcfaf - 0xcfb3 0xcfb5 PUNCT 0xcdb4 0xcdb5 0xcdbe 0xce84 0xce85 0xce87 0xcfb6 UPPER 0xce86 0xce88 - 0xce8a 0xce8c 0xce8e 0xce8f 0xce91 - 0xcea1 UPPER 0xcea3 - 0xceab 0xcf92 - 0xcf94 0xcf98 0xcf9a 0xcf9c UPPER 0xcf9e 0xcfa0 0xcfa2 0xcfa4 0xcfa6 0xcfa8 0xcfaa UPPER 0xcfac 0xcfae 0xcfb4 PRINT 0xcdb4 0xcdb5 0xcdba 0xcdbe 0xce84 - 0xce8a 0xce8c PRINT 0xce8e - 0xcea1 0xcea3 - 0xcf8e 0xcf90 - 0xcfb6 SWIDTH1 0xcdb4 0xcdb5 0xcdba 0xcdbe 0xce84 - 0xce8a 0xce8c SWIDTH1 0xce8e - 0xcea1 0xcea3 - 0xcf8e 0xcf90 - 0xcfb6 MAPUPPER < 0xceac 0xce86 > MAPUPPER < 0xcead - 0xceaf : 0xce88 > MAPUPPER < 0xceb1 - 0xcf81 : 0xce91 > MAPUPPER < 0xcf82 0xcea3 > MAPUPPER < 0xcf83 - 0xcf8b : 0xcea3 > MAPUPPER < 0xcf8c 0xce8c > MAPUPPER < 0xcf8d - 0xcf8e : 0xce8e > MAPUPPER < 0xcf90 0xce92 > MAPUPPER 
< 0xcf91 0xce98 > MAPUPPER < 0xcf95 0xcea6 > MAPUPPER < 0xcf96 0xcea0 > MAPUPPER < 0xcf99 0xcf98 > MAPUPPER < 0xcf9b 0xcf9a > MAPUPPER < 0xcf9d 0xcf9c > MAPUPPER < 0xcf9f 0xcf9e > MAPUPPER < 0xcfa1 0xcfa0 > MAPUPPER < 0xcfa3 0xcfa2 > MAPUPPER < 0xcfa5 0xcfa4 > MAPUPPER < 0xcfa7 0xcfa6 > MAPUPPER < 0xcfa9 0xcfa8 > MAPUPPER < 0xcfab 0xcfaa > MAPUPPER < 0xcfad 0xcfac > MAPUPPER < 0xcfaf 0xcfae > MAPUPPER < 0xcfb0 0xce9a > MAPUPPER < 0xcfb1 0xcea1 > MAPUPPER < 0xcfb2 0xcea3 > MAPUPPER < 0xcfb5 0xce95 > MAPLOWER < 0xce86 0xceac > MAPLOWER < 0xce88 - 0xce8a : 0xcead > MAPLOWER < 0xce8c 0xcf8c > MAPLOWER < 0xce8e - 0xce8f : 0xcf8d > MAPLOWER < 0xce91 - 0xcea1 : 0xceb1 > MAPLOWER < 0xcea3 - 0xceab : 0xcf83 > MAPLOWER < 0xcf98 0xcf99 > MAPLOWER < 0xcf9a 0xcf9b > MAPLOWER < 0xcf9c 0xcf9d > MAPLOWER < 0xcf9e 0xcf9f > MAPLOWER < 0xcfa0 0xcfa1 > MAPLOWER < 0xcfa2 0xcfa3 > MAPLOWER < 0xcfa4 0xcfa5 > MAPLOWER < 0xcfa6 0xcfa7 > MAPLOWER < 0xcfa8 0xcfa9 > MAPLOWER < 0xcfaa 0xcfab > MAPLOWER < 0xcfac 0xcfad > MAPLOWER < 0xcfae 0xcfaf > MAPLOWER < 0xcfb4 0xceb8 > /* * U+0400 - U+04FF : Cyrillic */ ALPHA 0xd080 - 0xd281 0xd28a - 0xd38e 0xd390 - 0xd3b5 0xd3b8 0xd3b9 GRAPH 0xd080 - 0xd286 0xd288 - 0xd38e 0xd390 - 0xd3b5 0xd3b8 0xd3b9 LOWER 0xd0b0 - 0xd19f 0xd1a1 0xd1a3 0xd1a5 0xd1a7 0xd1a9 LOWER 0xd1ab 0xd1ad 0xd1af 0xd1b1 0xd1b3 0xd1b5 0xd1b7 LOWER 0xd1b9 0xd1bb 0xd1bd 0xd1bf 0xd281 0xd28b 0xd28d LOWER 0xd28f 0xd291 0xd293 0xd295 0xd297 0xd299 0xd29b LOWER 0xd29d 0xd29f 0xd2a1 0xd2a3 0xd2a5 0xd2a7 0xd2a9 LOWER 0xd2ab 0xd2ad 0xd2af 0xd2b1 0xd2b3 0xd2b5 0xd2b7 LOWER 0xd2b9 0xd2bb 0xd2bd 0xd2bf 0xd382 0xd384 0xd386 LOWER 0xd388 0xd38a 0xd38c 0xd38e 0xd391 0xd393 0xd395 LOWER 0xd397 0xd399 0xd39b 0xd39d 0xd39f 0xd3a1 0xd3a3 LOWER 0xd3a5 0xd3a7 0xd3a9 0xd3ab 0xd3ad 0xd3af 0xd3b1 LOWER 0xd3b3 0xd3b5 0xd3b7 0xd3b9 PUNCT 0xd282 UPPER 0xd080 - 0xd0af 0xd1a0 0xd1a2 0xd1a4 0xd1a6 0xd1a8 UPPER 0xd1aa 0xd1ac 0xd1ae 0xd1b0 0xd1b2 0xd1b4 0xd1b6 UPPER 0xd1b8 0xd1ba 0xd1bc 0xd1be 0xd280 0xd28a 0xd28c 
UPPER 0xd28e 0xd290 0xd292 0xd294 0xd296 0xd298 0xd29a UPPER 0xd29c 0xd29e 0xd2a0 0xd2a2 0xd2a4 0xd2a6 0xd2a8 UPPER 0xd2aa 0xd2ac 0xd2ae 0xd2b0 0xd2b2 0xd2b4 0xd2b6 UPPER 0xd2b8 0xd2ba 0xd2bc 0xd2be 0xd380 0xd381 0xd383 UPPER 0xd385 0xd387 0xd389 0xd38b 0xd38d 0xd390 0xd392 UPPER 0xd394 0xd396 0xd398 0xd39a 0xd39c 0xd39e 0xd3a0 UPPER 0xd3a2 0xd3a4 0xd3a6 0xd3a8 0xd3aa 0xd3ac 0xd3ae UPPER 0xd3b0 0xd3b2 0xd3b4 0xd3b6 0xd3b8 PRINT 0xd080 - 0xd286 0xd288 - 0xd38e 0xd390 - 0xd3b9 SWIDTH0 0xd283 - 0xd286 0xd288 - 0xd289 SWIDTH1 0xd080 - 0xd282 0xd28a - 0xd38e 0xd390 - 0xd3b9 MAPUPPER < 0xd0b0 - 0xd18f : 0xd090 > MAPUPPER < 0xd190 - 0xd19f : 0xd080 > MAPUPPER < 0xd1a1 0xd1a0 > MAPUPPER < 0xd1a3 0xd1a2 > MAPUPPER < 0xd1a5 0xd1a4 > MAPUPPER < 0xd1a7 0xd1a6 > MAPUPPER < 0xd1a9 0xd1a8 > MAPUPPER < 0xd1ab 0xd1aa > MAPUPPER < 0xd1ad 0xd1ac > MAPUPPER < 0xd1af 0xd1ae > MAPUPPER < 0xd1b1 0xd1b0 > MAPUPPER < 0xd1b3 0xd1b2 > MAPUPPER < 0xd1b5 0xd1b4 > MAPUPPER < 0xd1b7 0xd1b6 > MAPUPPER < 0xd1b9 0xd1b8 > MAPUPPER < 0xd1bb 0xd1ba > MAPUPPER < 0xd1bd 0xd1bc > MAPUPPER < 0xd1bf 0xd1be > MAPUPPER < 0xd281 0xd280 > MAPUPPER < 0xd28b 0xd28a > MAPUPPER < 0xd28d 0xd28c > MAPUPPER < 0xd28f 0xd28e > MAPUPPER < 0xd291 0xd290 > MAPUPPER < 0xd293 0xd292 > MAPUPPER < 0xd295 0xd294 > MAPUPPER < 0xd297 0xd296 > MAPUPPER < 0xd299 0xd298 > MAPUPPER < 0xd29b 0xd29a > MAPUPPER < 0xd29d 0xd29c > MAPUPPER < 0xd29f 0xd29e > MAPUPPER < 0xd2a1 0xd2a0 > MAPUPPER < 0xd2a3 0xd2a2 > MAPUPPER < 0xd2a5 0xd2a4 > MAPUPPER < 0xd2a7 0xd2a6 > MAPUPPER < 0xd2a9 0xd2a8 > MAPUPPER < 0xd2ab 0xd2aa > MAPUPPER < 0xd2ad 0xd2ac > MAPUPPER < 0xd2af 0xd2ae > MAPUPPER < 0xd2b1 0xd2b0 > MAPUPPER < 0xd2b3 0xd2b2 > MAPUPPER < 0xd2b5 0xd2b4 > MAPUPPER < 0xd2b7 0xd2b6 > MAPUPPER < 0xd2b9 0xd2b8 > MAPUPPER < 0xd2bb 0xd2ba > MAPUPPER < 0xd2bd 0xd2bc > MAPUPPER < 0xd2bf 0xd2be > MAPUPPER < 0xd382 0xd381 > MAPUPPER < 0xd384 0xd383 > MAPUPPER < 0xd386 0xd385 > MAPUPPER < 0xd388 0xd387 > MAPUPPER < 0xd38a 0xd389 > MAPUPPER < 0xd38c 0xd38b 
> MAPUPPER < 0xd38e 0xd38d > MAPUPPER < 0xd391 0xd390 > MAPUPPER < 0xd393 0xd392 > MAPUPPER < 0xd395 0xd394 > MAPUPPER < 0xd397 0xd396 > MAPUPPER < 0xd399 0xd398 > MAPUPPER < 0xd39b 0xd39a > MAPUPPER < 0xd39d 0xd39c > MAPUPPER < 0xd39f 0xd39e > MAPUPPER < 0xd3a1 0xd3a0 > MAPUPPER < 0xd3a3 0xd3a2 > MAPUPPER < 0xd3a5 0xd3a4 > MAPUPPER < 0xd3a7 0xd3a6 > MAPUPPER < 0xd3a9 0xd3a8 > MAPUPPER < 0xd3ab 0xd3aa > MAPUPPER < 0xd3ad 0xd3ac > MAPUPPER < 0xd3af 0xd3ae > MAPUPPER < 0xd3b1 0xd3b0 > MAPUPPER < 0xd3b3 0xd3b2 > MAPUPPER < 0xd3b5 0xd3b4 > MAPUPPER < 0xd3b7 0xd3b6 > MAPUPPER < 0xd3b9 0xd3b8 > MAPLOWER < 0xd080 - 0xd08f : 0xd190 > MAPLOWER < 0xd090 - 0xd0af : 0xd0b0 > MAPLOWER < 0xd1a0 0xd1a1 > MAPLOWER < 0xd1a2 0xd1a3 > MAPLOWER < 0xd1a4 0xd1a5 > MAPLOWER < 0xd1a6 0xd1a7 > MAPLOWER < 0xd1a8 0xd1a9 > MAPLOWER < 0xd1aa 0xd1ab > MAPLOWER < 0xd1ac 0xd1ad > MAPLOWER < 0xd1ae 0xd1af > MAPLOWER < 0xd1b0 0xd1b1 > MAPLOWER < 0xd1b2 0xd1b3 > MAPLOWER < 0xd1b4 0xd1b5 > MAPLOWER < 0xd1b6 0xd1b7 > MAPLOWER < 0xd1b8 0xd1b9 > MAPLOWER < 0xd1ba 0xd1bb > MAPLOWER < 0xd1bc 0xd1bd > MAPLOWER < 0xd1be 0xd1bf > MAPLOWER < 0xd280 0xd281 > MAPLOWER < 0xd28a 0xd28b > MAPLOWER < 0xd28c 0xd28d > MAPLOWER < 0xd28e 0xd28f > MAPLOWER < 0xd290 0xd291 > MAPLOWER < 0xd292 0xd293 > MAPLOWER < 0xd294 0xd295 > MAPLOWER < 0xd296 0xd297 > MAPLOWER < 0xd298 0xd299 > MAPLOWER < 0xd29a 0xd29b > MAPLOWER < 0xd29c 0xd29d > MAPLOWER < 0xd29e 0xd29f > MAPLOWER < 0xd2a0 0xd2a1 > MAPLOWER < 0xd2a2 0xd2a3 > MAPLOWER < 0xd2a4 0xd2a5 > MAPLOWER < 0xd2a6 0xd2a7 > MAPLOWER < 0xd2a8 0xd2a9 > MAPLOWER < 0xd2aa 0xd2ab > MAPLOWER < 0xd2ac 0xd2ad > MAPLOWER < 0xd2ae 0xd2af > MAPLOWER < 0xd2b0 0xd2b1 > MAPLOWER < 0xd2b2 0xd2b3 > MAPLOWER < 0xd2b4 0xd2b5 > MAPLOWER < 0xd2b6 0xd2b7 > MAPLOWER < 0xd2b8 0xd2b9 > MAPLOWER < 0xd2ba 0xd2bb > MAPLOWER < 0xd2bc 0xd2bd > MAPLOWER < 0xd2be 0xd2bf > MAPLOWER < 0xd381 0xd382 > MAPLOWER < 0xd383 0xd384 > MAPLOWER < 0xd385 0xd386 > MAPLOWER < 0xd387 0xd388 > MAPLOWER < 0xd389 0xd38a > 
MAPLOWER < 0xd38b 0xd38c > MAPLOWER < 0xd38d 0xd38e > MAPLOWER < 0xd390 0xd391 > MAPLOWER < 0xd392 0xd393 > MAPLOWER < 0xd394 0xd395 > MAPLOWER < 0xd396 0xd397 > MAPLOWER < 0xd398 0xd399 > MAPLOWER < 0xd39a 0xd39b > MAPLOWER < 0xd39c 0xd39d > MAPLOWER < 0xd39e 0xd39f > MAPLOWER < 0xd3a0 0xd3a1 > MAPLOWER < 0xd3a2 0xd3a3 > MAPLOWER < 0xd3a4 0xd3a5 > MAPLOWER < 0xd3a6 0xd3a7 > MAPLOWER < 0xd3a8 0xd3a9 > MAPLOWER < 0xd3aa 0xd3ab > MAPLOWER < 0xd3ac 0xd3ad > MAPLOWER < 0xd3ae 0xd3af > MAPLOWER < 0xd3b0 0xd3b1 > MAPLOWER < 0xd3b2 0xd3b3 > MAPLOWER < 0xd3b4 0xd3b5 > MAPLOWER < 0xd3b6 0xd3b7 > MAPLOWER < 0xd3b8 0xd3b9 > /* * U+0500 - U+052F : Cyrillic Supplementary */ ALPHA 0xd480 - 0xd48f GRAPH 0xd480 - 0xd48f LOWER 0xd481 0xd483 0xd485 0xd487 0xd489 0xd48b 0xd48d LOWER 0xd48f UPPER 0xd480 0xd482 0xd484 0xd486 0xd488 0xd48a 0xd48c UPPER 0xd48e PRINT 0xd480 - 0xd48f SWIDTH1 0xd480 - 0xd48f MAPUPPER < 0xd481 0xd480 > MAPUPPER < 0xd483 0xd482 > MAPUPPER < 0xd485 0xd484 > MAPUPPER < 0xd487 0xd486 > MAPUPPER < 0xd489 0xd488 > MAPUPPER < 0xd48b 0xd48a > MAPUPPER < 0xd48d 0xd48c > MAPUPPER < 0xd48f 0xd48e > MAPLOWER < 0xd480 0xd481 > MAPLOWER < 0xd482 0xd483 > MAPLOWER < 0xd484 0xd485 > MAPLOWER < 0xd486 0xd487 > MAPLOWER < 0xd488 0xd489 > MAPLOWER < 0xd48a 0xd48b > MAPLOWER < 0xd48c 0xd48d > MAPLOWER < 0xd48e 0xd48f > /* * U+0530 - U+058F : Armenian */ ALPHA 0xd4b1 - 0xd596 0xd5a1 - 0xd687 GRAPH 0xd4b1 - 0xd596 0xd599 - 0xd59f 0xd5a1 - 0xd687 0xd689 0xd68a LOWER 0xd5a1 - 0xd687 PUNCT 0xd59a - 0xd59f 0xd689 0xd68a UPPER 0xd4b1 - 0xd596 PRINT 0xd4b1 - 0xd596 0xd599 - 0xd59f 0xd5a1 - 0xd687 0xd689 0xd68a SWIDTH1 0xd4b1 - 0xd596 0xd599 - 0xd59f 0xd5a1 - 0xd687 0xd689 0xd68a MAPUPPER < 0xd5a1 - 0xd686 : 0xd4b1 > MAPLOWER < 0xd4b1 - 0xd596 : 0xd5a1 > /* * U+0590 - U+05FF : Hebrew */ GRAPH 0xd691 - 0xd6a1 0xd6a3 - 0xd6b9 0xd6bb - 0xd784 0xd790 - 0xd7aa GRAPH 0xd7b0 - 0xd7b4 PUNCT 0xd6be 0xd780 0xd783 0xd7b3 0xd7b4 PRINT 0xd691 - 0xd6a1 0xd6a3 - 0xd6b9 0xd6bb - 0xd784 0xd790 - 0xd7aa 
PRINT 0xd7b0 - 0xd7b4
SWIDTH1 0xd691 - 0xd6a1 0xd6a3 - 0xd6b9 0xd6bb - 0xd784 0xd790 - 0xd7aa
SWIDTH1 0xd7b0 - 0xd7b4

/*
 * U+0600 - U+06FF : Arabic
 */

CONTROL 0xdb9d
GRAPH 0xd88c 0xd89b 0xd89f 0xd8a1 - 0xd8ba 0xd980 - 0xd995
GRAPH 0xd9a0 - 0xdb9c 0xdb9e - 0xdbad 0xdbb0 - 0xdbbe
PUNCT 0xd88c 0xd89b 0xd89f 0xd9aa - 0xd9ad 0xdb94 0xdba9
PUNCT 0xdbbd 0xdbbe
PRINT 0xd88c 0xd89b 0xd89f 0xd8a1 - 0xd8ba 0xd980 - 0xd995
PRINT 0xd9a0 - 0xdb9c 0xdb9e - 0xdbad 0xdbb0 - 0xdbbe
SWIDTH1 0xd88c 0xd89b 0xd89f 0xd8a1 - 0xd8ba 0xd980 - 0xd995
SWIDTH1 0xd9a0 - 0xdb9c 0xdb9e - 0xdbad 0xdbb0 - 0xdbbe

/*
 * U+0700 - U+074F : Syriac
 */

CONTROL 0xdc8f
GRAPH 0xdc80 - 0xdc8d 0xdc90 - 0xdcac 0xdcb0 - 0xdd8a
PUNCT 0xdc80 - 0xdc8d
PRINT 0xdc80 - 0xdc8d 0xdc90 - 0xdcac 0xdcb0 - 0xdd8a
SWIDTH1 0xdc80 - 0xdc8d 0xdc90 - 0xdcac 0xdcb0 - 0xdd8a

/*
 * U+0780 - U+07BF : Thaana
 */

GRAPH 0xde80 - 0xdeb1
PRINT 0xde80 - 0xdeb1
SWIDTH1 0xde80 - 0xdeb1

/*
 * U+0900 - U+097F : Devanagari
 */

GRAPH 0xe0a481 - 0xe0a483 0xe0a485 - 0xe0a4b9 0xe0a4bc - 0xe0a58d 0xe0a590 - 0xe0a594
GRAPH 0xe0a598 - 0xe0a5b0
PUNCT 0xe0a5a4 0xe0a5a5 0xe0a5b0
PRINT 0xe0a481 - 0xe0a483 0xe0a485 - 0xe0a4b9 0xe0a4bc - 0xe0a58d 0xe0a590 - 0xe0a594
PRINT 0xe0a598 - 0xe0a5b0
SWIDTH1 0xe0a481 - 0xe0a483 0xe0a485 - 0xe0a4b9 0xe0a4bc - 0xe0a58d 0xe0a590 - 0xe0a594
SWIDTH1 0xe0a598 - 0xe0a5b0

/*
 * U+0980 - U+09FF : Bengali
 */

GRAPH 0xe0a681 - 0xe0a683 0xe0a685 - 0xe0a68c 0xe0a68f 0xe0a690 0xe0a693 - 0xe0a6a8
GRAPH 0xe0a6aa - 0xe0a6b0 0xe0a6b2 0xe0a6b6 - 0xe0a6b9 0xe0a6bc 0xe0a6be - 0xe0a784
GRAPH 0xe0a787 0xe0a788 0xe0a78b - 0xe0a78d 0xe0a797 0xe0a79c 0xe0a79d
GRAPH 0xe0a79f - 0xe0a7a3 0xe0a7a6 - 0xe0a7ba
PUNCT 0xe0a7b2 0xe0a7b3 0xe0a7ba
PRINT 0xe0a681 - 0xe0a683 0xe0a685 - 0xe0a68c 0xe0a68f 0xe0a690 0xe0a693 - 0xe0a6a8
PRINT 0xe0a6aa - 0xe0a6b0 0xe0a6b2 0xe0a6b6 - 0xe0a6b9 0xe0a6bc 0xe0a6be - 0xe0a784
PRINT 0xe0a787 0xe0a788 0xe0a78b - 0xe0a78d 0xe0a797 0xe0a79c 0xe0a79d
PRINT 0xe0a79f - 0xe0a7a3 0xe0a7a6 - 0xe0a7ba
SPECIAL 0xe0a7b4 - 0xe0a7b9
SWIDTH1 0xe0a681 - 0xe0a683 0xe0a685 - 0xe0a68c 0xe0a68f 0xe0a690 0xe0a693 - 0xe0a6a8
SWIDTH1 0xe0a6aa - 0xe0a6b0 0xe0a6b2 0xe0a6b6 - 0xe0a6b9 0xe0a6bc 0xe0a6be - 0xe0a784
SWIDTH1 0xe0a787 0xe0a788 0xe0a78b - 0xe0a78d 0xe0a797 0xe0a79c 0xe0a79d
SWIDTH1 0xe0a79f - 0xe0a7a3 0xe0a7a6 - 0xe0a7ba

/*
 * U+0A00 - U+0A7F : Gurmukhi
 */

GRAPH 0xe0a882 0xe0a885 - 0xe0a88a 0xe0a88f 0xe0a890 0xe0a893 - 0xe0a8a8
GRAPH 0xe0a8aa - 0xe0a8b0 0xe0a8b2 0xe0a8b3 0xe0a8b5 0xe0a8b6 0xe0a8b8 0xe0a8b9
GRAPH 0xe0a8bc 0xe0a8be - 0xe0a982 0xe0a987 0xe0a988 0xe0a98b - 0xe0a98d
GRAPH 0xe0a999 - 0xe0a99c 0xe0a99e 0xe0a9a6 - 0xe0a9b4
PRINT 0xe0a882 0xe0a885 - 0xe0a88a 0xe0a88f 0xe0a890 0xe0a893 - 0xe0a8a8
PRINT 0xe0a8aa - 0xe0a8b0 0xe0a8b2 0xe0a8b3 0xe0a8b5 0xe0a8b6 0xe0a8b8 0xe0a8b9
PRINT 0xe0a8bc 0xe0a8be - 0xe0a982 0xe0a987 0xe0a988 0xe0a98b - 0xe0a98d
PRINT 0xe0a999 - 0xe0a99c 0xe0a99e 0xe0a9a6 - 0xe0a9b4
SWIDTH1 0xe0a882 0xe0a885 - 0xe0a88a 0xe0a88f 0xe0a890 0xe0a893 - 0xe0a8a8
SWIDTH1 0xe0a8aa - 0xe0a8b0 0xe0a8b2 0xe0a8b3 0xe0a8b5 0xe0a8b6 0xe0a8b8 0xe0a8b9
SWIDTH1 0xe0a8bc 0xe0a8be - 0xe0a982 0xe0a987 0xe0a988 0xe0a98b - 0xe0a98d
SWIDTH1 0xe0a999 - 0xe0a99c 0xe0a99e 0xe0a9a6 - 0xe0a9b4

/*
 * U+0A80 - U+0AFF : Gujarati
 */

GRAPH 0xe0aa81 - 0xe0aa83 0xe0aa85 - 0xe0aa8b 0xe0aa8d 0xe0aa8f - 0xe0aa91
GRAPH 0xe0aa93 - 0xe0aaa8 0xe0aaaa - 0xe0aab0 0xe0aab2 0xe0aab3 0xe0aab5 - 0xe0aab9
GRAPH 0xe0aabc - 0xe0ab85 0xe0ab87 - 0xe0ab89 0xe0ab8b - 0xe0ab8d 0xe0ab90
GRAPH 0xe0aba0 0xe0aba6 - 0xe0abaf
PRINT 0xe0aa81 - 0xe0aa83 0xe0aa85 - 0xe0aa8b 0xe0aa8d 0xe0aa8f - 0xe0aa91
PRINT 0xe0aa93 - 0xe0aaa8 0xe0aaaa - 0xe0aab0 0xe0aab2 0xe0aab3 0xe0aab5 - 0xe0aab9
PRINT 0xe0aabc - 0xe0ab85 0xe0ab87 - 0xe0ab89 0xe0ab8b - 0xe0ab8d 0xe0ab90
PRINT 0xe0aba0 0xe0aba6 - 0xe0abaf
SWIDTH1 0xe0aa81 - 0xe0aa83 0xe0aa85 - 0xe0aa8b 0xe0aa8d 0xe0aa8f - 0xe0aa91
SWIDTH1 0xe0aa93 - 0xe0aaa8 0xe0aaaa - 0xe0aab0 0xe0aab2 0xe0aab3 0xe0aab5 - 0xe0aab9
SWIDTH1 0xe0aabc - 0xe0ab85 0xe0ab87 - 0xe0ab89 0xe0ab8b - 0xe0ab8d 0xe0ab90
SWIDTH1
0xe0aba0 0xe0aba6 - 0xe0abaf

/*
 * U+0B00 - U+0B7F : Oriya
 */

GRAPH 0xe0ac81 - 0xe0ac83 0xe0ac85 - 0xe0ac8c 0xe0ac8f 0xe0ac90 0xe0ac93 - 0xe0aca8
GRAPH 0xe0acaa - 0xe0acb0 0xe0acb2 0xe0acb3 0xe0acb6 - 0xe0acb9 0xe0acbc - 0xe0ad83
GRAPH 0xe0ad87 0xe0ad88 0xe0ad8b - 0xe0ad8d 0xe0ad96 0xe0ad97 0xe0ad9c 0xe0ad9d
GRAPH 0xe0ad9f - 0xe0ada1 0xe0ada6 - 0xe0adb0
PUNCT 0xe0adb0
PRINT 0xe0ac81 - 0xe0ac83 0xe0ac85 - 0xe0ac8c 0xe0ac8f 0xe0ac90 0xe0ac93 - 0xe0aca8
PRINT 0xe0acaa - 0xe0acb0 0xe0acb2 0xe0acb3 0xe0acb6 - 0xe0acb9 0xe0acbc - 0xe0ad83
PRINT 0xe0ad87 0xe0ad88 0xe0ad8b - 0xe0ad8d 0xe0ad96 0xe0ad97 0xe0ad9c 0xe0ad9d
PRINT 0xe0ad9f - 0xe0ada1 0xe0ada6 - 0xe0adb0
SWIDTH1 0xe0ac81 - 0xe0ac83 0xe0ac85 - 0xe0ac8c 0xe0ac8f 0xe0ac90 0xe0ac93 - 0xe0aca8
SWIDTH1 0xe0acaa - 0xe0acb0 0xe0acb2 0xe0acb3 0xe0acb6 - 0xe0acb9 0xe0acbc - 0xe0ad83
SWIDTH1 0xe0ad87 0xe0ad88 0xe0ad8b - 0xe0ad8d 0xe0ad96 0xe0ad97 0xe0ad9c 0xe0ad9d
SWIDTH1 0xe0ad9f - 0xe0ada1 0xe0ada6 - 0xe0adb0

/*
 * U+0B80 - U+0BFF : Tamil
 */

GRAPH 0xe0ae82 0xe0ae83 0xe0ae85 - 0xe0ae8a 0xe0ae8e - 0xe0ae90 0xe0ae92 - 0xe0ae95
GRAPH 0xe0ae99 0xe0ae9a 0xe0ae9c 0xe0ae9e 0xe0ae9f 0xe0aea3 0xe0aea4
GRAPH 0xe0aea8 - 0xe0aeaa 0xe0aeae - 0xe0aeb5 0xe0aeb7 - 0xe0aeb9 0xe0aebe - 0xe0af82
GRAPH 0xe0af86 - 0xe0af88 0xe0af8a - 0xe0af8d 0xe0af97 0xe0afa7 - 0xe0afb2
PRINT 0xe0ae82 0xe0ae83 0xe0ae85 - 0xe0ae8a 0xe0ae8e - 0xe0ae90 0xe0ae92 - 0xe0ae95
PRINT 0xe0ae99 0xe0ae9a 0xe0ae9c 0xe0ae9e 0xe0ae9f 0xe0aea3 0xe0aea4
PRINT 0xe0aea8 - 0xe0aeaa 0xe0aeae - 0xe0aeb5 0xe0aeb7 - 0xe0aeb9 0xe0aebe - 0xe0af82
PRINT 0xe0af86 - 0xe0af88 0xe0af8a - 0xe0af8d 0xe0af97 0xe0afa7 - 0xe0afb2
SPECIAL 0xe0afb0 - 0xe0afb2
SWIDTH1 0xe0ae82 0xe0ae83 0xe0ae85 - 0xe0ae8a 0xe0ae8e - 0xe0ae90 0xe0ae92 - 0xe0ae95
SWIDTH1 0xe0ae99 0xe0ae9a 0xe0ae9c 0xe0ae9e 0xe0ae9f 0xe0aea3 0xe0aea4
SWIDTH1 0xe0aea8 - 0xe0aeaa 0xe0aeae - 0xe0aeb5 0xe0aeb7 - 0xe0aeb9 0xe0aebe - 0xe0af82
SWIDTH1 0xe0af86 - 0xe0af88 0xe0af8a - 0xe0af8d 0xe0af97 0xe0afa7 - 0xe0afb2

/* * U+0C00 - U+0C7F :
Telugu */

GRAPH 0xe0b081 - 0xe0b083 0xe0b085 - 0xe0b08c 0xe0b08e - 0xe0b090 0xe0b092 - 0xe0b0a8
GRAPH 0xe0b0aa - 0xe0b0b3 0xe0b0b5 - 0xe0b0b9 0xe0b0be - 0xe0b184 0xe0b186 - 0xe0b188
GRAPH 0xe0b18a - 0xe0b18d 0xe0b195 0xe0b196 0xe0b1a0 0xe0b1a1 0xe0b1a6 - 0xe0b1af
PRINT 0xe0b081 - 0xe0b083 0xe0b085 - 0xe0b08c 0xe0b08e - 0xe0b090 0xe0b092 - 0xe0b0a8
PRINT 0xe0b0aa - 0xe0b0b3 0xe0b0b5 - 0xe0b0b9 0xe0b0be - 0xe0b184 0xe0b186 - 0xe0b188
PRINT 0xe0b18a - 0xe0b18d 0xe0b195 0xe0b196 0xe0b1a0 0xe0b1a1 0xe0b1a6 - 0xe0b1af
SWIDTH1 0xe0b081 - 0xe0b083 0xe0b085 - 0xe0b08c 0xe0b08e - 0xe0b090 0xe0b092 - 0xe0b0a8
SWIDTH1 0xe0b0aa - 0xe0b0b3 0xe0b0b5 - 0xe0b0b9 0xe0b0be - 0xe0b184 0xe0b186 - 0xe0b188
SWIDTH1 0xe0b18a - 0xe0b18d 0xe0b195 0xe0b196 0xe0b1a0 0xe0b1a1 0xe0b1a6 - 0xe0b1af

/*
 * U+0C80 - U+0CFF : Kannada
 */

GRAPH 0xe0b282 0xe0b283 0xe0b285 - 0xe0b28c 0xe0b28e - 0xe0b290 0xe0b292 - 0xe0b2a8
GRAPH 0xe0b2aa - 0xe0b2b3 0xe0b2b5 - 0xe0b2b9 0xe0b2be - 0xe0b384 0xe0b386 - 0xe0b388
GRAPH 0xe0b38a - 0xe0b38d 0xe0b395 0xe0b396 0xe0b39e 0xe0b3a0 0xe0b3a1
GRAPH 0xe0b3a6 - 0xe0b3af
PRINT 0xe0b282 0xe0b283 0xe0b285 - 0xe0b28c 0xe0b28e - 0xe0b290 0xe0b292 - 0xe0b2a8
PRINT 0xe0b2aa - 0xe0b2b3 0xe0b2b5 - 0xe0b2b9 0xe0b2be - 0xe0b384 0xe0b386 - 0xe0b388
PRINT 0xe0b38a - 0xe0b38d 0xe0b395 0xe0b396 0xe0b39e 0xe0b3a0 0xe0b3a1
PRINT 0xe0b3a6 - 0xe0b3af
SWIDTH1 0xe0b282 0xe0b283 0xe0b285 - 0xe0b28c 0xe0b28e - 0xe0b290 0xe0b292 - 0xe0b2a8
SWIDTH1 0xe0b2aa - 0xe0b2b3 0xe0b2b5 - 0xe0b2b9 0xe0b2be - 0xe0b384 0xe0b386 - 0xe0b388
SWIDTH1 0xe0b38a - 0xe0b38d 0xe0b395 0xe0b396 0xe0b39e 0xe0b3a0 0xe0b3a1
SWIDTH1 0xe0b3a6 - 0xe0b3af

/*
 * U+0D00 - U+0D7F : Malayalam
 */

GRAPH 0xe0b482 0xe0b483 0xe0b485 - 0xe0b48c 0xe0b48e - 0xe0b490 0xe0b492 - 0xe0b4a8
GRAPH 0xe0b4aa - 0xe0b4b9 0xe0b4be - 0xe0b583 0xe0b586 - 0xe0b588 0xe0b58a - 0xe0b58d
GRAPH 0xe0b597 0xe0b5a0 0xe0b5a1 0xe0b5a6 - 0xe0b5af
PRINT 0xe0b482 0xe0b483 0xe0b485 - 0xe0b48c 0xe0b48e - 0xe0b490 0xe0b492 - 0xe0b4a8
PRINT 0xe0b4aa - 0xe0b4b9 0xe0b4be
- 0xe0b583 0xe0b586 - 0xe0b588 0xe0b58a - 0xe0b58d
PRINT 0xe0b597 0xe0b5a0 0xe0b5a1 0xe0b5a6 - 0xe0b5af
SWIDTH1 0xe0b482 0xe0b483 0xe0b485 - 0xe0b48c 0xe0b48e - 0xe0b490 0xe0b492 - 0xe0b4a8
SWIDTH1 0xe0b4aa - 0xe0b4b9 0xe0b4be - 0xe0b583 0xe0b586 - 0xe0b588 0xe0b58a - 0xe0b58d
SWIDTH1 0xe0b597 0xe0b5a0 0xe0b5a1 0xe0b5a6 - 0xe0b5af

/*
 * U+0D80 - U+0DFF : Sinhala
 */

GRAPH 0xe0b682 0xe0b683 0xe0b685 - 0xe0b696 0xe0b69a - 0xe0b6b1 0xe0b6b3 - 0xe0b6bb
GRAPH 0xe0b6bd 0xe0b780 - 0xe0b786 0xe0b78a 0xe0b78f - 0xe0b794 0xe0b796
GRAPH 0xe0b798 - 0xe0b79f 0xe0b7b2 - 0xe0b7b4
PUNCT 0xe0b7b4
PRINT 0xe0b682 0xe0b683 0xe0b685 - 0xe0b696 0xe0b69a - 0xe0b6b1 0xe0b6b3 - 0xe0b6bb
PRINT 0xe0b6bd 0xe0b780 - 0xe0b786 0xe0b78a 0xe0b78f - 0xe0b794 0xe0b796
PRINT 0xe0b798 - 0xe0b79f 0xe0b7b2 - 0xe0b7b4
SWIDTH1 0xe0b682 0xe0b683 0xe0b685 - 0xe0b696 0xe0b69a - 0xe0b6b1 0xe0b6b3 - 0xe0b6bb
SWIDTH1 0xe0b6bd 0xe0b780 - 0xe0b786 0xe0b78a 0xe0b78f - 0xe0b794 0xe0b796
SWIDTH1 0xe0b798 - 0xe0b79f 0xe0b7b2 - 0xe0b7b4

/*
 * U+0E00 - U+0E7F : Thai
 */

GRAPH 0xe0b881 - 0xe0b8ba 0xe0b8bf - 0xe0b99b
PUNCT 0xe0b8bf 0xe0b98f 0xe0b99a 0xe0b99b
PRINT 0xe0b881 - 0xe0b8ba 0xe0b8bf - 0xe0b99b
SWIDTH0 0xe0b8b1 0xe0b8b4 - 0xe0b8ba 0xe0b987 - 0xe0b98e
SWIDTH1 0xe0b881 - 0xe0b8b0 0xe0b8b2 - 0xe0b8b3 0xe0b8bf - 0xe0b986 0xe0b98f - 0xe0b99b

/*
 * U+0E80 - U+0EFF : Lao
 */

GRAPH 0xe0ba81 0xe0ba82 0xe0ba84 0xe0ba87 0xe0ba88 0xe0ba8a 0xe0ba8d
GRAPH 0xe0ba94 - 0xe0ba97 0xe0ba99 - 0xe0ba9f 0xe0baa1 - 0xe0baa3 0xe0baa5
GRAPH 0xe0baa7 0xe0baaa 0xe0baab 0xe0baad - 0xe0bab9 0xe0babb - 0xe0babd
GRAPH 0xe0bb80 - 0xe0bb84 0xe0bb86 0xe0bb88 - 0xe0bb8d 0xe0bb90 - 0xe0bb99
GRAPH 0xe0bb9c 0xe0bb9d
PRINT 0xe0ba81 0xe0ba82 0xe0ba84 0xe0ba87 0xe0ba88 0xe0ba8a 0xe0ba8d
PRINT 0xe0ba94 - 0xe0ba97 0xe0ba99 - 0xe0ba9f 0xe0baa1 - 0xe0baa3 0xe0baa5
PRINT 0xe0baa7 0xe0baaa 0xe0baab 0xe0baad - 0xe0bab9 0xe0babb - 0xe0babd
PRINT 0xe0bb80 - 0xe0bb84 0xe0bb86 0xe0bb88 - 0xe0bb8d 0xe0bb90 - 0xe0bb99
PRINT 0xe0bb9c 0xe0bb9d
SWIDTH1 0xe0ba81 0xe0ba82
0xe0ba84 0xe0ba87 0xe0ba88 0xe0ba8a 0xe0ba8d
SWIDTH1 0xe0ba94 - 0xe0ba97 0xe0ba99 - 0xe0ba9f 0xe0baa1 - 0xe0baa3 0xe0baa5
SWIDTH1 0xe0baa7 0xe0baaa 0xe0baab 0xe0baad - 0xe0bab9 0xe0babb - 0xe0babd
SWIDTH1 0xe0bb80 - 0xe0bb84 0xe0bb86 0xe0bb88 - 0xe0bb8d 0xe0bb90 - 0xe0bb99
SWIDTH1 0xe0bb9c 0xe0bb9d

/*
 * U+0F00 - U+0FFF : Tibetan
 */

GRAPH 0xe0bc80 - 0xe0bd87 0xe0bd89 - 0xe0bdaa 0xe0bdb1 - 0xe0be8b 0xe0be90 - 0xe0be97
GRAPH 0xe0be99 - 0xe0bebc 0xe0bebe - 0xe0bf8c 0xe0bf8f
PUNCT 0xe0bc81 - 0xe0bc97 0xe0bc9a - 0xe0bc9f 0xe0bcb4 0xe0bcb6 0xe0bcb8
PUNCT 0xe0bcba - 0xe0bcbd 0xe0be85 0xe0bebe - 0xe0bf85 0xe0bf87 - 0xe0bf8c
PUNCT 0xe0bf8f
PRINT 0xe0bc80 - 0xe0bd87 0xe0bd89 - 0xe0bdaa 0xe0bdb1 - 0xe0be8b 0xe0be90 - 0xe0be97
PRINT 0xe0be99 - 0xe0bebc 0xe0bebe - 0xe0bf8c 0xe0bf8f
SPECIAL 0xe0bcaa - 0xe0bcb3
PHONOGRAM 0xe0bc80
SWIDTH1 0xe0bc80 - 0xe0bd87 0xe0bd89 - 0xe0bdaa 0xe0bdb1 - 0xe0be8b 0xe0be90 - 0xe0be97
SWIDTH1 0xe0be99 - 0xe0bebc 0xe0bebe - 0xe0bf8c 0xe0bf8f

/*
 * U+1000 - U+109F : Myanmar
 */

GRAPH 0xe18080 - 0xe180a1 0xe180a3 - 0xe180a7 0xe180a9 0xe180aa 0xe180ac - 0xe180b2
GRAPH 0xe180b6 - 0xe180b9 0xe18180 - 0xe18199
PUNCT 0xe1818a - 0xe1818f
PRINT 0xe18080 - 0xe180a1 0xe180a3 - 0xe180a7 0xe180a9 0xe180aa 0xe180ac - 0xe180b2
PRINT 0xe180b6 - 0xe180b9 0xe18180 - 0xe18199
SWIDTH1 0xe18080 - 0xe180a1 0xe180a3 - 0xe180a7 0xe180a9 0xe180aa 0xe180ac - 0xe180b2
SWIDTH1 0xe180b6 - 0xe180b9 0xe18180 - 0xe18199

/*
 * U+10A0 - U+10FF : Georgian
 */

ALPHA 0xe182a0 - 0xe18385
GRAPH 0xe182a0 - 0xe18385 0xe18390 - 0xe183b8 0xe183bb
PUNCT 0xe183bb
UPPER 0xe182a0 - 0xe18385
PRINT 0xe182a0 - 0xe18385 0xe18390 - 0xe183b8 0xe183bb
SWIDTH1 0xe182a0 - 0xe18385 0xe18390 - 0xe183b8 0xe183bb

/*
 * U+1100 - U+11FF : Hangul Jamo
 */

GRAPH 0xe18480 - 0xe18599 0xe1859f - 0xe186a2 0xe186a8 - 0xe187b9
PRINT 0xe18480 - 0xe18599 0xe1859f - 0xe186a2 0xe186a8 - 0xe187b9
SWIDTH1 0xe185a0 - 0xe186a2 0xe186a8 - 0xe187b9
SWIDTH2 0xe18480 - 0xe18599 0xe1859f

/*
 * U+1200 - U+137F : Ethiopic
 */

GRAPH 0xe18880
- 0xe18886 0xe18888 - 0xe18986 0xe18988 0xe1898a - 0xe1898d
GRAPH 0xe18990 - 0xe18996 0xe18998 0xe1899a - 0xe1899d 0xe189a0 - 0xe18a86
GRAPH 0xe18a88 0xe18a8a - 0xe18a8d 0xe18a90 - 0xe18aae 0xe18ab0 0xe18ab2 - 0xe18ab5
GRAPH 0xe18ab8 - 0xe18abe 0xe18b80 0xe18b82 - 0xe18b85 0xe18b88 - 0xe18b8e
GRAPH 0xe18b90 - 0xe18b96 0xe18b98 - 0xe18bae 0xe18bb0 - 0xe18c8e 0xe18c90
GRAPH 0xe18c92 - 0xe18c95 0xe18c98 - 0xe18c9e 0xe18ca0 - 0xe18d86 0xe18d88 - 0xe18d9a
GRAPH 0xe18da1 - 0xe18dbc
PUNCT 0xe18da1 - 0xe18da8
PRINT 0xe18880 - 0xe18886 0xe18888 - 0xe18986 0xe18988 0xe1898a - 0xe1898d
PRINT 0xe18990 - 0xe18996 0xe18998 0xe1899a - 0xe1899d 0xe189a0 - 0xe18a86
PRINT 0xe18a88 0xe18a8a - 0xe18a8d 0xe18a90 - 0xe18aae 0xe18ab0 0xe18ab2 - 0xe18ab5
PRINT 0xe18ab8 - 0xe18abe 0xe18b80 0xe18b82 - 0xe18b85 0xe18b88 - 0xe18b8e
PRINT 0xe18b90 - 0xe18b96 0xe18b98 - 0xe18bae 0xe18bb0 - 0xe18c8e 0xe18c90
PRINT 0xe18c92 - 0xe18c95 0xe18c98 - 0xe18c9e 0xe18ca0 - 0xe18d86 0xe18d88 - 0xe18d9a
PRINT 0xe18da1 - 0xe18dbc
SPECIAL 0xe18db2 - 0xe18dbc
PHONOGRAM 0xe18880 - 0xe18886 0xe18888 - 0xe18986 0xe18988 0xe1898a - 0xe1898d
PHONOGRAM 0xe18990 - 0xe18996 0xe18998 0xe1899a - 0xe1899d 0xe189a0 - 0xe18a86
PHONOGRAM 0xe18a88 0xe18a8a - 0xe18a8d 0xe18a90 - 0xe18aae 0xe18ab0 0xe18ab2 - 0xe18ab5
PHONOGRAM 0xe18ab8 - 0xe18abe 0xe18b80 0xe18b82 - 0xe18b85 0xe18b88 - 0xe18b8e
PHONOGRAM 0xe18b90 - 0xe18b96 0xe18b98 - 0xe18bae 0xe18bb0 - 0xe18c8e 0xe18c90
PHONOGRAM 0xe18c92 - 0xe18c95 0xe18c98 - 0xe18c9e 0xe18ca0 - 0xe18d86 0xe18d88 - 0xe18d9a
SWIDTH1 0xe18880 - 0xe18886 0xe18888 - 0xe18986 0xe18988 0xe1898a - 0xe1898d
SWIDTH1 0xe18990 - 0xe18996 0xe18998 0xe1899a - 0xe1899d 0xe189a0 - 0xe18a86
SWIDTH1 0xe18a88 0xe18a8a - 0xe18a8d 0xe18a90 - 0xe18aae 0xe18ab0 0xe18ab2 - 0xe18ab5
SWIDTH1 0xe18ab8 - 0xe18abe 0xe18b80 0xe18b82 - 0xe18b85 0xe18b88 - 0xe18b8e
SWIDTH1 0xe18b90 - 0xe18b96 0xe18b98 - 0xe18bae 0xe18bb0 - 0xe18c8e 0xe18c90
SWIDTH1 0xe18c92 - 0xe18c95 0xe18c98 - 0xe18c9e 0xe18ca0 - 0xe18d86 0xe18d88 -
0xe18d9a
SWIDTH1 0xe18da1 - 0xe18dbc

/*
 * U+13A0 - U+13FF : Cherokee
 */

GRAPH 0xe18ea0 - 0xe18fb4
PRINT 0xe18ea0 - 0xe18fb4
SWIDTH1 0xe18ea0 - 0xe18fb4

/*
 * U+1400 - U+167F : Unified Canadian Aboriginal Syllabics
 */

GRAPH 0xe19081 - 0xe199b6
PUNCT 0xe199ad 0xe199ae
PRINT 0xe19081 - 0xe199b6
PHONOGRAM 0xe19081 - 0xe199ac 0xe199af - 0xe199b6
SWIDTH1 0xe19081 - 0xe199b6

/*
 * U+1680 - U+169F : Ogham
 */

GRAPH 0xe19a81 - 0xe19a9c
PUNCT 0xe19a9b 0xe19a9c
SPACE 0xe19a80
BLANK 0xe19a80
PRINT 0xe19a80 - 0xe19a9c
SWIDTH1 0xe19a80 - 0xe19a9c

/*
 * U+16A0 - U+16FF : Runic
 */

GRAPH 0xe19aa0 - 0xe19bb0
PUNCT 0xe19bab - 0xe19bad
PRINT 0xe19aa0 - 0xe19bb0
SPECIAL 0xe19bae - 0xe19bb0
SWIDTH1 0xe19aa0 - 0xe19bb0

/*
 * U+1700 - U+171F : Tagalog
 */

GRAPH 0xe19c80 - 0xe19c8c 0xe19c8e - 0xe19c94
PRINT 0xe19c80 - 0xe19c8c 0xe19c8e - 0xe19c94
SWIDTH1 0xe19c80 - 0xe19c8c 0xe19c8e - 0xe19c94

/*
 * U+1720 - U+173F : Hanunoo
 */

GRAPH 0xe19ca0 - 0xe19cb6
PUNCT 0xe19cb5 0xe19cb6
PRINT 0xe19ca0 - 0xe19cb6
SWIDTH1 0xe19ca0 - 0xe19cb6

/*
 * U+1740 - U+175F : Buhid
 */

GRAPH 0xe19d80 - 0xe19d93
PRINT 0xe19d80 - 0xe19d93
SWIDTH1 0xe19d80 - 0xe19d93

/*
 * U+1760 - U+177F : Tagbanwa
 */

GRAPH 0xe19da0 - 0xe19dac 0xe19dae - 0xe19db0 0xe19db2 0xe19db3
PRINT 0xe19da0 - 0xe19dac 0xe19dae - 0xe19db0 0xe19db2 0xe19db3
SWIDTH1 0xe19da0 - 0xe19dac 0xe19dae - 0xe19db0 0xe19db2 0xe19db3

/*
 * U+1780 - U+17FF : Khmer
 */

GRAPH 0xe19e80 - 0xe19f9c 0xe19fa0 - 0xe19fa9
PUNCT 0xe19f94 - 0xe19f96 0xe19f98 - 0xe19f9b
PRINT 0xe19e80 - 0xe19f9c 0xe19fa0 - 0xe19fa9
SWIDTH1 0xe19e80 - 0xe19f9c 0xe19fa0 - 0xe19fa9

/*
 * U+1800 - U+18AF : Mongolian
 */

CONTROL 0xe1a08e
GRAPH 0xe1a080 - 0xe1a08d 0xe1a090 - 0xe1a099 0xe1a0a0 - 0xe1a1b7 0xe1a280 - 0xe1a2a9
PUNCT 0xe1a080 - 0xe1a08a
PRINT 0xe1a080 - 0xe1a08d 0xe1a090 - 0xe1a099 0xe1a0a0 - 0xe1a1b7 0xe1a280 - 0xe1a2a9
SWIDTH1 0xe1a080 - 0xe1a08d 0xe1a090 - 0xe1a099 0xe1a0a0 - 0xe1a1b7 0xe1a280 - 0xe1a2a9

/*
 * U+1DC0 - U+1DFF : Combining Diacritical Marks Supplement
 */

GRAPH 0xe1b780 -
0xe1b783
PRINT 0xe1b780 - 0xe1b783
SWIDTH0 0xe1b780 - 0xe1b783

/*
 * U+1E00 - U+1EFF : Latin Extended Additional
 */

ALPHA 0xe1b880 - 0xe1ba9b 0xe1baa0 - 0xe1bbb9
GRAPH 0xe1b880 - 0xe1ba9b 0xe1baa0 - 0xe1bbb9
LOWER 0xe1b881 0xe1b883 0xe1b885 0xe1b887 0xe1b889 0xe1b88b 0xe1b88d
LOWER 0xe1b88f 0xe1b891 0xe1b893 0xe1b895 0xe1b897 0xe1b899 0xe1b89b
LOWER 0xe1b89d 0xe1b89f 0xe1b8a1 0xe1b8a3 0xe1b8a5 0xe1b8a7 0xe1b8a9
LOWER 0xe1b8ab 0xe1b8ad 0xe1b8af 0xe1b8b1 0xe1b8b3 0xe1b8b5 0xe1b8b7
LOWER 0xe1b8b9 0xe1b8bb 0xe1b8bd 0xe1b8bf 0xe1b981 0xe1b983 0xe1b985
LOWER 0xe1b987 0xe1b989 0xe1b98b 0xe1b98d 0xe1b98f 0xe1b991 0xe1b993
LOWER 0xe1b995 0xe1b997 0xe1b999 0xe1b99b 0xe1b99d 0xe1b99f 0xe1b9a1
LOWER 0xe1b9a3 0xe1b9a5 0xe1b9a7 0xe1b9a9 0xe1b9ab 0xe1b9ad 0xe1b9af
LOWER 0xe1b9b1 0xe1b9b3 0xe1b9b5 0xe1b9b7 0xe1b9b9 0xe1b9bb 0xe1b9bd
LOWER 0xe1b9bf 0xe1ba81 0xe1ba83 0xe1ba85 0xe1ba87 0xe1ba89 0xe1ba8b
LOWER 0xe1ba8d 0xe1ba8f 0xe1ba91 0xe1ba93 0xe1ba95 - 0xe1ba9b 0xe1baa1
LOWER 0xe1baa3 0xe1baa5 0xe1baa7 0xe1baa9 0xe1baab 0xe1baad 0xe1baaf
LOWER 0xe1bab1 0xe1bab3 0xe1bab5 0xe1bab7 0xe1bab9 0xe1babb 0xe1babd
LOWER 0xe1babf 0xe1bb81 0xe1bb83 0xe1bb85 0xe1bb87 0xe1bb89 0xe1bb8b
LOWER 0xe1bb8d 0xe1bb8f 0xe1bb91 0xe1bb93 0xe1bb95 0xe1bb97 0xe1bb99
LOWER 0xe1bb9b 0xe1bb9d 0xe1bb9f 0xe1bba1 0xe1bba3 0xe1bba5 0xe1bba7
LOWER 0xe1bba9 0xe1bbab 0xe1bbad 0xe1bbaf 0xe1bbb1 0xe1bbb3 0xe1bbb5
LOWER 0xe1bbb7 0xe1bbb9
UPPER 0xe1b880 0xe1b882 0xe1b884 0xe1b886 0xe1b888 0xe1b88a 0xe1b88c
UPPER 0xe1b88e 0xe1b890 0xe1b892 0xe1b894 0xe1b896 0xe1b898 0xe1b89a
UPPER 0xe1b89c 0xe1b89e 0xe1b8a0 0xe1b8a2 0xe1b8a4 0xe1b8a6 0xe1b8a8
UPPER 0xe1b8aa 0xe1b8ac 0xe1b8ae 0xe1b8b0 0xe1b8b2 0xe1b8b4 0xe1b8b6
UPPER 0xe1b8b8 0xe1b8ba 0xe1b8bc 0xe1b8be 0xe1b980 0xe1b982 0xe1b984
UPPER 0xe1b986 0xe1b988 0xe1b98a 0xe1b98c 0xe1b98e 0xe1b990 0xe1b992
UPPER 0xe1b994 0xe1b996 0xe1b998 0xe1b99a 0xe1b99c 0xe1b99e 0xe1b9a0
UPPER 0xe1b9a2 0xe1b9a4 0xe1b9a6 0xe1b9a8 0xe1b9aa 0xe1b9ac 0xe1b9ae
UPPER 0xe1b9b0 0xe1b9b2 0xe1b9b4 0xe1b9b6
0xe1b9b8 0xe1b9ba 0xe1b9bc
UPPER 0xe1b9be 0xe1ba80 0xe1ba82 0xe1ba84 0xe1ba86 0xe1ba88 0xe1ba8a
UPPER 0xe1ba8c 0xe1ba8e 0xe1ba90 0xe1ba92 0xe1ba94 0xe1baa0 0xe1baa2
UPPER 0xe1baa4 0xe1baa6 0xe1baa8 0xe1baaa 0xe1baac 0xe1baae 0xe1bab0
UPPER 0xe1bab2 0xe1bab4 0xe1bab6 0xe1bab8 0xe1baba 0xe1babc 0xe1babe
UPPER 0xe1bb80 0xe1bb82 0xe1bb84 0xe1bb86 0xe1bb88 0xe1bb8a 0xe1bb8c
UPPER 0xe1bb8e 0xe1bb90 0xe1bb92 0xe1bb94 0xe1bb96 0xe1bb98 0xe1bb9a
UPPER 0xe1bb9c 0xe1bb9e 0xe1bba0 0xe1bba2 0xe1bba4 0xe1bba6 0xe1bba8
UPPER 0xe1bbaa 0xe1bbac 0xe1bbae 0xe1bbb0 0xe1bbb2 0xe1bbb4 0xe1bbb6
UPPER 0xe1bbb8
PRINT 0xe1b880 - 0xe1ba9b 0xe1baa0 - 0xe1bbb9
SWIDTH1 0xe1b880 - 0xe1ba9b 0xe1baa0 - 0xe1bbb9
MAPUPPER < 0xe1b881 0xe1b880 >
MAPUPPER < 0xe1b883 0xe1b882 >
MAPUPPER < 0xe1b885 0xe1b884 >
MAPUPPER < 0xe1b887 0xe1b886 >
MAPUPPER < 0xe1b889 0xe1b888 >
MAPUPPER < 0xe1b88b 0xe1b88a >
MAPUPPER < 0xe1b88d 0xe1b88c >
MAPUPPER < 0xe1b88f 0xe1b88e >
MAPUPPER < 0xe1b891 0xe1b890 >
MAPUPPER < 0xe1b893 0xe1b892 >
MAPUPPER < 0xe1b895 0xe1b894 >
MAPUPPER < 0xe1b897 0xe1b896 >
MAPUPPER < 0xe1b899 0xe1b898 >
MAPUPPER < 0xe1b89b 0xe1b89a >
MAPUPPER < 0xe1b89d 0xe1b89c >
MAPUPPER < 0xe1b89f 0xe1b89e >
MAPUPPER < 0xe1b8a1 0xe1b8a0 >
MAPUPPER < 0xe1b8a3 0xe1b8a2 >
MAPUPPER < 0xe1b8a5 0xe1b8a4 >
MAPUPPER < 0xe1b8a7 0xe1b8a6 >
MAPUPPER < 0xe1b8a9 0xe1b8a8 >
MAPUPPER < 0xe1b8ab 0xe1b8aa >
MAPUPPER < 0xe1b8ad 0xe1b8ac >
MAPUPPER < 0xe1b8af 0xe1b8ae >
MAPUPPER < 0xe1b8b1 0xe1b8b0 >
MAPUPPER < 0xe1b8b3 0xe1b8b2 >
MAPUPPER < 0xe1b8b5 0xe1b8b4 >
MAPUPPER < 0xe1b8b7 0xe1b8b6 >
MAPUPPER < 0xe1b8b9 0xe1b8b8 >
MAPUPPER < 0xe1b8bb 0xe1b8ba >
MAPUPPER < 0xe1b8bd 0xe1b8bc >
MAPUPPER < 0xe1b8bf 0xe1b8be >
MAPUPPER < 0xe1b981 0xe1b980 >
MAPUPPER < 0xe1b983 0xe1b982 >
MAPUPPER < 0xe1b985 0xe1b984 >
MAPUPPER < 0xe1b987 0xe1b986 >
MAPUPPER < 0xe1b989 0xe1b988 >
MAPUPPER < 0xe1b98b 0xe1b98a >
MAPUPPER < 0xe1b98d 0xe1b98c >
MAPUPPER < 0xe1b98f 0xe1b98e >
MAPUPPER < 0xe1b991 0xe1b990 >
MAPUPPER < 0xe1b993 0xe1b992 >
MAPUPPER
< 0xe1b995 0xe1b994 >
MAPUPPER < 0xe1b997 0xe1b996 >
MAPUPPER < 0xe1b999 0xe1b998 >
MAPUPPER < 0xe1b99b 0xe1b99a >
MAPUPPER < 0xe1b99d 0xe1b99c >
MAPUPPER < 0xe1b99f 0xe1b99e >
MAPUPPER < 0xe1b9a1 0xe1b9a0 >
MAPUPPER < 0xe1b9a3 0xe1b9a2 >
MAPUPPER < 0xe1b9a5 0xe1b9a4 >
MAPUPPER < 0xe1b9a7 0xe1b9a6 >
MAPUPPER < 0xe1b9a9 0xe1b9a8 >
MAPUPPER < 0xe1b9ab 0xe1b9aa >
MAPUPPER < 0xe1b9ad 0xe1b9ac >
MAPUPPER < 0xe1b9af 0xe1b9ae >
MAPUPPER < 0xe1b9b1 0xe1b9b0 >
MAPUPPER < 0xe1b9b3 0xe1b9b2 >
MAPUPPER < 0xe1b9b5 0xe1b9b4 >
MAPUPPER < 0xe1b9b7 0xe1b9b6 >
MAPUPPER < 0xe1b9b9 0xe1b9b8 >
MAPUPPER < 0xe1b9bb 0xe1b9ba >
MAPUPPER < 0xe1b9bd 0xe1b9bc >
MAPUPPER < 0xe1b9bf 0xe1b9be >
MAPUPPER < 0xe1ba81 0xe1ba80 >
MAPUPPER < 0xe1ba83 0xe1ba82 >
MAPUPPER < 0xe1ba85 0xe1ba84 >
MAPUPPER < 0xe1ba87 0xe1ba86 >
MAPUPPER < 0xe1ba89 0xe1ba88 >
MAPUPPER < 0xe1ba8b 0xe1ba8a >
MAPUPPER < 0xe1ba8d 0xe1ba8c >
MAPUPPER < 0xe1ba8f 0xe1ba8e >
MAPUPPER < 0xe1ba91 0xe1ba90 >
MAPUPPER < 0xe1ba93 0xe1ba92 >
MAPUPPER < 0xe1ba95 0xe1ba94 >
MAPUPPER < 0xe1ba9b 0xe1b9a0 >
MAPUPPER < 0xe1baa1 0xe1baa0 >
MAPUPPER < 0xe1baa3 0xe1baa2 >
MAPUPPER < 0xe1baa5 0xe1baa4 >
MAPUPPER < 0xe1baa7 0xe1baa6 >
MAPUPPER < 0xe1baa9 0xe1baa8 >
MAPUPPER < 0xe1baab 0xe1baaa >
MAPUPPER < 0xe1baad 0xe1baac >
MAPUPPER < 0xe1baaf 0xe1baae >
MAPUPPER < 0xe1bab1 0xe1bab0 >
MAPUPPER < 0xe1bab3 0xe1bab2 >
MAPUPPER < 0xe1bab5 0xe1bab4 >
MAPUPPER < 0xe1bab7 0xe1bab6 >
MAPUPPER < 0xe1bab9 0xe1bab8 >
MAPUPPER < 0xe1babb 0xe1baba >
MAPUPPER < 0xe1babd 0xe1babc >
MAPUPPER < 0xe1babf 0xe1babe >
MAPUPPER < 0xe1bb81 0xe1bb80 >
MAPUPPER < 0xe1bb83 0xe1bb82 >
MAPUPPER < 0xe1bb85 0xe1bb84 >
MAPUPPER < 0xe1bb87 0xe1bb86 >
MAPUPPER < 0xe1bb89 0xe1bb88 >
MAPUPPER < 0xe1bb8b 0xe1bb8a >
MAPUPPER < 0xe1bb8d 0xe1bb8c >
MAPUPPER < 0xe1bb8f 0xe1bb8e >
MAPUPPER < 0xe1bb91 0xe1bb90 >
MAPUPPER < 0xe1bb93 0xe1bb92 >
MAPUPPER < 0xe1bb95 0xe1bb94 >
MAPUPPER < 0xe1bb97 0xe1bb96 >
MAPUPPER < 0xe1bb99 0xe1bb98 >
MAPUPPER < 0xe1bb9b 0xe1bb9a >
MAPUPPER < 0xe1bb9d
0xe1bb9c >
MAPUPPER < 0xe1bb9f 0xe1bb9e >
MAPUPPER < 0xe1bba1 0xe1bba0 >
MAPUPPER < 0xe1bba3 0xe1bba2 >
MAPUPPER < 0xe1bba5 0xe1bba4 >
MAPUPPER < 0xe1bba7 0xe1bba6 >
MAPUPPER < 0xe1bba9 0xe1bba8 >
MAPUPPER < 0xe1bbab 0xe1bbaa >
MAPUPPER < 0xe1bbad 0xe1bbac >
MAPUPPER < 0xe1bbaf 0xe1bbae >
MAPUPPER < 0xe1bbb1 0xe1bbb0 >
MAPUPPER < 0xe1bbb3 0xe1bbb2 >
MAPUPPER < 0xe1bbb5 0xe1bbb4 >
MAPUPPER < 0xe1bbb7 0xe1bbb6 >
MAPUPPER < 0xe1bbb9 0xe1bbb8 >
MAPLOWER < 0xe1b880 0xe1b881 >
MAPLOWER < 0xe1b882 0xe1b883 >
MAPLOWER < 0xe1b884 0xe1b885 >
MAPLOWER < 0xe1b886 0xe1b887 >
MAPLOWER < 0xe1b888 0xe1b889 >
MAPLOWER < 0xe1b88a 0xe1b88b >
MAPLOWER < 0xe1b88c 0xe1b88d >
MAPLOWER < 0xe1b88e 0xe1b88f >
MAPLOWER < 0xe1b890 0xe1b891 >
MAPLOWER < 0xe1b892 0xe1b893 >
MAPLOWER < 0xe1b894 0xe1b895 >
MAPLOWER < 0xe1b896 0xe1b897 >
MAPLOWER < 0xe1b898 0xe1b899 >
MAPLOWER < 0xe1b89a 0xe1b89b >
MAPLOWER < 0xe1b89c 0xe1b89d >
MAPLOWER < 0xe1b89e 0xe1b89f >
MAPLOWER < 0xe1b8a0 0xe1b8a1 >
MAPLOWER < 0xe1b8a2 0xe1b8a3 >
MAPLOWER < 0xe1b8a4 0xe1b8a5 >
MAPLOWER < 0xe1b8a6 0xe1b8a7 >
MAPLOWER < 0xe1b8a8 0xe1b8a9 >
MAPLOWER < 0xe1b8aa 0xe1b8ab >
MAPLOWER < 0xe1b8ac 0xe1b8ad >
MAPLOWER < 0xe1b8ae 0xe1b8af >
MAPLOWER < 0xe1b8b0 0xe1b8b1 >
MAPLOWER < 0xe1b8b2 0xe1b8b3 >
MAPLOWER < 0xe1b8b4 0xe1b8b5 >
MAPLOWER < 0xe1b8b6 0xe1b8b7 >
MAPLOWER < 0xe1b8b8 0xe1b8b9 >
MAPLOWER < 0xe1b8ba 0xe1b8bb >
MAPLOWER < 0xe1b8bc 0xe1b8bd >
MAPLOWER < 0xe1b8be 0xe1b8bf >
MAPLOWER < 0xe1b980 0xe1b981 >
MAPLOWER < 0xe1b982 0xe1b983 >
MAPLOWER < 0xe1b984 0xe1b985 >
MAPLOWER < 0xe1b986 0xe1b987 >
MAPLOWER < 0xe1b988 0xe1b989 >
MAPLOWER < 0xe1b98a 0xe1b98b >
MAPLOWER < 0xe1b98c 0xe1b98d >
MAPLOWER < 0xe1b98e 0xe1b98f >
MAPLOWER < 0xe1b990 0xe1b991 >
MAPLOWER < 0xe1b992 0xe1b993 >
MAPLOWER < 0xe1b994 0xe1b995 >
MAPLOWER < 0xe1b996 0xe1b997 >
MAPLOWER < 0xe1b998 0xe1b999 >
MAPLOWER < 0xe1b99a 0xe1b99b >
MAPLOWER < 0xe1b99c 0xe1b99d >
MAPLOWER < 0xe1b99e 0xe1b99f >
MAPLOWER < 0xe1b9a0 0xe1b9a1 >
MAPLOWER < 0xe1b9a2 0xe1b9a3 >
MAPLOWER < 0xe1b9a4 0xe1b9a5 >
MAPLOWER < 0xe1b9a6 0xe1b9a7 >
MAPLOWER < 0xe1b9a8 0xe1b9a9 >
MAPLOWER < 0xe1b9aa 0xe1b9ab >
MAPLOWER < 0xe1b9ac 0xe1b9ad >
MAPLOWER < 0xe1b9ae 0xe1b9af >
MAPLOWER < 0xe1b9b0 0xe1b9b1 >
MAPLOWER < 0xe1b9b2 0xe1b9b3 >
MAPLOWER < 0xe1b9b4 0xe1b9b5 >
MAPLOWER < 0xe1b9b6 0xe1b9b7 >
MAPLOWER < 0xe1b9b8 0xe1b9b9 >
MAPLOWER < 0xe1b9ba 0xe1b9bb >
MAPLOWER < 0xe1b9bc 0xe1b9bd >
MAPLOWER < 0xe1b9be 0xe1b9bf >
MAPLOWER < 0xe1ba80 0xe1ba81 >
MAPLOWER < 0xe1ba82 0xe1ba83 >
MAPLOWER < 0xe1ba84 0xe1ba85 >
MAPLOWER < 0xe1ba86 0xe1ba87 >
MAPLOWER < 0xe1ba88 0xe1ba89 >
MAPLOWER < 0xe1ba8a 0xe1ba8b >
MAPLOWER < 0xe1ba8c 0xe1ba8d >
MAPLOWER < 0xe1ba8e 0xe1ba8f >
MAPLOWER < 0xe1ba90 0xe1ba91 >
MAPLOWER < 0xe1ba92 0xe1ba93 >
MAPLOWER < 0xe1ba94 0xe1ba95 >
MAPLOWER < 0xe1baa0 0xe1baa1 >
MAPLOWER < 0xe1baa2 0xe1baa3 >
MAPLOWER < 0xe1baa4 0xe1baa5 >
MAPLOWER < 0xe1baa6 0xe1baa7 >
MAPLOWER < 0xe1baa8 0xe1baa9 >
MAPLOWER < 0xe1baaa 0xe1baab >
MAPLOWER < 0xe1baac 0xe1baad >
MAPLOWER < 0xe1baae 0xe1baaf >
MAPLOWER < 0xe1bab0 0xe1bab1 >
MAPLOWER < 0xe1bab2 0xe1bab3 >
MAPLOWER < 0xe1bab4 0xe1bab5 >
MAPLOWER < 0xe1bab6 0xe1bab7 >
MAPLOWER < 0xe1bab8 0xe1bab9 >
MAPLOWER < 0xe1baba 0xe1babb >
MAPLOWER < 0xe1babc 0xe1babd >
MAPLOWER < 0xe1babe 0xe1babf >
MAPLOWER < 0xe1bb80 0xe1bb81 >
MAPLOWER < 0xe1bb82 0xe1bb83 >
MAPLOWER < 0xe1bb84 0xe1bb85 >
MAPLOWER < 0xe1bb86 0xe1bb87 >
MAPLOWER < 0xe1bb88 0xe1bb89 >
MAPLOWER < 0xe1bb8a 0xe1bb8b >
MAPLOWER < 0xe1bb8c 0xe1bb8d >
MAPLOWER < 0xe1bb8e 0xe1bb8f >
MAPLOWER < 0xe1bb90 0xe1bb91 >
MAPLOWER < 0xe1bb92 0xe1bb93 >
MAPLOWER < 0xe1bb94 0xe1bb95 >
MAPLOWER < 0xe1bb96 0xe1bb97 >
MAPLOWER < 0xe1bb98 0xe1bb99 >
MAPLOWER < 0xe1bb9a 0xe1bb9b >
MAPLOWER < 0xe1bb9c 0xe1bb9d >
MAPLOWER < 0xe1bb9e 0xe1bb9f >
MAPLOWER < 0xe1bba0 0xe1bba1 >
MAPLOWER < 0xe1bba2 0xe1bba3 >
MAPLOWER < 0xe1bba4 0xe1bba5 >
MAPLOWER < 0xe1bba6 0xe1bba7 >
MAPLOWER < 0xe1bba8 0xe1bba9 >
MAPLOWER < 0xe1bbaa 0xe1bbab >
MAPLOWER < 0xe1bbac 0xe1bbad >
MAPLOWER <
0xe1bbae 0xe1bbaf >
MAPLOWER < 0xe1bbb0 0xe1bbb1 >
MAPLOWER < 0xe1bbb2 0xe1bbb3 >
MAPLOWER < 0xe1bbb4 0xe1bbb5 >
MAPLOWER < 0xe1bbb6 0xe1bbb7 >
MAPLOWER < 0xe1bbb8 0xe1bbb9 >

/*
 * U+1F00 - U+1FFF : Greek Extended
 */

ALPHA 0xe1bc80 - 0xe1bc95 0xe1bc98 - 0xe1bc9d 0xe1bca0 - 0xe1bd85 0xe1bd88 - 0xe1bd8d
ALPHA 0xe1bd90 - 0xe1bd97 0xe1bd99 0xe1bd9b 0xe1bd9d 0xe1bd9f - 0xe1bdbd
ALPHA 0xe1be80 - 0xe1beb4 0xe1beb6 - 0xe1bebc 0xe1bebe 0xe1bf82 - 0xe1bf84
ALPHA 0xe1bf86 - 0xe1bf8c 0xe1bf90 - 0xe1bf93 0xe1bf96 - 0xe1bf9b 0xe1bfa0 - 0xe1bfac
ALPHA 0xe1bfb2 - 0xe1bfb4 0xe1bfb6 - 0xe1bfbc
GRAPH 0xe1bc80 - 0xe1bc95 0xe1bc98 - 0xe1bc9d 0xe1bca0 - 0xe1bd85 0xe1bd88 - 0xe1bd8d
GRAPH 0xe1bd90 - 0xe1bd97 0xe1bd99 0xe1bd9b 0xe1bd9d 0xe1bd9f - 0xe1bdbd
GRAPH 0xe1be80 - 0xe1beb4 0xe1beb6 - 0xe1bf84 0xe1bf86 - 0xe1bf93 0xe1bf96 - 0xe1bf9b
GRAPH 0xe1bf9d - 0xe1bfaf 0xe1bfb2 - 0xe1bfb4 0xe1bfb6 - 0xe1bfbe
LOWER 0xe1bc80 - 0xe1bc87 0xe1bc90 - 0xe1bc95 0xe1bca0 - 0xe1bca7 0xe1bcb0 - 0xe1bcb7
LOWER 0xe1bd80 - 0xe1bd85 0xe1bd90 - 0xe1bd97 0xe1bda0 - 0xe1bda7 0xe1bdb0 - 0xe1bdbd
LOWER 0xe1be80 - 0xe1be87 0xe1be90 - 0xe1be97 0xe1bea0 - 0xe1bea7 0xe1beb0 - 0xe1beb4
LOWER 0xe1beb6 0xe1beb7 0xe1bebe 0xe1bf82 - 0xe1bf84 0xe1bf86 0xe1bf87
LOWER 0xe1bf90 - 0xe1bf93 0xe1bf96 0xe1bf97 0xe1bfa0 - 0xe1bfa7 0xe1bfb2 - 0xe1bfb4
LOWER 0xe1bfb6 0xe1bfb7
PUNCT 0xe1bebd 0xe1bebf - 0xe1bf81 0xe1bf8d - 0xe1bf8f 0xe1bf9d - 0xe1bf9f
PUNCT 0xe1bfad - 0xe1bfaf 0xe1bfbd 0xe1bfbe
UPPER 0xe1bc88 - 0xe1bc8f 0xe1bc98 - 0xe1bc9d 0xe1bca8 - 0xe1bcaf 0xe1bcb8 - 0xe1bcbf
UPPER 0xe1bd88 - 0xe1bd8d 0xe1bd99 0xe1bd9b 0xe1bd9d 0xe1bd9f 0xe1bda8 - 0xe1bdaf
UPPER 0xe1beb8 - 0xe1bebb 0xe1bf88 - 0xe1bf8b 0xe1bf98 - 0xe1bf9b 0xe1bfa8 - 0xe1bfac
UPPER 0xe1bfb8 - 0xe1bfbb
PRINT 0xe1bc80 - 0xe1bc95 0xe1bc98 - 0xe1bc9d 0xe1bca0 - 0xe1bd85 0xe1bd88 - 0xe1bd8d
PRINT 0xe1bd90 - 0xe1bd97 0xe1bd99 0xe1bd9b 0xe1bd9d 0xe1bd9f - 0xe1bdbd
PRINT 0xe1be80 - 0xe1beb4 0xe1beb6 - 0xe1bf84 0xe1bf86 - 0xe1bf93 0xe1bf96 - 0xe1bf9b
PRINT 0xe1bf9d - 0xe1bfaf
0xe1bfb2 - 0xe1bfb4 0xe1bfb6 - 0xe1bfbe
SWIDTH1 0xe1bc80 - 0xe1bc95 0xe1bc98 - 0xe1bc9d 0xe1bca0 - 0xe1bd85 0xe1bd88 - 0xe1bd8d
SWIDTH1 0xe1bd90 - 0xe1bd97 0xe1bd99 0xe1bd9b 0xe1bd9d 0xe1bd9f - 0xe1bdbd
SWIDTH1 0xe1be80 - 0xe1beb4 0xe1beb6 - 0xe1bf84 0xe1bf86 - 0xe1bf93 0xe1bf96 - 0xe1bf9b
SWIDTH1 0xe1bf9d - 0xe1bfaf 0xe1bfb2 - 0xe1bfb4 0xe1bfb6 - 0xe1bfbe
MAPUPPER < 0xe1bc80 - 0xe1bc87 : 0xe1bc88 >
MAPUPPER < 0xe1bc90 - 0xe1bc95 : 0xe1bc98 >
MAPUPPER < 0xe1bca0 - 0xe1bca7 : 0xe1bca8 >
MAPUPPER < 0xe1bcb0 - 0xe1bcb7 : 0xe1bcb8 >
MAPUPPER < 0xe1bd80 - 0xe1bd85 : 0xe1bd88 >
MAPUPPER < 0xe1bd91 0xe1bd99 >
MAPUPPER < 0xe1bd93 0xe1bd9b >
MAPUPPER < 0xe1bd95 0xe1bd9d >
MAPUPPER < 0xe1bd97 0xe1bd9f >
MAPUPPER < 0xe1bda0 - 0xe1bda7 : 0xe1bda8 >
MAPUPPER < 0xe1bdb0 - 0xe1bdb1 : 0xe1beba >
MAPUPPER < 0xe1bdb2 - 0xe1bdb5 : 0xe1bf88 >
MAPUPPER < 0xe1bdb6 - 0xe1bdb7 : 0xe1bf9a >
MAPUPPER < 0xe1bdb8 - 0xe1bdb9 : 0xe1bfb8 >
MAPUPPER < 0xe1bdba - 0xe1bdbb : 0xe1bfaa >
MAPUPPER < 0xe1bdbc - 0xe1bdbd : 0xe1bfba >
MAPUPPER < 0xe1be80 - 0xe1be87 : 0xe1be88 >
MAPUPPER < 0xe1be90 - 0xe1be97 : 0xe1be98 >
MAPUPPER < 0xe1bea0 - 0xe1bea7 : 0xe1bea8 >
MAPUPPER < 0xe1beb0 - 0xe1beb1 : 0xe1beb8 >
MAPUPPER < 0xe1beb3 0xe1bebc >
MAPUPPER < 0xe1bebe 0xce99 >
MAPUPPER < 0xe1bf83 0xe1bf8c >
MAPUPPER < 0xe1bf90 - 0xe1bf91 : 0xe1bf98 >
MAPUPPER < 0xe1bfa0 - 0xe1bfa1 : 0xe1bfa8 >
MAPUPPER < 0xe1bfa5 0xe1bfac >
MAPUPPER < 0xe1bfb3 0xe1bfbc >
MAPLOWER < 0xe1bc88 - 0xe1bc8f : 0xe1bc80 >
MAPLOWER < 0xe1bc98 - 0xe1bc9d : 0xe1bc90 >
MAPLOWER < 0xe1bca8 - 0xe1bcaf : 0xe1bca0 >
MAPLOWER < 0xe1bcb8 - 0xe1bcbf : 0xe1bcb0 >
MAPLOWER < 0xe1bd88 - 0xe1bd8d : 0xe1bd80 >
MAPLOWER < 0xe1bd99 0xe1bd91 >
MAPLOWER < 0xe1bd9b 0xe1bd93 >
MAPLOWER < 0xe1bd9d 0xe1bd95 >
MAPLOWER < 0xe1bd9f 0xe1bd97 >
MAPLOWER < 0xe1bda8 - 0xe1bdaf : 0xe1bda0 >
MAPLOWER < 0xe1be88 - 0xe1be8f : 0xe1be80 >
MAPLOWER < 0xe1be98 - 0xe1be9f : 0xe1be90 >
MAPLOWER < 0xe1bea8 - 0xe1beaf : 0xe1bea0 >
MAPLOWER < 0xe1beb8 - 0xe1beb9 : 0xe1beb0 >
MAPLOWER < 0xe1beba - 0xe1bebb : 0xe1bdb0 >
MAPLOWER < 0xe1bebc 0xe1beb3 >
MAPLOWER < 0xe1bf88 - 0xe1bf8b : 0xe1bdb2 >
MAPLOWER < 0xe1bf8c 0xe1bf83 >
MAPLOWER < 0xe1bf98 - 0xe1bf99 : 0xe1bf90 >
MAPLOWER < 0xe1bf9a - 0xe1bf9b : 0xe1bdb6 >
MAPLOWER < 0xe1bfa8 - 0xe1bfa9 : 0xe1bfa0 >
MAPLOWER < 0xe1bfaa - 0xe1bfab : 0xe1bdba >
MAPLOWER < 0xe1bfac 0xe1bfa5 >
MAPLOWER < 0xe1bfb8 - 0xe1bfb9 : 0xe1bdb8 >
MAPLOWER < 0xe1bfba - 0xe1bfbb : 0xe1bdbc >
MAPLOWER < 0xe1bfbc 0xe1bfb3 >

/*
 * U+2000 - U+206F : General Punctuation
 */

CONTROL 0xe2808c - 0xe2808f 0xe280aa - 0xe280ae 0xe281a0 - 0xe281a3 0xe281aa - 0xe281af
GRAPH 0xe28090 - 0xe280a7 0xe280b0 - 0xe28192 0xe28197
PUNCT 0xe28090 - 0xe280a7 0xe280b0 - 0xe28192 0xe28197
SPACE 0xe28080 - 0xe2808b 0xe280a8 0xe280a9 0xe280af 0xe2819f
BLANK 0xe28080 - 0xe2808b 0xe280af 0xe2819f
PRINT 0xe28080 - 0xe2808b 0xe28090 - 0xe280a9 0xe280af - 0xe28192 0xe28197
PRINT 0xe2819f
SWIDTH1 0xe28080 - 0xe2808b 0xe28090 - 0xe280a9 0xe280af - 0xe28192 0xe28197
SWIDTH1 0xe2819f

/*
 * U+2070 - U+209F : Superscripts and Subscripts
 */

ALPHA 0xe281b1 0xe281bf
GRAPH 0xe281b0 0xe281b1 0xe281b4 - 0xe2828e
LOWER 0xe281b1 0xe281bf
PUNCT 0xe281ba - 0xe281be 0xe2828a - 0xe2828e
PRINT 0xe281b0 0xe281b1 0xe281b4 - 0xe2828e
SPECIAL 0xe281b0 0xe281b4 - 0xe281b9 0xe28280 - 0xe28289
SWIDTH1 0xe281b0 0xe281b1 0xe281b4 - 0xe2828e

/*
 * U+20A0 - U+20CF : Currency Symbols
 */

GRAPH 0xe282a0 - 0xe282b1
PUNCT 0xe282a0 - 0xe282b1
PRINT 0xe282a0 - 0xe282b1
SWIDTH1 0xe282a0 - 0xe282b1

/*
 * U+20D0 - U+20FF : Combining Diacritical Marks for Symbols
 */

GRAPH 0xe28390 - 0xe283ab
PRINT 0xe28390 - 0xe283ab
SWIDTH0 0xe28390 - 0xe283ab

/*
 * U+2100 - U+214F : Letterlike Symbols
 */

ALPHA 0xe28482 0xe28487 0xe2848a - 0xe28493 0xe28495 0xe28499 - 0xe2849d
ALPHA 0xe284a4 0xe284a6 0xe284a8 0xe284aa - 0xe284ad 0xe284af - 0xe284b1
ALPHA 0xe284b3 0xe284b4 0xe284b9 0xe284bd - 0xe284bf 0xe28585 - 0xe28589
GRAPH 0xe28480 - 0xe284ba 0xe284bd - 0xe2858b
LOWER 0xe2848a 0xe2848e 0xe2848f 0xe28493
0xe284af 0xe284b4 0xe284b9
LOWER 0xe284bd 0xe28586 - 0xe28589
PUNCT 0xe28480 0xe28481 0xe28483 - 0xe28486 0xe28488 0xe28489 0xe28494
PUNCT 0xe28496 - 0xe28498 0xe2849e - 0xe284a3 0xe284a5 0xe284a7 0xe284a9
PUNCT 0xe284ae 0xe284b2 0xe284ba 0xe28580 - 0xe28584 0xe2858a 0xe2858b
UPPER 0xe28482 0xe28487 0xe2848b - 0xe2848d 0xe28490 - 0xe28492 0xe28495
UPPER 0xe28499 - 0xe2849d 0xe284a4 0xe284a6 0xe284a8 0xe284aa - 0xe284ad
UPPER 0xe284b0 0xe284b1 0xe284b3 0xe284be 0xe284bf 0xe28585
PRINT 0xe28480 - 0xe284ba 0xe284bd - 0xe2858b
SWIDTH1 0xe28480 - 0xe284ba 0xe284bd - 0xe2858b
MAPLOWER < 0xe284a6 0xcf89 >
MAPLOWER < 0xe284aa 'k' >
MAPLOWER < 0xe284ab 0xc3a5 >

/*
 * U+2150 - U+218F : Number Forms
 */

GRAPH 0xe28593 - 0xe28683
PRINT 0xe28593 - 0xe28683
SPECIAL 0xe28593 - 0xe28683
SWIDTH1 0xe28593 - 0xe28683
MAPUPPER < 0xe285b0 - 0xe285bf : 0xe285a0 >
MAPLOWER < 0xe285a0 - 0xe285af : 0xe285b0 >

/*
 * U+2190 - U+21FF : Arrows
 */

GRAPH 0xe28690 - 0xe287bf
PUNCT 0xe28690 - 0xe287bf
PRINT 0xe28690 - 0xe287bf
SWIDTH1 0xe28690 - 0xe287bf

/*
 * U+2200 - U+22FF : Mathematical Operators
 */

GRAPH 0xe28880 - 0xe28bbf
PUNCT 0xe28880 - 0xe28bbf
PRINT 0xe28880 - 0xe28bbf
SWIDTH1 0xe28880 - 0xe28bbf

/*
 * U+2300 - U+23FF : Miscellaneous Technical
 */

GRAPH 0xe28c80 - 0xe28f8e
PUNCT 0xe28c80 - 0xe28f8e
PRINT 0xe28c80 - 0xe28f8e
SWIDTH1 0xe28c80 - 0xe28ca8 0xe28cab - 0xe28f8e
SWIDTH2 0xe28ca9 0xe28caa

/*
 * U+2400 - U+243F : Control Pictures
 */

GRAPH 0xe29080 - 0xe290a6
PUNCT 0xe29080 - 0xe290a6
PRINT 0xe29080 - 0xe290a6
SWIDTH1 0xe29080 - 0xe290a6

/*
 * U+2440 - U+245F : Optical Character Recognition
 */

GRAPH 0xe29180 - 0xe2918a
PUNCT 0xe29180 - 0xe2918a
PRINT 0xe29180 - 0xe2918a
SWIDTH1 0xe29180 - 0xe2918a

/*
 * U+2460 - U+24FF : Enclosed Alphanumerics
 */

GRAPH 0xe291a0 - 0xe293be
PUNCT 0xe2929c - 0xe293a9
PRINT 0xe291a0 - 0xe293be
SPECIAL 0xe291a0 - 0xe2929b 0xe293aa - 0xe293be
SWIDTH1 0xe291a0 - 0xe293be
MAPUPPER < 0xe29390 - 0xe293a9 : 0xe292b6 >
MAPLOWER < 0xe292b6 - 0xe2938f : 0xe29390 >

/* *
U+2500 - U+257F : Box Drawing */ GRAPH 0xe29480 - 0xe295bf PUNCT 0xe29480 - 0xe295bf PRINT 0xe29480 - 0xe295bf SWIDTH1 0xe29480 - 0xe295bf /* * U+2580 - U+259F : Block Elements */ GRAPH 0xe29680 - 0xe2969f PUNCT 0xe29680 - 0xe2969f PRINT 0xe29680 - 0xe2969f SWIDTH1 0xe29680 - 0xe2969f /* * U+25A0 - U+25FF : Geometric Shapes */ GRAPH 0xe296a0 - 0xe297bf PUNCT 0xe296a0 - 0xe297bf PRINT 0xe296a0 - 0xe297bf SWIDTH1 0xe296a0 - 0xe297bf /* * U+2600 - U+26FF : Miscellaneous Symbols */ GRAPH 0xe29880 - 0xe29893 0xe29896 0xe29897 0xe29899 - 0xe299bd 0xe29a80 - 0xe29a89 PUNCT 0xe29880 - 0xe29893 0xe29896 0xe29897 0xe29899 - 0xe299bd 0xe29a80 - 0xe29a89 PRINT 0xe29880 - 0xe29893 0xe29896 0xe29897 0xe29899 - 0xe299bd 0xe29a80 - 0xe29a89 SWIDTH1 0xe29880 - 0xe29893 0xe29896 0xe29897 0xe29899 - 0xe299bd 0xe29a80 - 0xe29a89 /* * U+2700 - U+27BF : Dingbats */ GRAPH 0xe29c81 - 0xe29c84 0xe29c86 - 0xe29c89 0xe29c8c - 0xe29ca7 0xe29ca9 - 0xe29d8b GRAPH 0xe29d8d 0xe29d8f - 0xe29d92 0xe29d96 0xe29d98 - 0xe29d9e 0xe29da1 - 0xe29e94 GRAPH 0xe29e98 - 0xe29eaf 0xe29eb1 - 0xe29ebe PUNCT 0xe29c81 - 0xe29c84 0xe29c86 - 0xe29c89 0xe29c8c - 0xe29ca7 0xe29ca9 - 0xe29d8b PUNCT 0xe29d8d 0xe29d8f - 0xe29d92 0xe29d96 0xe29d98 - 0xe29d9e 0xe29da1 - 0xe29db5 PUNCT 0xe29e94 0xe29e98 - 0xe29eaf 0xe29eb1 - 0xe29ebe PRINT 0xe29c81 - 0xe29c84 0xe29c86 - 0xe29c89 0xe29c8c - 0xe29ca7 0xe29ca9 - 0xe29d8b PRINT 0xe29d8d 0xe29d8f - 0xe29d92 0xe29d96 0xe29d98 - 0xe29d9e 0xe29da1 - 0xe29e94 PRINT 0xe29e98 - 0xe29eaf 0xe29eb1 - 0xe29ebe SPECIAL 0xe29db6 - 0xe29e93 SWIDTH1 0xe29c81 - 0xe29c84 0xe29c86 - 0xe29c89 0xe29c8c - 0xe29ca7 0xe29ca9 - 0xe29d8b SWIDTH1 0xe29d8d 0xe29d8f - 0xe29d92 0xe29d96 0xe29d98 - 0xe29d9e 0xe29da1 - 0xe29e94 SWIDTH1 0xe29e98 - 0xe29eaf 0xe29eb1 - 0xe29ebe /* * U+27C0 - U+27EF : Miscellaneous Mathematical Symbols-A */ GRAPH 0xe29f90 - 0xe29fab PUNCT 0xe29f90 - 0xe29fab PRINT 0xe29f90 - 0xe29fab SWIDTH1 0xe29f90 - 0xe29fab /* * U+27F0 - U+27FF : Supplemental Arrows-A */ GRAPH 0xe29fb0 - 
0xe29fbf PUNCT 0xe29fb0 - 0xe29fbf PRINT 0xe29fb0 - 0xe29fbf SWIDTH1 0xe29fb0 - 0xe29fbf /* * U+2800 - U+28FF : Braille Patterns */ GRAPH 0xe2a080 - 0xe2a3bf PUNCT 0xe2a080 - 0xe2a3bf PRINT 0xe2a080 - 0xe2a3bf SWIDTH1 0xe2a080 - 0xe2a3bf /* * U+2900 - U+297F : Supplemental Arrows-B */ GRAPH 0xe2a480 - 0xe2a5bf PUNCT 0xe2a480 - 0xe2a5bf PRINT 0xe2a480 - 0xe2a5bf SWIDTH1 0xe2a480 - 0xe2a5bf /* * U+2980 - U+29FF : Miscellaneous Mathematical Symbols-B */ GRAPH 0xe2a680 - 0xe2a7bf PUNCT 0xe2a680 - 0xe2a7bf PRINT 0xe2a680 - 0xe2a7bf SWIDTH1 0xe2a680 - 0xe2a7bf /* * U+2A00 - U+2AFF : Supplemental Mathematical Operators */ GRAPH 0xe2a880 - 0xe2abbf PUNCT 0xe2a880 - 0xe2abbf PRINT 0xe2a880 - 0xe2abbf SWIDTH1 0xe2a880 - 0xe2abbf /* * U+2E80 - U+2EFF : CJK Radicals Supplement */ GRAPH 0xe2ba80 - 0xe2ba99 0xe2ba9b - 0xe2bbb3 PUNCT 0xe2ba80 - 0xe2ba99 0xe2ba9b - 0xe2bbb3 PRINT 0xe2ba80 - 0xe2ba99 0xe2ba9b - 0xe2bbb3 SWIDTH2 0xe2ba80 - 0xe2ba99 0xe2ba9b - 0xe2bbb3 /* * U+2F00 - U+2FDF : Kangxi Radicals */ GRAPH 0xe2bc80 - 0xe2bf95 PUNCT 0xe2bc80 - 0xe2bf95 PRINT 0xe2bc80 - 0xe2bf95 SWIDTH2 0xe2bc80 - 0xe2bf95 /* * U+2FF0 - U+2FFF : Ideographic Description Characters */ GRAPH 0xe2bfb0 - 0xe2bfbb PUNCT 0xe2bfb0 - 0xe2bfbb PRINT 0xe2bfb0 - 0xe2bfbb SWIDTH2 0xe2bfb0 - 0xe2bfbb /* * U+3000 - U+303F : CJK Symbols and Punctuation */ GRAPH 0xe38081 - 0xe380bf PUNCT 0xe38081 - 0xe38084 0xe38088 - 0xe380a0 0xe380b0 0xe380b6 0xe380b7 PUNCT 0xe380bd - 0xe380bf SPACE 0xe38080 BLANK 0xe38080 PRINT 0xe38080 - 0xe380bf IDEOGRAM 0xe38086 SPECIAL 0xe38087 0xe380a1 - 0xe380a9 0xe380b8 - 0xe380ba SWIDTH1 0xe380bf SWIDTH2 0xe38080 - 0xe380be /* * U+3040 - U+309F : Hiragana */ GRAPH 0xe38181 - 0xe38296 0xe38299 - 0xe3829f PUNCT 0xe3829b 0xe3829c PRINT 0xe38181 - 0xe38296 0xe38299 - 0xe3829f PHONOGRAM 0xe38181 - 0xe38296 0xe3829f SWIDTH0 0xe38299 - 0xe3829a SWIDTH2 0xe38181 - 0xe38296 0xe3829b - 0xe3829f /* * U+30A0 - U+30FF : Katakana */ GRAPH 0xe382a0 - 0xe383bf PUNCT 0xe382a0 0xe383bb PRINT 
0xe382a0 - 0xe383bf PHONOGRAM 0xe382a1 - 0xe383ba 0xe383bf SWIDTH2 0xe382a0 - 0xe383bf /* * U+3100 - U+312F : Bopomofo */ GRAPH 0xe38485 - 0xe384ac PRINT 0xe38485 - 0xe384ac SWIDTH2 0xe38485 - 0xe384ac /* * U+3130 - U+318F : Hangul Compatibility Jamo */ GRAPH 0xe384b1 - 0xe3868e PRINT 0xe384b1 - 0xe3868e PHONOGRAM 0xe384b1 - 0xe385a3 0xe385a5 - 0xe3868e SWIDTH2 0xe384b1 - 0xe3868e /* * U+3190 - U+319F : Kanbun */ GRAPH 0xe38690 - 0xe3869f PUNCT 0xe38690 0xe38691 0xe38696 - 0xe3869f PRINT 0xe38690 - 0xe3869f SPECIAL 0xe38692 - 0xe38695 SWIDTH2 0xe38690 - 0xe3869f /* * U+31A0 - U+31BF : Bopomofo Extended */ GRAPH 0xe386a0 - 0xe386b7 PRINT 0xe386a0 - 0xe386b7 SWIDTH2 0xe386a0 - 0xe386b7 /* * U+31F0 - U+31FF : Katakana Phonetic Extensions */ GRAPH 0xe387b0 - 0xe387bf PRINT 0xe387b0 - 0xe387bf PHONOGRAM 0xe387b0 - 0xe387bf SWIDTH2 0xe387b0 - 0xe387bf /* * U+3200 - U+32FF : Enclosed CJK Letters and Months */ GRAPH 0xe38880 - 0xe3889c 0xe388a0 - 0xe38983 0xe38991 - 0xe389bb 0xe389bf - 0xe38b8b GRAPH 0xe38b90 - 0xe38bbe PUNCT 0xe38880 - 0xe3889c 0xe388aa - 0xe38983 0xe389a0 - 0xe389bb 0xe389bf PUNCT 0xe38a8a - 0xe38ab0 0xe38b80 - 0xe38b8b 0xe38b90 - 0xe38bbe PRINT 0xe38880 - 0xe3889c 0xe388a0 - 0xe38983 0xe38991 - 0xe389bb 0xe389bf - 0xe38b8b PRINT 0xe38b90 - 0xe38bbe SPECIAL 0xe388a0 - 0xe388a9 0xe38991 - 0xe3899f 0xe38a80 - 0xe38a89 0xe38ab1 - 0xe38abf SWIDTH2 0xe38880 - 0xe3889c 0xe388a0 - 0xe38983 0xe38991 - 0xe389bb 0xe389bf - 0xe38b8b SWIDTH2 0xe38b90 - 0xe38bbe /* * U+3300 - U+33FF : CJK Compatibility */ GRAPH 0xe38c80 - 0xe38db6 0xe38dbb - 0xe38f9d 0xe38fa0 - 0xe38fbe PUNCT 0xe38c80 - 0xe38db6 0xe38dbb - 0xe38f9d 0xe38fa0 - 0xe38fbe PRINT 0xe38c80 - 0xe38db6 0xe38dbb - 0xe38f9d 0xe38fa0 - 0xe38fbe SWIDTH2 0xe38c80 - 0xe38db6 0xe38dbb - 0xe38f9d 0xe38fa0 - 0xe38fbe /* * U+3400 - U+4DBF : CJK Unified Ideographs Extension A */ GRAPH 0xe39080 - 0xe4b6b5 PRINT 0xe39080 - 0xe4b6b5 IDEOGRAM 0xe39080 - 0xe4b6b5 SWIDTH2 0xe39080 - 0xe4b6b5 /* * U+4E00 - U+9FFF : CJK Unified 
Ideographs */ GRAPH 0xe4b880 - 0xe9bea5 PRINT 0xe4b880 - 0xe9bea5 IDEOGRAM 0xe4b880 - 0xe9bea5 SWIDTH2 0xe4b880 - 0xe9bea5 /* * U+A000 - U+A48F : Yi Syllables */ GRAPH 0xea8080 - 0xea928c PRINT 0xea8080 - 0xea928c PHONOGRAM 0xea8080 - 0xea928c SWIDTH2 0xea8080 - 0xea928c /* * U+A490 - U+A4CF : Yi Radicals */ GRAPH 0xea9290 - 0xea9386 PUNCT 0xea9290 - 0xea9386 PRINT 0xea9290 - 0xea9386 SWIDTH2 0xea9290 - 0xea9386 /* * U+AC00 - U+D7AF : Hangul Syllables */ GRAPH 0xeab080 - 0xed9ea3 PRINT 0xeab080 - 0xed9ea3 PHONOGRAM 0xeab080 - 0xed9ea3 SWIDTH2 0xeab080 - 0xed9ea3 /* * U+D800 - U+DB7F : High Surrogates */ PRINT 0xeda080 - 0xedadbf SWIDTH1 0xeda080 - 0xedadbf /* * U+DB80 - U+DBFF : High Private Use Surrogates */ PRINT 0xedae80 - 0xedafbf SWIDTH1 0xedae80 - 0xedafbf /* * U+DC00 - U+DFFF : Low Surrogates */ PRINT 0xedb080 - 0xedbfbf SWIDTH1 0xedb080 - 0xedbfbf /* * U+E000 - U+F8FF : Private Use Area */ GRAPH 0xee8080 - 0xefa3bf PRINT 0xee8080 - 0xefa3bf SWIDTH1 0xee8080 - 0xefa3bf /* * U+F900 - U+FAFF : CJK Compatibility Ideographs */ GRAPH 0xefa480 - 0xefa8ad 0xefa8b0 - 0xefa9aa PRINT 0xefa480 - 0xefa8ad 0xefa8b0 - 0xefa9aa IDEOGRAM 0xefa480 - 0xefa8ad 0xefa8b0 - 0xefa9aa SWIDTH2 0xefa480 - 0xefa8ad 0xefa8b0 - 0xefa9aa /* * U+FB00 - U+FB4F : Alphabetic Presentation Forms */ ALPHA 0xefac80 - 0xefac86 0xefac93 - 0xefac97 GRAPH 0xefac80 - 0xefac86 0xefac93 - 0xefac97 0xefac9d - 0xefacb6 0xefacb8 - 0xefacbc GRAPH 0xefacbe 0xefad80 0xefad81 0xefad83 0xefad84 0xefad86 - 0xefad8f LOWER 0xefac80 - 0xefac86 0xefac93 - 0xefac97 PUNCT 0xefaca9 PRINT 0xefac80 - 0xefac86 0xefac93 - 0xefac97 0xefac9d - 0xefacb6 0xefacb8 - 0xefacbc PRINT 0xefacbe 0xefad80 0xefad81 0xefad83 0xefad84 0xefad86 - 0xefad8f SWIDTH1 0xefac80 - 0xefac86 0xefac93 - 0xefac97 0xefac9d - 0xefacb6 0xefacb8 - 0xefacbc SWIDTH1 0xefacbe 0xefad80 0xefad81 0xefad83 0xefad84 0xefad86 - 0xefad8f /* * U+FB50 - U+FDFF : Arabic Presentation Forms-A */ GRAPH 0xefad90 - 0xefaeb1 0xefaf93 - 0xefb4bf 0xefb590 - 0xefb68f 
0xefb692 - 0xefb787 GRAPH 0xefb7b0 - 0xefb7bc PUNCT 0xefb4be 0xefb4bf 0xefb7bc PRINT 0xefad90 - 0xefaeb1 0xefaf93 - 0xefb4bf 0xefb590 - 0xefb68f 0xefb692 - 0xefb787 PRINT 0xefb7b0 - 0xefb7bc SWIDTH1 0xefad90 - 0xefaeb1 0xefaf93 - 0xefb4bf 0xefb590 - 0xefb68f 0xefb692 - 0xefb787 SWIDTH1 0xefb7b0 - 0xefb7bc /* * U+FE00 - U+FE0F : Variation Selectors */ GRAPH 0xefb880 - 0xefb88f PRINT 0xefb880 - 0xefb88f SWIDTH1 0xefb880 - 0xefb88f /* * U+FE20 - U+FE2F : Combining Half Marks */ GRAPH 0xefb8a0 - 0xefb8a3 PRINT 0xefb8a0 - 0xefb8a3 SWIDTH0 0xefb8a0 - 0xefb8a3 /* * U+FE30 - U+FE4F : CJK Compatibility Forms */ GRAPH 0xefb8b0 - 0xefb986 0xefb989 - 0xefb98f PUNCT 0xefb8b0 - 0xefb986 0xefb989 - 0xefb98f PRINT 0xefb8b0 - 0xefb986 0xefb989 - 0xefb98f SWIDTH2 0xefb8b0 - 0xefb986 0xefb989 - 0xefb98f /* * U+FE50 - U+FE6F : Small Form Variants */ GRAPH 0xefb990 - 0xefb992 0xefb994 - 0xefb9a6 0xefb9a8 - 0xefb9ab PUNCT 0xefb990 - 0xefb992 0xefb994 - 0xefb9a6 0xefb9a8 - 0xefb9ab PRINT 0xefb990 - 0xefb992 0xefb994 - 0xefb9a6 0xefb9a8 - 0xefb9ab SWIDTH2 0xefb990 - 0xefb992 0xefb994 - 0xefb9a6 0xefb9a8 - 0xefb9ab /* * U+FE70 - U+FEFF : Arabic Presentation Forms-B */ CONTROL 0xefbbbf GRAPH 0xefb9b0 - 0xefb9b4 0xefb9b6 - 0xefbbbc PRINT 0xefb9b0 - 0xefb9b4 0xefb9b6 - 0xefbbbc SWIDTH1 0xefb9b0 - 0xefb9b4 0xefb9b6 - 0xefbbbc /* * U+FF00 - U+FFEF : Halfwidth and Fullwidth Forms */ ALPHA 0xefbca1 - 0xefbcba 0xefbd81 - 0xefbd9a GRAPH 0xefbc81 - 0xefbebe 0xefbf82 - 0xefbf87 0xefbf8a - 0xefbf8f 0xefbf92 - 0xefbf97 GRAPH 0xefbf9a - 0xefbf9c 0xefbfa0 - 0xefbfa6 0xefbfa8 - 0xefbfae LOWER 0xefbd81 - 0xefbd9a PUNCT 0xefbc81 - 0xefbc8f 0xefbc9a - 0xefbca0 0xefbcbb - 0xefbd80 0xefbd9b - 0xefbda5 PUNCT 0xefbfa0 - 0xefbfa6 0xefbfa8 - 0xefbfae UPPER 0xefbca1 - 0xefbcba PRINT 0xefbc81 - 0xefbebe 0xefbf82 - 0xefbf87 0xefbf8a - 0xefbf8f 0xefbf92 - 0xefbf97 PRINT 0xefbf9a - 0xefbf9c 0xefbfa0 - 0xefbfa6 0xefbfa8 - 0xefbfae PHONOGRAM 0xefbda6 - 0xefbdaf 0xefbdb1 - 0xefbe9d 0xefbea1 - 0xefbebe 0xefbf82 - 0xefbf87 
PHONOGRAM 0xefbf8a - 0xefbf8f 0xefbf92 - 0xefbf97 0xefbf9a - 0xefbf9c SWIDTH1 0xefbda1 - 0xefbebe 0xefbf82 - 0xefbf87 0xefbf8a - 0xefbf8f 0xefbf92 - 0xefbf97 SWIDTH1 0xefbf9a - 0xefbf9c 0xefbfa8 - 0xefbfae SWIDTH2 0xefbc81 - 0xefbda0 0xefbfa0 - 0xefbfa6 MAPUPPER < 0xefbd81 - 0xefbd9a : 0xefbca1 > MAPLOWER < 0xefbca1 - 0xefbcba : 0xefbd81 > /* * U+FFF0 - U+FFFF : Specials */ CONTROL 0xefbfb9 - 0xefbfbb GRAPH 0xefbfbc 0xefbfbd PUNCT 0xefbfbc 0xefbfbd PRINT 0xefbfbc 0xefbfbd SWIDTH1 0xefbfbc 0xefbfbd /* * U+10300 - U+1032F : Old Italic */ GRAPH 0xf0908c80 - 0xf0908c9e 0xf0908ca0 - 0xf0908ca3 PRINT 0xf0908c80 - 0xf0908c9e 0xf0908ca0 - 0xf0908ca3 SPECIAL 0xf0908ca0 - 0xf0908ca3 SWIDTH1 0xf0908c80 - 0xf0908c9e 0xf0908ca0 - 0xf0908ca3 /* * U+10330 - U+1034F : Gothic */ GRAPH 0xf0908cb0 - 0xf0908d8a PRINT 0xf0908cb0 - 0xf0908d8a SPECIAL 0xf0908d8a SWIDTH1 0xf0908cb0 - 0xf0908d8a /* * U+10400 - U+1044F : Deseret */ ALPHA 0xf0909080 - 0xf09090a5 0xf09090a8 - 0xf090918d GRAPH 0xf0909080 - 0xf09090a5 0xf09090a8 - 0xf090918d LOWER 0xf09090a8 - 0xf090918d UPPER 0xf0909080 - 0xf09090a5 PRINT 0xf0909080 - 0xf09090a5 0xf09090a8 - 0xf090918d SWIDTH1 0xf0909080 - 0xf09090a5 0xf09090a8 - 0xf090918d MAPUPPER < 0xf09090a8 - 0xf090918d : 0xf0909080 > MAPLOWER < 0xf0909080 - 0xf09090a5 : 0xf09090a8 > /* * U+1D000 - U+1D0FF : Byzantine Musical Symbols */ GRAPH 0xf09d8080 - 0xf09d83b5 PUNCT 0xf09d8080 - 0xf09d83b5 PRINT 0xf09d8080 - 0xf09d83b5 SWIDTH1 0xf09d8080 - 0xf09d83b5 /* * U+1D100 - U+1D1FF : Musical Symbols */ CONTROL 0xf09d85b3 - 0xf09d85ba GRAPH 0xf09d8480 - 0xf09d84a6 0xf09d84aa - 0xf09d85b2 0xf09d85bb - 0xf09d879d PUNCT 0xf09d8480 - 0xf09d84a6 0xf09d84aa - 0xf09d85a4 0xf09d85aa - 0xf09d85ac PUNCT 0xf09d8683 0xf09d8684 0xf09d868c - 0xf09d86a9 0xf09d86ae - 0xf09d879d PRINT 0xf09d8480 - 0xf09d84a6 0xf09d84aa - 0xf09d8598 0xf09d859a - 0xf09d85b2 PRINT 0xf09d85bb - 0xf09d879d SWIDTH0 0xf09d85a5 - 0xf09d85a9 0xf09d85ad - 0xf09d85b2 0xf09d85bb - 0xf09d8682 SWIDTH0 0xf09d8685 - 
0xf09d868b 0xf09d86aa - 0xf09d86ad SWIDTH1 0xf09d8480 - 0xf09d84a6 0xf09d84aa - 0xf09d8598 0xf09d859a - 0xf09d85a4 SWIDTH1 0xf09d85aa - 0xf09d85ac 0xf09d8683 0xf09d8684 0xf09d868c - 0xf09d86a9 SWIDTH1 0xf09d86ae - 0xf09d879d /* * U+1D400 - U+1D7FF : Mathematical Alphanumeric Symbols */ ALPHA 0xf09d9080 - 0xf09d9194 0xf09d9196 - 0xf09d929c 0xf09d929e 0xf09d929f ALPHA 0xf09d92a2 0xf09d92a5 0xf09d92a6 0xf09d92a9 - 0xf09d92ac 0xf09d92ae - 0xf09d92b9 ALPHA 0xf09d92bb 0xf09d92bd - 0xf09d9380 0xf09d9382 0xf09d9383 0xf09d9385 - 0xf09d9485 ALPHA 0xf09d9487 - 0xf09d948a 0xf09d948d - 0xf09d9494 0xf09d9496 - 0xf09d949c ALPHA 0xf09d949e - 0xf09d94b9 0xf09d94bb - 0xf09d94be 0xf09d9580 - 0xf09d9584 ALPHA 0xf09d9586 0xf09d958a - 0xf09d9590 0xf09d9592 - 0xf09d9aa3 0xf09d9aa8 - 0xf09d9b80 ALPHA 0xf09d9b82 - 0xf09d9b9a 0xf09d9b9c - 0xf09d9bba 0xf09d9bbc - 0xf09d9c94 ALPHA 0xf09d9c96 - 0xf09d9cb4 0xf09d9cb6 - 0xf09d9d8e 0xf09d9d90 - 0xf09d9dae ALPHA 0xf09d9db0 - 0xf09d9e88 0xf09d9e8a - 0xf09d9ea8 0xf09d9eaa - 0xf09d9f82 ALPHA 0xf09d9f84 - 0xf09d9f89 GRAPH 0xf09d9080 - 0xf09d9194 0xf09d9196 - 0xf09d929c 0xf09d929e 0xf09d929f GRAPH 0xf09d92a2 0xf09d92a5 0xf09d92a6 0xf09d92a9 - 0xf09d92ac 0xf09d92ae - 0xf09d92b9 GRAPH 0xf09d92bb 0xf09d92bd - 0xf09d9380 0xf09d9382 0xf09d9383 0xf09d9385 - 0xf09d9485 GRAPH 0xf09d9487 - 0xf09d948a 0xf09d948d - 0xf09d9494 0xf09d9496 - 0xf09d949c GRAPH 0xf09d949e - 0xf09d94b9 0xf09d94bb - 0xf09d94be 0xf09d9580 - 0xf09d9584 GRAPH 0xf09d9586 0xf09d958a - 0xf09d9590 0xf09d9592 - 0xf09d9aa3 0xf09d9aa8 - 0xf09d9f89 GRAPH 0xf09d9f8e - 0xf09d9fbf LOWER 0xf09d909a - 0xf09d90b3 0xf09d918e - 0xf09d9194 0xf09d9196 - 0xf09d91a7 LOWER 0xf09d9282 - 0xf09d929b 0xf09d92b6 - 0xf09d92b9 0xf09d92bb 0xf09d92bd - 0xf09d9380 LOWER 0xf09d9382 0xf09d9383 0xf09d9385 - 0xf09d938f 0xf09d93aa - 0xf09d9483 LOWER 0xf09d949e - 0xf09d94b7 0xf09d9592 - 0xf09d95ab 0xf09d9686 - 0xf09d969f LOWER 0xf09d96ba - 0xf09d9793 0xf09d97ae - 0xf09d9887 0xf09d98a2 - 0xf09d98bb LOWER 0xf09d9996 - 0xf09d99af 
0xf09d9a8a - 0xf09d9aa3 0xf09d9b82 - 0xf09d9b9a LOWER 0xf09d9b9c - 0xf09d9ba1 0xf09d9bbc - 0xf09d9c94 0xf09d9c96 - 0xf09d9c9b LOWER 0xf09d9cb6 - 0xf09d9d8e 0xf09d9d90 - 0xf09d9d95 0xf09d9db0 - 0xf09d9e88 LOWER 0xf09d9e8a - 0xf09d9e8f 0xf09d9eaa - 0xf09d9f82 0xf09d9f84 - 0xf09d9f89 PUNCT 0xf09d9b81 0xf09d9b9b 0xf09d9bbb 0xf09d9c95 0xf09d9cb5 0xf09d9d8f PUNCT 0xf09d9daf 0xf09d9e89 0xf09d9ea9 0xf09d9f83 UPPER 0xf09d9080 - 0xf09d9099 0xf09d90b4 - 0xf09d918d 0xf09d91a8 - 0xf09d9281 UPPER 0xf09d929c 0xf09d929e 0xf09d929f 0xf09d92a2 0xf09d92a5 0xf09d92a6 UPPER 0xf09d92a9 - 0xf09d92ac 0xf09d92ae - 0xf09d92b5 0xf09d9390 - 0xf09d93a9 UPPER 0xf09d9484 0xf09d9485 0xf09d9487 - 0xf09d948a 0xf09d948d - 0xf09d9494 UPPER 0xf09d9496 - 0xf09d949c 0xf09d94b8 0xf09d94b9 0xf09d94bb - 0xf09d94be UPPER 0xf09d9580 - 0xf09d9584 0xf09d9586 0xf09d958a - 0xf09d9590 0xf09d95ac - 0xf09d9685 UPPER 0xf09d96a0 - 0xf09d96b9 0xf09d9794 - 0xf09d97ad 0xf09d9888 - 0xf09d98a1 UPPER 0xf09d98bc - 0xf09d9995 0xf09d99b0 - 0xf09d9a89 0xf09d9aa8 - 0xf09d9b80 UPPER 0xf09d9ba2 - 0xf09d9bba 0xf09d9c9c - 0xf09d9cb4 0xf09d9d96 - 0xf09d9dae UPPER 0xf09d9e90 - 0xf09d9ea8 PRINT 0xf09d9080 - 0xf09d9194 0xf09d9196 - 0xf09d929c 0xf09d929e 0xf09d929f PRINT 0xf09d92a2 0xf09d92a5 0xf09d92a6 0xf09d92a9 - 0xf09d92ac 0xf09d92ae - 0xf09d92b9 PRINT 0xf09d92bb 0xf09d92bd - 0xf09d9380 0xf09d9382 0xf09d9383 0xf09d9385 - 0xf09d9485 PRINT 0xf09d9487 - 0xf09d948a 0xf09d948d - 0xf09d9494 0xf09d9496 - 0xf09d949c PRINT 0xf09d949e - 0xf09d94b9 0xf09d94bb - 0xf09d94be 0xf09d9580 - 0xf09d9584 PRINT 0xf09d9586 0xf09d958a - 0xf09d9590 0xf09d9592 - 0xf09d9aa3 0xf09d9aa8 - 0xf09d9f89 PRINT 0xf09d9f8e - 0xf09d9fbf SWIDTH1 0xf09d9080 - 0xf09d9194 0xf09d9196 - 0xf09d929c 0xf09d929e 0xf09d929f SWIDTH1 0xf09d92a2 0xf09d92a5 0xf09d92a6 0xf09d92a9 - 0xf09d92ac 0xf09d92ae - 0xf09d92b9 SWIDTH1 0xf09d92bb 0xf09d92bd - 0xf09d9380 0xf09d9382 0xf09d9383 0xf09d9385 - 0xf09d9485 SWIDTH1 0xf09d9487 - 0xf09d948a 0xf09d948d - 0xf09d9494 0xf09d9496 - 0xf09d949c 
SWIDTH1 0xf09d949e - 0xf09d94b9 0xf09d94bb - 0xf09d94be 0xf09d9580 - 0xf09d9584 SWIDTH1 0xf09d9586 0xf09d958a - 0xf09d9590 0xf09d9592 - 0xf09d9aa3 0xf09d9aa8 - 0xf09d9f89 SWIDTH1 0xf09d9f8e - 0xf09d9fbf /* * U+20000 - U+2A6DF : CJK Unified Ideographs Extension B */ GRAPH 0xf0a08080 - 0xf0aa9b96 PRINT 0xf0a08080 - 0xf0aa9b96 IDEOGRAM 0xf0a08080 - 0xf0aa9b96 SWIDTH2 0xf0a08080 - 0xf0aa9b96 /* * U+2F800 - U+2FA1F : CJK Compatibility Ideographs Supplement */ GRAPH 0xf0afa080 - 0xf0afa89d PRINT 0xf0afa080 - 0xf0afa89d IDEOGRAM 0xf0afa080 - 0xf0afa89d SWIDTH2 0xf0afa080 - 0xf0afa89d /* * U+E0000 - U+E007F : Tags */ CONTROL 0xf3a08081 0xf3a080a0 - 0xf3a081bf /* * U+F0000 - U+FFFFF : Supplementary Private Use Area-A */ GRAPH 0xf3b08080 - 0xf3bfbfbd PRINT 0xf3b08080 - 0xf3bfbfbd SWIDTH1 0xf3b08080 - 0xf3bfbfbd /* * U+100000 - U+10FFFF : Supplementary Private Use Area-B */ GRAPH 0xf4808080 - 0xf48fbfbd PRINT 0xf4808080 - 0xf48fbfbd SWIDTH1 0xf4808080 - 0xf48fbfbd --ikeVEW9yuYc//A+q-- From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 16:00:25 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6998716A418; Sun, 16 Sep 2007 16:00:25 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id E157613C46A; Sun, 16 Sep 2007 16:00:24 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8GG0MQX008237; Sun, 16 Sep 2007 20:00:22 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1189958423; bh=2zWxRNxnr8ri28/b5wBEdy2oYTeLQrcnXh8raoZ RKZM=; l=566; h=Date:From:To:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; 
b=fEuqlTnWGeeAeCg7VYEcLNxqB8goXnzdI+X81t+B 8NUtXHKKfRr6n/jjaa5RdRin1kXf3MAHpLmdd7Bae0X8vhjIPDQu04h1UXKaoSCsHo0 s8D6LDOHpa0HSTR893V0fPKgIBop13Kdb85bIVxTdGLnoxjDh1hlCupN0r4NBuCg= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8GG0Mtj008236; Sun, 16 Sep 2007 20:00:22 +0400 (MSD) (envelope-from ache) Date: Sun, 16 Sep 2007 20:00:22 +0400 From: Andrey Chernov To: freebsd-bugs@FreeBSD.ORG, jkoshy@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG, petr.hroudny@gmail.com Message-ID: <20070916160022.GA8155@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , freebsd-bugs@FreeBSD.ORG, jkoshy@freebsd.org, perky@FreeBSD.org, i18n@freebsd.org, petr.hroudny@gmail.com References: <200709160910.l8G9A6ts050905@freefall.freebsd.org> <20070916103357.GA1691@nagual.pp.ru> <20070916123508.GA4724@nagual.pp.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070916123508.GA4724@nagual.pp.ru> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Subject: Re: gnu/116363: isspace broken for UTF-8 locales X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2007 16:00:25 -0000

On Sun, Sep 16, 2007 at 04:35:08PM +0400, Andrey Chernov wrote:
> On Sun, Sep 16, 2007 at 02:33:58PM +0400, Andrey Chernov wrote:
> > On Sun, Sep 16, 2007 at 09:10:06AM +0000, Andrey Chernov wrote:
> > > Can anybody write replacement for it?
> >
> > Here is replacement attached, autoconverted to UTF-8 with perl script,
> > please check!
>
> Sorry, high range is not converted properly, here is revised version
> attached.

Don't even try it, because all x-y ranges need to be expanded and
compressed back. In the next version...
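The reason converting only the two endpoints of a range is not enough: consecutive code points do not always give consecutive packed UTF-8 values, because the last continuation byte wraps from 0xBF back to 0x80 every 64 code points. A minimal sketch in C of the "expand and compress back" step, assuming the packed most-significant-byte-first form used in the converted table. The helpers utf8_pack(), emit_range() and split_range() are hypothetical names for illustration and are not taken from the Perl script mentioned above.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: pack the UTF-8 encoding of code point cp into one
 * integer, most significant byte first (U+2153 becomes 0xe28593).
 * Surrogates are not rejected, matching the converted table.
 * Returns UINT32_MAX if cp is above U+10FFFF.
 */
static uint32_t
utf8_pack(uint32_t cp)
{
	if (cp < 0x80)
		return cp;
	if (cp < 0x800)
		return ((0xC0u | (cp >> 6)) << 8) | (0x80u | (cp & 0x3Fu));
	if (cp < 0x10000)
		return ((0xE0u | (cp >> 12)) << 16) |
		    ((0x80u | ((cp >> 6) & 0x3Fu)) << 8) |
		    (0x80u | (cp & 0x3Fu));
	if (cp <= 0x10FFFF)
		return ((0xF0u | (cp >> 18)) << 24) |
		    ((0x80u | ((cp >> 12) & 0x3Fu)) << 16) |
		    ((0x80u | ((cp >> 6) & 0x3Fu)) << 8) |
		    (0x80u | (cp & 0x3Fu));
	return UINT32_MAX;
}

/* Print one packed sub-range in the "0x... - 0x..." style of the table. */
static void
emit_range(FILE *out, uint32_t first, uint32_t last)
{
	if (first == last)
		fprintf(out, "0x%x\n", (unsigned)first);
	else
		fprintf(out, "0x%x - 0x%x\n", (unsigned)first, (unsigned)last);
}

/*
 * Hypothetical helper: expand the code point range lo..hi
 * (lo <= hi <= 0x10FFFF), pack every member, and compress runs of
 * consecutive packed values back into sub-ranges.
 * Returns the number of sub-ranges printed.
 */
static int
split_range(uint32_t lo, uint32_t hi, FILE *out)
{
	uint32_t start = utf8_pack(lo);
	uint32_t prev = start;
	int count = 0;

	for (uint32_t cp = lo + 1; cp <= hi; cp++) {
		uint32_t cur = utf8_pack(cp);

		if (cur != prev + 1) {
			/* Gap in the packed values: close the current run. */
			emit_range(out, start, prev);
			count++;
			start = cur;
		}
		prev = cur;
	}
	emit_range(out, start, prev);
	return count + 1;
}
```

For U+2153 - U+2183 (Number Forms), split_range(0x2153, 0x2183, stdout) prints "0xe28593 - 0xe285bf" and "0xe28680 - 0xe28683". The single range "0xe28593 - 0xe28683" would also cover values such as 0xe285c0, which is not a valid UTF-8 sequence.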
-- http://ache.pp.ru/ From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 16:23:49 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73B1A16A468; Sun, 16 Sep 2007 16:23:49 +0000 (UTC) (envelope-from perky@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 622DD13C491; Sun, 16 Sep 2007 16:23:49 +0000 (UTC) (envelope-from perky@FreeBSD.org) Received: from freefall.freebsd.org (perky@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.1/8.14.1) with ESMTP id l8GGNnXP074801; Sun, 16 Sep 2007 16:23:49 GMT (envelope-from perky@freefall.freebsd.org) Received: from localhost (localhost [[UNIX: localhost]]) by freefall.freebsd.org (8.14.1/8.14.1/Submit) id l8GGNnKi074800; Sun, 16 Sep 2007 16:23:49 GMT (envelope-from perky) Date: Mon, 17 Sep 2007 01:22:14 +0900 From: Hye-Shik Chang To: Andrey Chernov , Petr Hroudny , freebsd-gnats-submit@FreeBSD.ORG, jkoshy@FreeBSD.ORG, i18n@FreeBSD.ORG Message-ID: <20070916162214.GA49139@FreeBSD.org> References: <200709150908.l8F981jj075109@www.freebsd.org> <20070916085432.GA8884@nagual.pp.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070916085432.GA8884@nagual.pp.ru> User-Agent: Mutt/1.4.2.3i X-Accept-Language: ko, en Cc: Subject: Re: gnu/116363: isspace broken for UTF-8 locales X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2007 16:23:49 -0000 On Sun, Sep 16, 2007 at 12:54:33PM +0400, Andrey Chernov wrote: > On Sat, Sep 15, 2007 at 09:08:01AM +0000, Petr Hroudny wrote: > > > > >Number: 116363 > > >Category: gnu > > >Synopsis: isspace broken for UTF-8 locales > > >Confidential: no > > >Severity: 
non-critical
> > >Priority: medium
> > >Responsible: freebsd-bugs
> > >State: open
> > >Quarter:
> > >Keywords:
> > >Date-Required:
> > >Class: sw-bug
> > >Submitter-Id: current-users
> > >Arrival-Date: Sat Sep 15 09:10:02 GMT 2007
> > >Closed-Date:
> > >Last-Modified:
> > >Originator: Petr Hroudny
> > >Release: 6-stable, 7-current
> > >Organization:
> > >Environment:
> > >Description:
> > In UTF-8 locales, isspace(0xA0) returns 1 which is wrong.
> >
> > In UTF-8, 0xA0 could only be the second or third byte of multibyte character, but never a space.
> >
> > As a consequence, operations like str.upper() and/or str.split() are broken, when
> > UTF-8 character with 0xA0 byte is encountered.

If you are talking about Python's str.split(), the problem is due to
our libc bug (or feature), which has been described many times before,
and Python already includes a workaround for the problem.

  http://mail.python.org/pipermail/python-checkins/2004-August/042343.html

> It seems that our UTF-8.src is completely wrong, it is just plain Unicode
> and not UTF-8 which multibyte values should start from
> C2-DF
> E0-EF
> F0-F4
> only (as stated in http://en.wikipedia.org/wiki/UTF-8 f.e.)
> Can anybody write replacement for it?

In fact, UTF-8.src defines values not for UTF-8 but for Unicode codepoints.
Using the Unicode codepoint as wchar_t's internal representation has
many benefits.  I think it would be better to make isspace() and
other ctype functions aware of "encoding".  IIRC, tjr@ provided the
workaround as in the URL mentioned above and said in 2004 that it would
get a chance to be fixed in 6 or 7.
Hye-Shik From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 16:34:10 2007 Return-Path: Delivered-To: i18n@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6905216A418; Sun, 16 Sep 2007 16:34:10 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id DA55C13C468; Sun, 16 Sep 2007 16:34:09 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8GGY8G1011013; Sun, 16 Sep 2007 20:34:08 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1189960448; bh=ImWgPRUJTJpnVZOQ49R9FOgDAUmOOe1wLavhTMK LCzI=; l=974; h=Date:From:To:Cc:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=RmJb/mRRTU8MJhxtOTWjBTswk11e5F+QdoeQxOz5 R2yG7js3pCBww8lNJSiODq9h3bbsz2GDpcCefd79G59YvuHilEGCCyb4aX+Mf9k+7Uq cl4OUZMnf0S8+0yyhNjexInI4WWeXTW3F9s8pujrU1ZJg/y+IdokZbTIubcVMDJo= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8GGY8Mf011012; Sun, 16 Sep 2007 20:34:08 +0400 (MSD) (envelope-from ache) Date: Sun, 16 Sep 2007 20:34:07 +0400 From: Andrey Chernov To: Hye-Shik Chang Message-ID: <20070916163407.GA10297@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Hye-Shik Chang , Petr Hroudny , freebsd-gnats-submit@FreeBSD.org, jkoshy@FreeBSD.org, i18n@FreeBSD.org References: <200709150908.l8F981jj075109@www.freebsd.org> <20070916085432.GA8884@nagual.pp.ru> <20070916162214.GA49139@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070916162214.GA49139@FreeBSD.org> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: jkoshy@FreeBSD.org, freebsd-gnats-submit@FreeBSD.org, Petr Hroudny , i18n@FreeBSD.org Subject: Re: gnu/116363: 
isspace broken for UTF-8 locales X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2007 16:34:10 -0000

On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
> In fact, UTF-8.src defines values not for UTF-8 but for Unicode codepoints.
> Using the Unicode codepoint as wchar_t's internal representation has
> many benefits.  I think it would be better to make isspace() and
> other ctype functions aware of "encoding".  IIRC, tjr@ provided the
> workaround as in the URL mentioned above and said in 2004 that it would
> get a chance to be fixed in 6 or 7.

Currently wchar_t represents the given encoding in all places, including
wc<->mbr conversions. To make it UCS-4-only instead we would need to
rewrite the whole locale system from scratch, and I see no benefit in
going that way. No simple workaround exists. In any case there is no
excuse for having what is really a UCS-4.src pose as UTF-8.src.
Providing a proper UTF-8.src is much less painful than rewriting the
whole locale system, and I am almost halfway through converting the
UCS-4 source to it.
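Which representation wchar_t actually carries can be checked directly through the mbrtowc() conversion. A minimal sketch in C, assuming a locale named en_US.UTF-8 may be installed; the locale name and the helper probe_nbsp() are illustrative assumptions, not something given in this thread.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode the two bytes C2 A0 (the UTF-8 encoding of
 * U+00A0) with mbrtowc() under the named locale.  Returns the resulting
 * wchar_t value, or -1 if the locale is unavailable or the conversion
 * does not consume exactly two bytes.
 */
static long
probe_nbsp(const char *locale_name)
{
	static const char bytes[] = "\xC2\xA0";
	char saved[256];
	const char *old;
	mbstate_t st;
	wchar_t wc = 0;
	long result = -1;

	/* Remember the current LC_CTYPE setting so it can be restored. */
	old = setlocale(LC_CTYPE, NULL);
	snprintf(saved, sizeof(saved), "%s", old != NULL ? old : "C");

	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return -1;

	memset(&st, 0, sizeof(st));
	if (mbrtowc(&wc, bytes, 2, &st) == 2)
		result = (long)wc;

	setlocale(LC_CTYPE, saved);
	return result;
}
```

A result of 0xA0 from probe_nbsp("en_US.UTF-8") would mean the wide value is the Unicode code point; a result of 0xC2A0 would mean it carries the encoded bytes.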
-- http://ache.pp.ru/ From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 17:01:47 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D9EB16A418; Sun, 16 Sep 2007 17:01:47 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id DCDA413C45A; Sun, 16 Sep 2007 17:01:46 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8GH1jT0011283; Sun, 16 Sep 2007 21:01:45 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1189962105; bh=7J7E7YvcPyoBXdh2xk/E/UkqyAG2cbBbl+Ve8w+ kj7Y=; l=1472; h=Date:From:To:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=CcRvto4h7Jutf9cVzf8hubqDY8F7zODXGsNJ+6Mg FEcqRD7yBB1nTE5DIFFx24aosCNoJbHMNcJjSywu46KmnCqwrgXAL2jURQAnTUmOY6W MDOazCleaeLITeL9XZWbKXCYf4X6Z7yKqeb/SCTNeZBcF8Tbf5XxJggSv2wiTxTs= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8GH1ind011282; Sun, 16 Sep 2007 21:01:44 +0400 (MSD) (envelope-from ache) Date: Sun, 16 Sep 2007 21:01:43 +0400 From: Andrey Chernov To: freebsd-bugs@FreeBSD.ORG, jkoshy@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG, petr.hroudny@gmail.com Message-ID: <20070916170142.GA11047@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , freebsd-bugs@FreeBSD.ORG, jkoshy@freebsd.org, perky@FreeBSD.org, i18n@freebsd.org, petr.hroudny@gmail.com References: <200709161640.l8GGe7iQ077745@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200709161640.l8GGe7iQ077745@freefall.freebsd.org> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Subject: Re: gnu/116363: isspace broken for UTF-8 locales 
X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 16 Sep 2007 17:01:47 -0000

On Sun, Sep 16, 2007 at 04:40:07PM +0000, Andrey Chernov wrote:
> The following reply was made to PR gnu/116363; it has been noted by GNATS.
>
> From: Andrey Chernov
> To: Hye-Shik Chang
> Cc: Petr Hroudny , freebsd-gnats-submit@FreeBSD.org,
>  jkoshy@FreeBSD.org, i18n@FreeBSD.org
> Subject: Re: gnu/116363: isspace broken for UTF-8 locales
> Date: Sun, 16 Sep 2007 20:34:07 +0400
>
> On Mon, Sep 17, 2007 at 01:22:14AM +0900, Hye-Shik Chang wrote:
> > In fact, UTF-8.src defines values not for UTF-8 but for Unicode codepoints.
> > Using the Unicode codepoint as wchar_t's internal representation has
> > many benefits.  I think it would be better to make isspace() and
> > other ctype functions aware of "encoding".  IIRC, tjr@ provided the
> > workaround as in the URL mentioned above and said in 2004 that it would
> > get a chance to be fixed in 6 or 7.
>
> Currently wchar_t represents the given encoding in all places, including
> wc<->mbr conversions. To make it UCS-4-only instead we would need to
> rewrite the

Oops, sorry for my oversight: we really do have UCS-4 as wchar_t, so no
UTF-8.src replacement is needed. In that case iswspace(0xA0) should be 1
but isspace(0xA0) should not, so it seems to be a bug in isspace() (and
the other plain ctype functions). It seems even isspace(' ') is illegal
in a UTF-8 locale because all chars are wide, but I am not sure.
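The intended split between the wide and the single-byte classification in a UTF-8 locale can be sketched as a small model. This is not libc code: model_iswspace() and model_isspace() are hypothetical names, and the set of white-space code points is deliberately abbreviated (ASCII white space, U+00A0 as discussed above, and U+3000 from the CJK block). On the last point above: the C standard only requires the argument of isspace() to be representable as unsigned char or equal to EOF, so isspace(' ') remains a valid call in any locale.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the wide classification: the argument is a
 * Unicode code point.  Only a few white-space code points are listed,
 * for illustration.
 */
static bool
model_iswspace(unsigned long wc)
{
	return wc == 0x20 || (wc >= 0x09 && wc <= 0x0D) ||
	    wc == 0xA0 || wc == 0x3000;
}

/*
 * Hypothetical model of the single-byte classification in a UTF-8
 * locale: bytes 0x80..0xFF are never complete characters there, so only
 * the ASCII range is passed on to the code point table.
 */
static bool
model_isspace(int c)
{
	if (c < 0 || c >= 0x80)
		return false;
	return model_iswspace((unsigned long)c);
}
```

With this model, model_iswspace(0xA0) is true while model_isspace(0xA0) is false, and model_isspace(' ') stays true.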
-- http://ache.pp.ru/ From owner-freebsd-i18n@FreeBSD.ORG Sun Sep 16 19:29:27 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D43616A41B; Sun, 16 Sep 2007 19:29:27 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id AD9C013C481; Sun, 16 Sep 2007 19:29:26 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8GJTPZa013006; Sun, 16 Sep 2007 23:29:25 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1189970965; bh=xuhy8++5NY6qoZt8MIxebItAWrZBnE7mFCv2Rka on1g=; l=10963; h=Date:From:To:Cc:Subject:Message-ID: Mail-Followup-To:MIME-Version:Content-Type:Content-Disposition: User-Agent; b=qaNBqyFWQDZ8EJq15uj0yVc1T3hZ3Z4QfqeySKzoJe/TqTeDy6J/ a8tzhTCfrGFiQqSP+fL3E/nYCuA1WJ4Cl8CXYQ3lk3M8FRtVJDLA21coK7m7h7Ervdc CqnOUlJTpt8JtWuXGATshJreMEoa6im//AghpjzW0jJb83ZzCzAs= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8GJTOF9013005; Sun, 16 Sep 2007 23:29:24 +0400 (MSD) (envelope-from ache) Date: Sun, 16 Sep 2007 23:29:24 +0400 From: Andrey Chernov To: current@freebsd.org, i18n@freebsd.org Message-ID: <20070916192924.GA12678@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , current@freebsd.org, i18n@freebsd.org, perky@FreeBSD.org, petr.hroudny@gmail.com MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="fdj2RfSjLxBAspz7" Content-Disposition: inline User-Agent: Mutt/1.5.16 (2007-06-09) Cc: perky@freebsd.org, petr.hroudny@gmail.com Subject: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Sun, 16 Sep 2007 19:29:27 -0000

--fdj2RfSjLxBAspz7
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

The problem is: currently our single-byte ctype functions are broken for
wide character locales in the argument range >= 0x80; they may return
false positives.

For example, for a UTF-8 locale we currently have:
iswspace(0xA0)==1 and isspace(0xA0)==1
(because iswspace() and isspace() are the same code)
but must have
isspace(0xA0)==0
(because in wide character locales there is no such single-byte character,
nor any other in the range 0x80..0xff; these locales keep only ASCII in the
single-byte range because our internal wchar_t representation is UCS-4).

The attached patch addresses this issue and also fixes iswascii()
(currently iswascii() is broken for arguments > 0xFF).

This patch is 100% binary compatible with old binaries; their (broken)
behaviour is not changed.

I would like to hear some comments.

-- 
http://ache.pp.ru/

--fdj2RfSjLxBAspz7
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ctype.patch"

--- _ctype.h.old	2007-09-16 21:13:59.000000000 +0400
+++ _ctype.h	2007-09-16 23:00:38.000000000 +0400
@@ -63,6 +63,7 @@
 #define	_CTYPE_I	0x00080000L		/* Ideogram */
 #define	_CTYPE_T	0x00100000L		/* Special */
 #define	_CTYPE_Q	0x00200000L		/* Phonogram */
+#define	_CTYPE_WID	0x10000000L		/* wide character function */
 #define	_CTYPE_SW0	0x20000000L		/* 0 width character */
 #define	_CTYPE_SW1	0x40000000L		/* 1 width character */
 #define	_CTYPE_SW2	0x80000000L		/* 2 width character */
@@ -87,6 +88,8 @@
 #define	__inline
 #endif
 
+extern int __mb_cur_max;
+
 /*
  * Use inline functions if we are allowed to and the compiler supports them.
  */
@@ -98,8 +101,11 @@
 static __inline int
 __maskrune(__ct_rune_t _c, unsigned long _f)
 {
-	return ((_c < 0 || _c >= _CACHED_RUNES) ? ___runetype(_c) :
+	return __mb_cur_max > 1 && !(_f & _CTYPE_WID) && (_c >= 0x80) ? 0 :
+	       ((_c < 0 || _c >= _CACHED_RUNES) ? ___runetype(_c) :
 		_CurrentRuneLocale->__runetype[_c]) & _f;
+	/* We never set _CTYPE_WID in the locale data, */
+	/* so can skip ... & (_f & ~_CTYPE_WID). */
 }
 
 static __inline int
@@ -111,8 +117,11 @@
 static __inline int
 __isctype(__ct_rune_t _c, unsigned long _f)
 {
-	return (_c < 0 || _c >= _CACHED_RUNES) ? 0 :
+	return __mb_cur_max > 1 && !(_f & _CTYPE_WID) && (_c >= 0x80) ? 0 :
+	       (_c < 0 || _c >= _CACHED_RUNES) ? 0 :
 	       !!(_DefaultRuneLocale.__runetype[_c] & _f);
+	/* We never set _CTYPE_WID in the locale data, */
+	/* so can skip ... & (_f & ~_CTYPE_WID). */
 }
 
 static __inline __ct_rune_t
@@ -129,6 +138,22 @@
 	       _CurrentRuneLocale->__maplower[_c];
 }
 
+static __inline __ct_rune_t
+__tosupper(__ct_rune_t _c)
+{
+	return __mb_cur_max > 1 && (_c >= 0x80) ? _c :
+	       (_c < 0 || _c >= _CACHED_RUNES) ? ___toupper(_c) :
+	       _CurrentRuneLocale->__mapupper[_c];
+}
+
+static __inline __ct_rune_t
+__toslower(__ct_rune_t _c)
+{
+	return __mb_cur_max > 1 && (_c >= 0x80) ? _c :
+	       (_c < 0 || _c >= _CACHED_RUNES) ? ___tolower(_c) :
+	       _CurrentRuneLocale->__maplower[_c];
+}
+
 static __inline int
 __wcwidth(__ct_rune_t _c)
 {
@@ -150,6 +175,8 @@
 int		__isctype(__ct_rune_t, unsigned long);
 __ct_rune_t	__toupper(__ct_rune_t);
 __ct_rune_t	__tolower(__ct_rune_t);
+__ct_rune_t	__tosupper(__ct_rune_t);
+__ct_rune_t	__toslower(__ct_rune_t);
 int		__wcwidth(__ct_rune_t);
 __END_DECLS
 #endif /* using inlines */
--- ctype.h.old	2007-09-16 22:03:55.000000000 +0400
+++ ctype.h	2007-09-16 22:56:10.000000000 +0400
@@ -97,8 +97,8 @@
 #define	isspace(c)	__istype((c), _CTYPE_S)
 #define	isupper(c)	__istype((c), _CTYPE_U)
 #define	isxdigit(c)	__isctype((c), _CTYPE_X) /* ANSI -- locale independent */
-#define	tolower(c)	__tolower(c)
-#define	toupper(c)	__toupper(c)
+#define	tolower(c)	__toslower(c)
+#define	toupper(c)	__tosupper(c)
 
 #if __XSI_VISIBLE
 /*
@@ -112,8 +112,8 @@
  *
  * XXX isascii() and toascii() should similarly be undocumented.
  */
-#define	_tolower(c)	__tolower(c)
-#define	_toupper(c)	__toupper(c)
+#define	_tolower(c)	__toslower(c)
+#define	_toupper(c)	__tosupper(c)
 #define	isascii(c)	(((c) & ~0x7F) == 0)
 #define	toascii(c)	((c) & 0x7F)
 #endif
@@ -128,7 +128,7 @@
 #define	isideogram(c)	__istype((c), _CTYPE_I)
 #define	isnumber(c)	__istype((c), _CTYPE_D)
 #define	isphonogram(c)	__istype((c), _CTYPE_Q)
-#define	isrune(c)	__istype((c), 0xFFFFFF00L)
+#define	isrune(c)	__istype((c), 0xFFFFFF00L & ~_CTYPE_WID)
 #define	isspecial(c)	__istype((c), _CTYPE_T)
 #endif
 
--- wctype.h.old	2007-09-16 21:59:37.000000000 +0400
+++ wctype.h	2007-09-16 22:56:44.000000000 +0400
@@ -89,30 +89,30 @@
 #endif
 __END_DECLS
 
-#define	iswalnum(wc)		__istype((wc), _CTYPE_A|_CTYPE_D)
-#define	iswalpha(wc)		__istype((wc), _CTYPE_A)
-#define	iswblank(wc)		__istype((wc), _CTYPE_B)
-#define	iswcntrl(wc)		__istype((wc), _CTYPE_C)
-#define	iswctype(wc, charclass)	__istype((wc), (charclass))
-#define	iswdigit(wc)		__isctype((wc), _CTYPE_D)
-#define	iswgraph(wc)		__istype((wc), _CTYPE_G)
-#define	iswlower(wc)		__istype((wc), _CTYPE_L)
-#define	iswprint(wc)		__istype((wc), _CTYPE_R)
-#define	iswpunct(wc)		__istype((wc), _CTYPE_P)
-#define	iswspace(wc)		__istype((wc), _CTYPE_S)
-#define	iswupper(wc)		__istype((wc), _CTYPE_U)
-#define	iswxdigit(wc)		__isctype((wc), _CTYPE_X)
+#define	iswalnum(wc)		__istype((wc), _CTYPE_A|_CTYPE_D|_CTYPE_WID)
+#define	iswalpha(wc)		__istype((wc), _CTYPE_A|_CTYPE_WID)
+#define	iswblank(wc)		__istype((wc), _CTYPE_B|_CTYPE_WID)
+#define	iswcntrl(wc)		__istype((wc), _CTYPE_C|_CTYPE_WID)
+#define	iswctype(wc, charclass)	__istype((wc), (charclass)|_CTYPE_WID)
+#define	iswdigit(wc)		__isctype((wc), _CTYPE_D|_CTYPE_WID)
+#define	iswgraph(wc)		__istype((wc), _CTYPE_G|_CTYPE_WID)
+#define	iswlower(wc)		__istype((wc), _CTYPE_L|_CTYPE_WID)
+#define	iswprint(wc)		__istype((wc), _CTYPE_R|_CTYPE_WID)
+#define	iswpunct(wc)		__istype((wc), _CTYPE_P|_CTYPE_WID)
+#define	iswspace(wc)		__istype((wc), _CTYPE_S|_CTYPE_WID)
+#define	iswupper(wc)		__istype((wc), _CTYPE_U|_CTYPE_WID)
+#define	iswxdigit(wc)		__isctype((wc), _CTYPE_X|_CTYPE_WID)
 #define	towlower(wc)		__tolower(wc)
 #define	towupper(wc)		__toupper(wc)
 
 #if __BSD_VISIBLE
-#define	iswascii(wc)		(((wc) & ~0x7F) == 0)
-#define	iswhexnumber(wc)	__istype((wc), _CTYPE_X)
-#define	iswideogram(wc)		__istype((wc), _CTYPE_I)
-#define	iswnumber(wc)		__istype((wc), _CTYPE_D)
-#define	iswphonogram(wc)	__istype((wc), _CTYPE_Q)
-#define	iswrune(wc)		__istype((wc), 0xFFFFFF00L)
-#define	iswspecial(wc)		__istype((wc), _CTYPE_T)
+#define	iswascii(wc)		((wc) < 0x80)
+#define	iswhexnumber(wc)	__istype((wc), _CTYPE_X|_CTYPE_WID)
+#define	iswideogram(wc)		__istype((wc), _CTYPE_I|_CTYPE_WID)
+#define	iswnumber(wc)		__istype((wc), _CTYPE_D|_CTYPE_WID)
+#define	iswphonogram(wc)	__istype((wc), _CTYPE_Q|_CTYPE_WID)
+#define	iswrune(wc)		__istype((wc), 0xFFFFFF00L) /* already have _CTYPE_WID */
+#define	iswspecial(wc)		__istype((wc), _CTYPE_T|_CTYPE_WID)
 #endif
 
 #endif /* _WCTYPE_H_ */
--- isctype.c.old	2007-09-16 22:31:26.000000000 +0400
+++ isctype.c	2007-09-16 22:37:54.000000000 +0400
@@ -168,7 +168,7 @@
 isrune(c)
 	int c;
 {
-	return (__istype(c, 0xFFFFFF00L));
+	return (__istype(c, 0xFFFFFF00L & ~_CTYPE_WID));
 }
 
 #undef isspace
@@ -216,7 +216,7 @@
 tolower(c)
 	int c;
 {
-	return (__tolower(c));
+	return (__toslower(c));
 }
 
 #undef toupper
@@ -224,6 +224,6 @@
 toupper(c)
 	int c;
 {
-	return (__toupper(c));
+	return (__tosupper(c));
 }
 
--- iswctype.c.old	2007-09-16 22:31:30.000000000 +0400
+++ iswctype.c	2007-09-16 22:41:39.000000000 +0400
@@ -45,7 +45,7 @@
 iswalnum(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_A|_CTYPE_D));
+	return (__istype(wc, _CTYPE_A|_CTYPE_D|_CTYPE_WID));
 }
 
 #undef iswalpha
@@ -53,7 +53,7 @@
 iswalpha(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_A));
+	return (__istype(wc, _CTYPE_A|_CTYPE_WID));
 }
 
 #undef iswascii
@@ -61,7 +61,7 @@
 iswascii(wc)
 	wint_t wc;
 {
-	return ((wc & ~0x7F) == 0);
+	return (wc < 0x80);
 }
 
 #undef iswblank
@@ -69,7 +69,7 @@
 iswblank(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_B));
+	return (__istype(wc, _CTYPE_B|_CTYPE_WID));
 }
 
 #undef iswcntrl
@@ -77,7 +77,7 @@
 iswcntrl(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_C));
+	return (__istype(wc, _CTYPE_C|_CTYPE_WID));
 }
 
 #undef iswdigit
@@ -85,7 +85,7 @@
 iswdigit(wc)
 	wint_t wc;
 {
-	return (__isctype(wc, _CTYPE_D));
+	return (__isctype(wc, _CTYPE_D|_CTYPE_WID));
 }
 
 #undef iswgraph
@@ -93,7 +93,7 @@
 iswgraph(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_G));
+	return (__istype(wc, _CTYPE_G|_CTYPE_WID));
 }
 
 #undef iswhexnumber
@@ -101,7 +101,7 @@
 iswhexnumber(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_X));
+	return (__istype(wc, _CTYPE_X|_CTYPE_WID));
 }
 
 #undef iswideogram
@@ -109,7 +109,7 @@
 iswideogram(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_I));
+	return (__istype(wc, _CTYPE_I|_CTYPE_WID));
 }
 
 #undef iswlower
@@ -117,7 +117,7 @@
 iswlower(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_L));
+	return (__istype(wc, _CTYPE_L|_CTYPE_WID));
 }
 
 #undef iswnumber
@@ -125,7 +125,7 @@
 iswnumber(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_D));
+	return (__istype(wc, _CTYPE_D|_CTYPE_WID));
 }
 
 #undef iswphonogram
@@ -133,7 +133,7 @@
 iswphonogram(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_Q));
+	return (__istype(wc, _CTYPE_Q|_CTYPE_WID));
 }
 
 #undef iswprint
@@ -141,7 +141,7 @@
 iswprint(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_R));
+	return (__istype(wc, _CTYPE_R|_CTYPE_WID));
 }
 
 #undef iswpunct
@@ -149,7 +149,7 @@
 iswpunct(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_P));
+	return (__istype(wc, _CTYPE_P|_CTYPE_WID));
 }
 
 #undef iswrune
@@ -157,7 +157,7 @@
 iswrune(wc)
 	wint_t wc;
 {
-	return (__istype(wc, 0xFFFFFF00L));
+	return (__istype(wc, 0xFFFFFF00L)); /* already have _CTYPE_WID */
 }
 
 #undef iswspace
@@ -165,7 +165,7 @@
 iswspace(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_S));
+	return (__istype(wc, _CTYPE_S|_CTYPE_WID));
 }
 
 #undef iswspecial
@@ -173,7 +173,7 @@
 iswspecial(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_T));
+	return (__istype(wc, _CTYPE_T|_CTYPE_WID));
 }
 
 #undef iswupper
@@ -181,7 +181,7 @@
 iswupper(wc)
 	wint_t wc;
 {
-	return (__istype(wc, _CTYPE_U));
+	return (__istype(wc, _CTYPE_U|_CTYPE_WID));
 }
 
 #undef iswxdigit
@@ -189,7 +189,7 @@
 iswxdigit(wc)
 	wint_t wc;
 {
-	return (__isctype(wc, _CTYPE_X));
+	return (__isctype(wc, _CTYPE_X|_CTYPE_WID));
 }
 
 #undef towlower

--fdj2RfSjLxBAspz7--

From owner-freebsd-i18n@FreeBSD.ORG Mon Sep 17 08:42:56 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8028F16A418 for ; Mon, 17 Sep 2007 08:42:56 +0000 (UTC) (envelope-from petr.hroudny@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.185]) by mx1.freebsd.org (Postfix) with ESMTP id 0F3C113C442 for ; Mon, 17 Sep 2007 08:42:55 +0000 (UTC) (envelope-from petr.hroudny@gmail.com) Received: by nf-out-0910.google.com with SMTP id b2so1070794nfb for ; Mon, 17 Sep 2007 01:42:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=Ldl9XpUS1tvbRWoMlzTvfd8v+dHIlA89KS7rRcU9hB4=; b=ZuCLrh6j2Tm7rSxKhPLZ/v9YvBuEMu/Q5SuzRv+29r7KKvTMlN4F257+Vik0EC6abUDcTVvc6usvlXHJQ1e43mQetBKuKg8rdLNs7Shc1/SJ79Z17XN2C6K6LCZ/8pnGtrct918JWRiCw7dgrtBBOOw5xiUgPhjFYx8LaELe0qE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=IS6j9Eno5GKAGXdmqZXX5Nbz8UVp6SWjWmW0cXHDoJQ5kHZInHBjT+D7xtXsP1k0r9yjQmxlal7aqXrYL5ncFXWRbsWiVjKRi9kBmzpZL7xrm4VvbHBxB4jDf2EX/SOtoEyWsN5h+hIYTCOTHRFFCWdGmHVBa/qNhi5NQCERSdo= Received: by 10.78.201.10 with SMTP id y10mr2440039huf.1190018152083; Mon, 17 Sep 2007 01:35:52 -0700 (PDT) Received: by 10.78.100.2 with HTTP; 
Mon, 17 Sep 2007 01:35:52 -0700 (PDT) Message-ID: Date: Mon, 17 Sep 2007 10:35:52 +0200 From: "=?UTF-8?Q?Petr_Hroudn=C3=BD?=" To: "Hye-Shik Chang" In-Reply-To: <20070916162214.GA49139@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <200709150908.l8F981jj075109@www.freebsd.org> <20070916085432.GA8884@nagual.pp.ru> <20070916162214.GA49139@FreeBSD.org> Cc: Andrey Chernov , freebsd-gnats-submit@freebsd.org, i18n@freebsd.org, jkoshy@freebsd.org Subject: Re: gnu/116363: isspace broken for UTF-8 locales X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2007 08:42:56 -0000

2007/9/16, Hye-Shik Chang :
> If you are saying about Python's str.split(), the problem is due
> to our libc bug (or feature) which is described many times before,
> and Python already includes a workaround for the problem.
> http://mail.python.org/pipermail/python-checkins/2004-August/042343.html

I ran into this problem when using mutt, which uses isspace() to
separate tokens in, e.g., the list of recipients. Then I found the
workaround for Python, which says this problem should be fixed in
FreeBSD 6, but it is still present even in 7-current.

I do believe it would be better to fix isspace() than to introduce
workarounds into every application.

Regards, Petr From owner-freebsd-i18n@FreeBSD.ORG Mon Sep 17 08:56:39 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 70E2E16A417 for ; Mon, 17 Sep 2007 08:56:39 +0000 (UTC) (envelope-from petr.hroudny@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.189]) by mx1.freebsd.org (Postfix) with ESMTP id 0213B13C458 for ; Mon, 17 Sep 2007 08:56:38 +0000 (UTC) (envelope-from petr.hroudny@gmail.com) Received: by nf-out-0910.google.com with SMTP id b2so1073432nfb for ; Mon, 17 Sep 2007 01:56:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=DY9qk1mrQzFSVme1UmL3SYzjxsNz/ddmqKQVCmrMEss=; b=VJbCVJgXvJvBy9eIBcob8ESz+HuSlqaf+XKdUFaS+eALWYyF3Ma8/QubeRV2o3IenwY4RrWirA7SmjYUGRpaojRGsWvYUqukEpO4iHe54Iy6FARt2rbe+21PF1dP11yWUYI9PgkpYzf3LyEUEtPPbk1HE0lg8+9jkiXS66HSwkM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=p9iT8CMYN0OSiIE5lUo+LvmN+xjlrzAOuwCRvnf8bpmduuSUlgfvhE8FBmm4+8jZH38fdHQPY/7m+h+fasq8CU5kAdVWSFkcVU/Or88yt0S8dME9r67S2jZ/7TAKwyEUB6BTrh2f0nlWU24qd0yVKvGOgJU31/33YSIXVYyhv70= Received: by 10.78.186.9 with SMTP id j9mr2425142huf.1190017761621; Mon, 17 Sep 2007 01:29:21 -0700 (PDT) Received: by 10.78.100.2 with HTTP; Mon, 17 Sep 2007 01:29:21 -0700 (PDT) Message-ID: Date: Mon, 17 Sep 2007 10:29:21 +0200 From: "=?UTF-8?Q?Petr_Hroudn=C3=BD?=" To: "Andrey Chernov" , current@freebsd.org, i18n@freebsd.org, perky@freebsd.org, petr.hroudny@gmail.com In-Reply-To: <20070916192924.GA12678@nagual.pp.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20070916192924.GA12678@nagual.pp.ru> Cc: Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2007 08:56:39 -0000

2007/9/16, Andrey Chernov :
> The problem is: currently our single byte ctype functions are broken for
> wide characters locales in the argument range >= 0x80 - they may return
> false positives.
>
> For example, for UTF-8 locale we currently have:
> iswspace(0xA0)==1 and isspace(0xA0)==1
> (because iswspace() and isspace() are the same code)
> but must have
> isspace(0xA0)==0

This is exactly what happens on other OSes, and I agree this is the
right behaviour for UTF-8. However, we must ensure that:

for C locale:          isspace(0xA0)==0
for ISO8859-* locales: isspace(0xA0)==1
for UTF-8 locales:     isspace(0xA0)==0

Regards,
Petr.

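The three per-locale expectations for isspace(0xA0) can be checked with a
small C helper. This is only a sketch: isspace_in_locale() is a hypothetical
name and not part of the patch under review, and the locale names passed to
it (for example "en_US.ISO8859-1" or "en_US.UTF-8") are assumptions that
depend on which locales are installed on the system.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): report how isspace()
 * classifies one byte value under the named LC_CTYPE locale.
 * Returns 1 or 0 for the classification, or -1 if setlocale()
 * rejects the locale name (e.g. the locale is not installed).
 */
int
isspace_in_locale(const char *locale_name, int byte)
{
	int result;

	/* isspace() is only defined for unsigned char values (and EOF). */
	assert(byte >= 0 && byte <= 0xFF);

	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return (-1);
	result = (isspace(byte) != 0);
	/* Restore the default "C" locale so later calls are unaffected. */
	(void)setlocale(LC_CTYPE, "C");
	return (result);
}
```

isspace_in_locale("C", 0xA0) gives 0. With an ISO8859-* locale name the
thread expects 1, and with a UTF-8 locale name it expects 0 once isspace()
behaves as requested; a result of -1 only means the locale is unavailable.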
From owner-freebsd-i18n@FreeBSD.ORG Mon Sep 17 09:21:32 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5C0B16A468; Mon, 17 Sep 2007 09:21:32 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 628B313C474; Mon, 17 Sep 2007 09:21:32 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8H9LUcZ024575; Mon, 17 Sep 2007 13:21:30 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190020890; bh=in9z/PHZlGEE4tG+WYA7NtgItI4Bi1zZQaoCIFt Dt7k=; l=909; h=Date:From:To:Cc:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=uOQLGFj6FPzeQ1rQme8/xregDOupVjSb+ZuPLH71 oVDd54XmuzZ8A9cWWOFoZ0wkHAzdMAtSFwtjU0hTyyNtaNBRlfwo6jH5hmVjw4+7qxE +I1dNJ2C8TCEdtpGzWLXxVRMukoKafMBe7w8IsnU69wLnT0YZT7craYCnE5sSB+w= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8H9LU6d024574; Mon, 17 Sep 2007 13:21:30 +0400 (MSD) (envelope-from ache) Date: Mon, 17 Sep 2007 13:21:30 +0400 From: Andrey Chernov To: Petr Hroudn?? Message-ID: <20070917092130.GA24424@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Petr Hroudn?? 
, current@freebsd.org, i18n@freebsd.org, perky@freebsd.org References: <20070916192924.GA12678@nagual.pp.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.16 (2007-06-09) Cc: perky@freebsd.org, current@freebsd.org, i18n@freebsd.org Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2007 09:21:33 -0000

On Mon, Sep 17, 2007 at 10:29:21AM +0200, Petr Hroudný wrote:
> 2007/9/16, Andrey Chernov :
> > The problem is: currently our single byte ctype functions are broken for
> > wide characters locales in the argument range >= 0x80 - they may return
> > false positives.
> >
> > For example, for UTF-8 locale we currently have:
> > iswspace(0xA0)==1 and isspace(0xA0)==1
> > (because iswspace() and isspace() are the same code)
> > but must have
> > isspace(0xA0)==0
>
> This is exactly what happens on other OSes and I agree this is the
> right behaviour
> for UTF-8. However, we must ensure, that:
>
> for C locale: isspace(0xA0)==0
> for ISO8859-* locales: isspace(0xA0)==1
> for UTF-8 locales: isspace(0xA0)==0

The patch tests for a wide character locale first (__mb_cur_max > 1), so
it does not affect single-byte locales like ISO8859-*.

-- 
http://ache.pp.ru/

From owner-freebsd-i18n@FreeBSD.ORG Mon Sep 17 17:01:09 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3331216A417; Mon, 17 Sep 2007 17:01:09 +0000 (UTC) (envelope-from taku@tackymt.homeip.net) Received: from basalt.tackymt.homeip.net (unknown [IPv6:2001:3e0:577:0:20d:61ff:fecc:2253]) by mx1.freebsd.org (Postfix) with ESMTP id DE1EA13C481; Mon, 17 Sep 2007 17:01:08 +0000 (UTC) (envelope-from taku@tackymt.homeip.net) Received: from localhost (localhost [127.0.0.1]) by basalt.tackymt.homeip.net (Postfix) with ESMTP id AFFCB10749; Tue, 18 Sep 2007 02:01:07 +0900 (JST) Received: from basalt.tackymt.homeip.net ([127.0.0.1]) by localhost (basalt.tackymt.homeip.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 24382-07; Tue, 18 Sep 2007 02:01:03 +0900 (JST) Received: from biotite (biotite.tackymt.homeip.net [IPv6:2001:3e0:577:0:216:cfff:febc:1472]) by basalt.tackymt.homeip.net (Postfix) with ESMTP; Tue, 18 Sep 2007 02:01:03 +0900 (JST) Date: Tue, 18 Sep 2007 02:01:00 +0900 From: "YAMAMOTO, Taku" To: Andrey Chernov Message-Id: <20070918020100.d43beb0b.taku@tackymt.homeip.net> In-Reply-To: <20070917092130.GA24424@nagual.pp.ru> References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> Organization: Trans New Technology, Inc. X-Mailer: Sylpheed 2.4.4 (GTK+ 2.10.14; i386-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: amavisd-new at tackymt.homeip.net Cc: current@freebsd.org, i18n@freebsd.org, Petr Hroudný 
, perky@freebsd.org Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2007 17:01:09 -0000

On Mon, 17 Sep 2007 13:21:30 +0400 Andrey Chernov wrote:
> On Mon, Sep 17, 2007 at 10:29:21AM +0200, Petr Hroudný wrote:
> > 2007/9/16, Andrey Chernov :
> > > The problem is: currently our single byte ctype functions are broken for
> > > wide characters locales in the argument range >= 0x80 - they may return
> > > false positives.
> > >
> > > For example, for UTF-8 locale we currently have:
> > > iswspace(0xA0)==1 and isspace(0xA0)==1
> > > (because iswspace() and isspace() are the same code)
> > > but must have
> > > isspace(0xA0)==0
> >
> > This is exactly what happens on other OSes and I agree this is the
> > right behaviour
> > for UTF-8. However, we must ensure, that:
> >
> > for C locale: isspace(0xA0)==0
> > for ISO8859-* locales: isspace(0xA0)==1
> > for UTF-8 locales: isspace(0xA0)==0
>
> The patch test for wide char locale presence first (__mb_cur_max > 1), so
> does not affect single byte locales like ISO8859-*
>

Checking __mb_cur_max is not enough for certain locales.
For example, SJIS has the following ranges for JIS X0201 (a.k.a. HALFWIDTH KANA).

/*
 * JIS X201
 */
PUNCT		0xa1-0xa5
SPACE		0xa0
BLANK		0xa0
SPECIAL		0xa1-0xdf
PHONOGRAM	0xa6-0xdf
SWIDTH1		0xa0-0xdf

-- 
-|-__ YAMAMOTO, Taku | __ < - A chicken is an egg's way of producing more eggs. 
- From owner-freebsd-i18n@FreeBSD.ORG Mon Sep 17 17:16:37 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7EEC116A417; Mon, 17 Sep 2007 17:16:37 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 0E43413C4A6; Mon, 17 Sep 2007 17:16:36 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8HHGZXW031324; Mon, 17 Sep 2007 21:16:35 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190049395; bh=iQfeje9qqF9lbolnz9WqglI0kxZFvHAVG/cL99q 34hg=; l=584; h=Date:From:To:Cc:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=Oz4ClMR4KQGKp8VsvHt6oWON8XlC1myBw0ZDcdXA DCj8z7Fg1BYFZ6tmf2zbw9dYwaNxSXm6xhvm6VPQQKFeKziNfg9/XniR5Vfv1ysTYq2 Ti9snEedii45dytDbDA3VwuMNNvpl3sYmAx+Sh4luxGNBNcurgpV8fMAAnkbnk1Q= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8HHGXFn031323; Mon, 17 Sep 2007 21:16:34 +0400 (MSD) (envelope-from ache) Date: Mon, 17 Sep 2007 21:16:33 +0400 From: Andrey Chernov To: "YAMAMOTO, Taku" Message-ID: <20070917171633.GA31179@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , "YAMAMOTO, Taku" , current@FreeBSD.ORG, i18n@FreeBSD.ORG, Petr Hroudn?? , perky@FreeBSD.ORG References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070918020100.d43beb0b.taku@tackymt.homeip.net> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Petr Hroudn?? 
, current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2007 17:16:37 -0000

On Tue, Sep 18, 2007 at 02:01:00AM +0900, YAMAMOTO, Taku wrote:
> Checking for __mb_cur_max is not enough for certain locales.
> For example, SJIS has following range for JIS X0201 (a.k.a. HALFWIDTH KANA).
>
> /*
> * JIS X201
> */
> PUNCT 0xa1-0xa5
> SPACE 0xa0
> BLANK 0xa0
> SPECIAL 0xa1-0xdf
> PHONOGRAM 0xa6-0xdf
> SWIDTH1 0xa0-0xdf

I don't understand your remark. MSKanji has __mb_cur_max = 2, so those
ranges are wchar_t ranges. My patch restricts only the unsigned char range.

-- 
http://ache.pp.ru/

From owner-freebsd-i18n@FreeBSD.ORG Wed Sep 19 02:12:10 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A008916A417; Wed, 19 Sep 2007 02:12:10 +0000 (UTC) (envelope-from taku@tackymt.homeip.net) Received: from basalt.tackymt.homeip.net (unknown [IPv6:2001:3e0:577:0:20d:61ff:fecc:2253]) by mx1.freebsd.org (Postfix) with ESMTP id 2C94913C46A; Wed, 19 Sep 2007 02:12:10 +0000 (UTC) (envelope-from taku@tackymt.homeip.net) Received: from localhost (localhost [127.0.0.1]) by basalt.tackymt.homeip.net (Postfix) with ESMTP id E7C1210749; Wed, 19 Sep 2007 11:12:08 +0900 (JST) Received: from basalt.tackymt.homeip.net ([127.0.0.1]) by localhost (basalt.tackymt.homeip.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 45232-02; Wed, 19 Sep 2007 11:12:07 +0900 (JST) Received: from basalt.tackymt.homeip.net (basalt.tackymt.homeip.net [IPv6:2001:3e0:577:0:20d:61ff:fecc:2253]) by basalt.tackymt.homeip.net (Postfix) with ESMTP; Wed, 19 Sep 2007 11:12:07 +0900 (JST) Date: Wed, 19 Sep 2007 11:12:07 +0900 From: Taku YAMAMOTO 
To: Andrey Chernov Message-Id: <20070919111207.f37653fc.taku@tackymt.homeip.net> In-Reply-To: <20070917171633.GA31179@nagual.pp.ru> References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> X-Mailer: Sylpheed 2.4.0 (GTK+ 2.10.11; i386-portbld-freebsd7.0) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: amavisd-new at tackymt.homeip.net Cc: i18n@FreeBSD.ORG, Petr Hroudný , perky@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2007 02:12:10 -0000

On Mon, 17 Sep 2007 21:16:33 +0400 Andrey Chernov wrote:
> On Tue, Sep 18, 2007 at 02:01:00AM +0900, YAMAMOTO, Taku wrote:
> > Checking for __mb_cur_max is not enough for certain locales.
> > For example, SJIS has following range for JIS X0201 (a.k.a. HALFWIDTH KANA).
> >
> > /*
> > * JIS X201
> > */
> > PUNCT 0xa1-0xa5
> > SPACE 0xa0
> > BLANK 0xa0
> > SPECIAL 0xa1-0xdf
> > PHONOGRAM 0xa6-0xdf
> > SWIDTH1 0xa0-0xdf
>
> I don't understand your remark. MSKanji have __mb_cur_max = 2 and so those
> ranges are wchar_t ranges. My patch restrict unsigned char ranges only.

These characters ARE single byte.
The problem is that a byte >= 0x80 does not always mean it composes a
multi-byte character in that locale.

-- 
-|-__ YAMAMOTO, Taku | __ < - A chicken is an egg's way of producing more eggs. 
- From owner-freebsd-i18n@FreeBSD.ORG Wed Sep 19 02:25:59 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 44D6D16A420; Wed, 19 Sep 2007 02:25:59 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 97AD113C465; Wed, 19 Sep 2007 02:25:58 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8J2PuUf070749; Wed, 19 Sep 2007 06:25:56 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190168757; bh=CgRSFAzMqKFQSJ78EIaFmOIkHIshvpmLzgqxhjX z5qk=; l=514; h=Date:From:To:Cc:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: Content-Transfer-Encoding:In-Reply-To:User-Agent; b=EomaGUh2DjV2yK RHickf//1nPnjQAfXqcx6gapz0voCQGMirs8OmBVXnX9/qQHe1rHsk4pA7NiXy+LmJ9 uCkZgKfyZdssSbM445Wl6QvQ+Vy77J/eAMFtlnjLS1ZDFfcVv22FiIA2/y0wzdOS8yu ywjt+bLNnucG/Se7HUoI74E= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8J2PuTd070748; Wed, 19 Sep 2007 06:25:56 +0400 (MSD) (envelope-from ache) Date: Wed, 19 Sep 2007 06:25:55 +0400 From: Andrey Chernov To: Taku YAMAMOTO Message-ID: <20070919022555.GA70617@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Taku YAMAMOTO , Petr Hroudn?? 
, current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <20070919111207.f37653fc.taku@tackymt.homeip.net> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: i18n@FreeBSD.ORG, Petr Hroudný , perky@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2007 02:25:59 -0000

On Wed, Sep 19, 2007 at 11:12:07AM +0900, Taku YAMAMOTO wrote:
> These characters ARE single byte.
> The problem is that a byte >= 0x80 does not always mean it composes a
> multi-byte character in that locale.

Ah, I understand. From mskanji(5):
"Characters from the ASCII/JIS-Roman character set are encoded as single
bytes between 0x00 and 0x7F (ASCII) or 0xA1 and 0xDF (Half-width
katakana)."

It means that the test needs to be more comprehensive :(
I'll think about it...

-- 
http://ache.pp.ru/

From owner-freebsd-i18n@FreeBSD.ORG Wed Sep 19 02:36:28 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 76B5716A417; Wed, 19 Sep 2007 02:36:28 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id C83E413C459; Wed, 19 Sep 2007 02:36:27 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8J2aQ0n070960; Wed, 19 Sep 2007 06:36:26 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190169386; bh=Q+x0Gg5zbeDMCpXEXHDZRtuHkT1j+aApFOMtwYT bkNo=; l=357; h=Date:From:To:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=ESYVl3keunDBINry99dX3jGBCA596h0XdiZHjDK7 TJ47ENrTb+831nuqmbvMMHV9vehxWpTMjX7yBGKHuJAe0+4to2AlK7iCbEgb/HqarK0 WuO7cNitp+vBD4azmTn0VoWDIIVCArcrZyL5+OKfihjCsgu1Esk3BK01JmHo2cFw= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8J2aQaQ070959; Wed, 19 Sep 2007 06:36:26 +0400 (MSD) (envelope-from ache) Date: Wed, 19 Sep 2007 06:36:25 +0400 From: Andrey Chernov To: Taku YAMAMOTO , Petr Hroudný , current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG Message-ID: <20070919023625.GA70891@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Taku YAMAMOTO , Petr Hroudný 
, current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> <20070919022555.GA70617@nagual.pp.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070919022555.GA70617@nagual.pp.ru> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2007 02:36:28 -0000 On Wed, Sep 19, 2007 at 06:25:55AM +0400, Andrey Chernov wrote: > I'll think about... It seems my first attempt was right, i.e. we need real UTF-8.src instead of UCS-4 mimic of it. All other locales keep their true wchar_t encodings, only UTF-8.src not following the rules. I'll send regenerated UTF-8.src a bit later. 
-- http://ache.pp.ru/ From owner-freebsd-i18n@FreeBSD.ORG Wed Sep 19 05:18:34 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F16E16A418; Wed, 19 Sep 2007 05:18:34 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 338CB13C457; Wed, 19 Sep 2007 05:18:32 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8J5IV9Y072461; Wed, 19 Sep 2007 09:18:31 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190179111; bh=853GFX1UkJzny0J3otxOiD/18/OckDGE67M6Gw1 rcWQ=; l=15943; h=Date:From:To:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=nkgRAbDHMGAkdlAHTDwCV9s91WRLDRSNuR/8GB32 JJo4Chd0Qutr8p6gHukSISc1M4YJ9QOU0ypRIUOPrtpBWIpGrsM+0tILkkFfCctBJcZ fU/gvM9nqVq8bTfmRNIdoDOaxc2ZMFHQLk9kT6y185yPhf2YOe1a+GJ0Zsggp8KI= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8J5IUXe072460; Wed, 19 Sep 2007 09:18:30 +0400 (MSD) (envelope-from ache) Date: Wed, 19 Sep 2007 09:18:30 +0400 From: Andrey Chernov To: Taku YAMAMOTO , Petr Hroudn?? , current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG Message-ID: <20070919051830.GA72429@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Taku YAMAMOTO , Petr Hroudn?? 
, current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> <20070919022555.GA70617@nagual.pp.ru> <20070919023625.GA70891@nagual.pp.ru> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="OgqxwSJOaUobr8KG" Content-Disposition: inline In-Reply-To: <20070919023625.GA70891@nagual.pp.ru> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2007 05:18:34 -0000 --OgqxwSJOaUobr8KG Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Sep 19, 2007 at 06:36:25AM +0400, Andrey Chernov wrote: > only UTF-8.src not following the rules. I'll send regenerated UTF-8.src > a bit later. I change my mind again, now I use new __mb_bit8_override flag specific to UTF-8 encoding (other bit8 overriding encodings could use it too). New patch attached. 
-- http://ache.pp.ru/ --OgqxwSJOaUobr8KG Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="ctype.patch" --- _ctype.h.old 2007-09-16 21:13:59.000000000 +0400 +++ _ctype.h 2007-09-19 08:46:35.000000000 +0400 @@ -63,6 +63,7 @@ #define _CTYPE_I 0x00080000L /* Ideogram */ #define _CTYPE_T 0x00100000L /* Special */ #define _CTYPE_Q 0x00200000L /* Phonogram */ +#define _CTYPE_WID 0x10000000L /* wide character function */ #define _CTYPE_SW0 0x20000000L /* 0 width character */ #define _CTYPE_SW1 0x40000000L /* 1 width character */ #define _CTYPE_SW2 0x80000000L /* 2 width character */ @@ -87,6 +88,8 @@ #define __inline #endif +extern int __mb_bit8_override; + /* * Use inline functions if we are allowed to and the compiler supports them. */ @@ -98,8 +101,11 @@ static __inline int __maskrune(__ct_rune_t _c, unsigned long _f) { - return ((_c < 0 || _c >= _CACHED_RUNES) ? ___runetype(_c) : + return __mb_bit8_override && !(_f & _CTYPE_WID) && (_c >= 0x80) ? 0 : + ((_c < 0 || _c >= _CACHED_RUNES) ? ___runetype(_c) : _CurrentRuneLocale->__runetype[_c]) & _f; + /* We never set _CTYPE_WID in the locale data, */ + /* so can skip ... & (_f & ~_CTYPE_WID). */ } static __inline int @@ -111,8 +117,11 @@ static __inline int __isctype(__ct_rune_t _c, unsigned long _f) { - return (_c < 0 || _c >= _CACHED_RUNES) ? 0 : + return __mb_bit8_override && !(_f & _CTYPE_WID) && (_c >= 0x80) ? 0 : + (_c < 0 || _c >= _CACHED_RUNES) ? 0 : !!(_DefaultRuneLocale.__runetype[_c] & _f); + /* We never set _CTYPE_WID in the locale data, */ + /* so can skip ... & (_f & ~_CTYPE_WID). */ } static __inline __ct_rune_t @@ -129,6 +138,22 @@ _CurrentRuneLocale->__maplower[_c]; } +static __inline __ct_rune_t +__tosupper(__ct_rune_t _c) +{ + return __mb_bit8_override && (_c >= 0x80) ? _c : + (_c < 0 || _c >= _CACHED_RUNES) ? 
___toupper(_c) : + _CurrentRuneLocale->__mapupper[_c]; +} + +static __inline __ct_rune_t +__toslower(__ct_rune_t _c) +{ + return __mb_bit8_override && (_c >= 0x80) ? _c : + (_c < 0 || _c >= _CACHED_RUNES) ? ___tolower(_c) : + _CurrentRuneLocale->__maplower[_c]; +} + static __inline int __wcwidth(__ct_rune_t _c) { @@ -150,6 +175,8 @@ int __isctype(__ct_rune_t, unsigned long); __ct_rune_t __toupper(__ct_rune_t); __ct_rune_t __tolower(__ct_rune_t); +__ct_rune_t __tosupper(__ct_rune_t); +__ct_rune_t __toslower(__ct_rune_t); int __wcwidth(__ct_rune_t); __END_DECLS #endif /* using inlines */ --- big5.c.old 2007-09-19 08:48:55.000000000 +0400 +++ big5.c 2007-09-19 08:56:12.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t _BIG5_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _BIG5_mbsinit(const mbstate_t *); @@ -68,6 +70,7 @@ __mbsinit = _BIG5_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_bit8_override = 0; return (0); } --- ctype.h.old 2007-09-16 22:03:55.000000000 +0400 +++ ctype.h 2007-09-16 22:56:10.000000000 +0400 @@ -97,8 +97,8 @@ #define isspace(c) __istype((c), _CTYPE_S) #define isupper(c) __istype((c), _CTYPE_U) #define isxdigit(c) __isctype((c), _CTYPE_X) /* ANSI -- locale independent */ -#define tolower(c) __tolower(c) -#define toupper(c) __toupper(c) +#define tolower(c) __toslower(c) +#define toupper(c) __tosupper(c) #if __XSI_VISIBLE /* @@ -112,8 +112,8 @@ * * XXX isascii() and toascii() should similarly be undocumented. 
*/ -#define _tolower(c) __tolower(c) -#define _toupper(c) __toupper(c) +#define _tolower(c) __toslower(c) +#define _toupper(c) __tosupper(c) #define isascii(c) (((c) & ~0x7F) == 0) #define toascii(c) ((c) & 0x7F) #endif @@ -128,7 +128,7 @@ #define isideogram(c) __istype((c), _CTYPE_I) #define isnumber(c) __istype((c), _CTYPE_D) #define isphonogram(c) __istype((c), _CTYPE_Q) -#define isrune(c) __istype((c), 0xFFFFFF00L) +#define isrune(c) __istype((c), 0xFFFFFF00L & ~_CTYPE_WID) #define isspecial(c) __istype((c), _CTYPE_T) #endif --- euc.c.old 2007-09-19 08:50:57.000000000 +0400 +++ euc.c 2007-09-19 08:56:12.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t _EUC_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _EUC_mbsinit(const mbstate_t *); @@ -116,6 +118,7 @@ __mbrtowc = _EUC_mbrtowc; __wcrtomb = _EUC_wcrtomb; __mbsinit = _EUC_mbsinit; + __mb_bit8_override = 0; return (0); } --- gb18030.c.old 2007-09-19 08:59:01.000000000 +0400 +++ gb18030.c 2007-09-19 09:00:10.000000000 +0400 @@ -39,6 +39,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t _GB18030_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GB18030_mbsinit(const mbstate_t *); @@ -59,6 +61,7 @@ __mbsinit = _GB18030_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 4; + __mb_bit8_override = 0; return (0); } --- gb2312.c.old 2007-09-19 09:00:35.000000000 +0400 +++ gb2312.c 2007-09-19 09:01:05.000000000 +0400 @@ -35,6 +35,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t _GB2312_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GB2312_mbsinit(const mbstate_t *); @@ -55,6 +57,7 @@ __wcrtomb = _GB2312_wcrtomb; __mbsinit = _GB2312_mbsinit; __mb_cur_max = 2; + __mb_bit8_override = 0; return (0); } --- gbk.c.old 2007-09-19 09:01:33.000000000 
+0400 +++ gbk.c 2007-09-19 09:02:03.000000000 +0400 @@ -42,6 +42,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t _GBK_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GBK_mbsinit(const mbstate_t *); @@ -61,6 +63,7 @@ __mbsinit = _GBK_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_bit8_override = 0; return (0); } --- isctype.c.old 2007-09-16 22:31:26.000000000 +0400 +++ isctype.c 2007-09-16 22:37:54.000000000 +0400 @@ -168,7 +168,7 @@ isrune(c) int c; { - return (__istype(c, 0xFFFFFF00L)); + return (__istype(c, 0xFFFFFF00L & ~_CTYPE_WID)); } #undef isspace @@ -216,7 +216,7 @@ tolower(c) int c; { - return (__tolower(c)); + return (__toslower(c)); } #undef toupper @@ -224,6 +224,6 @@ toupper(c) int c; { - return (__toupper(c)); + return (__tosupper(c)); } --- iswctype.c.old 2007-09-16 22:31:30.000000000 +0400 +++ iswctype.c 2007-09-16 22:41:39.000000000 +0400 @@ -45,7 +45,7 @@ iswalnum(wc) wint_t wc; { - return (__istype(wc, _CTYPE_A|_CTYPE_D)); + return (__istype(wc, _CTYPE_A|_CTYPE_D|_CTYPE_WID)); } #undef iswalpha @@ -53,7 +53,7 @@ iswalpha(wc) wint_t wc; { - return (__istype(wc, _CTYPE_A)); + return (__istype(wc, _CTYPE_A|_CTYPE_WID))); } #undef iswascii @@ -61,7 +61,7 @@ iswascii(wc) wint_t wc; { - return ((wc & ~0x7F) == 0); + return (wc < 0x80); } #undef iswblank @@ -69,7 +69,7 @@ iswblank(wc) wint_t wc; { - return (__istype(wc, _CTYPE_B)); + return (__istype(wc, _CTYPE_B|_CTYPE_WID))); } #undef iswcntrl @@ -77,7 +77,7 @@ iswcntrl(wc) wint_t wc; { - return (__istype(wc, _CTYPE_C)); + return (__istype(wc, _CTYPE_C|_CTYPE_WID))); } #undef iswdigit @@ -85,7 +85,7 @@ iswdigit(wc) wint_t wc; { - return (__isctype(wc, _CTYPE_D)); + return (__isctype(wc, _CTYPE_D|_CTYPE_WID))); } #undef iswgraph @@ -93,7 +93,7 @@ iswgraph(wc) wint_t wc; { - return (__istype(wc, _CTYPE_G)); + return (__istype(wc, _CTYPE_G|_CTYPE_WID))); } #undef iswhexnumber @@ -101,7 +101,7 @@ 
iswhexnumber(wc) wint_t wc; { - return (__istype(wc, _CTYPE_X)); + return (__istype(wc, _CTYPE_X|_CTYPE_WID))); } #undef iswideogram @@ -109,7 +109,7 @@ iswideogram(wc) wint_t wc; { - return (__istype(wc, _CTYPE_I)); + return (__istype(wc, _CTYPE_I|_CTYPE_WID))); } #undef iswlower @@ -117,7 +117,7 @@ iswlower(wc) wint_t wc; { - return (__istype(wc, _CTYPE_L)); + return (__istype(wc, _CTYPE_L|_CTYPE_WID))); } #undef iswnumber @@ -125,7 +125,7 @@ iswnumber(wc) wint_t wc; { - return (__istype(wc, _CTYPE_D)); + return (__istype(wc, _CTYPE_D|_CTYPE_WID))); } #undef iswphonogram @@ -133,7 +133,7 @@ iswphonogram(wc) wint_t wc; { - return (__istype(wc, _CTYPE_Q)); + return (__istype(wc, _CTYPE_Q|_CTYPE_WID))); } #undef iswprint @@ -141,7 +141,7 @@ iswprint(wc) wint_t wc; { - return (__istype(wc, _CTYPE_R)); + return (__istype(wc, _CTYPE_R|_CTYPE_WID))); } #undef iswpunct @@ -149,7 +149,7 @@ iswpunct(wc) wint_t wc; { - return (__istype(wc, _CTYPE_P)); + return (__istype(wc, _CTYPE_P|_CTYPE_WID))); } #undef iswrune @@ -157,7 +157,7 @@ iswrune(wc) wint_t wc; { - return (__istype(wc, 0xFFFFFF00L)); + return (__istype(wc, 0xFFFFFF00L)); /* already have _CTYPE_WID */ } #undef iswspace @@ -165,7 +165,7 @@ iswspace(wc) wint_t wc; { - return (__istype(wc, _CTYPE_S)); + return (__istype(wc, _CTYPE_S|_CTYPE_WID))); } #undef iswspecial @@ -173,7 +173,7 @@ iswspecial(wc) wint_t wc; { - return (__istype(wc, _CTYPE_T)); + return (__istype(wc, _CTYPE_T|_CTYPE_WID))); } #undef iswupper @@ -181,7 +181,7 @@ iswupper(wc) wint_t wc; { - return (__istype(wc, _CTYPE_U)); + return (__istype(wc, _CTYPE_U|_CTYPE_WID))); } #undef iswxdigit @@ -189,7 +189,7 @@ iswxdigit(wc) wint_t wc; { - return (__isctype(wc, _CTYPE_X)); + return (__isctype(wc, _CTYPE_X|_CTYPE_WID))); } #undef towlower --- mskanji.c.old 2007-09-19 09:02:56.000000000 +0400 +++ mskanji.c 2007-09-19 09:03:26.000000000 +0400 @@ -47,6 +47,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t 
_MSKanji_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _MSKanji_mbsinit(const mbstate_t *); @@ -66,6 +68,7 @@ __mbsinit = _MSKanji_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_bit8_override = 0; return (0); } --- none.c.old 2007-09-19 08:56:40.000000000 +0400 +++ none.c 2007-09-19 08:58:23.000000000 +0400 @@ -69,6 +69,7 @@ __wcsnrtombs = _none_wcsnrtombs; _CurrentRuneLocale = rl; __mb_cur_max = 1; + __mb_bit8_override = 0; return(0); } @@ -177,6 +178,7 @@ /* setup defaults */ int __mb_cur_max = 1; +int __mb_bit8_override = 0; size_t (*__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict) = _none_mbrtowc; int (*__mbsinit)(const mbstate_t *) = _none_mbsinit; --- setrunelocale.c.old 2007-09-19 09:03:59.000000000 +0400 +++ setrunelocale.c 2007-09-19 09:06:45.000000000 +0400 @@ -45,6 +45,8 @@ #include "mblocal.h" #include "setlocale.h" +extern int __mb_bit8_override; + extern _RuneLocale *_Read_RuneMagi(FILE *); static int __setrunelocale(const char *); @@ -59,6 +61,7 @@ static char ctype_encoding[ENCODING_LEN + 1]; static _RuneLocale *CachedRuneLocale; static int Cached__mb_cur_max; + static int Cached__mb_bit8_override; static size_t (*Cached__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static size_t (*Cached__wcrtomb)(char * __restrict, wchar_t, @@ -85,6 +88,7 @@ strcmp(encoding, ctype_encoding) == 0) { _CurrentRuneLocale = CachedRuneLocale; __mb_cur_max = Cached__mb_cur_max; + __mb_bit8_override = Cached__mb_bit8_override; __mbrtowc = Cached__mbrtowc; __mbsinit = Cached__mbsinit; __mbsnrtowcs = Cached__mbsnrtowcs; @@ -147,6 +151,7 @@ } CachedRuneLocale = _CurrentRuneLocale; Cached__mb_cur_max = __mb_cur_max; + Cached__mb_bit8_override = __mb_bit8_override; Cached__mbrtowc = __mbrtowc; Cached__mbsinit = __mbsinit; Cached__mbsnrtowcs = __mbsnrtowcs; --- utf8.c.old 2007-09-19 08:18:40.000000000 +0400 +++ utf8.c 2007-09-19 
08:56:12.000000000 +0400 @@ -35,6 +35,8 @@ #include #include "mblocal.h" +extern int __mb_bit8_override; + static size_t _UTF8_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _UTF8_mbsinit(const mbstate_t *); @@ -63,6 +65,7 @@ __wcsnrtombs = _UTF8_wcsnrtombs; _CurrentRuneLocale = rl; __mb_cur_max = 6; + __mb_bit8_override = 1; return (0); } --- wctype.h.old 2007-09-16 21:59:37.000000000 +0400 +++ wctype.h 2007-09-16 22:56:44.000000000 +0400 @@ -89,30 +89,30 @@ #endif __END_DECLS -#define iswalnum(wc) __istype((wc), _CTYPE_A|_CTYPE_D) -#define iswalpha(wc) __istype((wc), _CTYPE_A) -#define iswblank(wc) __istype((wc), _CTYPE_B) -#define iswcntrl(wc) __istype((wc), _CTYPE_C) -#define iswctype(wc, charclass) __istype((wc), (charclass)) -#define iswdigit(wc) __isctype((wc), _CTYPE_D) -#define iswgraph(wc) __istype((wc), _CTYPE_G) -#define iswlower(wc) __istype((wc), _CTYPE_L) -#define iswprint(wc) __istype((wc), _CTYPE_R) -#define iswpunct(wc) __istype((wc), _CTYPE_P) -#define iswspace(wc) __istype((wc), _CTYPE_S) -#define iswupper(wc) __istype((wc), _CTYPE_U) -#define iswxdigit(wc) __isctype((wc), _CTYPE_X) +#define iswalnum(wc) __istype((wc), _CTYPE_A|_CTYPE_D|_CTYPE_WID) +#define iswalpha(wc) __istype((wc), _CTYPE_A|_CTYPE_WID) +#define iswblank(wc) __istype((wc), _CTYPE_B|_CTYPE_WID) +#define iswcntrl(wc) __istype((wc), _CTYPE_C|_CTYPE_WID) +#define iswctype(wc, charclass) __istype((wc), (charclass)|_CTYPE_WID) +#define iswdigit(wc) __isctype((wc), _CTYPE_D|_CTYPE_WID) +#define iswgraph(wc) __istype((wc), _CTYPE_G|_CTYPE_WID) +#define iswlower(wc) __istype((wc), _CTYPE_L|_CTYPE_WID) +#define iswprint(wc) __istype((wc), _CTYPE_R|_CTYPE_WID) +#define iswpunct(wc) __istype((wc), _CTYPE_P|_CTYPE_WID) +#define iswspace(wc) __istype((wc), _CTYPE_S|_CTYPE_WID) +#define iswupper(wc) __istype((wc), _CTYPE_U|_CTYPE_WID) +#define iswxdigit(wc) __isctype((wc), _CTYPE_X|_CTYPE_WID) #define towlower(wc) __tolower(wc) #define 
towupper(wc) __toupper(wc) #if __BSD_VISIBLE -#define iswascii(wc) (((wc) & ~0x7F) == 0) -#define iswhexnumber(wc) __istype((wc), _CTYPE_X) -#define iswideogram(wc) __istype((wc), _CTYPE_I) -#define iswnumber(wc) __istype((wc), _CTYPE_D) -#define iswphonogram(wc) __istype((wc), _CTYPE_Q) -#define iswrune(wc) __istype((wc), 0xFFFFFF00L) -#define iswspecial(wc) __istype((wc), _CTYPE_T) +#define iswascii(wc) ((wc) < 0x80) +#define iswhexnumber(wc) __istype((wc), _CTYPE_X|_CTYPE_WID) +#define iswideogram(wc) __istype((wc), _CTYPE_I|_CTYPE_WID) +#define iswnumber(wc) __istype((wc), _CTYPE_D|_CTYPE_WID) +#define iswphonogram(wc) __istype((wc), _CTYPE_Q|_CTYPE_WID) +#define iswrune(wc) __istype((wc), 0xFFFFFF00L) /* already have _CTYPE_WID */ +#define iswspecial(wc) __istype((wc), _CTYPE_T|_CTYPE_WID) #endif #endif /* _WCTYPE_H_ */ --OgqxwSJOaUobr8KG-- From owner-freebsd-i18n@FreeBSD.ORG Wed Sep 19 12:10:30 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 01D9716A41B; Wed, 19 Sep 2007 12:10:30 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 27F7313C48A; Wed, 19 Sep 2007 12:10:28 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8JCARcc081816; Wed, 19 Sep 2007 16:10:27 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190203827; bh=JL/Hv73FefsaA0z0rex6B157SVZKWSsrmb+tZ8R XV8w=; l=16576; h=Date:From:To:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=dOXshvrv1dLWo+BK2XW++iPMZ58WM3dskWeXo21d 8s7nyfGQwmk7szxhyX3ffJonzcos5fYr2Sae2ykso/Twir/w5iArvqLchquBqasyJCL 
S8Ml9vfgb2xmaKsgRnacEZ4LdDC3YWrjPypjBmmUgcTkZ3lp/G5hLxZmH1FkTRe4= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8JCAPpd081812; Wed, 19 Sep 2007 16:10:25 +0400 (MSD) (envelope-from ache) Date: Wed, 19 Sep 2007 16:10:24 +0400 From: Andrey Chernov To: Taku YAMAMOTO , Petr Hroudn?? , current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG Message-ID: <20070919121024.GA81606@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Taku YAMAMOTO , Petr Hroudn?? , current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> <20070919022555.GA70617@nagual.pp.ru> <20070919023625.GA70891@nagual.pp.ru> <20070919051830.GA72429@nagual.pp.ru> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="0OAP2g/MAC+5xKAE" Content-Disposition: inline In-Reply-To: <20070919051830.GA72429@nagual.pp.ru> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2007 12:10:30 -0000 --0OAP2g/MAC+5xKAE Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Sep 19, 2007 at 09:18:30AM +0400, Andrey Chernov wrote: > I change my mind again, now I use new __mb_bit8_override flag specific to > UTF-8 encoding (other bit8 overriding encodings could use it too). New > patch attached. Improved version. It introduces a general __mb_sch_limit parameter instead, set by every locale, that specifies the upper limit of the single-char range. It also allows fixing the bug where ctype(3) functions are called with arg > 0xFF in wide character locales, and it simplifies all the checks. New patch is attached.
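The idea of a per-locale single-char limit can be sketched in isolation as follows. This is a toy model, not the patch itself: struct toy_locale, toy_wide_isspace() and toy_isspace() are hypothetical names, and the hard-coded "space" set stands in for real locale data. Only the limit values (128 for UTF-8, 256 for the other encodings) are taken from the attached patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-locale value the patch calls __mb_sch_limit. */
struct toy_locale {
	int sch_limit;	/* first value that is NOT a valid single-char argument */
};

/* Limits as set in the patch: 128 for UTF-8, 256 for the other encodings. */
static const struct toy_locale toy_utf8 = { 128 };
static const struct toy_locale toy_8bit = { 256 };

/*
 * Toy wide-character classification: ASCII whitespace plus 0xA0.
 * Real code would consult the locale's rune tables instead.
 */
static bool
toy_wide_isspace(int c)
{
	return (c == ' ' || (c >= '\t' && c <= '\r') || c == 0xA0);
}

/*
 * Toy single-byte classification: arguments outside [0, sch_limit) are
 * never classified, so they cannot yield false positives.
 */
static bool
toy_isspace(const struct toy_locale *loc, int c)
{
	if (c < 0 || c >= loc->sch_limit)
		return (false);
	return (toy_wide_isspace(c));
}
```

In this model toy_isspace(&toy_utf8, 0xA0) is false while toy_wide_isspace(0xA0) stays true, and any argument above 0xFF is rejected for both locales. That is the behaviour the limit is meant to enforce for the single-byte functions without touching the wide-character ones.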
Here is the updated rationale again: ------------------------------------------------------------------------- The problem is that our single-byte ctype(3) functions are currently broken for wide character locales in the argument range >= 0x80: they may return false positives. Example 1: for the UTF-8 locale we currently have iswspace(0xA0)==1 and isspace(0xA0)==1 (because iswspace() and isspace() are the same code), but we must have iswspace(0xA0)==1 and isspace(0xA0)==0 (because in the UTF-8 locale there is no such single-byte character, nor any other in the range 0x80..0xff; only ASCII is kept in the single-byte range, because our internal wchar_t representation for UTF-8 is UCS-4). Example 2: for all wide character locales, isalpha(arg) with arg > 0xFF may return false positives (it must be 0), because iswalpha() and isalpha() are the same code. The attached patch addresses this issue and also fixes iswascii() (currently iswascii() is broken for arguments > 0xFF). This patch is 100% binary compatible with old binaries. -- http://ache.pp.ru/ --0OAP2g/MAC+5xKAE Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="ctype.patch" --- _ctype.h.old 2007-09-16 21:13:59.000000000 +0400 +++ _ctype.h 2007-09-19 15:29:41.000000000 +0400 @@ -63,6 +63,7 @@ #define _CTYPE_I 0x00080000L /* Ideogram */ #define _CTYPE_T 0x00100000L /* Special */ #define _CTYPE_Q 0x00200000L /* Phonogram */ +#define _CTYPE_WID 0x10000000L /* wide character function */ #define _CTYPE_SW0 0x20000000L /* 0 width character */ #define _CTYPE_SW1 0x40000000L /* 1 width character */ #define _CTYPE_SW2 0x80000000L /* 2 width character */ @@ -87,6 +88,8 @@ #define __inline #endif +extern int __mb_sch_limit; + /* * Use inline functions if we are allowed to and the compiler supports them. */ @@ -98,8 +101,11 @@ static __inline int __maskrune(__ct_rune_t _c, unsigned long _f) { - return ((_c < 0 || _c >= _CACHED_RUNES) ? ___runetype(_c) : + return (_c < 0 || (!(_f & _CTYPE_WID) && _c >= __mb_sch_limit)) ? 
0 : + (_c >= _CACHED_RUNES ? ___runetype(_c) : _CurrentRuneLocale->__runetype[_c]) & _f; + /* We never set _CTYPE_WID in the locale data, */ + /* so can skip ... & (_f & ~_CTYPE_WID). */ } static __inline int @@ -111,7 +117,7 @@ static __inline int __isctype(__ct_rune_t _c, unsigned long _f) { - return (_c < 0 || _c >= _CACHED_RUNES) ? 0 : + return (_c < 0 || _c >= __mb_sch_limit) ? 0 : !!(_DefaultRuneLocale.__runetype[_c] & _f); } @@ -129,6 +135,20 @@ _CurrentRuneLocale->__maplower[_c]; } +static __inline __ct_rune_t +__tosupper(__ct_rune_t _c) +{ + return (_c < 0 || _c >= __mb_sch_limit) ? _c : + _CurrentRuneLocale->__mapupper[_c]; +} + +static __inline __ct_rune_t +__toslower(__ct_rune_t _c) +{ + return (_c < 0 || _c >= __mb_sch_limit) ? _c : + _CurrentRuneLocale->__maplower[_c]; +} + static __inline int __wcwidth(__ct_rune_t _c) { @@ -150,6 +170,8 @@ int __isctype(__ct_rune_t, unsigned long); __ct_rune_t __toupper(__ct_rune_t); __ct_rune_t __tolower(__ct_rune_t); +__ct_rune_t __tosupper(__ct_rune_t); +__ct_rune_t __toslower(__ct_rune_t); int __wcwidth(__ct_rune_t); __END_DECLS #endif /* using inlines */ --- big5.c.old 2007-09-19 08:48:55.000000000 +0400 +++ big5.c 2007-09-19 15:41:26.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _BIG5_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _BIG5_mbsinit(const mbstate_t *); @@ -68,6 +70,7 @@ __mbsinit = _BIG5_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sch_limit = 256; return (0); } --- ctype.h.old 2007-09-16 22:03:55.000000000 +0400 +++ ctype.h 2007-09-16 22:56:10.000000000 +0400 @@ -97,8 +97,8 @@ #define isspace(c) __istype((c), _CTYPE_S) #define isupper(c) __istype((c), _CTYPE_U) #define isxdigit(c) __isctype((c), _CTYPE_X) /* ANSI -- locale independent */ -#define tolower(c) __tolower(c) -#define toupper(c) __toupper(c) +#define tolower(c) __toslower(c) +#define toupper(c) __tosupper(c) #if 
__XSI_VISIBLE /* @@ -112,8 +112,8 @@ * * XXX isascii() and toascii() should similarly be undocumented. */ -#define _tolower(c) __tolower(c) -#define _toupper(c) __toupper(c) +#define _tolower(c) __toslower(c) +#define _toupper(c) __tosupper(c) #define isascii(c) (((c) & ~0x7F) == 0) #define toascii(c) ((c) & 0x7F) #endif @@ -128,7 +128,7 @@ #define isideogram(c) __istype((c), _CTYPE_I) #define isnumber(c) __istype((c), _CTYPE_D) #define isphonogram(c) __istype((c), _CTYPE_Q) -#define isrune(c) __istype((c), 0xFFFFFF00L) +#define isrune(c) __istype((c), 0xFFFFFF00L & ~_CTYPE_WID) #define isspecial(c) __istype((c), _CTYPE_T) #endif --- euc.c.old 2007-09-19 08:50:57.000000000 +0400 +++ euc.c 2007-09-19 15:41:26.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _EUC_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _EUC_mbsinit(const mbstate_t *); @@ -116,6 +118,7 @@ __mbrtowc = _EUC_mbrtowc; __wcrtomb = _EUC_wcrtomb; __mbsinit = _EUC_mbsinit; + __mb_sch_limit = 256; return (0); } --- gb18030.c.old 2007-09-19 08:59:01.000000000 +0400 +++ gb18030.c 2007-09-19 15:41:26.000000000 +0400 @@ -39,6 +39,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _GB18030_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GB18030_mbsinit(const mbstate_t *); @@ -59,6 +61,7 @@ __mbsinit = _GB18030_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 4; + __mb_sch_limit = 256; return (0); } --- gb2312.c.old 2007-09-19 09:00:35.000000000 +0400 +++ gb2312.c 2007-09-19 15:41:26.000000000 +0400 @@ -35,6 +35,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _GB2312_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GB2312_mbsinit(const mbstate_t *); @@ -55,6 +57,7 @@ __wcrtomb = _GB2312_wcrtomb; __mbsinit = _GB2312_mbsinit; __mb_cur_max = 2; 
+ __mb_sch_limit = 256; return (0); } --- gbk.c.old 2007-09-19 09:01:33.000000000 +0400 +++ gbk.c 2007-09-19 15:41:26.000000000 +0400 @@ -42,6 +42,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _GBK_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GBK_mbsinit(const mbstate_t *); @@ -61,6 +63,7 @@ __mbsinit = _GBK_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sch_limit = 256; return (0); } --- isctype.c.old 2007-09-16 22:31:26.000000000 +0400 +++ isctype.c 2007-09-16 22:37:54.000000000 +0400 @@ -168,7 +168,7 @@ isrune(c) int c; { - return (__istype(c, 0xFFFFFF00L)); + return (__istype(c, 0xFFFFFF00L & ~_CTYPE_WID)); } #undef isspace @@ -216,7 +216,7 @@ tolower(c) int c; { - return (__tolower(c)); + return (__toslower(c)); } #undef toupper @@ -224,6 +224,6 @@ toupper(c) int c; { - return (__toupper(c)); + return (__tosupper(c)); } --- iswctype.c.old 2007-09-16 22:31:30.000000000 +0400 +++ iswctype.c 2007-09-19 15:45:26.000000000 +0400 @@ -45,7 +45,7 @@ iswalnum(wc) wint_t wc; { - return (__istype(wc, _CTYPE_A|_CTYPE_D)); + return (__istype(wc, _CTYPE_A|_CTYPE_D|_CTYPE_WID)); } #undef iswalpha @@ -53,7 +53,7 @@ iswalpha(wc) wint_t wc; { - return (__istype(wc, _CTYPE_A)); + return (__istype(wc, _CTYPE_A|_CTYPE_WID)); } #undef iswascii @@ -61,7 +61,7 @@ iswascii(wc) wint_t wc; { - return ((wc & ~0x7F) == 0); + return (wc < 0x80); } #undef iswblank @@ -69,7 +69,7 @@ iswblank(wc) wint_t wc; { - return (__istype(wc, _CTYPE_B)); + return (__istype(wc, _CTYPE_B|_CTYPE_WID)); } #undef iswcntrl @@ -77,7 +77,7 @@ iswcntrl(wc) wint_t wc; { - return (__istype(wc, _CTYPE_C)); + return (__istype(wc, _CTYPE_C|_CTYPE_WID)); } #undef iswdigit @@ -93,7 +93,7 @@ iswgraph(wc) wint_t wc; { - return (__istype(wc, _CTYPE_G)); + return (__istype(wc, _CTYPE_G|_CTYPE_WID)); } #undef iswhexnumber @@ -101,7 +101,7 @@ iswhexnumber(wc) wint_t wc; { - return (__istype(wc, _CTYPE_X)); + return 
(__istype(wc, _CTYPE_X|_CTYPE_WID)); } #undef iswideogram @@ -109,7 +109,7 @@ iswideogram(wc) wint_t wc; { - return (__istype(wc, _CTYPE_I)); + return (__istype(wc, _CTYPE_I|_CTYPE_WID)); } #undef iswlower @@ -117,7 +117,7 @@ iswlower(wc) wint_t wc; { - return (__istype(wc, _CTYPE_L)); + return (__istype(wc, _CTYPE_L|_CTYPE_WID)); } #undef iswnumber @@ -125,7 +125,7 @@ iswnumber(wc) wint_t wc; { - return (__istype(wc, _CTYPE_D)); + return (__istype(wc, _CTYPE_D|_CTYPE_WID)); } #undef iswphonogram @@ -133,7 +133,7 @@ iswphonogram(wc) wint_t wc; { - return (__istype(wc, _CTYPE_Q)); + return (__istype(wc, _CTYPE_Q|_CTYPE_WID)); } #undef iswprint @@ -141,7 +141,7 @@ iswprint(wc) wint_t wc; { - return (__istype(wc, _CTYPE_R)); + return (__istype(wc, _CTYPE_R|_CTYPE_WID)); } #undef iswpunct @@ -149,7 +149,7 @@ iswpunct(wc) wint_t wc; { - return (__istype(wc, _CTYPE_P)); + return (__istype(wc, _CTYPE_P|_CTYPE_WID)); } #undef iswrune @@ -157,7 +157,7 @@ iswrune(wc) wint_t wc; { - return (__istype(wc, 0xFFFFFF00L)); + return (__istype(wc, 0xFFFFFF00L)); /* already have _CTYPE_WID */ } #undef iswspace @@ -165,7 +165,7 @@ iswspace(wc) wint_t wc; { - return (__istype(wc, _CTYPE_S)); + return (__istype(wc, _CTYPE_S|_CTYPE_WID)); } #undef iswspecial @@ -173,7 +173,7 @@ iswspecial(wc) wint_t wc; { - return (__istype(wc, _CTYPE_T)); + return (__istype(wc, _CTYPE_T|_CTYPE_WID)); } #undef iswupper @@ -181,7 +181,7 @@ iswupper(wc) wint_t wc; { - return (__istype(wc, _CTYPE_U)); + return (__istype(wc, _CTYPE_U|_CTYPE_WID)); } #undef iswxdigit --- mskanji.c.old 2007-09-19 09:02:56.000000000 +0400 +++ mskanji.c 2007-09-19 15:41:26.000000000 +0400 @@ -47,6 +47,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _MSKanji_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _MSKanji_mbsinit(const mbstate_t *); @@ -66,6 +68,7 @@ __mbsinit = _MSKanji_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sch_limit = 
256; return (0); } --- none.c.old 2007-09-19 08:56:40.000000000 +0400 +++ none.c 2007-09-19 15:51:44.000000000 +0400 @@ -58,6 +58,11 @@ static size_t _none_wcsnrtombs(char * __restrict, const wchar_t ** __restrict, size_t, size_t, mbstate_t * __restrict); +/* setup defaults */ + +int __mb_cur_max = 1; +int __mb_sch_limit = 256; /* Expected to be <= _CACHED_RUNES */ + int _none_init(_RuneLocale *rl) { @@ -69,6 +74,7 @@ __wcsnrtombs = _none_wcsnrtombs; _CurrentRuneLocale = rl; __mb_cur_max = 1; + __mb_sch_limit = 256; return(0); } @@ -176,7 +182,6 @@ /* setup defaults */ -int __mb_cur_max = 1; size_t (*__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict) = _none_mbrtowc; int (*__mbsinit)(const mbstate_t *) = _none_mbsinit; --- setrunelocale.c.old 2007-09-19 09:03:59.000000000 +0400 +++ setrunelocale.c 2007-09-19 15:41:26.000000000 +0400 @@ -45,6 +45,8 @@ #include "mblocal.h" #include "setlocale.h" +extern int __mb_sch_limit; + extern _RuneLocale *_Read_RuneMagi(FILE *); static int __setrunelocale(const char *); @@ -59,6 +61,7 @@ static char ctype_encoding[ENCODING_LEN + 1]; static _RuneLocale *CachedRuneLocale; static int Cached__mb_cur_max; + static int Cached__mb_sch_limit; static size_t (*Cached__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static size_t (*Cached__wcrtomb)(char * __restrict, wchar_t, @@ -85,6 +88,7 @@ strcmp(encoding, ctype_encoding) == 0) { _CurrentRuneLocale = CachedRuneLocale; __mb_cur_max = Cached__mb_cur_max; + __mb_sch_limit = Cached__mb_sch_limit; __mbrtowc = Cached__mbrtowc; __mbsinit = Cached__mbsinit; __mbsnrtowcs = Cached__mbsnrtowcs; @@ -147,6 +151,7 @@ } CachedRuneLocale = _CurrentRuneLocale; Cached__mb_cur_max = __mb_cur_max; + Cached__mb_sch_limit = __mb_sch_limit; Cached__mbrtowc = __mbrtowc; Cached__mbsinit = __mbsinit; Cached__mbsnrtowcs = __mbsnrtowcs; --- utf8.c.old 2007-09-19 08:18:40.000000000 +0400 +++ utf8.c 2007-09-19 15:55:35.000000000 +0400 
@@ -35,6 +35,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _UTF8_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _UTF8_mbsinit(const mbstate_t *); @@ -63,6 +65,7 @@ __wcsnrtombs = _UTF8_wcsnrtombs; _CurrentRuneLocale = rl; __mb_cur_max = 6; + __mb_sch_limit = 128; return (0); } --- wctype.h.old 2007-09-16 21:59:37.000000000 +0400 +++ wctype.h 2007-09-19 15:31:40.000000000 +0400 @@ -89,30 +89,30 @@ #endif __END_DECLS -#define iswalnum(wc) __istype((wc), _CTYPE_A|_CTYPE_D) -#define iswalpha(wc) __istype((wc), _CTYPE_A) -#define iswblank(wc) __istype((wc), _CTYPE_B) -#define iswcntrl(wc) __istype((wc), _CTYPE_C) -#define iswctype(wc, charclass) __istype((wc), (charclass)) +#define iswalnum(wc) __istype((wc), _CTYPE_A|_CTYPE_D|_CTYPE_WID) +#define iswalpha(wc) __istype((wc), _CTYPE_A|_CTYPE_WID) +#define iswblank(wc) __istype((wc), _CTYPE_B|_CTYPE_WID) +#define iswcntrl(wc) __istype((wc), _CTYPE_C|_CTYPE_WID) +#define iswctype(wc, charclass) __istype((wc), (charclass)|_CTYPE_WID) #define iswdigit(wc) __isctype((wc), _CTYPE_D) -#define iswgraph(wc) __istype((wc), _CTYPE_G) -#define iswlower(wc) __istype((wc), _CTYPE_L) -#define iswprint(wc) __istype((wc), _CTYPE_R) -#define iswpunct(wc) __istype((wc), _CTYPE_P) -#define iswspace(wc) __istype((wc), _CTYPE_S) -#define iswupper(wc) __istype((wc), _CTYPE_U) +#define iswgraph(wc) __istype((wc), _CTYPE_G|_CTYPE_WID) +#define iswlower(wc) __istype((wc), _CTYPE_L|_CTYPE_WID) +#define iswprint(wc) __istype((wc), _CTYPE_R|_CTYPE_WID) +#define iswpunct(wc) __istype((wc), _CTYPE_P|_CTYPE_WID) +#define iswspace(wc) __istype((wc), _CTYPE_S|_CTYPE_WID) +#define iswupper(wc) __istype((wc), _CTYPE_U|_CTYPE_WID) #define iswxdigit(wc) __isctype((wc), _CTYPE_X) #define towlower(wc) __tolower(wc) #define towupper(wc) __toupper(wc) #if __BSD_VISIBLE -#define iswascii(wc) (((wc) & ~0x7F) == 0) -#define iswhexnumber(wc) __istype((wc), _CTYPE_X) -#define 
iswideogram(wc) __istype((wc), _CTYPE_I) -#define iswnumber(wc) __istype((wc), _CTYPE_D) -#define iswphonogram(wc) __istype((wc), _CTYPE_Q) -#define iswrune(wc) __istype((wc), 0xFFFFFF00L) -#define iswspecial(wc) __istype((wc), _CTYPE_T) +#define iswascii(wc) ((wc) < 0x80) +#define iswhexnumber(wc) __istype((wc), _CTYPE_X|_CTYPE_WID) +#define iswideogram(wc) __istype((wc), _CTYPE_I|_CTYPE_WID) +#define iswnumber(wc) __istype((wc), _CTYPE_D|_CTYPE_WID) +#define iswphonogram(wc) __istype((wc), _CTYPE_Q|_CTYPE_WID) +#define iswrune(wc) __istype((wc), 0xFFFFFF00L) /* already have _CTYPE_WID */ +#define iswspecial(wc) __istype((wc), _CTYPE_T|_CTYPE_WID) #endif #endif /* _WCTYPE_H_ */ --0OAP2g/MAC+5xKAE-- From owner-freebsd-i18n@FreeBSD.ORG Fri Sep 21 02:41:11 2007 Return-Path: Delivered-To: i18n@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52BF916A500; Fri, 21 Sep 2007 02:41:11 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 6FB9713C45B; Fri, 21 Sep 2007 02:41:09 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8L2f8j8021365; Fri, 21 Sep 2007 06:41:08 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190342468; bh=FRYmNZ/Vd3kG7s1WD6jEj9RxrWqBOnBoecBcT/a Kphs=; l=14929; h=Date:From:To:Subject:Message-ID:Mail-Followup-To: References:MIME-Version:Content-Type:Content-Disposition: In-Reply-To:User-Agent; b=bpPRSYcXDhqaRo41WRfJduklP15VlvIw+zISKTTO ni/SMjRp8nZOkZgExUveg6ddDhGzSjDfTYXoQ6T3epuYqZZHzZDC6n6ct5kpuSo79aV b5gWWPkhnS3u2fKLU7uGIGu5jnz38Y5XEEWFG5Vlc+MGebHHoIihmsKgZFdeNIbk= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8L2f7rT021364; Fri, 21 Sep 2007 06:41:07 +0400 (MSD) (envelope-from 
ache) Date: Fri, 21 Sep 2007 06:41:07 +0400 From: Andrey Chernov To: Taku YAMAMOTO , Petr Hroudný , current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG Message-ID: <20070921024107.GA21223@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Taku YAMAMOTO , Petr Hroudný , current@FreeBSD.ORG, perky@FreeBSD.ORG, i18n@FreeBSD.ORG References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> <20070919022555.GA70617@nagual.pp.ru> <20070919023625.GA70891@nagual.pp.ru> <20070919051830.GA72429@nagual.pp.ru> <20070919121024.GA81606@nagual.pp.ru> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="rwEMma7ioTxnRzrJ" Content-Disposition: inline In-Reply-To: <20070919121024.GA81606@nagual.pp.ru> User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2007 02:41:11 -0000 --rwEMma7ioTxnRzrJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline

On Wed, Sep 19, 2007 at 04:10:24PM +0400, Andrey Chernov wrote:
> Improved version. Introduce a general __mb_sch_limit parameter instead, for
> all locales, specifying the upper limit of the single-char range. It also
> allows fixing the bug when ctype(3) functions are called with arg > 0xFF
> in wide character locales, and simplifies all checks. New patch is
> attached. Here is the updated rationale again:

Next improved version, now optimized for speed. I decided to remove the extra
_CTYPE_WID flag and duplicate the needed functions instead.
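To make the "duplicate the functions" idea concrete, here is a minimal sketch, not the patch itself. Every `sketch_` name, the table contents and the flag value are hypothetical stand-ins for the libc internals named in the patch; the real wide path, as far as I understand it, also handles values beyond the cached table separately, which is omitted here. The only point shown is that the single-byte variant stops at a per-locale limit while the wide variant keeps using the whole cached table.

```c
#include <stddef.h>

/* Hypothetical stand-ins for libc internals; values are illustrative only. */
#define SKETCH_CACHED_RUNES 256
#define SKETCH_CTYPE_S      0x1UL	/* "space" class bit */

static unsigned long sketch_runetype[SKETCH_CACHED_RUNES];
static int sketch_mb_sch_limit = 256;	/* expected to be <= SKETCH_CACHED_RUNES */

/* Single-byte variant: anything at or above the locale limit has no class. */
static int
sketch_sbistype(int c, unsigned long f)
{
	if (c < 0 || c >= sketch_mb_sch_limit)
		return (0);
	return ((sketch_runetype[c] & f) != 0);
}

/* Wide variant: still consults the whole cached table. */
static int
sketch_istype(int c, unsigned long f)
{
	if (c < 0 || c >= SKETCH_CACHED_RUNES)
		return (0);	/* simplification: the real code looks further */
	return ((sketch_runetype[c] & f) != 0);
}

/* Pretend a UTF-8 locale was selected: U+00A0 is a space as a wide char. */
static void
sketch_setup_utf8(void)
{
	sketch_runetype[' '] = SKETCH_CTYPE_S;
	sketch_runetype[0xA0] = SKETCH_CTYPE_S;
	sketch_mb_sch_limit = 128;
}
```

With the limit at 128, the byte 0xA0 is no longer reported as a space by the single-byte check, while the wide check on the code point 0xA0 is unaffected.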
-- http://ache.pp.ru/ --rwEMma7ioTxnRzrJ Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="ctype.patch" --- Symbol.map.old 2007-09-19 22:37:21.000000000 +0400 +++ Symbol.map 2007-09-21 06:31:56.000000000 +0400 @@ -60,12 +60,17 @@ nextwctype; nl_langinfo; __maskrune; + __sbmaskrune; __istype; + __sbistype; __isctype; __toupper; + __sbtoupper; __tolower; + __sbtolower; __wcwidth; __mb_cur_max; + __mb_sch_limit; rpmatch; ___runetype; setlocale; --- _ctype.h.old 2007-09-16 21:13:59.000000000 +0400 +++ _ctype.h 2007-09-21 06:21:59.000000000 +0400 @@ -87,6 +87,8 @@ #define __inline #endif +extern int __mb_sch_limit; + /* * Use inline functions if we are allowed to and the compiler supports them. */ @@ -103,15 +105,28 @@ } static __inline int +__sbmaskrune(__ct_rune_t _c, unsigned long _f) +{ + return (_c < 0 || _c >= __mb_sch_limit) ? 0 : + _CurrentRuneLocale->__runetype[_c] & _f; +} + +static __inline int __istype(__ct_rune_t _c, unsigned long _f) { return (!!__maskrune(_c, _f)); } static __inline int +__sbistype(__ct_rune_t _c, unsigned long _f) +{ + return (!!__sbmaskrune(_c, _f)); +} + +static __inline int __isctype(__ct_rune_t _c, unsigned long _f) { - return (_c < 0 || _c >= _CACHED_RUNES) ? 0 : + return (_c < 0 || _c >= __mb_sch_limit) ? 0 : !!(_DefaultRuneLocale.__runetype[_c] & _f); } @@ -123,12 +138,26 @@ } static __inline __ct_rune_t +__sbtoupper(__ct_rune_t _c) +{ + return (_c < 0 || _c >= __mb_sch_limit) ? _c : + _CurrentRuneLocale->__mapupper[_c]; +} + +static __inline __ct_rune_t __tolower(__ct_rune_t _c) { return (_c < 0 || _c >= _CACHED_RUNES) ? ___tolower(_c) : _CurrentRuneLocale->__maplower[_c]; } +static __inline __ct_rune_t +__sbtolower(__ct_rune_t _c) +{ + return (_c < 0 || _c >= __mb_sch_limit) ?
_c : + _CurrentRuneLocale->__maplower[_c]; +} + static __inline int __wcwidth(__ct_rune_t _c) { @@ -146,10 +175,14 @@ __BEGIN_DECLS int __maskrune(__ct_rune_t, unsigned long); +int __sbmaskrune(__ct_rune_t, unsigned long); int __istype(__ct_rune_t, unsigned long); +int __sbistype(__ct_rune_t, unsigned long); int __isctype(__ct_rune_t, unsigned long); __ct_rune_t __toupper(__ct_rune_t); +__ct_rune_t __sbtoupper(__ct_rune_t); __ct_rune_t __tolower(__ct_rune_t); +__ct_rune_t __sbtolower(__ct_rune_t); int __wcwidth(__ct_rune_t); __END_DECLS #endif /* using inlines */ --- big5.c.old 2007-09-19 08:48:55.000000000 +0400 +++ big5.c 2007-09-19 15:41:26.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _BIG5_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _BIG5_mbsinit(const mbstate_t *); @@ -68,6 +70,7 @@ __mbsinit = _BIG5_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sch_limit = 256; return (0); } --- ctype.h.old 2007-09-16 22:03:55.000000000 +0400 +++ ctype.h 2007-09-21 06:26:26.000000000 +0400 @@ -86,19 +86,19 @@ #endif __END_DECLS -#define isalnum(c) __istype((c), _CTYPE_A|_CTYPE_D) -#define isalpha(c) __istype((c), _CTYPE_A) -#define iscntrl(c) __istype((c), _CTYPE_C) +#define isalnum(c) __sbistype((c), _CTYPE_A|_CTYPE_D) +#define isalpha(c) __sbistype((c), _CTYPE_A) +#define iscntrl(c) __sbistype((c), _CTYPE_C) #define isdigit(c) __isctype((c), _CTYPE_D) /* ANSI -- locale independent */ -#define isgraph(c) __istype((c), _CTYPE_G) -#define islower(c) __istype((c), _CTYPE_L) -#define isprint(c) __istype((c), _CTYPE_R) -#define ispunct(c) __istype((c), _CTYPE_P) -#define isspace(c) __istype((c), _CTYPE_S) -#define isupper(c) __istype((c), _CTYPE_U) +#define isgraph(c) __sbistype((c), _CTYPE_G) +#define islower(c) __sbistype((c), _CTYPE_L) +#define isprint(c) __sbistype((c), _CTYPE_R) +#define ispunct(c) __sbistype((c), _CTYPE_P) +#define isspace(c) 
__sbistype((c), _CTYPE_S) +#define isupper(c) __sbistype((c), _CTYPE_U) #define isxdigit(c) __isctype((c), _CTYPE_X) /* ANSI -- locale independent */ -#define tolower(c) __tolower(c) -#define toupper(c) __toupper(c) +#define tolower(c) __sbtolower(c) +#define toupper(c) __sbtoupper(c) #if __XSI_VISIBLE /* @@ -112,24 +112,24 @@ * * XXX isascii() and toascii() should similarly be undocumented. */ -#define _tolower(c) __tolower(c) -#define _toupper(c) __toupper(c) +#define _tolower(c) __sbtolower(c) +#define _toupper(c) __sbtoupper(c) #define isascii(c) (((c) & ~0x7F) == 0) #define toascii(c) ((c) & 0x7F) #endif #if __ISO_C_VISIBLE >= 1999 -#define isblank(c) __istype((c), _CTYPE_B) +#define isblank(c) __sbistype((c), _CTYPE_B) #endif #if __BSD_VISIBLE -#define digittoint(c) __maskrune((c), 0xFF) -#define ishexnumber(c) __istype((c), _CTYPE_X) -#define isideogram(c) __istype((c), _CTYPE_I) -#define isnumber(c) __istype((c), _CTYPE_D) -#define isphonogram(c) __istype((c), _CTYPE_Q) -#define isrune(c) __istype((c), 0xFFFFFF00L) -#define isspecial(c) __istype((c), _CTYPE_T) +#define digittoint(c) __sbmaskrune((c), 0xFF) +#define ishexnumber(c) __sbistype((c), _CTYPE_X) +#define isideogram(c) __sbistype((c), _CTYPE_I) +#define isnumber(c) __sbistype((c), _CTYPE_D) +#define isphonogram(c) __sbistype((c), _CTYPE_Q) +#define isrune(c) __sbistype((c), 0xFFFFFF00L) +#define isspecial(c) __sbistype((c), _CTYPE_T) #endif #endif /* !_CTYPE_H_ */ --- euc.c.old 2007-09-19 08:50:57.000000000 +0400 +++ euc.c 2007-09-19 15:41:26.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _EUC_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _EUC_mbsinit(const mbstate_t *); @@ -116,6 +118,7 @@ __mbrtowc = _EUC_mbrtowc; __wcrtomb = _EUC_wcrtomb; __mbsinit = _EUC_mbsinit; + __mb_sch_limit = 256; return (0); } --- gb18030.c.old 2007-09-19 08:59:01.000000000 +0400 +++ gb18030.c 2007-09-19 
15:41:26.000000000 +0400 @@ -39,6 +39,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _GB18030_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GB18030_mbsinit(const mbstate_t *); @@ -59,6 +61,7 @@ __mbsinit = _GB18030_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 4; + __mb_sch_limit = 256; return (0); } --- gb2312.c.old 2007-09-19 09:00:35.000000000 +0400 +++ gb2312.c 2007-09-19 15:41:26.000000000 +0400 @@ -35,6 +35,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _GB2312_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GB2312_mbsinit(const mbstate_t *); @@ -55,6 +57,7 @@ __wcrtomb = _GB2312_wcrtomb; __mbsinit = _GB2312_mbsinit; __mb_cur_max = 2; + __mb_sch_limit = 256; return (0); } --- gbk.c.old 2007-09-19 09:01:33.000000000 +0400 +++ gbk.c 2007-09-19 15:41:26.000000000 +0400 @@ -42,6 +42,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _GBK_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _GBK_mbsinit(const mbstate_t *); @@ -61,6 +63,7 @@ __mbsinit = _GBK_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sch_limit = 256; return (0); } --- isctype.c.old 2007-09-16 22:31:26.000000000 +0400 +++ isctype.c 2007-09-21 06:28:30.000000000 +0400 @@ -48,7 +48,7 @@ digittoint(c) int c; { - return (__maskrune(c, 0xFF)); + return (__sbmaskrune(c, 0xFF)); } #undef isalnum @@ -56,7 +56,7 @@ isalnum(c) int c; { - return (__istype(c, _CTYPE_A|_CTYPE_D)); + return (__sbistype(c, _CTYPE_A|_CTYPE_D)); } #undef isalpha @@ -64,7 +64,7 @@ isalpha(c) int c; { - return (__istype(c, _CTYPE_A)); + return (__sbistype(c, _CTYPE_A)); } #undef isascii @@ -80,7 +80,7 @@ isblank(c) int c; { - return (__istype(c, _CTYPE_B)); + return (__sbistype(c, _CTYPE_B)); } #undef iscntrl @@ -88,7 +88,7 @@ iscntrl(c) int c; { - return (__istype(c, 
_CTYPE_C)); + return (__sbistype(c, _CTYPE_C)); } #undef isdigit @@ -104,7 +104,7 @@ isgraph(c) int c; { - return (__istype(c, _CTYPE_G)); + return (__sbistype(c, _CTYPE_G)); } #undef ishexnumber @@ -112,7 +112,7 @@ ishexnumber(c) int c; { - return (__istype(c, _CTYPE_X)); + return (__sbistype(c, _CTYPE_X)); } #undef isideogram @@ -120,7 +120,7 @@ isideogram(c) int c; { - return (__istype(c, _CTYPE_I)); + return (__sbistype(c, _CTYPE_I)); } #undef islower @@ -128,7 +128,7 @@ islower(c) int c; { - return (__istype(c, _CTYPE_L)); + return (__sbistype(c, _CTYPE_L)); } #undef isnumber @@ -136,7 +136,7 @@ isnumber(c) int c; { - return (__istype(c, _CTYPE_D)); + return (__sbistype(c, _CTYPE_D)); } #undef isphonogram @@ -144,7 +144,7 @@ isphonogram(c) int c; { - return (__istype(c, _CTYPE_Q)); + return (__sbistype(c, _CTYPE_Q)); } #undef isprint @@ -152,7 +152,7 @@ isprint(c) int c; { - return (__istype(c, _CTYPE_R)); + return (__sbistype(c, _CTYPE_R)); } #undef ispunct @@ -160,7 +160,7 @@ ispunct(c) int c; { - return (__istype(c, _CTYPE_P)); + return (__sbistype(c, _CTYPE_P)); } #undef isrune @@ -168,7 +168,7 @@ isrune(c) int c; { - return (__istype(c, 0xFFFFFF00L)); + return (__sbistype(c, 0xFFFFFF00L)); } #undef isspace @@ -176,7 +176,7 @@ isspace(c) int c; { - return (__istype(c, _CTYPE_S)); + return (__sbistype(c, _CTYPE_S)); } #undef isspecial @@ -184,7 +184,7 @@ isspecial(c) int c; { - return (__istype(c, _CTYPE_T)); + return (__sbistype(c, _CTYPE_T)); } #undef isupper @@ -192,7 +192,7 @@ isupper(c) int c; { - return (__istype(c, _CTYPE_U)); + return (__sbistype(c, _CTYPE_U)); } #undef isxdigit @@ -216,7 +216,7 @@ tolower(c) int c; { - return (__tolower(c)); + return (__sbtolower(c)); } #undef toupper @@ -224,6 +224,6 @@ toupper(c) int c; { - return (__toupper(c)); + return (__sbtoupper(c)); } --- iswctype.c.old 2007-09-16 22:31:30.000000000 +0400 +++ iswctype.c 2007-09-21 06:29:59.000000000 +0400 @@ -61,7 +61,7 @@ iswascii(wc) wint_t wc; { - return ((wc & ~0x7F) 
== 0); + return (wc < 0x80); } #undef iswblank --- mskanji.c.old 2007-09-19 09:02:56.000000000 +0400 +++ mskanji.c 2007-09-19 15:41:26.000000000 +0400 @@ -47,6 +47,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _MSKanji_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _MSKanji_mbsinit(const mbstate_t *); @@ -66,6 +68,7 @@ __mbsinit = _MSKanji_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sch_limit = 256; return (0); } --- none.c.old 2007-09-19 08:56:40.000000000 +0400 +++ none.c 2007-09-19 21:16:11.000000000 +0400 @@ -58,6 +58,11 @@ static size_t _none_wcsnrtombs(char * __restrict, const wchar_t ** __restrict, size_t, size_t, mbstate_t * __restrict); +/* setup defaults */ + +int __mb_cur_max = 1; +int __mb_sch_limit = 256; /* Expected to be <= _CACHED_RUNES */ + int _none_init(_RuneLocale *rl) { @@ -69,6 +74,7 @@ __wcsnrtombs = _none_wcsnrtombs; _CurrentRuneLocale = rl; __mb_cur_max = 1; + __mb_sch_limit = 256; return(0); } @@ -176,7 +182,6 @@ /* setup defaults */ -int __mb_cur_max = 1; size_t (*__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict) = _none_mbrtowc; int (*__mbsinit)(const mbstate_t *) = _none_mbsinit; --- setrunelocale.c.old 2007-09-19 09:03:59.000000000 +0400 +++ setrunelocale.c 2007-09-19 15:41:26.000000000 +0400 @@ -45,6 +45,8 @@ #include "mblocal.h" #include "setlocale.h" +extern int __mb_sch_limit; + extern _RuneLocale *_Read_RuneMagi(FILE *); static int __setrunelocale(const char *); @@ -59,6 +61,7 @@ static char ctype_encoding[ENCODING_LEN + 1]; static _RuneLocale *CachedRuneLocale; static int Cached__mb_cur_max; + static int Cached__mb_sch_limit; static size_t (*Cached__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static size_t (*Cached__wcrtomb)(char * __restrict, wchar_t, @@ -85,6 +88,7 @@ strcmp(encoding, ctype_encoding) == 0) { _CurrentRuneLocale = CachedRuneLocale; 
__mb_cur_max = Cached__mb_cur_max; + __mb_sch_limit = Cached__mb_sch_limit; __mbrtowc = Cached__mbrtowc; __mbsinit = Cached__mbsinit; __mbsnrtowcs = Cached__mbsnrtowcs; @@ -147,6 +151,7 @@ } CachedRuneLocale = _CurrentRuneLocale; Cached__mb_cur_max = __mb_cur_max; + Cached__mb_sch_limit = __mb_sch_limit; Cached__mbrtowc = __mbrtowc; Cached__mbsinit = __mbsinit; Cached__mbsnrtowcs = __mbsnrtowcs; --- utf8.c.old 2007-09-19 08:18:40.000000000 +0400 +++ utf8.c 2007-09-19 15:55:35.000000000 +0400 @@ -35,6 +35,8 @@ #include #include "mblocal.h" +extern int __mb_sch_limit; + static size_t _UTF8_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _UTF8_mbsinit(const mbstate_t *); @@ -63,6 +65,7 @@ __wcsnrtombs = _UTF8_wcsnrtombs; _CurrentRuneLocale = rl; __mb_cur_max = 6; + __mb_sch_limit = 128; return (0); } --- wctype.h.old 2007-09-16 21:59:37.000000000 +0400 +++ wctype.h 2007-09-21 06:08:40.000000000 +0400 @@ -106,7 +106,7 @@ #define towupper(wc) __toupper(wc) #if __BSD_VISIBLE -#define iswascii(wc) (((wc) & ~0x7F) == 0) +#define iswascii(wc) ((wc) < 0x80) #define iswhexnumber(wc) __istype((wc), _CTYPE_X) #define iswideogram(wc) __istype((wc), _CTYPE_I) #define iswnumber(wc) __istype((wc), _CTYPE_D) --rwEMma7ioTxnRzrJ-- From owner-freebsd-i18n@FreeBSD.ORG Fri Sep 21 07:04:46 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E7CE16A41A for ; Fri, 21 Sep 2007 07:04:46 +0000 (UTC) (envelope-from petr.hroudny@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.186]) by mx1.freebsd.org (Postfix) with ESMTP id DD54813C467 for ; Fri, 21 Sep 2007 07:04:45 +0000 (UTC) (envelope-from petr.hroudny@gmail.com) Received: by nf-out-0910.google.com with SMTP id b2so644236nfb for ; Fri, 21 Sep 2007 00:04:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=beta; 
h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; bh=qVuhfMNZ8bSrnDuI4F+m8Cm5GSUHhg/DW31XlJP/1/I=; b=lBwQczuPo5wz0ThUQHNmjpEHI1GN7vUJ/yHe9ssWFUraYyhEHBgsUvmOaBg/tQ8Mvr1XDm9BZwUkpvb0CWMR4NSLwZJwzDBPvFbNYvPV9/BbzamHeKJ2UnxoQhtudglv3KjYIWSEzzLGXas+Z0p73GVKkygdYmORfKaLf0h1UNk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=opKZmOSxaAYAENdNXIhPn9Wde/0ZXaJIODflaGXEq50M75Xkw8QD263lonbLuLwA8dpz2P2khnXKOxea9qd+mfLyrbqbzUCrcntgm0erLL+TFy5W+3trrZiqjYmpQ6L7t9v2tsMqO25h3jZT0vBTlZT/xSPWpBOBPNZH8YJa3qY= Received: by 10.78.170.6 with SMTP id s6mr1762014hue.1190358284294; Fri, 21 Sep 2007 00:04:44 -0700 (PDT) Received: by 10.78.100.2 with HTTP; Fri, 21 Sep 2007 00:04:44 -0700 (PDT) Message-ID: Date: Fri, 21 Sep 2007 09:04:44 +0200 From: "=?UTF-8?Q?Petr_Hroudn=C3=BD?=" To: "Andrey Chernov" , "Taku YAMAMOTO" , "Petr Hroudn??" 
, current@freebsd.org, perky@freebsd.org, i18n@freebsd.org In-Reply-To: <20070921024107.GA21223@nagual.pp.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> <20070919022555.GA70617@nagual.pp.ru> <20070919023625.GA70891@nagual.pp.ru> <20070919051830.GA72429@nagual.pp.ru> <20070919121024.GA81606@nagual.pp.ru> <20070921024107.GA21223@nagual.pp.ru> Cc: Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2007 07:04:46 -0000

2007/9/21, Andrey Chernov :
> On Wed, Sep 19, 2007 at 04:10:24PM +0400, Andrey Chernov wrote:
> > Improved version. Introduce a general __mb_sch_limit parameter instead, for
> > all locales, specifying the upper limit of the single-char range. It also
> > allows fixing the bug when ctype(3) functions are called with arg > 0xFF
> > in wide character locales, and simplifies all checks. New patch is
> > attached. Here is the updated rationale again:
>
> Next improved version, now optimized for speed. I decided to remove the extra
> _CTYPE_WID flag and duplicate the needed functions instead.

I believe your patch needs some adjustments for CJK charsets. You are setting
__mb_sch_limit to 256 for all multibyte locales except UTF-8, but I believe it
should also be 128 for Big5, GB18030 and GBK.
Regards, Petr From owner-freebsd-i18n@FreeBSD.ORG Fri Sep 21 18:02:30 2007 Return-Path: Delivered-To: i18n@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5E27A16A418; Fri, 21 Sep 2007 18:02:30 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (nagual.pp.ru [194.87.13.69]) by mx1.freebsd.org (Postfix) with ESMTP id 2AA4513C459; Fri, 21 Sep 2007 18:02:28 +0000 (UTC) (envelope-from ache@nagual.pp.ru) Received: from nagual.pp.ru (ache@localhost [127.0.0.1]) by nagual.pp.ru (8.14.1/8.14.1) with ESMTP id l8LI2QQD038792; Fri, 21 Sep 2007 22:02:26 +0400 (MSD) (envelope-from ache@nagual.pp.ru) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nagual.pp.ru; s=default; t=1190397746; bh=5Od60kz9Fs4Njw3ZHpOwuCDDQCmJgFuonPN7Yva 51CE=; l=14788; h=Date:From:To:Cc:Subject:Message-ID: Mail-Followup-To:References:MIME-Version:Content-Type: Content-Disposition:In-Reply-To:User-Agent; b=EQ0+MQR4lC6Hg16/6cUh TVqM9xuEd2/aq9bgQ4EZPk650E5JO52EqHbXlpnw/UiuxkJdzyn2/TSYE4eIvWO/UUn AUY7hSjBKc0KLaJpvgrmurV4DIcysb/nyNQHkZ/JlmVQ+NOSO5xljiKDU5UyyUQsKiE m33OiA77tbfI9ONv8= Received: (from ache@localhost) by nagual.pp.ru (8.14.1/8.14.1/Submit) id l8LI2O5L038791; Fri, 21 Sep 2007 22:02:24 +0400 (MSD) (envelope-from ache) Date: Fri, 21 Sep 2007 22:02:23 +0400 From: Andrey Chernov To: Petr Hroudn?? Message-ID: <20070921180223.GA38675@nagual.pp.ru> Mail-Followup-To: Andrey Chernov , Petr Hroudn?? 
, Taku YAMAMOTO , current@freebsd.org, perky@freebsd.org, i18n@freebsd.org References: <20070917092130.GA24424@nagual.pp.ru> <20070918020100.d43beb0b.taku@tackymt.homeip.net> <20070917171633.GA31179@nagual.pp.ru> <20070919111207.f37653fc.taku@tackymt.homeip.net> <20070919022555.GA70617@nagual.pp.ru> <20070919023625.GA70891@nagual.pp.ru> <20070919051830.GA72429@nagual.pp.ru> <20070919121024.GA81606@nagual.pp.ru> <20070921024107.GA21223@nagual.pp.ru> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="LQksG6bCIzRHxTLp" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.16 (2007-06-09) Cc: Taku YAMAMOTO , i18n@freebsd.org, current@freebsd.org, perky@freebsd.org Subject: Re: Ctype patch for review X-BeenThere: freebsd-i18n@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: FreeBSD Internationalization Effort List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2007 18:02:30 -0000 --LQksG6bCIzRHxTLp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline

On Fri, Sep 21, 2007 at 09:04:44AM +0200, Petr Hroudný wrote:
> I believe your patch needs some adjustments for CJK charsets. You are setting
> __mb_sch_limit to 256 for all multibyte locales except UTF-8, but I believe it
> should be 128 also for Big5, GB18030, GBK.

For GB2312 too. Thanks for pointing this out; here is the adjusted patch. Also,
__mb_sch_limit is renamed to __mb_sb_limit to better match the function prefix.
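The per-encoding limits agreed on in this exchange can be summarized as a small lookup. This is only an illustrative sketch: in the patch each encoding's init routine assigns the limit directly, and the helper name `sketch_sb_limit` and the encoding strings used as keys are assumptions made for the example.

```c
#include <string.h>

/*
 * Hypothetical helper: return the upper bound of the single-byte range for
 * an encoding name. Encodings listed here only treat bytes below 0x80 as
 * complete characters; everything else keeps the full 0..255 range.
 */
static int
sketch_sb_limit(const char *encoding)
{
	static const char *const ascii_only[] = {
		"UTF-8", "BIG5", "GB18030", "GBK", "GB2312"
	};
	size_t i;

	for (i = 0; i < sizeof(ascii_only) / sizeof(ascii_only[0]); i++) {
		if (strcmp(encoding, ascii_only[i]) == 0)
			return (128);
	}
	return (256);	/* single-byte locales, and EUC in this patch */
}
```

A table like this is easier to review than scattered assignments, but it would add a string comparison at locale setup time, so setting the value in each init routine as the patch does is probably the simpler choice.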
-- http://ache.pp.ru/ --LQksG6bCIzRHxTLp Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="ctype.patch" --- Symbol.map.old 2007-09-19 22:37:21.000000000 +0400 +++ Symbol.map 2007-09-21 21:52:23.000000000 +0400 @@ -60,12 +60,17 @@ nextwctype; nl_langinfo; __maskrune; + __sbmaskrune; __istype; + __sbistype; __isctype; __toupper; + __sbtoupper; __tolower; + __sbtolower; __wcwidth; __mb_cur_max; + __mb_sb_limit; rpmatch; ___runetype; setlocale; --- _ctype.h.old 2007-09-16 21:13:59.000000000 +0400 +++ _ctype.h 2007-09-21 21:44:31.000000000 +0400 @@ -87,6 +87,8 @@ #define __inline #endif +extern int __mb_sb_limit; + /* * Use inline functions if we are allowed to and the compiler supports them. */ @@ -103,15 +105,28 @@ } static __inline int +__sbmaskrune(__ct_rune_t _c, unsigned long _f) +{ + return (_c < 0 || _c >= __mb_sb_limit) ? 0 : + _CurrentRuneLocale->__runetype[_c] & _f; +} + +static __inline int __istype(__ct_rune_t _c, unsigned long _f) { return (!!__maskrune(_c, _f)); } static __inline int +__sbistype(__ct_rune_t _c, unsigned long _f) +{ + return (!!__sbmaskrune(_c, _f)); +} + +static __inline int __isctype(__ct_rune_t _c, unsigned long _f) { - return (_c < 0 || _c >= _CACHED_RUNES) ? 0 : + return (_c < 0 || _c >= __mb_sb_limit) ? 0 : !!(_DefaultRuneLocale.__runetype[_c] & _f); } @@ -123,12 +138,26 @@ } static __inline __ct_rune_t +__sbtoupper(__ct_rune_t _c) +{ + return (_c < 0 || _c >= __mb_sb_limit) ? _c : + _CurrentRuneLocale->__mapupper[_c]; +} + +static __inline __ct_rune_t __tolower(__ct_rune_t _c) { return (_c < 0 || _c >= _CACHED_RUNES) ? ___tolower(_c) : _CurrentRuneLocale->__maplower[_c]; } +static __inline __ct_rune_t +__sbtolower(__ct_rune_t _c) +{ + return (_c < 0 || _c >= __mb_sb_limit) ?
_c : + _CurrentRuneLocale->__maplower[_c]; +} + static __inline int __wcwidth(__ct_rune_t _c) { @@ -146,10 +175,14 @@ __BEGIN_DECLS int __maskrune(__ct_rune_t, unsigned long); +int __sbmaskrune(__ct_rune_t, unsigned long); int __istype(__ct_rune_t, unsigned long); +int __sbistype(__ct_rune_t, unsigned long); int __isctype(__ct_rune_t, unsigned long); __ct_rune_t __toupper(__ct_rune_t); +__ct_rune_t __sbtoupper(__ct_rune_t); __ct_rune_t __tolower(__ct_rune_t); +__ct_rune_t __sbtolower(__ct_rune_t); int __wcwidth(__ct_rune_t); __END_DECLS #endif /* using inlines */ --- big5.c.old 2007-09-19 08:48:55.000000000 +0400 +++ big5.c 2007-09-21 21:44:31.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_sb_limit; + static size_t _BIG5_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _BIG5_mbsinit(const mbstate_t *); @@ -68,6 +70,7 @@ __mbsinit = _BIG5_mbsinit; _CurrentRuneLocale = rl; __mb_cur_max = 2; + __mb_sb_limit = 128; return (0); } --- ctype.h.old 2007-09-16 22:03:55.000000000 +0400 +++ ctype.h 2007-09-21 06:26:26.000000000 +0400 @@ -86,19 +86,19 @@ #endif __END_DECLS -#define isalnum(c) __istype((c), _CTYPE_A|_CTYPE_D) -#define isalpha(c) __istype((c), _CTYPE_A) -#define iscntrl(c) __istype((c), _CTYPE_C) +#define isalnum(c) __sbistype((c), _CTYPE_A|_CTYPE_D) +#define isalpha(c) __sbistype((c), _CTYPE_A) +#define iscntrl(c) __sbistype((c), _CTYPE_C) #define isdigit(c) __isctype((c), _CTYPE_D) /* ANSI -- locale independent */ -#define isgraph(c) __istype((c), _CTYPE_G) -#define islower(c) __istype((c), _CTYPE_L) -#define isprint(c) __istype((c), _CTYPE_R) -#define ispunct(c) __istype((c), _CTYPE_P) -#define isspace(c) __istype((c), _CTYPE_S) -#define isupper(c) __istype((c), _CTYPE_U) +#define isgraph(c) __sbistype((c), _CTYPE_G) +#define islower(c) __sbistype((c), _CTYPE_L) +#define isprint(c) __sbistype((c), _CTYPE_R) +#define ispunct(c) __sbistype((c), _CTYPE_P) +#define isspace(c) 
__sbistype((c), _CTYPE_S) +#define isupper(c) __sbistype((c), _CTYPE_U) #define isxdigit(c) __isctype((c), _CTYPE_X) /* ANSI -- locale independent */ -#define tolower(c) __tolower(c) -#define toupper(c) __toupper(c) +#define tolower(c) __sbtolower(c) +#define toupper(c) __sbtoupper(c) #if __XSI_VISIBLE /* @@ -112,24 +112,24 @@ * * XXX isascii() and toascii() should similarly be undocumented. */ -#define _tolower(c) __tolower(c) -#define _toupper(c) __toupper(c) +#define _tolower(c) __sbtolower(c) +#define _toupper(c) __sbtoupper(c) #define isascii(c) (((c) & ~0x7F) == 0) #define toascii(c) ((c) & 0x7F) #endif #if __ISO_C_VISIBLE >= 1999 -#define isblank(c) __istype((c), _CTYPE_B) +#define isblank(c) __sbistype((c), _CTYPE_B) #endif #if __BSD_VISIBLE -#define digittoint(c) __maskrune((c), 0xFF) -#define ishexnumber(c) __istype((c), _CTYPE_X) -#define isideogram(c) __istype((c), _CTYPE_I) -#define isnumber(c) __istype((c), _CTYPE_D) -#define isphonogram(c) __istype((c), _CTYPE_Q) -#define isrune(c) __istype((c), 0xFFFFFF00L) -#define isspecial(c) __istype((c), _CTYPE_T) +#define digittoint(c) __sbmaskrune((c), 0xFF) +#define ishexnumber(c) __sbistype((c), _CTYPE_X) +#define isideogram(c) __sbistype((c), _CTYPE_I) +#define isnumber(c) __sbistype((c), _CTYPE_D) +#define isphonogram(c) __sbistype((c), _CTYPE_Q) +#define isrune(c) __sbistype((c), 0xFFFFFF00L) +#define isspecial(c) __sbistype((c), _CTYPE_T) #endif #endif /* !_CTYPE_H_ */ --- euc.c.old 2007-09-19 08:50:57.000000000 +0400 +++ euc.c 2007-09-21 21:44:31.000000000 +0400 @@ -49,6 +49,8 @@ #include #include "mblocal.h" +extern int __mb_sb_limit; + static size_t _EUC_mbrtowc(wchar_t * __restrict, const char * __restrict, size_t, mbstate_t * __restrict); static int _EUC_mbsinit(const mbstate_t *); @@ -116,6 +118,7 @@ __mbrtowc = _EUC_mbrtowc; __wcrtomb = _EUC_wcrtomb; __mbsinit = _EUC_mbsinit; + __mb_sb_limit = 256; return (0); } --- gb18030.c.old 2007-09-19 08:59:01.000000000 +0400 +++ gb18030.c 2007-09-21 
21:44:31.000000000 +0400
@@ -39,6 +39,8 @@
 #include
 #include "mblocal.h"
 
+extern int __mb_sb_limit;
+
 static size_t _GB18030_mbrtowc(wchar_t * __restrict, const char * __restrict,
     size_t, mbstate_t * __restrict);
 static int _GB18030_mbsinit(const mbstate_t *);
@@ -59,6 +61,7 @@
 	__mbsinit = _GB18030_mbsinit;
 	_CurrentRuneLocale = rl;
 	__mb_cur_max = 4;
+	__mb_sb_limit = 128;
 	return (0);
 }
 
--- gb2312.c.old	2007-09-19 09:00:35.000000000 +0400
+++ gb2312.c	2007-09-21 21:44:31.000000000 +0400
@@ -35,6 +35,8 @@
 #include
 #include "mblocal.h"
 
+extern int __mb_sb_limit;
+
 static size_t _GB2312_mbrtowc(wchar_t * __restrict, const char * __restrict,
     size_t, mbstate_t * __restrict);
 static int _GB2312_mbsinit(const mbstate_t *);
@@ -55,6 +57,7 @@
 	__wcrtomb = _GB2312_wcrtomb;
 	__mbsinit = _GB2312_mbsinit;
 	__mb_cur_max = 2;
+	__mb_sb_limit = 128;
 	return (0);
 }
 
--- gbk.c.old	2007-09-19 09:01:33.000000000 +0400
+++ gbk.c	2007-09-21 21:44:31.000000000 +0400
@@ -42,6 +42,8 @@
 #include
 #include "mblocal.h"
 
+extern int __mb_sb_limit;
+
 static size_t _GBK_mbrtowc(wchar_t * __restrict, const char * __restrict,
     size_t, mbstate_t * __restrict);
 static int _GBK_mbsinit(const mbstate_t *);
@@ -61,6 +63,7 @@
 	__mbsinit = _GBK_mbsinit;
 	_CurrentRuneLocale = rl;
 	__mb_cur_max = 2;
+	__mb_sb_limit = 128;
 	return (0);
 }
 
--- isctype.c.old	2007-09-16 22:31:26.000000000 +0400
+++ isctype.c	2007-09-21 06:28:30.000000000 +0400
@@ -48,7 +48,7 @@
 digittoint(c)
 	int c;
 {
-	return (__maskrune(c, 0xFF));
+	return (__sbmaskrune(c, 0xFF));
 }
 
 #undef isalnum
@@ -56,7 +56,7 @@
 isalnum(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_A|_CTYPE_D));
+	return (__sbistype(c, _CTYPE_A|_CTYPE_D));
 }
 
 #undef isalpha
@@ -64,7 +64,7 @@
 isalpha(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_A));
+	return (__sbistype(c, _CTYPE_A));
 }
 
 #undef isascii
@@ -80,7 +80,7 @@
 isblank(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_B));
+	return (__sbistype(c, _CTYPE_B));
 }
 
 #undef iscntrl
@@ -88,7 +88,7 @@
 iscntrl(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_C));
+	return (__sbistype(c, _CTYPE_C));
 }
 
 #undef isdigit
@@ -104,7 +104,7 @@
 isgraph(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_G));
+	return (__sbistype(c, _CTYPE_G));
 }
 
 #undef ishexnumber
@@ -112,7 +112,7 @@
 ishexnumber(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_X));
+	return (__sbistype(c, _CTYPE_X));
 }
 
 #undef isideogram
@@ -120,7 +120,7 @@
 isideogram(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_I));
+	return (__sbistype(c, _CTYPE_I));
 }
 
 #undef islower
@@ -128,7 +128,7 @@
 islower(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_L));
+	return (__sbistype(c, _CTYPE_L));
 }
 
 #undef isnumber
@@ -136,7 +136,7 @@
 isnumber(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_D));
+	return (__sbistype(c, _CTYPE_D));
 }
 
 #undef isphonogram
@@ -144,7 +144,7 @@
 isphonogram(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_Q));
+	return (__sbistype(c, _CTYPE_Q));
 }
 
 #undef isprint
@@ -152,7 +152,7 @@
 isprint(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_R));
+	return (__sbistype(c, _CTYPE_R));
 }
 
 #undef ispunct
@@ -160,7 +160,7 @@
 ispunct(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_P));
+	return (__sbistype(c, _CTYPE_P));
 }
 
 #undef isrune
@@ -168,7 +168,7 @@
 isrune(c)
 	int c;
 {
-	return (__istype(c, 0xFFFFFF00L));
+	return (__sbistype(c, 0xFFFFFF00L));
 }
 
 #undef isspace
@@ -176,7 +176,7 @@
 isspace(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_S));
+	return (__sbistype(c, _CTYPE_S));
 }
 
 #undef isspecial
@@ -184,7 +184,7 @@
 isspecial(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_T));
+	return (__sbistype(c, _CTYPE_T));
 }
 
 #undef isupper
@@ -192,7 +192,7 @@
 isupper(c)
 	int c;
 {
-	return (__istype(c, _CTYPE_U));
+	return (__sbistype(c, _CTYPE_U));
 }
 
 #undef isxdigit
@@ -216,7 +216,7 @@
 tolower(c)
 	int c;
 {
-	return (__tolower(c));
+	return (__sbtolower(c));
 }
 
 #undef toupper
@@ -224,6 +224,6 @@
 toupper(c)
 	int c;
 {
-	return (__toupper(c));
+	return (__sbtoupper(c));
 }
 
--- iswctype.c.old	2007-09-16 22:31:30.000000000 +0400
+++ iswctype.c	2007-09-21 06:29:59.000000000 +0400
@@ -61,7 +61,7 @@
 iswascii(wc)
 	wint_t wc;
 {
-	return ((wc & ~0x7F) == 0);
+	return (wc < 0x80);
 }
 
 #undef iswblank
--- mskanji.c.old	2007-09-19 09:02:56.000000000 +0400
+++ mskanji.c	2007-09-21 21:44:31.000000000 +0400
@@ -47,6 +47,8 @@
 #include
 #include "mblocal.h"
 
+extern int __mb_sb_limit;
+
 static size_t _MSKanji_mbrtowc(wchar_t * __restrict, const char * __restrict,
     size_t, mbstate_t * __restrict);
 static int _MSKanji_mbsinit(const mbstate_t *);
@@ -66,6 +68,7 @@
 	__mbsinit = _MSKanji_mbsinit;
 	_CurrentRuneLocale = rl;
 	__mb_cur_max = 2;
+	__mb_sb_limit = 256;
 	return (0);
 }
 
--- none.c.old	2007-09-19 08:56:40.000000000 +0400
+++ none.c	2007-09-21 21:44:31.000000000 +0400
@@ -58,6 +58,11 @@
 static size_t _none_wcsnrtombs(char * __restrict, const wchar_t ** __restrict,
     size_t, size_t, mbstate_t * __restrict);
 
+/* setup defaults */
+
+int __mb_cur_max = 1;
+int __mb_sb_limit = 256; /* Expected to be <= _CACHED_RUNES */
+
 int
 _none_init(_RuneLocale *rl)
 {
@@ -69,6 +74,7 @@
 	__wcsnrtombs = _none_wcsnrtombs;
 	_CurrentRuneLocale = rl;
 	__mb_cur_max = 1;
+	__mb_sb_limit = 256;
 	return(0);
 }
 
@@ -176,7 +182,6 @@
 
 /* setup defaults */
 
-int __mb_cur_max = 1;
 size_t (*__mbrtowc)(wchar_t * __restrict, const char * __restrict, size_t,
     mbstate_t * __restrict) = _none_mbrtowc;
 int (*__mbsinit)(const mbstate_t *) = _none_mbsinit;
--- setrunelocale.c.old	2007-09-19 09:03:59.000000000 +0400
+++ setrunelocale.c	2007-09-21 21:44:31.000000000 +0400
@@ -45,6 +45,8 @@
 #include "mblocal.h"
 #include "setlocale.h"
 
+extern int __mb_sb_limit;
+
 extern _RuneLocale *_Read_RuneMagi(FILE *);
 
 static int __setrunelocale(const char *);
@@ -59,6 +61,7 @@
 	static char ctype_encoding[ENCODING_LEN + 1];
 	static _RuneLocale *CachedRuneLocale;
 	static int Cached__mb_cur_max;
+	static int Cached__mb_sb_limit;
 	static size_t (*Cached__mbrtowc)(wchar_t * __restrict,
 	    const char * __restrict, size_t, mbstate_t * __restrict);
 	static size_t (*Cached__wcrtomb)(char * __restrict, wchar_t,
@@ -85,6 +88,7 @@
 	    strcmp(encoding, ctype_encoding) == 0) {
 		_CurrentRuneLocale = CachedRuneLocale;
 		__mb_cur_max = Cached__mb_cur_max;
+		__mb_sb_limit = Cached__mb_sb_limit;
 		__mbrtowc = Cached__mbrtowc;
 		__mbsinit = Cached__mbsinit;
 		__mbsnrtowcs = Cached__mbsnrtowcs;
@@ -147,6 +151,7 @@
 	}
 	CachedRuneLocale = _CurrentRuneLocale;
 	Cached__mb_cur_max = __mb_cur_max;
+	Cached__mb_sb_limit = __mb_sb_limit;
 	Cached__mbrtowc = __mbrtowc;
 	Cached__mbsinit = __mbsinit;
 	Cached__mbsnrtowcs = __mbsnrtowcs;
--- utf8.c.old	2007-09-19 08:18:40.000000000 +0400
+++ utf8.c	2007-09-21 21:44:31.000000000 +0400
@@ -35,6 +35,8 @@
 #include
 #include "mblocal.h"
 
+extern int __mb_sb_limit;
+
 static size_t _UTF8_mbrtowc(wchar_t * __restrict, const char * __restrict,
     size_t, mbstate_t * __restrict);
 static int _UTF8_mbsinit(const mbstate_t *);
@@ -63,6 +65,7 @@
 	__wcsnrtombs = _UTF8_wcsnrtombs;
 	_CurrentRuneLocale = rl;
 	__mb_cur_max = 6;
+	__mb_sb_limit = 128;
 	return (0);
 }
 
--- wctype.h.old	2007-09-16 21:59:37.000000000 +0400
+++ wctype.h	2007-09-21 06:08:40.000000000 +0400
@@ -106,7 +106,7 @@
 #define towupper(wc) __toupper(wc)
 
 #if __BSD_VISIBLE
-#define iswascii(wc) (((wc) & ~0x7F) == 0)
+#define iswascii(wc) ((wc) < 0x80)
 #define iswhexnumber(wc) __istype((wc), _CTYPE_X)
 #define iswideogram(wc) __istype((wc), _CTYPE_I)
 #define iswnumber(wc) __istype((wc), _CTYPE_D)

--LQksG6bCIzRHxTLp--