From owner-freebsd-bugs@FreeBSD.ORG Sat Sep 4 12:00:48 2004
Date: Sat, 4 Sep 2004 12:00:44 GMT
Message-Id: <200409041200.i84C0ixg095863@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Kuang-che Wu
Subject: Re: bin/71367: regex multibyte support is really slow
Reply-To: Kuang-che Wu

The following reply was made to PR bin/71367; it has been noted by GNATS.

From: Kuang-che Wu
To: freebsd-gnats-submit@freebsd.org
Cc:
Subject: Re: bin/71367: regex multibyte support is really slow
Date: Sat, 4 Sep 2004 20:00:49 +0800

 On Sat, Sep 04, 2004 at 09:36:16PM +1000, Tim Robbins wrote:
 > On Sat, Sep 04, 2004 at 01:21:22PM +0200, Simon L. Nielsen wrote:
 > Do you have any non-standard options in /etc/make.conf? Have you changed
 > the C library at all locally? Can you confirm that the system you ran
 > this on was idle?

 The system was idle, and the C library has not been changed locally.
 The only related option in /etc/make.conf is COMPAT4X=yes.

 > Could you please try this patch?
 I tested the patch with the program below.

 Without the patch:
 case 0: 0.000000s
 case 1: 7.390625s
 case 2: (matched)0.000000s
 case 3: 0.000000s
 case 4: (matched)0.125000s
 case 5: 0.000000s
 case 6: 7.398438s
 case 7: 0.000000s
 case 8: 0.000000s

 With the patch:
 case 0: 0.000000s
 case 1: 0.000000s
 case 2: (matched)0.000000s
 case 3: 0.000000s
 case 4: (matched)0.000000s
 case 5: 0.000000s
 case 6: 0.000000s
 case 7: 0.000000s
 case 8: 0.000000s

 --------------------------
 #include <locale.h>
 #include <regex.h>
 #include <stdio.h>
 #include <time.h>

 /* ASCII text that matches [[:alnum:]] at the first character */
 #define EN "blah"
 char en[1024]= EN EN EN EN EN EN EN EN EN EN;

 /* ASCII punctuation that never matches [[:alnum:]] */
 #define XX "!@#$"
 char xx[1024]= XX XX XX XX XX XX XX XX XX XX;

 /* The casts keep the byte values when plain char is signed */
 char utf8[1024]={
 #define U8 (char)0xe6,(char)0x85,(char)0xa2 // UTF-8 character
     U8, U8, U8, U8, U8, U8, U8, U8, U8, U8,
     U8, U8, U8, U8, U8, U8, U8, U8, U8, U8,
     0
 };
 char big5[1024]={
 #define B5 (char)0xa6,(char)0x72 // Big5 character
     B5, B5, B5, B5, B5, B5, B5, B5, B5, B5,
     B5, B5, B5, B5, B5, B5, B5, B5, B5, B5,
     0
 };

 struct T {
     const char *locale,*pattern,*text;
     int flag;
 } test[]={
     { "C",           "[[:alnum:]]",  utf8, REG_EXTENDED|REG_ICASE },
     { "zh_TW.UTF-8", "[[:alnum:]]",  utf8, REG_EXTENDED|REG_ICASE },
     { "zh_TW.UTF-8", "[[:alnum:]]",  en,   REG_EXTENDED|REG_ICASE },
     { "zh_TW.UTF-8", "[[:alnum:]]",  xx,   REG_EXTENDED|REG_ICASE },
     { "zh_TW.UTF-8", "[^[:alnum:]]", utf8, REG_EXTENDED|REG_ICASE },
     { "zh_TW.Big5",  "[[:alnum:]]",  big5, REG_EXTENDED|REG_ICASE },
     { "en_US.UTF-8", "[[:alnum:]]",  utf8, REG_EXTENDED|REG_ICASE },
     { "en_US.UTF-8", "[A-Za-z0-9]",  utf8, REG_ICASE },
     { "en_US.UTF-8", "[[:alnum:]]",  utf8, REG_EXTENDED },
 };

 int main(void)
 {
     size_t i;
     clock_t st;
     regex_t re;

     /* The table has no NULL terminator, so bound the loop by its size */
     for(i=0; i<sizeof(test)/sizeof(test[0]); i++) {
         printf("case %d: ",(int)i);
         if(setlocale(LC_CTYPE,test[i].locale)==NULL)
             return 1;
         if(regcomp(&re,test[i].pattern,test[i].flag)!=0)
             return 2;
         /* Time only the match, not locale setup or compilation */
         st=clock();
         if(regexec(&re,test[i].text,0,NULL,0)==0)
             printf("(matched)");
         printf("%fs\n",(double)(clock()-st)/CLOCKS_PER_SEC);
         regfree(&re);
     }
     return 0;
 }
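
 A note on reading the numbers: an entry of 0.000000s only says that the
 regexec() call finished within one clock() tick. The nonzero values
 above are all multiples of 1/128 s, which suggests that tick size on
 this system. A minimal sketch of one way to get finer figures is to
 repeat the call and report the average. The helper name time_regexec
 and its reps and matched parameters are hypothetical and are not part
 of the program above.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the original test program).
 * Compiles `pattern` under `locale`, runs regexec() on `text` `reps`
 * times and returns the average CPU seconds per call. Returns -1.0 if
 * reps is not positive, the locale is unavailable, or regcomp() fails.
 * If `matched` is non-NULL it is set to 1 when the last call matched,
 * otherwise to 0.
 */
double time_regexec(const char *locale, const char *pattern,
                    const char *text, int cflags, int reps, int *matched)
{
    regex_t re;
    clock_t start, stop;
    int i, rc = REG_NOMATCH;

    assert(pattern != NULL && text != NULL);

    if (reps <= 0 || setlocale(LC_CTYPE, locale) == NULL)
        return -1.0;
    if (regcomp(&re, pattern, cflags) != 0)
        return -1.0;

    /* Time only the matching loop, as the original program does. */
    start = clock();
    for (i = 0; i < reps; i++)
        rc = regexec(&re, text, 0, NULL, 0);
    stop = clock();

    regfree(&re);
    if (matched != NULL)
        *matched = (rc == 0);
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

 A call such as time_regexec("zh_TW.UTF-8", "[[:alnum:]]", utf8,
 REG_EXTENDED|REG_ICASE, 1000, &m) would then stand in for the single
 timed regexec() in each case. The repetition count is a guess and
 should be kept small for the slow unpatched cases.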