Date: Sat, 4 Sep 2004 17:39:33 +0800 (CST)
From: Kuang-che Wu <kcwu@csie.org>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: bin/71367: regex multibyte support is really slow
Message-ID: <200409040939.i849dXYC096862@kcwu.homeip.net>
Resent-Message-ID: <200409040940.i849eAqb081884@freefall.freebsd.org>
>Number: 71367
>Category: bin
>Synopsis: regex multibyte support is really slow
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Sat Sep 04 09:40:09 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator: Kuang-che Wu
>Release: FreeBSD 6.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD kcwu.homeip.net 6.0-CURRENT FreeBSD 6.0-CURRENT #0: Sat Sep 4 05:33:38 CST 2004 root@kcwu.homeip.net:/usr/obj/usr/src/sys/DESKTOP i386
CPU: AMD Athlon(tm) XP 2000+ (1665.59-MHz 686-class CPU)
>Description:
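regex in a UTF-8 locale
+ flags REG_EXTENDED|REG_ICASE
+ pattern [[:alnum:]]
= unacceptably slow (about 7.5 seconds of user time for the test program
  in How-To-Repeat, which matches against a 20-character string)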
>How-To-Repeat:
$ cc -O -pipe re.c -o re
$ time ./re
7.65 real 7.51 user 0.06 sys
#include <stdio.h>
#include <locale.h>
#include <regex.h>

#define WORD 0xe6,0x85,0xa2	/* one UTF-8 character (U+6162), 3 bytes */

int main(void)
{
	regex_t re;
	/* 20 multibyte characters, NUL-terminated */
	char string[1024] = {
		WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
		WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
		0
	};

	/* the report concerns the UTF-8 (multibyte) locale */
	if (setlocale(LC_CTYPE, "zh_TW.UTF-8") == NULL)
		return 1;
	if (regcomp(&re, "[[:alnum:]]", REG_EXTENDED | REG_ICASE) != 0)
		return 2;
	if (regexec(&re, string, 0, NULL, 0) == 0)
		printf("matched\n");
	regfree(&re);
	return 0;
}
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
