Date: Sat, 4 Sep 2004 17:39:33 +0800 (CST)
From: Kuang-che Wu <kcwu@csie.org>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: bin/71367: regex multibyte support is really slow
Message-ID: <200409040939.i849dXYC096862@kcwu.homeip.net>
Resent-Message-ID: <200409040940.i849eAqb081884@freefall.freebsd.org>

>Number:         71367
>Category:       bin
>Synopsis:       regex multibyte support is really slow
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 04 09:40:09 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Kuang-che Wu
>Release:        FreeBSD 6.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD kcwu.homeip.net 6.0-CURRENT FreeBSD 6.0-CURRENT #0: Sat Sep 4 05:33:38 CST 2004 root@kcwu.homeip.net:/usr/obj/usr/src/sys/DESKTOP i386
CPU: AMD Athlon(tm) XP 2000+ (1665.59-MHz 686-class CPU)
>Description:
regex in a UTF-8 locale, with the flags REG_EXTENDED|REG_ICASE and the
pattern [[:alnum:]], is unacceptably slow.
>How-To-Repeat:
$ cc -O -pipe re.c -o re
$ time ./re
        7.65 real         7.51 user         0.06 sys

#include <stdio.h>
#include <locale.h>
#include <regex.h>

int main(void)
{
    regex_t re;
    char string[1024]={
#define WORD 0xe6,0x85,0xa2 /* UTF-8 character */
        WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
        WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
        0
    };

    if(setlocale(LC_CTYPE,"zh_TW.UTF-8")==NULL)
        return 1;
    if(regcomp(&re,"[[:alnum:]]",REG_EXTENDED|REG_ICASE)!=0)
        return 2;
    if(regexec(&re,string,0,NULL,0)==0)
        printf("matched\n");
    return 0;
}
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:
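
The report gives only a whole-process time for one combination of locale,
flags and pattern. A minimal sketch of a helper that times regcomp() plus
regexec() for any flag set might help compare, for example, REG_EXTENDED
alone against REG_EXTENDED|REG_ICASE. The names time_regex() and
use_report_locale() are hypothetical and not part of the original report,
and the report does not establish which factor causes the slowdown.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper (an assumption, not from the report): compile
 * `pattern` with `cflags`, run regexec() on `subject` `reps` times, and
 * return the CPU seconds spent in regcomp() plus all regexec() calls.
 * Returns -1.0 if regcomp() fails.  The last regexec() result
 * (0 or REG_NOMATCH) is stored in *result when result is not NULL.
 */
double
time_regex(const char *pattern, int cflags, const char *subject, int reps,
    int *result)
{
    regex_t re;
    clock_t start, end;
    int i, rc = REG_NOMATCH;

    start = clock();
    if (regcomp(&re, pattern, cflags) != 0)
        return -1.0;
    for (i = 0; i < reps; i++)
        rc = regexec(&re, subject, 0, NULL, 0);
    end = clock();
    regfree(&re);

    if (result != NULL)
        *result = rc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/*
 * Hypothetical helper: select the LC_CTYPE locale used in the report.
 * Returns 1 if it was set, 0 if it is not installed (the previously
 * active locale then stays in effect).
 */
int
use_report_locale(void)
{
    return setlocale(LC_CTYPE, "zh_TW.UTF-8") != NULL;
}
```

One way to use it: call use_report_locale(), then call time_regex() twice
on the same subject string, once with REG_EXTENDED and once with
REG_EXTENDED|REG_ICASE, and compare the two returned times.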