Date:      Sat, 4 Sep 2004 17:39:33 +0800 (CST)
From:      Kuang-che Wu <kcwu@csie.org>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   bin/71367: regex multibyte support is really slow
Message-ID:  <200409040939.i849dXYC096862@kcwu.homeip.net>
Resent-Message-ID: <200409040940.i849eAqb081884@freefall.freebsd.org>


>Number:         71367
>Category:       bin
>Synopsis:       regex multibyte support is really slow
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 04 09:40:09 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator:     Kuang-che Wu
>Release:        FreeBSD 6.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD kcwu.homeip.net 6.0-CURRENT FreeBSD 6.0-CURRENT #0: Sat Sep 4 05:33:38 CST 2004 root@kcwu.homeip.net:/usr/obj/usr/src/sys/DESKTOP i386

CPU: AMD Athlon(tm) XP 2000+ (1665.59-MHz 686-class CPU)

	
>Description:
	regcomp()/regexec() are unacceptably slow with this combination:
	- a UTF-8 locale (zh_TW.UTF-8 in the test below)
	- flags REG_EXTENDED|REG_ICASE
	- pattern [[:alnum:]]
	
>How-To-Repeat:
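	Save the program below as re.c, then build and time it:

	$ cc -O -pipe re.c -o re
	$ time ./re
	        7.65 real         7.51 user         0.06 sys

	That is about 7.5 seconds of user time to compile one pattern and
	match it against a 60-byte (20-character) string.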

#include <stdio.h>
#include <locale.h>
#include <regex.h>

int main(void)
{
  regex_t re;
  char string[1024]={
#define WORD 0xe6,0x85,0xa2 /* UTF-8 character */
    WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
    WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD, WORD,
    0
  };

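  /* The slowdown is seen in a UTF-8 locale. */
  if(setlocale(LC_CTYPE,"zh_TW.UTF-8")==NULL)
    return 1;

  /* Case-insensitive match of a single character class. */
  if(regcomp(&re,"[[:alnum:]]",REG_EXTENDED|REG_ICASE)!=0)
    return 2;
  if(regexec(&re,string,0,NULL,0)==0)
    printf("matched\n");
  regfree(&re);

  return 0;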
}
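
The test above only gives a total run time. To narrow the problem down, the following helper could be used. It is a sketch only and not part of the original test; the names re_timing and time_alnum are made up for illustration. It times regcomp() and regexec() separately for a given locale and flag set, using the same pattern and the same 20-character subject.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper types and names; not from the original report. */
struct re_timing {
	double compile_s;	/* CPU seconds spent in regcomp() */
	double exec_s;		/* CPU seconds spent in regexec() */
	int matched;		/* 1 if regexec() returned 0 */
};

/* Same subject as re.c: 20 copies of the bytes e6 85 a2. */
#define W "\xe6\x85\xa2"
static const char subject[] =
    W W W W W W W W W W
    W W W W W W W W W W;

/*
 * Time "[[:alnum:]]" under the given locale and regcomp() flags.
 * Returns 0 on success, -1 if the locale cannot be set, -2 if the
 * pattern does not compile.
 */
int
time_alnum(const char *locale, int cflags, struct re_timing *out)
{
	regex_t re;
	clock_t t0, t1, t2;
	int rc;

	if (setlocale(LC_CTYPE, locale) == NULL)
		return -1;

	t0 = clock();
	if (regcomp(&re, "[[:alnum:]]", cflags) != 0)
		return -2;
	t1 = clock();
	rc = regexec(&re, subject, 0, NULL, 0);
	t2 = clock();
	regfree(&re);

	out->compile_s = (double)(t1 - t0) / CLOCKS_PER_SEC;
	out->exec_s = (double)(t2 - t1) / CLOCKS_PER_SEC;
	out->matched = (rc == 0);
	return 0;
}
```

Calling time_alnum() with "C" and "zh_TW.UTF-8", each with REG_EXTENDED and with REG_EXTENDED|REG_ICASE, and printing both fields should show whether the time goes to regcomp() or regexec() and whether REG_ICASE is needed to trigger it. I have not included results from such a run here.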
	
>Fix:

	


>Release-Note:
>Audit-Trail:
>Unformatted:


