Date: Wed, 02 Jun 2021 20:19:55 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 223532] GNU egrep -i is terrible slow if utf-8 locale is enabled
Message-ID: <bug-223532-227-jXazx17h1Y@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-223532-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

--- Comment #8 from Stefan Eßer <se@FreeBSD.org> ---
(In reply to Helge Oldach from comment #5)

My comment #4 referred to comment #3, which used BSD fgrep (despite the
title of the PR referring to GNU egrep).

I first compared fgrep with the C and UTF-8 locales and found they had
about the same performance. Adding -i in the UTF-8 case increased the run
time from 0.03 seconds to 4.47 seconds (or by a factor of more than 100).
With LANG=C the run time is 3.36 seconds, BTW.

The patch that I have attached speeds this case up to 0.09 seconds by
using an internal function instead of the regex library.

fgrep-FBSD meant fgrep-ORIG (sorry for the confusion). This is the binary
as built in -CURRENT without the patch.

WITH_INTERNAL_NOSPEC is not documented, except by a comment in the sources
(in util.c), which explains that this option exists for systems that lack
REG_NOSPEC or REG_LITERAL and specifically mentions libgnuregex.

In fact, the internal function has a bit more overhead than necessary. An
optimized variant of the strcasestr_l() function could be inlined in
util.c, but I did not try to measure the performance difference. (The
optimization would cache the locale instead of calling __getlocale() and
FIX_LOCALE for each invocation of strcasestr().)

-- 
You are receiving this mail because:
You are the assignee for the bug.
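To illustrate the idea of hoisting per-call locale work out of the search, here is a minimal sketch in portable C. It is not the attached patch and not the libc strcasestr_l() implementation; the names fold_table, fold_table_init() and fold_strcasestr() are hypothetical. It assumes a single-byte comparison via tolower(), so it does not fold multibyte UTF-8 characters. The case-folding table is built once (for example, right after setlocale()) and then reused for every line searched, which stands in for caching the locale rather than looking it up on each invocation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: a case-folding table built once and reused. */
struct fold_table {
	unsigned char map[256];
};

/* Build the table a single time instead of consulting the locale per search. */
static void
fold_table_init(struct fold_table *ft)
{
	assert(ft != NULL);
	for (int c = 0; c < 256; c++)
		ft->map[c] = (unsigned char)tolower(c);
}

/*
 * Case-insensitive literal search, comparing byte by byte through the
 * precomputed table. Returns a pointer to the first match, or NULL.
 * An empty needle matches at the start of the haystack.
 */
static const char *
fold_strcasestr(const struct fold_table *ft, const char *hay,
    const char *needle)
{
	assert(ft != NULL && hay != NULL && needle != NULL);

	size_t nlen = strlen(needle);
	if (nlen == 0)
		return hay;

	unsigned char first = ft->map[(unsigned char)needle[0]];
	for (; *hay != '\0'; hay++) {
		if (ft->map[(unsigned char)*hay] != first)
			continue;
		/* First byte matched; compare the rest, stopping at end of hay. */
		size_t i = 1;
		while (i < nlen && hay[i] != '\0' &&
		    ft->map[(unsigned char)hay[i]] ==
		    ft->map[(unsigned char)needle[i]])
			i++;
		if (i == nlen)
			return hay;
	}
	return NULL;
}
```

A caller would run fold_table_init() once at startup and pass the same table to fold_strcasestr() for each input line. Whether this kind of caching gives a measurable gain over the patched fgrep is not something this sketch establishes.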
