Date: Wed, 02 Jun 2021 20:19:55 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 223532] GNU egrep -i is terrible slow if utf-8 locale is enabled
Message-ID: <bug-223532-227-jXazx17h1Y@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-223532-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-223532-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

--- Comment #8 from Stefan Eßer <se@FreeBSD.org> ---
(In reply to Helge Oldach from comment #5)

My comment #4 referred to comment #3, which used BSD fgrep (despite the
title of the PR referring to GNU egrep).

I first compared fgrep with the C and UTF-8 locales and found that they had
about the same performance.

Adding -i in the UTF-8 case increased the run time from 0.03 seconds to 4.47
seconds (a factor of more than 100). With LANG=C the run time is 3.36
seconds, BTW.

The patch that I have attached speeds this case up to 0.09 seconds by using
an internal function instead of the regex library.

fgrep-FBSD meant fgrep-ORIG (sorry for the confusion). This is the binary as
built in -CURRENT without the patch.

WITH_INTERNAL_NOSPEC is not documented, except by a comment in the sources
(in util.c) which explains that this option exists for systems that lack
REG_NOSPEC or REG_LITERAL and specifically mentions libgnuregex.

In fact, this function has a bit more overhead than necessary. An optimized
variant of the strcasestr_l() function could be inlined in util.c, but I did
not try to measure the performance difference. (The optimization would cache
the locale instead of calling __getlocale() and FIX_LOCALE for each
invocation of strcasestr().)