Date:      Wed, 02 Jun 2021 20:19:55 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 223532] GNU egrep -i is terrible slow if utf-8 locale is enabled
Message-ID:  <bug-223532-227-jXazx17h1Y@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-223532-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-223532-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

--- Comment #8 from Stefan Eßer <se@FreeBSD.org> ---
(In reply to Helge Oldach from comment #5)

My comment #4 referred to comment #3, which used BSD fgrep (despite the
title of the PR referring to GNU egrep).

I first compared fgrep with the C and UTF-8 locales and found that they had
about the same performance.

Adding -i in the UTF-8 case increased the run time from 0.03 seconds to 4.47
seconds (a factor of roughly 150). With LANG=C the run time is 3.36
seconds, BTW.

The patch that I have attached speeds this case up to 0.09 seconds by using
an internal function instead of the regex library.

By fgrep-FBSD I meant fgrep-ORIG (sorry for the confusion). This is the binary
as built in -CURRENT without the patch.

WITH_INTERNAL_NOSPEC is not documented, except by a comment in the sources
(in util.c) which explains that this option exists for systems that lack
REG_NOSPEC or REG_LITERAL and specifically mentions libgnuregex.

In fact, this function has a bit more overhead than necessary. An optimized
variant of the strcasestr_l() function could be inlined in util.c, but I did
not try to measure the performance difference. (The optimization would cache
the locale instead of calling __getlocale() and FIX_LOCALE for each
invocation of strcasestr().)
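
To illustrate the idea of caching the locale, here is a minimal sketch of a
case-insensitive literal search that takes an already resolved locale_t
instead of looking it up on every call. The name casefind_cached() is
hypothetical; this is neither the attached patch nor the code in util.c or
libc, and it only folds single bytes with tolower_l(), so it does not handle
multibyte UTF-8 characters.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from util.c or libc): find the first
 * case-insensitive occurrence of needle in hay. The caller resolves
 * the locale once and passes it in, so no per-call locale lookup is
 * needed. Folding is done byte by byte with tolower_l().
 */
static const char *
casefind_cached(const char *hay, const char *needle, locale_t loc)
{
	size_t i, nlen = strlen(needle);
	int first;

	if (nlen == 0)
		return (hay);	/* empty pattern matches at the start */

	first = tolower_l((unsigned char)needle[0], loc);
	for (; *hay != '\0'; hay++) {
		if (tolower_l((unsigned char)*hay, loc) != first)
			continue;
		/* First byte matches: compare the rest of the pattern. */
		for (i = 1; i < nlen && hay[i] != '\0'; i++) {
			if (tolower_l((unsigned char)hay[i], loc) !=
			    tolower_l((unsigned char)needle[i], loc))
				break;
		}
		if (i == nlen)
			return (hay);
	}
	return (NULL);
}
```

A caller would obtain the locale once, for example with
newlocale(LC_ALL_MASK, "C", (locale_t)0) or duplocale(LC_GLOBAL_LOCALE),
reuse it for every input line, and release it with freelocale() at exit.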

-- 
You are receiving this mail because:
You are the assignee for the bug.