Date:      Wed, 08 Nov 2017 12:59:43 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 223532] egrep -i is terrible slow if utf-8 locale is enabled
Message-ID:  <bug-223532-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223532

            Bug ID: 223532
           Summary: egrep -i is terrible slow if utf-8 locale is enabled
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: wosch@FreeBSD.org

egrep -i is terribly slow when a UTF-8 locale is set. In fact, it is 77
times slower than a case-sensitive search (8.47 s vs. 0.11 s real time below).


How to repeat:

First, we create a 100MB text file:
for i in $(seq 1 20); do man tcsh; done > /tmp/tcsh20
for i in $(seq 1 20); do cat /tmp/tcsh20;done > /tmp/tcsh400

$ du -hs /tmp/tcsh400
 99M    /tmp/tcsh400


# case-sensitive search with UTF-8
LANG=en_CA.UTF-8 time egrep -c foobar /tmp/tcsh400
0
        0.11 real         0.06 user         0.04 sys


# case-insensitive search with UTF-8, terribly slow
LANG=en_CA.UTF-8 time egrep -ic foobar /tmp/tcsh400
0
        8.47 real         8.42 user         0.04 sys


# case-sensitive search with ASCII (C locale)
LANG=C time egrep -c foobar /tmp/tcsh400
0
        0.10 real         0.06 user         0.03 sys


# case-insensitive search with ASCII (C locale)
LANG=C time egrep -ic foobar /tmp/tcsh400
0
        0.10 real         0.07 user         0.03 sys
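To re-run all four measurements in one go, a small sh helper along these lines may be convenient. It is only a sketch: `bench_egrep` is a hypothetical name, and it assumes the en_CA.UTF-8 locale is installed and that `time` on PATH is time(1) (/usr/bin/time on FreeBSD), as in the commands above. It also clears LC_ALL and LC_CTYPE, since either one would override LANG and silently change which locale egrep uses.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original report: runs the four
# egrep variants above (case-sensitive and -i, UTF-8 and C locale)
# against one file and prints a label line before each result.
#
# Assumptions: the en_CA.UTF-8 locale is installed, and "time" on PATH is
# time(1) (/usr/bin/time on FreeBSD), which reports real/user/sys on stderr.
bench_egrep() {
    bench_file=$1
    bench_pattern=$2
    for locale in en_CA.UTF-8 C; do
        for flags in -c -ic; do
            echo "# LANG=$locale egrep $flags $bench_pattern"
            # LC_ALL and LC_CTYPE both take precedence over LANG for
            # character classification, so remove them from the environment.
            env -u LC_ALL -u LC_CTYPE LANG="$locale" \
                time egrep "$flags" "$bench_pattern" "$bench_file"
        done
    done
}
```

Usage: `bench_egrep /tmp/tcsh400 foobar`. The match counts go to stdout after each label, and the timing line from time(1) goes to stderr, so the slow UTF-8 `-ic` case should stand out the same way it does in the figures above.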



