From: Kyle Evans
Date: Thu, 3 May 2018 13:11:05 -0500
Subject: Re: grep extremely slow for LC_CTYPE=C? [SOLVED]
To: Stefan Esser
Cc: FreeBSD Current

On Thu, May 3, 2018 at 12:54 PM, Stefan Esser wrote:
> On 03.05.18 at 17:28, Kyle Evans wrote:
>> On Thu, May 3, 2018 at 10:19 AM, Stefan Esser wrote:
>>> On 03.05.18 at 16:41, Kyle Evans wrote:
>>>> Hmm... what does `grep -V` look like, just to confirm?
>>>
>>> Ah, yes, good point ...
>>>
>>> $ which grep
>>> /usr/bin/grep
>>>
>>> $ grep -V
>>> grep (GNU grep) 2.5.1-FreeBSD
>>>
>>> Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions. There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>>
>>> So, it seems I have to complain somewhere else about this behavior ...
>>
>> Eh, no worries there. Newer GNU grep sucks less, and we're going to
>> replace it Real Soon Now (TM).
>
> Thank you very much - your reply was really helpful!
>
> I just tested with GNU grep 2.27 (the current port version) and it does not
> show the extreme slowness of the old version in FreeBSD, but is still more
> than 10 times slower than BSD grep on my test data.
>

This is good. =) We tend to be slower in most areas, so any win is a good one.

>>> But I have (for a long time) in my /etc/src.conf:
>>>
>>> WITH_BSDGREP= yes
>>> WITH_BSD_GREP_FASTMATCH= yes
>>> WITHOUT_GNU_GREP_COMPAT= yes
>>>
>>> And before seeing the grep -V output, I was convinced that I had been using
>>> BSD grep (i.e. that it replaced GNU grep with above options) by default ...
>>>
>>> But now I see that I need to invoke bsdgrep under that name. It is very fast,
>>> but does not give the expected (correct?) result, which is the single line
>>> that is not suppressed by the pattern match ...
>>
>> This is actually because you've typo'd WITH_BSD_GREP. =) WITH_BSD_GREP
>> will replace /usr/bin/grep with bsdgrep and put GNU grep at
>> /usr/bin/gnugrep.
>
> Yes, that was what I had expected, and I had correctly spelled WITH_BSD_PATCH,
> but never bothered to check that I got the "grep" I wanted ...
>
>> I also recommend using WITHOUT_BSD_GREP_FASTMATCH / not using
>> WITH_BSD_GREP_FASTMATCH. See below response.
>
> It is so much faster than GNU grep on this use-case anyway ;-)
>
> $ sh grep-test.sh
> All/mpfr-3.1.7.tgz
> 0.14 real 0.13 user 0.00 sys
> All/mpfr-3.1.7.tgz
> 0.13 real 0.13 user 0.00 sys
>
> This is a factor 30 to 40 better than with our GNU grep (for the UTF-8 case,
> where it finishes in finite time, orders of magnitude faster for LANG=C ;-) ).
>
> And yes, FASTMATCH was responsible for the erroneous result in my previous
> tests with BSD grep. Now that I have rebuilt it without that option, it works
> perfectly for me :)

Also good to hear!

>> BSD_GREP_FASTMATCH is best left off (the default on HEAD); it was disabled
>> because the version of tre ("fastmatch") that bsdgrep uses is buggy
>> and I don't want to invest the time to fix it. The performance of the
>> version we use isn't any better than our libc regex(3), so I made the
>> decision to switch it to that and focus efforts on optimizing our
>> general regex implementation instead.
>
> A decision I can well understand and sympathize with.
>
> How about removing the BSD_GREP_FASTMATCH option, then?

Right, I've been meaning to find time to rip it all out. I'll see if I
can harvest some spare time from the weekend to make it happen.
>> I have plans to replace our libc regex(3) with Onigmo [1], which is at
>> least twice as fast as what we have and comes with all kinds of other
>> extensions; GNU extensions will be exposed via libregex, and I also
>> plan to install Onigmo on its own so that others can use that with its
>> own interface. The difference between it and libregex will be that
>> libregex exposes a regex(3) interface for using extensions with an
>> option to go REG_POSIX.
>>
>> [1] https://github.com/k-takata/Onigmo
>
> Great plan! But for now BSD grep seems well up to the task, and my only
> problem now is that I need to support stable releases that use (and will
> stay with) the old GNU grep, so I'll need to keep the work-around (or
> perhaps depend on the port version?).

I do recommend pulling in textproc/gnugrep if you can. GNU grep in base
has bugs that are likely going to stay unless someone (that isn't me =))
wants to take up the task of maintaining an older version of GNU grep
that's going to be disappearing from head. Newer versions have a lot more
sensible behavior than what we have in base.

> Thanks again!
>
> Best regards, Stefan