From owner-freebsd-current@freebsd.org Thu May 3 15:19:49 2018
Subject: Re: grep extremely slow for LC_CTYPE=C?
To: Kyle Evans
References: <08d32caa-aa44-cff7-d09c-af2444674958@freebsd.org>
From: Stefan Esser
Date: Thu, 3 May 2018 17:19:34 +0200
List-Id: Discussions about the use of FreeBSD-current

On 03.05.18 at 16:41, Kyle Evans wrote:

Hi Kyle,

thank you for the fast reply. You were right to request grep -V output,
but see below ... ;-)

> On Thu, May 3, 2018 at 9:08 AM, Stefan Esser wrote:
>> The first "grep" needs 3.5 seconds to finish on my system, but the second
>> one (with LC_CTYPE=C or no locale set at all) runs for minutes (I did not
>> bother to check whether it finishes at all).
>>
>> Is this a bug in grep?
>>
>> Maybe there is something odd in the data file (loading the pattern is not
>> slower with LC_CTYPE=C, it takes 0.8 seconds on my system), but this is a
>> problem that was observed with "real" data, not a specifically constructed
>> worst case.
>>
>> Any ideas what's causing this behavior?
>>
>> I'm currently setting the UTF-8 locale as in the first invocation above
>> to make grep run in reasonable time, but I'd expect it to be faster in
>> the C locale ...
>>
>> Regards, STefan
>
> Hmm... what does `grep -V` look like, just to confirm?

Ah, yes, good point ...

$ which grep
/usr/bin/grep
$ grep -V
grep (GNU grep) 2.5.1-FreeBSD

Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

So, it seems I have to complain somewhere else about this behavior ...
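The check above can be scripted when it is unclear which implementation a given name resolves to. The following is only a minimal sketch: the function name `classify_grep_version` and the labels it prints are made up for illustration, and it assumes that BSD grep identifies itself with the string "BSD grep" in its -V output, in the same way the GNU version prints "GNU grep" above.

```shell
#!/bin/sh
# Classify a grep implementation from the first line of its `-V` output.
# Assumption: BSD grep reports "BSD grep" there; GNU grep reports "GNU grep".
# The function name and the labels are hypothetical, chosen for this sketch.
classify_grep_version() {
    case "$1" in
        *"GNU grep"*) echo "gnu" ;;
        *"BSD grep"*) echo "bsd" ;;
        *)            echo "unknown" ;;
    esac
}

# Report what each candidate name resolves to on this system.
for name in grep bsdgrep; do
    if path=$(command -v "$name" 2>/dev/null); then
        first_line=$("$name" -V 2>&1 | head -n 1)
        echo "$name -> $path: $(classify_grep_version "$first_line")"
    else
        echo "$name: not found in PATH"
    fi
done
```

Running this before a timing comparison makes it obvious whether "grep" and "bsdgrep" are really different programs on the machine in question.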

But I have (for a long time) in my /etc/src.conf:

WITH_BSDGREP= yes
WITH_BSD_GREP_FASTMATCH= yes
WITHOUT_GNU_GREP_COMPAT= yes

And before seeing the grep -V output, I was convinced that I had been
using BSD grep (i.e. that it replaced GNU grep with above options) by
default ...

But now I see that I need to invoke bsdgrep under that name. It is very
fast, but does not give the expected (correct?) result, which is the
single line that is not suppressed by the pattern match ...

> These are the results on my local system:
>
> root@viper:/tmp/grep# ./grep-test.sh
> All/mpfr-3.1.7.tgz
>         0.10 real         0.10 user         0.00 sys
> All/mpfr-3.1.7.tgz
>         0.09 real         0.08 user         0.00 sys
>
> But I don't immediately recall if I have local modifications in
> regex(3)/bsdgrep that might have affected this. =(

Yes, that's the correct result and extremely fast!

But on my system (with only "bsdgrep" substituted for "grep") I get

$ sh bsdgrep-test.sh | wc
        0.15 real         0.14 user         0.00 sys
        0.15 real         0.15 user         0.00 sys
    3362    3362   94700

I.e. only about 1/3 of the lines are suppressed by the pattern, while
all but 1 line should be ...

Or is one of the build options that I used unsafe?

Best regards, STefan