From: Kurt Hackenberg
To: freebsd-questions@freebsd.org
Subject: Re: grep for ascii nul
Date: Sat, 2 Nov 2019 14:57:14 -0400
Message-ID: <7775e7f8-89ba-d057-67d3-cdcb92d2bbb4@panix.com>
In-Reply-To: <20191102064505.GA98558@admin.sibptus.ru>
References: <20191101092716.GA67658@admin.sibptus.ru> <63808.1572638827@segfault.tristatelogic.com> <20191102064505.GA98558@admin.sibptus.ru>

On 2019-11-02 02:45, Victor Sudakov wrote:

> I'm a big fan of awk, awk is in the base system and should be able to do
> it, right?
>
> $ hd trees.txt
> 00000000  66 69 72 0a 6f 61 6b 0a  63 65 64 00 61 72 0a 62  |fir.oak.ced.ar.b|
> 00000010  69 72 63 68 0a 70 61 6c  6d 0a                    |irch.palm.|
> 0000001a
> $
>
> Note the ascii null embedded in the word "cedar"
>
> $ awk '/\x66\x69/{print $0}' trees.txt
> fir
>
> So far so good. But with the ascii nul it behaves in an unexpected way:
>
> $ awk '/\x00/{print $0}' trees.txt
> fir
> oak
> ced
> birch
> palm
> $

Looks like it has the same problem that I guess grep does: it takes that
NUL as the end of a C string, so the regexp becomes a null string (zero
length), which matches everything.
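
One way to check that guess from the shell, as a rough sketch: if the pattern really collapses to a zero-length regexp, then an explicitly empty regexp should select the same set of lines. The file contents below are rebuilt from the hex dump in the quoted message; the temporary directory and the variable names are my own, and the `tr` step at the end is only a suggested way to count NUL bytes without sending them through a regexp at all. How the "ced" line itself is printed may differ between awk implementations.

```shell
# Recreate the sample file from the quoted hex dump (NUL inside "cedar").
tmpdir=$(mktemp -d)
printf 'fir\noak\nced\000ar\nbirch\npalm\n' > "$tmpdir/trees.txt"

# An explicitly empty regexp matches every record, which is the same
# selection the quoted /\x00/ run produced.
matched=$(awk '//' "$tmpdir/trees.txt" | wc -l | tr -d ' ')
total=$(wc -l < "$tmpdir/trees.txt" | tr -d ' ')
echo "empty regexp matched $matched of $total lines"

# Count NUL bytes without any regexp: delete every byte except NUL,
# then count what is left.
nuls=$(tr -cd '\000' < "$tmpdir/trees.txt" | wc -c | tr -d ' ')
echo "NUL bytes in file: $nuls"

rm -rf "$tmpdir"
```

If the two line counts agree, that is consistent with the empty-regexp explanation, though it does not prove where in awk the truncation happens.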