Date: Tue, 7 Nov 2017 19:36:52 +0100
From: Polytropon
To: byrnejb@harte-lyne.ca
Cc: "James B. Byrne via freebsd-questions"
Subject: Re: sed - remove nul lines from file

On Tue, 7 Nov 2017 12:12:55 -0500, James B. Byrne via freebsd-questions wrote:
> I have a data file created by an ancient proprietary scripting
> language called QTP. There is a bug in this program which, on
> occasion, manifests itself by inserting output records consisting
> entirely of nul (^@) (\x00) bytes at regular intervals. In the
> present case every 47th. record consists entirely of nuls.

If you know that every 47th line is to be removed, awk can do this
easily:

    $ awk '(NR % 47 != 0)' < infile.txt > outfile.txt

This will print all lines except every 47th one (lines 47, 94, ...)
with the NULs. But if the NUL lines do not fall exactly on multiples
of 47, you need a more flexible solution.

> The purpose of this data file is to feed a psql COPY statement for
> loading into a PostgreSQL database. The presence of the NUL
> characters prevents this. I have previously used the tr utility to
> remove the NUL characters but this requires me to manually remove the
> residual empty lines.

In this case, awk can also help:

    $ awk '(length > 0)' < infile.txt > outfile.txt

This will print all lines which are longer than 0 characters.

> I have tried various permutations of the sed invocation reproduced
> below to remove these lines directly but without success. The
> examples that I have found on StackExchange and various other
> self-help sites do not give the results claimed, at least not for me
> on FreeBSD. So, I would appreciate if anyone here can point out what I
> am doing wrong or how the sed on FreeBSD differs in behaviour for that
> used in the examples I have found.
>
> Given a file INFILE with records containing the following:
>
> . . .
> *93566000008166*,*CCTL*,*3072 49534494 *
> *93566000008166*,*CCTL*,*3072 49534493 *
> *93566000008166*,*CCTL*,*3072 49534497 *
> *93566000015962*,*CCTL*,*8156 4171000541 *
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
> . . .
> *93566000198850*,*CCTL*,*417 1003874 *
> *93566000010320*,*CCTL*,*8084 2601553853102 *
> . . .
>
> I wish to remove (all) the line(s) with the nul (^@) characters. I
> have tried this:
>
> sed '/^\x00*$/d' INFILE > INFILE.sed
>
> and this:
>
> sed -E '/^\x00*$/d' INFILE > INFILE.sed
>
> but neither these nor the many other combinations that I have tried
> remove the lines. What is the method of accomplishing this in sed or
> is it not possible?

I'd suggest using the tr utility, especially with the -d option,
which does not translate but deletes characters:

    $ tr -d '\000' < infile.txt > outfile.txt

This of course leaves an empty line (as the trailing \n is not
deleted), so using the awk step in combination would help:

    $ tr -d '\000' < infile.txt | awk '(length > 0)' > outfile.txt

This will remove the entire lines with the NULs, no matter at which
line position they appear in the input file.

-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
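
To try the tr and awk combination on something small before running it on the real data, the following is a minimal sketch. The sample records (rec1, rec2), the scratch directory and the file names inside it are made up for illustration and are not from the original data file.

```shell
#!/bin/sh
# Hypothetical sample: two data records with a NUL-only record between them.
workdir=$(mktemp -d)
printf 'rec1\n\000\000\000\000\nrec2\n' > "$workdir/infile.txt"

# Step 1: tr -d deletes every NUL byte, leaving an empty line behind.
# Step 2: awk keeps only the lines whose length is greater than 0.
tr -d '\000' < "$workdir/infile.txt" | awk 'length > 0' > "$workdir/outfile.txt"

# Show the cleaned result.
cat "$workdir/outfile.txt"
```

Note that the awk step cannot tell a line that used to hold NULs from a line that was empty to begin with, so any genuinely empty lines in the input are dropped as well.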