Date:      Tue, 7 Nov 2017 19:36:52 +0100
From:      Polytropon <freebsd@edvax.de>
To:        byrnejb@harte-lyne.ca
Cc:        "James B. Byrne via freebsd-questions" <freebsd-questions@freebsd.org>
Subject:   Re: sed - remove nul lines from file
Message-ID:  <20171107193652.7b0aa08f.freebsd@edvax.de>
In-Reply-To: <b21bf201363c34a90ab55c4a05ff8fd7.squirrel@webmail.harte-lyne.ca>
References:  <b21bf201363c34a90ab55c4a05ff8fd7.squirrel@webmail.harte-lyne.ca>

On Tue, 7 Nov 2017 12:12:55 -0500, James B. Byrne via freebsd-questions wrote:
> I have a data file created by an ancient proprietary scripting
> language called QTP.  There is a bug in this program which, on
> occasion, manifests itself by inserting output records consisting
> entirely of nul (^@) (\x00) bytes at regular intervals.  In the
> present case every 47th. record consists entirely of nuls.

If you know that every 47th line is to be removed, awk can do
this easily:

	$ awk '(NR % 47 != 0)' < infile.txt > outfile.txt

This will print all lines except lines 47, 94, 141, and so on,
which hold the NULs. But if the NUL lines do not fall exactly on
those positions, you need a more flexible solution.
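
Before hard-coding a line number, it may help to check where the
NUL lines actually sit. The following is only a sketch: the file
names nul_sample.txt and nul_lines.txt and the sample records are
made up for the demonstration and do not come from the original data.

```shell
# Hypothetical 6-line sample in which every 3rd record is all NUL bytes.
printf 'a\nb\n\000\000\nc\nd\n\000\000\n' > nul_sample.txt

# tr deletes only the NUL bytes, not the newlines, so line numbers are
# preserved; awk then prints the numbers of the lines that ended up empty.
tr -d '\000' < nul_sample.txt | awk 'length($0) == 0 { print NR }' > nul_lines.txt
cat nul_lines.txt
```

Run against the real input file, the list shows whether the NUL
records really appear at a fixed interval. Lines that were already
empty in the input are reported as well.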



> The purpose of this data file is to feed a psql COPY statement for
> loading into a PostgreSQL database.  The presence of the NUL
> characters prevents this.  I have previously used the tr utility to
> remove the NUL characters but this requires me to manually remove the
> residual empty lines.

In this case, awk can also help:

	$ awk '(length > 0)' < infile.txt > outfile.txt

This prints only the lines that are longer than 0 characters, so
the empty lines left behind by tr are dropped.
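
Note that this also drops lines that were already empty before the
NULs were removed. If such lines have to survive, one possible
variation is to turn the NULs into a marker byte first. This is a
sketch under the assumption that the byte \001 never occurs in the
real data; the file names blank_sample.txt and kept.txt are invented.

```shell
# Hypothetical sample: a data line, a genuinely empty line, an all-NUL line, a data line.
printf 'a\n\n\000\000\nb\n' > blank_sample.txt

# 1. Translate every NUL into the marker byte \001 (assumed absent from the data).
# 2. Drop the lines that consist of nothing but marker bytes.
# 3. Delete any marker bytes left inside other lines.
tr '\000' '\001' < blank_sample.txt | awk '!/^\001+$/' | tr -d '\001' > kept.txt
cat kept.txt
```

The genuinely empty second line stays in place, while the line
that consisted only of NULs is gone.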



> I have tried various permutations of the sed invocation reproduced
> below to remove these lines directly but without success.  The
> examples that I have found on StackExchange and various other
> self-help sites do not give the results claimed, at least not for me
> on FreeBSD. So, I would appreciate if anyone here can point out what I
> am doing wrong or how the sed on FreeBSD differs in behaviour for that
> used in the examples I have found.
> 
> Given a file INFILE with records containing the following:
> 
> . . .
> *93566000008166*,*CCTL*,*3072 49534494                 *
> *93566000008166*,*CCTL*,*3072 49534493                 *
> *93566000008166*,*CCTL*,*3072 49534497                 *
> *93566000015962*,*CCTL*,*8156 4171000541               *
> ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ . . .
> *93566000198850*,*CCTL*,*417 1003874                   *
> *93566000010320*,*CCTL*,*8084 2601553853102            *
> . . .
> 
> I wish to remove (all) the line(s) with the nul (^@) characters.  I
> have tried this:
> 
> sed '/^\x00*$/d' INFILE > INFILE.sed
> 
> and this:
> 
> sed -E '/^\x00*$/d' INFILE > INFILE.sed
> 
> but neither these nor the many other combinations that I have tried
> remove the lines.  What is the method of accomplishing this in sed or
> is it not possible?

I'd suggest using the tr utility, especially with the -d option,
which deletes characters instead of translating them:

	$ tr -d '\000' < infile.txt > outfile.txt

This of course leaves an empty line (as the trailing \n will not
be deleted), so combining it with the awk step helps:

	$ tr -d '\000' < infile.txt | awk '(length > 0)' > outfile.txt

This removes the NUL lines entirely, no matter where they appear
in the input file.
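
Since the question was about sed: once tr has removed the NULs,
sed can take over the second step, because the pattern then only
has to match an empty line. A small self-check, assuming made-up
file names (sample.txt, cleaned.txt) and invented records:

```shell
# Hypothetical sample: two data records around one record of NUL bytes.
printf 'rec1\n\000\000\000\000\nrec2\n' > sample.txt

# Delete the NUL bytes with tr, then let sed delete the empty line left behind.
tr -d '\000' < sample.txt | sed '/^$/d' > cleaned.txt

# Count the NUL bytes that remain: -c complements the set, so -cd
# deletes everything except NULs and wc -c counts what is left.
tr -cd '\000' < cleaned.txt | wc -c
```

The count should be 0. As with the awk variant, lines that were
already empty in the input are deleted too.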





-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


