Date:      Sat, 08 Apr 2017 13:35:09 -0400
From:      Ernie Luzar <luzar722@gmail.com>
To:        Polytropon <freebsd@edvax.de>
Cc:        RW <rwmaillists@googlemail.com>, freebsd-questions@freebsd.org
Subject:   Re: Is there a database built into the base system
Message-ID:  <58E91F4D.90005@gmail.com>
In-Reply-To: <20170408191633.70d1f303.freebsd@edvax.de>
References:  <58E696BD.6050503@gmail.com>	<69607026-F68C-4D9D-A826-3EFE9ECE12AB@mac.com>	<58E69E59.6020108@gmail.com>	<20170406210516.c63644064eb99f7b60dbd8f4@sohara.org>	<58E6AFC0.2080404@gmail.com>	<20170407001101.GA5885@tau1.ceti.pl>	<20170407210629.GR2787@mailboy.kipshouse.net>	<58E83E19.8010709@gmail.com>	<20170408145503.69ddf649@gumby.homeunix.com>	<58E9171F.3060405@gmail.com> <20170408191633.70d1f303.freebsd@edvax.de>

Polytropon wrote:
> On Sat, 08 Apr 2017 13:00:15 -0400, Ernie Luzar wrote:
>> Here is my first try at using awk to Read every record in the input 
>> file and drop duplicates records from output file.
>>
>>
>> This what the data looks like.
>> /etc >cat /ip.org.sorted
>> 1.121.136.228;
>> 1.186.172.200;
>> 1.186.172.210;
>> 1.186.172.218;
>> 1.186.172.218;
>> 1.186.172.218;
>> 1.34.169.204;
>> 101.109.155.81;
>> 101.109.155.81;
>> 101.109.155.81;
>> 101.109.155.81;
>> 104.121.89.129;
> 
> Why not simply use "sort | uniq" to eliminate duplicates?
> 
> 
> 
>> /etc >cat /root/bin/ipf.table.awk.dup
>> #! /bin/sh
>>
>>    file_in="/ip.org.sorted"
>>    file_out="/ip.no-dups"
>>
>>    awk '{ in_ip = $1 }'
>>        END { (if in_ip = prev_ip)
>>                 next
>>               else
>>                 prev_ip > $file_out
>>                 prev_ip = in_ip
>>            } $file_in
>>
>> When I run this script it just hangs there. I have to ctrl/c to break 
>> out of it. What is wrong with my awk command?
> 
> For each line, you store the 1st field (in this case, the entire
> line) in in_ip, and you overwrite (!) that variable with each new
> line. At the end of the file (!!!) you make a comparison and even
> request the next data line. Additionally, keep an eye on the quotes
> you use: '...' will keep the $ in $file_out, that's now a variable
> inside awk which is empty. The '...' close before END, so outside
> of awk. Remember that awk reads from standard input, so your
> redirection for the input file would need to be "< $file_in",
> or useless use of cat, "cat $file_in | awk > $file_out".
> 
> In your specific case, I'd say not that awk is the wrong tool.
> If you simply want to eliminate duplicates, use the classic
> UNIX approach "sort | uniq". Both tools are part of the OS.
> 

The awk script I posted is a learning tool. I know about "sort | uniq"

I thought "END" meant end of line, not end of file. So how should that awk
command look to drop dups from the output file?





