From owner-freebsd-questions Sun Apr 7 21:56:08 1996
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id VAA01276 for questions-outgoing; Sun, 7 Apr 1996 21:56:08 -0700 (PDT)
Received: from digital.netvoyage.net (root@digital.netvoyage.net [205.162.154.10]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id VAA01266 for ; Sun, 7 Apr 1996 21:56:03 -0700 (PDT)
Received: from localhost (bkogawa@localhost) by digital.netvoyage.net (8.6.13/8.6.9) with SMTP id VAA19551; Sun, 7 Apr 1996 21:55:47 -0700
Date: Sun, 7 Apr 1996 21:55:47 -0700 (PDT)
From: "Bryan K. Ogawa"
To: Dave Andersen
cc: Jaye Mathisen, freebsd-questions@FreeBSD.ORG
Subject: IP-host lookups in NCSA common log files (was: Re: Apache still and timeouts)
In-Reply-To: <199603301043.DAA20878@shell.aros.net>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-questions@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

On Sat, 30 Mar 1996, Dave Andersen wrote:

> Lo and behold, Jaye Mathisen once said:
>
> > The local named would cache some of them I would think as well, it may be
> > better to let named worry about it...
> >
> > I'd be interested in the script if you finish it.
>
> The only downside to that is that you'll suffer a pretty hefty
> performance penalty. Yes, the odds are .. somewhat good that named will
> cache the successful hits, but you're still stuck using the networking
> interface to do lookups (read: slow as hell) instead of reading them from
> local memory (infinitely faster. :) and you lose the benefit of being
> able to 'flag' unlookupable addresses quickly and efficiently so you
> don't do multiple unsuccessful queries - the real bogdown.
>
> Just make sure you've got enough memory in the beast. Even using a
> bunch of swap would be faster than a reverse namelookup on the IP.
[...]

My tests with my original script demonstrated the above behavior (e.g.
the caching was much faster).
I was also surprised at the small amount of memory the caching script
took.

That said, I rewrote the scripts. I have included two versions below:
Somewhat obtuse, and extremely obtuse. :)

The first version is pretty straightforward, but it includes a
subroutine that does all of the caching IP to host conversions that is
designed to be short and magic. Here it is:

#!/usr/bin/perl
# does IP to hostname conversion of NCSA common log format access_log
# files
# Usage: ip2host ...
# Or, it will take input from standard in. In either case, output is to
# standard out.
# Bryan K. Ogawa

require "sys/socket.ph";

while(<>) {
    ($ip, $rest) = split(/ /,$_,2);
    print &gethost($ip), " $rest";
}

sub gethost {
    $CACHE{$_[0]} || ($CACHE{$_[0]} =
        (gethostbyaddr(pack("C4",split(/\./,$_[0])), &AF_INET))[0] || $_[0]);
}
__END__

For you non-perl people out there, you can leave out the __END__ if the
program (from the #! to the last } ) is by itself in a file.

Short explanation: The subroutine gethost uses an associative array as a
cache, filling it with the found hostname, or the original value if no
hostname is found (so, if the IP lookup fails, or the item was already a
name, the returned value should be the original value). If the host has
a name like 205.162.host.net , and there is a name associated with
205.162.0.0 , it might produce incorrect values; I didn't test that
case.

I decided I wanted to see how small I could make it, so I came up with
this:

#!/usr/bin/perl -ap
$_=$F[0];$_=join(" ",$a{$_}||($a{$_}=
(gethostbyaddr(pack(C4,split(/\./)),2))[0]||$_),@F[1..$#F])."\n";
__END__

Again, the __END__ isn't necessary. The 2nd and 3rd lines can be
concatenated together--I split it for sending via mail. In addition, it
can be invoked from the command line as:

perl -ape '$_=$F[0] ......'

Again, it can also use STDIN instead of a list of files, and in both
cases, it outputs to standard out.

This second version has another known caveat--it presumes AF_INET equals
2.
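
For anyone who wants to address the two caveats above (the untested
205.162.host.net case and the hard-coded AF_INET), here is a minimal
sketch of the same caching idea. It is not either of the scripts above.
It assumes the standard Socket module is available and takes AF_INET
from there, and it only attempts a reverse lookup when the first field
looks like a dotted quad. The names convert_line and $resolver are
hypothetical: $resolver is an optional hook added only so the cache can
be exercised without touching DNS. Unlike the scripts above, the main
loop in this sketch runs only when arguments are given; pass - to read
standard in.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

my %cache;    # first field -> hostname, or the field itself if no name found

# Return the cached or freshly looked-up name for one log field.
# $resolver (hypothetical, optional) replaces the real DNS lookup.
sub gethost {
    my ($ip, $resolver) = @_;
    return $cache{$ip} if exists $cache{$ip};

    my $name;
    # Only dotted quads are looked up; anything else is cached unchanged.
    if ($ip =~ /^\d{1,3}(?:\.\d{1,3}){3}\z/) {
        if ($resolver) {
            $name = $resolver->($ip);
        } else {
            my $packed = inet_aton($ip);
            $name = gethostbyaddr($packed, AF_INET) if defined $packed;
        }
    }
    # Failed lookups are cached too, so they are never retried.
    return $cache{$ip} = (defined $name && length $name) ? $name : $ip;
}

# Rewrite one common log format line, replacing the first field.
sub convert_line {
    my ($line, $resolver) = @_;
    chomp $line;
    my ($ip, $rest) = split / /, $line, 2;
    return "\n" unless defined $ip;    # blank line: pass through
    return join(" ", gethost($ip, $resolver), defined $rest ? $rest : ()) . "\n";
}

# Runs only when arguments are given; "-" means standard in.
if (@ARGV) {
    while (my $line = <>) {
        print convert_line($line);
    }
}
```

Using exists for the cache test keeps a failed lookup from being
repeated, which is the same effect the scripts above get by storing the
original IP as the cached value.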
I hope you find this useful.

bryan

--
Bryan K. Ogawa II Infinitum <>< On this account I speak for myself. SDG
http://www.netvoyage.net/~bkogawa/