Date: Sun, 7 Apr 1996 21:55:47 -0700 (PDT)
From: "Bryan K. Ogawa" <bkogawa@netvoyage.net>
To: Dave Andersen <angio@shell.aros.net>
Cc: Jaye Mathisen <mrcpu@cdsnet.net>, freebsd-questions@FreeBSD.ORG
Subject: IP-host lookups in NCSA common log files (was: Re: Apache still and timeouts)
Message-ID: <Pine.NEB.3.92.960407213404.9801A-100000@digital.netvoyage.net>
In-Reply-To: <199603301043.DAA20878@shell.aros.net>
On Sat, 30 Mar 1996, Dave Andersen wrote:
> Lo and behold, Jaye Mathisen once said:
>
> > The local named would cache some of them I would think as well, it may be
> > better to let named worry about it...
> >
> > I'd be interested in the script if you finish it.
>
> The only downside to that is that you'll suffer a pretty hefty
> performance penalty. Yes, the odds are .. somewhat good that named will
> cache the successful hits, but you're still stuck using the networking
> interface to do lookups (read: slow as hell) instead of reading them from
> local memory (infinitely faster. :) and you lose the benefit of being
> able to 'flag' unlookupable addresses quickly and efficiently so you
> don't do multiple unsuccessful queries - the real bogdown.
>
> Just make sure you've got enough memory in the beast. Even using a
> bunch of swap would be faster than a reverse namelookup on the IP.
[...]
My tests with my original script demonstrated the above behavior (i.e.,
caching was much faster). I was also surprised at how little memory the
caching script took.
That said, I rewrote the scripts. I have included two versions below:
somewhat obtuse, and extremely obtuse. :) The first version is pretty
straightforward, but its subroutine, which does all of the cached
IP-to-host conversions, is designed to be short and magic.
Here it is:
#!/usr/bin/perl
# Does IP-to-hostname conversion of NCSA common log format access_log
# files.
# Usage: ip2host <filename> <filename> ...
# Or, it will take input from standard in. In either case, output is to
# standard out.
# Bryan K. Ogawa <bkogawa@netvoyage.net>
require "sys/socket.ph";    # defines &AF_INET

while (<>) {
    # The client address is the first space-delimited field of each line.
    ($ip, $rest) = split(/ /, $_, 2);
    print &gethost($ip), " $rest";
}

# Return the cached name for an address, looking it up on first use.
# A failed lookup caches the address itself, so it is never retried.
sub gethost {
    $CACHE{$_[0]} || ($CACHE{$_[0]}
        = (gethostbyaddr(pack("C4", split(/\./, $_[0])), &AF_INET))[0] || $_[0]);
}
__END__
For you non-perl people out there, you can leave out the __END__ if the
program (from the #! to the last } ) is by itself in a file.
Short explanation: The subroutine gethost uses an associative array as a
cache, filling it with the found hostname, or the original value if no
hostname is found (so, if the IP lookup fails, or the item was already a
name, the returned value should be the original value).
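Spelled out step by step, the caching logic above might look like the
sketch below. It is an illustration rather than the script from this
message: make_cached_lookup, resolve_ip, and the stub resolver with its
example addresses are names and data assumed here so the cache can be
exercised without touching DNS, and it needs Perl 5.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;    # exports AF_INET

# Reverse-resolve a dotted-quad string; returns undef when no name is found.
# (Hypothetical helper; assumes the argument really is an IPv4 address.)
sub resolve_ip {
    my ($ip) = @_;
    return scalar gethostbyaddr(pack("C4", split(/\./, $ip)), AF_INET);
}

# Wrap any resolver in a cache. Hypothetical name, not from the original.
sub make_cached_lookup {
    my ($resolver) = @_;
    my %cache;
    return sub {
        my ($ip) = @_;
        unless (exists $cache{$ip}) {
            my $name = $resolver->($ip);
            # A failed lookup stores the address itself, so an unresolvable
            # address costs one query rather than one per log line.
            $cache{$ip} = (defined $name && length $name) ? $name : $ip;
        }
        return $cache{$ip};
    };
}

# Demonstration with a stub resolver (made-up data) that counts its calls.
my $calls    = 0;
my %fake_dns = ('192.0.2.1' => 'www.example.com');
my $gethost  = make_cached_lookup(sub { $calls++; return $fake_dns{ $_[0] } });

my @seen = map { $gethost->($_) } qw(192.0.2.1 192.0.2.9 192.0.2.1 192.0.2.9);
print "@seen\n";                   # names where known, addresses otherwise
print "resolver calls: $calls\n";  # one per distinct address
```

The four lookups reach the stub resolver only twice, once per distinct
address, and the unresolvable one is answered from the cache the second
time. Passing \&resolve_ip to make_cached_lookup instead of the stub would
use real reverse lookups.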
If the host has a name like 205.162.host.net , and there is a name
associated with 205.162.0.0 , it might produce incorrect values; I didn't
test that case.
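For what it is worth, pack("C4", ...) converts non-numeric strings to 0, so
a name like 205.162.host.net is indeed packed as 205.162.0.0. One way to
sidestep that is to look up only fields that are strict dotted quads. A
minimal sketch, assuming Perl 5; is_dotted_quad is a hypothetical helper
name, not part of the script above:

```perl
use strict;
use warnings;

# True only for four dot-separated decimal octets in the range 0..255.
sub is_dotted_quad {
    my ($field) = @_;
    my @octets = $field =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

# What the unguarded pack() does with a name that starts with digits:
my $packed_as = do {
    no warnings 'numeric';    # "host" and "net" are deliberately non-numeric
    join ".", unpack("C4", pack("C4", split(/\./, "205.162.host.net")));
};
print "205.162.host.net packs as $packed_as\n";

# Decide per field whether a reverse lookup makes sense.
for my $field ("205.162.0.1", "205.162.host.net", "256.1.1.1") {
    printf "%-18s %s\n", $field,
        is_dotted_quad($field) ? "look up" : "pass through";
}
```

In gethost, such a check could return the field unchanged before the
lookup is attempted.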
I decided I wanted to see how small I could make it, so I came up with
this:
#!/usr/bin/perl -ap
$_=$F[0];$_=join(" ",$a{$_}||($a{$_}=
(gethostbyaddr(pack(C4,split(/\./)),2))[0]||$_),@F[1..$#F])."\n";
__END__
Again, the __END__ isn't necessary. The 2nd and 3rd lines can be
concatenated together--I split it for sending via mail. In addition, it
can be invoked from the command line as:
perl -ape '$_=$F[0] ...<insert rest of 2nd/3rd lines here>...' <filenames>
Again, it can also use STDIN instead of a list of files, and in both
cases, it outputs to standard out.
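For readers unfamiliar with the switches: -p wraps the code in a
read-and-print loop over the input, and -a splits each line on whitespace
into @F. Written out longhand, the one-liner behaves roughly like the
sketch below (Perl 5; rewrite_line and the stub lookup with its example
data are hypothetical, added so the sketch runs without DNS).

```perl
use strict;
use warnings;

# Rewrite one log line: replace the first field via $lookup, keep the rest.
sub rewrite_line {
    my ($line, $lookup) = @_;
    my @F = split ' ', $line;        # what -a does for each input line
    return $line unless @F;          # leave blank lines alone
    return join(" ", $lookup->($F[0]), @F[1 .. $#F]) . "\n";
}

# Stub lookup with made-up data; the real one-liner calls gethostbyaddr.
my %names  = ('192.0.2.1' => 'www.example.com');
my $lookup = sub { $names{ $_[0] } || $_[0] };

# -p would read lines from the named files or standard in and print each one.
my @log = (
    qq{192.0.2.1 - - [07/Apr/1996:21:55:47 -0700] "GET / HTTP/1.0" 200 1234\n},
    qq{192.0.2.9 - -   [07/Apr/1996:21:55:48 -0700] "GET /a HTTP/1.0" 404 0\n},
);
my @out = map { rewrite_line($_, $lookup) } @log;
print @out;
```

Because the fields are rejoined with single spaces, runs of whitespace in
a line collapse to one space, which is also true of the one-liner.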
This second version has another known caveat--it presumes AF_INET equals
2.
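That assumption is easy to check on a given system. A minimal sketch,
assuming Perl 5 with its standard Socket module:

```perl
use strict;
use warnings;
use Socket;    # exports the AF_INET constant for this platform

my $af_inet = AF_INET;
print "AF_INET is $af_inet\n";
print "the hard-coded 2 in the short version is ",
      ($af_inet == 2 ? "fine here" : "wrong here"), "\n";
```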
I hope you find this useful.
bryan
--
Bryan K. Ogawa II Infinitum <>< On this account I speak for myself.
<bkogawa@netvoyage.net> SDG http://www.netvoyage.net/~bkogawa/
