Date: Sun, 7 Apr 1996 21:55:47 -0700 (PDT)
From: "Bryan K. Ogawa" <bkogawa@netvoyage.net>
To: Dave Andersen <angio@shell.aros.net>
Cc: Jaye Mathisen <mrcpu@cdsnet.net>, freebsd-questions@FreeBSD.ORG
Subject: IP-host lookups in NCSA common log files (was: Re: Apache still and timeouts)
Message-ID: <Pine.NEB.3.92.960407213404.9801A-100000@digital.netvoyage.net>
In-Reply-To: <199603301043.DAA20878@shell.aros.net>

On Sat, 30 Mar 1996, Dave Andersen wrote:

> Lo and behold, Jaye Mathisen once said:
>
> > The local named would cache some of them I would think as well, it may be
> > better to let named worry about it...
> >
> > I'd be interested in the script if you finish it.
>
>    The only downside to that is that you'll suffer a pretty hefty
> performance penalty.  Yes, the odds are .. somewhat good that named will
> cache the successful hits, but you're still stuck using the networking
> interface to do lookups (read: slow as hell) instead of reading them from
> local memory (infinitely faster. :) and you lose the benefit of being
> able to 'flag' unlookupable addresses quickly and efficiently so you
> don't do multiple unsuccessful queries - the real bogdown.
>
>    Just make sure you've got enough memory in the beast.  Even using a
> bunch of swap would be faster than a reverse namelookup on the IP.

[...]

My tests with my original script demonstrated the above behavior (i.e. the
caching was much faster).  I was also surprised at how little memory the
caching script took.

That said, I rewrote the scripts.  I have included two versions below:
somewhat obtuse, and extremely obtuse. :)

The first version is pretty straightforward, but it includes a subroutine,
designed to be short and magic, that does all of the caching IP-to-host
conversion.  Here it is:

#!/usr/bin/perl
# does IP to hostname conversion of NCSA common log format access_log
# files
# Usage: ip2host <filename> <filename> ...
# Or, it will take input from standard in.  In either case, output is to
# standard out.
# Bryan K. Ogawa <bkogawa@netvoyage.net>

require "sys/socket.ph";

while(<>) {
    ($ip, $rest) = split(/ /,$_,2);
    print &gethost($ip), " $rest";
}

sub gethost {
    $CACHE{$_[0]} || ($CACHE{$_[0]} =
        (gethostbyaddr(pack("C4",split(/\./,$_[0])), &AF_INET))[0] || $_[0]);
}
__END__

For you non-perl people out there, you can leave out the __END__ if the
program (from the #! to the last } ) is by itself in a file.
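
For readers on a current Perl, here is a minimal sketch of the same caching
idea under "use strict".  It is not the script from this message: the names
reverse_dns, cached_host and rewrite_line are made up for illustration, and
it takes AF_INET from the Socket module instead of sys/socket.ph (which only
exists if h2ph has been run on the system).  The resolver is passed in as an
optional code reference, purely so the cache can be exercised without
touching the network.

```perl
use strict;
use warnings;
use Socket qw(AF_INET);

my %cache;    # IP string => hostname, or the IP itself if the lookup failed

# Reverse-resolve one dotted-quad string through the system resolver.
# In scalar context gethostbyaddr returns the name, or undef on failure.
sub reverse_dns {
    my ($ip) = @_;
    return scalar gethostbyaddr(pack("C4", split(/\./, $ip)), AF_INET);
}

# Look up each address at most once; failures are cached as the address
# itself, so an unresolvable IP is not queried again.
sub cached_host {
    my ($ip, $resolver) = @_;
    $resolver ||= \&reverse_dns;
    $cache{$ip} = $resolver->($ip) || $ip unless exists $cache{$ip};
    return $cache{$ip};
}

# Replace the first space-separated field of a log line with its hostname.
sub rewrite_line {
    my ($line, $resolver) = @_;
    my ($ip, $rest) = split / /, $line, 2;
    return $line unless defined $rest;    # no space on the line: leave it alone
    return cached_host($ip, $resolver) . " $rest";
}
```

To use it as a filter in the same way as the script above, a main loop
along the lines of "print rewrite_line($_) while <>;" should be enough.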

Short explanation:

The subroutine gethost uses an associative array as a cache, filling it
with the hostname that was found, or with the original value if no
hostname was found (so, if the IP lookup fails, or the item was already a
name, the returned value should be the original value).  If the host has a
name like 205.162.host.net, and there is a name associated with
205.162.0.0, it might produce incorrect values, since the non-numeric
parts of the name pack as 0; I didn't test that case.

I decided I wanted to see how small I could make it, so I came up with
this:

#!/usr/bin/perl -ap
$_=$F[0];$_=join(" ",$a{$_}||($a{$_}=
(gethostbyaddr(pack(C4,split(/\./)),2))[0]||$_),@F[1..$#F])."\n";
__END__

Again, the __END__ isn't necessary.  The 2nd and 3rd lines can be
concatenated together--I split them for sending via mail.

In addition, it can be invoked from the command line as:

perl -ape '$_=$F[0] ...<insert rest of 2nd/3rd lines here>...' <filenames>

Again, it can also use STDIN instead of a list of files, and in both cases
it outputs to standard out.

This second version has another known caveat--it presumes AF_INET equals 2.

I hope you find this useful.

bryan
--
Bryan K. Ogawa  II Infinitum <><  On this account I speak for myself.
<bkogawa@netvoyage.net>  SDG  http://www.netvoyage.net/~bkogawa/
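
On the 205.162.host.net caveat: a small sketch of why it can go wrong, and
one possible guard.  Both helper names (packed_as, is_dotted_quad) are
hypothetical and not part of either script.  packed_as shows what
pack("C4", split(/\./, ...)) actually encodes for a given string, and
is_dotted_quad is a check that could be applied before attempting a reverse
lookup, so that anything that is not four decimal fields in 0..255 is passed
through untouched.

```perl
use strict;
use warnings;

# Show which address the pack("C4", split(...)) idiom encodes for a string.
# Non-numeric fields count as 0, so a hostname can alias a real address.
sub packed_as {
    my ($s) = @_;
    no warnings 'numeric';    # silence "isn't numeric" for the demonstration
    return join ".", unpack("C4", pack("C4", split(/\./, $s)));
}

# True only for four dot-separated decimal fields, each in 0..255.
sub is_dotted_quad {
    my ($s) = @_;
    my @octets = $s =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    return (grep { $_ > 255 } @octets) ? 0 : 1;
}

print packed_as("205.162.host.net"), "\n";                # 205.162.0.0
print is_dotted_quad("205.162.host.net") ? "ip" : "not an ip", "\n";
print is_dotted_quad("205.162.0.0")      ? "ip" : "not an ip", "\n";
```

As for the AF_INET caveat in the short version, the Socket module exports
AF_INET, so loading it (for example with -MSocket on the command line) and
writing AF_INET in place of the literal 2 should avoid hard-coding the
value.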