Date:      Sat, 26 Jul 1997 12:12:27 -0600 (MDT)
From:      Marc Slemko <marcs@znep.com>
To:        Robert Shady <rls@mail.id.net>
Cc:        freebsd-isp@FreeBSD.ORG
Subject:   Re: analog and Apache?
Message-ID:  <Pine.BSF.3.95.970726115632.19606H-100000@alive.znep.com>
In-Reply-To: <199707261028.GAA21284@server.id.net>

On Sat, 26 Jul 1997, Robert Shady wrote:

> Hmmm.. We log DNS lookups, perhaps you guys are just "lucky" in a sense, but
> these are a couple of our logfiles (we have several like this), and these are
> about 200-300MB smaller than "normal" since they haven't been promoting their
> website for a while now.
> 
> -rw-r--r--  1 nobody nogroup 540540984 Jul 26 06:13 access_log
> -rw-r--r--  1 nobody nogroup 584473315 Jul 26 06:14 access_log
> 
> And let me tell you... Going through more than
> 
> # wc -l access_log 
>  2592183 access_log
> 
> requests and processing DNS lookups AFTER the fact is a complete pain in the
> rump and DOES have an impact on DNS machines, the network, etc.  We haven't
> really noticed too many problems with real-time DNS lookups.  What *SHOULD*
> be done (perhaps it already is) is that the logging trail behind the incoming
> requests and the DNS lookups be done at the webserver's convenience, but as close
> to real-time as possible so as not to slow down any interactive connections.

It would be possible to implement something like that in the current
version of Apache by logging to a pipe and having the program on the other
end do the lookups and buffer the output, but logging to programs isn't
entirely reliable yet.
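A minimal sketch of the piped-log approach, assuming Apache is pointed at a filter program through a piped log (something like `TransferLog "|/usr/local/sbin/resolve-log /var/log/httpd/access_log.resolved"`; the program name and paths are hypothetical). The filter reads Common Log Format lines on standard input, replaces the leading client address with a hostname, and appends the result to the file named on its command line. Lookups still happen one at a time inside the filter, so a slow DNS server delays the filter rather than the server's children, at least until the pipe fills up.

```python
#!/usr/bin/env python3
# Hypothetical piped-log filter: Apache writes log lines to our stdin; we
# replace the client address with a hostname and append to a file.
import io
import socket
import sys


def make_resolver(lookup=socket.gethostbyaddr):
    """Return a function mapping an IP address to a hostname, with a cache."""
    cache = {}

    def resolve(addr):
        if addr not in cache:
            try:
                cache[addr] = lookup(addr)[0]
            except OSError:  # socket.herror / socket.gaierror: no usable PTR record
                cache[addr] = addr  # keep the numeric address
        return cache[addr]

    return resolve


def rewrite_line(line, resolve):
    """Replace the first field (the client address) of a log line."""
    addr, sep, rest = line.partition(" ")
    return resolve(addr) + sep + rest


def main(out_path):
    resolve = make_resolver()
    # latin-1 round-trips arbitrary bytes in request lines unchanged.
    lines = io.TextIOWrapper(sys.stdin.buffer, encoding="latin-1")
    # Line buffering limits how many resolved entries a crash can lose.
    with open(out_path, "a", buffering=1, encoding="latin-1") as out:
        for line in lines:
            out.write(rewrite_line(line, resolve))


if __name__ == "__main__" and len(sys.argv) == 2:  # one argument: the output file
    main(sys.argv[1])
```

The cache means each distinct address is looked up only once per run of the filter, which matters because the same client usually shows up on many consecutive lines.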

Deferred logging has the problem that it ruins any idea of having reliable
logs for each "transaction", since a crash could wipe out unwritten logs.

If you are having trouble doing DNS lookups on a large logfile, then
simply do them more often, on smaller logfiles.  A script to rotate the
logfile out, do DNS lookups on it and append the result to the old one
isn't that hard.
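The rotate-and-resolve job can be sketched as follows. It assumes the live log is renamed aside, Apache is then made to reopen its log files (passed in here as a caller-supplied `reopen_logs` function, for instance one that restarts the server; how you do that is left open), and the renamed batch is resolved and appended to a long-lived archive. The `.batch` suffix and both function names are hypothetical. Run from cron every hour or so, each batch stays small.

```python
import os
import socket


def resolve_batch(batch_path, archive_path, lookup=socket.gethostbyaddr):
    """Resolve the leading IP of each line in batch_path, append the result
    to archive_path, then delete the batch. Returns the number of lines."""
    cache = {}  # each address is looked up at most once per batch
    count = 0
    # latin-1 round-trips arbitrary bytes in request lines unchanged.
    with open(batch_path, encoding="latin-1") as src, \
         open(archive_path, "a", encoding="latin-1") as dst:
        for line in src:
            addr, sep, rest = line.partition(" ")
            if addr not in cache:
                try:
                    cache[addr] = lookup(addr)[0]
                except OSError:  # no PTR record, or lookup failed
                    cache[addr] = addr  # keep the numeric address
            dst.write(cache[addr] + sep + rest)
            count += 1
    os.remove(batch_path)  # only after the archive was written and closed
    return count


def rotate_and_resolve(live_path, archive_path, reopen_logs,
                       lookup=socket.gethostbyaddr):
    batch = live_path + ".batch"  # hypothetical naming convention
    os.rename(live_path, batch)   # Apache keeps writing here via its open fd...
    reopen_logs()                 # ...until it reopens (recreates) live_path
    return resolve_batch(batch, archive_path, lookup)
```

Because the batch is deleted only after the archive has been written and closed, a crash mid-run leaves the batch in place to be processed again, at the cost of possibly duplicating lines that were already appended.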

There _are_ some potential negative performance implications of doing
lookups in Apache itself.  The biggest comes from clients without reverse
DNS, whose lookups can take a long time to fail, especially if you are
using older versions of BIND that don't do negative caching (remembering
failed lookups); even if your version does do negative caching there is
still quite an impact.
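To show why negative caching matters, here is a minimal sketch of an application-level reverse-lookup cache that also remembers failures for `neg_ttl` seconds, so an address with no reverse DNS costs one slow lookup per TTL instead of one per request. The class name and the one-hour default are assumptions for illustration; a real resolver such as BIND generally takes its cache lifetimes from the DNS data rather than from a fixed constant.

```python
import socket
import time


class NegativeCachingResolver:
    """Reverse-lookup cache that also caches failures for neg_ttl seconds."""

    def __init__(self, lookup=socket.gethostbyaddr, neg_ttl=3600.0,
                 clock=time.monotonic):
        self._lookup = lookup
        self._neg_ttl = neg_ttl
        self._clock = clock
        self._hits = {}    # addr -> hostname (kept for the life of the object)
        self._misses = {}  # addr -> clock value when the lookup last failed

    def resolve(self, addr):
        """Return the hostname for addr, or None if it has no reverse DNS."""
        if addr in self._hits:
            return self._hits[addr]
        failed_at = self._misses.get(addr)
        if failed_at is not None and self._clock() - failed_at < self._neg_ttl:
            return None  # recent known failure: skip the slow lookup
        try:
            name = self._lookup(addr)[0]
        except OSError:  # socket.herror / socket.gaierror
            self._misses[addr] = self._clock()
            return None
        self._misses.pop(addr, None)
        self._hits[addr] = name
        return name
```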

Also note that, using the default settings in Apache, if you have it do
reverse lookups then the results you get are easily faked.  Unless you
compile with MAXIMUM_DNS defined, Apache will only do a reverse lookup,
not a reverse lookup then a forward lookup on the result to ensure it is
valid.  That means that anyone who can control their reverse DNS can
appear to be from wherever they want.
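The reverse-then-forward check can be sketched like this: accept the name from the reverse lookup only if a forward lookup of that name lists the original address. This illustrates the idea only; it is not Apache's MAXIMUM_DNS code, and it uses `socket.gethostbyname_ex`, which handles IPv4 only.

```python
import socket


def double_reverse(addr, reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Return the hostname for addr only if it maps back to addr, else None."""
    try:
        name = reverse(addr)[0]           # PTR lookup: controlled by the client's ISP
        _, _, addresses = forward(name)   # A lookup: controlled by the name's owner
    except OSError:  # socket.herror / socket.gaierror
        return None
    return name if addr in addresses else None
```

Someone who controls only their own reverse zone can make the PTR record say anything, but they cannot make another domain's forward records point back at their address, so the second lookup catches the fake.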

I don't find doing resolution after the fact to be that bad.  We rotate
the logs of any high-volume site every day or two, so we only have around
a hundred meg of logfile to deal with at one shot, which works fine.  OTOH,
if you have smaller machines or slower network connectivity it can cause
problems.

If you have a program that can handle multiple pending queries at once
(not nice to do with current resolvers), you could speed things up a good
bit more.  Hmm, perhaps I should try that.  Fork a bunch of child
processes and have a parent process send names to be resolved to them.
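The idea above uses forked child processes fed by a parent; this sketch swaps in a thread pool from `concurrent.futures` instead, which gives the same effect of many queries in flight because CPython releases the interpreter lock during blocking `socket.gethostbyaddr` calls. Duplicate addresses are collapsed first so each one is queried once. `resolve_many` and the default of 32 workers are hypothetical choices.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def lookup_one(addr, lookup=socket.gethostbyaddr):
    """Reverse-resolve one address, falling back to the address itself."""
    try:
        return lookup(addr)[0]
    except OSError:  # socket.herror / socket.gaierror
        return addr


def resolve_many(addrs, workers=32, lookup=socket.gethostbyaddr):
    """Resolve each distinct address once, with up to `workers` lookups in flight."""
    unique = list(dict.fromkeys(addrs))  # dedupe, keeping first-seen order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names = pool.map(lambda a: lookup_one(a, lookup), unique)
        return dict(zip(unique, names))
```

With a process pool such as `multiprocessing.Pool` the structure stays the same, closer to the fork-based design; the lookup function then has to be defined at module level so it can be sent to the children.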



