Date: Sat, 26 Jul 1997 12:12:27 -0600 (MDT)
From: Marc Slemko <marcs@znep.com>
To: Robert Shady <rls@mail.id.net>
Cc: freebsd-isp@FreeBSD.ORG
Subject: Re: analog and Apache?
Message-ID: <Pine.BSF.3.95.970726115632.19606H-100000@alive.znep.com>
In-Reply-To: <199707261028.GAA21284@server.id.net>
On Sat, 26 Jul 1997, Robert Shady wrote:

> Hmmm.. We log DNS lookups, perhaps you guys are just "lucky" in a sense, but
> these are a couple of our logfiles (we have several like this), and these are
> about 200-300MB smaller than "normal" since they haven't been promoting their
> website for a while now.
>
> -rw-r--r-- 1 nobody nogroup 540540984 Jul 26 06:13 access_log
> -rw-r--r-- 1 nobody nogroup 584473315 Jul 26 06:14 access_log
>
> And let me tell you... Going through more than
>
> # wc -l access_log
> 2592183 access_log
>
> requests and processing DNS lookups AFTER the fact is a complete pain in the
> rump and DOES have an impact on DNS machines, the network, etc. We haven't
> really noticed too many problems with real-time DNS lookups. What *SHOULD*
> be done (perhaps it already is) is that the logging trail behind the incoming
> requests and the DNS lookups be done at the webservers convience, but as close
> to real-time as possible so as not to slow down any interactive connections..

It would be possible to implement something like that in the current
version of Apache by logging to a pipe and having it do lookups and buffer
things, but logging to programs isn't entirely reliable yet. Deferred
logging has the problem that it ruins any idea of having reliable logs for
each "transaction", since a crash could wipe out unwritten logs.

If you are having trouble doing DNS lookups on a large logfile, then
simply do it more frequently on small logfiles. A script to rotate the
logfile out, do DNS lookups and append it to the old one isn't that hard.

There _are_ some potential negative performance implications of doing
lookups in Apache. The biggest comes from sites without reverse DNS,
especially if you are using older versions of BIND that don't do negative
caching, but even if your version does do negative caching there is still
quite an impact.
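
For illustration only, the rotate-then-resolve script described above could
be sketched along these lines in Python. The function names, the ".work"
suffix, and the assumption that the client address is the first
space-separated field on each line are all hypothetical. How the server is
told to reopen its log depends on the installation and is left as a comment.

```python
import os
import socket


def reverse_lookup(addr):
    """Return the PTR name for addr, or addr itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:
        return addr


def resolve_lines(lines, lookup=reverse_lookup):
    """Replace the leading address on each log line with its hostname."""
    cache = {}  # each address is looked up once per slice, failures included
    for line in lines:
        addr, sep, rest = line.partition(" ")
        if addr not in cache:
            cache[addr] = lookup(addr)
        yield cache[addr] + sep + rest


def rotate_and_resolve(log_path, archive_path, lookup=reverse_lookup):
    """Move the live log aside, resolve it, and append it to the archive."""
    work_path = log_path + ".work"  # hypothetical name for the rotated slice
    os.rename(log_path, work_path)
    # Assumption: the web server is told to reopen its log file at this
    # point, before the slice is read. That step is site-specific.
    with open(work_path) as src, open(archive_path, "a") as dst:
        dst.writelines(resolve_lines(src, lookup))
    os.remove(work_path)
```

Run often enough, each slice stays small, and the per-slice cache means a
host without reverse DNS costs one failed lookup per run rather than one per
request.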
Also note that, using the default settings in Apache, if you have it do
reverse lookups then the results you get are easily faked. Unless you
compile with MAXIMUM_DNS defined, Apache will only do a reverse lookup, not
a reverse lookup then a forward lookup on the result to ensure it is valid.
That means that anyone who can control their reverse DNS can appear to be
from wherever they want.

I don't find doing resolution after the fact to be that bad. Any high
volume sites we rotate every day or two, so we only have around a hundred
meg slice of logfile to deal with at one shot, which works fine. OTOH, if
you have smaller machines or smaller network connectivity it can cause
problems.

If you have a program that can handle multiple pending queries at once (not
nice to do with current resolvers), you could speed things up a good bit
more. Hmm, perhaps I should try that. Fork a bunch of child processes and
have a parent process send names to be resolved to them.
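
As a rough sketch of both ideas above, the Python below checks each reverse
lookup with a forward lookup and keeps many queries pending at once. It is
not how Apache does it. It uses a thread pool in place of the forked child
processes suggested above, handles IPv4 only, and the names verified_name
and resolve_many and the default of 20 workers are hypothetical.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def verified_name(addr, reverse=socket.gethostbyaddr,
                  forward=socket.gethostbyname_ex):
    """Reverse-resolve addr; keep the name only if it maps back to addr."""
    try:
        name = reverse(addr)[0]       # PTR lookup
        addresses = forward(name)[2]  # IPv4 addresses listed for that name
    except OSError:
        return addr                   # no usable DNS data: keep the address
    return name if addr in addresses else addr


def resolve_many(addrs, workers=20, resolve=verified_name):
    """Resolve unique addresses concurrently; returns {address: name}."""
    unique = list(dict.fromkeys(addrs))  # drop duplicates, keep order
    # Threads stand in for the forked children: each blocking lookup ties up
    # one worker while the rest keep going.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(unique, pool.map(resolve, unique)))
```

An address whose claimed name does not resolve back to it is left as a bare
address, so a faked reverse zone cannot put an arbitrary hostname into the
processed log.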