From: "Jan L. Peterson" <jlp@part.net>
To: spork
cc: isp@FreeBSD.ORG
Subject: Re: log to st0?
Date: Fri, 17 Apr 1998 08:48:32 -0600

> We're running into problems with archiving hits from some of the larger
> sites we host. We have been toying with the idea of a "log server" to
> collect and analyze the logs. Any suggestions?

Well, I mentioned some things about how we did log processing at iMALL. Here's a little more detail:

We were running the Stronghold server, which is based on Apache. It can log to a pipe instead of a file, which we used to feed a process called "batcher". We also used the LogFormat feature to tag log lines for rooted vdomains with the account name, so we only had to maintain one log stream instead of one for each vdomain. batcher would produce batch files containing five minutes' worth of logs each, which were copied over to the log machine with ssh.
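A minimal Python sketch of a batcher-style pipe reader follows. The original programs were Perl and aren't shown, so this is not iMALL's code. It assumes the web server writes one log entry per line to the program's stdin and that batches are cut on five-minute wall-clock boundaries. The `Batcher` class, the `pump()` helper and the `batch.YYYYMMDD.HHMMSS` file naming are all hypothetical.

```python
import os
import sys
import time
from typing import IO, Callable, Iterable, Optional


class Batcher:
    """Copy piped log lines into files that each cover one fixed time window."""

    def __init__(self, out_dir: str, window: int = 300,
                 clock: Callable[[], float] = time.time) -> None:
        self.out_dir = out_dir
        self.window = window          # 300 s = five minutes per batch
        self.clock = clock            # injectable so tests can fake the time
        self._start: Optional[int] = None
        self._fh: Optional[IO[str]] = None

    def _rotate(self, start: int) -> None:
        """Close the previous batch and open (or reopen) the one for `start`."""
        self.close()
        self._start = start
        # Hypothetical naming scheme: batch.YYYYMMDD.HHMMSS (UTC window start).
        name = time.strftime("batch.%Y%m%d.%H%M%S", time.gmtime(start))
        self._fh = open(os.path.join(self.out_dir, name), "a")

    def write(self, line: str) -> None:
        now = int(self.clock())
        start = now - now % self.window   # align to the window boundary
        if start != self._start or self._fh is None:
            self._rotate(start)
        assert self._fh is not None
        self._fh.write(line)

    def close(self) -> None:
        if self._fh is not None:
            self._fh.close()
            self._fh = None


def pump(lines: Iterable[str], batcher: Batcher) -> None:
    """Feed every line from the pipe into the batcher until EOF."""
    try:
        for line in lines:
            batcher.write(line)
    finally:
        batcher.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Used as a piped-log program: batcher.py OUT_DIR (reads log lines on stdin)
    pump(sys.stdin, Batcher(sys.argv[1]))
```

With Apache's piped-log syntax this would be hooked up with something like `CustomLog "|/usr/local/sbin/batcher.py /var/log/batches" common` (paths hypothetical). A batch is only closed when a line for the next window arrives, so whatever ships batches to the log machine (for example an scp job over ssh) should pick up only files whose window has already ended.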
All logs were written without DNS resolution (resolving names made the servers too slow), and logging via a pipe meant that we never had to reload the servers just to change log files. Also, you could have multiple servers (we had four) all feeding the same log processing system.

On the log machine, a process called the "cooker" would take the raw five-minute batches and resolve the IP addresses, substituting hostnames for the IPs that could be resolved (it also kept a local cache of both resolvable and unresolvable addresses). It would also rewrite the request in any log line that came from a rooted vdomain so that it looked like it had been served by the normal web servers. All vdomains were actually subdirectories of the main server; e.g. http://www.circuscircus.com/ could also be referenced as http://www.imall.com/stores/ccmain/inc/, so all log lines from http://www.circuscircus.com/whatever were rewritten by the cooker to look like they hit http://www.imall.com/stores/ccmain/inc/whatever.

After the cooker finished with the batches, they were handed off to a third process called the "splitter". splitter would take the five-minute batches and join them together into a master log file for each day. When a day's log file had not been modified for 24 hours, splitter would compress it with gzip. splitter could also split a particular store's logs out of the master log into an independent log file (one per store per day), but that facility was turned off: our custom log processing software did not require separate log files for each customer, yet it gave customers enough information that they didn't need the raw logs themselves.

These master logs were processed nightly by a locally developed package called DAP, which would produce summary files and detailed log breakdowns for each customer. These summaries were placed in a "backroom" for each store, where the customer could pick them up.
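The cooker's two jobs (reverse DNS with a cache that also remembers failures, and rewriting rooted-vdomain requests into their main-server path) might look roughly like the Python sketch below. It rests on stated assumptions: the raw line is taken to be the account tag followed by a Common Log Format line, with `-` for the main site, and the `prefixes` mapping (for instance `ccmain` to `/stores/ccmain/inc`) is hypothetical, since the message doesn't give the actual LogFormat string. `DnsCache`, `reverse_lookup` and `cook_line` are illustrative names.

```python
import socket
from typing import Callable, Dict, Optional

Resolver = Callable[[str], Optional[str]]


def reverse_lookup(ip: str) -> Optional[str]:
    """Reverse-resolve one address; None when it cannot be resolved."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:          # socket.herror and socket.gaierror are OSErrors
        return None


class DnsCache:
    """Remember every lookup, including failures, so each IP is resolved once."""

    def __init__(self, resolver: Resolver = reverse_lookup) -> None:
        self._resolver = resolver
        self._cache: Dict[str, Optional[str]] = {}

    def name_for(self, ip: str) -> str:
        if ip not in self._cache:
            self._cache[ip] = self._resolver(ip)
        name = self._cache[ip]
        return name if name is not None else ip   # keep the IP if unresolvable


def cook_line(line: str, dns: DnsCache, prefixes: Dict[str, str]) -> str:
    """Turn one tagged raw line into a plain Common Log Format line.

    Assumed input (hypothetical tagging): '<account> <ip> <rest of CLF>',
    where <account> is '-' for hits on the main site.
    """
    fields = line.split(" ", 2)
    if len(fields) < 3:
        return line                              # not in the expected shape
    account, ip, rest = fields
    prefix = prefixes.get(account)
    if prefix is not None:
        # rest looks like: ident user [date] "METHOD /path PROTO" status bytes
        before, quote, after = rest.partition('"')
        parts = after.split(" ", 2)
        if quote and len(parts) >= 2:
            parts[1] = prefix + parts[1]         # /whatever -> <prefix>/whatever
            rest = before + quote + " ".join(parts)
    return f"{dns.name_for(ip)} {rest}"
```

This cache only lives for one run; storing it on disk between batches (for example with Python's `shelve` module) would let it carry over from one five-minute batch to the next.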
(They included not only web server log information, but also how many times the store had shown up in a search and how many of those appearances resulted in a visit to the store. Also logged were the number and dollar amounts of all sales through that store, and summary figures such as dollars per visit.) DAP also mailed a summary to the store owner every two weeks.

The finished master logs were left in a directory, and a cron job would come along and move any that were more than 45 days old to a separate archive directory. Another cron job watched this archive directory, and when it was getting close to about 650MB, it would mail a request to our operations queue asking that the archive be burned to CD. This was done manually by a staff member (usually about once every four to six weeks). This way, we had a permanent record of all web server accesses, broken down by day, in case some auditor needed to see or sample them.

Archiving them to tape would have been similar, but we would have lost the random-access quality of the CDs. Say you wanted the logs from 8 September 1997: with tape, you'd probably spend a lot of time reading through it to find the log file you want... with the CD, you just pop it into a drive, mount it, and copy off the log file you want.

Oh yeah, we did run nightly backups of the log processing machine, so in the worst case we could lose one day's worth of logs. It wouldn't have been too difficult to set up a staging area on another machine to remove this risk, but we decided it wasn't worth the cost. Another option would have been to run two independent log processing machines and copy the batches to both of them, but again, in our opinion the cost outweighed the risk.

Our hardware investment for this log processing was a Pentium 133-based system running FreeBSD with 128MB of RAM and a BusLogic fast/wide SCSI controller driving a 4GB disk.
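The file housekeeping described above (splitter gzipping a day's log after 24 idle hours, the cron job moving logs older than 45 days, and the check for an archive nearing one CD's worth) comes down to mtime and size checks. A hedged Python sketch follows. The function names, the assumption that "finished" means already gzipped, and the 600MB trigger standing in for "getting close to 650MB" are all hypothetical, and mailing the operations queue is left to whatever cron wrapper calls `archive_needs_burn()`.

```python
import gzip
import os
import shutil
import time
from typing import List, Optional

DAY = 24 * 3600


def gzip_idle_logs(log_dir: str, now: Optional[float] = None,
                   idle: float = DAY) -> List[str]:
    """splitter-style step: gzip daily logs untouched for `idle` seconds."""
    now = time.time() if now is None else now
    done = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name.endswith(".gz") or not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) >= idle:
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            # Keep the original mtime so later age checks still see the log's age.
            shutil.copystat(path, path + ".gz")
            os.remove(path)
            done.append(name)
    return done


def archive_old_logs(log_dir: str, archive_dir: str,
                     now: Optional[float] = None,
                     max_age: float = 45 * DAY) -> List[str]:
    """Cron step: move finished logs (assumed: the gzipped ones) past max_age."""
    now = time.time() if now is None else now
    moved = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name.endswith(".gz") and now - os.path.getmtime(path) > max_age:
            shutil.move(path, os.path.join(archive_dir, name))
            moved.append(name)
    return moved


def archive_needs_burn(archive_dir: str,
                       threshold: int = 600 * 1024 * 1024) -> bool:
    """Cron step: True once the archive nears a 650MB CD (hypothetical margin)."""
    total = 0
    for name in os.listdir(archive_dir):
        path = os.path.join(archive_dir, name)
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total >= threshold
```

Preserving the mtime when compressing matters here: without `shutil.copystat`, the 45-day clock would restart at compression time rather than at the day the log was last written.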
All of the log processing programs described above were written in Perl. The CDs were burned on a second Pentium 133 running (gasp) Linux; we were never able to get cdrecord to work under FreeBSD with our weird-o CD-R drive, but Linux drove it just fine.

If you're interested in setting up a similar system, that should be enough detail to get you going. If not, I'm available for consulting at $100/hour plus expenses. :-)

-jan-
-- 
Jan L. Peterson          PartNET                     tel. +1 801 581 1118
Senior Systems Admin     423 Wakara Way, Suite 216   fax  +1 801 581 1785
jlp@part.net             Salt Lake City, UT 84108    http://www.part.net/