From owner-freebsd-www Mon Mar 31 16:56:13 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id QAA25041 for www-outgoing; Mon, 31 Mar 1997 16:56:13 -0800 (PST)
Received: from time.cdrom.com (root@time.cdrom.com [204.216.27.226]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id QAA25028; Mon, 31 Mar 1997 16:56:08 -0800 (PST)
Received: from time.cdrom.com (jkh@localhost [127.0.0.1]) by time.cdrom.com (8.8.5/8.6.9) with ESMTP id QAA13143; Mon, 31 Mar 1997 16:56:08 -0800 (PST)
To: hubs@freebsd.org
cc: www@freebsd.org
Subject: Unified download stats [and better presentation of same]
Date: Mon, 31 Mar 1997 16:56:07 -0800
Message-ID: <13139.859856167@time.cdrom.com>
From: "Jordan K. Hubbard"
Sender: owner-freebsd-www@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

I've been getting more and more emails lately from folks asking me why we're not doing more to call attention to all of FreeBSD's cool features, its growing popularity in the marketplace and so on. I sometimes wonder this too, and occasionally I even wander over to Red Hat's site, look at all the (paid, full-time) energy that they've managed to pour into it, and get a little depressed about it. So rather than cry into my milk, I decided to start doing at least *something* to make better use of our statistical information in marketing ourselves. The first step (IMHO) is to get a better feel for how many copies of FreeBSD are being downloaded each day, with the eventual intention of putting this information up on a web page.

As a very rough, first-order attempt at this, for the last week or so I've been having ftp.freebsd.org send me little messages like this:

    From: "Root wcarchive.cdrom.com"
    Message-Id: <199703311111.DAA07942@wcarchive.cdrom.com>
    To: freebsd-stats@freebsd.org

    FreeBSD downloads over last 24 hours (up to Mar 31 03:01) from ftp.FreeBSD.org:
        3.0-970209-SNAP       14 copies.
        2.1.7.1-RELEASE       53 copies.
        2.2.1-RELEASE        139 copies.

and I have found it to be informative for keeping track of which releases seem to be popular and what seems to drive demand. It only shows ONE mirror site, however, and this is a drawback which we need to address before such statistics will become truly useful.

We also need to make sure our statistics are accurate, and this means that any mirrors which are mirroring things multiple times (I seem to recall Christoph Kukulies having some problems with this?) need to be identified and their duplicate entries filtered out of the raw data. I'm already suspicious that our numbers from ftp.freebsd.org may be inflated by as much as 10% due to bogus repetitive mirroring, and if anyone has any ideas on filtering this out of 500MB weekly xferlogs (I figure you'd need at least a week's granularity to do proper detection), I'm more than open to suggestions. I really do want these stats to be as accurate as possible, neither under- nor over-inflated, since bogus stats help no one.

I also need to somehow collect stats for all the mirrors, with the collection run once a day and the results sent to the ``freebsd-stats@freebsd.org'' alias I've just created on freefall. This currently fans out to just David Greenman, myself and Poul-Henning, but it could just as easily be a mailing list for those interested in the same information. Ideally, one of our webfolk would also collect the info and chart it on the "FreeBSD Bragging Page" one of us (even if it has to be me :) will be putting up soon, right? :-)

So, to summarize:

1. Anyone have a good way of detecting mirror looping from wu-ftpd style xferlogs? This would help both our bandwidth and our stats.

2. Would those hubs who are willing to participate in this be willing to run a little something from /etc/daily (or whatever crontab-driven thing they have on their mirror machine) which summarizes the stats and sends them to the freebsd-stats@freebsd.org redistribution point?

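For question 1, one crude starting point would be to count how often the same host fetches the same file and eyeball the worst offenders. The sketch below is only that: it assumes the usual wu-ftpd xferlog layout (whitespace-separated, with the remote host in field 7, the file name in field 9 and the direction in field 12), and both the function name find_repeats and its threshold argument are made up for illustration. A high count is a hint, not proof, of a looping mirror.

```shell
#!/bin/sh
# Sketch only: list (host, file) pairs that show up suspiciously often in
# a wu-ftpd style xferlog. Assumes field 7 is the remote host, field 9
# the file name and field 12 the direction ("o" = outgoing), and that
# file names contain no whitespace.

# find_repeats THRESHOLD [FILE...]
# Prints "count host filename" for every pair fetched at least THRESHOLD
# times, busiest first. Reads standard input when no FILE is given.
# THRESHOLD is a hypothetical tuning knob, not a known-good value.
find_repeats() {
    threshold=$1
    shift
    awk -v min="$threshold" '
        # Only outgoing transfers count as downloads.
        $12 == "o" { seen[$7 " " $9]++ }
        END {
            for (key in seen)
                if (seen[key] >= min)
                    printf("%d %s\n", seen[key], key)
        }' "$@" | sort -k1,1nr -k2
}
```

Something like `find_repeats 5 /var/log/xferlog` over a week of logs would give a list to inspect by hand. Whether the flagged entries should then be dropped from the download counts is a separate decision this sketch does not make.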
For stats collection, I'm currently just using this braindead little shell script:

    #!/bin/sh
    # Use the xferlog named on the command line, else the daily default.
    if [ $# -lt 1 ]; then
        file=/var/log/xferlog.day
    else
        file=$1
    fi
    # Take the "up to" date from the log file's modification time.
    date=`ls -l "$file" | awk '{print $6 " " $7 " " $8}'`
    echo "FreeBSD downloads over last 24 hours (up to $date) from ftp.FreeBSD.org:"
    # Count fetches of the first bin chunk of each release; releases
    # fetched only once are not printed.
    awk '/.16\/FreeBSD\/.*\/bin\/bin.aa/ { files[$9]++; }
    END { for (i in files) if (files[i] > 1) printf("\t%s\t%4d copies.\n", i, files[i]); }' "$file" |
        sed -e 's;/.16/FreeBSD/\(.*\)/bin/bin.aa;\1;'

It's hard-coded for ftp.freebsd.org's FTP setup, of course, and the output it emits is designed more for humans than for machine tabulation. Perhaps the stats counter we eventually come up with should have both? A nice textual summary, plus a little embedded comment somewhere in it which just contains the raw numbers for a counter / web page generator's benefit, would be ideal, I'd think.

The ultimate goal here is, of course, to have a page at www.freebsd.org (and its mirrors) which shows the world-wide weekly download count, perhaps even with a little pie-chart showing it by country, and CD sales to some rough approximation. Assuming that the numbers are impressive, this would then be excellent fodder for people giving talks or trying to talk ISVs and other vendors into supporting FreeBSD in some way.

Comments? Any Perl programmers out here who'd care to give me an assist in producing the more definitive daily stats summarizer? :-)

					Jordan
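
One possible shape for a summary that serves both humans and a page generator, sketched in the same sh/awk style as the script above. It keeps the human-readable lines and appends one "# raw:" line carrying release=fetches/distinct-hosts. That raw line format and the function name summarize are assumptions made for illustration, not an agreed convention, and the path matching assumes the release name is the directory directly above bin/bin.aa under a FreeBSD directory.

```shell
#!/bin/sh
# Sketch only: per-release download summary from a wu-ftpd style xferlog
# (remote host in field 7, file name in field 9), with a trailing
# machine-readable line. The "# raw:" format is hypothetical.

# summarize [FILE...]
# Reads standard input when no FILE is given.
summarize() {
    awk '
        {
            # Match .../FreeBSD/<release>/bin/bin.aa by path component.
            n = split($9, part, "/")
            if (n >= 4 && part[n] == "bin.aa" && part[n - 1] == "bin" &&
                part[n - 3] == "FreeBSD") {
                rel = part[n - 2]
                total[rel]++
                # Count each host once per release.
                if (!((rel, $7) in seen)) {
                    seen[rel, $7] = 1
                    hosts[rel]++
                }
            }
        }
        END { for (rel in total) print rel, total[rel], hosts[rel] }
    ' "$@" | LC_ALL=C sort | awk '
        # Human-readable line per release, raw numbers collected on the side.
        {
            printf("\t%s\t%4d copies.\n", $1, $2)
            raw = raw " " $1 "=" $2 "/" $3
        }
        END { if (NR > 0) print "# raw:" raw }
    '
}
```

Running `summarize /var/log/xferlog.day` would print the familiar per-release lines followed by the single raw line. Reporting distinct hosts next to total fetches is one cheap way to see how much repeat fetching inflates a count, though hosts behind proxies or with changing addresses will blur it.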