Date:      Mon, 31 Mar 1997 16:56:07 -0800
From:      "Jordan K. Hubbard" <jkh@time.cdrom.com>
To:        hubs@freebsd.org
Cc:        www@freebsd.org
Subject:   Unified download stats [and better presentation of same]
Message-ID:  <13139.859856167@time.cdrom.com>

I've been getting more and more emails lately from folks asking me why
we're not doing more to call attention to all of FreeBSD's cool
features, its growing popularity in the marketplace and so on.  I
sometimes wonder this too, and occasionally I even wander over to Red
Hat's site and look at all the (paid, full-time) energy that they've
managed to pour into their site and I get a little depressed about it.
So rather than cry into my milk, I decided to start doing at least
*something* to make better use of our statistical information in
marketing ourselves.  The first step (IMHO) is to get a better feel
for how many copies of FreeBSD are being downloaded each day with the
eventual intention of putting this information up on a web page.

As a very rough and first-order attempt at this, for the last week or
so I've been having ftp.freebsd.org send me little messages like this:

From: "Root wcarchive.cdrom.com" <root@wcarchive.cdrom.com>
Message-Id: <199703311111.DAA07942@wcarchive.cdrom.com>
To: freebsd-stats@freebsd.org

FreeBSD downloads over last 24 hours (up to Mar 31 03:01) from ftp.FreeBSD.org:
        3.0-970209-SNAP   14 copies.
        2.1.7.1-RELEASE   53 copies.
        2.2.1-RELEASE    139 copies.

and I have found it to be informative for keeping track of which
releases seem to be popular and what seems to drive demand.  It only
shows ONE mirror site, however, and this is a drawback which we need
to address before such statistics will become truly useful.  We also
need to make sure of the accuracy of our statistics, and this means
that any mirrors which are mirroring things multiple times (I seem to
recall Christoph Kukulies having some problems with this?) need to be
identified and their duplicate entries filtered out of the raw data.
I'm already suspicious that our numbers from ftp.freebsd.org may be
inflated by as much as 10% due to bogus repetitive mirroring, and if
anyone has any ideas on filtering this out of 500MB weekly xferlogs (I
figure you'd need at least a week's granularity to do proper
detection), I'm more than open to suggestions.  I really do want these
stats to be as accurate as possible, neither under- nor over-inflated,
since bogus stats help no one.
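
One possible starting point for spotting repetitive mirroring is to
count how often each host fetches the same file.  What follows is only
a minimal sketch, assuming the usual wu-ftpd xferlog layout (remote
host in field 7, file name in field 9).  The function name
repeat_fetchers and the default threshold of 3 are made up for
illustration, and a real filter would probably also want to look at
the timing of the fetches:

```shell
#!/bin/sh
# Sketch only: list (host, file) pairs that show up more than a given
# number of times in a wu-ftpd style xferlog.
# Assumed layout: field 7 = remote host, field 9 = file name.
repeat_fetchers() {
        # $1 = xferlog to scan
        # $2 = threshold (hypothetical default: 3 fetches)
        awk -v max="${2:-3}" '
        { seen[$7 " " $9]++ }
        END {
                for (k in seen)
                        if (seen[k] > max)
                                printf("%d %s\n", seen[k], k)
        }' "$1" | sort -rn
}
```

Called as "repeat_fetchers /var/log/xferlog 3", it prints one line per
suspicious pair (count, host, file), busiest first.  Hosts that turn
up here are only candidates for looping mirrors; a busy proxy would
look much the same.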

I also need to somehow collect stats from all the mirrors, generated
once a day and sent to the ``freebsd-stats@freebsd.org'' alias I've just
created on freefall.  This currently fans out to just David Greenman,
myself and Poul-Henning but it could just as easily be a mailing list
for those interested in the same information.  Ideally, one of our
webfolk would also collect the info and chart it on the "FreeBSD
Bragging Page" one of us (even if it has to be me :) will be putting
up soon, right? :-)


So, to summarize:

	1. Anyone have a good way of detecting mirror looping from
	   wu-ftpd style xferlogs?

	   This would help both our bandwidth and our stats.

	2. Would those hubs who are willing to participate in this
	   be willing to run a little something from /etc/daily (or
	   whatever crontab driven thing they have on their mirror
	   machine) which summarizes the stats and sends it to the
	   freebsd-stats@freebsd.org redistribution point?

For stats collection, I'm currently just using this braindead
little shell script:

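#!/bin/sh
# Use the log file named on the command line, else the daily xferlog.
if [ $# -lt 1 ]; then
        file=/var/log/xferlog.day
else
        file=$1
fi
# The log's modification time marks the end of the reporting period.
date=`ls -l "$file" | awk '{print $6 " " $7 " " $8}'`
echo "FreeBSD downloads over last 24 hours (up to $date) from ftp.FreeBSD.org:"
# Count fetches of each release's first bin distribution chunk (bin.aa);
# field 9 of a wu-ftpd xferlog entry is the file name.
awk '/.16\/FreeBSD\/.*\/bin\/bin.aa/ {
        files[$9]++;
}
END {
        for (i in files)
                if (files[i] > 1)
                        printf("\t%s\t%4d copies.\n", i, files[i]);
}' "$file" | sed -e 's;/.16/FreeBSD/\(.*\)/bin/bin.aa;\1;'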

It's hard-coded for ftp.freebsd.org's FTP setup, of course, and the
output it emits is designed more for humans than for machine tabulation.
Perhaps the stats counter we eventually come up with should have both?
A nice textual summary and then a little embedded comment somewhere in
it which just contains the raw numbers for a counter / web page
generator's benefit would be ideal, I'd think.
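
To make the "both" idea concrete, here is a rough sketch of a
summarizer which prints the textual summary and then repeats the
numbers on comment-style lines a page generator could grep for.  The
"#STATS site release count" line format, the function name
summarize_downloads and the site-label argument are all invented for
this example, not anything agreed upon, and unlike the script above it
does not skip releases fetched only once:

```shell
#!/bin/sh
# Sketch only: human-readable summary followed by machine-readable lines.
# Assumed layout: field 9 of a wu-ftpd xferlog entry is the file name.
summarize_downloads() {
        # $1 = xferlog to scan
        # $2 = site label to report (hypothetical parameter)
        awk -v site="$2" '
        /\/FreeBSD\/.*\/bin\/bin\.aa/ {
                rel = $9
                sub(/^.*\/FreeBSD\//, "", rel)     # strip leading path
                sub(/\/bin\/bin\.aa$/, "", rel)    # strip trailing file
                count[rel]++
        }
        END {
                printf("FreeBSD downloads from %s:\n", site)
                for (r in count)
                        printf("\t%-18s %4d copies.\n", r, count[r])
                # Invented format: "#STATS site release count"
                for (r in count)
                        printf("#STATS %s %s %d\n", site, r, count[r])
        }' "$1"
}
```

A collector on the receiving end could then pull out just the lines
starting with "#STATS" and ignore the rest of the message.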

The ultimate goal here is, of course, to have a page at
www.freebsd.org (and its mirrors) which shows the world-wide weekly
download count, perhaps even with a little pie-chart showing it by
country, and CD sales to some rough approximation.  Assuming that the
numbers are impressive, this would then be excellent fodder for people
giving talks or trying to talk ISVs and other vendors into supporting
FreeBSD in some way.

Comments?  Any perl programmers out here who'd care to give me an
assist in producing the more definitive daily stats summarizer? :-)

					Jordan


