Date:      Sun, 9 Dec 2012 12:58:50 -0500
From:      Rich Kulawiec <rsk@gsp.org>
To:        chat@freebsd.org
Subject:   Re: Google spyware on FreeBSD Web site?
Message-ID:  <20121209175850.GA31072@gsp.org>
In-Reply-To: <201212041900.MAA14107@lariat.net>
References:  <201212041900.MAA14107@lariat.net>

I often disagree with Brett, sometimes sharply, but on this
issue I strongly concur.  Some points to augment/add, briefly:

- Requiring "opt-out" is always an explicit admission that what's
being done is being inflicted on users without their prior, informed,
express consent.  It's disrespectful and abusive.  The FreeBSD project
should be better than that and never require opt-out of anything, ever.

- As has been pointed out, open-source analytic software that
runs on FreeBSD is not only available, but vastly preferable.

- But it won't work either, because most of the input data is crud.
(This isn't FreeBSD's fault: most of the data accumulated in most of
the web server logs globally is crud.) [1]   The conclusions drawn from
processing mostly-crud data won't have much, if any, validity.

- This also presumes that the right questions are being asked.  It's not
clear, at this point, whether they are or aren't.  If, for example,
the question is "will page Z on the FreeBSD web site work with
browser X on operating system Y?"  then resources such as BrowserShots
(http://www.browsershots.org) will provide credible answers.  If the
question is "how long is a user spending on page Z?" then no tool will
provide a credible answer.  Moreover, the question itself is pointless.

So I suggest, if the goal is to improve the web site (and that is a good
goal), holding an open public debate over which questions should be
asked before moving on to the question of which software tools might be
able to provide answers to those questions.  (And yes, since I'm arguing
that it should happen, I'll contribute to the effort.)

- If you want active feedback from users, then maintaining a proper
role address (webmaster@) is the best way to do that.  I see that's
already in place, and that's excellent.

- Of course standards compliance and cross-browser/cross-platform testing
are great ways to ensure that the site is as usable as possible by
as many people as possible.  Based on what I see on the site via things
like the W3C validator as well as trying it using multiple browsers on
multiple operating systems, it appears to me that considerable work
has already been done on this: the site is viewable, navigable, etc.
without issue on any of them.  Once again, that's excellent.

---rsk

[1] The majority of data found in the typical public webserver's logs is
crud because it doesn't originate from human action: it originates from
software.  Of course, in the case of many common webcrawlers, this activity
is relatively easy to isolate.  But that leaves all the crud originating
from malicious/surreptitious software agents such as those running on
a few hundred million compromised/botted/zombied systems.  This data is
(mostly) functionally indistinguishable from that originating from humans,
and for many sites, it dwarfs the latter.  So while certainly it can be
fed to analytic software (along with all of the actual human-originated
data) what emerges is quite often useless.

Techniques *do* exist to isolate and filter this spurious data out,
but they're unreliable, tedious, manual, and they don't scale well.
"GIGO" is an old acronym, and not used much any more, but it certainly
applies here.
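
To illustrate the "relatively easy" case only: a minimal sketch, assuming
the logs are in Apache-style Combined Log Format and that well-behaved
crawlers identify themselves in the User-Agent field.  The names
LOG_LINE, CRAWLER_HINTS, is_declared_crawler and split_log are
hypothetical, and the hint list is deliberately incomplete.  Note that
this does nothing about traffic from compromised systems, which
typically presents an ordinary browser User-Agent.

```python
import re

# Combined Log Format: the last two quoted fields are referrer and user agent.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical, incomplete list of substrings that self-identifying
# crawlers commonly put in their User-Agent string.
CRAWLER_HINTS = ("bot", "crawler", "spider", "slurp")


def is_declared_crawler(agent):
    """Return True if the User-Agent string contains a known crawler hint."""
    agent = agent.lower()
    return any(hint in agent for hint in CRAWLER_HINTS)


def split_log(lines):
    """Split log lines into (declared crawlers, everything else).

    Lines that do not parse are kept in the second list, since nothing
    can be said about where they came from.
    """
    crawlers, others = [], []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and is_declared_crawler(match.group("agent")):
            crawlers.append(line)
        else:
            others.append(line)
    return crawlers, others
```

The second list returned by split_log is still the mix of human and
botted traffic described above; the sketch only removes the agents that
announce themselves.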



