From owner-freebsd-arch@FreeBSD.ORG  Thu Apr 19 09:31:16 2007
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
X-Original-To: arch@FreeBSD.org
Delivered-To: freebsd-arch@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id A4D7E16A404;
	Thu, 19 Apr 2007 09:31:16 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 51CFA13C46E;
	Thu, 19 Apr 2007 09:31:16 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id C5AC447413;
	Thu, 19 Apr 2007 05:31:15 -0400 (EDT)
Date: Thu, 19 Apr 2007 10:31:15 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Diomidis Spinellis <dds@aueb.gr>
In-Reply-To: <46231C64.9010707@aueb.gr>
Message-ID: <20070419101815.Y2913@fledge.watson.org>
References: <461958CC.4040804@aueb.gr>
	<20070414170218.M76326@fledge.watson.org>
	<4621E826.6050306@aueb.gr> <20070415105157.J84174@fledge.watson.org>
	<46231C64.9010707@aueb.gr>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@FreeBSD.org, re@FreeBSD.org
Subject: Re: Accounting changes
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 19 Apr 2007 09:31:16 -0000

On Mon, 16 Apr 2007, Diomidis Spinellis wrote:

> Robert Watson wrote:
>
>>>> What do you think of the idea of changing the file format a little to 
>>>> include a short file header at the front, and that the first field of 
>>>> that head is zero-filled u_int32_t, and the second a version number? 
>>>> Right now, the first field of the acct structure is the name of the 
>>>> command, which will always be a non-nul string, so always have a first 
>>>> character non-nul. If we see non-nul data in the file header's first 
>>>> field, we use the old structure layout, and otherwise we check the 
>>>> version number and use the new layout?  This would provide backwards 
>>>> compatibility for reading old accounting data, which I would think would 
>>>> generally be desirable, and allow us to explicitly version the file in 
>>>> the future.
>> 
>> The sites I know of that use accounting don't care about CPU use in the 
>> sa(8) sense at all.  They care about tracking commands run.  While acct(5) 
>> doesn't do this extraordinarily well, it does it well enough to allow basic 
>> command execution logging and analysis.  Hence the desire to be able to 
>> continue reading preserved acct(5) data files in the future.
>
> I see three options for satisfying this requirement.
>
> One is to move the existing acct.h into usr.bin/lastcomm, and add to 
> lastcomm(1) an option to read legacy files.  I don't like this approach, 
> because it doesn't include sa(8) in the picture, and, more importantly, it 
> doesn't scale well for future changes.  Every time we change the type of a 
> field of acct.h (for example widening ac_gid) we will have to add 
> architecture-specific code in the legacy file reading module.

If we're willing to assume architectures can only read their own accounting 
files (the status quo), the above argument doesn't really make sense.  You end 
up with a series of versions of "struct acct", and that code is 
architecture-neutral.  Thinking about it more, I'm not sure a per-file header 
is even required or desired (as I had previously suggested); a per-record 
versioning scheme would suffice, and would allow a reboot onto a new kernel to 
continue to write to the existing accounting file.  Read the first 16 bytes: 
if the first byte is non-0 then it's the original "struct acct" layout, and 
otherwise the second byte is the version number to use.  Or, in the interests 
of forward compatibility, include a length parameter in another 16 bytes so 
you can skip over records if necessary, allowing the kernel to move back and 
forward across file versions if there's a problem after the upgrade.
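
A minimal sketch of that per-record check, under assumptions nothing in this 
thread has fixed: the names acct_prefix and acct_classify are hypothetical, and 
placing the length in bytes 2-3 in host byte order is only one possible reading 
of "a length parameter".  It shows the classification step alone, not the real 
acct(5) layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACCT_PREFIX_LEN	16	/* bytes inspected at the start of a record */
#define ACCT_LEGACY	(-1)	/* hypothetical tag for the unversioned layout */

/* Hypothetical decoded prefix; not taken from any FreeBSD header. */
struct acct_prefix {
	int	 version;	/* ACCT_LEGACY, or the value of byte 1 */
	uint16_t length;	/* record length; 0 for legacy records */
};

/*
 * Classify a record from its first ACCT_PREFIX_LEN bytes.
 * Returns 0 on success, -1 if fewer than ACCT_PREFIX_LEN bytes were given.
 */
int
acct_classify(const unsigned char *buf, size_t len, struct acct_prefix *out)
{
	if (len < ACCT_PREFIX_LEN)
		return (-1);
	if (buf[0] != 0) {
		/* Legacy records begin with the command name, never NUL. */
		out->version = ACCT_LEGACY;
		out->length = 0;
		return (0);
	}
	out->version = buf[1];
	/* Assumed placement: length in bytes 2-3, host byte order. */
	memcpy(&out->length, buf + 2, sizeof(out->length));
	return (0);
}
```

With a length available, a reader that meets a version it does not understand 
can skip that many bytes and carry on with the next record.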

> A variation of the above approach would be to create a library for reading 
> legacy accounting data formats.  I think this is overkill, given that the 
> two users are sa(8) and lastcomm(1), and of the two only lastcomm appears 
> to be really needed.

Sounds like overkill.  All you really need is a common routine to return the 
next record in the current native version given a file descriptor for the open 
file, and that one routine can handle the versioning concerns easily.  No need 
to have a library, just compile a common .c file from the lastcomm directory 
into the sa directory (or vice versa).  Notice that sa's decoding routine 
already does conversion from the file type to C types for computation.
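
Roughly, such a routine could look like the sketch below.  Everything in it is 
an assumption made for illustration: the two on-disk layouts are cut down to a 
command name and a uid, and the names acct_v0, acct_v1, acct_native and 
acct_next are hypothetical rather than anything in the tree.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, cut-down layouts; the real struct acct has more fields. */
struct acct_v0 { char comm[16]; uint16_t uid; };	/* legacy */
struct acct_v1 {					/* versioned */
	uint8_t zero; uint8_t version; uint16_t length;
	char comm[16]; uint32_t uid;
};
struct acct_native { char comm[17]; uint32_t uid; };	/* what callers see */

/* Read exactly n bytes; returns 0 on success, -1 on error or short file. */
static int
read_full(int fd, void *buf, size_t n)
{
	char *p = buf;
	ssize_t r;

	while (n > 0) {
		if ((r = read(fd, p, n)) <= 0)
			return (-1);
		p += r;
		n -= (size_t)r;
	}
	return (0);
}

/* Returns 1 with *out filled in, 0 at end of file, -1 on a bad record. */
int
acct_next(int fd, struct acct_native *out)
{
	unsigned char buf[sizeof(struct acct_v1)];
	struct acct_v0 old;
	struct acct_v1 cur;
	ssize_t r;

	if ((r = read(fd, buf, 1)) == 0)
		return (0);
	if (r != 1)
		return (-1);
	memset(out, 0, sizeof(*out));	/* also NUL-terminates out->comm */
	if (buf[0] != 0) {		/* non-NUL first byte: legacy layout */
		if (read_full(fd, buf + 1, sizeof(old) - 1) != 0)
			return (-1);
		memcpy(&old, buf, sizeof(old));
		memcpy(out->comm, old.comm, sizeof(old.comm));
		out->uid = old.uid;	/* widened to the native type */
		return (1);
	}
	if (read_full(fd, buf + 1, sizeof(cur) - 1) != 0)
		return (-1);
	memcpy(&cur, buf, sizeof(cur));
	if (cur.version != 1 || cur.length != sizeof(cur))
		return (-1);		/* unknown version: give up here */
	memcpy(out->comm, cur.comm, sizeof(cur.comm));
	out->uid = cur.uid;
	return (1);
}
```

Both lastcomm and sa would then loop on acct_next() and never look at the 
on-disk layout themselves, so a later format change touches only this one file.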

> The approach I favor is to add to lastcomm an option to dump an accounting 
> file in text format, and a second option to read text accounting data from 
> stdin and write it out in the current accounting file format.  Users can 
> then either store accounting data in (compressed) text files, or pipe them 
> through a pipeline that will transform the legacy format into the current 
> one.  (In the latter case they will need to keep through an upgrade a 
> lastcomm(1) binary compiled to read the legacy format - I can provide the 
> appropriate cvs incantation in UPDATING).  This approach also simplifies the 
> writing of test cases.

You're putting the burden on the people with data they need to preserve to 
deal with checking out specific revisions of accounting source code from CVS, 
getting it building on whatever the current rev is (perhaps requiring a buildworld 
to get build tools and libraries), etc?  Your basic assumption in all of this 
is that no one uses or preserves accounting data, and I think that is a false 
assumption.  On all of my server boxes, I keep at least five days of back 
accounting data, and I know of sites that keep back accounting data for months 
or years.  I don't think you should be assuming no one cares about this data 
and breaking compatibility.  Since there's a structured file format, it's easy 
to provide compatibility (and we can make it easier in the future by adding 
versioning information this time).  I certainly don't object to the text 
export, but I don't think it really addresses the problem of backward 
compatibility at all.

Robert N M Watson
Computer Laboratory
University of Cambridge