Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 20 Aug 2007 08:26:09 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Diomidis Spinellis <dds@aueb.gr>
Cc:        attilio@FreeBSD.ORG, jeff@FreeBSD.ORG, current@FreeBSD.ORG
Subject:   Re: Crash in accounting code: encode_long(), due to bad rusage data?
Message-ID:  <20070820082429.L15651@fledge.watson.org>
In-Reply-To: <46C8C2E6.9030405@aueb.gr>
References:  <20070819214953.I1099@fledge.watson.org> <46C8C2E6.9030405@aueb.gr>

next in thread | previous in thread | raw e-mail | index | archive | help

On Mon, 20 Aug 2007, Diomidis Spinellis wrote:

> Robert Watson wrote:
>
>> I recently upgraded two servers from FreeBSD 6-STABLE to FreeBSD 7-CURRENT 
>> in anticipation of the forthcoming release.  Both of them run with 
>> accounting enabled at all times.  When a large pine session was exiting on 
>> one of the two boxes, I ran into the following panic:
>> 
>> panic: encode_long: -ve value -32749
>
> Getting rid of the panic is easy:
>
> --- kern_acct.c	2007-08-20 01:15:18.000000000 +0300
> +++ kern_acct.c.new	2007-08-20 01:16:06.000000000 +0300
> @@ -523,8 +523,7 @@
> 	int norm_exp;	/* Normalized exponent */
> 	int shift;
>
> -	KASSERT(val >= 0,  ("encode_long: -ve value %ld", val));
> -	if (val == 0)
> +	if (val <= 0)
> 		return (0);
> 	norm_exp = fls(val) - 1;
> 	shift = FLT_MANT_DIG - norm_exp - 1;
>
> However, as you wrote, this doesn't fix the root of the problem.
>
>> I find the large negative value in ru_idrss somewhat sad to contemplate, 
>> and while this could well be a problem with capturing of process runtime 
>> information, I'd like it if the accounting code were robust against this 
>> sort of bug, rather than panicking, and I guess I'd also rather than the 
>> process runtime information also be correctly captured :-).
>
> Do you think it makes any sense for encode_long to be correctly encoding 
> negative numbers, or should we concentrate on locating and fixing the 
> negative ru_idrss problem?

I'm not sure it's worth being able to encode negative numbers based on the 
current set of measurements, but it's probably worth having some sort of 
encoding for an error case.  (0) is a fine encoding of an error from my 
perspective, and we might consider a printf warning as opposed to a panic so 
we catch these in a more gentle way.

As Jeff has mentioned in his e-mail, this is almost certainly a usage stat 
that requires upgrading to 64-bit, as 32 bits for that stat is so small as to 
be entirely unuseful :-).

Robert N M Watson
Computer Laboratory
University of Cambridge



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20070820082429.L15651>