From owner-freebsd-arch@FreeBSD.ORG Wed May 30 06:18:33 2007
Date: Wed, 30 May 2007 16:18:20 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Jeff Roberson
Cc: arch@FreeBSD.org
In-Reply-To: <20070529195649.E661@10.0.0.1>
Message-ID: <20070530150615.Y12540@besplex.bde.org>
References: <20070529152059.Y661@10.0.0.1> <20070530094538.D11725@besplex.bde.org> <20070529195649.E661@10.0.0.1>
Subject: Re: initial rusage patch.
On Tue, 29 May 2007, Jeff Roberson wrote:

> On Wed, 30 May 2007, Bruce Evans wrote:
>
>> On Tue, 29 May 2007, Jeff Roberson wrote:
>>
>>> http://people.freebsd.org/~jeff/rusage3.diff
>>
>> % Index: compat/svr4/svr4_misc.c
>> % ===================================================================
>> % RCS file: /usr/home/ncvs/src/sys/compat/svr4/svr4_misc.c,v
>> % retrieving revision 1.92
>> % diff -u -p -r1.92 svr4_misc.c
>> % --- compat/svr4/svr4_misc.c	18 May 2007 07:10:44 -0000	1.92
>> % +++ compat/svr4/svr4_misc.c	29 May 2007 14:42:19 -0000
>> % @@ -1238,7 +1238,7 @@ loop:
>> % 		/* Found a zombie, so cache info in local variables. */
>> % 		pid = p->p_pid;
>> % 		status = p->p_xstat;
>> % -		ru = *p->p_ru;
>> % +		rufetch(p, &ru);
>> % 		calcru(p, &ru.ru_utime, &ru.ru_stime);
>> % 		PROC_UNLOCK(p);
>> % 		sx_sunlock(&proctree_lock);
>>
>> This and similar changes seem to be wrong.  *p->p_ru is where things
>> have been accumulated already.  At this point (with a zombie in
>> wait*()), I think all of the threads have gone away, so rufetch() will
>> find nothing.  Accumulation into *p->p_ru must occur at thread exit
>> time, and wait*() doesn't need changing to access it.  The non-compat
>> wait*() does this correctly.
>
> Please refresh your patch.  I fixed this in the follow-up when I changed
> rufetch() to check first for p_ruexit.  There is no p_ru, and stats are
> not accumulated in pstats anymore.  They are accumulated in p_rux and
> td_ru.

I don't like that fix.  Why not just use p_exitru after finalizing it,
except for the times?  The non-compat wait*() still does this like I want.
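Bruce's objection above can be sketched in a few lines: a rufetch()-style walk over live threads finds nothing once the process is a zombie, so wait*() must read the totals that were folded in at thread-exit time.  This is a userland illustration, not the kernel code; the names (thread_ru, proc, fetch_live, finalize) are invented for the sketch.

```c
#include <stddef.h>

struct thread_ru {                  /* stand-in for per-thread td_ru */
	long ru_nvcsw;
	struct thread_ru *next;
};

struct proc {
	struct thread_ru *threads;  /* live threads; empty for a zombie */
	struct thread_ru exit_ru;   /* stand-in for *p->p_ru / p_exitru */
};

/* Aggregate over live threads, like a rufetch() that walks td_ru. */
long
fetch_live(struct proc *p)
{
	struct thread_ru *td;
	long sum = 0;

	for (td = p->threads; td != NULL; td = td->next)
		sum += td->ru_nvcsw;
	return (sum);
}

/* At thread-exit time the per-thread counts are folded into exit_ru;
 * after this the live walk finds nothing, but the snapshot survives. */
void
finalize(struct proc *p)
{
	p->exit_ru.ru_nvcsw += fetch_live(p);
	p->threads = NULL;          /* threads are gone; process is a zombie */
}
```

The point is only the ordering: the live walk is valid before thread exit, and the finalized snapshot is the only valid source afterwards.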
>> % Index: kern/kern_acct.c
>> % ===================================================================
>> % RCS file: /usr/home/ncvs/src/sys/kern/kern_acct.c,v
>> % retrieving revision 1.89
>> % diff -u -p -r1.89 kern_acct.c
>> % --- kern/kern_acct.c	22 May 2007 06:51:37 -0000	1.89
>> % +++ kern/kern_acct.c	29 May 2007 14:43:21 -0000
>> % @@ -383,19 +383,19 @@ acct_process(struct thread *td)
>> % 	acct.ac_etime = encode_timeval(tmp);
>> %
>> % 	/* (4) The average amount of memory used */
>> % -	r = &p->p_stats->p_ru;
>> % +	rufetch(p, &ru);
>>
>> Should be the same null change as for wait*()?  If not, an rufetch()
>> call is probably needed before the calcru() in (2).
>
> What do you mean by 2?

Step (2) of the excessively commented steps in kern_acct.c, of which the
above shows step (4).

>> % Index: kern/kern_clock.c
>> ...

Please trim quotes.

>> % Index: kern/kern_exit.c
>> [p_exitru] is a better name [than p_ru], but more churn.
>
> I want to change the name so no one mistakenly follows the pointer.  I'm
> explicitly breaking the API because it has a totally different use now.

It doesn't seem to be any different, except for the parts that I don't
like.  It was already the exit ru.  Do you plan further changes to make
it a non-exit ru?

>> % @@ -445,8 +445,8 @@ retry:
>> % 	PROC_LOCK(p);
>> % 	p->p_xstat = rv;
>> % 	p->p_xthread = td;
>> % -	p->p_stats->p_ru.ru_nvcsw++;
>> % -	*p->p_ru = p->p_stats->p_ru;
>> % +	rufetch(p, p->p_exitru);
>> % +	p->p_exitru->ru_nvcsw++;
>> %
>> % 	/*
>> % 	 * Notify interested parties of our demise.
>>
>> This fixes 2 races (needed sched_lock here).
>>
>> If possible this should be moved later, together with the times
>> finalization in thread_exit(), to pick up any changes that occur during
>> exit.
>
> I think I have resolved this in my most recent patch.

No, it doesn't move things later, so it has incompletely finalized data,
and it restores the races fixed by calling rufetch() in the above.
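The "moved later" complaint above reduces to a simple ordering bug: if the exit rusage is copied in exit1(), a statclock tick that lands between that copy and thread_exit() is lost.  A minimal single-threaded sketch (illustrative names only, not kernel API; the locking half of the problem is not modeled here):

```c
/* Sketch of the finalization-ordering problem: ticks that arrive after
 * an early "finalize" copy never reach the exit snapshot. */
struct ru {
	long ticks;
};

static struct ru live;      /* stand-in for the still-updating td_ru/p_rux */
static struct ru exit_ru;   /* stand-in for p_exitru */

/* A statclock interrupt may still fire while the thread is exiting. */
void
statclock_tick(void)
{
	live.ticks++;
}

/* Copy taken at exit1() time, before the last thread_exit(). */
void
finalize_early(void)
{
	exit_ru = live;
}

/* Copy taken at thread_exit() time, after the last tick can arrive. */
void
finalize_late(void)
{
	exit_ru = live;
}
```

The second race Bruce mentions (an interrupt landing mid-copy yielding an incoherent snapshot) is what the locking inside rufetch() addresses; it cannot be shown in a single-threaded sketch.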
thread_exit() has not yet been called for the last thread in the process.
A statclock interrupt may generate new statistics after any code near the
above "finalizes" the statistics, and without locking, a statclock
interrupt in the middle of the copying may give an incoherent set of
statistics.  The rufetch() call in the above fixes the second of these
problems by supplying locking.

>> % @@ -721,7 +721,7 @@ loop:
>> % 		if (status)
>> % 			*status = p->p_xstat;	/* convert to int */
>> % 		if (rusage) {
>> % -			*rusage = *p->p_ru;
>> % +			*rusage = *p->p_exitru;
>> % 			calcru(p, &rusage->ru_utime, &rusage->ru_stime);
>> % 		}
>>
>> This is correct code for wait*() -- don't call rufetch(), but just use
>> the result of the rufetch() to p_exitru.
>
> In my newest patch rufetch() accomplishes the same thing.  It'd be nice
> to always handle it via this function so it's consistent.  Perhaps I
> should change this also to be an rufetch().  Now the setting of p_exitru
> signals callers to no longer iterate over the list.

I prefer the direct access.

>> % @@ -950,22 +952,21 @@ kern_getrusage(td, who, rup)
>> % ...
>> % +ruadd(struct rusage *ru, struct rusage_ext *rux, struct rusage *ru2,
>> % +    struct rusage_ext *rux2)
>> % +{
>> % +	long *ip, *ip2;
>> % +	int i;
>> % +
>> % +	if (rux) {
>> % +		rux->rux_runtime += rux2->rux_runtime;
>> ...
>> I don't like the interface change.
>
> I can define another function which adds/copies the ru and not the rux
> if you like that better.

Yes, I like specialized functions.

>> % Index: sys/proc.h
>> % ===================================================================
>> % RCS file: /usr/home/ncvs/src/sys/sys/proc.h,v
>> % retrieving revision 1.477
>> % diff -u -p -r1.477 proc.h
>> % --- sys/proc.h	24 Apr 2007 10:59:21 -0000	1.477
>> % +++ sys/proc.h	29 May 2007 15:14:31 -0000
>> % @@ -49,6 +49,7 @@
>> %  #include
>> %  #include <sys/rtprio.h>	/* XXX. */
>> %  #include
>> % +#include <sys/resource.h>
>>
>> More namespace pollution.  rusage_ext exists partly to avoid this
>> include.
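The "specialized functions" Bruce asks for can be sketched as a plain rusage adder kept separate from any rusage_ext handling, so callers that have no rux never pass a NULL and test for it.  The long-pointer walk mirrors the style of the quoted ruadd() hunk; the reduced struct and the name ruadd_lite are illustrative, not the kernel's.

```c
#include <stddef.h>

struct rusage_lite {             /* illustrative subset of struct rusage */
	long ru_minflt;
	long ru_majflt;
	long ru_nvcsw;
	long ru_nivcsw;
};

/* Add every long field of ru2 into ru by walking both structs as
 * arrays of long, as the quoted ruadd() does with ip/ip2. */
void
ruadd_lite(struct rusage_lite *ru, const struct rusage_lite *ru2)
{
	long *ip = (long *)ru;
	const long *ip2 = (const long *)ru2;
	size_t i;

	for (i = 0; i < sizeof(*ru) / sizeof(long); i++)
		*ip++ += *ip2++;
}
```

A separate ruxadd() for the rusage_ext fields would then keep the two interfaces independent, which is the shape the "if (rux)" branch in the quoted patch collapses into one function.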
>> The include of rtprio is XXXed because it is pollution.
>
> Yes, I thought you wouldn't be happy about this.  I'm not impressed with
> the idea of changing every kernel file that includes proc.h to also
> include resource.h, however.

It could remain indirect, or minimize the pollution using a new header
that declares only the resource struct (but the latter would be an
obfuscation).

>> % @@ -255,10 +256,12 @@ struct thread {
>> % ...
>> % +	uint64_t	td_pticks;	/* (j) Statclock hits for profiling */
>> % +	uint64_t	td_sticks;	/* (j) Statclock hits in system mode. */
>> % +	uint64_t	td_iticks;	/* (j) Statclock hits in intr mode. */
>> % +	u_int		td_uticks;	/* (j) Statclock hits in user mode. */
>>
>> u_int is the correct type for these tick counters.  This is enough
>> for 388 days worth of ticks unless stathz is broken (> 128 Hz).  64-bit
>> ...
>
> I changed them to match the 64-bit counters they were feeding into in
> the proc.  If you like I can change them back and check for overflow.

Their overflow can't happen :-).  I think 32+64->64 bit additions are
efficient enough (more efficient than 64+64->64 on 32-bit machines), so
these counters should be optimized for space.

Bruce
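Bruce's "388 days" figure and the 32+64->64 fold can be checked with back-of-envelope arithmetic: a u_int counter at stathz = 128 Hz wraps only after 2^32 / 128 seconds, about 388 days, so keeping the per-thread counters 32-bit and adding them into the 64-bit per-process totals costs nothing in correctness.  The helper names below are illustrative, not kernel API.

```c
#include <stdint.h>

/* Days until a 32-bit tick counter wraps at the given statclock rate. */
double
days_until_wrap(unsigned hz)
{
	return (((double)UINT32_MAX + 1.0) / hz / 86400.0);
}

/* Fold a 32-bit per-thread tick count into a 64-bit per-process total;
 * on a 32-bit machine this 32+64->64 addition is cheaper than 64+64->64. */
uint64_t
fold_ticks(uint64_t proc_total, uint32_t td_ticks)
{
	return (proc_total + td_ticks);
}
```

At stathz = 128, days_until_wrap() gives roughly 388.4 days, matching the figure quoted above; a thread would have to accumulate statclock hits continuously for over a year before a u_int counter could wrap.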