Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 05 May 2010 00:48:53 +0000
From:      Alexander Krizhanovsky <ak@natsys-lab.com>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        freebsd-hackers@freebsd.org, svn-src-all@freebsd.org, ur4ina@gmail.com
Subject:   Re: [PATCH] RUSAGE_THREAD
Message-ID:  <4BE0C075.2040106@natsys-lab.com>
In-Reply-To: <20100503163524.GE23646@deviant.kiev.zoral.com.ua>
References:  <4BCBA7F8.3060109@natsys-lab.com> <20100425145929.GJ2422@deviant.kiev.zoral.com.ua> <4BDF2E4C.4090009@natsys-lab.com> <20100503163524.GE23646@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
Konstantin,

Concerning i/o counters we collect them in rucollect() in for loop and 
update in various places, for example in vfs_bio.c. Rusage of an exiting 
threads is handled in exit1() -> ruadd().

Your version of the patch certainly is more robust, however I'm still 
concerning about following things, which could be done better:

1) Zeroing of thread runtime statistic looks fine if we use it only for 
whole process statistic calculating, but now (when it's also used as is 
for the thread statistic) it should be handled independently, i.e. 
RUSAGE_{SELF,CHILDREN} calls should not affect the thread structures 
somehow. So I suppose we need to introduce some new counters to proc 
structure and counters update code (it was stopped me to go on this way).

2) With first in mind, getruasge(RUSAGE_THREAD) is called in current 
thread and doesn't affect or use information from other threads, so it 
definitely should use less number of locks, may be with atomic 
operations for read-update-write operations. In fact the same getting 
per-thread statistic in Linux is done _without_ locks at all (however 
Linux uses different process/thread model).

If we're in time and it really looks like a problem (is getrusage() ever 
a hotspot?) I can try to prepare the patch with less locking on this 
weekend.

Also I still don't understand the sanity check in calccru1() for 
negativeness of tu ( (int64_t)tu < 0 ) - is it reachable? It seems that 
it is possible only after about 300 thousand years of uptime... Could 
you please explain this?

Should I write further about the patch to svn-src-all@ (sorry, but I'm 
new in FreeBSD mailing) ?

Kostik Belousov wrote:
> On Mon, May 03, 2010 at 08:13:00PM +0000, Alexander Krizhanovsky wrote:
>   
>> Kostik,
>>
>> thank you very much for the review!
>>
>> Kostik Belousov wrote:
>>     
>>> On Mon, Apr 19, 2010 at 12:46:48AM +0000, Alexander Krizhanovsky wrote:
>>>  
>>>       
>>>> Hi all!
>>>>
>>>> This patch implements per-thread rusage statistic (like RUSAGE_THREAD in 
>>>> Linux and RUSAGE_LWP in OpenSolaris).
>>>>
>>>> Unfortunately, we have to acquire a number of locks to read/update 
>>>> system and user times for current thread rusage information because it's 
>>>> also used for whole process statistic and needs to be zeroed.
>>>>
>>>> Any comments are very appreciated.
>>>>
>>>> It's tested against 8.0-RELEASE. Appropriate PR is submitted.
>>>>
>>>> -- 
>>>> Alexander Krizhanovsky
>>>> NatSys Lab. (http://natsys-lab.com)
>>>> tel: +7 (916) 717-3899, +7 (499) 747-6304
>>>> e-mail: ak@natsys-lab.com
>>>>
>>>>    
>>>>         
>>> I think this should be somewhat modified before commit.
>>>
>>> First, please separate patch into two pieces. One would introduce
>>> ruxagg_tlock() helper and use it in existing locations instead of
>>> three-liners. Possibly, add K&R->ANSI C kern_getrusage definition
>>> change.
>>>
>>> Second would actually add RUSAGE_THREAD. There, please follow
>>> the style(9), in particular, do not initialize the local variables
>>> at the declaration, instead, use explicit initialization in the code.
>>>  
>>>       
>> Done. I have also introduced third patch for casting microseconds to 
>> timeval and replaced a few appropriate places in whole kernel code.
>> patch-rusage-thread-03052010.txt is incremental for 
>> patch-ruxagg_tlock-03052010.txt, which is by turn incremental for 
>> patch-usec2timeval-03052010.txt.
>>
>> I have made following change for calcru1():
>> -       if ((int64_t)tu < 0) {
>> -               /* XXX: this should be an assert /phk */
>> -               printf("calcru: negative runtime of %jd usec for pid %d 
>> (%s)\n",
>> -                   (intmax_t)tu, p->p_pid, p->p_comm);
>> -               tu = ruxp->rux_tu;
>> -       }
>> +       KASSERT((int64_t)tu < 0, ("calcru: negative runtime of %jd usec "
>> +                               "for pid %d (%s)\n",
>> +                               (intmax_t)tu, p->p_pid, p->p_comm));
>>
>> tu can become negative at about 300 thousand years of uptime, so this 
>> condition seems quite unrealistic and indeed should be replaced by an 
>> assertion. However I didn't understand why it isn't done so far and the 
>> comment only was added. Did I miss something?
>>
>>     
>>> Should calctru() share the code with calcru1() ? I am esp. concerned
>>> with sanity check like negative runtime etc. Possibly, this code
>>>       
>> >from calcru1() should be separated into function used from both
>>     
>>> calcru1() and calctru().
>>>
>>> Man page for getrusage(2) should be updated.
>>>  
>>>       
>> It's added to patch-rusage-thread-03052010.txt.
>>
>> In the end - shoud I raise a new PR (the original one is kern/145813)?
>>
>>     
>>> Thanks for the submission.
>>>       
>
> It was some time, I already committed ruxagg_tlock() part, that caused
> some feedback, and I reworked the rest of the patch myself. See svn-src@
> for some discussion, and the latest version that I intend to commit
> shortly is below. Your comments are welcome.
>
> Lets discuss third patch after this is landed.
>
> diff --git a/lib/libc/sys/getrusage.2 b/lib/libc/sys/getrusage.2
> index bdf5d45..423503f 100644
> --- a/lib/libc/sys/getrusage.2
> +++ b/lib/libc/sys/getrusage.2
> @@ -28,7 +28,7 @@
>  .\"     @(#)getrusage.2	8.1 (Berkeley) 6/4/93
>  .\" $FreeBSD$
>  .\"
> -.Dd June 4, 1993
> +.Dd May 1, 2010
>  .Dt GETRUSAGE 2
>  .Os
>  .Sh NAME
> @@ -42,6 +42,7 @@
>  .In sys/resource.h
>  .Fd "#define	RUSAGE_SELF	 0"
>  .Fd "#define	RUSAGE_CHILDREN	-1"
> +.Fd "#define	RUSAGE_THREAD	1"
>  .Ft int
>  .Fn getrusage "int who" "struct rusage *rusage"
>  .Sh DESCRIPTION
> @@ -49,11 +50,12 @@ The
>  .Fn getrusage
>  system call
>  returns information describing the resources utilized by the current
> -process, or all its terminated child processes.
> +thread, the current process, or all its terminated child processes.
>  The
>  .Fa who
>  argument is either
> -.Dv RUSAGE_SELF
> +.Dv RUSAGE_THREAD ,
> +.Dv RUSAGE_SELF ,
>  or
>  .Dv RUSAGE_CHILDREN .
>  The buffer to which
> @@ -175,6 +177,10 @@ The
>  .Fn getrusage
>  system call appeared in
>  .Bx 4.2 .
> +The
> +.Dv RUSAGE_THREAD
> +facility first appeared in
> +.Fx 8.1 .
>  .Sh BUGS
>  There is no way to obtain information about a child process
>  that has not yet terminated.
> diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c
> index 49a3097..6c50f1f 100644
> --- a/sys/kern/kern_proc.c
> +++ b/sys/kern/kern_proc.c
> @@ -901,7 +901,7 @@ fill_kinfo_thread(struct thread *td, struct kinfo_proc *kp, int preferthread)
>  	kp->ki_pri.pri_user = td->td_user_pri;
>  
>  	if (preferthread) {
> -		kp->ki_runtime = cputick2usec(td->td_runtime);
> +		kp->ki_runtime = td->td_rux.rux_runtime;
>  		kp->ki_pctcpu = sched_pctcpu(td);
>  		kp->ki_estcpu = td->td_estcpu;
>  	}
> diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c
> index a3ed75d..0bc78d0 100644
> --- a/sys/kern/kern_resource.c
> +++ b/sys/kern/kern_resource.c
> @@ -76,7 +76,7 @@ static void	calcru1(struct proc *p, struct rusage_ext *ruxp,
>  		    struct timeval *up, struct timeval *sp);
>  static int	donice(struct thread *td, struct proc *chgp, int n);
>  static struct uidinfo *uilookup(uid_t uid);
> -static void	ruxagg_tlock(struct proc *p, struct thread *td);
> +static void	ruxagg(struct proc *p, struct thread *td);
>  
>  /*
>   * Resource controls and accounting.
> @@ -630,7 +630,7 @@ lim_cb(void *arg)
>  		return;
>  	PROC_SLOCK(p);
>  	FOREACH_THREAD_IN_PROC(p, td) {
> -		ruxagg_tlock(p, td);
> +		ruxagg(p, td);
>  	}
>  	PROC_SUNLOCK(p);
>  	if (p->p_rux.rux_runtime > p->p_cpulimit * cpu_tickrate()) {
> @@ -841,7 +841,7 @@ calcru(struct proc *p, struct timeval *up, struct timeval *sp)
>  	FOREACH_THREAD_IN_PROC(p, td) {
>  		if (td->td_incruntime == 0)
>  			continue;
> -		ruxagg_tlock(p, td);
> +		ruxagg(p, td);
>  	}
>  	calcru1(p, &p->p_rux, up, sp);
>  }
> @@ -961,6 +961,16 @@ kern_getrusage(struct thread *td, int who, struct rusage *rup)
>  		calccru(p, &rup->ru_utime, &rup->ru_stime);
>  		break;
>  
> +	case RUSAGE_THREAD:
> +		PROC_SLOCK(p);
> +		ruxagg(p, td);
> +		PROC_SUNLOCK(p);
> +		thread_lock(td);
> +		*rup = td->td_ru;
> +		calcru1(p, &td->td_rux, &rup->ru_utime, &rup->ru_stime);
> +		thread_unlock(td);
> +		break;
> +
>  	default:
>  		error = EINVAL;
>  	}
> @@ -1001,7 +1011,7 @@ ruadd(struct rusage *ru, struct rusage_ext *rux, struct rusage *ru2,
>   * Aggregate tick counts into the proc's rusage_ext.
>   */
>  void
> -ruxagg(struct rusage_ext *rux, struct thread *td)
> +ruxagg_locked(struct rusage_ext *rux, struct thread *td)
>  {
>  
>  	THREAD_LOCK_ASSERT(td, MA_OWNED);
> @@ -1010,18 +1020,19 @@ ruxagg(struct rusage_ext *rux, struct thread *td)
>  	rux->rux_uticks += td->td_uticks;
>  	rux->rux_sticks += td->td_sticks;
>  	rux->rux_iticks += td->td_iticks;
> -	td->td_incruntime = 0;
> -	td->td_uticks = 0;
> -	td->td_iticks = 0;
> -	td->td_sticks = 0;
>  }
>  
>  static void
> -ruxagg_tlock(struct proc *p, struct thread *td)
> +ruxagg(struct proc *p, struct thread *td)
>  {
>  
>  	thread_lock(td);
> -	ruxagg(&p->p_rux, td);
> +	ruxagg_locked(&p->p_rux, td);
> +	ruxagg_locked(&td->td_rux, td);
> +	td->td_incruntime = 0;
> +	td->td_uticks = 0;
> +	td->td_iticks = 0;
> +	td->td_sticks = 0;
>  	thread_unlock(td);
>  }
>  
> @@ -1039,7 +1050,7 @@ rufetch(struct proc *p, struct rusage *ru)
>  	*ru = p->p_ru;
>  	if (p->p_numthreads > 0)  {
>  		FOREACH_THREAD_IN_PROC(p, td) {
> -			ruxagg_tlock(p, td);
> +			ruxagg(p, td);
>  			rucollect(ru, &td->td_ru);
>  		}
>  	}
> diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
> index 9be4c2f..b32c584 100644
> --- a/sys/kern/kern_thread.c
> +++ b/sys/kern/kern_thread.c
> @@ -432,7 +432,7 @@ thread_exit(void)
>  	PROC_UNLOCK(p);
>  	thread_lock(td);
>  	/* Save our tick information with both the thread and proc locked */
> -	ruxagg(&p->p_rux, td);
> +	ruxagg_locked(&p->p_rux, td);
>  	PROC_SUNLOCK(p);
>  	td->td_state = TDS_INACTIVE;
>  #ifdef WITNESS
> diff --git a/sys/sys/proc.h b/sys/sys/proc.h
> index fb31cfc..e32e494 100644
> --- a/sys/sys/proc.h
> +++ b/sys/sys/proc.h
> @@ -173,6 +173,27 @@ struct kdtrace_thread;
>  struct cpuset;
>  
>  /*
> + * XXX: Does this belong in resource.h or resourcevar.h instead?
> + * Resource usage extension.  The times in rusage structs in the kernel are
> + * never up to date.  The actual times are kept as runtimes and tick counts
> + * (with control info in the "previous" times), and are converted when
> + * userland asks for rusage info.  Backwards compatibility prevents putting
> + * this directly in the user-visible rusage struct.
> + *
> + * Locking for p_rux: (cj) means (j) for p_rux and (c) for p_crux.
> + * Locking for td_rux: (t) for all fields.
> + */
> +struct rusage_ext {
> +	u_int64_t	rux_runtime;    /* (cj) Real time. */
> +	u_int64_t	rux_uticks;     /* (cj) Statclock hits in user mode. */
> +	u_int64_t	rux_sticks;     /* (cj) Statclock hits in sys mode. */
> +	u_int64_t	rux_iticks;     /* (cj) Statclock hits in intr mode. */
> +	u_int64_t	rux_uu;         /* (c) Previous user time in usec. */
> +	u_int64_t	rux_su;         /* (c) Previous sys time in usec. */
> +	u_int64_t	rux_tu;         /* (c) Previous total time in usec. */
> +};
> +
> +/*
>   * Kernel runnable context (thread).
>   * This is what is put to sleep and reactivated.
>   * Thread context.  Processes may have multiple threads.
> @@ -219,7 +240,8 @@ struct thread {
>  	u_int		td_estcpu;	/* (t) estimated cpu utilization */
>  	int		td_slptick;	/* (t) Time at sleep. */
>  	int		td_blktick;	/* (t) Time spent blocked. */
> -	struct rusage	td_ru;		/* (t) rusage information */
> +	struct rusage	td_ru;		/* (t) rusage information. */
> +	struct rusage_ext td_rux;	/* (t) Internal rusage information. */
>  	uint64_t	td_incruntime;	/* (t) Cpu ticks to transfer to proc. */
>  	uint64_t	td_runtime;	/* (t) How many cpu ticks we've run. */
>  	u_int 		td_pticks;	/* (t) Statclock hits for profiling */
> @@ -426,26 +448,6 @@ do {									\
>  #define	TD_SET_CAN_RUN(td)	(td)->td_state = TDS_CAN_RUN
>  
>  /*
> - * XXX: Does this belong in resource.h or resourcevar.h instead?
> - * Resource usage extension.  The times in rusage structs in the kernel are
> - * never up to date.  The actual times are kept as runtimes and tick counts
> - * (with control info in the "previous" times), and are converted when
> - * userland asks for rusage info.  Backwards compatibility prevents putting
> - * this directly in the user-visible rusage struct.
> - *
> - * Locking: (cj) means (j) for p_rux and (c) for p_crux.
> - */
> -struct rusage_ext {
> -	u_int64_t	rux_runtime;    /* (cj) Real time. */
> -	u_int64_t	rux_uticks;     /* (cj) Statclock hits in user mode. */
> -	u_int64_t	rux_sticks;     /* (cj) Statclock hits in sys mode. */
> -	u_int64_t	rux_iticks;     /* (cj) Statclock hits in intr mode. */
> -	u_int64_t	rux_uu;         /* (c) Previous user time in usec. */
> -	u_int64_t	rux_su;         /* (c) Previous sys time in usec. */
> -	u_int64_t	rux_tu;         /* (c) Previous total time in usec. */
> -};
> -
> -/*
>   * Process structure.
>   */
>  struct proc {
> diff --git a/sys/sys/resource.h b/sys/sys/resource.h
> index 9af96af..e703744 100644
> --- a/sys/sys/resource.h
> +++ b/sys/sys/resource.h
> @@ -56,6 +56,7 @@
>  
>  #define	RUSAGE_SELF	0
>  #define	RUSAGE_CHILDREN	-1
> +#define	RUSAGE_THREAD	1
>  
>  struct rusage {
>  	struct timeval ru_utime;	/* user time used */
> diff --git a/sys/sys/resourcevar.h b/sys/sys/resourcevar.h
> index 21728aa..95a9b49 100644
> --- a/sys/sys/resourcevar.h
> +++ b/sys/sys/resourcevar.h
> @@ -131,7 +131,7 @@ void	 rucollect(struct rusage *ru, struct rusage *ru2);
>  void	 rufetch(struct proc *p, struct rusage *ru);
>  void	 rufetchcalc(struct proc *p, struct rusage *ru, struct timeval *up,
>  	    struct timeval *sp);
> -void	 ruxagg(struct rusage_ext *rux, struct thread *td);
> +void	 ruxagg_locked(struct rusage_ext *rux, struct thread *td);
>  int	 suswintr(void *base, int word);
>  struct uidinfo
>  	*uifind(uid_t uid);
>   


-- 
Alexander Krizhanovsky
NatSys Lab. (http://natsys-lab.com)
tel: +7 (916) 717-3899, +7 (499) 747-6304
e-mail: ak@natsys-lab.com




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4BE0C075.2040106>