From owner-freebsd-hackers@FreeBSD.ORG Mon May 3 16:35:29 2010 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AC3C5106566B for ; Mon, 3 May 2010 16:35:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id CA5B38FC24 for ; Mon, 3 May 2010 16:35:28 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id o43GZOI2018954 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 3 May 2010 19:35:24 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id o43GZOmS025377; Mon, 3 May 2010 19:35:24 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id o43GZOsV025376; Mon, 3 May 2010 19:35:24 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 3 May 2010 19:35:24 +0300 From: Kostik Belousov To: Alexander Krizhanovsky Message-ID: <20100503163524.GE23646@deviant.kiev.zoral.com.ua> References: <4BCBA7F8.3060109@natsys-lab.com> <20100425145929.GJ2422@deviant.kiev.zoral.com.ua> <4BDF2E4C.4090009@natsys-lab.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="T6xhMxlHU34Bk0ad" Content-Disposition: inline In-Reply-To: <4BDF2E4C.4090009@natsys-lab.com> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-2.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_50, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-hackers@freebsd.org, ur4ina@gmail.com Subject: Re: [PATCH] RUSAGE_THREAD X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 03 May 2010 16:35:29 -0000 --T6xhMxlHU34Bk0ad Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, May 03, 2010 at 08:13:00PM +0000, Alexander Krizhanovsky wrote: > Kostik, >=20 > thank you very much for the review! >=20 > Kostik Belousov wrote: > >On Mon, Apr 19, 2010 at 12:46:48AM +0000, Alexander Krizhanovsky wrote: > > =20 > >>Hi all! > >> > >>This patch implements per-thread rusage statistic (like RUSAGE_THREAD i= n=20 > >>Linux and RUSAGE_LWP in OpenSolaris). > >> > >>Unfortunately, we have to acquire a number of locks to read/update=20 > >>system and user times for current thread rusage information because it'= s=20 > >>also used for whole process statistic and needs to be zeroed. > >> > >>Any comments are very appreciated. > >> > >>It's tested against 8.0-RELEASE. Appropriate PR is submitted. > >> > >>--=20 > >>Alexander Krizhanovsky > >>NatSys Lab. (http://natsys-lab.com) > >>tel: +7 (916) 717-3899, +7 (499) 747-6304 > >>e-mail: ak@natsys-lab.com > >> > >> =20 > >I think this should be somewhat modified before commit. > > > >First, please separate patch into two pieces. One would introduce > >ruxagg_tlock() helper and use it in existing locations instead of > >three-liners. Possibly, add K&R->ANSI C kern_getrusage definition > >change. > > > >Second would actually add RUSAGE_THREAD. There, please follow > >the style(9), in particular, do not initialize the local variables > >at the declaration, instead, use explicit initialization in the code. > > =20 > Done. I have also introduced third patch for casting microseconds to=20 > timeval and replaced a few appropriate places in whole kernel code. > patch-rusage-thread-03052010.txt is incremental for=20 > patch-ruxagg_tlock-03052010.txt, which is by turn incremental for=20 > patch-usec2timeval-03052010.txt. >=20 > I have made following change for calcru1(): > - if ((int64_t)tu < 0) { > - /* XXX: this should be an assert /phk */ > - printf("calcru: negative runtime of %jd usec for pid %d= =20 > (%s)\n", > - (intmax_t)tu, p->p_pid, p->p_comm); > - tu =3D ruxp->rux_tu; > - } > + KASSERT((int64_t)tu < 0, ("calcru: negative runtime of %jd usec " > + "for pid %d (%s)\n", > + (intmax_t)tu, p->p_pid, p->p_comm)); >=20 > tu can become negative at about 300 thousand years of uptime, so this=20 > condition seems quite unrealistic and indeed should be replaced by an=20 > assertion. However I didn't understand why it isn't done so far and the= =20 > comment only was added. Did I miss something? >=20 > >Should calctru() share the code with calcru1() ? I am esp. concerned > >with sanity check like negative runtime etc. Possibly, this code > >from calcru1() should be separated into function used from both > >calcru1() and calctru(). > > > >Man page for getrusage(2) should be updated. > > =20 > It's added to patch-rusage-thread-03052010.txt. >=20 > In the end - shoud I raise a new PR (the original one is kern/145813)? >=20 > >Thanks for the submission. It was some time, I already committed ruxagg_tlock() part, that caused some feedback, and I reworked the rest of the patch myself. See svn-src@ for some discussion, and the latest version that I intend to commit shortly is below. Your comments are welcome. Lets discuss third patch after this is landed. diff --git a/lib/libc/sys/getrusage.2 b/lib/libc/sys/getrusage.2 index bdf5d45..423503f 100644 --- a/lib/libc/sys/getrusage.2 +++ b/lib/libc/sys/getrusage.2 @@ -28,7 +28,7 @@ .\" @(#)getrusage.2 8.1 (Berkeley) 6/4/93 .\" $FreeBSD$ .\" -.Dd June 4, 1993 +.Dd May 1, 2010 .Dt GETRUSAGE 2 .Os .Sh NAME @@ -42,6 +42,7 @@ .In sys/resource.h .Fd "#define RUSAGE_SELF 0" .Fd "#define RUSAGE_CHILDREN -1" +.Fd "#define RUSAGE_THREAD 1" .Ft int .Fn getrusage "int who" "struct rusage *rusage" .Sh DESCRIPTION @@ -49,11 +50,12 @@ The .Fn getrusage system call returns information describing the resources utilized by the current -process, or all its terminated child processes. +thread, the current process, or all its terminated child processes. The .Fa who argument is either -.Dv RUSAGE_SELF +.Dv RUSAGE_THREAD , +.Dv RUSAGE_SELF , or .Dv RUSAGE_CHILDREN . The buffer to which @@ -175,6 +177,10 @@ The .Fn getrusage system call appeared in .Bx 4.2 . +The +.Dv RUSAGE_THREAD +facility first appeared in +.Fx 8.1 . .Sh BUGS There is no way to obtain information about a child process that has not yet terminated. diff --git a/sys/kern/kern_proc.c b/sys/kern/kern_proc.c index 49a3097..6c50f1f 100644 --- a/sys/kern/kern_proc.c +++ b/sys/kern/kern_proc.c @@ -901,7 +901,7 @@ fill_kinfo_thread(struct thread *td, struct kinfo_proc = *kp, int preferthread) kp->ki_pri.pri_user =3D td->td_user_pri; =20 if (preferthread) { - kp->ki_runtime =3D cputick2usec(td->td_runtime); + kp->ki_runtime =3D td->td_rux.rux_runtime; kp->ki_pctcpu =3D sched_pctcpu(td); kp->ki_estcpu =3D td->td_estcpu; } diff --git a/sys/kern/kern_resource.c b/sys/kern/kern_resource.c index a3ed75d..0bc78d0 100644 --- a/sys/kern/kern_resource.c +++ b/sys/kern/kern_resource.c @@ -76,7 +76,7 @@ static void calcru1(struct proc *p, struct rusage_ext *ru= xp, struct timeval *up, struct timeval *sp); static int donice(struct thread *td, struct proc *chgp, int n); static struct uidinfo *uilookup(uid_t uid); -static void ruxagg_tlock(struct proc *p, struct thread *td); +static void ruxagg(struct proc *p, struct thread *td); =20 /* * Resource controls and accounting. @@ -630,7 +630,7 @@ lim_cb(void *arg) return; PROC_SLOCK(p); FOREACH_THREAD_IN_PROC(p, td) { - ruxagg_tlock(p, td); + ruxagg(p, td); } PROC_SUNLOCK(p); if (p->p_rux.rux_runtime > p->p_cpulimit * cpu_tickrate()) { @@ -841,7 +841,7 @@ calcru(struct proc *p, struct timeval *up, struct timev= al *sp) FOREACH_THREAD_IN_PROC(p, td) { if (td->td_incruntime =3D=3D 0) continue; - ruxagg_tlock(p, td); + ruxagg(p, td); } calcru1(p, &p->p_rux, up, sp); } @@ -961,6 +961,16 @@ kern_getrusage(struct thread *td, int who, struct rusa= ge *rup) calccru(p, &rup->ru_utime, &rup->ru_stime); break; =20 + case RUSAGE_THREAD: + PROC_SLOCK(p); + ruxagg(p, td); + PROC_SUNLOCK(p); + thread_lock(td); + *rup =3D td->td_ru; + calcru1(p, &td->td_rux, &rup->ru_utime, &rup->ru_stime); + thread_unlock(td); + break; + default: error =3D EINVAL; } @@ -1001,7 +1011,7 @@ ruadd(struct rusage *ru, struct rusage_ext *rux, stru= ct rusage *ru2, * Aggregate tick counts into the proc's rusage_ext. */ void -ruxagg(struct rusage_ext *rux, struct thread *td) +ruxagg_locked(struct rusage_ext *rux, struct thread *td) { =20 THREAD_LOCK_ASSERT(td, MA_OWNED); @@ -1010,18 +1020,19 @@ ruxagg(struct rusage_ext *rux, struct thread *td) rux->rux_uticks +=3D td->td_uticks; rux->rux_sticks +=3D td->td_sticks; rux->rux_iticks +=3D td->td_iticks; - td->td_incruntime =3D 0; - td->td_uticks =3D 0; - td->td_iticks =3D 0; - td->td_sticks =3D 0; } =20 static void -ruxagg_tlock(struct proc *p, struct thread *td) +ruxagg(struct proc *p, struct thread *td) { =20 thread_lock(td); - ruxagg(&p->p_rux, td); + ruxagg_locked(&p->p_rux, td); + ruxagg_locked(&td->td_rux, td); + td->td_incruntime =3D 0; + td->td_uticks =3D 0; + td->td_iticks =3D 0; + td->td_sticks =3D 0; thread_unlock(td); } =20 @@ -1039,7 +1050,7 @@ rufetch(struct proc *p, struct rusage *ru) *ru =3D p->p_ru; if (p->p_numthreads > 0) { FOREACH_THREAD_IN_PROC(p, td) { - ruxagg_tlock(p, td); + ruxagg(p, td); rucollect(ru, &td->td_ru); } } diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c index 9be4c2f..b32c584 100644 --- a/sys/kern/kern_thread.c +++ b/sys/kern/kern_thread.c @@ -432,7 +432,7 @@ thread_exit(void) PROC_UNLOCK(p); thread_lock(td); /* Save our tick information with both the thread and proc locked */ - ruxagg(&p->p_rux, td); + ruxagg_locked(&p->p_rux, td); PROC_SUNLOCK(p); td->td_state =3D TDS_INACTIVE; #ifdef WITNESS diff --git a/sys/sys/proc.h b/sys/sys/proc.h index fb31cfc..e32e494 100644 --- a/sys/sys/proc.h +++ b/sys/sys/proc.h @@ -173,6 +173,27 @@ struct kdtrace_thread; struct cpuset; =20 /* + * XXX: Does this belong in resource.h or resourcevar.h instead? + * Resource usage extension. The times in rusage structs in the kernel are + * never up to date. The actual times are kept as runtimes and tick counts + * (with control info in the "previous" times), and are converted when + * userland asks for rusage info. Backwards compatibility prevents putting + * this directly in the user-visible rusage struct. + * + * Locking for p_rux: (cj) means (j) for p_rux and (c) for p_crux. + * Locking for td_rux: (t) for all fields. + */ +struct rusage_ext { + u_int64_t rux_runtime; /* (cj) Real time. */ + u_int64_t rux_uticks; /* (cj) Statclock hits in user mode. */ + u_int64_t rux_sticks; /* (cj) Statclock hits in sys mode. */ + u_int64_t rux_iticks; /* (cj) Statclock hits in intr mode. */ + u_int64_t rux_uu; /* (c) Previous user time in usec. */ + u_int64_t rux_su; /* (c) Previous sys time in usec. */ + u_int64_t rux_tu; /* (c) Previous total time in usec. */ +}; + +/* * Kernel runnable context (thread). * This is what is put to sleep and reactivated. * Thread context. Processes may have multiple threads. @@ -219,7 +240,8 @@ struct thread { u_int td_estcpu; /* (t) estimated cpu utilization */ int td_slptick; /* (t) Time at sleep. */ int td_blktick; /* (t) Time spent blocked. */ - struct rusage td_ru; /* (t) rusage information */ + struct rusage td_ru; /* (t) rusage information. */ + struct rusage_ext td_rux; /* (t) Internal rusage information. */ uint64_t td_incruntime; /* (t) Cpu ticks to transfer to proc. */ uint64_t td_runtime; /* (t) How many cpu ticks we've run. */ u_int td_pticks; /* (t) Statclock hits for profiling */ @@ -426,26 +448,6 @@ do { \ #define TD_SET_CAN_RUN(td) (td)->td_state =3D TDS_CAN_RUN =20 /* - * XXX: Does this belong in resource.h or resourcevar.h instead? - * Resource usage extension. The times in rusage structs in the kernel are - * never up to date. The actual times are kept as runtimes and tick counts - * (with control info in the "previous" times), and are converted when - * userland asks for rusage info. Backwards compatibility prevents putting - * this directly in the user-visible rusage struct. - * - * Locking: (cj) means (j) for p_rux and (c) for p_crux. - */ -struct rusage_ext { - u_int64_t rux_runtime; /* (cj) Real time. */ - u_int64_t rux_uticks; /* (cj) Statclock hits in user mode. */ - u_int64_t rux_sticks; /* (cj) Statclock hits in sys mode. */ - u_int64_t rux_iticks; /* (cj) Statclock hits in intr mode. */ - u_int64_t rux_uu; /* (c) Previous user time in usec. */ - u_int64_t rux_su; /* (c) Previous sys time in usec. */ - u_int64_t rux_tu; /* (c) Previous total time in usec. */ -}; - -/* * Process structure. */ struct proc { diff --git a/sys/sys/resource.h b/sys/sys/resource.h index 9af96af..e703744 100644 --- a/sys/sys/resource.h +++ b/sys/sys/resource.h @@ -56,6 +56,7 @@ =20 #define RUSAGE_SELF 0 #define RUSAGE_CHILDREN -1 +#define RUSAGE_THREAD 1 =20 struct rusage { struct timeval ru_utime; /* user time used */ diff --git a/sys/sys/resourcevar.h b/sys/sys/resourcevar.h index 21728aa..95a9b49 100644 --- a/sys/sys/resourcevar.h +++ b/sys/sys/resourcevar.h @@ -131,7 +131,7 @@ void rucollect(struct rusage *ru, struct rusage *ru2); void rufetch(struct proc *p, struct rusage *ru); void rufetchcalc(struct proc *p, struct rusage *ru, struct timeval *up, struct timeval *sp); -void ruxagg(struct rusage_ext *rux, struct thread *td); +void ruxagg_locked(struct rusage_ext *rux, struct thread *td); int suswintr(void *base, int word); struct uidinfo *uifind(uid_t uid); --T6xhMxlHU34Bk0ad Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (FreeBSD) iEYEARECAAYFAkve+0sACgkQC3+MBN1Mb4hhTQCdHxQjkIky/1btVbOxD+rJK8Eg W3wAoOfLQpeAaYAMFX85U2Bph/HHyj8b =rSe2 -----END PGP SIGNATURE----- --T6xhMxlHU34Bk0ad--