Date: Sun, 17 Jun 2007 16:37:08 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Jeff Roberson <jroberson@chesapeake.net>
Cc: Attilio Rao <attilio@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Updated rusage patch
Message-ID: <20070617153238.K21498@besplex.bde.org>
In-Reply-To: <20070606152352.H606@10.0.0.1>
References: <20070529105856.L661@10.0.0.1> <200705291456.38515.jhb@freebsd.org>
 <20070529121653.P661@10.0.0.1> <20070530065423.H93410@delplex.bde.org>
 <20070529141342.D661@10.0.0.1> <20070530125553.G12128@besplex.bde.org>
 <20070529201255.X661@10.0.0.1> <20070529220936.W661@10.0.0.1>
 <20070530201618.T13220@besplex.bde.org> <20070530115752.F661@10.0.0.1>
 <20070531091419.S826@besplex.bde.org> <20070531010631.N661@10.0.0.1>
 <20070601154833.O4207@besplex.bde.org> <20070601014601.I799@10.0.0.1>
 <20070601200348.G6201@delplex.bde.org> <20070601123530.B606@10.0.0.1>
 <20070604160036.N1084@besplex.bde.org> <46652D17.5090903@FreeBSD.org>
 <20070605214404.X47001@delplex.bde.org> <20070606152352.H606@10.0.0.1>

On Wed, 6 Jun 2007, Jeff Roberson wrote:

> I'd like to make a list of the remaining problems with rusage and potential
> fixes.  Then we can decide which ones myself and attilio will resolve
> immediately to clean up some of the effect of the sched lock changes.

I haven't verified which of these fixes is necessary and/or has been
done yet.  The list is a bit incomplete.  3 more minor problems turned
up (one caused by applying one of these fixes?).

(1) Results of some makeworlds run after the thread lock changes, all
with fixes for pagezero (best previous result 827 seconds; results
without touching pagezero ~845 seconds without PREEMPTION; ~837 seconds
with PREEMPTION).  Only the differences in the following results are
interesting.

% Sat Jun  9 03:28:33 UTC 2007:
%       831.61 real      1308.57 user       184.80 sys
%   1320199  voluntary context switches
%   1533639  involuntary context switches
% pgzero time 7 seconds

Base result.

% Wed Jun 13 14:52:15 UTC 2007:
%       833.97 real      1291.71 user       201.64 sys
%   1329247  voluntary context switches
%   1518959  involuntary context switches
% pgzero time 7 seconds

Some change between June 9 and June 13 made a big difference to the
user+sys decomposition.  I think the June 9 result is more correct.

% Wed Jun 13 14:52:15 UTC 2007:
% Same kernel as previous with HZ = 1000 (HZ = 100 except as noted); stathz = 100
%       836.24 real      1310.22 user       191.04 sys
%   1323793  voluntary context switches
%   1559229  involuntary context switches
% pgzero time 7 seconds

The accuracy of the decomposition depends mainly on stathz (the
decomposition is based on statclock tick counts, and there is a
significant bias towards system time when the tick counts are all 0 --
see calcru1() -- which is reduced by increasing stathz).  I forgot that
stathz != HZ and tried the HZ = 1000 pessimization to fix it.  This
somehow gave the old decomposition.
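
The tick-based decomposition and its bias can be sketched in a few lines
of C.  This is a simplified model and not the kernel's calcru1(): the
struct ru_split type and the split_runtime() function are hypothetical
names, and charging one system tick when no ticks were sampled is an
assumption about how the all-zero case is handled, to be checked against
the real source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result container; not a kernel type. */
struct ru_split {
	uint64_t user_us;	/* microseconds charged as user time */
	uint64_t sys_us;	/* microseconds charged as system time */
	uint64_t intr_us;	/* microseconds charged as interrupt time */
};

/*
 * Split total_us of measured runtime in proportion to the statclock
 * tick counts.  Assumption: with no ticks at all, pretend there was a
 * single system tick so that the division is defined.
 */
static struct ru_split
split_runtime(uint64_t total_us, uint64_t uticks, uint64_t sticks,
    uint64_t iticks)
{
	struct ru_split r;
	uint64_t tt = uticks + sticks + iticks;

	if (tt == 0) {
		/* Source of the bias: all of the time becomes system time. */
		sticks = 1;
		tt = 1;
	}
	/* Guard the multiplications below against 64-bit overflow. */
	assert(total_us <= UINT64_MAX / tt);

	r.user_us = total_us * uticks / tt;
	r.sys_us = total_us * sticks / tt;
	/* The remainder goes to interrupt time so the parts sum to total_us. */
	r.intr_us = total_us - r.user_us - r.sys_us;
	return (r);
}
```

In this model a short run that never sees a statclock tick, such as
split_runtime(5000, 0, 0, 0), is charged entirely to system time no
matter what it actually did.  A higher stathz makes zero-tick runs rarer,
which is why it reduces the bias.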

(2) By reading the code, in sched_throw() (from sched_4bsd.c; the
version in sched_ule.c is identical; duplicating this is another bug):

% /*
%  * A CPU is entering for the first time or a thread is exiting.
%  */
% void
% sched_throw(struct thread *td)
% {
% 	/*
% 	 * Correct spinlock nesting.  The idle thread context that we are
% 	 * borrowing was created so that it would start out with a single
% 	 * spin lock (sched_lock) held in fork_trampoline().  Since we've
% 	 * explicitly acquired locks in this function, the nesting count
% 	 * is now 2 rather than 1.  Since we are nested, calling
% 	 * spinlock_exit() will simply adjust the counts without allowing
% 	 * spin lock using code to interrupt us.
% 	 */
% 	if (td == NULL) {
% 		mtx_lock_spin(&sched_lock);
% 		spinlock_exit();
% 	} else {
% 		MPASS(td->td_lock == &sched_lock);
% 	}

Comment doesn't match code (comment only applies to td == NULL case).

% 	mtx_assert(&sched_lock, MA_OWNED);
% 	KASSERT(curthread->td_md.md_spinlock_count == 1, ("invalid count"));
% 	PCPU_SET(switchtime, cpu_ticks());
% 	PCPU_SET(switchticks, ticks);
% 	cpu_throw(td, choosethread());	/* doesn't return */
% }

Setting switchtime, etc., here loses the delta between the current time
and switchtime.  Old code only sets switchtime when a CPU is entering
for the first time.  switchtime is normally not actually a switch time,
but is set by thread_exit() just before calling here.  Not much time
should be lost from this, but lots seems to be in practice.  According
to a benchmark that does 100000 fork/wait/exits:

         2.99 real         0.13 user         2.78 sys

About 3% of the time is not accounted for.  Interrupt and kernel thread
time can only account for < 1%.  Old code didn't get this nearly right
either, despite my attempts to minimize the unaccounted-for time.
Fixing it should be easier now.  Of course, the part of the time for
exiting cannot _all_ be accounted to the exiting thread.
I want as much of it as possible to go there and the rest to the next
thread (which might be idlethread in general, so the time would be
almost invisible, but for the fork-wait-exit benchmark the fork-wait
thread should always be switched to next to complete its wait()).

(3) Bugs found while grepping near cpu_throw:
- kern_thread.c has cpu_throw() hard-coded in 4 comments and one string,
   but now only calls sched_throw().
- sched_throw() is not declared as non-returning in sys/sched.h.
- kern_thread.c has a bogus panic and NOTREACHED comment after
   sched_throw() doesn't return.

Bruce
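
The lost interval described in (2) can be modelled with a toy program.
This is only a sketch under stated assumptions, not kernel code: the
struct cpu_acct and struct thr_acct types and all three functions are
hypothetical, and the model assumes that thread_exit() charges the
exiting thread up to the moment it sets switchtime.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the per-CPU and per-thread accounting state. */
struct cpu_acct {
	uint64_t switchtime;	/* start of the interval not yet charged */
};

struct thr_acct {
	uint64_t runtime;	/* ticks charged to this thread so far */
};

/* Charge the elapsed interval to td, then start a new interval at now. */
static void
charge_and_reset(struct cpu_acct *pc, struct thr_acct *td, uint64_t now)
{
	td->runtime += now - pc->switchtime;
	pc->switchtime = now;
}

/*
 * Start a new interval without charging anyone, which is the effect of
 * PCPU_SET(switchtime, cpu_ticks()) in the quoted sched_throw().
 */
static void
reset_only(struct cpu_acct *pc, uint64_t now)
{
	pc->switchtime = now;
}

/*
 * Model an exit: the thread runs from t_start, is charged at t_exit
 * (assumed to happen in thread_exit()), and the throw happens at
 * t_throw.  Returns the number of ticks charged to nobody.
 */
static uint64_t
exit_path_unaccounted(uint64_t t_start, uint64_t t_exit, uint64_t t_throw,
    struct thr_acct *exiting)
{
	struct cpu_acct pc = { .switchtime = t_start };
	uint64_t before;

	charge_and_reset(&pc, exiting, t_exit);
	before = pc.switchtime;
	reset_only(&pc, t_throw);
	return (pc.switchtime - before);
}
```

With t_start = 100, t_exit = 1000 and t_throw = 1030, the exiting thread
is charged 900 ticks and 30 ticks are dropped.  Not resetting switchtime
in the throw path would instead leave those 30 ticks to be charged to
whichever thread is switched to next.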