Date: Sat, 2 Dec 2006 11:07:27 +1100 (EST)
From: Bruce Evans
To: Ivan Voras
Cc: Robert Watson, freebsd-arch@FreeBSD.org
Subject: Re: What is the PREEMPTION option good for?
Message-ID: <20061202094431.O16375@delplex.bde.org>
In-Reply-To: <45701A49.5020809@fer.hr>

On Fri, 1 Dec 2006, Ivan Voras wrote:

> Robert Watson wrote:
>
>> They're independent twiddles, and can be frobbed separately. If you can
>> easily measure performance in the different configurations, seeing a
>> table of permutations and results would be very nice to see what happens
>> :-).
>
> Ok, this is what I found:
>
> - ipiwakeup doesn't produce differences as calculated by ministat
> - turning off preemption produces visible differences, which are
>   calculated by ministat to be upto 10%.

10% is surprisingly high.

I found another setup where PREEMPTION (should) help -- nfs servers. For
building kernels, PREEMPTION on the client is just a tiny pessimization,
but network latency is a problem for nfs and not having PREEMPTION
configured makes it worse. PREEMPTION is needed even to give correct
scheduling of interrupt threads, and that seems to be all that it gives,
at least in the !KSE case, though the main comment about it says
otherwise. From kern_switch.c:

% int
% maybe_preempt(struct thread *td)
% {
% ...
%      *  [... conditions for preempting]
%      *  - If the new thread's priority is not a realtime priority and
                                            ^^^^^^^^^^^^^^^^^^^^^^^
%      *    the current thread's priority is not an idle priority and
%      *    FULL_PREEMPTION is disabled.
% ...
% #ifndef FULL_PREEMPTION
%     if (pri > PRI_MAX_ITHD && cpri < PRI_MIN_IDLE)
          ^^^^^^^^^^^^^^^^^^
%         return (0);
% #endif

The condition in the code is very far from being a realtime priority.
"Realtime priority" is a technical term meaning "a user thread whose
scheduling class is PRI_REALTIME" and there is a classification macro
PRI_IS_REALTIME() for such priorities. Of course, "realtime priority" in
the comment doesn't mean that -- it means something more informal, which
I would expect to include all kernel threads and all realtime priority
user threads. But the condition in the code is just "not an interrupt
thread". I don't understand maybe_preempt_in_ksegrp() and have KSE
unconfigured.

FULL_PREEMPTION is apparently needed to get kernel threads preempted by
anything other than interrupt threads. It is not the default, apparently
because it pessimizes more cases than PREEMPTION.
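To make the gap between the comment and the code concrete, here is a minimal user-space sketch of the quoted test. It is not the kernel code. The sk_* names are hypothetical, the numeric priority boundaries are assumptions modelled on the usual layout of sys/priority.h (lower number = higher priority) and should be checked against the tree in question, and the "strictly higher priority" check stands in for the conditions elided by "..." above.

```c
#include <assert.h>

/*
 * ASSUMED priority boundaries (lower number = higher priority).
 * Illustrative values only; verify against sys/priority.h.
 */
enum {
	SK_PRI_MIN          = 0,
	SK_PRI_MAX_ITHD     = 63,	/* last interrupt-thread priority */
	SK_PRI_MIN_REALTIME = 128,	/* realtime user range starts here */
	SK_PRI_MAX_REALTIME = 159,
	SK_PRI_MIN_IDLE     = 224,	/* idle range starts here */
	SK_PRI_MAX          = 255
};

/* Hypothetical helper: is pri inside the (assumed) realtime user range? */
static int
sk_in_realtime_range(int pri)
{
	return (pri >= SK_PRI_MIN_REALTIME && pri <= SK_PRI_MAX_REALTIME);
}

/*
 * Hypothetical model of the quoted check.  Returns 1 if a newly runnable
 * thread at priority pri would preempt the running thread at priority
 * cpri, 0 otherwise.
 */
static int
sk_would_preempt(int pri, int cpri, int full_preemption)
{
	assert(pri >= SK_PRI_MIN && pri <= SK_PRI_MAX);
	assert(cpri >= SK_PRI_MIN && cpri <= SK_PRI_MAX);

	/* ASSUMPTION: only a strictly higher priority thread may preempt. */
	if (pri >= cpri)
		return (0);

	/* Models the "#ifndef FULL_PREEMPTION" early return quoted above. */
	if (!full_preemption &&
	    pri > SK_PRI_MAX_ITHD && cpri < SK_PRI_MIN_IDLE)
		return (0);

	return (1);
}
```

Under these assumed values, sk_would_preempt(130, 180, 0) returns 0 even though sk_in_realtime_range(130) is true: without FULL_PREEMPTION a priority in the realtime range does not preempt a timesharing thread, while an interrupt-thread priority such as 20 does. That is the sense in which the code tests "not an interrupt thread" rather than "not a realtime priority".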
Anyway, with kernels already optimized by about 30% for nfs (mainly in
the client), my ~5.2 UP kernel (with working preemption to interrupt
threads, unlike 5.2) used as the server beats a -current UP kernel
(without PREEMPTION) by about 3% in real time and 30% in dead time for
building kernels with a -current SMP kernel (without PREEMPTION) as the
client. The difference is entirely due to dead time somewhere in nfs.
Unfortunately, turning on PREEMPTION and IPI_PREEMPTION didn't recover
all the lost performance. This is despite the ~current kernel having
slightly lower latency for flood pings and similar optimizations for nfs
that reduce the RPC count by a factor of 4 and the ping latency by a
factor of 2.

In previously clipped context, Robert Watson wrote:

> There's a known performance regression with PREEMPTION and loopback network
> traffic on UP or UP-like systems due to a poor series of context switches
> occuring in the network stack. If your benchmark involves the above web load
> over the loopback, that could be the source of what you're seeing. If it's
> not loopback traffic, then that's not the source of the problem.

I see only a slight additional loss of performance since ~5.2 for
loopback. Approximate latencies for flood pings:

    Celeron 366:   RELENG_3: 14uS; RELENG_4: 19uS; current-2006/04/16: 48uS
    AthlonXP 2223: RELENG_4: 2uS; 4-5uS ... ... -current 5-6uS

> You might try fiddling with kern.sched.ipiwakeup.enabled and see what the
> effect is, btw -- this controls whether or not the scheduler wakes up another
> idle CPU to run a thread when waking up that thread, rather than queuing it to
> run which may occur on the other CPU at the next clock tick.

kern.sched.ipiwakeup.enabled seems to be the default. Does it work
without IPI_PREEMPTION? Is the rescheduling of even interrupt threads
really delayed until the next clock tick? I guess it is -- scheduling
delays are normally good for efficiency.
I use HZ = 100 which might delay scheduling more than the default, but I
think you mean scheduling clock ticks and stathz is normally only 128 Hz.
Scheduling also occurs on other (non-fast) interrupts. Maybe the fast
interrupt handlers in some network drivers work better mainly because
they do more forceful scheduling (of the task queue thread) than now
happens for normal interrupt handlers.

Bruce