Date: Thu, 10 Nov 2005 11:45:47 +0000 (GMT)
From: Gavin Atkinson
To: Robert Watson
Cc: freebsd-current@FreeBSD.org, ticso@cicely.de
Subject: Re: Poor NFS server performance in 6.0 with SMP and mpsafenet=1
Message-ID: <20051104181424.R93043@ury.york.ac.uk>
In-Reply-To: <20051102164157.A18382@fledge.watson.org>

On Wed, 2 Nov 2005, Robert Watson wrote:

> On Wed, 2 Nov 2005, Gavin Atkinson wrote:
>> On Wed, 2005-11-02 at 16:23 +0100, Bernd Walter wrote:
>>> On Wed, Nov 02, 2005 at 02:58:36PM +0000, Gavin Atkinson wrote:
>>>> I'm seeing incredibly poor performance when serving files from an SMP
>>>> FreeBSD 6.0RC1 server to a Solaris 10 client.  I've done some
>>>> experimenting and have discovered that either removing SMP from the
>>>> kernel, or setting debug.mpsafenet=0 in loader.conf massively improves
>>>> the speed.  Switching preemption off seems to also help.
>>> Which scheduler?
>>
>> BSD.  As I say, I'm running 6.0-RC1 with the standard GENERIC kernel, apart
>> from the options I have listed as being changed above.  Polling is
>> therefore also not enabled.
>
> This does sound like a scheduling problem.  I realize it's time-consuming,
> but would it be possible to have you run each of the above test cases twice
> more (or maybe even once) to confirm that in each case, the result is
> reproduceable?  I've recently been looking at a scheduling problem relating
> to PREEMPTION and the netisr for loopback traffic, and is basically a result
> of poorly timed context switching ending up being a worst cast scenario.  I
> suspect something similar is likely here.  Have you tried varying the number
> of nfsd worker threads on the server to see how that changes matters?

No problem.  Sorry it's taken so long to get back to you, it's been a
hectic week :(

Anyway, the trend is consistently reproducible, although the results
themselves can vary between runs in the SMP/mpsafenet cases by as much as
20%.
Here are the averages of three reruns, which I've also done for ULE:

                                        4BSD     ULE
No SMP, mpsafenet=1                     78.7    62.7
No SMP, mpsafenet=0                     71.1    76.0
No SMP, mpsafenet=1, no PREEMPTION      54.7    55.5
No SMP, mpsafenet=0, no PREEMPTION      73.6    77.6
SMP, mpsafenet=1                       346.5   309.5
SMP, mpsafenet=0                        56.9    88.4
SMP, mpsafenet=1, no PREEMPTION        320.2   136.6
SMP, mpsafenet=0, no PREEMPTION         57.0    77.9

The above are results for 4 nfsd servers (nfsd -n 4).  It turns out that
you were correct in thinking that the number of nfsd processes would make
a difference.  Here are some timings for the GENERIC+SMP kernel (i.e. with
PREEMPTION/4BSD, the slowest one above), with varying numbers of processes:

    1       2       4       8       12      16
    52.8    59.2    319.3   356.1   377.3   388.1

As before, all tests were done with a freshly rebooted server and with a
single "dry run" transfer to warm the vm cache up.  The file transferred
each time is 512MB worth of /dev/random output.

I'm actually quite surprised about how much difference reducing the number
of threads made.

Does all of this information help track down the cause of the problem?
I'm happy to time more transfers with different configs if you want to
explore other avenues.

Thanks,

Gavin
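
As a rough aid to reading the two tables above, the relative slowdowns can be worked out with a short Python sketch. The numbers are copied from the tables; the dictionary layout and the names `timings`, `slowdown`, and `nfsd_timings` are illustrative assumptions, not part of any FreeBSD tool, and the units are whatever the tables report, since only the ratios are used.

```python
# Averaged timings copied from the first table, keyed by (config, scheduler).
# Each entry holds the mpsafenet=1 and mpsafenet=0 figures for that config.
timings = {
    ("No SMP", "4BSD"): {1: 78.7, 0: 71.1},
    ("No SMP", "ULE"): {1: 62.7, 0: 76.0},
    ("No SMP, no PREEMPTION", "4BSD"): {1: 54.7, 0: 73.6},
    ("No SMP, no PREEMPTION", "ULE"): {1: 55.5, 0: 77.6},
    ("SMP", "4BSD"): {1: 346.5, 0: 56.9},
    ("SMP", "ULE"): {1: 309.5, 0: 88.4},
    ("SMP, no PREEMPTION", "4BSD"): {1: 320.2, 0: 57.0},
    ("SMP, no PREEMPTION", "ULE"): {1: 136.6, 0: 77.9},
}


def slowdown(entry):
    """Ratio of the mpsafenet=1 timing to the mpsafenet=0 timing."""
    return entry[1] / entry[0]


# A ratio above 1 means mpsafenet=1 was slower than mpsafenet=0.
ratios = {key: slowdown(entry) for key, entry in timings.items()}

# Second table: GENERIC+SMP kernel (4BSD, PREEMPTION), keyed by nfsd count.
nfsd_timings = {1: 52.8, 2: 59.2, 4: 319.3, 8: 356.1, 12: 377.3, 16: 388.1}

# Step-to-step growth between consecutive nfsd counts in the table.
counts = sorted(nfsd_timings)
steps = {
    (a, b): nfsd_timings[b] / nfsd_timings[a]
    for a, b in zip(counts, counts[1:])
}

if __name__ == "__main__":
    for (config, sched), ratio in ratios.items():
        print(f"{config:<24} {sched:<5} {ratio:5.2f}x")
    for (a, b), ratio in steps.items():
        print(f"nfsd -n {a} -> {b}: {ratio:5.2f}x")
```

Read this way, the SMP rows stand apart from the non-SMP rows, and the largest single jump in the second table falls between 2 and 4 nfsd processes.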