From owner-freebsd-net@FreeBSD.ORG Tue Feb 24 23:26:05 2015
Date: Tue, 24 Feb 2015 18:25:57 -0500 (EST)
From: Rick Macklem
To: Tim Borgeaud
Cc: freebsd-net@freebsd.org, Alexander Motin, "Kenneth D. Merry", Mark Hills
Subject: Re: NFS: kernel modules (loading/unloading) and scheduling
Message-ID: <388835013.10159778.1424820357923.JavaMail.root@uoguelph.ca>
List-Id: Networking and TCP/IP with FreeBSD

Tim Borgeaud wrote:
> Hi FreeBSD folks,
>
> here at Framestore Mark Hills (cc'd) has been looking at how NFS servers
> schedule/prioritize incoming requests with a view towards a client/user
> 'fair share' of a service.
>
> We are taking a look at trying out some simple approaches to queuing up
> and handling requests.
>
> There are, obviously, various matters to deal with here, and several
> complications. However, an initial consideration is how we might best
> develop and test with the FreeBSD kernel code. Specifically, whether we
> would be able to unload and reload NFS related kernel modules, or whether
> there are any other alternatives.
>
> It looks like there are several modules involved, such as krpc, nfssvc,
> nfscommon and nfslock, which we can build outside of the kernel itself,
> but which do not all support unloading.
>
> Not being able to reload a module does seem to present a hurdle to
> development and, therefore, I'd like to know how FreeBSD developers have
> managed development of the NFS functionality and what approaches may be
> recommended.
>
> It occurs to me that it would be possible for us to consider adding some
> ability to unload some of these modules, even if it were not suitable for
> anything other than development. Therefore, an extension of my main, more
> general, query is to ask how straightforward or fundamentally difficult
> this may be (for the NFS modules)?
>
Well, the ones that don't allow unloading do so because they can't safely
be unloaded.
For most (maybe all) of them, the problem is that there is no way for the
module to know whether an RPC is in progress, so unloading them while
traffic is still arriving from clients could cause a crash. If you can
block all incoming NFS RPC requests (maybe just turn off the net
interface(s)?), then you could probably unload/reload them without causing
any disaster. (Since this hasn't been done, there may be some memory leaks
and maybe a variable that needs to be re-initialized on load.) If you grep
for MOD_UNLOAD, you can probably find the code that returns an error and
just comment that out.
--> Then you can try it and see whether it crashes on the unload...
I would have thought a server reboot wouldn't take much longer than the
unload/reload and would do the same thing safely.

Now, on the "big picture":
- I think you'll find that the "interesting case" is when there are several
  RPC requests to choose from. However, the catch-22 here is that, by the
  time an NFS server gets into that state, it is heavily overloaded and, as
  such, isn't performing at an acceptable level. Yes, there probably are
  short bursts of RPC traffic where the server can choose an RPC ordering,
  but I suspect these will usually be a burst of I/O on a single file (due
  to readaheads/write-behinds, client-side buffer cache flushing, etc).
--> This brings us to "file handle affinity". Before FreeBSD had shared
  vnode locks, the vnode lock serialized all operations on a given vnode
  (ie. file). If these RPCs were handed to different nfsd threads, the
  threads were all tied up doing RPCs for one file serially and weren't
  available for other RPCs.
--> This was "solved" by assigning an nfsd thread to do ops for a given
  file handle. Then shared vnode locks came along and allowed many ops to
  be done on a given file concurrently via different nfsd threads, which
  I'd argue is a good thing?
--> Unfortunately, ken@ found that when read/write ops were done on ZFS
  "out of sequential order", ZFS's sequential I/O heuristic would fail and
  decide that the I/O was random. This caused a big performance hit for
  ZFS.
--> As such, he found that file handle affinity does help w.r.t. I/O
  performance for ZFS, because it reduced the "out-of-sequential-orderness"
  of the I/O ops.
When I discussed this with mav@, he felt that ZFS's sequential I/O
heuristic needed to be fixed/improved and that that was where the problem
should be attacked.
So, I think that the benefit of file handle affinity when used with shared
vnode locking is still an open question (in general, ignoring the above
ZFS case). Since file handle affinity is very hard to do for NFSv4, I like
the idea of fixing ZFS.

I tend to think that a bias towards doing Getattr/Lookup over Read/Write
may help performance (the old "shortest job first" principle), but I'm not
sure you'll have a big enough queue of outstanding RPCs under normal load
for this to make a real difference. I don't think you want to delay doing
any RPC while "waiting" for a preferred RPC to arrive. Any delay like this
will increase RPC response time, which is the main NFS performance issue.

rick
ps: NFS types are more likely to read freebsd-fs@ than freebsd-net@.
pss: I hope ken and mav don't mind me adding them as cc's.

> Thanks very much
>
> --
> Tim Borgeaud
> Systems Developer
> Framestore
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to
> "freebsd-net-unsubscribe@freebsd.org"
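
For readers unfamiliar with the module machinery Rick refers to, here is a
minimal sketch of the kind of modevent handler that a "grep for MOD_UNLOAD"
would turn up. It is not the actual krpc/nfscommon source; the module name
and the comments are invented for illustration, and only the standard
module(9) API (moduledata_t, DECLARE_MODULE, MOD_LOAD/MOD_UNLOAD) is real.

/*
 * Illustrative sketch only -- not the real NFS module code.
 * Modules that refuse unloading typically do so by returning an
 * error from the MOD_UNLOAD case of their modevent handler.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <sys/errno.h>

static int
example_nfs_modevent(module_t mod, int type, void *data)
{
	int error = 0;

	switch (type) {
	case MOD_LOAD:
		/* Allocate structures, hook into the RPC layer, etc. */
		break;
	case MOD_UNLOAD:
		/*
		 * Refuse unloading, since an RPC may be in progress.
		 * Commenting this out (development only!) lets kldunload
		 * proceed, with the risks described above.
		 */
		error = EOPNOTSUPP;
		break;
	default:
		error = EOPNOTSUPP;
		break;
	}
	return (error);
}

static moduledata_t example_nfs_mod = {
	"example_nfs",
	example_nfs_modevent,
	NULL
};
DECLARE_MODULE(example_nfs, example_nfs_mod, SI_SUB_VFS, SI_ORDER_ANY);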
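
And a rough sketch of the "file handle affinity" idea discussed above:
hash the file handle so that all RPCs for one file are queued to the same
nfsd thread, which keeps the I/O for that file in roughly arrival order
(the property ZFS's sequential heuristic wants). The helper names and the
fixed queue count are hypothetical; FreeBSD's real implementation is
considerably more involved.

/*
 * Hypothetical sketch of file handle affinity: RPCs whose file handle
 * hashes to the same bucket are serviced by the same nfsd thread, so
 * reads/writes on one file stay in the order they arrived.
 */
#include <stdint.h>
#include <stddef.h>

#define FHA_NQUEUES	32	/* one work queue per nfsd thread */

/* FNV-1a hash over the opaque file handle bytes. */
static uint32_t
fha_hash(const unsigned char *fh, size_t fhlen)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < fhlen; i++) {
		h ^= fh[i];
		h *= 16777619u;
	}
	return (h);
}

/* Pick the queue (i.e. nfsd thread) that owns this file handle. */
static unsigned int
fha_pick_queue(const unsigned char *fh, size_t fhlen)
{
	return (fha_hash(fh, fhlen) % FHA_NQUEUES);
}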