Date: Sun, 28 Feb 2016 17:34:25 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans
Cc: fs@freebsd.org
Subject: Re: silly write caching in nfs3
Bruce Evans wrote:
> On Sat, 27 Feb 2016, Rick Macklem wrote:
>
> > Bruce Evans wrote:
> >> ...
> >> I blamed newnfs before :-), but when I looked at newnfs more closely I
> >> found that it was almost the same lexically in the most interesting
> >> places (but it unfortunately has lexical differences from s/nfs/ncl/,
> >> and doesn't have enough of these differences for debugging --
> >> debugging is broken by having 2 static functions named nfs_foo() for
> >> many values of foo).  But newnfs seems to have always been missing
> >> this critical code:
> >>
> >> X   1541 rgrimes int
> >> X  83651 peter   nfs_writerpc(struct vnode *vp, struct uio *uiop, struct ucred *cred,
> >> X 158739 mohans      int *iomode, int *must_commit)
> >> X   1541 rgrimes {
> >> X   9336 dfr         if (v3) {
> >> X   9336 dfr             wccflag = NFSV3_WCCCHK;
> >> X   ...
> >> X 158739 mohans      }
> >> X 158739 mohans      if (wccflag) {
> >> X 158739 mohans          mtx_lock(&(VTONFS(vp))->n_mtx);
> >> X 158739 mohans          VTONFS(vp)->n_mtime = VTONFS(vp)->n_vattr.va_mtime;
> >>                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> X 158739 mohans          mtx_unlock(&(VTONFS(vp))->n_mtx);
> >> X 158739 mohans      }
> >>
> > Well, this code does exist in the new client.  The function is
> > nfsrpc_writerpc(), found in sys/fs/nfsclient/nfs_clrpcops.c.  It calls
> > nfscl_wcc_data(), which does the same test as nfsm_wcc_data_xx() did
> > in the old client.  Then NFSWRITERPC_SETTIME() sets the time if
> > wccflag was set.
> >
> > However, NFSWRITERPC_SETTIME() is broken: it sets n_mtime from the
> > copy cached in the nfs vnode (same as the old code).  Unfortunately,
> > the cached value in the nfs vnode hasn't been updated at that point.
> > (The RPC layer functions don't do cache stuff.)
> >
> > The attached patch fixes this.  Please test with it.
>
> This passes my tests.  newnfs now works slightly better than oldnfs in
> FreeBSD-10 (higher throughput in the re-read test and fewer read RPCs
> in compile tests).
>
Thanks for testing it.  I can't do commits until mid-April, but I will
commit it then.
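To summarize for the archives, the idea of the patch is roughly the
following (a sketch from memory, not the exact diff, so the argument
and field spellings here may differ): make NFSWRITERPC_SETTIME() take
the post-op attributes decoded from the write reply instead of reading
the stale copy cached in the vnode:

    /*
     * Sketch only: "a" is the nfsvattr decoded from the write reply
     * by nfscl_wcc_data(), so n_mtime is set from what the server
     * just returned rather than from the not-yet-updated attribute
     * cache in the vnode.
     */
    #define NFSWRITERPC_SETTIME(w, n, a, v4)                        \
        do {                                                        \
            if (w) {                                                \
                mtx_lock(&((n)->n_mtx));                            \
                (n)->n_mtime = (a)->na_mtime;                       \
                if (v4)                                             \
                    (n)->n_change = (a)->na_filerev;                \
                mtx_unlock(&((n)->n_mtx));                          \
            }                                                       \
        } while (0)

nfsrpc_writerpc() then passes in the attributes it got back from the
RPC, and the normal attribute cache update still happens later, above
the RPC layer.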
> The negative regression in read RPCs might be a bug somewhere.  In a
> larger compile test FreeBSD-11 has the opposite problem of many more
> read RPCs.
Hmm.  There is very little difference between the FreeBSD-10 and
FreeBSD-11 NFS code (I have MFC'd almost all of the changes).  Did you
happen to look and see whether the buffer cache was about the same size
for both -10 and -11?

> I haven't started isolating this.  Previous tests isolated the
> following regressions in the number of RPCs:
> - r208602:208603 gives more correct attribute cache clearing but more
>   RPCs
> - r209947:209948 gives the same
> - r247115:247116 gives a 5-10% increase in sys time for makeworld by
>   doing lots of locking in a slow way (thread_lock() in
>   sigdefer/allowstop()).  Removing the thread locking and doing racy
>   accesses to td_flags reduces this overhead to below 5%.  Removing
>   the calls to sigdefer/allowstop() (thus using even more flags in
>   VFS_PROLOGUE/EPILOGUE()) reduces it to a hard-to-measure amount.
>   This doesn't remove the overhead for other file systems, but branch
>   prediction apparently works well unless there are function calls
>   there.
>
> For building a single kernel, in FreeBSD-[8-10] the RPC counts are
> down by about 10% relative to my reference version (which has an old
> implementation of negative name caching, cto improvements, and a hack
> to work around broken dotdot name caching), but for makeworld they are
> up by about 5%.  The increase is probably another namecache or dirents
> bug, or just from different cache timeouts.
>
> > ...
> >> I tried this to see if it would fix the unordered writes.  I didn't
> >> expect it to do much because I usually only have a single active
> >> client and a single active writer per file.  It didn't make much
> >> difference.
>
> Today I tested on a 2-core system.  The loss from extra nfsiods was
> much smaller than on a faster 8-core system (each core 2-3 times as
> fast) with a lower-throughput (28 MB/s, down from 70) but slightly
> lower-latency network (ping latency 63 usec, down from 75); there was
> no loss up to a file size of about 32MB.
>
> > ...
> >>> The patches were all client side.  Maybe I'll try and recreate them.
> >>
> >> It seems to require lots of communication between separate nfsiods
> >> to even preserve an order that has carefully been set up for them.
> >> If you have this, then it is unclear why it can't be done more
> >> simply using a single nfsiod thread (per NIC or ifq).  Only 1 thread
> >> should talk to the NIC/ifq, since you lose control if you put other
> >> threads in between.  If the NIC/ifq uses multiple threads, then
> >> maintaining the order is its problem.
> >
> > Yes.  The patches I had didn't guarantee complete ordering, in part
> > because one was for the nfsiods and the other was for the RPC
> > requests in the krpc code.  I will try to redo them one of these
> > days.
> >
> > I will mention that the NFS RFCs have no requirement or
> > recommendation for ordering; the RPCs are assumed to be atomic
> > entities.  Having said that, I do believe that FreeBSD servers will
> > perform better when the reads/writes arrive in sequentially
> > increasing byte order, so this is worth working on.
I will try to come up with another patch to improve the ordering.

> FreeBSD servers do lots of clustering, but not very well.
>
> Write clustering is easier than read clustering, since it is
> reasonable to wait a bit to combine writes.
>
> A single client with a single writer is an easy case.  For a single
> client with multiple writers, I think the nfsiods should try to keep
> the writes sequential for each writer but not across writers.  This
> gives an order similar to local writers.  FreeBSD servers (except
> maybe zfs ones) don't handle multiple writers well, but an nfs client
> can't be expected to understand the server's deficiencies better than
> the server does so that they can be avoided.
>
The krpc just sees RPC requests (it doesn't really know they are even
NFS, although they mostly are now).  As such, the patch I did (and will
probably try to reproduce) just tried to keep the order, i.e. retain
the order in which the krpc was handed the RPC requests.  When I did a
little testing, the nfsiods didn't appear to be causing much
reordering, but I should take another look at this.

Thanks for all the info, rick

> Bruce
>
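ps: In case anyone wants to experiment before I get to it, the old
ordering patch was conceptually just a ticket lock on the krpc send
path.  Here is an untested sketch of the idea (all of the names below
are made up for illustration; they are not the real krpc ones):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/systm.h>

    /*
     * Hypothetical per-connection ordering state: each request takes
     * a ticket when it is queued and the sending thread waits for its
     * turn, so requests hit the wire in the order they were generated.
     */
    struct rpc_order {
        struct mtx  ro_mtx;          /* protects the two counters */
        uint64_t    ro_next_ticket;  /* next ticket to hand out */
        uint64_t    ro_now_serving;  /* ticket allowed to transmit */
    };

    /* Called when the RPC request is created/queued. */
    static uint64_t
    rpc_order_ticket(struct rpc_order *ro)
    {
        uint64_t t;

        mtx_lock(&ro->ro_mtx);
        t = ro->ro_next_ticket++;
        mtx_unlock(&ro->ro_mtx);
        return (t);
    }

    /* Called by the sending thread just before transmitting. */
    static void
    rpc_order_wait(struct rpc_order *ro, uint64_t t)
    {

        mtx_lock(&ro->ro_mtx);
        while (ro->ro_now_serving != t)
            msleep(ro, &ro->ro_mtx, PVFS, "rpcord", 0);
        mtx_unlock(&ro->ro_mtx);
    }

    /* Called once the request has been handed to the socket. */
    static void
    rpc_order_done(struct rpc_order *ro)
    {

        mtx_lock(&ro->ro_mtx);
        ro->ro_now_serving++;
        wakeup(ro);
        mtx_unlock(&ro->ro_mtx);
    }

Done per mount (or per connection), something like this keeps the
writes in the order the nfsiods handed them to the krpc without forcing
everything through a single nfsiod thread.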