Date: Mon, 15 Dec 2014 09:07:32 +0000
From: "Loïc Blot" <loic.blot@unix-experience.fr>
To: "Rick Macklem" <rmacklem@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: High Kernel Load with nfsv4
Message-ID: <2efc29240b59eabfdea79fe29744178d@mail.unix-experience.fr>
In-Reply-To: <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr>
References: <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr> <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>

Hi Rick,
after talking with my manager (N+1), NFSv4 is required on our infrastructure.
I tried to upgrade the NFSv4+ZFS server from 9.3 to 10.1; I hope this will
resolve some issues...

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

10 December 2014 15:36 "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
> Hi Rick,
> thanks for your suggestion.
> For my locking bug, rpc.lockd is stuck in rpcrecv state on the server.
> kill -9 doesn't affect the process, it's blocked.... (State: Ds)
>
> For the performance:
>
> NFSv3: 60Mbps
> NFSv4: 45Mbps
>
> Regards,
>
> Loïc Blot,
> UNIX Systems, Network and Security Engineer
> http://www.unix-experience.fr
>
> 10 December 2014 13:56 "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>
>> Loic Blot wrote:
>>
>>> Hi Rick,
>>> I'm trying NFSv3.
>>> Some jails are starting very well but now I have an issue with lockd
>>> after some minutes:
>>>
>>> nfs server 10.10.X.8:/jails: lockd not responding
>>> nfs server 10.10.X.8:/jails lockd is alive again
>>>
>>> I looked at mbuf, but it seems there is no problem.
>>
>> Well, if you need locks to be visible across multiple clients, then
>> I'm afraid you are stuck with using NFSv4 and the performance you get
>> from it. (There is no way to do file handle affinity for NFSv4 because
>> the read and write ops are buried in the compound RPC and not easily
>> recognized.)
>>
>> If the locks don't need to be visible across multiple clients, I'd
>> suggest trying the "nolockd" option with nfsv3.
>>
>>> Here is my rc.conf on server:
>>>
>>> nfs_server_enable="YES"
>>> nfsv4_server_enable="YES"
>>> nfsuserd_enable="YES"
>>> nfsd_server_flags="-u -t -n 256"
>>> mountd_enable="YES"
>>> mountd_flags="-r"
>>> nfsuserd_flags="-usertimeout 0 -force 20"
>>> rpcbind_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>>
>>> Here is the client:
>>>
>>> nfsuserd_enable="YES"
>>> nfsuserd_flags="-usertimeout 0 -force 20"
>>> nfscbd_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>>
>>> Have you got an idea?
>>>
>>> Regards,
>>>
>>> Loïc Blot,
>>> UNIX Systems, Network and Security Engineer
>>> http://www.unix-experience.fr
>>>
>>> 9 December 2014 04:31 "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>>> Loic Blot wrote:
>>>>
>>>>> Hi rick,
>>>>>
>>>>> I waited 3 hours (no lag at jail launch) and now I do:
>>>>> sysrc memcached_flags="-v -m 512"
>>>>> Command was very very slow...
>>>>>
>>>>> Here is a dd over NFS:
>>>>>
>>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>>>
>>>> Can you try the same read using an NFSv3 mount?
>>>> (If it runs much faster, you have probably been bitten by the ZFS
>>>> "sequential vs random" read heuristic which I've been told thinks
>>>> NFS is doing "random" reads without file handle affinity. File
>>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>>
>> I was actually suggesting that you try the "dd" over nfsv3 to see how
>> the performance compared with nfsv4. If you do that, please post the
>> comparable results.
>>
>> Someday I would like to try and get ZFS's sequential vs random read
>> heuristic modified and any info on what difference in performance that
>> might make for NFS would be useful.
>>
>> rick
>>
>>>> rick
>>>>
>>>>> This is quite slow...
>>>>>
>>>>> You can find some nfsstat output below (command isn't finished yet)
>>>>>
>>>>> nfsstat -c -w 1
>>>>>
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4      0      0      0      0      0     16      0
>>>>>      2      0      0      0      0      0     17      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      4      0      0      0      0      4      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4      0      0      0      0      0      3      0
>>>>>      0      0      0      0      0      0      3      0
>>>>>     37     10      0      8      0      0     14      1
>>>>>     18     16      0      4      1      2      4      0
>>>>>     78     91      0     82      6     12     30      0
>>>>>     19     18      0      2      2      4      2      0
>>>>>      0      0      0      0      2      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      1      0      0      0      0      1      0
>>>>>      4      6      0      0      6      0      3      0
>>>>>      2      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      1      0      0      0      0      0      0      0
>>>>>      0      0      0      0      1      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      6    108      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>     98     54      0     86     11      0     25      0
>>>>>     36     24      0     39     25      0     10      1
>>>>>     67      8      0     63     63      0     41      0
>>>>>     34      0      0     35     34      0      0      0
>>>>>     75      0      0     75     77      0      0      0
>>>>>     34      0      0     35     35      0      0      0
>>>>>     75      0      0     74     76      0      0      0
>>>>>     33      0      0     34     33      0      0      0
>>>>>      0      0      0      0      5      0      0      0
>>>>>      0      0      0      0      0      0      6      0
>>>>>     11      0      0      0      0      0     11      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0     17      0      0      0      0      1      0
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>      4      5      0      0      0      0     12      0
>>>>>      2      0      0      0      0      0     26      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      4      0      0      0      0      4      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4      0      0      0      0      0      2      0
>>>>>      2      0      0      0      0      0     24      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4      0      0      0      0      0      7      0
>>>>>      2      1      0      0      0      0      1      0
>>>>>      0      0      0      0      2      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      6      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4      6      0      0      0      0      3      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      2      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4     71      0      0      0      0      0      0
>>>>>      0      1      0      0      0      0      0      0
>>>>>      2     36      0      0      0      0      1      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      1      0      0      0      0      0      1      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>     79      6      0     79     79      0      2      0
>>>>>     25      0      0     25     26      0      6      0
>>>>>     43     18      0     39     46      0     23      0
>>>>>     36      0      0     36     36      0     31      0
>>>>>     68      1      0     66     68      0      0      0
>>>>> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>>>>>     36      0      0     36     36      0      0      0
>>>>>     48      0      0     48     49      0      0      0
>>>>>     20      0      0     20     20      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      3     14      0      1      0      0     11      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      0      4      0      0      0      0      4      0
>>>>>      0      0      0      0      0      0      0      0
>>>>>      4     22      0      0      0      0     16      0
>>>>>      2      0      0      0      0      0     23      0
>>>>>
>>>>> Regards,
>>>>>
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>>
>>>>> 8 December 2014 09:36 "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
>>>>>> Hi Rick,
>>>>>> I stopped the jails this weekend and started them this morning;
>>>>>> I'll give you some stats this week.
>>>>>>
>>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks)
>>>>>>
>>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>>>>>>
>>>>>> On the server side my disks are on a RAID controller which shows a
>>>>>> 512b volume, and write performance is quite decent
>>>>>> (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps)
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Loïc Blot,
>>>>>> UNIX Systems, Network and Security Engineer
>>>>>> http://www.unix-experience.fr
>>>>>>
>>>>>> 5 December 2014 15:14 "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>>>>>
>>>>>>> Loic Blot wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm trying to create a virtualisation environment based on jails.
>>>>>>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3 which
>>>>>>>> exports an NFSv4 volume. This NFSv4 volume was mounted on a big
>>>>>>>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1 was
>>>>>>>> used at this time).
>>>>>>>>
>>>>>>>> The problem is simple: my hypervisor runs 6 jails (using about 1% CPU,
>>>>>>>> 10GB RAM and less than 1MB bandwidth) and works fine at start, but
>>>>>>>> the system slows down and after 2-3 days becomes unusable. When I
>>>>>>>> look at the top command I see 80-100% on system and commands are
>>>>>>>> very, very slow. Many processes are tagged with nfs_cl*.
>>>>>>>
>>>>>>> To be honest, I would expect the slowness to be because of slow response
>>>>>>> from the NFSv4 server, but if you do:
>>>>>>> # ps axHl
>>>>>>> on a client when it is slow and post that, it would give us some more
>>>>>>> information on where the client side processes are sitting.
>>>>>>> If you also do something like:
>>>>>>> # nfsstat -c -w 1
>>>>>>> and let it run for a while, that should show you how many RPCs are
>>>>>>> being done and which ones.
>>>>>>>
>>>>>>> # nfsstat -m
>>>>>>> will show you what your mount is actually using.
>>>>>>> The only mount option I can suggest trying is "rsize=32768,wsize=32768",
>>>>>>> since some network environments have difficulties with 64K.
>>>>>>>
>>>>>>> There are a few things you can try on the NFSv4 server side, if it
>>>>>>> appears that the clients are generating a large RPC load.
>>>>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>>> - If the server is seeing a large write RPC load, then "sync=disabled"
>>>>>>>   might help, although it does run a risk of data loss when the server
>>>>>>>   crashes.
>>>>>>> Then there are a couple of other ZFS related things (I'm not a ZFS guy,
>>>>>>> but these have shown up on the mailing lists).
>>>>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a drive
>>>>>>>   that uses 4K sectors is pretending to be 512byte sectored)
>>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>>> - use a zil on an SSD with good write performance
>>>>>>>
>>>>>>> The only NFSv4 thing I can tell you is that it is known that ZFS's
>>>>>>> algorithm for determining sequential vs random I/O fails for NFSv4
>>>>>>> during writing and this can be a performance hit. The only workaround
>>>>>>> is to use NFSv3 mounts, since file handle affinity apparently fixes
>>>>>>> the problem and this is only done for NFSv3.
>>>>>>>
>>>>>>> rick
>>>>>>>
>>>>>>>> I saw that there are TSO issues with igb, so I tried to disable
>>>>>>>> it with sysctl, but the situation wasn't solved.
>>>>>>>>
>>>>>>>> Does anyone have ideas? I can give you more information if you need.
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Loïc Blot,
>>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>>> http://www.unix-experience.fr
>>>>>>>> _______________________________________________
>>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
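
Editor's note: the thread quotes throughput in several forms (bytes/sec from dd, "Mbps", "MBps") and never says whether "Mbps" means megabits or megabytes per second. The sketch below converts only the one unambiguous figure, the dd result of 28539579 bytes/sec, into both units so it can be set beside the others. The helper name bytes_to_rates is hypothetical and is not part of any FreeBSD tool.

```shell
#!/bin/sh
# Convert a bytes-per-second figure (as printed by dd) to Mbit/s and MiB/s.
# bytes_to_rates is an illustrative helper name, not an existing command.
bytes_to_rates() {
    awk -v bps="$1" 'BEGIN {
        # 1 Mbit = 1,000,000 bits; 1 MiB = 1,048,576 bytes
        printf "%.1f Mbit/s, %.1f MiB/s\n", bps * 8 / 1000000, bps / 1048576
    }'
}

# dd figure quoted in the thread: 601062912 bytes in 21.060679 secs
bytes_to_rates 28539579    # prints "228.3 Mbit/s, 27.2 MiB/s"
```

If the 60 and 45 figures reported for NFSv3 and NFSv4 were meant as megabytes per second, they can be passed in as 60000000 and 45000000 to get the same two units; the thread itself does not settle which reading is correct.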
