Date: Mon, 15 Dec 2014 12:29:27 +0000
From: "Loïc Blot" <loic.blot@unix-experience.fr>
To: "Rick Macklem" <rmacklem@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: High Kernel Load with nfsv4
Message-ID: <db7be16e523322eec76d281a9a9c5934@mail.unix-experience.fr>
In-Reply-To: <2efc29240b59eabfdea79fe29744178d@mail.unix-experience.fr>
References: <2efc29240b59eabfdea79fe29744178d@mail.unix-experience.fr> <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr> <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>
Hmmm...
Now I'm experiencing a deadlock.

 0 918 915 0 21 0 12352 3372 zfs D - 1:48.64 nfsd: server (nfsd)

The only way out was to reboot the server, but after rebooting the deadlock occurs a second time when I start my jails over NFS.

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 15 December 2014 at 10:07, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:

Hi Rick,
After talking with my N+1, NFSv4 is required on our infrastructure. I tried to upgrade the NFSv4+ZFS server from 9.3 to 10.1; I hope this will resolve some issues...

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 10 December 2014 at 15:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:

Hi Rick,
Thanks for your suggestion.
For my locking bug, rpc.lockd is stuck in the rpcrecv state on the server. kill -9 doesn't affect the process; it's blocked... (state: Ds)

As for performance:

NFSv3: 60Mbps
NFSv4: 45Mbps

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 10 December 2014 at 13:56, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:

> Loic Blot wrote:
>
>> Hi Rick,
>> I'm trying NFSv3.
>> Some jails start very well, but after a few minutes I have an issue
>> with lockd:
>>
>> nfs server 10.10.X.8:/jails: lockd not responding
>> nfs server 10.10.X.8:/jails lockd is alive again
>>
>> I looked at mbufs, but it seems there is no problem there.
>
> Well, if you need locks to be visible across multiple clients, then
> I'm afraid you are stuck with using NFSv4 and the performance you get
> from it. (There is no way to do file handle affinity for NFSv4 because
> the read and write ops are buried in the compound RPC and not easily
> recognized.)
>
> If the locks don't need to be visible across multiple clients, I'd
> suggest trying the "nolockd" option with nfsv3.
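(For illustration only, not from the original mails: a minimal sketch of that "nolockd" suggestion. The export path is the one mentioned above; the local mount point is made up.)

  # NFSv3 mount with "nolockd": fcntl()/lockf() locks stay local to this
  # client instead of going through rpc.lockd on the server
  mount -t nfs -o nfsv3,nolockd 10.10.X.8:/jails /jails

  # or the equivalent /etc/fstab entry
  10.10.X.8:/jails  /jails  nfs  rw,nfsv3,nolockd  0  0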
>> Here is my rc.conf on the server:
>>
>> nfs_server_enable="YES"
>> nfsv4_server_enable="YES"
>> nfsuserd_enable="YES"
>> nfsd_server_flags="-u -t -n 256"
>> mountd_enable="YES"
>> mountd_flags="-r"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> rpcbind_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>>
>> Here is the client:
>>
>> nfsuserd_enable="YES"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> nfscbd_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>>
>> Have you got an idea?
>>
>> Regards,
>>
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>>
>> On 9 December 2014 at 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>> Loic Blot wrote:
>>>
>>>> Hi Rick,
>>>>
>>>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>>>> memcached_flags="-v -m 512"
>>>> The command was very, very slow...
>>>>
>>>> Here is a dd over NFS:
>>>>
>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>>
>>> Can you try the same read using an NFSv3 mount?
>>> (If it runs much faster, you have probably been bitten by the ZFS
>>> "sequential vs random" read heuristic, which I've been told thinks
>>> NFS is doing "random" reads without file handle affinity. File
>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>
> I was actually suggesting that you try the "dd" over nfsv3 to see how
> the performance compares with nfsv4. If you do that, please post the
> comparable results.
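(For illustration only, not from the original mails: a sketch of the comparison being asked for. The mount points and the test file name are invented; the export path is the one used elsewhere in the thread.)

  # mount the same export twice, once per protocol version
  mount -t nfs -o nfsv4 10.10.X.8:/jails /mnt/nfs4
  mount -t nfs -o nfsv3 10.10.X.8:/jails /mnt/nfs3

  # read the same large file through each mount and compare the reported throughput
  dd if=/mnt/nfs4/test.dd of=/dev/null bs=64k
  dd if=/mnt/nfs3/test.dd of=/dev/null bs=64k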
>
> Someday I would like to try and get ZFS's sequential vs random read
> heuristic modified, and any info on what difference in performance that
> might make for NFS would be useful.
>
> rick
>
>>> rick
>>>
>>>> This is quite slow...
>>>>
>>>> You can find some nfsstat output below (the command isn't finished yet):
>>>>
>>>> nfsstat -c -w 1
>>>>
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 17 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 3 0
>>>> 37 10 0 8 0 0 14 1
>>>> 18 16 0 4 1 2 4 0
>>>> 78 91 0 82 6 12 30 0
>>>> 19 18 0 2 2 4 2 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 1 0
>>>> 4 6 0 0 6 0 3 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 0 0
>>>> 0 0 0 0 1 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 6 108 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 98 54 0 86 11 0 25 0
>>>> 36 24 0 39 25 0 10 1
>>>> 67 8 0 63 63 0 41 0
>>>> 34 0 0 35 34 0 0 0
>>>> 75 0 0 75 77 0 0 0
>>>> 34 0 0 35 35 0 0 0
>>>> 75 0 0 74 76 0 0 0
>>>> 33 0 0 34 33 0 0 0
>>>> 0 0 0 0 5 0 0 0
>>>> 0 0 0 0 0 0 6 0
>>>> 11 0 0 0 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 17 0 0 0 0 1 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 4 5 0 0 0 0 12 0
>>>> 2 0 0 0 0 0 26 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 2 0
>>>> 2 0 0 0 0 0 24 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 7 0
>>>> 2 1 0 0 0 0 1 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 6 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 6 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 0 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 71 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 0 0
>>>> 2 36 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 79 6 0 79 79 0 2 0
>>>> 25 0 0 25 26 0 6 0
>>>> 43 18 0 39 46 0 23 0
>>>> 36 0 0 36 36 0 31 0
>>>> 68 1 0 66 68 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 36 0 0 36 36 0 0 0
>>>> 48 0 0 48 49 0 0 0
>>>> 20 0 0 20 20 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 3 14 0 1 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 22 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 23 0
>>>>
>>>> Regards,
>>>>
>>>> Loïc Blot,
>>>> UNIX Systems, Network and Security Engineer
>>>> http://www.unix-experience.fr
>>>>
>>>> On 8 December 2014 at 09:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
>>>>> Hi Rick,
>>>>> I stopped the jails this weekend and started them this morning;
>>>>> I'll give you some stats this week.
>>>>>
>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):

nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647

On the server side my disks are behind a RAID controller which presents a
512-byte-sector volume, and write performance is very good
(dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps).

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr
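(For illustration only, not from the original mails: one way to check on the server what sector size the RAID volume reports and what ashift the pool actually uses, which relates to the 4K-alignment advice further down. The device name mfid0 and the pool name jails are guesses.)

  # sector and stripe size the RAID volume advertises to the OS
  diskinfo -v mfid0 | grep -E 'sectorsize|stripesize'

  # ashift the ZFS vdevs were created with (12 = 4K-aligned, 9 = 512-byte)
  zdb -C jails | grep ashift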
On 5 December 2014 at 15:14, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:

> Loic Blot wrote:
>
>> Hi,
>> I'm trying to create a virtualisation environment based on jails.
>> Those jails are stored on a big ZFS pool on a FreeBSD 9.3 server which
>> exports an NFSv4 volume. This NFSv4 volume is mounted on a big
>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 network ports, but only 1
>> is used at this time).
>>
>> The problem is simple: my hypervisor runs 6 jails (using roughly 1% CPU,
>> 10GB RAM and less than 1MB of bandwidth) and works fine at first, but
>> the system slows down and after 2-3 days becomes unusable. When I look
>> at top I see 80-100% system time, and commands are very, very slow.
>> Many processes are tagged with nfs_cl*.
>
> To be honest, I would expect the slowness to be because of slow response
> from the NFSv4 server, but if you do:
> # ps axHl
> on a client when it is slow and post that, it would give us some more
> information on where the client-side processes are sitting.
> If you also do something like:
> # nfsstat -c -w 1
> and let it run for a while, that should show you how many RPCs are
> being done and which ones.
>
> # nfsstat -m
> will show you what your mount is actually using.
> The only mount option I can suggest trying is "rsize=32768,wsize=32768",
> since some network environments have difficulties with 64K.
>
> There are a few things you can try on the NFSv4 server side, if it
> appears that the clients are generating a large RPC load:
> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
> - if the server is seeing a large write RPC load, then "sync=disabled"
>   might help, although it does run a risk of data loss when the server
>   crashes.
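(For illustration only, not from the original mails: those two server-side tweaks spelled out as commands. The dataset name tank/jails is a guess; sync=disabled trades data safety on a server crash for lower write latency.)

  # disable the NFS duplicate request cache for TCP connections
  sysctl vfs.nfsd.cachetcp=0

  # only if writes are the bottleneck and losing the last few seconds of
  # data on a server crash is acceptable
  zfs set sync=disabled tank/jails

To keep the sysctl across reboots it would go in /etc/sysctl.conf.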
> Then there are a couple of other ZFS-related things (I'm not a ZFS guy,
> but these have shown up on the mailing lists):
> - make sure your volumes are 4K aligned and ashift=12 (in case a drive
>   that uses 4K sectors is pretending to be 512-byte sectored)
> - never run over 70-80% full if write performance is an issue
> - use a ZIL on an SSD with good write performance
>
> The only NFSv4 thing I can tell you is that it is known that ZFS's
> algorithm for determining sequential vs random I/O fails for NFSv4
> during writing, and this can be a performance hit. The only workaround
> is to use NFSv3 mounts, since file handle affinity apparently fixes
> the problem, and this is only done for NFSv3.
>
> rick
>
>> I saw that there are TSO issues with igb, so I'm trying to disable
>> it with sysctl, but that didn't solve the situation.
>>
>> Does anyone have ideas? I can give you more information if you
>> need it.
>>
>> Thanks in advance.
>> Regards,
>>
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr

_______________________________________________
freebsd-fs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"