Date: Wed, 10 Dec 2014 11:33:14 +0000
From: "Loïc Blot" <loic.blot@unix-experience.fr>
To: "Rick Macklem" <rmacklem@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
Subject: Re: High Kernel Load with nfsv4
Message-ID: <1e19554bc0d4eb3e8dab74e2056b5ec4@mail.unix-experience.fr>
In-Reply-To: <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca>
References: <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca>
Hi Rick,
I'm trying NFSv3. Some jails are starting very well, but now I have an issue with lockd after some minutes:

nfs server 10.10.X.8:/jails: lockd not responding
nfs server 10.10.X.8:/jails lockd is alive again

I looked at mbufs, but it seems there is no problem there.

Here is my rc.conf on the server:

nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
nfsd_server_flags="-u -t -n 256"
mountd_enable="YES"
mountd_flags="-r"
nfsuserd_flags="-usertimeout 0 -force 20"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

Here is the client:

nfsuserd_enable="YES"
nfsuserd_flags="-usertimeout 0 -force 20"
nfscbd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

Have you got an idea?

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

9 December 2014 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
> Loic Blot wrote:
> 
>> Hi Rick,
>> 
>> I waited 3 hours (no lag at jail launch) and now I do: sysrc
>> memcached_flags="-v -m 512"
>> The command was very, very slow...
>> 
>> Here is a dd over NFS:
>> 
>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
> 
> Can you try the same read using an NFSv3 mount?
> (If it runs much faster, you have probably been bitten by the ZFS
> "sequential vs random" read heuristic which, I've been told, thinks
> NFS is doing "random" reads without file handle affinity.
> File handle affinity is very hard to do for NFSv4, so it isn't done.)
> 
> rick
> 
>> This is quite slow...
>> 
>> You can find some nfsstat output below (the command isn't finished yet):
>> 
>> nfsstat -c -w 1
>> 
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 16 0
>> 2 0 0 0 0 0 17 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 3 0
>> 0 0 0 0 0 0 3 0
>> 37 10 0 8 0 0 14 1
>> 18 16 0 4 1 2 4 0
>> 78 91 0 82 6 12 30 0
>> 19 18 0 2 2 4 2 0
>> 0 0 0 0 2 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 1 0 0 0 0 1 0
>> 4 6 0 0 6 0 3 0
>> 2 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 0 0
>> 0 0 0 0 1 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 6 108 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 98 54 0 86 11 0 25 0
>> 36 24 0 39 25 0 10 1
>> 67 8 0 63 63 0 41 0
>> 34 0 0 35 34 0 0 0
>> 75 0 0 75 77 0 0 0
>> 34 0 0 35 35 0 0 0
>> 75 0 0 74 76 0 0 0
>> 33 0 0 34 33 0 0 0
>> 0 0 0 0 5 0 0 0
>> 0 0 0 0 0 0 6 0
>> 11 0 0 0 0 0 11 0
>> 0 0 0 0 0 0 0 0
>> 0 17 0 0 0 0 1 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 4 5 0 0 0 0 12 0
>> 2 0 0 0 0 0 26 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 2 0
>> 2 0 0 0 0 0 24 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 7 0
>> 2 1 0 0 0 0 1 0
>> 0 0 0 0 2 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 6 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 6 0 0 0 0 3 0
>> 0 0 0 0 0 0 0 0
>> 2 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 71 0 0 0 0 0 0
>> 0 1 0 0 0 0 0 0
>> 2 36 0 0 0 0 1 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 1 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 79 6 0 79 79 0 2 0
>> 25 0 0 25 26 0 6 0
>> 43 18 0 39 46 0 23 0
>> 36 0 0 36 36 0 31 0
>> 68 1 0 66 68 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 36 0 0 36 36 0 0 0
>> 48 0 0 48 49 0 0 0
>> 20 0 0 20 20 0 0 0
>> 0 0 0 0 0 0 0 0
>> 3 14 0 1 0 0 11 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 4 22 0 0 0 0 16 0
>> 2 0 0 0 0 0 23 0
>> 
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> 
>> 8 December 2014 09:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
>>> Hi Rick,
>>> I stopped the jails this weekend and started them this morning; I'll
>>> give you some stats this week.
>>> 
>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):
>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>>> 
>>> On the server side my disks are behind a RAID controller which shows a
>>> 512b-sector volume, and write performance is quite honest
>>> (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps).
>>> 
>>> Regards,
>>> 
>>> Loïc Blot,
>>> UNIX Systems, Network and Security Engineer
>>> http://www.unix-experience.fr
>>> 
>>> 5 December 2014 15:14, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>> 
>>>> Loic Blot wrote:
>>>> 
>>>>> Hi,
>>>>> I'm trying to create a virtualisation environment based on jails.
>>>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3 host
>>>>> which exports an NFSv4 volume. This NFSv4 volume was mounted on a
>>>>> big hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports, but only 1
>>>>> was used at this time).
>>>>> 
>>>>> The problem is simple: my hypervisor runs 6 jails (using roughly 1%
>>>>> CPU, 10GB RAM and less than 1MB/s of bandwidth) and works fine at
>>>>> first, but the system slows down and after 2-3 days becomes
>>>>> unusable. When I look at top I see 80-100% system CPU and
>>>>> commands are very, very slow.
>>>>> Many processes are tagged with nfs_cl*.
>>>> 
>>>> To be honest, I would expect the slowness to be because of slow
>>>> response from the NFSv4 server, but if you do:
>>>> # ps axHl
>>>> on a client when it is slow and post that, it would give us some
>>>> more information on where the client side processes are sitting.
>>>> If you also do something like:
>>>> # nfsstat -c -w 1
>>>> and let it run for a while, that should show you how many RPCs are
>>>> being done and which ones.
>>>> 
>>>> # nfsstat -m
>>>> will show you what your mount is actually using.
>>>> The only mount option I can suggest trying is
>>>> "rsize=32768,wsize=32768",
>>>> since some network environments have difficulties with 64K.
>>>> 
>>>> There are a few things you can try on the NFSv4 server side, if it
>>>> appears that the clients are generating a large RPC load.
>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>> - if the server is seeing a large write RPC load, then
>>>> "sync=disabled" might help, although it does run a risk of data
>>>> loss when the server crashes.
>>>> Then there are a couple of other ZFS related things (I'm not a ZFS
>>>> guy, but these have shown up on the mailing lists).
>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>> drive that uses 4K sectors is pretending to be 512byte sectored)
>>>> - never run over 70-80% full if write performance is an issue
>>>> - use a ZIL on an SSD with good write performance
>>>> 
>>>> The only NFSv4 thing I can tell you is that it is known that ZFS's
>>>> algorithm for determining sequential vs random I/O fails for NFSv4
>>>> during writing, and this can be a performance hit.
>>>> The only workaround is to use NFSv3 mounts, since file handle
>>>> affinity apparently fixes the problem and this is only done for
>>>> NFSv3.
>>>> 
>>>> rick
>>>> 
>>>>> I saw that there are TSO issues with igb, so I tried to disable it
>>>>> with sysctl, but that didn't solve the situation.
>>>>> 
>>>>> Has someone got ideas? I can give you more information if you
>>>>> need.
>>>>> 
>>>>> Thanks in advance.
>>>>> Regards,
>>>>> 
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>> _______________________________________________
>>>>> freebsd-fs@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>> To unsubscribe, send any mail to
>>>>> "freebsd-fs-unsubscribe@freebsd.org"
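Since the "lockd not responding" messages at the top of the thread implicate rpc.lockd rather than the NFS data path, one option worth knowing about (an assumption on my part, not something suggested in the thread) is mounting NFSv3 with local locks so rpc.lockd is bypassed entirely:

```
# Hypothetical client /etc/fstab entry: NFSv3 with the nolockd option,
# which services locks locally on the client instead of via rpc.lockd.
# Server and path are from the thread; the option choice is a sketch.
10.10.X.8:/jails  /jails  nfs  rw,nfsv3,tcp,nolockd  0  0
```

The caveat: locks taken this way are invisible to other clients, so this only fits workloads where jails on different hypervisors never contend for the same lock files.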
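Rick's server-side suggestions can be collected into a single config sketch; the pool and dataset names below are placeholders, and each knob carries the trade-off he notes:

```
# /etc/sysctl.conf fragment on the NFS server (sketch):
vfs.nfsd.cachetcp=0        # disable the DRC for TCP-mounted clients

# ZFS-side commands (pool/dataset names are placeholders):
#   zfs set sync=disabled tank/jails   # helps a heavy write RPC load, but
#                                      # risks data loss if the server crashes
#   zpool get ashift tank              # want ashift=12 on 4K-sector drives
#   zpool list -o name,capacity tank   # keep usage under ~70-80% full
```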
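The dd figure quoted earlier in the thread (601062912 bytes in 21.060679 secs) can be converted to a human-readable rate with plain arithmetic; a quick shell sketch using only the numbers from the message (MB here means decimal megabytes, 10^6 bytes):

```shell
# Convert the dd result from the thread (601062912 bytes in 21.060679 s)
# into MB/s. Pure arithmetic on the quoted numbers, nothing NFS-specific.
bytes=601062912
secs=21.060679
awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.1f MB/s\n", b / s / 1e6 }'
# -> 28.5 MB/s, matching the 28539579 bytes/sec that dd reported
```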
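To see which RPC types dominate in `nfsstat -c -w 1` tables like those quoted above, the per-second rows can be summed per column. This is a generic awk sketch, not a tool mentioned in the thread; the function name `nfsstat_sum` is made up for illustration:

```shell
# Sum the eight nfsstat -c -w 1 columns across all numeric rows on
# stdin, skipping the repeated header lines, to show total RPCs by type.
nfsstat_sum() {
  awk '/^[0-9]/ { for (i = 1; i <= 8; i++) sum[i] += $i }
       END {
         print "GtAttr Lookup Rdlink Read Write Rename Access Rddir"
         for (i = 1; i <= 8; i++) printf "%d%s", sum[i], (i < 8 ? " " : "\n")
       }'
}

# Example with two sample rows taken from the thread:
printf '98 54 0 86 11 0 25 0\n36 24 0 39 25 0 10 1\n' | nfsstat_sum
# -> 134 78 0 125 36 0 35 1
```

In live use one would pipe the collected `nfsstat -c -w 1` output through the function after stripping the `>>` quote prefixes.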