Date:      Mon, 15 Dec 2014 09:07:32 +0000
From:      "Loïc Blot" <loic.blot@unix-experience.fr>
To:        "Rick Macklem" <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: High Kernel Load with nfsv4
Message-ID:  <2efc29240b59eabfdea79fe29744178d@mail.unix-experience.fr>
In-Reply-To: <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr>
References:  <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr> <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>

Hi Rick,
after talking with my N+1, NFSv4 is required on our infrastructure. I
tried to upgrade the NFSv4+ZFS server from 9.3 to 10.1; I hope this will
resolve some issues...

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr
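
For reference, the usual freebsd-update path for a 9.3 -> 10.1 jump looks
roughly like this; these are the stock upgrade steps, not details taken
from this thread:

# freebsd-update -r 10.1-RELEASE upgrade
# freebsd-update install
# shutdown -r now
# freebsd-update install          (second pass after the reboot)
# zpool upgrade <pool>            (optional, to enable newer pool features)
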
On 10 December 2014 at 15:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
> Hi Rick,
> thanks for your suggestion.
> For my locking bug, rpc.lockd is stuck in the rpcrecv state on the
> server. kill -9 doesn't affect the process, it's blocked.... (State: Ds)
>
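
Since the lockd process is sitting in state D, the useful detail is where
it is sleeping in the kernel. A minimal way to look at that, assuming only
base-system tools, would be:

# procstat -kk $(pgrep rpc.lockd)

The last frames of each thread's kernel stack should show whether it is
parked in the RPC receive path or somewhere else.
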
> As for the performance:
>
> NFSv3: 60Mbps
> NFSv4: 45Mbps
> Regards,
>
> Loïc Blot,
> UNIX Systems, Network and Security Engineer
> http://www.unix-experience.fr
>
> On 10 December 2014 at 13:56, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>
>> Loic Blot wrote:
>>
>>> Hi Rick,
>>> I'm trying NFSv3.
>>> Some jails start very well, but now I have an issue with lockd
>>> after some minutes:
>>>
>>> nfs server 10.10.X.8:/jails: lockd not responding
>>> nfs server 10.10.X.8:/jails lockd is alive again
>>>
>>> I looked at mbuf, but it seems there is no problem.
>>
>> Well, if you need locks to be visible across multiple clients, then
>> I'm afraid you are stuck with using NFSv4 and the performance you get
>> from it. (There is no way to do file handle affinity for NFSv4 because
>> the read and write ops are buried in the compound RPC and not easily
>> recognized.)
>>
>> If the locks don't need to be visible across multiple clients, I'd
>> suggest trying the "nolockd" option with nfsv3.
>>
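
As a sketch, such a client mount could look like the following (the server
path and mount point are the ones discussed in this thread; the options are
standard mount_nfs ones):

# mount -t nfs -o nfsv3,tcp,nolockd 10.10.X.8:/jails /jails

With nolockd, fcntl()/flock() locks are handled locally in the client
kernel instead of going through rpc.lockd, so they are not visible to
other clients.
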
>>> Here is my rc.conf on the server:
>>>
>>> nfs_server_enable="YES"
>>> nfsv4_server_enable="YES"
>>> nfsuserd_enable="YES"
>>> nfsd_server_flags="-u -t -n 256"
>>> mountd_enable="YES"
>>> mountd_flags="-r"
>>> nfsuserd_flags="-usertimeout 0 -force 20"
>>> rpcbind_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>>
>>> Here is the client:
>>>
>>> nfsuserd_enable="YES"
>>> nfsuserd_flags="-usertimeout 0 -force 20"
>>> nfscbd_enable="YES"
>>> rpc_lockd_enable="YES"
>>> rpc_statd_enable="YES"
>>>
>>> Have you got an idea?
>>>
>>> Regards,
>>>
>>> Loïc Blot,
>>> UNIX Systems, Network and Security Engineer
>>> http://www.unix-experience.fr
>>>
>>> On 9 December 2014 at 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>>> Loic Blot wrote:
>>>>
>>>>> Hi rick,
>>>>>
>>>>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>>>>> memcached_flags="-v -m 512"
>>>>> The command was very, very slow...
>>>>>
>>>>> Here is a dd over NFS:
>>>>>
>>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>>>
>>>> Can you try the same read using an NFSv3 mount?
>>>> (If it runs much faster, you have probably been bitten by the ZFS
>>>> "sequential vs random" read heuristic which, I've been told, thinks
>>>> NFS is doing "random" reads without file handle affinity. File
>>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>>
>> I was actually suggesting that you try the "dd" over nfsv3 to see how
>> the performance compares with nfsv4. If you do that, please post the
>> comparable results.
>>
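
A sketch of that comparison, assuming a spare mount point and re-reading an
existing test file (the file name is illustrative):

# mount -t nfs -o nfsv3,tcp 10.10.X.8:/jails /mnt
# dd if=/mnt/test.dd of=/dev/null bs=1m
# umount /mnt

Running the identical dd against the current NFSv4 mount gives the figure
to compare against.
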
>> Someday I would like to try and get ZFS's sequential vs random read
>> heuristic modified, and any info on what difference in performance that
>> might make for NFS would be useful.
>>
>> rick
>>
>>>> rick
>>>>
>>>>> This is quite slow...
>>>>>
>>>>> You can find some nfsstat output below (the command isn't finished yet)
>>>>>
>>>>> nfsstat -c -w 1
>>>>>
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 16 0
>>>>> 2 0 0 0 0 0 17 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 4 0 0 0 0 4 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 3 0
>>>>> 0 0 0 0 0 0 3 0
>>>>> 37 10 0 8 0 0 14 1
>>>>> 18 16 0 4 1 2 4 0
>>>>> 78 91 0 82 6 12 30 0
>>>>> 19 18 0 2 2 4 2 0
>>>>> 0 0 0 0 2 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 1 0 0 0 0 1 0
>>>>> 4 6 0 0 6 0 3 0
>>>>> 2 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 1 0 0 0 0 0 0 0
>>>>> 0 0 0 0 1 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 6 108 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 98 54 0 86 11 0 25 0
>>>>> 36 24 0 39 25 0 10 1
>>>>> 67 8 0 63 63 0 41 0
>>>>> 34 0 0 35 34 0 0 0
>>>>> 75 0 0 75 77 0 0 0
>>>>> 34 0 0 35 35 0 0 0
>>>>> 75 0 0 74 76 0 0 0
>>>>> 33 0 0 34 33 0 0 0
>>>>> 0 0 0 0 5 0 0 0
>>>>> 0 0 0 0 0 0 6 0
>>>>> 11 0 0 0 0 0 11 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 17 0 0 0 0 1 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 4 5 0 0 0 0 12 0
>>>>> 2 0 0 0 0 0 26 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 4 0 0 0 0 4 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 2 0
>>>>> 2 0 0 0 0 0 24 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 0 0 0 0 0 7 0
>>>>> 2 1 0 0 0 0 1 0
>>>>> 0 0 0 0 2 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 6 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 6 0 0 0 0 3 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 2 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 71 0 0 0 0 0 0
>>>>> 0 1 0 0 0 0 0 0
>>>>> 2 36 0 0 0 0 1 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 1 0 0 0 0 0 1 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 79 6 0 79 79 0 2 0
>>>>> 25 0 0 25 26 0 6 0
>>>>> 43 18 0 39 46 0 23 0
>>>>> 36 0 0 36 36 0 31 0
>>>>> 68 1 0 66 68 0 0 0
>>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>>> 36 0 0 36 36 0 0 0
>>>>> 48 0 0 48 49 0 0 0
>>>>> 20 0 0 20 20 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 3 14 0 1 0 0 11 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 0 4 0 0 0 0 4 0
>>>>> 0 0 0 0 0 0 0 0
>>>>> 4 22 0 0 0 0 16 0
>>>>> 2 0 0 0 0 0 23 0
>>>>>
>>>>> Regards,
>>>>>
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>>
>>>>> On 8 December 2014 at 09:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
>>>>>> Hi Rick,
>>>>>> I stopped the jails this weekend and started them this morning;
>>>>>> I'll give you some stats this week.
>>>>>>
>>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):
>>>>>>
>>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>>>>>>
>>>>>> On the server side my disks are behind a RAID controller which
>>>>>> presents a 512b volume, and write performance is quite honest
>>>>>> (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps)
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Loïc Blot,
>>>>>> UNIX Systems, Network and Security Engineer
>>>>>> http://www.unix-experience.fr
>>>>>>
>>>>>> On 5 December 2014 at 15:14, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>>>>>
>>>>>>> Loic Blot wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm trying to create a virtualisation environment based on jails.
>>>>>>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3
>>>>>>>> host which exports an NFSv4 volume. This NFSv4 volume was mounted
>>>>>>>> on a big hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports, but
>>>>>>>> only 1 was used at this time).
>>>>>>>>
>>>>>>>> The problem is simple: my hypervisor runs 6 jails (using roughly
>>>>>>>> 1% CPU, 10GB RAM and less than 1MB of bandwidth) and works fine
>>>>>>>> at first, but the system slows down and after 2-3 days becomes
>>>>>>>> unusable. When I look at top I see 80-100% system time and
>>>>>>>> commands are very, very slow. Many processes are tagged with
>>>>>>>> nfs_cl*.
>>>>>>>
>>>>>>> To be honest, I would expect the slowness to be because of slow
>>>>>>> response from the NFSv4 server, but if you do:
>>>>>>> # ps axHl
>>>>>>> on a client when it is slow and post that, it would give us some
>>>>>>> more information on where the client side processes are sitting.
>>>>>>> If you also do something like:
>>>>>>> # nfsstat -c -w 1
>>>>>>> and let it run for a while, that should show you how many RPCs
>>>>>>> are being done and which ones.
>>>>>>>
>>>>>>> # nfsstat -m
>>>>>>> will show you what your mount is actually using.
>>>>>>> The only mount option I can suggest trying is
>>>>>>> "rsize=32768,wsize=32768",
>>>>>>> since some network environments have difficulties with 64K.
>>>>>>>
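
For reference, a sketch of a client mount with those sizes forced (host and
path as in this thread, the rest being stock mount_nfs options):

# mount -t nfs -o nfsv4,tcp,rsize=32768,wsize=32768 10.10.X.8:/jails /jails

Checking nfsstat -m afterwards confirms what was actually negotiated, since
the client may clamp the values to what the server advertises.
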
>>>>>>> There are a few things you can try on the NFSv4 server side, if
>>>>>>> it appears that the clients are generating a large RPC load.
>>>>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>>> - If the server is seeing a large write RPC load, then
>>>>>>> "sync=disabled" might help, although it does run a risk of data
>>>>>>> loss when the server crashes.
>>>>>>> Then there are a couple of other ZFS related things (I'm not a
>>>>>>> ZFS guy, but these have shown up on the mailing lists).
>>>>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>>>>> drive that uses 4K sectors is pretending to be 512-byte sectored)
>>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>>> - use a ZIL on an SSD with good write performance
>>>>>>>
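
As a rough sketch, those dashed suggestions map to commands along these
lines (the dataset name tank/jails and the log device ada4 are illustrative,
not taken from this thread):

# sysctl vfs.nfsd.cachetcp=0          (disable the NFS duplicate request cache for TCP)
# zfs set sync=disabled tank/jails    (faster writes, data loss possible on a crash)
# zdb -C tank | grep ashift           (check that the vdevs report ashift: 12)
# zpool add tank log ada4             (put the ZIL on a dedicated SSD log device)

sync=disabled and adding a log device change durability and pool layout, so
they deserve a careful read of zfs(8) and zpool(8) first.
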
>>>>>>> The only NFSv4 thing I can tell you is that it is known that
>>>>>>> ZFS's algorithm for determining sequential vs random I/O fails
>>>>>>> for NFSv4 during writing and this can be a performance hit. The
>>>>>>> only workaround is to use NFSv3 mounts, since file handle
>>>>>>> affinity apparently fixes the problem and this is only done for
>>>>>>> NFSv3.
>>>>>>>
>>>>>>> rick
>>>>>>>
>>>>>>>> I saw that there are TSO issues with igb, so I tried to disable
>>>>>>>> it with sysctl, but that didn't solve the situation.
>>>>>>>>
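
For reference, TSO is usually disabled either per interface or globally
(the interface name igb0 is an assumption):

# ifconfig igb0 -tso
# sysctl net.inet.tcp.tso=0

An ifconfig_igb0="... -tso" line in rc.conf makes the per-interface setting
persistent across reboots.
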
>>>>>>>> Does anyone have ideas? I can give you more information if you
>>>>>>>> need it.
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Loïc Blot,
>>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>>> http://www.unix-experience.fr
>>>>>>>> _______________________________________________
>>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>> To unsubscribe, send any mail to
>>>>>>>> "freebsd-fs-unsubscribe@freebsd.org"
>>>>>>
>>>>>> _______________________________________________
>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>> To unsubscribe, send any mail to
>>>>>> "freebsd-fs-unsubscribe@freebsd.org"
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


