Date:      Mon, 15 Dec 2014 12:29:27 +0000
From:      "Loïc Blot" <loic.blot@unix-experience.fr>
To:        "Rick Macklem" <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: High Kernel Load with nfsv4
Message-ID:  <db7be16e523322eec76d281a9a9c5934@mail.unix-experience.fr>
In-Reply-To: <2efc29240b59eabfdea79fe29744178d@mail.unix-experience.fr>
References:  <2efc29240b59eabfdea79fe29744178d@mail.unix-experience.fr> <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr>  <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>

Hmmm...
Now I'm experiencing a deadlock.

   0  918  915   0  21  0  12352 3372 zfs      D     -     1:48.64 nfsd: server (nfsd)

The only fix was to reboot the server, but after rebooting the deadlock shows up a second time when I start my jails over NFS.

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr
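If it deadlocks again, a kernel stack dump of the stuck nfsd threads would show where they are sleeping. A minimal sketch (the PID 918 is simply the one from the ps line above and will differ between runs):

  # dump the kernel stacks of every thread in the stuck nfsd process
  procstat -kk 918
  # list the wait channels of nfsd and zfs related processes
  ps axHl | egrep 'nfsd|zfs'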

15 December 2014 10:07, "Loïc Blot" <loic.blot@unix-experience.fr> wrote: 

Hi Rick,
after talking with my N+1, NFSv4 is required on our infrastructure. I tried to upgrade the NFSv4+ZFS server from 9.3 to 10.1; I hope this will resolve some issues...

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

10 December 2014 15:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:

Hi Rick,
thanks for your suggestion.
For my locking bug, rpc.lockd is stuck in the rpcrecv state on the server. kill -9 doesn't affect the process, it's blocked.... (State: Ds)

For the performance:

NFSv3: 60Mbps
NFSv4: 45Mbps

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

10 December 2014 13:56, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:

> Loic Blot wrote:
> 
>> Hi Rick,
>> I'm trying NFSv3.
>> Some jails are starting very well, but now I have an issue with lockd
>> after some minutes:
>> 
>> nfs server 10.10.X.8:/jails: lockd not responding
>> nfs server 10.10.X.8:/jails lockd is alive again
>> 
>> I looked at mbuf, but it seems there is no problem.
> 
> Well, if you need locks to be visible across multiple clients, then
> I'm afraid you are stuck with using NFSv4 and the performance you get
> from it. (There is no way to do file handle affinity for NFSv4 because
> the read and write ops are buried in the compound RPC and not easily
> recognized.)
> 
> If the locks don't need to be visible across multiple clients, I'd
> suggest trying the "nolockd" option with nfsv3.
> 
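As a sketch of that suggestion (the server path and mount point are the ones from this thread; the rsize/wsize values are only an illustration):

  mount -t nfs -o nfsv3,nolockd,rsize=32768,wsize=32768 10.10.X.8:/jails /jails

  # or the equivalent /etc/fstab entry
  10.10.X.8:/jails  /jails  nfs  rw,nfsv3,nolockd,rsize=32768,wsize=32768  0  0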
>> Here is my rc.conf on server:
>> 
>> nfs_server_enable="YES"
>> nfsv4_server_enable="YES"
>> nfsuserd_enable="YES"
>> nfsd_server_flags="-u -t -n 256"
>> mountd_enable="YES"
>> mountd_flags="-r"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> rpcbind_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
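The /etc/exports file is not shown anywhere in the thread; a minimal sketch of an NFSv4 export for the /jails dataset might look like the following (the client network, the -sec flavour and the -maproot setting are assumptions, not taken from the original messages):

  # /etc/exports (hypothetical)
  V4: /jails -sec=sys -network 10.10.X.0 -mask 255.255.255.0
  /jails -maproot=root -network 10.10.X.0 -mask 255.255.255.0

After editing it, mountd has to re-read the file, e.g. with "service mountd reload" (a SIGHUP to mountd).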
>> Here is the client:
>> 
>> nfsuserd_enable="YES"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> nfscbd_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> Have you got an idea?
>> 
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> 
>> 9 December 2014 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote: 
>>> Loic Blot wrote:
>>> 
>>>> Hi Rick,
>>>> 
>>>> I waited 3 hours (no lag at jail launch) and now I do: sysrc
>>>> memcached_flags="-v -m 512"
>>>> The command was very very slow...
>>>> 
>>>> Here is a dd over NFS:
>>>> 
>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>> 
>>> Can you try the same read using an NFSv3 mount?
>>> (If it runs much faster, you have probably been bitten by the ZFS
>>> "sequential vs random" read heuristic, which I've been told thinks
>>> NFS is doing "random" reads without file handle affinity. File
>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
> 
> I was actually suggesting that you try the "dd" over nfsv3 to see how
> the performance compares with nfsv4. If you do that, please post the
> comparable results.
> 
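A sketch of that comparison (the mount points and test file are illustrative; the server path is the one used in this thread):

  mount -t nfs -o nfsv3 10.10.X.8:/jails /mnt/v3
  mount -t nfs -o nfsv4 10.10.X.8:/jails /mnt/v4
  # read the same large file through each mount and compare the reported rates
  dd if=/mnt/v3/test.dd of=/dev/null bs=1m
  dd if=/mnt/v4/test.dd of=/dev/null bs=1m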
> Someday I would like to try and get ZFS's sequential vs random read
> heuristic modified, and any info on what difference in performance that
> might make for NFS would be useful.
> 
> rick
> 
>>> rick
>>> 
>>>> This is quite slow...
>>>> 
>>>> You can find some nfsstat below (command isn't finished yet)
>>>> 
>>>> nfsstat -c -w 1
>>>> 
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 17 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 3 0
>>>> 37 10 0 8 0 0 14 1
>>>> 18 16 0 4 1 2 4 0
>>>> 78 91 0 82 6 12 30 0
>>>> 19 18 0 2 2 4 2 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 1 0
>>>> 4 6 0 0 6 0 3 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 0 0
>>>> 0 0 0 0 1 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 6 108 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 98 54 0 86 11 0 25 0
>>>> 36 24 0 39 25 0 10 1
>>>> 67 8 0 63 63 0 41 0
>>>> 34 0 0 35 34 0 0 0
>>>> 75 0 0 75 77 0 0 0
>>>> 34 0 0 35 35 0 0 0
>>>> 75 0 0 74 76 0 0 0
>>>> 33 0 0 34 33 0 0 0
>>>> 0 0 0 0 5 0 0 0
>>>> 0 0 0 0 0 0 6 0
>>>> 11 0 0 0 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 17 0 0 0 0 1 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 4 5 0 0 0 0 12 0
>>>> 2 0 0 0 0 0 26 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 2 0
>>>> 2 0 0 0 0 0 24 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 7 0
>>>> 2 1 0 0 0 0 1 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 6 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 6 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 0 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 71 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 0 0
>>>> 2 36 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 79 6 0 79 79 0 2 0
>>>> 25 0 0 25 26 0 6 0
>>>> 43 18 0 39 46 0 23 0
>>>> 36 0 0 36 36 0 31 0
>>>> 68 1 0 66 68 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 36 0 0 36 36 0 0 0
>>>> 48 0 0 48 49 0 0 0
>>>> 20 0 0 20 20 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 3 14 0 1 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 22 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 23 0
>>>> 
>>>> Regards,
>>>> 
>>>> Loïc Blot,
>>>> UNIX Systems, Network and Security Engineer
>>>> http://www.unix-experience.fr
>>>> 
>>>> 8 December 2014 09:36, "Loïc Blot" <loic.blot@unix-experience.fr>
>>>> wrote: 
>>>>> Hi Rick,
>>>>> I stopped the jails this weekend and started them this morning;
>>>>> I'll give you some stats this week.
>>>>> 
>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):

nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647

On the server side my disks are on a raid controller which shows a 512b
volume, and write performance is very honest
(dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps)

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

5 December 2014 15:14, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:

> Loic Blot wrote:
> 
>> Hi,
>> I'm trying to create a virtualisation environment based on jails.
>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3 which
>> exports an NFSv4 volume. This NFSv4 volume was mounted on a big
>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports, but only 1 was
>> used at this time).
>> 
>> The problem is simple: my hypervisor runs 6 jails (using approximately
>> 1% cpu, 10GB RAM and less than 1MB bandwidth) and works fine at
>> start, but the system slows down and after 2-3 days becomes
>> unusable. When I look at the top command I see 80-100% on system and
>> commands are very very slow. Many processes are tagged with
>> nfs_cl*.
> 
> To be honest, I would expect the slowness to be because of slow
> response from the NFSv4 server, but if you do:
> # ps axHl
> on a client when it is slow and post that, it would give us some
> more information on where the client side processes are sitting.
> If you also do something like:
> # nfsstat -c -w 1
> and let it run for a while, that should show you how many RPCs
> are being done and which ones.
> 
> # nfsstat -m
> will show you what your mount is actually using.
> The only mount option I can suggest trying is
> "rsize=32768,wsize=32768",
> since some network environments have difficulties with 64K.
> 
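On the client that could look like this (a sketch only; with NFSv4 the path seen by the client depends on the V4: root in /etc/exports, so the exact server path may differ):

  mount -t nfs -o nfsv4,rsize=32768,wsize=32768 10.10.X.8:/jails /jails
  # confirm the sizes that were actually negotiated
  nfsstat -m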
> There are a few things you can try on the NFSv4 server side, if it
> appears that the clients are generating a large RPC load.
> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
> - If the server is seeing a large write RPC load, then
> "sync=disabled" might help, although it does run a risk of data loss
> when the server crashes.
> Then there are a couple of other ZFS related things (I'm not a ZFS
> guy, but these have shown up on the mailing lists).
> - make sure your volumes are 4K aligned and ashift=12 (in case a
> drive that uses 4K sectors is pretending to be 512byte sectored)
> - never run over 70-80% full if write performance is an issue
> - use a zil on an SSD with good write performance
> 
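A rough sketch of what applying those server-side suggestions could look like (the pool/dataset name "tank/jails" and the log device "gpt/slog0" are made up for the example):

  # disable the duplicate request cache for TCP mounts
  sysctl vfs.nfsd.cachetcp=0
  # disable synchronous writes on the jail dataset (accepts the data-loss risk)
  zfs set sync=disabled tank/jails
  # check the pool fill level and the ashift of the vdevs
  zpool list tank
  zdb -C tank | grep ashift
  # add an SSD partition as a separate log (ZIL) device
  zpool add tank log gpt/slog0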
> The only NFSv4 thing I can tell you is that it is known that ZFS's
> algorithm for determining sequential vs random I/O fails for NFSv4
> during writing and this can be a performance hit. The only
> workaround is to use NFSv3 mounts, since file handle affinity
> apparently fixes the problem and this is only done for NFSv3.
> 
> rick
> 
>> I saw that there are TSO issues with igb, so I'm trying to disable
>> it with sysctl, but the situation wasn't solved.
>> 
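For reference, TSO can be turned off per interface or for TCP globally (the interface name igb0 is an example):

  # per interface, until the next boot
  ifconfig igb0 -tso
  # or globally for TCP
  sysctl net.inet.tcp.tso=0

To make the per-interface setting persistent, "-tso" can be appended to the corresponding ifconfig_igb0 line in rc.conf.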
>> Has someone got ideas? I can give you more information if you
>> need.
>> 
>> Thanks in advance.
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to
>> "freebsd-fs-unsubscribe@freebsd.org"

_______________________________________________
freebsd-fs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"