Date:      Wed, 10 Dec 2014 11:33:14 +0000
From:      "Loïc Blot" <loic.blot@unix-experience.fr>
To:        "Rick Macklem" <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: High Kernel Load with nfsv4
Message-ID:  <1e19554bc0d4eb3e8dab74e2056b5ec4@mail.unix-experience.fr>
In-Reply-To: <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca>
References:  <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca>

Hi Rick,

I'm trying NFSv3. Some jails start very well, but now I have an issue
with lockd after a few minutes:

nfs server 10.10.X.8:/jails: lockd not responding
nfs server 10.10.X.8:/jails lockd is alive again

I looked at mbuf usage, but it seems there is no problem there.

Here is my rc.conf on the server:

nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
nfsd_server_flags="-u -t -n 256"
mountd_enable="YES"
mountd_flags="-r"
nfsuserd_flags="-usertimeout 0 -force 20"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

Here is the client:

nfsuserd_enable="YES"
nfsuserd_flags="-usertimeout 0 -force 20"
nfscbd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

Have you got an idea?

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr
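
(A quick way to check whether the NLM and NSM services behind those
"lockd not responding" messages are actually registered on the server
is rpcinfo; the host is the 10.10.X.8 server from the message above,
and the grep pattern is only illustrative:)

  # On the client, list the RPC services the server has registered:
  rpcinfo -p 10.10.X.8 | grep -E 'nlockmgr|status'
  # nlockmgr (lockd) and status (statd) should each appear for udp and tcp;
  # if they disappear while the "not responding" messages occur, the
  # problem is on the server side rather than in the network.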

9 December 2014 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
> Loic Blot wrote:
> 
>> Hi Rick,
>> 
>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>> memcached_flags="-v -m 512"
>> The command was very, very slow...
>> 
>> Here is a dd over NFS:
>> 
>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
> 
> Can you try the same read using an NFSv3 mount?
> (If it runs much faster, you have probably been bitten by the ZFS
> "sequential vs random" read heuristic; I've been told it thinks
> NFS is doing "random" reads when there is no file handle affinity.
> File handle affinity is very hard to do for NFSv4, so it isn't done.)
> 
> rick
>> This is quite slow...
>> 
>> You can find some nfsstat output below (the command isn't finished yet):
>> 
>> nfsstat -c -w 1
>> 
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 16 0
>> 2 0 0 0 0 0 17 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 3 0
>> 0 0 0 0 0 0 3 0
>> 37 10 0 8 0 0 14 1
>> 18 16 0 4 1 2 4 0
>> 78 91 0 82 6 12 30 0
>> 19 18 0 2 2 4 2 0
>> 0 0 0 0 2 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 1 0 0 0 0 1 0
>> 4 6 0 0 6 0 3 0
>> 2 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 0 0
>> 0 0 0 0 1 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 6 108 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 98 54 0 86 11 0 25 0
>> 36 24 0 39 25 0 10 1
>> 67 8 0 63 63 0 41 0
>> 34 0 0 35 34 0 0 0
>> 75 0 0 75 77 0 0 0
>> 34 0 0 35 35 0 0 0
>> 75 0 0 74 76 0 0 0
>> 33 0 0 34 33 0 0 0
>> 0 0 0 0 5 0 0 0
>> 0 0 0 0 0 0 6 0
>> 11 0 0 0 0 0 11 0
>> 0 0 0 0 0 0 0 0
>> 0 17 0 0 0 0 1 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 4 5 0 0 0 0 12 0
>> 2 0 0 0 0 0 26 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 2 0
>> 2 0 0 0 0 0 24 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 7 0
>> 2 1 0 0 0 0 1 0
>> 0 0 0 0 2 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 6 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 6 0 0 0 0 3 0
>> 0 0 0 0 0 0 0 0
>> 2 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 71 0 0 0 0 0 0
>> 0 1 0 0 0 0 0 0
>> 2 36 0 0 0 0 1 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 1 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 79 6 0 79 79 0 2 0
>> 25 0 0 25 26 0 6 0
>> 43 18 0 39 46 0 23 0
>> 36 0 0 36 36 0 31 0
>> 68 1 0 66 68 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 36 0 0 36 36 0 0 0
>> 48 0 0 48 49 0 0 0
>> 20 0 0 20 20 0 0 0
>> 0 0 0 0 0 0 0 0
>> 3 14 0 1 0 0 11 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 4 22 0 0 0 0 16 0
>> 2 0 0 0 0 0 23 0
>> 
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> 
>> 8 December 2014 09:36, "Loïc Blot" <loic.blot@unix-experience.fr>
>> wrote:
>>> Hi Rick,
>>> I stopped the jails this weekend and started them again this
>>> morning; I'll give you some stats this week.
>>> 
>>> Here is my nfsstat -m output (with your rsize/wsize tweaks):
>>> 
>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,
>>> acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,
>>> wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,
>>> timeout=120,retrans=2147483647
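
(For reference, a client could pin the tweaked options persistently in
/etc/fstab; this line is only a sketch assuming the 10.10.X.8:/jails
export and a /jails mountpoint on the client:)

  # device            mountpoint  fstype  options                        dump pass
  10.10.X.8:/jails    /jails      nfs     nfsv4,rsize=32768,wsize=32768  0    0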
>>> 
>>> On the server side my disks are behind a RAID controller which
>>> presents a 512b volume, and write performance is very honest
>>> (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000
>>> => 450MBps).
>>> 
>>> Regards,
>>> 
>>> Loïc Blot,
>>> UNIX Systems, Network and Security Engineer
>>> http://www.unix-experience.fr
>>> 
>>> 5 December 2014 15:14, "Rick Macklem" <rmacklem@uoguelph.ca>
>>> wrote:
>>> 
>>>> Loic Blot wrote:
>>>> 
>>>>> Hi,
>>>>> I'm trying to create a virtualisation environment based on jails.
>>>>> The jails are stored on a big ZFS pool on a FreeBSD 9.3 host,
>>>>> which exports an NFSv4 volume. This NFSv4 volume is mounted on a
>>>>> big hypervisor (2 Xeon E5v3, 128GB memory and 8 network ports, of
>>>>> which only 1 is used at this time).
>>>>> 
>>>>> The problem is simple: my hypervisor runs 6 jails (using roughly
>>>>> 1% CPU, 10GB RAM and less than 1MB of bandwidth) and works fine
>>>>> at the start, but the system slows down and after 2-3 days becomes
>>>>> unusable. When I look at top I see 80-100% system time, and
>>>>> commands are very, very slow. Many processes are tagged with
>>>>> nfs_cl*.
>>>> 
>>>> To be honest, I would expect the slowness to be because of slow
>>>> response from the NFSv4 server, but if you do:
>>>> # ps axHl
>>>> on a client when it is slow and post that, it would give us some
>>>> more information on where the client side processes are sitting.
>>>> If you also do something like:
>>>> # nfsstat -c -w 1
>>>> and let it run for a while, that should show you how many RPCs are
>>>> being done and which ones.
>>>> 
>>>> # nfsstat -m
>>>> will show you what your mount is actually using.
>>>> The only mount option I can suggest trying is
>>>> "rsize=32768,wsize=32768",
>>>> since some network environments have difficulties with 64K.
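
(A compact way to capture all three diagnostics Rick asks for, run on
the client while the slowdown is happening; the output paths are
arbitrary:)

  ps axHl > /tmp/ps-axHl.txt                 # thread states and wait channels
  nfsstat -m > /tmp/nfsstat-m.txt            # effective mount options
  nfsstat -c -w 1 | tee /tmp/nfsstat-c.log   # per-second RPC counts; stop with ^C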
>>>> 
>>>> There are a few things you can try on the NFSv4 server side, if it
>>>> appears that the clients are generating a large RPC load:
>>>> - disable the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>> - if the server is seeing a large write RPC load, then
>>>> "sync=disabled" might help, although it does run a risk of data
>>>> loss when the server crashes.
>>>> Then there are a couple of other ZFS-related things (I'm not a ZFS
>>>> guy, but these have shown up on the mailing lists):
>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>> drive that uses 4K sectors is pretending to be 512byte sectored)
>>>> - never run over 70-80% full if write performance is an issue
>>>> - use a ZIL on an SSD with good write performance
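
(The server-side knobs above translate into roughly the following
commands; the pool/dataset name "tank/jails" is only a placeholder:)

  sysctl vfs.nfsd.cachetcp=0        # disable the DRC for TCP connections
  zfs set sync=disabled tank/jails  # faster writes, but risks data loss on a crash
  zdb -C tank | grep ashift         # should report ashift: 12 on 4K-sector drives
  zpool list                        # check the CAP column stays under ~70-80%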
>>>> 
>>>> The only NFSv4 thing I can tell you is that it is known that ZFS's
>>>> algorithm for determining sequential vs random I/O fails for NFSv4
>>>> during writing, and this can be a performance hit. The only
>>>> workaround is to use NFSv3 mounts, since file handle affinity
>>>> apparently fixes the problem, and this is only done for NFSv3.
>>>> 
>>>> rick
>>>> 
>>>>> I saw that there are TSO issues with igb, so I tried to disable
>>>>> TSO with sysctl, but the situation wasn't solved.
>>>>> 
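
(For completeness: TSO can be turned off per interface as well as
globally; "igb0" is just the obvious candidate interface name here:)

  ifconfig igb0 -tso          # disable TSO on one igb interface
  sysctl net.inet.tcp.tso=0   # or disable TCP segmentation offload globally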
>>>>> Has anyone got ideas? I can give you more information if you
>>>>> need it.
>>>>> 
>>>>> Thanks in advance.
>>>>> Regards,
>>>>> 
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr


