Date:      Wed, 10 Dec 2014 14:36:39 +0000
From:      "Loïc Blot" <loic.blot@unix-experience.fr>
To:        "Rick Macklem" <rmacklem@uoguelph.ca>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: High Kernel Load with nfsv4
Message-ID:  <fc9e829cf79a03cd72f21226d276eb78@mail.unix-experience.fr>
In-Reply-To: <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>
References:  <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca>


Hi Rick,
thanks for your suggestion.
For my locking bug, rpc.lockd is stuck in the rpcrecv state on the server. kill -9 does not affect the process; it is blocked (state: Ds).
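
(In case it helps with debugging: one way to see where the blocked process
is sitting in the kernel is procstat, e.g.

procstat -kk `pgrep rpc.lockd`

which prints the kernel stacks and should confirm whether it really is
waiting in the RPC receive path.)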


As for performance:

NFSv3: 60Mbps
NFSv4: 45Mbps
Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 10 December 2014 at 13:56, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
> Loic Blot wrote:
> 
>> Hi Rick,
>> I'm trying NFSv3.
>> Some jails start up fine, but now I have an issue with lockd after a
>> few minutes:
>> 
>> nfs server 10.10.X.8:/jails: lockd not responding
>> nfs server 10.10.X.8:/jails lockd is alive again
>> 
>> I looked at mbuf usage, but it seems there is no problem.
> 
> Well, if you need locks to be visible across multiple clients, then
> I'm afraid you are stuck with using NFSv4 and the performance you get
> from it. (There is no way to do file handle affinity for NFSv4 because
> the read and write ops are buried in the compound RPC and not easily
> recognized.)
> 
> If the locks don't need to be visible across multiple clients, I'd
> suggest trying the "nolockd" option with nfsv3.
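> 
> For example, something along these lines on the client (the /jails mount
> point here is just a guess, adjust for your setup):
> 
> mount -t nfs -o nfsv3,nolockd 10.10.X.8:/jails /jails
> 
> or the equivalent fstab entry:
> 
> 10.10.X.8:/jails /jails nfs rw,nfsv3,nolockd 0 0
> 
> With "nolockd", fcntl(2) locks are handled locally on the client and are
> never seen by the server or by other clients.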
> 
>> Here is my rc.conf on server:
>> 
>> nfs_server_enable="YES"
>> nfsv4_server_enable="YES"
>> nfsuserd_enable="YES"
>> nfsd_server_flags="-u -t -n 256"
>> mountd_enable="YES"
>> mountd_flags="-r"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> rpcbind_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> Here is the client:
>> 
>> nfsuserd_enable="YES"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> nfscbd_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> 
>> Have you got any ideas?
>> 
>> Regards,
>> 
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> 
>> On 9 December 2014 at 04:31, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>> Loic Blot wrote:
>>> 
>>>> Hi rick,
>>>> 
>>>> I waited 3 hours (no lag at jail launch) and then ran: sysrc
>>>> memcached_flags="-v -m 512"
>>>> The command was very, very slow...
>>>> 
>>>> Here is a dd over NFS:
>>>> 
>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>> 
>>> Can you try the same read using an NFSv3 mount?
>>> (If it runs much faster, you have probably been bitten by the ZFS
>>> "sequential vs random" read heuristic, which I've been told thinks
>>> NFS is doing "random" reads without file handle affinity. File
>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>>> 
> 
> I was actually suggesting that you try the "dd" over nfsv3 to see how
> the performance compared with nfsv4. If you do that, please post the
> comparable results.
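> 
> As a sketch (the mount point and file name are only examples):
> 
> mount -t nfs -o nfsv3 10.10.X.8:/jails /mnt/nfsv3test
> dd if=/mnt/nfsv3test/some-large-file of=/dev/null bs=64k
> 
> run against the same file you read over nfsv4, so that the two transfer
> rates are directly comparable.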
> 
> Someday I would like to try and get ZFS's sequential vs random read
> heuristic modified and any info on what difference in performance that
> might make for NFS would be useful.
> 
> rick
> 
>>> rick
>>> 
>>>> This is quite slow...
>>>> 
>>>> You can find some nfsstat output below (the command hasn't finished yet)
>>>> 
>>>> nfsstat -c -w 1
>>>> 
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 17 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 3 0
>>>> 37 10 0 8 0 0 14 1
>>>> 18 16 0 4 1 2 4 0
>>>> 78 91 0 82 6 12 30 0
>>>> 19 18 0 2 2 4 2 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 1 0
>>>> 4 6 0 0 6 0 3 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 0 0
>>>> 0 0 0 0 1 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 6 108 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 98 54 0 86 11 0 25 0
>>>> 36 24 0 39 25 0 10 1
>>>> 67 8 0 63 63 0 41 0
>>>> 34 0 0 35 34 0 0 0
>>>> 75 0 0 75 77 0 0 0
>>>> 34 0 0 35 35 0 0 0
>>>> 75 0 0 74 76 0 0 0
>>>> 33 0 0 34 33 0 0 0
>>>> 0 0 0 0 5 0 0 0
>>>> 0 0 0 0 0 0 6 0
>>>> 11 0 0 0 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 17 0 0 0 0 1 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 4 5 0 0 0 0 12 0
>>>> 2 0 0 0 0 0 26 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 2 0
>>>> 2 0 0 0 0 0 24 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 7 0
>>>> 2 1 0 0 0 0 1 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 6 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 6 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 0 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 71 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 0 0
>>>> 2 36 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 79 6 0 79 79 0 2 0
>>>> 25 0 0 25 26 0 6 0
>>>> 43 18 0 39 46 0 23 0
>>>> 36 0 0 36 36 0 31 0
>>>> 68 1 0 66 68 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 36 0 0 36 36 0 0 0
>>>> 48 0 0 48 49 0 0 0
>>>> 20 0 0 20 20 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 3 14 0 1 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 22 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 23 0
>>>> 
>>>> Regards,
>>>> 
>>>> Loïc Blot,
>>>> UNIX Systems, Network and Security Engineer
>>>> http://www.unix-experience.fr
>>>> 
>>>> On 8 December 2014 at 09:36, "Loïc Blot" <loic.blot@unix-experience.fr> wrote:
>>>>> Hi Rick,
>>>>> I stopped the jails this weekend and started them again this
>>>>> morning; I'll give you some stats this week.
>>>>> 
>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks)
>>>>> 
>>>>> 
>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>>>>> 
>>>>> On the server side my disks are behind a RAID controller which
>>>>> presents a volume with 512-byte sectors, and write performance is
>>>>> quite decent (dd if=/dev/zero of=/jails/test.dd bs=4096
>>>>> count=100000000 => 450MB/s).
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>> 
>>>>> On 5 December 2014 at 15:14, "Rick Macklem" <rmacklem@uoguelph.ca> wrote:
>>>>> 
>>>>>> Loic Blot wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> I'm trying to create a virtualisation environment based on jails.
>>>>>>> Those jails are stored on a big ZFS pool on a FreeBSD 9.3 server
>>>>>>> which exports an NFSv4 volume. This NFSv4 volume is mounted on a
>>>>>>> big hypervisor (2 Xeon E5v3, 128GB memory and 8 network ports, but
>>>>>>> only 1 is used at this time).
>>>>>>> 
>>>>>>> The problem is simple: my hypervisor runs 6 jails (using roughly 1%
>>>>>>> CPU, 10GB RAM and less than 1MB/s of bandwidth) and works fine at
>>>>>>> start, but the system slows down and after 2-3 days becomes
>>>>>>> unusable. When I look at top I see 80-100% system time and commands
>>>>>>> are very, very slow. Many processes are tagged with nfs_cl*.
>>>>>> 
>>>>>> To be honest, I would expect the slowness to be because of slow
>>>>>> response
>>>>>> from the NFSv4 server, but if you do:
>>>>>> # ps axHl
>>>>>> on a client when it is slow and post that, it would give us some
>>>>>> more
>>>>>> information on where the client side processes are sitting.
>>>>>> If you also do something like:
>>>>>> # nfsstat -c -w 1
>>>>>> and let it run for a while, that should show you how many RPCs
>>>>>> are
>>>>>> being done and which ones.
>>>>>> 
>>>>>> # nfsstat -m
>>>>>> will show you what your mount is actually using.
>>>>>> The only mount option I can suggest trying is
>>>>>> "rsize=32768,wsize=32768",
>>>>>> since some network environments have difficulties with 64K.
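>>>>>> 
>>>>>> For example, an fstab entry using those sizes might look like this
>>>>>> (the mount point is just a placeholder):
>>>>>> 
>>>>>> 10.10.X.8:/jails /jails nfs rw,nfsv4,rsize=32768,wsize=32768 0 0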
>>>>>> 
>>>>>> There are a few things you can try on the NFSv4 server side, if
>>>>>> it
>>>>>> appears
>>>>>> that the clients are generating a large RPC load.
>>>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>> - If the server is seeing a large write RPC load, then
>>>>>> "sync=disabled"
>>>>>> might help, although it does run a risk of data loss when the
>>>>>> server
>>>>>> crashes.
>>>>>> Then there are a couple of other ZFS related things (I'm not a
>>>>>> ZFS
>>>>>> guy,
>>>>>> but these have shown up on the mailing lists).
>>>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>>>> drive
>>>>>> that uses 4K sectors is pretending to be 512byte sectored)
>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>> - use a zil on an SSD with good write performance
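>>>>>> 
>>>>>> As a concrete sketch of the knobs above (the pool/dataset names and
>>>>>> the SSD device are only placeholders):
>>>>>> 
>>>>>> # disable the DRC for TCP connections on the NFS server
>>>>>> sysctl vfs.nfsd.cachetcp=0
>>>>>> 
>>>>>> # accept the data-loss risk and disable synchronous writes on the
>>>>>> # exported dataset
>>>>>> zfs set sync=disabled tank/jails
>>>>>> 
>>>>>> # check the pool's ashift (12 means 4K-aligned allocations)
>>>>>> zdb -C tank | grep ashift
>>>>>> 
>>>>>> # put the ZIL on an SSD by adding it as a log vdev
>>>>>> zpool add tank log /dev/ada2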
>>>>>> 
>>>>>> The only NFSv4 thing I can tell you is that it is known that
>>>>>> ZFS's
>>>>>> algorithm for determining sequential vs random I/O fails for
>>>>>> NFSv4
>>>>>> during writing and this can be a performance hit. The only
>>>>>> workaround
>>>>>> is to use NFSv3 mounts, since file handle affinity apparently
>>>>>> fixes
>>>>>> the problem and this is only done for NFSv3.
>>>>>> 
>>>>>> rick
>>>>>> 
>>>>>>> I saw that there are TSO issues with igb, so I tried disabling it
>>>>>>> with sysctl, but that didn't solve the problem.
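>>>>>>> 
>>>>>>> (For reference, TSO can be turned off per interface with
>>>>>>> "ifconfig igb0 -tso" (igb0 is just an example name) or globally
>>>>>>> with "sysctl net.inet.tcp.tso=0".)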
>>>>>>> 
>>>>>>> Does anyone have any ideas? I can give you more information if you
>>>>>>> need it.
>>>>>>> 
>>>>>>> Thanks in advance.
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Loïc Blot,
>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>> http://www.unix-experience.fr
>>>>> 


