Date:      Mon, 29 Oct 2018 15:25:07 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>, Andrew Vylegzhanin <avv314@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>, "freebsd-infiniband@freebsd.org" <freebsd-infiniband@freebsd.org>
Subject:   Re: NFS + Infiniband problem
Message-ID:  <YTOPR0101MB11622A9797376128D2FA182ADDF30@YTOPR0101MB1162.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <201810291506.w9TF6YAP057202@pdx.rh.CN85.dnsmgr.net>
References:  <CA%2BBi_YiHoxFc3wsEPnMeBHWgW-nh6sXQCEgBTb=-nD6-XcjZ%2Bg@mail.gmail.com>, <201810291506.w9TF6YAP057202@pdx.rh.CN85.dnsmgr.net>

Rodney W. Grimes wrote:
Andrew Vylegzhanin wrote:
>> Hello everyone,
>>
>> I have several FreeBSD machines connected via an Infiniband network (FDR
>> switch Mellanox SW3036 + ConnectX-3 VPI cards).
>> One of them is a NAS server with multiple ZFS pools.
>>
>> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
>> server) are with infiniband connected mode (option IPOIB_CM, option SDM)
>> and world with OFED stack support (WITH_OFED='yes').
>>
>> File transfers via FTP or SSH between server and clients work almost
>> flawlessly (~12 Gbit/s).
>>
>> But when I try to copy a significant amount of data in or out via an NFS
>> share mounted on the clients, NFS I/O either hangs completely or drops to an
>> extremely slow (a couple of kB/s) transfer speed after an unpredictable
>> amount of copied data. For example, on one node I can copy a 1 GB file, and
>> afterwards NFS hangs on a file of size 30 kB.
>>
>> Some details:
>> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
>                               ^^^^^^^^^^^
>I am not sure what the interaction between page sizes, TSO needs,
>buffer needs and all that are but I always use a power of 2 wsize
>and rsize.
They should always be a power of 2. I think the code clips the value, but it might
only clip to a multiple of 512. If it didn't clip this down to 16384, then that
would definitely be a problem. Also, normally the same size is used for rsize and
wsize. If you don't do that, you end up with weird sized blocks in the buffer cache.
I think it still works when this is done, but it could cause performance hits.
Probably doesn't matter for a simple performance test.
(You can find out what options it is actually using by typing "nfsstat -m"
after doing the mount.)
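
To make the two clipping possibilities above concrete, here is a minimal sketch of the arithmetic. The helper names floor_pow2 and floor_mult512 are made up for illustration and are not taken from the FreeBSD NFS code; the sketch only shows what a requested size of 30000 would become under each kind of clipping, and it prints a mount command instead of running one.

```sh
#!/bin/sh
# Illustrative helpers only; they are not taken from the FreeBSD NFS code.

# Largest power of two that is <= $1 (expects a positive integer).
floor_pow2() {
    n=$1
    p=1
    while [ $((p * 2)) -le "$n" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

# Largest multiple of 512 that is <= $1.
floor_mult512() {
    echo $(( $1 / 512 * 512 ))
}

requested=30000
echo "multiple of 512: $(floor_mult512 "$requested")"   # 29696, not a power of 2
size=$(floor_pow2 "$requested")
echo "power of 2:      $size"                           # 16384

# Print (rather than run) a mount command using the same value for both sizes.
echo "mount_nfs -o rsize=$size -o wsize=$size -o proto=tcp 10.0.2.1:/zdata2 /mnt"
```

If the value were only clipped to a multiple of 512, 30000 would end up as 29696, which is still not a power of 2. Passing the same power of 2 for both rsize and wsize avoids depending on either behaviour, and "nfsstat -m" shows what the mount actually ended up with.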

>   You might try that.  And as Rick suggested, turn off
>TSO, if you can.  Is infiniband using RDMA to do this? If so, then
>the page size stuff is probably very important; use multiples of
>4096 only.
RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
on a separate connection with NFSv4.1 or later, but I've never written code
for that. (Not practical to try to implement without access to hardware that
does it.)

rick
[performance stuff snipped]


