Date:        Sat, 13 Apr 2019 08:52:21 -0500
From:        Jason Bacon <bacon4000@gmail.com>
To:          Hans Petter Selasky <hps@selasky.org>, "freebsd-infiniband@freebsd.org" <freebsd-infiniband@freebsd.org>
Subject:     Re: Kernel modules
Message-ID:  <5166ec29-876b-0bd3-8a84-8a222647e87a@gmail.com>
In-Reply-To: <2f4d9a14-4ff6-0d34-06f0-bbb4ac76c6bd@gmail.com>
References:  <0eba9ec9-692f-7677-2b10-4e67a232821c@gmail.com>
             <f3f94452-155f-79f4-72d8-bf65760ae5b0@selasky.org>
             <598a58f0-89b8-d00d-5ed7-74dd7005950f@gmail.com>
             <73ce0738-4d63-2f25-2ff6-00f0092de136@selasky.org>
             <2090dd24-db43-b689-4289-f50bd70090ea@gmail.com>
             <6673df26-8bba-ebd3-b2c5-d7e9c97db557@gmail.com>
             <d82f3a60-6ad4-dba8-a15b-355a536a9a83@gmail.com>
             <bd42597e-2981-4667-468e-b008b9be290b@selasky.org>
             <2f4d9a14-4ff6-0d34-06f0-bbb4ac76c6bd@gmail.com>
On 2019-04-12 08:04, Jason Bacon wrote:
> On 2019-04-12 07:57, Hans Petter Selasky wrote:
>> On 4/12/19 2:39 PM, Jason Bacon wrote:
>>> root@zfs-01:~ # ifconfig ib0
>>> ib0: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
>>>         options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
>>>         lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.f4.52.14.3.0.92.88.d1
>>>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> Can you try setting an mtu of 4000 bytes on both sides and re-run
>> the test? This large mtu I think is not supported.
>>
>> --HPS
>
> I assume you saw my followup showing 16 Gb/s...
>
> I'll try playing with MTU anyway.  Maybe that will improve
> performance a bit more?
>
> I'm going to do a bunch of tuning and test NFS.  Will report results
> back here when I have some substantial info.
>
> Thanks,
>
>     JB

Some data for comparison.

Regarding MTU, the bigger the better, up to a point.  At 65520, my
server became unresponsive, to the point of an ssh session timing out.
It recovered after a minute or two and there did not seem to be any
permanent harm.  Lower MTUs provide more stable performance (monitoring
with "iostat 1") at lower throughput.  For now I'm using 16380, 1/4 of
the 65520 default on CentOS 7.  I haven't yet seen any stability issues
at this level.

Explanation of data:

raid-05 is a CentOS 7 RAID server, XFS filesystem.  zfs-01 is a
FreeBSD 12 RAID server, ZFS filesystem.  Hardware is identical:
PowerEdge R720xd, 12 ST2000NM0023 SAS drives, RAID-6, PERC H710 mini
(MegaRAID).

"-local" means the benchmark ran on the server itself, testing the
local RAID.  "-nfs4" means the benchmark ran on a compute node, testing
NFS over FDR InfiniBand.

Benchmarks were run with and without ZFS lz4 compression enabled on the
server.  All results are the average of 3 trials.

Highlights:

raid-05-nfs4 vs zfs-01-nfs4:

o The FreeBSD server outperformed the CentOS server on random and
  sequential reads.

o The FreeBSD server fell short on fresh writes and way short on
  overwrites.

zfs-01-local vs FreeBSD 10 results:

o FreeBSD is hitting performance limits on the local array for some
  reason.

o Local disk performance was much better on FreeBSD 10 a couple of
  years ago.  A ZFS or mrsas regression?  As I recall, it was about 5%
  faster overall than CentOS 6 + XFS at that time.

o Would resolving this push FreeBSD's NFS write performance past
  CentOS's?

Overall, I'd say we're looking pretty good at this point.  Performance
is way more than adequate for most HPC jobs, and I suspect some tuning
and/or minor improvements to the IB code will improve it further.
Stability will take a long time to test properly, so I'm going to start
by rerunning some of our most I/O-intensive jobs on it - jobs that
actually broke our CentOS RAID servers until I switched them to NFS
over RDMA.

A few example commands follow for anyone who wants to reproduce the
setup, then the raw numbers.
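For reference, here's roughly how the MTU gets set on each side.  The
interface name matches my systems; the address in the rc.conf line is
just a placeholder.  On the FreeBSD 12 server:

    # One-off change for testing (reverts at reboot):
    ifconfig ib0 mtu 16380

    # Persistent, in /etc/rc.conf (address is a placeholder):
    ifconfig_ib0="inet 192.168.100.11 netmask 255.255.255.0 mtu 16380"

On the CentOS 7 side, IPoIB has to be in connected mode to get past the
~4 KiB datagram-mode limit:

    # One-off:
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 16380

    # Persistent, in /etc/sysconfig/network-scripts/ifcfg-ib0:
    CONNECTED_MODE=yes
    MTU=16380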
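The lz4 vs. uncompressed runs just toggled compression on the exported
dataset between runs (the dataset name below is made up).  Note that
the setting only affects blocks written after the change, so each run
wrote fresh files:

    zfs set compression=lz4 tank/export    # for the -lz4- runs
    zfs set compression=off tank/export    # for the uncompressed runs
    zfs get compression tank/export        # verify the current setting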
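The NFS-over-RDMA mounts mentioned above look something like this on
CentOS 7 (server name and paths are placeholders; 20049 is the
conventional NFS/RDMA port):

    # Server side: load the RDMA transport and tell nfsd to listen on it.
    modprobe svcrdma
    echo "rdma 20049" > /proc/fs/nfsd/portlist

    # Client side:
    mount -t nfs -o vers=4,proto=rdma,port=20049 raid-05:/export /mnt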
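The numbers below come from a simple sequential benchmark.  If you want
a rough equivalent without the script, dd with 4 MiB blocks gets you in
the same ballpark for the write and read passes (path and size are
placeholders; GNU dd syntax - on FreeBSD use bs=4m and drop
conv=fsync):

    # Sequential write, 32768 x 4 MiB = 128 GiB (clobbers the target):
    dd if=/dev/zero of=/mnt/testfile bs=4M count=32768 conv=fsync

    # Clear caches (or unmount/remount), then sequential read:
    dd if=/mnt/testfile of=/dev/null bs=4M

One caveat: zeros compress to almost nothing under lz4, so for the
compression runs you'd want non-zero data.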
==> bench-raid-05-local <==
    93.92 GiB write      4.00 MiB blocks   71378.00 ms  1347.42 MiB/s
          1024 seek      4.00 MiB blocks      18.06 ms   227.56 MiB/s
    93.92 GiB read       4.00 MiB blocks   67519.00 ms  1424.43 MiB/s
    93.92 GiB rewrite    4.00 MiB blocks   73635.00 ms  1306.12 MiB/s

==> bench-raid-05-nfs4 <==
   125.03 GiB write      4.00 MiB blocks  145903.00 ms   877.53 MiB/s
          1024 seek      4.00 MiB blocks      23.98 ms   170.67 MiB/s
   125.03 GiB read       4.00 MiB blocks  236010.00 ms   542.49 MiB/s
   125.03 GiB rewrite    4.00 MiB blocks  158151.00 ms   809.57 MiB/s

==> bench-zfs-01-local <==
   127.74 GiB write      4.00 MiB blocks  157977.00 ms   828.00 MiB/s
          1024 seek      4.00 MiB blocks      18.39 ms   227.56 MiB/s
   127.74 GiB read       4.00 MiB blocks  165471.00 ms   790.50 MiB/s
   127.74 GiB rewrite    4.00 MiB blocks  116542.00 ms  1122.38 MiB/s

==> bench-zfs-01-lz4-nfs4 <==
   125.03 GiB write      4.00 MiB blocks  185550.00 ms   690.03 MiB/s
          1024 seek      4.00 MiB blocks      24.32 ms   170.67 MiB/s
   125.03 GiB read       4.00 MiB blocks  234103.00 ms   546.91 MiB/s
   125.03 GiB rewrite    4.00 MiB blocks  423833.00 ms   302.09 MiB/s

==> bench-zfs-01-nfs4 <==
   125.03 GiB write      4.00 MiB blocks  174645.00 ms   733.11 MiB/s
          1024 seek      4.00 MiB blocks      14.67 ms   273.07 MiB/s
   125.03 GiB read       4.00 MiB blocks  225402.00 ms   568.03 MiB/s
   125.03 GiB rewrite    4.00 MiB blocks  413798.00 ms   309.41 MiB/s

FreeBSD 10.3 local disk results from a couple of years ago on the same
machine:

   127.76 GiB write      4.00 MiB blocks  101323.00 ms  1291.13 MiB/s
          1024 seek      4.00 MiB blocks      18.57 ms   215.58 MiB/s
   127.76 GiB read       4.00 MiB blocks   95363.00 ms  1371.83 MiB/s
   127.76 GiB rewrite    4.00 MiB blocks  108186.00 ms  1209.23 MiB/s

-- 
Earth is a beta site.