Date:      Sat, 13 Apr 2019 08:52:21 -0500
From:      Jason Bacon <bacon4000@gmail.com>
To:        Hans Petter Selasky <hps@selasky.org>, "freebsd-infiniband@freebsd.org" <freebsd-infiniband@freebsd.org>
Subject:   Re: Kernel modules
Message-ID:  <5166ec29-876b-0bd3-8a84-8a222647e87a@gmail.com>
In-Reply-To: <2f4d9a14-4ff6-0d34-06f0-bbb4ac76c6bd@gmail.com>
References:  <0eba9ec9-692f-7677-2b10-4e67a232821c@gmail.com> <f3f94452-155f-79f4-72d8-bf65760ae5b0@selasky.org> <598a58f0-89b8-d00d-5ed7-74dd7005950f@gmail.com> <73ce0738-4d63-2f25-2ff6-00f0092de136@selasky.org> <2090dd24-db43-b689-4289-f50bd70090ea@gmail.com> <6673df26-8bba-ebd3-b2c5-d7e9c97db557@gmail.com> <d82f3a60-6ad4-dba8-a15b-355a536a9a83@gmail.com> <bd42597e-2981-4667-468e-b008b9be290b@selasky.org> <2f4d9a14-4ff6-0d34-06f0-bbb4ac76c6bd@gmail.com>

On 2019-04-12 08:04, Jason Bacon wrote:
> On 2019-04-12 07:57, Hans Petter Selasky wrote:
>> On 4/12/19 2:39 PM, Jason Bacon wrote:
>>> root@zfs-01:~ # ifconfig ib0
>>> ib0: flags=8002<BROADCAST,MULTICAST> metric 0 mtu 65520
>>>      options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
>>>      lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.f4.52.14.3.0.92.88.d1
>>>      nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>>
>> Can you try setting an MTU of 4000 bytes on both sides and re-run the
>> test?  I don't think this large an MTU is supported.
>>
>> --HPS
> I assume you saw my followup showing 16 Gb/s...
>
> I'll try playing with MTU anyway.  Maybe that will improve performance 
> a bit more?
>
> I'm going to do a bunch of tuning and test NFS.  Will report results 
> back here when I have some substantial info.
>
> Thanks,
>
>     JB
>

Some data for comparison.

Regarding MTU, the bigger the better, up to a point.  At 65520, my 
server became unresponsive to the point of an ssh session timing out.  
It recovered after a minute or two and there did not seem to be any 
permanent harm.  Lower MTUs provide more stable performance (as seen when 
monitoring with "iostat 1") but lower throughput.

For now I'm using 16380, 1/4 of the 65520 default on CentOS 7. 
I haven't yet seen any stability issues at this level.
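
For anyone following along, setting it looks roughly like this (a sketch; 
ib0 is the interface from the output above, and the address is 
illustrative, not one of our real ones):

    # One-off, on both the server and the client:
    ifconfig ib0 mtu 16380

    # Persistent across reboots, in /etc/rc.conf:
    ifconfig_ib0="inet 10.0.0.10/24 mtu 16380 up"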

Explanation of data:

raid-05 is a CentOS 7 RAID server, XFS filesystem.
zfs-01 is a FreeBSD 12 RAID server.
Hardware is identical - PowerEdge R720xd, 12 ST2000NM0023 SAS drives, 
RAID-6, PERC H710 mini (MegaRAID).
"-local" means benchmark run on the server, testing the local RAID.
"-nfs4" means benchmark run on a compute node, testing NFS over FDR 
Infiniband.
Benchmarked with and without ZFS lz4 compression enabled on the server 
(sketched below).
All results are the average of 3 trials.
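
The lz4 toggle and the NFS mount were along these lines (a sketch; the 
pool, dataset, and mount point names here are illustrative, not the real 
ones):

    # On the FreeBSD server, between benchmark runs:
    zfs set compression=lz4 tank/export    # for the -lz4- runs
    zfs set compression=off tank/export    # for the uncompressed runs

    # On a compute node, for the -nfs4 runs (FreeBSD mount syntax;
    # a Linux client would use -o vers=4 instead):
    mount -t nfs -o nfsv4 zfs-01:/tank/export /mnt/bench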

Highlights:

raid-05-nfs4 vs zfs-01-nfs4:

o The FreeBSD server outperformed the CentOS server on random and 
sequential reads.
o The FreeBSD server fell short on fresh write and way short on overwrite.

zfs-01-local vs FreeBSD 10 results:

o FreeBSD is seeing performance limits on the local array for some reason.
o Local disk performance was much better on FreeBSD 10 a couple years 
ago.  ZFS or mrsas regression?  As I recall, it was overall about 5% 
faster than CentOS 6 + XFS at that time.
o Would resolving this push FreeBSD's NFS write performance past CentOS's?

Overall, I'd say we're looking pretty good at this point. Performance is 
way more than adequate for most HPC jobs.  I suspect some tuning and/or 
minor improvements to the IB code will improve it further.

Stability will take a long time to test properly.  I'm going to start by 
rerunning some of our most I/O-intensive jobs on it - jobs that actually 
broke our CentOS RAID servers until I switched them to NFS over RDMA.

==> bench-raid-05-local <==
    93.92 GiB write       4.00 MiB blocks     71378.00 ms      1347.42 MiB/s
         1024 seek        4.00 MiB blocks        18.06 ms       227.56 MiB/s
    93.92 GiB read        4.00 MiB blocks     67519.00 ms      1424.43 MiB/s
    93.92 GiB rewrite     4.00 MiB blocks     73635.00 ms      1306.12 MiB/s

==> bench-raid-05-nfs4 <==
   125.03 GiB write       4.00 MiB blocks    145903.00 ms       877.53 MiB/s
         1024 seek        4.00 MiB blocks        23.98 ms       170.67 MiB/s
   125.03 GiB read        4.00 MiB blocks    236010.00 ms       542.49 MiB/s
   125.03 GiB rewrite     4.00 MiB blocks    158151.00 ms       809.57 MiB/s

==> bench-zfs-01-local <==
   127.74 GiB write       4.00 MiB blocks    157977.00 ms       828.00 MiB/s
         1024 seek        4.00 MiB blocks        18.39 ms       227.56 MiB/s
   127.74 GiB read        4.00 MiB blocks    165471.00 ms       790.50 MiB/s
   127.74 GiB rewrite     4.00 MiB blocks    116542.00 ms      1122.38 MiB/s

==> bench-zfs-01-lz4-nfs4 <==
   125.03 GiB write       4.00 MiB blocks    185550.00 ms       690.03 MiB/s
         1024 seek        4.00 MiB blocks        24.32 ms       170.67 MiB/s
   125.03 GiB read        4.00 MiB blocks    234103.00 ms       546.91 MiB/s
   125.03 GiB rewrite     4.00 MiB blocks    423833.00 ms       302.09 MiB/s

==> bench-zfs-01-nfs4 <==
   125.03 GiB write       4.00 MiB blocks    174645.00 ms       733.11 MiB/s
         1024 seek        4.00 MiB blocks        14.67 ms       273.07 MiB/s
   125.03 GiB read        4.00 MiB blocks    225402.00 ms       568.03 MiB/s
   125.03 GiB rewrite     4.00 MiB blocks    413798.00 ms       309.41 MiB/s

FreeBSD 10.3 local disk results from a couple years ago on the same machine:

   127.76 GiB write       4.00 MiB blocks    101323.00 ms      1291.13 MiB/s
         1024 seek        4.00 MiB blocks        18.57 ms       215.58 MiB/s
   127.76 GiB read        4.00 MiB blocks     95363.00 ms      1371.83 MiB/s
   127.76 GiB rewrite     4.00 MiB blocks    108186.00 ms      1209.23 MiB/s
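
For anyone sanity-checking the tables, the MiB/s column is just size over 
elapsed time.  Taking the raid-05 local write as an example:

    93.92 GiB x 1024 MiB/GiB = 96,174 MiB
    96,174 MiB / 71.378 s    ~ 1,347.4 MiB/s

which matches the 1347.42 MiB/s reported.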

-- 
Earth is a beta site.



