Date: Thu, 27 Dec 2018 11:54:16 +0100
From: Willem Jan Withagen <wjw@digiware.nl>
To: Sami Halabi <sodynet1@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: Suggestion for hardware for ZFS fileserver
Message-ID: <d423b8c3-5aba-907c-c80f-b4974571adba@digiware.nl>
In-Reply-To: <CAEW%2BogaKTLsmXaUGk7rZWb7u2Xqja%2BpPBK5rduX0zXCjk=2zew@mail.gmail.com>
References: <CAEW%2BogZnWC07OCSuzO7E4TeYGr1E9BARKSKEh9ELCL9Zc4YY3w@mail.gmail.com> <C839431D-628C-4C73-8285-2360FE6FFE88@gmail.com> <CAEW%2BogYWKPL5jLW2H_UWEsCOiz=8fzFcSJ9S5k8k7FXMQjywsw@mail.gmail.com> <4f816be7-79e0-cacb-9502-5fbbe343cfc9@denninger.net> <3160F105-85C1-4CB4-AAD5-D16CF5D6143D@ifm.liu.se> <YQBPR01MB038805DBCCE94383219306E1DDB80@YQBPR01MB0388.CANPRD01.PROD.OUTLOOK.COM> <D0E7579B-2768-46DB-94CF-DBD23259E74B@ifm.liu.se> <CAEW%2BogaKTLsmXaUGk7rZWb7u2Xqja%2BpPBK5rduX0zXCjk=2zew@mail.gmail.com>
On 22/12/2018 15:49, Sami Halabi wrote:
> Hi,
>
> What sas hba card do you recommend for 16/24 internal ports and 2 external
> that are recognized and work well with freebsd ZFS.

There is no real advice here, but what I saw is that it is relatively easy
to overload a lot of the busses involved in this. I ran into this when
building Ceph clusters on FreeBSD, where each disk has its own daemon to
hammer away on the platters.

The first bottleneck is the disk "backplane". If you do not wire every disk
to the HBA with a dedicated cable, then you are sharing the bandwidth on the
backplane between all the disks. And depending on the architecture of the
backplane, several disks share one expander, and the feed into that expander
is shared by all the disks attached to it. Some expanders have multiple
inputs from the HBA, but I have seen cases where 4 SAS lanes go in and only
2 get used.

The second bottleneck is that once you have all these nice disks connected
to your HBA, the HBA may sit in only a PCIe x4 slot..... You will need PCIe
x8 or x16 for that, and PCIe 3.0 at that.
Total bandwidth (x16 link): PCIe 3.0 = 32GB/s, PCIe 2.0 = 16GB/s,
PCIe 1.1 = 8GB/s.

So let's say that your 24-port HBA has 24 disks connected, each doing
100 Mbyte/s: that is 19.2 Gbit/s in total, which will very likely saturate
that PCIe bus. Note that I'm only talking 100 Mbyte/s, since that is what I
see spinning rust do under Ceph. I'm not even talking about the SSDs used
for journals and cache. (A quick back-of-the-envelope sketch of this
arithmetic follows below.)

For ZFS the bus challenge is a bit more of a problem, because you cannot
scale out. But I've seen designs where an extra disk cabinet with 96 disks
is attached over something like 4*12Gbit/s to a controller in a PCIe x16
slot, with people wondering why it doesn't do what they thought it would.

For Ceph there is a "nice" way out, because it is able to scale out over
more, smaller servers with fewer disks per chassis. So we tend to use
16-drive chassis with two 8-port HBAs that have dedicated connections per
disk. It is a bit more expensive, but it seems to work much better.

Note that you will then run into network problems, which are more of the
same, only a bit further up the scale. With Ceph that only plays a role
during recovery of lost nodes, which hopefully is not too often. A
dead/replaced disk will be able to rebuild at the maximum speed the disk
can take, but a lost/replaced node will recover at a speed limited by the
disk infrastructure of the recovering node, since the data will come from a
lot of other disks on other servers. The local busses will saturate if the
HW design was poor.

--WjW
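As a rough sanity check of the arithmetic above, here is a minimal Python
sketch. The helper names and the per-lane, per-direction PCIe figures are
assumptions of mine (roughly half the bidirectional x16 totals quoted in
the mail), not something stated in the original thread:

#!/usr/bin/env python3
# Back-of-the-envelope check of the bandwidth figures discussed above.
# Assumption: usable per-direction PCIe throughput of ~0.25/0.5/1.0
# GByte/s per lane for gen 1.1/2.0/3.0 (roughly half of the
# bidirectional x16 totals quoted in the mail).

PCIE_GBYTE_S_PER_LANE = {"1.1": 0.25, "2.0": 0.5, "3.0": 1.0}

def disk_aggregate_gbyte_s(disks, mbyte_s_per_disk):
    """Aggregate streaming throughput of all disks, in GByte/s."""
    return disks * mbyte_s_per_disk / 1000.0

def pcie_slot_gbyte_s(gen, lanes):
    """Approximate per-direction throughput of a PCIe slot, in GByte/s."""
    return PCIE_GBYTE_S_PER_LANE[gen] * lanes

def per_disk_share_mbyte_s(uplink_lanes, gbit_s_per_lane, disks):
    """Per-disk share of an expander uplink, in MByte/s (overhead ignored)."""
    uplink_mbyte_s = uplink_lanes * gbit_s_per_lane * 1000.0 / 8.0
    return uplink_mbyte_s / disks

if __name__ == "__main__":
    # 24 disks at 100 MByte/s each: 2.4 GByte/s = 19.2 Gbit/s.
    agg = disk_aggregate_gbyte_s(24, 100)
    print("24 disks x 100 MB/s = %.1f GB/s (%.1f Gbit/s)" % (agg, agg * 8))

    # Compare against a few slot widths/generations.
    for gen, lanes in (("2.0", 4), ("3.0", 4), ("3.0", 8)):
        slot = pcie_slot_gbyte_s(gen, lanes)
        state = "saturated" if agg >= slot else "ok"
        print("PCIe %s x%d: ~%.1f GB/s -> %s" % (gen, lanes, slot, state))

    # 24 disks behind an expander fed by only 2 of its 4 12G SAS lanes.
    print("per-disk share of a 2-lane 12G uplink: %.0f MB/s"
          % per_disk_share_mbyte_s(2, 12, 24))

Running it reproduces the 19.2 Gbit/s aggregate, shows that this already
saturates a PCIe 2.0 x4 slot, and shows that a 2-lane 12G expander uplink
shared by 24 disks leaves roughly 125 Mbyte/s per disk, barely above the
streaming speed quoted for spinning rust.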
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d423b8c3-5aba-907c-c80f-b4974571adba>