Date: Thu, 27 Dec 2018 11:54:16 +0100
From: Willem Jan Withagen <wjw@digiware.nl>
To: Sami Halabi <sodynet1@gmail.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: Suggestion for hardware for ZFS fileserver
Message-ID: <d423b8c3-5aba-907c-c80f-b4974571adba@digiware.nl>
In-Reply-To: <CAEW%2BogaKTLsmXaUGk7rZWb7u2Xqja%2BpPBK5rduX0zXCjk=2zew@mail.gmail.com>
References: <CAEW%2BogZnWC07OCSuzO7E4TeYGr1E9BARKSKEh9ELCL9Zc4YY3w@mail.gmail.com> <C839431D-628C-4C73-8285-2360FE6FFE88@gmail.com> <CAEW%2BogYWKPL5jLW2H_UWEsCOiz=8fzFcSJ9S5k8k7FXMQjywsw@mail.gmail.com> <4f816be7-79e0-cacb-9502-5fbbe343cfc9@denninger.net> <3160F105-85C1-4CB4-AAD5-D16CF5D6143D@ifm.liu.se> <YQBPR01MB038805DBCCE94383219306E1DDB80@YQBPR01MB0388.CANPRD01.PROD.OUTLOOK.COM> <D0E7579B-2768-46DB-94CF-DBD23259E74B@ifm.liu.se> <CAEW%2BogaKTLsmXaUGk7rZWb7u2Xqja%2BpPBK5rduX0zXCjk=2zew@mail.gmail.com>
On 22/12/2018 15:49, Sami Halabi wrote:
> Hi,
>
> What sas hba card do you recommend for 16/24 internal ports and 2 external
> that are recognized and work well with freebsd ZFS.

There is no real advice here, but what I saw is that it is relatively easy
to overload a lot of the busses involved in this. I ran into this when
building Ceph clusters on FreeBSD, where each disk has its own daemon to
hammer away on the platters.

The first bottleneck is the disk "backplane". If you do not wire every disk
to the HBA with a dedicated cable, then you are sharing the bandwidth on the
backplane between all the disks. And depending on the architecture of the
backplane, several disks share one expander, and the feed into that expander
is shared by all the disks attached to it. Some expanders have multiple
inputs from the HBA, but I have seen cases where 4 SAS lanes go in and only
2 get used.

The second bottleneck is that once you have all these nice disks connected
to your HBA, the HBA may sit in only a PCIe x4 slot..... You will need PCIe
x8 or x16 for that, and PCIe 3.0 at that.
Total bandwidth (x16 link): PCIe 3.0 = 32GB/s, PCIe 2.0 = 16GB/s,
PCIe 1.1 = 8GB/s.

So let's say that your 24-port HBA has 24 disks connected, each doing
100 Mbyte/s: that is 19.2 Gbit/s in total, which will very likely saturate
that PCIe bus. Note that I'm only talking 100 Mbyte/s, since that is what I
see spinning rust do under Ceph. I'm not even talking about the SSDs used
for journals and cache. (A quick back-of-the-envelope sketch of this
arithmetic follows below.)

For ZFS the bus challenge is a bit more of a problem, because you cannot
scale out. But I've seen designs where an extra disk cabinet with 96 disks
is attached over something like 4*12Gbit/s to a controller in a PCIe x16
slot, with people wondering why it doesn't do what they thought it would.

For Ceph there is a "nice" way out, because it is able to scale out over
more, smaller servers with fewer disks per chassis. So we tend to use
16-drive chassis with two 8-port HBAs that have dedicated connections per
disk. It is a bit more expensive, but it seems to work much better.

Note that you will then run into network problems, which are more of the
same, only a bit further up the scale. With Ceph that only plays a role
during recovery of lost nodes, which hopefully is not too often. A
dead/replaced disk will be able to rebuild at the maximum speed the disk
can take, but a lost/replaced node will recover at a speed limited by the
disk infrastructure of the recovering node, since the data will come from a
lot of other disks on other servers. The local busses will saturate if the
HW design was poor.

--WjW
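As a rough sanity check of the arithmetic above, here is a minimal Python
sketch. The helper names and the per-lane, per-direction PCIe figures are
assumptions of mine (roughly half the bidirectional x16 totals quoted in
the mail), not something stated in the original thread:

#!/usr/bin/env python3
# Back-of-the-envelope check of the bandwidth figures discussed above.
# Assumption: usable per-direction PCIe throughput of ~0.25/0.5/1.0
# GByte/s per lane for gen 1.1/2.0/3.0 (roughly half of the
# bidirectional x16 totals quoted in the mail).

PCIE_GBYTE_S_PER_LANE = {"1.1": 0.25, "2.0": 0.5, "3.0": 1.0}

def disk_aggregate_gbyte_s(disks, mbyte_s_per_disk):
    """Aggregate streaming throughput of all disks, in GByte/s."""
    return disks * mbyte_s_per_disk / 1000.0

def pcie_slot_gbyte_s(gen, lanes):
    """Approximate per-direction throughput of a PCIe slot, in GByte/s."""
    return PCIE_GBYTE_S_PER_LANE[gen] * lanes

def per_disk_share_mbyte_s(uplink_lanes, gbit_s_per_lane, disks):
    """Per-disk share of an expander uplink, in MByte/s (overhead ignored)."""
    uplink_mbyte_s = uplink_lanes * gbit_s_per_lane * 1000.0 / 8.0
    return uplink_mbyte_s / disks

if __name__ == "__main__":
    # 24 disks at 100 MByte/s each: 2.4 GByte/s = 19.2 Gbit/s.
    agg = disk_aggregate_gbyte_s(24, 100)
    print("24 disks x 100 MB/s = %.1f GB/s (%.1f Gbit/s)" % (agg, agg * 8))

    # Compare against a few slot widths/generations.
    for gen, lanes in (("2.0", 4), ("3.0", 4), ("3.0", 8)):
        slot = pcie_slot_gbyte_s(gen, lanes)
        state = "saturated" if agg >= slot else "ok"
        print("PCIe %s x%d: ~%.1f GB/s -> %s" % (gen, lanes, slot, state))

    # 24 disks behind an expander fed by only 2 of its 4 12G SAS lanes.
    print("per-disk share of a 2-lane 12G uplink: %.0f MB/s"
          % per_disk_share_mbyte_s(2, 12, 24))

Running it reproduces the 19.2 Gbit/s aggregate, shows that this already
saturates a PCIe 2.0 x4 slot, and shows that a 2-lane 12G expander uplink
shared by 24 disks leaves roughly 125 Mbyte/s per disk, barely above the
streaming speed quoted for spinning rust.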
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?d423b8c3-5aba-907c-c80f-b4974571adba>