Date: Wed, 2 Oct 2019 19:49:15 -0500
From: Jason Bacon <bacon4000@gmail.com>
To: "Mikhail T." <mi+thun@aldan.algebra.com>, freebsd-infiniband@freebsd.org
Subject: Re: Questions about Infiniband on FreeBSD
Message-ID: <062c0a38-a51d-58b0-2a6e-594102debad2@gmail.com>
In-Reply-To: <5570c97f-3903-d499-2420-8351f7beed37@aldan.algebra.com>
On 2019-10-02 18:58, Mikhail T. wrote:
> Hello! After some wrangling, I got the direct (no switch) Infiniband
> connection working reliably between my two servers (a dual-port mlx4
> card in each). I have the following questions:
>
> 1. Why is running opensm mandatory even in a "point-to-point" setup
>    like mine? I would've thought that whatever the two ends need to tell
>    each other could be told /once/, after which the connection would
>    continue to work even if the opensm process went away.
>    Unfortunately, shutting down opensm freezes the connection... Is
>    that a hardware/firmware requirement, or can this be improved?

A subnet manager is required for IPoIB. It's often run on the switch, but since you don't have one...

> 2. Although pings were working and NFS would mount, data transfers
>    weren't reliable until I /manually/ lowered the MTU -- on both ends
>    -- to 2044 (from the 65520 used by the ib interfaces by default).
>    And it only occurred to me to do that when I saw a kernel message
>    on one of the two consoles complaining about a packet length of 16k
>    being greater than 2044... If that's a known limit, why isn't the
>    MTU set to it by default?

I saw frequent hangs (self-resolving) with an MTU of 65520. Cutting it in half improved reliability by orders of magnitude, but there were still occasional issues. Halving it again to 16380 seemed to be the sweet spot.

> 3. Currently, I have only one cable connecting the ib1 on one machine
>    to the ib1 of another. Would I get double the throughput if I
>    connected the two other ports together as well and bundled the
>    connections? If yes, should I bundle them as network interfaces --
>    using lagg(4) -- or is there something Infiniband-specific?

Good question. With Mellanox 6036 switches, nothing needs to be configured to benefit from multiple links. We ran 6 cables from each of two top-level switches to each of 6 leaf switches, and the switches recognized the fabric topology automatically. I don't know whether the same is true with the HCAs.
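For what it's worth, a sketch of the setup described above. The interface name (ib1), the address, and the exact rc.conf variable for the opensm port's rc script are assumptions that depend on your installation; adjust to taste:

```shell
# Run the subnet manager on (at least) one node; required for IPoIB
# when there is no managed switch on the fabric.
# Assuming the opensm port installs an rc script with the usual
# convention, in /etc/rc.conf:
#   opensm_enable="YES"

# Drop the IPoIB MTU from the 65520 default on BOTH ends; 16380 was
# the sweet spot here (interface name is an assumption):
ifconfig ib1 mtu 16380

# To make it persistent, in /etc/rc.conf (address is hypothetical):
#   ifconfig_ib1="inet 10.0.0.1/24 mtu 16380"
```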
You could try just adding a cable and comparing results from iperf, etc.

> 4. Mellanox recommends keeping the cards' firmware up-to-date. Does
>    FreeBSD have a tool to do that?

I'd also like to know.

Regards,

JB

-- 
Earth is a beta site.
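If you do try bundling the second link, the generic lagg(4) route would look roughly like this. This is purely a sketch: I haven't tried lagg over IPoIB, whether ib interfaces can join a lagg at all is untested here, and the interface names and address are assumptions:

```shell
# Hypothetical lagg(4) aggregation of two IPoIB interfaces.
# loadbalance hashes traffic across ports without needing LACP
# support on the far end.
ifconfig lagg0 create
ifconfig lagg0 laggproto loadbalance laggport ib0 laggport ib1 \
    10.0.0.1/24 mtu 16380
```

Either way, iperf numbers with and without the second cable would tell you whether the extra link is actually being used.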
