Date:      Wed, 2 Oct 2019 19:49:15 -0500
From:      Jason Bacon <bacon4000@gmail.com>
To:        "Mikhail T." <mi+thun@aldan.algebra.com>, freebsd-infiniband@freebsd.org
Subject:   Re: Questions about Infiniband on FreeBSD
Message-ID:  <062c0a38-a51d-58b0-2a6e-594102debad2@gmail.com>
In-Reply-To: <5570c97f-3903-d499-2420-8351f7beed37@aldan.algebra.com>


On 2019-10-02 18:58, Mikhail T. wrote:
> Hello! After some wrangling, I got the direct (no switch) Infiniband 
> connection working reliably between my two servers (a dual port mlx4 
> card in each). I have the following questions:
>
> 1. Why is running opensm mandatory even in a "point-to-point" setup
>    like mine? I would've thought, whatever the two ends need to tell
>    each other could be told /once/, after which the connection will
>    continue to work even if the opensm-process goes away.
>    Unfortunately, shutting down opensm freezes the connection... Is
>    that a hardware/firmware requirement, or can this be improved?
A subnet manager is required for IPoIB.  It's often run on the switch, 
but since you don't have one...
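In a back-to-back setup you only need opensm running on one of the two 
hosts.  If your world/kernel were built with OFED there should be an rc 
script for it, so something like the following in /etc/rc.conf should 
start it at boot (hedging here -- check that /etc/rc.d/opensm actually 
exists on your install):

    # /etc/rc.conf -- start the subnet manager at boot
    opensm_enable="YES"

Otherwise, just launch opensm by hand and leave it running.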
> 2. Although pings were working and NFS would mount, data-transfers
>    weren't reliable until I /manually/ lowered the MTU -- on both ends
>    -- to 2044 (from the 65520 used by the ib-interfaces by default).
>    And it only occurred to me to do that, when I saw a kernel's message
>    on one of the two consoles complaining about a packet length of 16k
>    being greater than 2044... If that's a known limit, why is not the
>    MTU set to it by default?
I saw frequent hangs (self-resolving) with an MTU of 65520.  Cutting it 
in half to 32760 improved reliability by orders of magnitude, but there 
were still occasional issues.  Halving it again to 16380 seemed to be 
the sweet spot.
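If you want the lower MTU to stick across reboots, set it where the 
interface is configured.  Something like this in /etc/rc.conf works 
(interface name and address are just placeholders for your setup):

    # /etc/rc.conf -- example IPoIB interface with a reduced MTU
    ifconfig_ib1="inet 10.10.10.1/24 mtu 16380 up"

You can also experiment on the fly with "ifconfig ib1 mtu 16380" on 
both ends before settling on a value.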

> 3. Currently, I have only one cable connecting the ib1 on one machine
>    to ib1 of another. Would I get double the throughput if I connect
>    the two other ports together as well and bundle the connections? If
>    yes, should I bundle them as network-interfaces -- using lagg(4) --
>    or is there something Infiniband-specific?
Good question.  With Mellanox 6036 switches, nothing needs to be 
configured to benefit from multiple links.  We ran 6 links from each of 
two top-level switches to each of 6 leaf switches.  The switches 
recognize the fabric topology automatically.  I don't know if the same 
is true with the HCAs.  You could try just adding a cable and comparing 
results from iperf, etc.
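If you go the lagg(4) route, the usual rc.conf recipe would look 
roughly like this -- untested over IPoIB on my end, and the names and 
addresses are placeholders, so treat it as a sketch:

    # /etc/rc.conf -- hypothetical lagg over the two IPoIB ports
    ifconfig_ib0="up"
    ifconfig_ib1="up"
    cloned_interfaces="lagg0"
    ifconfig_lagg0="laggproto loadbalance laggport ib0 laggport ib1 inet 10.10.10.1/24"

Whether lagg actually distributes traffic over IPoIB ports is something 
you'd have to verify with iperf.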
> 4. Mellanox recommends keeping the cards' firmware up-to-date. Does
>    FreeBSD have a tool to do that?
I'd also like to know.

Regards,

     JB

-- 
Earth is a beta site.



