Date: Fri, 13 Sep 2019 13:04:27 -0400
From: John Fleming <john@spikefishsolutions.com>
To: Jason Bacon <bacon4000@gmail.com>, freebsd-stable@freebsd.org
Subject: Re: Just joined the infiniband club
Message-ID: <CABy3cGzfc-UjPOxMFDYtL%2BOUPw8MYH7WS3picXjGmC=a=Q1xQQ@mail.gmail.com>
In-Reply-To: <00acac6f-3f13-a343-36c5-00fe45620eb0@gmail.com>
References: <CABy3cGxXa8J1j%2BodmfdQ6b534BiPwOMUAMOYqXKMD6zGOeBE3w@mail.gmail.com> <00acac6f-3f13-a343-36c5-00fe45620eb0@gmail.com>
On Sat, Sep 7, 2019 at 9:26 PM Jason Bacon <bacon4000@gmail.com> wrote:
>
> On 2019-09-07 19:00, John Fleming wrote:
> > Hi all, I've recently joined the club. I have two Dell R720s connected
> > directly to each other. The card is a ConnectX-4. I was having a lot
> > of problems with network drops. Where I'm at now is I'm running
> > FreeBSD 12-STABLE as of a week ago, the cards have been cross-flashed
> > with OEM firmware (these are Lenovo, I think), and I'm no longer
> > getting network drops. This box is basically my storage server. It's
> > exporting a RAID 10 ZFS volume to a Linux box (compute, 19.04,
> > 5.0.0-27-generic) which is running GNS3 for a lab.
> >
> > So many questions.. sorry if this is a bit rambly!
> >
> > From what I understand this card is really 4 x 25 gig lanes. If I
> > understand that correctly, then one data transfer should be able to do
> > at most 25 gig (best case), correct?
> >
> > I'm not getting what the difference between connected mode and
> > datagram mode is. Does this have anything to do with the card
> > operating in InfiniBand mode vs Ethernet mode? FreeBSD is using the
> > modules compiled in connected mode with the shell script (which is
> > really a bash script, not an sh script) from the freebsd-infiniband
> > page.
>
> Nothing to do with Ethernet...
>
> Google turned up a brief explanation here:
>
> https://wiki.archlinux.org/index.php/InfiniBand

I still don't get why I would want to use one or the other, or why the
option is even there, but it doesn't matter. After the firmware upgrade
and the move to FreeBSD stable (unsure which of the two triggered this)
I can no longer set connected mode on Linux. There are a lot of posts
that say you have to disable enhanced IPoIB mode via a modules.conf
setting, but the driver has no idea what that is. Echoing "connected"
into the mode file throws a write error. I poked around in the Linux
source, but I'm not even a level 1 fighter on C; I'm like the generic
NPC that says hi at the gates.

> Those are my module building scripts on the wiki. What bash extensions
> did you see?

Isn't this a bashism? When I run it inside sh it throws a fit. No
worries, I just edited the loader.conf auto-append line by hand.

> > Linux box complains if MTU is over 2044, with "expect multicast
> > drops" or something like that, so MTU on both boxes is set to 2044.
> >
> > Everything I'm reading makes it sound like there is no RDMA support
> > in FreeBSD, or maybe that was no NFS-over-RDMA support. Is that
> > correct?
>
> RDMA is inherent in Infiniband AFAIK. Last I checked, there was no
> support in FreeBSD for NFS over RDMA, but news travels slowly in this
> group so a little digging might prove otherwise.
>
> > So far it seems like these cards struggle to fill a 10 gig pipe.
> > Using iperf (2), the best I'm getting is around 6 Gbit/sec.
> > Interfaces aren't showing drops on either end. Doesn't seem to matter
> > if I do 1, 2 or 4 threads on iperf.
>
> You'll need both ends in connected mode with a fairly large MTU to get
> good throughput. CentOS defaults to 64k, but FreeBSD is unstable at
> that size last I checked. I got good results with 16k.
>
> My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> with some buffer space errors causing the interface to shut down (under
> the same loads that caused CentOS servers to lock up completely).
> Someone mentioned that this buffer space bug has been fixed, but I no
> longer have a way to test it.
>
> Best,
>
> Jason
>
> --
> Earth is a beta site.
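For reference, the connected/datagram knob people point at on the Linux
side is the per-interface sysfs "mode" file, and the modules.conf
setting those posts mention is (if I'm remembering the name right) the
ipoib_enhanced parameter, which apparently only exists in Mellanox's
OFED build of ib_ipoib and not in the in-kernel driver, which would
explain why mine has never heard of it. Roughly what I was poking at is
below; "ib0" is just what the interface is called on my box, and the
modprobe.d file name is whatever you like:

  # check which IPoIB mode the interface is in (datagram or connected)
  cat /sys/class/net/ib0/mode
  # try to flip it to connected mode -- this is the write that now
  # fails for me
  echo connected > /sys/class/net/ib0/mode
  # what the forum posts suggest, but it only seems to apply to the
  # MLNX_OFED ib_ipoib module, not the stock kernel one
  echo "options ib_ipoib ipoib_enhanced=0" > /etc/modprobe.d/ib_ipoib.conf

On the FreeBSD side there's no runtime toggle as far as I can tell;
connected mode is baked in when the modules are built (the IPOIB_CM
option the wiki scripts turn on, if I have the name right).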
So, I ended up switching to Ethernet mode via:

  mlxconfig -d PCID set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Oh, I also set the MTU to 9000. After that.. the flood gates opened
massively.

root@R720-Storage:~ # iperf -c 10.255.255.55 -P4
------------------------------------------------------------
Client connecting to 10.255.255.55, TCP port 5001
TCP window size: 1.01 MByte (default)
------------------------------------------------------------
[  6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
[  3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
[  4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
[  5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
[  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
[  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
[  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
[SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
root@R720-Storage:~ #

root@compute720:~# iperf -c 10.255.255.22 -P4
------------------------------------------------------------
Client connecting to 10.255.255.22, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
[  3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
[  6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
[  4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
[  3]  0.0-10.0 sec  26.2 GBytes  22.5 Gbits/sec
[  6]  0.0-10.0 sec  26.8 GBytes  23.1 Gbits/sec
[  4]  0.0-10.0 sec  26.0 GBytes  22.3 Gbits/sec
[SUM]  0.0-10.0 sec   106 GBytes  91.4 Gbits/sec
root@compute720:~#

I should point out that before doing this, while still running in IB
mode with datagram mode, I disabled SMT and set the power profile to
performance on both boxes. That moved me up to 10-12 Gbit/sec, nothing
like the change to Ethernet, which can now fill the pipe from the looks
of it. Also note that a single connection doesn't do more than 25-ish
Gbit/sec. Back to SATA being the bottleneck, but at least if data is
coming out of the cache there should be more than enough network IO.

Oh, one last thing: I thought I read somewhere that you needed a switch
to do Ethernet mode. That doesn't seem to be the case. I haven't shut
down opensm yet, but I'll try that later, as I'm assuming I no longer
need it.

w00t!
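PS: in case anyone else wants to flip a ConnectX the same way, the rough
sequence was something like the below. The device argument is whatever
mlxconfig sees your card as (the thing I abbreviated as PCID above; on
FreeBSD pciconf will point you at it), so treat <device> as a
placeholder rather than gospel:

  # find the card
  pciconf -lv | grep -B 3 -i mellanox
  # see what the firmware currently has the ports set to (1 = IB, 2 = ETH)
  mlxconfig -d <device> query | grep LINK_TYPE
  # switch both ports to Ethernet
  mlxconfig -d <device> set LINK_TYPE_P1=2 LINK_TYPE_P2=2
  # the new link type doesn't take effect until the card is reset;
  # I just rebooted

After that the ports come up as regular Ethernet interfaces and the MTU
gets set with ifconfig like any other NIC.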