From: John Fleming
Date: Fri, 13 Sep 2019 13:04:27 -0400
Subject: Re: Just joined the infiniband club
To: Jason Bacon, freebsd-stable@freebsd.org

On Sat, Sep 7, 2019 at 9:26 PM Jason Bacon wrote:
>
> On 2019-09-07 19:00, John Fleming wrote:
> > Hi all, I've recently joined the club. I have two Dell R720s connected
> > directly to each other. The card is a ConnectX-4. I was having a lot
> > of problems with network drops. Where I'm at now is I'm running
> > FreeBSD 12-STABLE as of a week ago, the cards have been cross-flashed
> > with OEM firmware (these are Lenovo, I think), and I'm no longer getting
> > network drops. This box is basically my storage server. It's exporting
> > a RAID 10 ZFS volume to a Linux box (compute, 19.04, kernel 5.0.0-27-generic)
> > which is running GNS3 for a lab.
> >
> > So many questions... sorry if this is a bit rambly!
> >
> > From what I understand, this card is really 4 x 25 gig lanes. If I
> > understand that correctly, then one data transfer should be able to do at
> > most 25 gig (best case), correct?
> >
> > I don't get what the difference between connected mode and
> > datagram mode is. Does this have anything to do with the card
> > operating in InfiniBand mode vs Ethernet mode? FreeBSD is using
> > modules compiled in connected mode with the shell script (which is really
> > a bash script, not an sh script) from the freebsd-infiniband page.
>
> Nothing to do with Ethernet...
>
> Google turned up a brief explanation here:
>
> https://wiki.archlinux.org/index.php/InfiniBand
>

I still don't get why I would want to use one or the other, or why the option is there, but it doesn't matter. After the firmware upgrade and moving to FreeBSD-STABLE (unsure which is triggering this), I can no longer set connected mode on Linux. There are a lot of posts that say you have to disable enhanced IPoIB mode via a modules.conf setting, but the driver doesn't have any idea what that is. Echoing "connected" to the mode file throws a write error.
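For reference, the Linux IPoIB driver exposes the connected/datagram switch as a per-interface sysfs attribute, /sys/class/net/<ifname>/mode, and `echo connected > /sys/class/net/ib0/mode` is the usual way to flip it. Below is a minimal Python sketch of that same write, assuming the standard sysfs location and a hypothetical interface name such as ib0; the helper names are made up for illustration. It reports the kernel's refusal instead of crashing (the "write error" case above) but does not try to diagnose why the driver refuses.

```python
from pathlib import Path

# Values the Linux IPoIB driver accepts in /sys/class/net/<ifname>/mode.
VALID_MODES = ("datagram", "connected")


def mode_file(ifname: str, sysfs_root: str = "/sys/class/net") -> Path:
    """Location of the IPoIB mode attribute, e.g. /sys/class/net/ib0/mode."""
    return Path(sysfs_root) / ifname / "mode"


def get_ipoib_mode(ifname: str, sysfs_root: str = "/sys/class/net") -> str:
    # The attribute holds a single word followed by a newline.
    return mode_file(ifname, sysfs_root).read_text().strip()


def set_ipoib_mode(ifname: str, mode: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Same effect as `echo <mode> > /sys/class/net/<ifname>/mode`.

    Returns False instead of raising when the write is refused; the
    reason is only printed, not diagnosed.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    try:
        # The kernel reports a rejected value when the buffered data is
        # flushed on write/close; both happen inside this try block.
        with open(mode_file(ifname, sysfs_root), "w") as f:
            f.write(mode + "\n")
    except OSError as exc:
        print(f"{ifname}: could not set {mode!r} mode: {exc}")
        return False
    return True
```

On the Linux box this would be run as root, e.g. `set_ipoib_mode("ib0", "connected")` followed by `get_ipoib_mode("ib0")` to confirm what the driver actually kept. The `sysfs_root` parameter only exists so the logic can be exercised against a scratch directory.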
I poked around in the Linux source, but I'm not even a level 1 fighter in C; I'm like the generic NPC that says hi at the gates.

> Those are my module building scripts on the wiki. What bash extensions
> did you see?

Isn't this a bash-ism? When I run it inside sh it throws a fit. No worries, I just edited the auto-appended line in loader.conf.

> >
> > The Linux box complains if the MTU is over 2044 with "expect multicast drops" or
> > something like that, so the MTU on both boxes is set to 2044.
> >
> > Everything I'm reading makes it sound like there is no RDMA support in
> > FreeBSD, or maybe that was no NFS RDMA support. Is that correct?
> RDMA is inherent in InfiniBand AFAIK. Last I checked, there was no
> support in FreeBSD for NFS over RDMA, but news travels slowly in this
> group so a little digging might prove otherwise.
> >
> > So far it seems like these cards struggle to fill a 10 gig pipe. Using
> > iperf (2), the best I'm getting is around 6 Gbit/sec. Interfaces
> > aren't showing drops on either end. It doesn't seem to matter if I do 1,
> > 2 or 4 threads on iperf.
> You'll need both ends in connected mode with a fairly large MTU to get
> good throughput. CentOS defaults to 64k, but FreeBSD is unstable at
> that size last I checked. I got good results with 16k.
>
> My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> with some buffer space errors causing the interface to shut down (under
> the same loads that caused CentOS servers to lock up completely).
> Someone mentioned that this buffer space bug has been fixed, but I no
> longer have a way to test it.
>
> Best,
>
> Jason
>
> --
> Earth is a beta site.

So... I ended up switching to Ethernet mode via

    mlxconfig -d PCID set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Oh, I also set the MTU to 9000. After that... the floodgates opened massively.
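The 2044 ceiling and Jason's 16k/64k figures follow from how IPoIB sizes its MTU. Per the Linux IPoIB documentation, datagram mode carries each IP packet in a single InfiniBand packet, so the interface MTU is the IB link MTU minus a 4-byte IPoIB header (2048 - 4 = 2044), while connected mode allows up to 65520 bytes, presumably the "64k" mentioned above. A small Python sketch of that arithmetic; the constants reflect the Linux driver and may not match FreeBSD's port, and the function names are hypothetical.

```python
# Linux IPoIB MTU limits (FreeBSD's OFED port may differ).
IPOIB_ENCAP_LEN = 4        # IPoIB header prepended to every IP packet, in bytes
IPOIB_CM_MAX_MTU = 65520   # largest interface MTU allowed in connected mode

# InfiniBand link (L2) MTUs permitted by the IB specification.
IB_LINK_MTUS = (256, 512, 1024, 2048, 4096)


def max_ipoib_mtu(mode: str, ib_link_mtu: int = 2048) -> int:
    """Largest IPoIB interface MTU for the given mode and IB link MTU."""
    if ib_link_mtu not in IB_LINK_MTUS:
        raise ValueError(f"invalid InfiniBand link MTU: {ib_link_mtu}")
    if mode == "datagram":
        # UD transport: one IP packet must fit in one IB packet.
        return ib_link_mtu - IPOIB_ENCAP_LEN
    if mode == "connected":
        # RC transport: the HCA segments large messages itself, so the
        # IB link MTU no longer caps the interface MTU.
        return IPOIB_CM_MAX_MTU
    raise ValueError(f"unknown IPoIB mode: {mode!r}")


def mtu_fits(requested: int, mode: str, ib_link_mtu: int = 2048) -> bool:
    """True if `requested` is a legal interface MTU in this mode."""
    return 0 < requested <= max_ipoib_mtu(mode, ib_link_mtu)


for mode in ("datagram", "connected"):
    print(mode, max_ipoib_mtu(mode))
```

Run as-is, it prints `datagram 2044` and `connected 65520`. Jason's 16k fits only in connected mode, which is why datagram mode was stuck at 2044. Once the ports are switched to Ethernet with mlxconfig, none of these IPoIB limits apply and the ordinary Ethernet jumbo-frame MTU (9000 here) is what counts.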
root@R720-Storage:~ # iperf -c 10.255.255.55 -P4
------------------------------------------------------------
Client connecting to 10.255.255.55, TCP port 5001
TCP window size: 1.01 MByte (default)
------------------------------------------------------------
[  6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
[  3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
[  4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
[  5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
[  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
[  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
[  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
[SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
root@R720-Storage:~ #

root@compute720:~# iperf -c 10.255.255.22 -P4
------------------------------------------------------------
Client connecting to 10.255.255.22, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[  5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
[  3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
[  6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
[  4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
[  3]  0.0-10.0 sec  26.2 GBytes  22.5 Gbits/sec
[  6]  0.0-10.0 sec  26.8 GBytes  23.1 Gbits/sec
[  4]  0.0-10.0 sec  26.0 GBytes  22.3 Gbits/sec
[SUM]  0.0-10.0 sec   106 GBytes  91.4 Gbits/sec
root@compute720:~#

I should point out that before doing this, while running in IB mode with datagram mode, I disabled SMT and set the power profile to performance on both boxes. This moved me up to 10-12 Gbit/sec, nothing like the change to Ethernet, with which I can now fill the pipe from the looks of it.
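In the runs above, the [SUM] line is the total of the four per-stream lines, and each stream lands between roughly 20 and 29 Gbit/s. A small Python sketch that parses iperf2 result lines and totals them, assuming the default output seen here (transfer in GBytes, bandwidth in Gbits/sec; other `-f` formats would need a different pattern); the function names are hypothetical. It also checks the unit convention the numbers appear to follow: transfer counted in binary gigabytes (2**30 bytes) and bandwidth in decimal gigabits, which is why 115 GBytes in 10 seconds comes out near 98.5 Gbit/s rather than 92.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Matches iperf2 result lines such as
#   "[  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec"
#   "[SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec"
# Assumes GBytes / Gbits/sec units; connection lines do not match.
RESULT_RE = re.compile(
    r"\[\s*(?P<id>\d+|SUM)\]\s+"
    r"(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"(?P<xfer>[\d.]+)\s+GBytes\s+"
    r"(?P<bw>[\d.]+)\s+Gbits/sec"
)


@dataclass
class Result:
    stream: str           # stream id, or "SUM" for the aggregate line
    seconds: float        # length of the reporting interval
    gbytes: float         # transfer, as printed by iperf
    gbits_per_sec: float  # bandwidth, as printed by iperf


def parse_results(text: str) -> List[Result]:
    return [
        Result(m["id"], float(m["end"]) - float(m["start"]),
               float(m["xfer"]), float(m["bw"]))
        for m in RESULT_RE.finditer(text)
    ]


def summarize(results: List[Result]) -> Tuple[float, float]:
    """Return (sum of per-stream bandwidth, best single stream) in Gbit/s."""
    streams = [r for r in results if r.stream != "SUM"]
    if not streams:
        raise ValueError("no per-stream result lines found")
    total = sum(r.gbits_per_sec for r in streams)
    best = max(r.gbits_per_sec for r in streams)
    return total, best


def gbytes_to_gbits_per_sec(gbytes: float, seconds: float) -> float:
    # Binary gigabytes in, decimal gigabits per second out.
    return gbytes * 2**30 * 8 / seconds / 1e9
```

Fed the first run's result lines, `summarize` gives a per-stream total of about 98.6 Gbit/s (iperf's own [SUM] says 98.5; the difference is rounding in the printed values) and a best single stream of 28.7 Gbit/s. Recomputing bandwidth from the rounded transfer figures only agrees to within about 0.1 Gbit/s per stream, so treat the unit check as a sanity check, not an exact identity.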
Also note a single connection doesn't do more than 25ish Gbit/sec. Back to SATA being the bottleneck, but at least if it's coming out of the cache there should be more than enough network I/O.

Oh, one last thing: I thought I read somewhere that you needed to have a switch to do Ethernet mode. This doesn't seem to be the case. I haven't shut down opensm yet, but I'll try that later, as I'm assuming I no longer need it.

w00t!