From: John Fleming
Date: Fri, 13 Sep 2019 15:13:25 -0400
Subject: Re: Just joined the infiniband club
To: freebsd-infiniband@freebsd.org

And of course I meant Ethernet mode, not Linux mode.

On Fri, Sep 13, 2019 at 2:36 PM John Fleming wrote:
>
> Top post I know, but I meant to send this to freebsd-infiniband, not stable.
>
> >
> > On 2019-09-07 19:00, John Fleming wrote:
> > > Hi all, I've recently joined the club. I have two Dell R720s connected
> > > directly to each other. The card is a ConnectX-4. I was having a lot
> > > of problems with network drops. Where I'm at now is I'm running
> > > FreeBSD 12-STABLE as of a week ago and the cards have been cross flashed
> > > with OEM firmware (these are Lenovo I think) and I'm no longer getting
> > > network drops. This box is basically my storage server. It's exporting
> > > a RAID 10 ZFS volume to a Linux (compute 19.04 5.0.0-27-generic) box
> > > which is running GNS3 for a lab.
> > >
> > > So many questions.. sorry if this is a bit rambly!
> > >
> > > From what I understand this card is really 4 x 25 gig lanes. If I
> > > understand that correctly then 1 data transfer should be able to do at
> > > max 25 gig (best case), correct?
> > >
> > > I'm not getting what the difference between connected mode and
> > > datagram mode is. Does this have anything to do with the card
> > > operating in InfiniBand mode vs Ethernet mode? FreeBSD is using the
> > > modules compiled in connected mode with the shell script (which is really
> > > a bash script, not a sh script) from the freebsd-infiniband page.
> >
> > Nothing to do with Ethernet...
> >
> > Google turned up a brief explanation here:
> >
> > https://wiki.archlinux.org/index.php/InfiniBand
> >
> I still don't get why I would want to use one or the other or why
> the option is there, but it doesn't matter.
> After the firmware upgrade and moving to FreeBSD stable (unsure which is
> triggering this) I can no longer
> set connected mode on Linux. There are a lot of posts that say you
> have to disable enhanced IPoIB mode
> via a modules.conf setting but the driver doesn't have any idea what
> that is.
> Echoing connected to the mode file
> throws a write error. I poked around in the Linux source but I'm not
> even a level 1 fighter in C. I'm like the generic NPC
> that says hi at the gates.
>
> > Those are my module building scripts on the wiki. What bash extensions
> > did you see?
>
> Isn't this a bash..ism? When I run it inside sh it throws a fit. No
> worries, I just edited loader.conf.
>
> auto-append-line
>
> > >
> > > Linux box complains if MTU is over 2044 with "expect multicast drops" or
> > > something like that, so MTU on both boxes is set to 2044.
> > >
> > > Everything I'm reading makes it sound like there is no RDMA support in
> > > FreeBSD, or maybe that was no NFS RDMA support. Is that correct?
> > RDMA is inherent in Infiniband AFAIK. Last I checked, there was no
> > support in FreeBSD for NFS over RDMA, but news travels slowly in this
> > group so a little digging might prove otherwise.
> > >
> > > So far it seems like these cards struggle to fill a 10 gig pipe. Using
> > > iperf (2) the best I'm getting is around 6 Gbit/sec. Interfaces
> > > aren't showing drops on either end. Doesn't seem to matter if I do 1,
> > > 2 or 4 threads on iperf.
> > You'll need both ends in connected mode with a fairly large MTU to get
> > good throughput. CentOS defaults to 64k, but FreeBSD is unstable at
> > that size last I checked. I got good results with 16k.
> >
> > My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> > with some buffer space errors causing the interface to shut down (under
> > the same loads that caused CentOS servers to lock up completely).
> > Someone mentioned that this buffer space bug has been fixed, but I no
> > longer have a way to test it.
> >
> > Best,
> >
> > Jason
> >
> > --
> > Earth is a beta site.
>
> So .. I ended up switching to Ethernet mode via mlxconfig -d PCID set
> LINK_TYPE_P1=2 LINK_TYPE_P2=2
> Oh, I also set MTU to 9000.
>
> After that.. the flood gates opened massively.
>
> root@R720-Storage:~ # iperf -c 10.255.255.55 -P4
> ------------------------------------------------------------
> Client connecting to 10.255.255.55, TCP port 5001
> TCP window size: 1.01 MByte (default)
> ------------------------------------------------------------
> [  6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
> [  3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
> [  4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
> [  5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
> [  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
> [  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
> [  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
> [SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
> root@R720-Storage:~ #
> 11:56 AM
> root@compute720:~# iperf -c 10.255.255.22 -P4
> ------------------------------------------------------------
> Client connecting to 10.255.255.22, TCP port 5001
> TCP window size: 325 KByte (default)
> ------------------------------------------------------------
> [  5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
> [  3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
> [  6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
> [  4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
> [  3]  0.0-10.0 sec  26.2 GBytes  22.5 Gbits/sec
> [  6]  0.0-10.0 sec  26.8 GBytes  23.1 Gbits/sec
> [  4]  0.0-10.0 sec  26.0 GBytes  22.3 Gbits/sec
> [SUM]  0.0-10.0 sec   106 GBytes  91.4 Gbits/sec
> root@compute720:~#
>
> I should point out that before doing this, while running in IB mode with
> datagram mode, I disabled SMT and set the power profile to performance
> on both boxes.
> This moved me up to 10-12 gig/sec, nothing like the
> change to Ethernet, where I can now fill the pipe from the looks of it.
>
> Also note a single connection doesn't do more than 25-ish gig/sec.
>
> Back to SATA being the bottleneck, but at least if it's coming out of
> the cache there should be more than enough network IO.
>
> Oh, one last thing: I thought I read somewhere that you needed to have
> a switch to do Ethernet mode. This doesn't seem to be the case. I
> haven't shut down opensm yet but I'll try that later, as I'm assuming I
> no longer need it.
>
> w00t!
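
As a quick sanity check on the iperf figures quoted above, here is a minimal Python sketch that adds up the per-stream bandwidths and converts the [SUM] Transfer column back into Gbits/sec. The numbers are copied from the two runs in this message. The function names and the `RUNS` table are made up for illustration, and the unit convention (GBytes counted as 2**30 bytes, Gbits as 10**9 bits) is an assumption, although the per-stream rows appear to bear it out (24.6 GBytes over 10 seconds comes to roughly 21.1 Gbits/sec).

```python
# Figures copied from the two iperf runs quoted above (4 parallel streams each).
RUNS = {
    "R720-Storage to compute720": {
        "stream_gbits": [21.1, 20.5, 28.7, 28.3],
        "sum_gbits": 98.5,
        "sum_gbytes": 115,
    },
    "compute720 to R720-Storage": {
        "stream_gbits": [23.5, 22.5, 23.1, 22.3],
        "sum_gbits": 91.4,
        "sum_gbytes": 106,
    },
}

INTERVAL_SEC = 10.0        # the "0.0-10.0 sec" interval in the reports
BYTES_PER_GBYTE = 2 ** 30  # assumption: iperf 2 reports GBytes in binary units


def stream_total(stream_gbits):
    """Add up the per-stream Bandwidth column, rounded to one decimal."""
    return round(sum(stream_gbits), 1)


def rate_from_transfer(gbytes, seconds=INTERVAL_SEC):
    """Convert a Transfer figure in GBytes into decimal Gbits/sec."""
    return gbytes * BYTES_PER_GBYTE * 8 / seconds / 1e9


if __name__ == "__main__":
    for name, run in RUNS.items():
        total = stream_total(run["stream_gbits"])
        derived = rate_from_transfer(run["sum_gbytes"])
        print(f"{name}: streams add up to {total} Gbits/sec, "
              f"transfer implies {derived:.1f} Gbits/sec, "
              f"iperf reported {run['sum_gbits']} Gbits/sec")
```

Small differences between the three figures are expected, since every value in the report is already rounded before it is printed.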