From: John Fleming
Date: Sat, 7 Sep 2019 20:00:38 -0400
Subject: Just joined the infiniband club
To: freebsd-infiniband@freebsd.org
List-Id: Infiniband on FreeBSD

Hi all, I've recently joined the club.
I have two Dell R720s connected directly to each other. The card is a
ConnectX-4. I was having a lot of problems with network drops. Where
I'm at now: I'm running FreeBSD 12-STABLE as of a week ago, the cards
have been cross-flashed with OEM firmware (these are Lenovo, I think),
and I'm no longer getting network drops. This box is basically my
storage server. It's exporting a RAID 10 ZFS volume to a Linux
(compute 19.04 5.0.0-27-generic) box which is running GNS3 for a lab.

So many questions.. sorry if this is a bit rambly!

From what I understand this card is really 4 x 25 gig lanes. If I
understand that correctly, then one data transfer should be able to do
at most 25 gig (best case), correct?

I'm not getting what the difference between connected mode and
datagram mode is. Does this have anything to do with the card
operating in InfiniBand mode vs Ethernet mode? FreeBSD is using the
modules compiled in connected mode with the shell script (which is
really a bash script, not a sh script) from the freebsd-infiniband
page.

The Linux box complains if the MTU is over 2044, with "expect
multicast drops" or something like that, so the MTU on both boxes is
set to 2044.

Everything I'm reading makes it sound like there is no RDMA support in
FreeBSD, or maybe that was no NFS RDMA support. Is that correct?

So far it seems like these cards struggle to fill a 10 gig pipe. Using
iperf (2) the best I'm getting is around 6 Gbit/sec. Interfaces aren't
showing drops on either end. Doesn't seem to matter if I do 1, 2 or 4
threads on iperf.

Here is the card:

mlx5_core0@pci0:66:0:0: class=0x020700 card=0x001415b3 chip=0x101315b3 rev=0x00 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network

This is a MCX456A (dual port ConnectX-4 InfiniBand/Ethernet).
Should be in a x16 slot.. but .. hmm, is it? Looking at pciconf I
can't tell.

Dell R720 - CPU E5-2670, ECC DDR-1600, 128GB (16GB sticks in white slots)

Compute: the card is for sure in a PCIe x16 slot here.

Dell R720 - CPU E5-2697, ECC DDR-1600, 128GB (16GB sticks in white slots)

root@R720-Storage:/var/log # ibstat
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.25.1020
        Hardware version: 0
        Node GUID: 0x248a07030049f308
        System image GUID: 0x248a07030049f308
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x2651e84a
                Port GUID: 0x248a07030049f308
                Link layer: InfiniBand

root@R720-Storage:/var/log # netstat -inb | egrep 'ib0|Name'
Name   Mtu Network       Address               Ipkts Ierrs Idrop       Ibytes     Opkts Oerrs       Obytes  Coll
ib0   2044               00:00:00:85:fe:80 287483828     0     0 531774120120 330632289     1 401889930592     0
ib0      - 10.255.255.0/ 10.255.255.22     287483710     -     - 519124822036 330632186     - 393954749268     -
root@R720-Storage:/var/log #

This is with nothing going on right now.

root@R720-Storage:/var/log # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  4] local 10.255.255.22 port 5001 connected with 10.255.255.55 port 56238
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  6.21 GBytes  5.33 Gbits/sec

root@compute720:~# iperf -c 10.255.255.22
------------------------------------------------------------
Client connecting to 10.255.255.22, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.255.255.55 port 56238 connected with 10.255.255.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.21 GBytes  5.33 Gbits/sec
root@compute720:~#

Swapped:

root@R720-Storage:/var/log # iperf -c 10.255.255.55
------------------------------------------------------------
Client connecting to 10.255.255.55, TCP port 5001
TCP window size:  209 KByte (default)
------------------------------------------------------------
[  3] local 10.255.255.22 port 46814 connected with 10.255.255.55 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  3.77 GBytes  3.22 Gbits/sec
root@R720-Storage:/var/log #

root@compute720:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 10.255.255.55 port 5001 connected with 10.255.255.22 port 46814
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec  3.77 GBytes  3.22 Gbits/sec
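
A note on reading the iperf figures above: iperf 2 reports Transfer in
binary units (1 GByte = 1024^3 bytes) and Bandwidth in decimal units
(1 Gbit = 10^9 bits), so the two columns do not differ by a plain
factor of 8. The sketch below redoes that arithmetic for the first
run. The helper name gbytes_to_gbits is made up for illustration and
is not part of iperf.

```shell
#!/bin/sh
# Convert an iperf 2 "Transfer" figure (GBytes, binary units) over an
# interval in seconds into Gbits/sec (decimal units).
# gbytes_to_gbits is a hypothetical helper name, not an iperf feature.
gbytes_to_gbits() {
    awk -v gb="$1" -v secs="$2" \
        'BEGIN { printf "%.2f\n", gb * 1024 * 1024 * 1024 * 8 / secs / 1e9 }'
}

# First run above: 6.21 GBytes transferred in 10.0 seconds.
gbytes_to_gbits 6.21 10.0
```

For the first run this prints 5.33, which agrees with the Bandwidth
column. The inputs are already rounded by iperf, so other runs may be
off by one in the last digit.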

From: Jason Bacon
Date: Sat, 7 Sep 2019 20:26:46 -0500
Subject: Re: Just joined the infiniband club
To: John Fleming, freebsd-infiniband@freebsd.org

On 2019-09-07 19:00, John Fleming wrote:
> Hi all, I've recently joined the club. I have two Dell R720s connected
> directly to each other. The card is a ConnectX-4. I was having a lot
> of problems with network drops. Where I'm at now: I'm running
> FreeBSD 12-STABLE as of a week ago, the cards have been cross-flashed
> with OEM firmware (these are Lenovo, I think), and I'm no longer
> getting network drops. This box is basically my storage server. It's
> exporting a RAID 10 ZFS volume to a Linux (compute 19.04
> 5.0.0-27-generic) box which is running GNS3 for a lab.
>
> So many questions.. sorry if this is a bit rambly!
>
> From what I understand this card is really 4 x 25 gig lanes. If I
> understand that correctly, then one data transfer should be able to do
> at most 25 gig (best case), correct?
>
> I'm not getting what the difference between connected mode and
> datagram mode is. Does this have anything to do with the card
> operating in InfiniBand mode vs Ethernet mode? FreeBSD is using the
> modules compiled in connected mode with the shell script (which is
> really a bash script, not a sh script) from the freebsd-infiniband
> page.

Nothing to do with Ethernet...

Google turned up a brief explanation here:

https://wiki.archlinux.org/index.php/InfiniBand

Those are my module building scripts on the wiki. What bash extensions
did you see?

>
> The Linux box complains if the MTU is over 2044, with "expect
> multicast drops" or something like that, so the MTU on both boxes is
> set to 2044.
>
> Everything I'm reading makes it sound like there is no RDMA support in
> FreeBSD, or maybe that was no NFS RDMA support. Is that correct?

RDMA is inherent in InfiniBand AFAIK. Last I checked, there was no
support in FreeBSD for NFS over RDMA, but news travels slowly in this
group so a little digging might prove otherwise.

>
> So far it seems like these cards struggle to fill a 10 gig pipe. Using
> iperf (2) the best I'm getting is around 6 Gbit/sec. Interfaces
> aren't showing drops on either end. Doesn't seem to matter if I do 1,
> 2 or 4 threads on iperf.

You'll need both ends in connected mode with a fairly large MTU to get
good throughput. CentOS defaults to 64k, but FreeBSD is unstable at
that size last I checked. I got good results with 16k.

My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
with some buffer space errors causing the interface to shut down (under
the same loads that caused CentOS servers to lock up completely).
Someone mentioned that this buffer space bug has been fixed, but I no
longer have a way to test it.

Best,

    Jason

-- 
Earth is a beta site.
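
As a rough companion to the connected-mode advice above, here is a
minimal sketch for the Linux end only. It assumes the usual Linux
sysfs layout, where an IPoIB interface exposes its mode in
/sys/class/net/<ifname>/mode, and a 2048-byte InfiniBand link MTU for
the datagram figure. The helper names ipoib_mode and ipoib_max_mtu are
invented for illustration. On FreeBSD, connected mode is chosen when
the modules are built, so this does not apply there.

```shell
#!/bin/sh
# Report the IPoIB mode of a Linux interface and the largest MTU that
# mode allows. Assumptions: Linux sysfs layout, a 2048-byte IB link MTU
# for datagram mode, and the hypothetical helper names used here.
ipoib_mode() {
    # $1: path to the mode file, normally /sys/class/net/<ifname>/mode
    cat "$1"
}

ipoib_max_mtu() {
    case "$1" in
        datagram)  echo 2044 ;;   # 2048-byte IB MTU minus 4-byte IPoIB header
        connected) echo 65520 ;;  # upper limit in connected mode
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Only report when the mode file exists, so the script is harmless on
# hosts without an IPoIB interface.
mode_file="${1:-/sys/class/net/ib0/mode}"
if [ -r "$mode_file" ]; then
    mode=$(ipoib_mode "$mode_file")
    echo "mode=$mode max_mtu=$(ipoib_max_mtu "$mode")"
fi
```

The 2044 limit is the figure the Linux box complained about, and 65520
is the "64k" default mentioned for CentOS. Switching modes on Linux is
normally done by writing "connected" or "datagram" to the same file as
root, which is the write that fails later in this thread.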

From: John Fleming
Date: Fri, 13 Sep 2019 14:36:52 -0400
Subject: Fwd: Just joined the infiniband club
To: freebsd-infiniband@freebsd.org

Top post I know, but I meant to send this to freebsd-infiniband, not
stable.

> On 2019-09-07 19:00, John Fleming wrote:
> > Hi all, I've recently joined the club. I have two Dell R720s
> > connected directly to each other. The card is a ConnectX-4. I was
> > having a lot of problems with network drops. Where I'm at now: I'm
> > running FreeBSD 12-STABLE as of a week ago, the cards have been
> > cross-flashed with OEM firmware (these are Lenovo, I think), and
> > I'm no longer getting network drops. This box is basically my
> > storage server. It's exporting a RAID 10 ZFS volume to a Linux
> > (compute 19.04 5.0.0-27-generic) box which is running GNS3 for a
> > lab.
> >
> > So many questions.. sorry if this is a bit rambly!
> >
> > From what I understand this card is really 4 x 25 gig lanes. If I
> > understand that correctly, then one data transfer should be able to
> > do at most 25 gig (best case), correct?
> >
> > I'm not getting what the difference between connected mode and
> > datagram mode is. Does this have anything to do with the card
> > operating in InfiniBand mode vs Ethernet mode? FreeBSD is using the
> > modules compiled in connected mode with the shell script (which is
> > really a bash script, not a sh script) from the freebsd-infiniband
> > page.
>
> Nothing to do with Ethernet...
>
> Google turned up a brief explanation here:
>
> https://wiki.archlinux.org/index.php/InfiniBand
>

I still don't get why I would want to use one or the other, or why the
option is there, but it doesn't matter. After the firmware upgrade and
moving to FreeBSD stable (unsure which is triggering this) I can no
longer set connected mode on Linux. There are a lot of posts that say
you have to disable enhanced IPoIB mode via a modules.conf setting,
but the driver doesn't have any idea what that is. Echoing "connected"
to the mode file throws a write error. I poked around in the Linux
source, but I'm not even a level 1 fighter in C. I'm like the generic
NPC that says hi at the gates.

> Those are my module building scripts on the wiki. What bash extensions
> did you see?

Isn't this a bash..ism? When I run it inside sh it throws a fit. No
worries, I just edited loader.conf.

auto-append-line

> >
> > The Linux box complains if the MTU is over 2044, with "expect
> > multicast drops" or something like that, so the MTU on both boxes
> > is set to 2044.
> >
> > Everything I'm reading makes it sound like there is no RDMA support
> > in FreeBSD, or maybe that was no NFS RDMA support. Is that correct?
> RDMA is inherent in InfiniBand AFAIK. Last I checked, there was no
> support in FreeBSD for NFS over RDMA, but news travels slowly in this
> group so a little digging might prove otherwise.
> >
> > So far it seems like these cards struggle to fill a 10 gig pipe.
> > Using iperf (2) the best I'm getting is around 6 Gbit/sec.
> > Interfaces aren't showing drops on either end. Doesn't seem to
> > matter if I do 1, 2 or 4 threads on iperf.
> You'll need both ends in connected mode with a fairly large MTU to get
> good throughput. CentOS defaults to 64k, but FreeBSD is unstable at
> that size last I checked. I got good results with 16k.
>
> My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> with some buffer space errors causing the interface to shut down (under
> the same loads that caused CentOS servers to lock up completely).
> Someone mentioned that this buffer space bug has been fixed, but I no
> longer have a way to test it.
>
> Best,
>
>     Jason
>
> --
> Earth is a beta site.

So .. I ended up switching to linux mode via

mlxconfig -d PCID set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Oh, I also set the MTU to 9000.

After that.. the flood gates opened massively.

root@R720-Storage:~ # iperf -c 10.255.255.55 -P4
------------------------------------------------------------
Client connecting to 10.255.255.55, TCP port 5001
TCP window size: 1.01 MByte (default)
------------------------------------------------------------
[  6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
[  3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
[  4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
[  5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
[  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
[  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
[  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
[SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
root@R720-Storage:~ #

root@compute720:~# iperf -c 10.255.255.22 -P4
------------------------------------------------------------
Client connecting to 10.255.255.22, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
[  3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
[  6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
[  4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  27.4 GBytes  23.5 Gbits/sec
[  3]  0.0-10.0 sec  26.2 GBytes  22.5 Gbits/sec
[  6]  0.0-10.0 sec  26.8 GBytes  23.1 Gbits/sec
[  4]  0.0-10.0 sec  26.0 GBytes  22.3 Gbits/sec
[SUM]  0.0-10.0 sec   106 GBytes  91.4 Gbits/sec
root@compute720:~#

I should point out that before doing this, while running in IB mode
with datagram mode, I disabled SMT and set the power profile to
performance on both boxes. This moved me up to 10-12 gig/sec, nothing
like the change to Ethernet, where I can now fill the pipe from the
looks of it.

Also note a single connection doesn't do more than 25ish gig/sec.

Back to SATA being the bottleneck, but at least if it's coming out of
the cache there should be more than enough network IO.

Oh, one last thing: I thought I read somewhere that you needed to have
a switch to do Ethernet mode. This doesn't seem to be the case. I
haven't shut down opensm yet, but I'll try that later as I'm assuming
I no longer need it.

w00t!
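
The per-stream figures from the FreeBSD to Linux run above can be
cross-checked against the [SUM] line with a few lines of awk. This is
only a sketch: sum_streams is a made-up helper name, and it assumes
every stream is reported in Gbits/sec, as in this thread.

```shell
#!/bin/sh
# Sum the per-stream Bandwidth column of iperf 2 client output read
# from stdin, skipping the [SUM] line. Assumes all streams are reported
# in Gbits/sec. sum_streams is a hypothetical helper name.
sum_streams() {
    awk '$NF == "Gbits/sec" && $0 !~ /SUM/ { total += $(NF - 1) }
         END { printf "%.1f\n", total }'
}

# Result lines copied from the FreeBSD -> Linux run above.
sum_streams <<'EOF'
[  6]  0.0-10.0 sec  24.6 GBytes  21.1 Gbits/sec
[  3]  0.0-10.0 sec  23.8 GBytes  20.5 Gbits/sec
[  4]  0.0-10.0 sec  33.4 GBytes  28.7 Gbits/sec
[  5]  0.0-10.0 sec  32.9 GBytes  28.3 Gbits/sec
[SUM]  0.0-10.0 sec   115 GBytes  98.5 Gbits/sec
EOF
```

This prints 98.6, against 98.5 on iperf's own [SUM] line. The
difference is presumably rounding, since iperf sums the unrounded
per-stream values.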

From: John Fleming
Date: Fri, 13 Sep 2019 15:13:25 -0400
Subject: Re: Just joined the infiniband club
To: freebsd-infiniband@freebsd.org

RCVD_IN_DNSWL_NONE(0.00)[0.3.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.5.4.1.0.0.a.2.list.dnswl.org : 127.0.5.0]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; IP_SCORE(-2.86)[ip: (-9.04), ipnet: 2a00:1450::/32(-2.96), asn: 15169(-2.24), country: US(-0.05)]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[] X-BeenThere: freebsd-infiniband@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Infiniband on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Sep 2019 19:13:39 -0000 And of course I meant ethernet mode not linux mode. On Fri, Sep 13, 2019 at 2:36 PM John Fleming wrote: > > Top post I know, but i meant to send this to freebsd-infiniband not stable > > > > > On 2019-09-07 19:00, John Fleming wrote: > > > Hi all, i've recently joined the club. I have two Dell R720s connected > > > directly to each other. The card is a connectx-4. I was having a lot > > > of problem with network drops. Where i'm at now is i'm running > > > FreeBSD12-Stable as of a week ago and cards have been cross flashed > > > with OEM firmware (these are lenovo i think) and i'm no longer getting > > > network drops. This box is basically my storage server. Its exporting > > > a raid 10 ZFS volume to a linux (compute 19.04 5.0.0-27-generic) box > > > which is running GNS3 for a lab. > > > > > > So many questions.. sorry if this is a bit rambly! > > > > > > From what I understand this card is really 4 x 25 gig lanes. If i > > > understand that correctly then 1 data transfer should be able to do at > > > max 25 gig (best case) correct? > > > > > > I'm not getting what the difference between connected mode and > > > datagram mode is. Does this have anything to do with the card > > > operating in infiniband mode vs ethernet mode? 
FreeBSD is using the > > > modules compiled in connected mode with shell script (which is really > > > a bash script not a sh script) from freebsd-infiniband page. > > > > Nothing to do with Ethernet... > > > > Google turned up a brief explanation here: > > > > https://wiki.archlinux.org/index.php/InfiniBand > > > I still don't get why I would want to use one of the the other or why > the option is there but it doesn't matter. > After firmware upgrade and moving to FreeBSD stable (unsure which is > triggering this) i can no longer > set connected mode on linux. There are a lot of posts that say you > have to diabled enhanced iboip mode > via a modules.conf setting but the driver doesn't have any idea what > that is. echoing connnected to mode file > throws a write error. I poked around in linux source but like i'm not > even level 1 fighter on C. i'm like generic NPC > that says hi at the gates. > > > Those are my module building scripts on the wiki. What bash extensions > > did you see? > > Isn't this a bash..ism? When i run it inside sh it throws a fit. No > worries, i just edited loaded.conf > > auto-append-line > > > > > > > Linux box complains if mtu is over 2044 with expect mulitcast drops or > > > something like that so mtu on both boxes is set to 2044. > > > > > > Everything i'm reading makes it sound like there is no RDMA support in > > > FreeBSD or maybe that was no NFS RDMA support. Is that correct? > > RDMA is inherent in Infiniband AFAIK. Last I checked, there was no > > support in FreeBSD for NFS over RDMA, but news travels slowly in this > > group so a little digging might prove otherwise. > > > > > > So far it seems like these cards struggle to full 10 gig pipe. Using > > > iperf (2) the best i'm getting is around 6gb(bit) sec. Interfaces > > > aren't showing drops on either end. Doesn't seem to matter if i do 1, > > > 2 or 4 threads on iperf. > > You'll need both ends in connected mode with a fairly large MTU to get > > good throughput. 
CentOS defaults to 64k, but FreeBSD is unstable at
> > that size last I checked. I got good results with 16k.
> >
> > My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> > with some buffer space errors causing the interface to shut down (under
> > the same loads that caused CentOS servers to lock up completely).
> > Someone mentioned that this buffer space bug has been fixed, but I no
> > longer have a way to test it.
> >
> > Best,
> >
> > Jason
> >
> > --
> > Earth is a beta site.
>
> So .. i ended up switching to ETHERNET mode via mlxconfig -d PCID set
> LINK_TYPE_P1=2 LINK_TYPE_P2=2
> Oh i also set MTU to 9000.
>
> After that.. the floodgates opened massively.
>
> root@R720-Storage:~ # iperf -c 10.255.255.55 -P4
> ------------------------------------------------------------
> Client connecting to 10.255.255.55, TCP port 5001
> TCP window size: 1.01 MByte (default)
> ------------------------------------------------------------
> [ 6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
> [ 3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
> [ 4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
> [ 5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 6] 0.0-10.0 sec 24.6 GBytes 21.1 Gbits/sec
> [ 3] 0.0-10.0 sec 23.8 GBytes 20.5 Gbits/sec
> [ 4] 0.0-10.0 sec 33.4 GBytes 28.7 Gbits/sec
> [ 5] 0.0-10.0 sec 32.9 GBytes 28.3 Gbits/sec
> [SUM] 0.0-10.0 sec 115 GBytes 98.5 Gbits/sec
> root@R720-Storage:~ #
> 11:56 AM
> root@compute720:~# iperf -c 10.255.255.22 -P4
> ------------------------------------------------------------
> Client connecting to 10.255.255.22, TCP port 5001
> TCP window size: 325 KByte (default)
> ------------------------------------------------------------
> [ 5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
> [ 3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
> [
6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
> [ 4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 5] 0.0-10.0 sec 27.4 GBytes 23.5 Gbits/sec
> [ 3] 0.0-10.0 sec 26.2 GBytes 22.5 Gbits/sec
> [ 6] 0.0-10.0 sec 26.8 GBytes 23.1 Gbits/sec
> [ 4] 0.0-10.0 sec 26.0 GBytes 22.3 Gbits/sec
> [SUM] 0.0-10.0 sec 106 GBytes 91.4 Gbits/sec
> root@compute720:~#
>
> I should point out that before doing this, while running in IB mode with
> datagram mode, i disabled SMT and set the power profile to performance
> on both boxes. This moved me up to 10-12 gig/sec, nothing like the
> change to Ethernet, where i can now fill the pipe from the looks of it.
>
> Also note a single connection doesn't do more than 25ish gig/sec.
>
> Back to SATA being the bottleneck, but at least if it's coming out of
> the cache there should be more than enough network IO.
>
> Oh one last thing, i thought i read somewhere that you needed to have
> a switch to do Ethernet mode. This doesn't seem to be the case. I
> haven't shut down opensm yet but i'll try that later as i'm assuming i
> no longer need that.
>
> w00t!
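
As a rough sanity check on the iperf numbers quoted above, the per-stream rates can be added up and compared with the card's nominal rate. This is only a sketch: the 100 Gbit/s figure assumes the "4 x 25 gig lanes" reading from earlier in the thread, and the variable and function names are made up for illustration.

```python
# Per-stream rates in Gbits/sec, copied from the two iperf -P4 runs above.
storage_to_compute = [21.1, 20.5, 28.7, 28.3]
compute_to_storage = [23.5, 22.5, 23.1, 22.3]

# Assumption: nominal line rate of 4 lanes x 25 Gbit/s, per the thread.
LINE_RATE_GBITS = 4 * 25


def summarize(rates, line_rate=LINE_RATE_GBITS):
    """Return (total, fraction of line rate, fastest single stream)."""
    total = sum(rates)
    return total, total / line_rate, max(rates)


for name, rates in [("storage -> compute", storage_to_compute),
                    ("compute -> storage", compute_to_storage)]:
    total, frac, fastest = summarize(rates)
    print(f"{name}: {total:.1f} Gbits/sec total, "
          f"{frac:.0%} of line rate, fastest stream {fastest} Gbits/sec")
```

The totals land within rounding of the [SUM] lines iperf printed for each direction.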