From: John Fleming
Date: Sat, 7 Sep 2019 20:00:38 -0400
Subject: Just joined the infiniband club
To: freebsd-infiniband@freebsd.org

Hi all, I've recently joined the club.
I have two Dell R720s connected directly to each other. The cards are
ConnectX-4. I was having a lot of problems with network drops. Where I'm
at now: I'm running FreeBSD 12-STABLE as of a week ago, the cards have
been cross-flashed with OEM firmware (these are Lenovo, I think), and I'm
no longer getting network drops.

This box is basically my storage server. It's exporting a RAID 10 ZFS
volume to a Linux (compute 19.04 5.0.0-27-generic) box which is running
GNS3 for a lab.

So many questions.. sorry if this is a bit rambly!

From what I understand, this card is really 4 x 25 gig lanes. If I
understand that correctly, then a single data transfer should be able to
do at most 25 gig (best case), correct?

I'm not getting what the difference between connected mode and datagram
mode is. Does this have anything to do with the card operating in
InfiniBand mode vs Ethernet mode? FreeBSD is using the modules compiled
in connected mode with the shell script (which is really a bash script,
not a sh script) from the freebsd-infiniband page.

The Linux box complains if the MTU is over 2044, with "expect multicast
drops" or something like that, so the MTU on both boxes is set to 2044.

Everything I'm reading makes it sound like there is no RDMA support in
FreeBSD, or maybe that was no NFS RDMA support. Is that correct?

So far it seems like these cards struggle to fill a 10 gig pipe. Using
iperf (2), the best I'm getting is around 6 Gbit/sec. The interfaces
aren't showing drops on either end. It doesn't seem to matter if I use
1, 2 or 4 threads in iperf.

Here is the card:

mlx5_core0@pci0:66:0:0: class=0x020700 card=0x001415b3 chip=0x101315b3 rev=0x00 hdr=0x00
    vendor     = 'Mellanox Technologies'
    device     = 'MT27700 Family [ConnectX-4]'
    class      = network

This is a MCA456A (dual port ConnectX-4 InfiniBand/Ethernet). It should
be in an x16 slot.. but.. hmm, is it? Looking at pciconf I can't tell.

Dell R720 - CPU E5-2670 ECC DDR-1600 128GB (16GB sticks in white slots)

Compute (the card is definitely in a PCIe x16 slot here):
Dell R720 CPU E5-2697 ECC DDR-1600 128GB (16GB sticks in white slots)

root@R720-Storage:/var/log # ibstat
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.25.1020
        Hardware version: 0
        Node GUID: 0x248a07030049f308
        System image GUID: 0x248a07030049f308
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x2651e84a
                Port GUID: 0x248a07030049f308
                Link layer: InfiniBand

root@R720-Storage:/var/log # netstat -inb | egrep 'ib0|Name'
Name   Mtu Network        Address               Ipkts Ierrs Idrop       Ibytes     Opkts Oerrs       Obytes  Coll
ib0   2044                00:00:00:85:fe:80 287483828     0     0 531774120120 330632289     1 401889930592     0
ib0      - 10.255.255.0/  10.255.255.22     287483710     -     - 519124822036 330632186     - 393954749268     -
root@R720-Storage:/var/log #

This is with nothing going on right now.

root@R720-Storage:/var/log # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[  4] local 10.255.255.22 port 5001 connected with 10.255.255.55 port 56238
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  6.21 GBytes  5.33 Gbits/sec

root@compute720:~# iperf -c 10.255.255.22
------------------------------------------------------------
Client connecting to 10.255.255.22, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.255.255.55 port 56238 connected with 10.255.255.22 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  6.21 GBytes  5.33 Gbits/sec
root@compute720:~#

Swapped:

root@R720-Storage:/var/log # iperf -c 10.255.255.55
------------------------------------------------------------
Client connecting to 10.255.255.55, TCP port 5001
TCP window size:  209 KByte (default)
------------------------------------------------------------
[  3] local 10.255.255.22 port 46814 connected with 10.255.255.55 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  3.77 GBytes  3.22 Gbits/sec
root@R720-Storage:/var/log #

root@compute720:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 10.255.255.55 port 5001 connected with 10.255.255.22 port 46814
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec  3.77 GBytes  3.22 Gbits/sec
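
As a sanity check on the iperf figures above, here is a minimal Python sketch that recomputes the bandwidth from the transfer size and interval and compares it with the link rate. It assumes iperf 2 reports "GBytes" in binary units (2^30 bytes) and "Gbits/sec" in decimal units, and it takes the 100 Gbit/s figure from the "Rate: 100" line in the ibstat output. The function and constant names are made up for illustration and are not part of any tool.

```python
# Assumption: iperf 2 prints transfer in binary gigabytes (2**30 bytes)
# and bandwidth in decimal gigabits per second (10**9 bits).
GIB = 2**30

# Assumption: "Rate: 100" in the ibstat output means a 100 Gbit/s link.
LINK_RATE_GBITS = 100.0


def iperf_gbits(transfer_gbytes, seconds):
    """Convert an iperf transfer size and interval to Gbit/s."""
    bits = transfer_gbytes * GIB * 8
    return bits / seconds / 1e9


# Transfer sizes and intervals copied from the two runs above.
runs = [
    ("compute -> storage", 6.21, 10.0),
    ("storage -> compute", 3.77, 10.1),
]

for label, gbytes, secs in runs:
    rate = iperf_gbits(gbytes, secs)
    share = rate / LINK_RATE_GBITS
    print(f"{label}: {rate:.2f} Gbit/s ({share:.1%} of link rate)")
```

Under those assumptions the recomputed rates land close to what iperf printed (small differences are expected because the printed transfer size and interval are rounded), and both directions are using only a small fraction of the link rate.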