From owner-freebsd-infiniband@freebsd.org Sat Feb 22 00:48:50 2020
Date: Sat, 22 Feb 2020 01:48:38 +0100
From: Andreas Kempe
To: freebsd-net@freebsd.org, freebsd-infiniband@freebsd.org
Subject: [PATCH]: ipoib with mlx4 initialisation ordering
Message-ID: <20200222004838.GA22659@moira.hest-guild.se>
List-Id: Infiniband on FreeBSD

Hello everyone,

We have had issues with our machine using IPoIB on FreeBSD with the
mlx4 driver. The machine would hang on shutdown.

We traced the issue to IPoIB registering multicast groups that
increase the reference count of the port in the ib_multicast client.
When shutting down the machine, the kernel tore down the ib_multicast
client before it tore down IPoIB, causing the teardown to wait forever
for the references to disappear before it could delete the multicast
client.

This issue can be remedied by changing the initialisation of the IPoIB
module to happen after the mlx4 driver is initialised. By doing this,
all multicast groups will be cleaned up before the ib_multicast client
is destroyed.

See patch attached.
Sponsored by: Lysator ACS

Cordially,
Andreas Kempe

Attachment: ipoib_ordering.patch (text/x-diff)

--- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c	2020-02-21 20:52:35.311328000 +0100
+++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c	2020-02-22 01:06:20.720997000 +0100
@@ -1754,7 +1754,7 @@
 	}
 }
 
-module_init(ipoib_init_module);
+module_init_order(ipoib_init_module, SI_ORDER_FOURTH);
 module_exit(ipoib_cleanup_module);
 
 static int

From owner-freebsd-infiniband@freebsd.org Sat Feb 22 08:59:34 2020
Date: Sat, 22 Feb 2020 09:59:23 +0100
From: Hans Petter Selasky
To: Andreas Kempe, freebsd-net@freebsd.org, freebsd-infiniband@freebsd.org
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering
In-Reply-To: <20200222004838.GA22659@moira.hest-guild.se>
References: <20200222004838.GA22659@moira.hest-guild.se>

On 2020-02-22 01:48, Andreas Kempe wrote:
> Hello everyone,
>
> We have had issues with our machine using IPoIB on FreeBSD with the
> mlx4 driver. The machine would hang on shutdown.
>
> We traced the issue to IPoIB registering multicast groups that
> increase the reference count of the port in the ib_multicast client.
> When shutting down the machine, the kernel tore down the ib_multicast
> client before it tore down IPoIB, causing the teardown to wait forever
> for the references to disappear before it could delete the multicast
> client.
>
> This issue can be remedied by changing the initialisation of the IPoIB
> module to happen after the mlx4 driver is initialised. By doing this,
> all multicast groups will be cleaned up before the ib_multicast client
> is destroyed.
>
> See patch attached.
> Sponsored by: Lysator ACS
>
> Cordially,
> Andreas Kempe

I'll have a closer look on Monday.

--HPS
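
The ordering argument in the original report can be sketched in user space. The following is a minimal simulation, not kernel code. It assumes that teardown runs in the reverse of initialisation order, and every name in it (sim_module, sim_boot_and_shutdown, the numeric order values) is hypothetical and unrelated to the real SYSINIT or LinuxKPI interfaces. A "provider" stands in for the ib_multicast client and a "consumer" for IPoIB.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical roles: the provider owns a reference-counted resource,
 * the consumer pins it while running. */
enum sim_role { SIM_PROVIDER, SIM_CONSUMER };

struct sim_module {
	enum sim_role role;
	int order;	/* lower value initialises earlier (assumption) */
};

static int
sim_cmp_order(const void *a, const void *b)
{
	const struct sim_module *ma = a, *mb = b;

	return ((ma->order > mb->order) - (ma->order < mb->order));
}

/*
 * Simulate boot, runtime and shutdown for a set of modules.
 * Returns 0 if shutdown completes, or -1 if a provider is torn down
 * while references are still held (the "wait forever" case).
 */
int
sim_boot_and_shutdown(struct sim_module *mods, size_t n)
{
	int refs = 0;
	size_t i;

	/* Boot: sorting stands in for running init functions in order. */
	qsort(mods, n, sizeof(mods[0]), sim_cmp_order);

	/* Runtime: each consumer joins a group and takes one reference. */
	for (i = 0; i < n; i++)
		if (mods[i].role == SIM_CONSUMER)
			refs++;

	/* Shutdown: assumed to be the reverse of initialisation order. */
	for (i = n; i-- > 0; ) {
		if (mods[i].role == SIM_CONSUMER)
			refs--;		/* consumer leaves its group */
		else if (refs != 0)
			return (-1);	/* provider would block here */
	}
	return (0);
}
```

In this model, a consumer ordered before the provider (orders 1 and 2) makes the function return -1, which corresponds to the reported hang. A consumer ordered after the provider (orders 3 and 2) is torn down first, drops its reference, and the function returns 0.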