Date: Thu, 27 Feb 2020 00:00:12 +0100
From: Andreas Kempe
To: Hans Petter Selasky
Cc: Konstantin Belousov, Meny Yossefi, freebsd-infiniband@freebsd.org
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering
Message-ID: <20200226230012.GA6559@moira.hest-guild.se>
In-Reply-To: <2226834e-4184-a581-87bb-3b8ce6c184da@selasky.org>

On Wed, Feb 26, 2020 at 10:52:56PM +0100, Hans Petter Selasky wrote:
> On 2020-02-26 22:30, Andreas Kempe wrote:
> > Index: sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > ===================================================================
> > --- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 356611)
> > +++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy)
> > @@ -1739,7 +1739,7 @@
> >  }
> >  module_init(ipoib_init_module);
> > -module_exit(ipoib_cleanup_module);
> > +module_exit_order(ipoib_cleanup_module, SI_ORDER_FOURTH);
> >  static int
> >  ipoib_evhand(module_t mod, int event, void *arg)
> >
>
> I haven't yet found time to reproduce this issue.
>

No worries, there is absolutely no rush from my side. We can patch our
machines ourselves with the initial patch until some sort of solution
gets adopted upstream.

> Possibly you're right that the list order matters.
>

I would also have guessed that the patch above would have solved the
issue. When the ipoib module is torn down, it should, as far as I can
tell from only reading the code, remove all the multicast groups.
Without hooking up the kernel debugger again, I can't say for sure why
it would still hang.

I'm providing the wall of text below in hopes it can help you or anyone
that wishes to debug this issue further.

The only reason I really said that the list ordering matters is that
mlx4_ib_remove calls ib_unregister_device which in turn walks the
client list in the reverse order.
Printing each list element as the list is iterated during shutdown
yields the following client order (the first client to be removed at
the top of the list):

ib_unregister_device: ib_client->name = uverbs
ib_unregister_device: ib_client->name = ucm
ib_unregister_device: ib_client->name = umad
ib_unregister_device: ib_client->name = cm
ib_unregister_device: ib_client->name = ib_multicast
ib_unregister_device: ib_client->name = sa
ib_unregister_device: ib_client->name = mad
ib_unregister_device: ib_client->name = cma
ib_unregister_device: ib_client->name = ipoib
ib_unregister_device: ib_client->name = sdp

If the interface is up and running and has sent data when the machine
is shut down, it hangs on list index 4 (counting from zero), i.e.
ib_unregister_device: ib_client->name = ib_multicast.

The reason it hangs is the wait in mcast_remove_one, see below:

sys/ofed/drivers/infiniband/core/ib_multicast.c:
> static void mcast_remove_one(struct ib_device *device, void *client_data)
> {
> 	struct mcast_device *dev = client_data;
> 	struct mcast_port *port;
> 	int i;
>
> 	if (!dev)
> 		return;
>
> 	ib_unregister_event_handler(&dev->event_handler);
> 	flush_workqueue(mcast_wq);
>
> 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
> 		if (rdma_cap_ib_mcast(device, dev->start_port + i)) {
> 			port = &dev->port[i];
> 			deref_port(port);
> 			wait_for_completion(&port->comp);
> 		}
> 	}
>
> 	kfree(dev);
> }
>
> [...]
>
> static void deref_port(struct mcast_port *port)
> {
> 	if (atomic_dec_and_test(&port->refcount))
> 		complete(&port->comp);
> }

The crucial logic is in the deref_port(port) function call. It
decrements the reference counter for port and checks whether the result
is zero. If it is zero, complete is called on port->comp. If the
reference count is still larger than zero after the decrement, complete
is never called and we hang forever on wait_for_completion in
mcast_remove_one.

By moving the initialisation of ipoib, we got it to be removed before
the ib_multicast client. The reference count is then exactly 1 going
into mcast_remove_one, so deref_port drops it to zero and calls
complete, which prevents the hang.

Cordially,
Andreas Kempe