From: Hans Petter Selasky
To: Andreas Kempe, freebsd-infiniband@freebsd.org
Date: Mon, 24 Feb 2020 10:05:09 +0100
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering

On 2020-02-22 01:48, Andreas Kempe wrote:
> This issue can be remedied by changing the initialisation of the IPoIB
> module to happen after the mlx4 driver is initialised. By doing this,
> all multicast groups will be cleaned up before the ib_multicast client
> is destroyed.

Hi Andreas,

Are you sure it is not the module exit that should be ordered instead?

 module_exit(ipoib_cleanup_module);

Because from the description, this issue happens on shutdown and not on load.

I'm currently trying to reproduce the issue.

Dropping freebsd-net @

--HPS

From: Jason Bacon
To: freebsd-infiniband@freebsd.org
Date: Mon, 24 Feb 2020 08:16:38 -0600
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering

On 2020-02-24 03:05, Hans Petter Selasky wrote:
> On 2020-02-22 01:48, Andreas Kempe wrote:
>> This issue can be remedied by changing the initialisation of the IPoIB
>> module to happen after the mlx4 driver is initialised. By doing this,
>> all multicast groups will be cleaned up before the ib_multicast client
>> is destroyed.
>
> Hi Andreas,
>
> Are you sure it is not the module exit that should be ordered instead?
>
>  module_exit(ipoib_cleanup_module);
>
> Because from the description, this issue happens on shutdown and not on load.
>
> I'm currently trying to reproduce the issue.
>
> Dropping freebsd-net @
>
> --HPS
> _______________________________________________
> freebsd-infiniband@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-infiniband
> To unsubscribe, send any mail to
> "freebsd-infiniband-unsubscribe@freebsd.org"

Side-note: I was seeing the same symptom occasionally on our CentOS
clusters for years. I frequently had to go to the data center and check
on hung systems after routine updates, which would often be stuck at
"Unloading IB modules". I wonder if it was a similar issue.

-- 
Earth is a beta site.

From: Justin Clift
To: Jason Bacon
Cc: freebsd-infiniband@freebsd.org, owner-freebsd-infiniband@freebsd.org
Date: Tue, 25 Feb 2020 03:56:14 +1100
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering

On 2020-02-25 01:16, Jason Bacon wrote:
> Side-note: I was seeing the same symptom occasionally on our CentOS
> clusters for years. I frequently had to go to the data center and
> check on hung systems after routine updates, which would often be
> stuck at "Unloading IB modules". I wonder if it was a similar issue.

That sounds familiar to me too, also on CentOS (6 & 7 maybe?) at the time.

+ Justin

From: Andreas Kempe
Cc: freebsd-infiniband@freebsd.org
Date: Wed, 26 Feb 2020 22:11:57 +0100
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering

On Wed, Feb 26, 2020 at 10:05:55PM +0100, Andreas Kempe wrote:
> On Mon, Feb 24, 2020 at 11:50:10PM +0100, Hans Petter Selasky wrote:
> > Hi,
> >
> > On 2020-02-24 20:46, Andreas Kempe wrote:
> > > If you want me to try reordering the deinitialisation, I should be
> > > able to do that this coming Wednesday.
> > >
> >
> > Yes, please.
> >
> > Depending on how lists work, might not be so good in the long run.
> >
>
> First I tried the following change and the machine still hung.
>
> > --- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 356611)
> > +++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy)
> > @@ -1739,7 +1739,7 @@
> >  }
> >
> >  module_init(ipoib_init_module);
> > -module_exit(ipoib_cleanup_module);
> > +module_exit_order(ipoib_cleanup_module, SI_ORDER_FIRST);
> >
> >  static int
> >  ipoib_evhand(module_t mod, int event, void *arg)
>
> Then I tried moving the mlx4 driver unloading using the following
> change and the machine still hung.
>
> > --- sys/dev/mlx4/mlx4_ib/mlx4_ib_main.c (revision 356611)
> > +++ sys/dev/mlx4/mlx4_ib/mlx4_ib_main.c (working copy)
> > @@ -3320,7 +3320,7 @@
> >  }
> >
> >  module_init_order(mlx4_ib_init, SI_ORDER_THIRD);
> > -module_exit(mlx4_ib_cleanup);
> > +module_exit_order(mlx4_ib_cleanup, SI_ORDER_THIRD);
> >
> >  static int
> >  mlx4ib_evhand(module_t mod, int event, void *arg)
>
> I don't really feel like analysing why it still hangs with the above
> changes at the moment since we got something that works for us. If you
> have any suggestions you want me to try, I could still do that.
>
> > > > I'm currently trying to reproduce the issue.
> > > >
> > >
> > > We're seeing the issue every time when running the machine in a
> > > network with a Linux machine. We simply need to send a bit of data on
> > > the link and then trigger a shutdown.
> >
> > I see.
> >
>
> I can add that we compiled the modules into the kernel by adding the
> following to the GENERIC kernel of 12.1-STABLE:
>
> > # INFINIBAND
> > options COMPAT_LINUXKPI
> > options OFED
> > options SDP
> > options IPOIB_CM
> >
> > device ipoib
> > device mlx4
> > device mlx4ib
>
> Adding freebsd-infiniband again.
>
> Cordially,
> Andreas Kempe

From: Andreas Kempe
To: Hans Petter Selasky
Cc: Konstantin Belousov, Meny Yossefi, freebsd-infiniband@freebsd.org
Date: Wed, 26 Feb 2020 22:30:22 +0100
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering

On Wed, Feb 26, 2020 at 10:08:09PM +0100, Hans Petter Selasky wrote:
> On 2020-02-26 22:05, Andreas Kempe wrote:
> > +module_exit_order(ipoib_cleanup_module, SI_ORDER_FIRST);
>
> Try:
>
> module_exit_order(ipoib_cleanup_module, SI_ORDER_FOURTH);
>

Tried it, but the machine still hung. See patch:

Index: sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 356611)
+++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy)
@@ -1739,7 +1739,7 @@
 }
 
 module_init(ipoib_init_module);
-module_exit(ipoib_cleanup_module);
+module_exit_order(ipoib_cleanup_module, SI_ORDER_FOURTH);
 
 static int
 ipoib_evhand(module_t mod, int event, void *arg)

>
> Fourth is first,
>
> first is last.
>
> for exit.
>

I see I should read some more man pages. :P

Cordially,
Andreas Kempe
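The exchange above hinges on how ordered uninitialisation works: on exit, hooks with a higher order value run earlier, which is what "Fourth is first, first is last" describes. The C sketch below models only that reversal using a hypothetical hook table. The enum values are assumed to mirror the small SI_ORDER_FIRST..SI_ORDER_FOURTH constants in FreeBSD's sys/kernel.h; struct hook, run_sequence and position_of are illustrative names, not the kernel's SYSINIT/SYSUNINIT machinery, and the sketch does not model when or whether the kernel actually runs these hooks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical order values, assumed to mirror SI_ORDER_FIRST..FOURTH. */
enum hook_order {
	ORDER_FIRST = 0,
	ORDER_SECOND = 1,
	ORDER_THIRD = 2,
	ORDER_FOURTH = 3
};

/* A hypothetical init/exit hook within a single subsystem. */
struct hook {
	const char *name;
	enum hook_order order;
};

/* qsort comparator: ascending by order value. */
static int
cmp_order(const void *a, const void *b)
{
	const struct hook *x = a;
	const struct hook *y = b;

	return (((int)x->order > (int)y->order) -
	    ((int)x->order < (int)y->order));
}

/*
 * Store in names[] the sequence in which the hooks would run: ascending
 * order values for init, descending for exit. Hooks sharing an order
 * value end up in an unspecified relative order, as qsort is not stable.
 */
static void
run_sequence(struct hook *hooks, size_t n, int is_exit, const char **names)
{
	qsort(hooks, n, sizeof(hooks[0]), cmp_order);
	for (size_t i = 0; i < n; i++)
		names[i] = hooks[is_exit ? n - 1 - i : i].name;
}

/* Position of a hook in a run sequence, or -1 if it is absent. */
static int
position_of(const char **names, size_t n, const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return ((int)i);
	return (-1);
}
```

As a usage example, pairing ipoib_cleanup_module at ORDER_FOURTH with mlx4_ib_cleanup at ORDER_THIRD (an illustrative combination, not one of the exact configurations tried in the thread) makes the exit pass visit ipoib_cleanup_module first, while the init pass visits mlx4_ib_cleanup first.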

From: Hans Petter Selasky
To: Andreas Kempe
Cc: Konstantin Belousov, Meny Yossefi, freebsd-infiniband@freebsd.org
Date: Wed, 26 Feb 2020 22:52:56 +0100
Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering

On 2020-02-26 22:30, Andreas Kempe wrote:
> On Wed, Feb 26, 2020 at 10:08:09PM +0100, Hans Petter Selasky wrote:
>> On 2020-02-26 22:05, Andreas Kempe wrote:
>>> +module_exit_order(ipoib_cleanup_module, SI_ORDER_FIRST);
>>
>> Try:
>>
>> module_exit_order(ipoib_cleanup_module, SI_ORDER_FOURTH);
>>
>
> Tried it, but the machine still hung. See patch:
>
> Index: sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c
> ===================================================================
> --- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 356611)
> +++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy)
> @@ -1739,7 +1739,7 @@
>  }
>
>  module_init(ipoib_init_module);
> -module_exit(ipoib_cleanup_module);
> +module_exit_order(ipoib_cleanup_module, SI_ORDER_FOURTH);
>
>  static int
>  ipoib_evhand(module_t mod, int event, void *arg)
>
>>
>> Fourth is first,
>>
>> first is last.
>>
>> for exit.
>>
>

I haven't yet found time to reproduce this issue.

Possibly you're right that the list order matters.
--HPS From owner-freebsd-infiniband@freebsd.org Wed Feb 26 23:00:26 2020 Return-Path: Delivered-To: freebsd-infiniband@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id A3566249A35 for ; Wed, 26 Feb 2020 23:00:26 +0000 (UTC) (envelope-from SRS0+dd89=4O=moira.hest-guild.se=andkem@lysator.liu.se) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 48SWVJ3WDHz3y30 for ; Wed, 26 Feb 2020 23:00:24 +0000 (UTC) (envelope-from SRS0+dd89=4O=moira.hest-guild.se=andkem@lysator.liu.se) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id CB74840010 for ; Thu, 27 Feb 2020 00:00:16 +0100 (CET) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id B12B040013; Thu, 27 Feb 2020 00:00:16 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on bernadotte.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL,UNPARSEABLE_RELAY autolearn=disabled version=3.4.2 X-Spam-Score: 0.0 Received: from moira.hest-guild.se (moira.hest-guild.se [IPv6:2001:470:de3f:5ec::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 4FD2A40010 for ; Thu, 27 Feb 2020 00:00:15 +0100 (CET) Received: from andkem (uid 1000) (envelope-from andkem@moira.hest-guild.se) id 1878f0a0 by moira.hest-guild.se (DragonFly Mail Agent v0.12); Thu, 27 Feb 2020 00:00:12 +0100 Date: Thu, 27 Feb 2020 00:00:12 +0100 From: Andreas Kempe To: Hans Petter Selasky Cc: Konstantin Belousov , Meny Yossefi , freebsd-infiniband@freebsd.org Subject: Re: [PATCH]: ipoib with mlx4 initialisation ordering Message-ID: <20200226230012.GA6559@moira.hest-guild.se> 
References: <20200222004838.GA22659@moira.hest-guild.se> <9d76992b-6ba4-2419-61ff-5035aa45e597@selasky.org> <20200224194608.GC22659@moira.hest-guild.se> <16883d49-3cc0-d9cc-0877-46f811eeb8f1@selasky.org> <20200226210554.GE22659@moira.hest-guild.se> <20200226213022.GG22659@moira.hest-guild.se> <2226834e-4184-a581-87bb-3b8ce6c184da@selasky.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="uAKRQypu60I7Lcqm"
Content-Disposition: inline
In-Reply-To: <2226834e-4184-a581-87bb-3b8ce6c184da@selasky.org>
X-List-Received-Date: Wed, 26 Feb 2020 23:00:26 -0000

--uAKRQypu60I7Lcqm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Feb 26, 2020 at 10:52:56PM +0100, Hans Petter Selasky wrote:
> On 2020-02-26 22:30, Andreas Kempe wrote:
> > Index: sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > ===================================================================
> > --- sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 356611)
> > +++ sys/ofed/drivers/infiniband/ulp/ipoib/ipoib_main.c (working copy)
> > @@ -1739,7 +1739,7 @@
> >  }
> >  module_init(ipoib_init_module);
> > -module_exit(ipoib_cleanup_module);
> > +module_exit_order(ipoib_cleanup_module, SI_ORDER_FOURTH);
> >  static int
> >  ipoib_evhand(module_t mod, int event, void *arg)
> >
>
> I haven't yet found time to reproduce this issue.
>

No worries, there is absolutely no rush from my side. We can patch our machines ourselves with the initial patch until some sort of solution gets adopted upstream.

> Possibly you're right that the list order matters.
>

I would also have guessed that the patch above would have solved the issue. When the ipoib module is torn down, it should, as far as I can tell from only reading the code, remove all the multicast groups. Without hooking up the kernel debugger again, I can't say for sure why it would still hang.

I'm providing the wall of text below in the hope that it can help you or anyone who wishes to debug this issue further.

The only reason I really said that the list ordering matters is that mlx4_ib_remove calls ib_unregister_device, which in turn walks the client list in reverse order, so the client added last is removed first.
Printing each list element as the list is iterated during shutdown yields the following client order (the first client to be removed is at the top of the list):

ib_unregister_device: ib_client->name = uverbs
ib_unregister_device: ib_client->name = ucm
ib_unregister_device: ib_client->name = umad
ib_unregister_device: ib_client->name = cm
ib_unregister_device: ib_client->name = ib_multicast
ib_unregister_device: ib_client->name = sa
ib_unregister_device: ib_client->name = mad
ib_unregister_device: ib_client->name = cma
ib_unregister_device: ib_client->name = ipoib
ib_unregister_device: ib_client->name = sdp

If the interface is up and running and has sent data when the machine is shut down, it hangs on list index 4, i.e. ib_unregister_device: ib_client->name = ib_multicast.

The reason it hangs is the wait in mcast_remove_one, see below:

sys/ofed/drivers/infiniband/core/ib_multicast.c:
> static void mcast_remove_one(struct ib_device *device, void *client_data)
> {
> 	struct mcast_device *dev = client_data;
> 	struct mcast_port *port;
> 	int i;
>
> 	if (!dev)
> 		return;
>
> 	ib_unregister_event_handler(&dev->event_handler);
> 	flush_workqueue(mcast_wq);
>
> 	for (i = 0; i <= dev->end_port - dev->start_port; i++) {
> 		if (rdma_cap_ib_mcast(device, dev->start_port + i)) {
> 			port = &dev->port[i];
> 			deref_port(port);
> 			wait_for_completion(&port->comp);
> 		}
> 	}
>
> 	kfree(dev);
> }
>
> [...]
>
> static void deref_port(struct mcast_port *port)
> {
> 	if (atomic_dec_and_test(&port->refcount))
> 		complete(&port->comp);
> }

The crucial logic is in the deref_port(port) call. It decrements the reference count for port and checks whether the result is zero. If it is zero, complete is called on port->comp. If the reference count is still larger than zero after the decrement, complete is never called and we hang forever on wait_for_completion in mcast_remove_one.
By moving the initialisation of ipoib, we got the ipoib client removed before the ib_multicast client. Its multicast groups are then cleaned up first, so the reference count is exactly 1 going into mcast_remove_one; deref_port drops it to zero and calls complete, preventing the hang.

Cordially,
Andreas Kempe

--uAKRQypu60I7Lcqm
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEETci4cPcl+ZcyiACiCkqKrhcKSD0FAl5W+HQACgkQCkqKrhcK
SD0www//ZmYxrBl7U63tl9e83DpZqNSs0JP8TRazR/Te0WJhXMlnFcK+LlM48CdM
HFEtKChhQpOR+nzMwy1+Ozolu3Imx8es8C16Mfh9WvxF4XYGzWPG/Rmntw9zATMF
krZl3gugOrVKnHQHUKG3fSBZX7j1PMeO1Bo5eHbSJ6AYu/KyeKBD8O6RDX62jnN6
5FzNqLwovYlsoUMX8xBr0nSMVhPZIbzgUAw5krBzs+uNx4VrG16WGt/wHqYvTPtn
TJbV3Y0DUXy5P/TEQPUrofSXhbPUWowZ4qqsx0QaJArQt1nSUMEKFmkqiP6TZPfo
oMlouHoSPb9JBcg/YmG0WBowsHPCIxw7/wJmHBpxRHlw2Yjyz6tVcbvvoLYgFs40
no2pOeaWcTTKmcgG/Rhk4nN542GzAABWYrZvRNp7oj2FRKzfbBVnlI0k3ZUTYAOj
U/6Sc4msv4UQKKRjn4f5/iPSx98Nfr3TZmtWzN7I+Xa2F8JqzKBsWz/pzG5NxfZH
Qu4kQugzRaRgyEG3rwx75OCIRsHNbLytjbSxj2lXxR/Du5JcENIp90b4ACQ5kCiU
PGNQgjldFeYv70AFl4Nf3Ckgzui8SmuCBP8vSLdAiF9c+wMJ0nABF4BKBqJbITRe
6woHS5+hGodMt7jKlEN9+2tONsrBcr/tdYEd3nGHCbmucsYTsqg=
=/9nz
-----END PGP SIGNATURE-----

--uAKRQypu60I7Lcqm--