From: Hans Petter Selasky
Date: Wed, 12 Dec 2018 12:22:29 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-12@freebsd.org
Subject: svn commit: r341933 - in stable/12/sys/dev/mlx5: . mlx5_core
Message-Id: <201812121222.wBCCMTuH079997@repo.freebsd.org>

Author: hselasky
Date: Wed Dec 12 12:22:28 2018
New Revision: 341933
URL: https://svnweb.freebsd.org/changeset/base/341933

Log:
  MFC r341560:
  mlx5: Fix use-after-free in self-healing flow

  When the mlx5 health mechanism detects a problem while the driver
  is in the middle of init_one or remove_one, the driver needs to
  prevent the health mechanism from scheduling future work; if future
  work is scheduled, there is a problem with use-after-free: the
  system WQ tries to run the work item (which has been freed) at the
  scheduled future time.

  Prevent this by disabling work item scheduling in the health
  mechanism when the driver is in the middle of init_one() or
  remove_one().

  Sponsored by: Mellanox Technologies

Modified:
  stable/12/sys/dev/mlx5/driver.h
  stable/12/sys/dev/mlx5/mlx5_core/mlx5_health.c
  stable/12/sys/dev/mlx5/mlx5_core/mlx5_main.c
Directory Properties:
  stable/12/   (props changed)

Modified: stable/12/sys/dev/mlx5/driver.h
==============================================================================
--- stable/12/sys/dev/mlx5/driver.h	Wed Dec 12 12:19:49 2018	(r341932)
+++ stable/12/sys/dev/mlx5/driver.h	Wed Dec 12 12:22:28 2018	(r341933)
@@ -923,7 +923,7 @@ void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, s
 void mlx5_health_cleanup(struct mlx5_core_dev *dev);
 int mlx5_health_init(struct mlx5_core_dev *dev);
 void mlx5_start_health_poll(struct mlx5_core_dev *dev);
-void mlx5_stop_health_poll(struct mlx5_core_dev *dev);
+void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health);
 void mlx5_drain_health_wq(struct mlx5_core_dev *dev);
 void mlx5_drain_health_recovery(struct mlx5_core_dev *dev);
 void mlx5_trigger_health_work(struct mlx5_core_dev *dev);

Modified: stable/12/sys/dev/mlx5/mlx5_core/mlx5_health.c
==============================================================================
--- stable/12/sys/dev/mlx5/mlx5_core/mlx5_health.c	Wed Dec 12 12:19:49 2018	(r341932)
+++ stable/12/sys/dev/mlx5/mlx5_core/mlx5_health.c	Wed Dec 12 12:22:28 2018	(r341933)
@@ -516,9 +516,17 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev)
 		  round_jiffies(jiffies + MLX5_HEALTH_POLL_INTERVAL));
 }
 
-void mlx5_stop_health_poll(struct mlx5_core_dev *dev)
+void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health)
 {
 	struct mlx5_core_health *health = &dev->priv.health;
+	unsigned long flags;
+
+	if (disable_health) {
+		spin_lock_irqsave(&health->wq_lock, flags);
+		set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags);
+		set_bit(MLX5_DROP_NEW_RECOVERY_WORK, &health->flags);
+		spin_unlock_irqrestore(&health->wq_lock, flags);
+	}
 
 	del_timer_sync(&health->timer);
 }

Modified: stable/12/sys/dev/mlx5/mlx5_core/mlx5_main.c
==============================================================================
--- stable/12/sys/dev/mlx5/mlx5_core/mlx5_main.c	Wed Dec 12 12:19:49 2018	(r341932)
+++ stable/12/sys/dev/mlx5/mlx5_core/mlx5_main.c	Wed Dec 12 12:22:28 2018	(r341933)
@@ -1107,7 +1107,7 @@ err_cleanup_once:
 	mlx5_cleanup_once(dev);
 
 err_stop_poll:
-	mlx5_stop_health_poll(dev);
+	mlx5_stop_health_poll(dev, boot);
 	if (mlx5_cmd_teardown_hca(dev)) {
 		device_printf((&dev->pdev->dev)->bsddev, "ERR: ""tear_down_hca failed, skip cleanup\n");
 		goto out_err;
@@ -1159,7 +1159,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev,
 	mlx5_disable_msix(dev);
 	if (cleanup)
 		mlx5_cleanup_once(dev);
-	mlx5_stop_health_poll(dev);
+	mlx5_stop_health_poll(dev, cleanup);
 	err = mlx5_cmd_teardown_hca(dev);
 	if (err) {
 		device_printf((&dev->pdev->dev)->bsddev, "ERR: ""tear_down_hca failed, skip cleanup\n");
@@ -1405,6 +1405,12 @@ static int mlx5_try_fast_unload(struct mlx5_core_dev *
 		mlx5_core_dbg(dev, "Device in internal error state, giving up\n");
 		return -EAGAIN;
 	}
+
+	/* Panic tear down fw command will stop the PCI bus communication
+	 * with the HCA, so the health polll is no longer needed.
+	 */
+	mlx5_drain_health_wq(dev);
+	mlx5_stop_health_poll(dev, false);
 
 	err = mlx5_cmd_force_teardown_hca(dev);
 	if (err) {
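
The gating described in the log can be modeled in userspace. What follows is a minimal sketch, not the driver code: the struct, the flag constants, and the functions health_init(), health_trigger_work() and health_stop_poll() are hypothetical names chosen for illustration, and a pthread mutex stands in for the kernel spin lock. It also assumes that the side which queues work tests the drop flag under the same lock; that side is not part of this diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, modeled on the MLX5_DROP_NEW_* names in the diff. */
enum {
	DROP_NEW_HEALTH_WORK	= 1u << 0,
	DROP_NEW_RECOVERY_WORK	= 1u << 1,
};

/* Hypothetical stand-in for the health state; not the driver's struct. */
struct health {
	pthread_mutex_t	wq_lock;	/* plays the role of health->wq_lock */
	unsigned int	flags;		/* drop-new-work bits */
	int		queued;		/* number of work items accepted */
};

static void
health_init(struct health *h)
{
	assert(h != NULL);
	pthread_mutex_init(&h->wq_lock, NULL);
	h->flags = 0;
	h->queued = 0;
}

/*
 * Producer side (assumed): accept a new work item only while the
 * drop flag is clear. Returns true if the item was queued.
 */
static bool
health_trigger_work(struct health *h)
{
	bool accepted = false;

	pthread_mutex_lock(&h->wq_lock);
	if ((h->flags & DROP_NEW_HEALTH_WORK) == 0) {
		h->queued++;
		accepted = true;
	}
	pthread_mutex_unlock(&h->wq_lock);
	return (accepted);
}

/*
 * Stop side: when disable_health is true, set both drop flags under
 * the lock so that no new work can be queued afterwards. The real
 * function also stops the poll timer; that part is omitted here.
 */
static void
health_stop_poll(struct health *h, bool disable_health)
{
	if (disable_health) {
		pthread_mutex_lock(&h->wq_lock);
		h->flags |= DROP_NEW_HEALTH_WORK | DROP_NEW_RECOVERY_WORK;
		pthread_mutex_unlock(&h->wq_lock);
	}
}
```

In this model, health_stop_poll(&h, false) leaves queuing enabled, while health_stop_poll(&h, true) makes every later health_trigger_work() call return false. Because the flag is set and tested under one lock, a trigger that races with the stop either completes before the flag is set or observes it and drops the work.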