From owner-svn-src-head@freebsd.org Thu Mar 8 09:58:42 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 47564F47761; Thu, 8 Mar 2018 09:58:42 +0000 (UTC) (envelope-from hselasky@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E7B0371E1C; Thu, 8 Mar 2018 09:58:41 +0000 (UTC) (envelope-from hselasky@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id DE3AC73E0; Thu, 8 Mar 2018 09:58:41 +0000 (UTC) (envelope-from hselasky@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w289wf5k036190; Thu, 8 Mar 2018 09:58:41 GMT (envelope-from hselasky@FreeBSD.org) Received: (from hselasky@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w289wfeR036187; Thu, 8 Mar 2018 09:58:41 GMT (envelope-from hselasky@FreeBSD.org) Message-Id: <201803080958.w289wfeR036187@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: hselasky set sender to hselasky@FreeBSD.org using -f From: Hans Petter Selasky Date: Thu, 8 Mar 2018 09:58:41 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r330646 - in head/sys/dev/mlx5: . mlx5_core X-SVN-Group: head X-SVN-Commit-Author: hselasky X-SVN-Commit-Paths: in head/sys/dev/mlx5: . mlx5_core X-SVN-Commit-Revision: 330646 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2018 09:58:42 -0000 Author: hselasky Date: Thu Mar 8 09:58:41 2018 New Revision: 330646 URL: https://svnweb.freebsd.org/changeset/base/330646 Log: Fix race between PCI error handlers and health work in mlx5core. linux commit 05ac2c0b7438ea08c5d54b48797acf9b22cb2f6f Submitted by: Matthew Finlay MFC after: 1 week Sponsored by: Mellanox Technologies Modified: head/sys/dev/mlx5/driver.h head/sys/dev/mlx5/mlx5_core/mlx5_health.c head/sys/dev/mlx5/mlx5_core/mlx5_main.c Modified: head/sys/dev/mlx5/driver.h ============================================================================== --- head/sys/dev/mlx5/driver.h Thu Mar 8 09:51:33 2018 (r330645) +++ head/sys/dev/mlx5/driver.h Thu Mar 8 09:58:41 2018 (r330646) @@ -482,7 +482,10 @@ struct mlx5_core_health { u32 prev; int miss_counter; bool sick; + /* wq spinlock to synchronize draining */ + spinlock_t wq_lock; struct workqueue_struct *wq; + unsigned long flags; struct work_struct work; }; @@ -872,6 +875,7 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev); void mlx5_stop_health_poll(struct mlx5_core_dev *dev); +void mlx5_drain_health_wq(struct mlx5_core_dev *dev); #define mlx5_buf_alloc_node(dev, size, direct, buf, node) \ mlx5_buf_alloc(dev, size, direct, buf) Modified: head/sys/dev/mlx5/mlx5_core/mlx5_health.c ============================================================================== --- head/sys/dev/mlx5/mlx5_core/mlx5_health.c Thu Mar 8 09:51:33 2018 (r330645) +++ head/sys/dev/mlx5/mlx5_core/mlx5_health.c Thu Mar 8 09:58:41 2018 (r330646) @@ -43,6 +43,10 @@ enum { MLX5_NIC_IFC_NO_DRAM_NIC = 2 }; +enum { + MLX5_DROP_NEW_HEALTH_WORK, +}; + static u8 get_nic_interface(struct mlx5_core_dev *dev) { return (ioread32be(&dev->iseg->cmdq_addr_l_sz) >> 8) & 3; @@ -242,7 +246,13 @@ static void poll_health(unsigned long data) if (in_fatal(dev) && !health->sick) { health->sick = true; print_health_info(dev); - queue_work(health->wq, &health->work); + spin_lock(&health->wq_lock); + if (!test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) + queue_work(health->wq, &health->work); + else + dev_err(&dev->pdev->dev, + "new health works are not permitted at this stage\n"); + spin_unlock(&health->wq_lock); } } @@ -252,6 +262,7 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev) init_timer(&health->timer); health->sick = 0; + clear_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); health->health = &dev->iseg->health; health->health_counter = &dev->iseg->health_counter; @@ -267,6 +278,16 @@ void mlx5_stop_health_poll(struct mlx5_core_dev *dev) del_timer_sync(&health->timer); } +void mlx5_drain_health_wq(struct mlx5_core_dev *dev) +{ + struct mlx5_core_health *health = &dev->priv.health; + + spin_lock(&health->wq_lock); + set_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags); + spin_unlock(&health->wq_lock); + cancel_work_sync(&health->work); +} + void mlx5_health_cleanup(struct mlx5_core_dev *dev) { struct mlx5_core_health *health = &dev->priv.health; @@ -293,6 +314,7 @@ int mlx5_health_init(struct mlx5_core_dev *dev) if (!health->wq) return -ENOMEM; + spin_lock_init(&health->wq_lock); INIT_WORK(&health->work, health_care); return 0; Modified: head/sys/dev/mlx5/mlx5_core/mlx5_main.c ============================================================================== --- head/sys/dev/mlx5/mlx5_core/mlx5_main.c Thu Mar 8 09:51:33 2018 (r330645) +++ head/sys/dev/mlx5/mlx5_core/mlx5_main.c Thu Mar 8 09:58:41 2018 (r330646) @@ -1243,7 +1243,12 @@ static pci_ers_result_t mlx5_pci_err_detected(struct p dev_info(&pdev->dev, "%s was called\n", __func__); mlx5_enter_error_state(dev); mlx5_unload_one(dev, priv); - mlx5_pci_disable_device(dev); + if (state) { + pci_save_state(pdev->dev.bsddev); + mlx5_drain_health_wq(dev); + mlx5_pci_disable_device(dev); + } + return state == pci_channel_io_perm_failure ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET; }