From owner-svn-src-all@freebsd.org Thu May 16 17:15:42 2019 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 24EA8159FD5B; Thu, 16 May 2019 17:15:42 +0000 (UTC) (envelope-from hselasky@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BE9C38DD1E; Thu, 16 May 2019 17:15:41 +0000 (UTC) (envelope-from hselasky@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id B0FBD23DE4; Thu, 16 May 2019 17:15:41 +0000 (UTC) (envelope-from hselasky@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id x4GHFfDi098830; Thu, 16 May 2019 17:15:41 GMT (envelope-from hselasky@FreeBSD.org) Received: (from hselasky@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id x4GHFfmn098828; Thu, 16 May 2019 17:15:41 GMT (envelope-from hselasky@FreeBSD.org) Message-Id: <201905161715.x4GHFfmn098828@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: hselasky set sender to hselasky@FreeBSD.org using -f From: Hans Petter Selasky Date: Thu, 16 May 2019 17:15:41 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r347803 - in stable/11/sys/dev/mlx5: . mlx5_core X-SVN-Group: stable-11 X-SVN-Commit-Author: hselasky X-SVN-Commit-Paths: in stable/11/sys/dev/mlx5: . mlx5_core X-SVN-Commit-Revision: 347803 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: BE9C38DD1E X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.98 / 15.00]; local_wl_from(0.00)[FreeBSD.org]; NEURAL_HAM_MEDIUM(-1.00)[-0.998,0]; NEURAL_HAM_SHORT(-0.99)[-0.986,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US] X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 May 2019 17:15:42 -0000 Author: hselasky Date: Thu May 16 17:15:41 2019 New Revision: 347803 URL: https://svnweb.freebsd.org/changeset/base/347803 Log: MFC r347253: Protect from infinite sw-reset loop in mlx5core. Avoid an infinite software firmware reset loop that may be caused by a hardware bug by limiting the maximum number of resets. The counter between resets is reset by request for reset, and not by a successful reset. The interval between two resets can be configured via sysctl: hw.mlx5.sw_reset_timeout which is global to all mlx5 devices in the system. Submitted by: slavash@ Sponsored by: Mellanox Technologies Modified: stable/11/sys/dev/mlx5/driver.h stable/11/sys/dev/mlx5/mlx5_core/mlx5_health.c Directory Properties: stable/11/ (props changed) Modified: stable/11/sys/dev/mlx5/driver.h ============================================================================== --- stable/11/sys/dev/mlx5/driver.h Thu May 16 17:15:00 2019 (r347802) +++ stable/11/sys/dev/mlx5/driver.h Thu May 16 17:15:41 2019 (r347803) @@ -534,6 +534,7 @@ struct mlx5_core_health { unsigned long flags; struct work_struct work; struct delayed_work recover_work; + unsigned int last_reset_req; }; #define MLX5_CQ_LINEAR_ARRAY_SIZE 1024 Modified: stable/11/sys/dev/mlx5/mlx5_core/mlx5_health.c ============================================================================== --- stable/11/sys/dev/mlx5/mlx5_core/mlx5_health.c Thu May 16 17:15:00 2019 (r347802) +++ stable/11/sys/dev/mlx5/mlx5_core/mlx5_health.c Thu May 16 17:15:41 2019 (r347803) @@ -64,6 +64,12 @@ SYSCTL_INT(_hw_mlx5, OID_AUTO, fw_reset_enable, CTLFLA &mlx5_fw_reset_enable, 0, "Enable firmware reset"); +static unsigned int sw_reset_to = 1200; +SYSCTL_UINT(_hw_mlx5, OID_AUTO, sw_reset_timeout, CTLFLAG_RWTUN, + &sw_reset_to, 0, + "Minimum timeout in seconds between two firmware resets"); + + static int lock_sem_sw_reset(struct mlx5_core_dev *dev) { int ret; @@ -218,6 +224,32 @@ static void reset_fw_if_needed(struct mlx5_core_dev *d &dev->iseg->cmdq_addr_l_sz); } +static bool +mlx5_health_allow_reset(struct mlx5_core_dev *dev) +{ + struct mlx5_core_health *health = &dev->priv.health; + unsigned int delta; + bool ret; + + if (health->last_reset_req != 0) { + delta = ticks - health->last_reset_req; + delta /= hz; + ret = delta >= sw_reset_to; + } else { + ret = true; + } + + /* + * In principle, ticks may be 0. Setting it to off by one (-1) + * to prevent certain reset in next request. + */ + health->last_reset_req = ticks ? : -1; + if (!ret) + mlx5_core_warn(dev, "Firmware reset elided due to " + "auto-reset frequency threshold.\n"); + return (ret); +} + #define MLX5_CRDUMP_WAIT_MS 60000 #define MLX5_FW_RESET_WAIT_MS 1000 #define MLX5_NIC_STATE_POLL_MS 5 @@ -243,7 +275,8 @@ void mlx5_enter_error_state(struct mlx5_core_dev *dev, if (force) goto err_state_done; - if (fatal_error == MLX5_SENSOR_FW_SYND_RFR) { + if (fatal_error == MLX5_SENSOR_FW_SYND_RFR && + mlx5_health_allow_reset(dev)) { /* Get cr-dump and reset FW semaphore */ if (mlx5_core_is_pf(dev)) lock = lock_sem_sw_reset(dev);