From owner-svn-src-head@freebsd.org Thu May 7 11:28:41 2020 Return-Path: Delivered-To: svn-src-head@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 138282C46D2; Thu, 7 May 2020 11:28:41 +0000 (UTC) (envelope-from mw@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 49HrnN6qmBz425C; Thu, 7 May 2020 11:28:40 +0000 (UTC) (envelope-from mw@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id E3115183BD; Thu, 7 May 2020 11:28:40 +0000 (UTC) (envelope-from mw@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id 047BSeFQ028257; Thu, 7 May 2020 11:28:40 GMT (envelope-from mw@FreeBSD.org) Received: (from mw@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id 047BSepF028253; Thu, 7 May 2020 11:28:40 GMT (envelope-from mw@FreeBSD.org) Message-Id: <202005071128.047BSepF028253@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: mw set sender to mw@FreeBSD.org using -f From: Marcin Wojtas Date: Thu, 7 May 2020 11:28:40 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r360777 - head/sys/dev/ena X-SVN-Group: head X-SVN-Commit-Author: mw X-SVN-Commit-Paths: head/sys/dev/ena X-SVN-Commit-Revision: 360777 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 May 2020 11:28:41 -0000 Author: mw Date: Thu May 7 11:28:39 2020 New Revision: 360777 URL: https://svnweb.freebsd.org/changeset/base/360777 Log: Optimize ENA Rx refill for low memory conditions Sometimes, especially when there is not much memory in the system left, allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time and it is not guaranteed that it'll succeed. In that situation, the fallback will work, but if the refill needs to take a place for a lot of descriptors at once, the time spent in m_getjcl looking for memory can cause system unresponsiveness due to high priority of the Rx task. This can also lead to driver reset, because Tx cleanup routine is being blocked and timer service could detect that Tx packets aren't cleaned up. The reset routine can further create another unresponsiveness - Rx rings are being refilled there, so m_getjcl will again burn the CPU. This was causing NVMe driver timeouts and resets, because network driver is having higher priority. Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are enough - ENA MTU is being set to 9K anyway, so it's very unlikely that more space than 9KB will be needed. However, 9KB jumbo clusters can still cause issues, so by default the page size mbuf cluster will be used for the Rx descriptors. This can have a small (~2%) impact on the throughput of the device, so to restore original behavior, one must change sysctl "hw.ena.enable_9k_mbufs" to "1" in "/boot/loader.conf" file. As a part of this patch (important fix), the version of the driver was updated to v2.1.2. Submitted by: cperciva Reviewed by: Michal Krawczyk Reviewed by: Ido Segev Reviewed by: Guy Tzalik MFC after: 3 days PR: 225791, 234838, 235856, 236989, 243531 Differential Revision: https://reviews.freebsd.org/D24546 Modified: head/sys/dev/ena/ena.c head/sys/dev/ena/ena.h head/sys/dev/ena/ena_sysctl.c head/sys/dev/ena/ena_sysctl.h Modified: head/sys/dev/ena/ena.c ============================================================================== --- head/sys/dev/ena/ena.c Thu May 7 10:46:02 2020 (r360776) +++ head/sys/dev/ena/ena.c Thu May 7 11:28:39 2020 (r360777) @@ -368,6 +368,7 @@ ena_init_io_rings_common(struct ena_adapter *adapter, ring->ena_dev = adapter->ena_dev; ring->first_interrupt = false; ring->no_interrupt_event_cnt = 0; + ring->rx_mbuf_sz = ena_mbuf_sz; } static void @@ -508,9 +509,9 @@ ena_setup_rx_dma_tag(struct ena_adapter *adapter) ENA_DMA_BIT_MASK(adapter->dma_width), /* lowaddr of excl window */ BUS_SPACE_MAXADDR, /* highaddr of excl window */ NULL, NULL, /* filter, filterarg */ - MJUM16BYTES, /* maxsize */ + ena_mbuf_sz, /* maxsize */ adapter->max_rx_sgl_size, /* nsegments */ - MJUM16BYTES, /* maxsegsize */ + ena_mbuf_sz, /* maxsegsize */ 0, /* flags */ NULL, /* lockfunc */ NULL, /* lockarg */ @@ -963,7 +964,8 @@ ena_alloc_rx_mbuf(struct ena_adapter *adapter, return (0); /* Get mbuf using UMA allocator */ - rx_info->mbuf = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUM16BYTES); + rx_info->mbuf = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, + rx_ring->rx_mbuf_sz); if (unlikely(rx_info->mbuf == NULL)) { counter_u64_add(rx_ring->rx_stats.mjum_alloc_fail, 1); @@ -974,7 +976,7 @@ ena_alloc_rx_mbuf(struct ena_adapter *adapter, } mlen = MCLBYTES; } else { - mlen = MJUM16BYTES; + mlen = rx_ring->rx_mbuf_sz; } /* Set mbuf length*/ rx_info->mbuf->m_pkthdr.len = rx_info->mbuf->m_len = mlen; Modified: head/sys/dev/ena/ena.h ============================================================================== --- head/sys/dev/ena/ena.h Thu May 7 10:46:02 2020 (r360776) +++ head/sys/dev/ena/ena.h Thu May 7 11:28:39 2020 (r360777) @@ -41,7 +41,7 @@ #define DRV_MODULE_VER_MAJOR 2 #define DRV_MODULE_VER_MINOR 1 -#define DRV_MODULE_VER_SUBMINOR 1 +#define DRV_MODULE_VER_SUBMINOR 2 #define DRV_MODULE_NAME "ena" @@ -307,8 +307,13 @@ struct ena_ring { /* Determines if device will use LLQ or normal mode for TX */ enum ena_admin_placement_policy_type tx_mem_queue_type; - /* The maximum length the driver can push to the device (For LLQ) */ - uint8_t tx_max_header_size; + union { + /* The maximum length the driver can push to the device (For LLQ) */ + uint8_t tx_max_header_size; + /* The maximum (and default) mbuf size for the Rx descriptor. */ + uint16_t rx_mbuf_sz; + + }; bool first_interrupt; uint16_t no_interrupt_event_cnt; Modified: head/sys/dev/ena/ena_sysctl.c ============================================================================== --- head/sys/dev/ena/ena_sysctl.c Thu May 7 10:46:02 2020 (r360776) +++ head/sys/dev/ena/ena_sysctl.c Thu May 7 11:28:39 2020 (r360777) @@ -48,6 +48,17 @@ int ena_log_level = ENA_ALERT | ENA_WARNING; SYSCTL_INT(_hw_ena, OID_AUTO, log_level, CTLFLAG_RWTUN, &ena_log_level, 0, "Logging level indicating verbosity of the logs"); +/* + * Use 9k mbufs for the Rx buffers. Default to 0 (use page size mbufs instead). + * Using 9k mbufs in low memory conditions might cause allocation to take a lot + * of time and lead to the OS instability as it needs to look for the contiguous + * pages. + * However, page size mbufs has a bit smaller throughput than 9k mbufs, so if + * the network performance is the priority, the 9k mbufs can be used. + */ +int ena_enable_9k_mbufs = 0; +SYSCTL_INT(_hw_ena, OID_AUTO, enable_9k_mbufs, CTLFLAG_RDTUN, + &ena_enable_9k_mbufs, 0, "Use 9 kB mbufs for Rx descriptors"); void ena_sysctl_add_nodes(struct ena_adapter *adapter) Modified: head/sys/dev/ena/ena_sysctl.h ============================================================================== --- head/sys/dev/ena/ena_sysctl.h Thu May 7 10:46:02 2020 (r360776) +++ head/sys/dev/ena/ena_sysctl.h Thu May 7 11:28:39 2020 (r360777) @@ -41,4 +41,7 @@ void ena_sysctl_add_nodes(struct ena_adapter *); +extern int ena_enable_9k_mbufs; +#define ena_mbuf_sz (ena_enable_9k_mbufs ? MJUM9BYTES : MJUMPAGESIZE) + #endif /* !(ENA_SYSCTL_H) */