From: Slawa Olhovchenkov
Date: Sat, 6 Feb 2021 15:18:59 +0300
To: GomoR
Cc: John Baldwin, freebsd-stable@freebsd.org
Subject: Re: Suspected mbuf leak with Nginx + sendfile + TLS in 12.2-STABLE
Message-ID: <20210206121859.GE75195@zxy.spb.ru>
In-Reply-To: <8f02057bee5e8196644e85bbe7f8b31e@gomor.org>

On Fri, Feb 05, 2021 at 11:54:07AM +0100, GomoR wrote:

> On 2021-02-05 09:11, GomoR wrote:
> >> The first step I would do if possible would be to bisect between the last
> >> known working version and the version that is known to be broken to
> >> determine which commit introduced the problem. One thing that could help
> >> here is to see if you can reproduce the problem using a 12.2 kernel on a
> >> 12.1 world + ports. If you can, then you can limit your bisecting to just
> >> building new kernels which will make that process quicker.
> >
> We have reinstalled our system from scratch with FreeBSD 12.1-RELEASE. We
> then installed just enough of our software stack to reproduce the issue.
>
> There is no problem with a stock 12.1-RELEASE kernel, but the problem arises
> after an installkernel with the latest 12.2-STABLE. We then turned off all
> our customizations, including some specific sysctl.conf values. The bug was
> not triggered.
>
> After dissecting our sysctl values, we identified the faulty one:
>
> kern.ipc.maxsockbuf=157286400
>
> This value is 75 times the default value (2097152). Restoring the default
> value fixes the issue. After some tests, the bug is triggered starting at
> around 64 times the default value (134217728).
>
> There was no issue with this setting in 12.1-RELEASE, but there is in
> 12.2-RELEASE.
>
> Do you have any insight into why it causes these mbuf problems? In the
> meantime, we have our solution, but we are willing to help identify whether
> this is a kernel bug or whether setting maxsockbuf to such a high value is
> just a really bad idea.

===
> Each time a user downloads a file, mbuf & mbuf_clusters rise and reach
> the maximum limit in a matter of seconds. These values are reported
> by 'netstat -m' as follows:
>
> Normal situation:
>
> mbuf: 256, 26031105, 16767, 5974,428087938, 0, 0
> mbuf_cluster: 2048, 8135232, 18408, 2704,101644203, 0, 0
>
> Warning situation:
>
> mbuf: 256, 26031105, 2981516, 151205,1109483561, 0, 0
> mbuf_cluster: 2048, 8135232, 2983155, 4201,319714617, 0, 0
===

Can you clarify what the problem is? A system under load uses more
resources, and that is not a bug. Do you see more resource usage than the
load would explain? Or are the resources not freed after the load drops?
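
To make that comparison concrete, here is a rough sketch that parses the two quoted samples and reports how far the "used" counters moved. The column meaning (size, limit, used, free, requests, fail, sleep) is an assumption based on the layout of `vmstat -z` output, and the helper names `parse_zones` and `usage_report` are made up for this illustration.

```python
# Zone lines copied from the quoted report. The column order below is an
# assumption (it matches the usual `vmstat -z` layout), not something the
# report itself states.
NORMAL = """\
mbuf: 256, 26031105, 16767, 5974,428087938, 0, 0
mbuf_cluster: 2048, 8135232, 18408, 2704,101644203, 0, 0
"""
WARNING = """\
mbuf: 256, 26031105, 2981516, 151205,1109483561, 0, 0
mbuf_cluster: 2048, 8135232, 2983155, 4201,319714617, 0, 0
"""

FIELDS = ("size", "limit", "used", "free", "requests", "fail", "sleep")


def parse_zones(text):
    """Return {zone_name: {field: int}} for each 'name: v1, v2, ...' line."""
    zones = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split(",")]  # int() tolerates spaces
        zones[name.strip()] = dict(zip(FIELDS, values))
    return zones


def usage_report(before, after):
    """Compare the 'used' counters of two samples, zone by zone."""
    report = {}
    for name, b in before.items():
        a = after[name]
        report[name] = {
            "used_before": b["used"],
            "used_after": a["used"],
            "growth": a["used"] / b["used"],
            "pct_of_limit": 100.0 * a["used"] / a["limit"],
        }
    return report


if __name__ == "__main__":
    report = usage_report(parse_zones(NORMAL), parse_zones(WARNING))
    for name, r in report.items():
        print(f"{name}: used {r['used_before']} -> {r['used_after']} "
              f"({r['growth']:.0f}x, {r['pct_of_limit']:.1f}% of limit)")
```

Taking a third sample after the load has stopped and running it through the same comparison would show whether the counters fall back toward the normal values or stay high.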