Date: Sat, 10 Nov 2012 15:20:19 +0200
From: Konstantin Belousov
To: "Sears, Steven"
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: Memory reserves or lack thereof
Message-ID: <20121110132019.GP73505@kib.kiev.ua>

On Fri, Nov 09, 2012 at 07:10:04PM +0000, Sears, Steven wrote:
> I have a memory subsystem design question that I'm hoping someone can
> answer.
>
> I've been looking at a machine that is completely out of memory, as in
>
>     v_free_count = 0,
>     v_cache_count = 0,
>
> I wondered how a machine could completely run out of memory like this,
> especially after finding a lack of interrupt storms or other pathologies
> that would tend to overcommit memory. So I started investigating.
>
> Most allocators come down to vm_page_alloc(), which has this guard:
>
>     if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
>         page_req = VM_ALLOC_SYSTEM;
>     };
>
>     if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
>         (page_req == VM_ALLOC_SYSTEM &&
>         cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
>         (page_req == VM_ALLOC_INTERRUPT &&
>         cnt.v_free_count + cnt.v_cache_count > 0)) {
>
> The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
> every last page.
>
> From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
> perhaps only used from interrupt threads. Not so, see kmem_malloc() or
> uma_small_alloc() which both contain this mapping:
>
>     if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
>         pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
>     else
>         pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
>
> Note that M_USE_RESERVE has been deprecated and is used in just a
> handful of places. Also note that lots of code paths come through these
> routines.
>
> What this means is essentially _any_ allocation using M_NOWAIT will
> bypass whatever reserves have been held back and will take every last
> page available.
>
> There is no documentation stating M_NOWAIT has this side effect of
> essentially being privileged, so any innocuous piece of code that can't
> block will use it. And of course M_NOWAIT is literally used all over.
>
> It looks to me like the design goal of the BSD allocators is on
> recovery; it will give all pages away knowing it can recover.
>
> Am I missing anything? I would have expected some small number of pages
> to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
> sort of back door for grabbing memory.
>

Your analysis is right, there is nothing to add or correct.
This is the reason to strongly prefer M_WAITOK.
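
As an illustration of the analysis quoted above, here is a minimal
user-space sketch of the two checks. It is not kernel code: the struct
layout, the enum values, the flag bit values, and the drain() helper are
all hypothetical stand-ins that only borrow the names used in this
thread, and the model ignores that M_WAITOK may sleep instead of failing.

```c
/*
 * User-space model of the checks quoted in this thread.
 * Assumption: every type, constant value and helper here is a
 * stand-in for illustration, not the FreeBSD definition.
 */
#include <assert.h>
#include <stdbool.h>

enum page_req { VM_ALLOC_NORMAL, VM_ALLOC_SYSTEM, VM_ALLOC_INTERRUPT };

/* Hypothetical flag bits; the real ones are defined in sys/malloc.h. */
#define M_NOWAIT      0x0001
#define M_WAITOK      0x0002
#define M_USE_RESERVE 0x0400

struct vm_counters {
    unsigned v_free_count;
    unsigned v_cache_count;
    unsigned v_free_reserved;
    unsigned v_interrupt_free_min;
};

/* Mirrors the mapping quoted from kmem_malloc()/uma_small_alloc(). */
enum page_req
malloc_flags_to_page_req(int flags)
{
    if ((flags & (M_NOWAIT | M_USE_RESERVE)) == M_NOWAIT)
        return VM_ALLOC_INTERRUPT;
    return VM_ALLOC_SYSTEM;
}

/* Mirrors the three-way guard quoted from vm_page_alloc(). */
bool
page_alloc_allowed(const struct vm_counters *cnt, enum page_req req)
{
    unsigned avail = cnt->v_free_count + cnt->v_cache_count;

    /* Model assumption: the interrupt floor sits below the reserve. */
    assert(cnt->v_interrupt_free_min <= cnt->v_free_reserved);

    return avail > cnt->v_free_reserved ||
        (req == VM_ALLOC_SYSTEM && avail > cnt->v_interrupt_free_min) ||
        (req == VM_ALLOC_INTERRUPT && avail > 0);
}

/*
 * Take pages one at a time until the guard refuses, then return how
 * many pages are left. Works on a copy of the counters.
 */
unsigned
drain(struct vm_counters cnt, int malloc_flags)
{
    enum page_req req = malloc_flags_to_page_req(malloc_flags);

    while (page_alloc_allowed(&cnt, req)) {
        /* The guard only passes when at least one page is available. */
        if (cnt.v_free_count > 0)
            cnt.v_free_count--;
        else
            cnt.v_cache_count--;
    }
    return cnt.v_free_count + cnt.v_cache_count;
}
```

With made-up counters of 100 free pages, 20 cache pages, a reserve of 16
and an interrupt floor of 2, drain() with M_NOWAIT leaves 0 pages, while
M_WAITOK (or M_NOWAIT|M_USE_RESERVE) stops at the interrupt floor and
leaves 2. In this model only M_NOWAIT on its own can take the last page.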