Date:      Sat, 10 Nov 2012 15:20:19 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        "Sears, Steven" <Steven.Sears@netapp.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Memory reserves or lack thereof
Message-ID:  <20121110132019.GP73505@kib.kiev.ua>
In-Reply-To: <A6DE036C6A90C949A25CE89E844237FB2086970A@SACEXCMBX01-PRD.hq.netapp.com>
References:  <A6DE036C6A90C949A25CE89E844237FB2086970A@SACEXCMBX01-PRD.hq.netapp.com>

On Fri, Nov 09, 2012 at 07:10:04PM +0000, Sears, Steven wrote:
> I have a memory subsystem design question that I'm hoping someone can
> answer.
>
> I've been looking at a machine that is completely out of memory, as in
>
>  v_free_count = 0,
>  v_cache_count = 0,
>
> I wondered how a machine could completely run out of memory like this,
> especially after finding a lack of interrupt storms or other pathologies
> that would tend to overcommit memory. So I started investigating.
>
> Most allocators come down to vm_page_alloc(), which has this guard:
>
> 	if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
> 		page_req = VM_ALLOC_SYSTEM;
> 	};
>
> 	if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
> 	    (page_req == VM_ALLOC_SYSTEM &&
> 	    cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
> 	    (page_req == VM_ALLOC_INTERRUPT &&
> 	    cnt.v_free_count + cnt.v_cache_count > 0)) {
>
> The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
> every last page.
>
> From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
> perhaps only used from interrupt threads. Not so, see kmem_malloc() or
> uma_small_alloc() which both contain this mapping:
>
> 	if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
> 		pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
> 	else
> 		pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
>
> Note that M_USE_RESERVE has been deprecated and is used in just a handful
> of places. Also note that lots of code paths come through these routines.
>
> What this means is essentially _any_ allocation using M_NOWAIT will
> bypass whatever reserves have been held back and will take every last
> page available.
>
> There is no documentation stating M_NOWAIT has this side effect of
> essentially being privileged, so any innocuous piece of code that can't
> block will use it. And of course M_NOWAIT is literally used all over.
>
> It looks to me like the design goal of the BSD allocators is on recovery;
> it will give all pages away knowing it can recover.
>
> Am I missing anything? I would have expected some small number of pages
> to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
> sort of back door for grabbing memory.
>

Your analysis is right; there is nothing to add or correct.
This is the reason to strongly prefer M_WAITOK.
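
To illustrate the behaviour described in the quoted message, here is a
minimal userspace sketch of the two quoted fragments: the flag mapping from
kmem_malloc()/uma_small_alloc() and the admission check from
vm_page_alloc(). It is a model, not kernel code. All names (REQ_*, SK_*,
struct vm_counters, flags_to_req(), may_allocate(), drain()) and the flag
values are hypothetical stand-ins, and the pageproc promotion is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VM_ALLOC_* request classes. */
enum page_req { REQ_NORMAL, REQ_SYSTEM, REQ_INTERRUPT };

/* Hypothetical stand-ins for malloc(9) flags; the values are arbitrary. */
#define SK_NOWAIT      0x1
#define SK_USE_RESERVE 0x2
#define SK_WAITOK      0x4

/* Models the cnt.v_* counters referenced in the quoted guard. */
struct vm_counters {
	unsigned free_count;
	unsigned cache_count;
	unsigned free_reserved;
	unsigned interrupt_free_min;
};

/* Mirrors the quoted mapping: plain M_NOWAIT becomes the interrupt class. */
enum page_req
flags_to_req(int flags)
{
	if ((flags & (SK_NOWAIT | SK_USE_RESERVE)) == SK_NOWAIT)
		return (REQ_INTERRUPT);
	return (REQ_SYSTEM);
}

/* Mirrors the quoted guard: may a request of this class take a page? */
bool
may_allocate(const struct vm_counters *c, enum page_req req)
{
	unsigned avail = c->free_count + c->cache_count;

	return (avail > c->free_reserved ||
	    (req == REQ_SYSTEM && avail > c->interrupt_free_min) ||
	    (req == REQ_INTERRUPT && avail > 0));
}

/*
 * Take pages one at a time until the guard refuses, working on a copy of
 * the counters. Returns the number of pages left behind.
 */
unsigned
drain(struct vm_counters c, enum page_req req)
{
	while (may_allocate(&c, req)) {
		/* The guard only passes while at least one page remains. */
		assert(c.free_count + c.cache_count > 0);
		if (c.free_count > 0)
			c.free_count--;
		else
			c.cache_count--;
	}
	return (c.free_count + c.cache_count);
}
```

With made-up counters of 100 free and 20 cached pages, a free reserve of 32
and an interrupt minimum of 8, drain() leaves 32 pages for a normal request,
8 for a request mapped from M_WAITOK, and 0 for one mapped from plain
M_NOWAIT.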



