From: "Sears, Steven"
To: "freebsd-hackers@freebsd.org"
Subject: Memory reserves or lack thereof
Date: Fri, 9 Nov 2012 19:10:04 +0000
List-Id: Technical Discussions relating to FreeBSD

I have a memory subsystem design question that I'm hoping someone can
answer.

I've been looking at a machine that is completely out of memory, as in

    v_free_count = 0,
    v_cache_count = 0,

I wondered how a machine could completely run out of memory like this,
especially after finding no interrupt storms or other pathologies that
would tend to overcommit memory. So I started investigating.

Most allocators come down to vm_page_alloc(), which has this guard:

    if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
            page_req = VM_ALLOC_SYSTEM;
    };

    if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
        (page_req == VM_ALLOC_SYSTEM &&
        cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
        (page_req == VM_ALLOC_INTERRUPT &&
        cnt.v_free_count + cnt.v_cache_count > 0)) {

The key observation is that if VM_ALLOC_INTERRUPT is set, it will
allocate every last page.

From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
perhaps only used from interrupt threads. Not so: see kmem_malloc() or
uma_small_alloc(), which both contain this mapping:

    if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
            pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
    else
            pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;

Note that M_USE_RESERVE has been deprecated and is used in just a
handful of places. Also note that lots of code paths come through these
routines.

What this means is that essentially _any_ allocation using M_NOWAIT
will bypass whatever reserves have been held back and will take every
last page available.

There is no documentation stating that M_NOWAIT has this side effect of
essentially being privileged, so any innocuous piece of code that can't
block will use it. And of course M_NOWAIT is literally used all over.

It looks to me like the BSD allocators are designed around recovery:
they will give every page away, knowing they can recover.
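
To make the effect concrete, here is a minimal userspace sketch of the
guard and the flag mapping quoted above. It is not kernel code. The
struct vmcnt type, the helper names (guard_allows, malloc_flags_to_req,
pages_left_after_drain) and the numeric flag values are hypothetical,
chosen only for illustration. VM_ALLOC_NORMAL is not mentioned above; it
stands in for an ordinary request that matches neither special case.

```c
#include <assert.h>
#include <stdbool.h>

/* Request classes named after the kernel's; the values here are arbitrary. */
enum page_req { VM_ALLOC_NORMAL, VM_ALLOC_SYSTEM, VM_ALLOC_INTERRUPT };

/* Stand-ins for the malloc(9) flags; the bit values are assumptions. */
#define M_NOWAIT      0x1
#define M_USE_RESERVE 0x2

/* Simplified copy of the counters that the guard reads. */
struct vmcnt {
	int v_free_count;
	int v_cache_count;
	int v_free_reserved;
	int v_interrupt_free_min;
};

/* Same three-way test as the quoted vm_page_alloc() guard. */
bool
guard_allows(const struct vmcnt *c, enum page_req req)
{
	int avail = c->v_free_count + c->v_cache_count;

	return (avail > c->v_free_reserved ||
	    (req == VM_ALLOC_SYSTEM && avail > c->v_interrupt_free_min) ||
	    (req == VM_ALLOC_INTERRUPT && avail > 0));
}

/* Same mapping as quoted from kmem_malloc(), with VM_ALLOC_WIRED left out. */
enum page_req
malloc_flags_to_req(int flags)
{
	if ((flags & (M_NOWAIT | M_USE_RESERVE)) == M_NOWAIT)
		return (VM_ALLOC_INTERRUPT);
	return (VM_ALLOC_SYSTEM);
}

/* Take one page at a time until the guard refuses; return what is left. */
int
pages_left_after_drain(struct vmcnt c, enum page_req req)
{
	/* Non-negative inputs guarantee the loop below terminates. */
	assert(c.v_free_count >= 0 && c.v_cache_count >= 0);
	assert(c.v_free_reserved >= 0 && c.v_interrupt_free_min >= 0);

	while (guard_allows(&c, req)) {
		if (c.v_free_count > 0)
			c.v_free_count--;
		else
			c.v_cache_count--;
	}
	return (c.v_free_count + c.v_cache_count);
}
```

With made-up values of 100 free pages, v_free_reserved = 20 and
v_interrupt_free_min = 5, draining stops at 20 pages for
VM_ALLOC_NORMAL, at 5 for VM_ALLOC_SYSTEM, and at 0 for
VM_ALLOC_INTERRUPT. Since malloc_flags_to_req(M_NOWAIT) yields
VM_ALLOC_INTERRUPT, a plain M_NOWAIT caller in this model lands in the
last case.
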

Am I missing anything? I would have expected some small number of pages
to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
sort of back door for grabbing memory.

Thanks,

-Steve