Date: Sun, 26 Aug 2012 19:11:26 +0200
From: Luigi Rizzo
To: Alan Cox
Cc: alc@freebsd.org, current@freebsd.org
Subject: Re: less aggressive contigmalloc ?

On Fri, Aug 24, 2012 at 11:56:06AM -0500, Alan Cox wrote:
> On 08/24/2012 11:54, Luigi Rizzo wrote:
> >On Fri, Aug 24, 2012 at 11:12:51AM -0500, Alan Cox wrote:
> >>On 08/24/2012 09:57, Luigi Rizzo wrote:
> >>>On Fri, Aug 24, 2012 at 12:43:33AM -0500, Alan Cox wrote:
> >>>>On 08/23/2012 12:45, Luigi Rizzo wrote:
> >>>>>On Thu, Aug 23, 2012 at 12:08:40PM -0500, Alan Cox wrote:
> >>>>>...
> >>>>>>>yes i do see that.
> >>>>>>>
> >>>>>>>Maybe less aggressive with M_NOWAIT but still kills processes.
> >>>>>>Are you compiling world with MALLOC_PRODUCTION?  The latest version of
> >>>>>whatever the default is. But:
> >>>>>
> >>>>>>jemalloc uses significantly more memory when debugging options are
> >>>>>>enabled.  This first came up in a thread titled "10-CURRENT and swap
> >>>>>>usage" back in June.
> >>>>>>
> >>>>>>Even at its most aggressive, M_WAITOK, contigmalloc() does not directly
> >>>>>>kill processes.  If process death coincides with the use of
> >>>>>>contigmalloc(), then it is simply the result of earlier, successful
> >>>>>>contigmalloc() calls, or for that matter any other physical memory
> >>>>>>allocation calls, having depleted the pool of free pages to the point
> >>>>>>that the page daemon runs and invokes vm_pageout_oom().
> >>>>>does it mean that those previous allocations relied on memory
> >>>>>overbooking ?
> >>>>Yes.
> >>>>
> >>>>>Is there a way to avoid that, then ?
> >>>>I believe that malloc()'s default minimum allocation size is 4MB.  You
> >>>>could reduce that.
> >>>>
> >>>>Alternatively, you can enable MALLOC_PRODUCTION.
> >>>i tried this, and as others mentioned it makes life
> >>>better and reduces the problem but contigmalloc still triggers
> >>>random process kills.
> >>I would be curious to see a stack backtrace when vm_pageout_oom() is
> >>called.
> >you mean a backtrace of the process(es) that get killed ?
>
> No, a backtrace showing who called vm_pageout_oom().  Simply add a
> kdb_backtrace() call at the start of vm_pageout_oom().  There are two
> possibilities.  I want to know which it is.
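For anyone reproducing this, the instrumentation is a one-line addition at
the top of vm_pageout_oom() in sys/vm/vm_pageout.c, roughly as below (a
sketch in patch style; the surrounding context lines are approximate, only
the added call matters, and you may need an #include <sys/kdb.h> for
kdb_backtrace() if the file does not already have one):

 void
 vm_pageout_oom(int shortage)
 {
+	/* temporary debugging: print the call chain that reached the OOM code */
+	kdb_backtrace();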
This is dmesg when I add kdb_backtrace() at the start of vm_pageout_oom().

The '... netmap_finalize_obj_allocator ...' lines are from my calls to
contigmalloc, each one doing one-page allocations. I get 7-8
'KDB: stack backtrace' blocks, then allocations restart successfully,
then more failures...

The reference to fork_exit() does not seem right, because I am in a
block where I call contigmalloc, so the caller of vm_pageout_grow_cache()
should be kmem_alloc_contig().

630.004926 netmap_finalize_obj_allocator [593] cluster at 8910 ok
630.005563 netmap_finalize_obj_allocator [593] cluster at 8912 ok
630.006077 netmap_finalize_obj_allocator [593] cluster at 8914 ok
KDB: stack backtrace:
X_db_sym_numargs() at X_db_sym_numargs+0x1aa
vm_pageout_oom() at vm_pageout_oom+0x19
vm_pageout_grow_cache() at vm_pageout_grow_cache+0xd01
fork_exit() at fork_exit+0x11c
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8005f12cb0, rbp = 0 ---
KDB: stack backtrace:
X_db_sym_numargs() at X_db_sym_numargs+0x1aa
vm_pageout_oom() at vm_pageout_oom+0x19
vm_pageout_grow_cache() at vm_pageout_grow_cache+0xd01
fork_exit() at fork_exit+0x11c
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8005f12cb0, rbp = 0 ---
...

Some of the processes must be 'getty' because I also find this line in dmesg:

<118>Aug 26 16:47:11 init: getty repeating too quickly on port /dev/ttyv7, sleeping 30 secs

cheers
luigi
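P.S. in case it helps to see the allocation pattern: each of the
netmap_finalize_obj_allocator lines above corresponds to a single-page
contigmalloc() call with M_NOWAIT. A simplified sketch of that pattern is
below (not the actual netmap code; the malloc type and helper name are made
up for illustration):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/malloc.h>

static MALLOC_DEFINE(M_NM_SKETCH, "nm_sketch", "example one-page clusters");

/*
 * Allocate up to 'n' one-page, physically contiguous clusters with
 * M_NOWAIT, so a failure is reported to the caller instead of sleeping.
 * Returns the number of clusters actually obtained.
 */
static int
alloc_clusters(void **cl, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		cl[i] = contigmalloc(PAGE_SIZE, M_NM_SKETCH, M_NOWAIT | M_ZERO,
		    0, ~(vm_paddr_t)0, PAGE_SIZE, 0);
		if (cl[i] == NULL)
			break;		/* partial allocation, let the caller decide */
	}
	return (i);
}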