From owner-freebsd-current@FreeBSD.ORG Mon Oct 28 22:24:54 2013
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: d@delphij.net
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org, Jordan Hubbard
Date: Tue, 29 Oct 2013 02:26:51 +0400
Subject: Re: ZFS txg implementation flaw
Message-ID: <20131028222651.GZ63359@zxy.spb.ru>
In-Reply-To: <526EDD81.8000109@delphij.net>
List-Id: Discussions about the use of FreeBSD-current

On Mon, Oct 28, 2013 at 02:56:17PM -0700, Xin Li wrote:

> >>> Semi-indirect.
> >>>
> >>> dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'
> >>>
> >>> After some (2-3) seconds:
> >>>
> >>>   kernel`vnode_destroy_vobject+0xb9
> >>>   zfs.ko`zfs_freebsd_reclaim+0x2e
> >>>   kernel`VOP_RECLAIM_APV+0x78
> >>>   kernel`vgonel+0x134
> >>>   kernel`vnlru_free+0x362
> >>>   kernel`vnlru_proc+0x61e
> >>>   kernel`fork_exit+0x11f
> >>>   kernel`0xffffffff80cdbfde
> >>>   2490
> >
> > 0xffffffff80cdbfd0:  mov    %r12,%rdi
> > 0xffffffff80cdbfd3:  mov    %rbx,%rsi
> > 0xffffffff80cdbfd6:  mov    %rsp,%rdx
> > 0xffffffff80cdbfd9:  callq  0xffffffff808db560
> > 0xffffffff80cdbfde:  jmpq   0xffffffff80cdca80
> > 0xffffffff80cdbfe3:  nopw   0x0(%rax,%rax,1)
> > 0xffffffff80cdbfe9:  nopl   0x0(%rax)
> >
> >>> I don't have user processes creating threads, nor any fork/exit activity.
> >>
> >> This has nothing to do with fork/exit, but it does suggest that you
> >> are running out of vnodes.  What does sysctl -a | grep vnode say?
> >
> > kern.maxvnodes: 1095872
> > kern.minvnodes: 273968
> > vm.stats.vm.v_vnodepgsout: 0
> > vm.stats.vm.v_vnodepgsin: 62399
> > vm.stats.vm.v_vnodeout: 0
> > vm.stats.vm.v_vnodein: 10680
> > vfs.freevnodes: 275107
> > vfs.wantfreevnodes: 273968
> > vfs.numvnodes: 316321
> > debug.sizeof.vnode: 504
>
> Try setting vfs.wantfreevnodes to 547936 (double it).

The fork_trampoline stack is gone now, but I still see prcfr (and zfod/totfr too).
We are currently at about half of peak traffic, so I can't yet check the impact on
IRQ handling.

kern.maxvnodes: 1095872
kern.minvnodes: 547936
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 63134
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10836
vfs.freevnodes: 481873
vfs.wantfreevnodes: 547936
vfs.numvnodes: 517331
debug.sizeof.vnode: 504

Now:

dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

  kernel`vm_object_deallocate+0x520
  kernel`vm_map_entry_deallocate+0x4c
  kernel`vm_map_process_deferred+0x3d
  kernel`sys_munmap+0x16c
  kernel`amd64_syscall+0x5ea
  kernel`0xffffffff80cdbd97
  56

I think this is nginx memory management (allocation/deallocation).
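To confirm the munmap() calls really come from nginx, the same probe can be
aggregated by process name instead of by kernel stack; a sketch using the
standard DTrace execname variable (run as root for a few seconds, then Ctrl-C):

```shell
# Count vm_object_terminate() calls per process name; if nginx
# dominates the totals, the churn is its allocator unmapping memory.
dtrace -n 'fbt:kernel:vm_object_terminate:entry { @[execname] = count(); }'
```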
Can I tune malloc so that it keeps freed pages mapped instead of returning them
to the kernel?
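Not an answer from the thread, but a sketch of one knob worth trying: FreeBSD's
libc malloc is jemalloc, and (assuming a jemalloc 3.x libc here; check man
malloc.conf for your release) lg_dirty_mult:-1 disables purging of dirty
(freed) pages.  Whether that also reduces the munmap() traffic seen above
depends on whether it comes from page purging or from whole-chunk deallocation,
so this is only a hypothesis to test:

```shell
# Per-process: set jemalloc options via the environment when starting nginx...
MALLOC_CONF="lg_dirty_mult:-1" nginx

# ...or system-wide, via the /etc/malloc.conf symlink jemalloc reads at startup:
ln -s 'lg_dirty_mult:-1' /etc/malloc.conf
```

This trades higher resident memory for less page-free/zero-fill work, so it is
best measured under real load before deploying.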