From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Vitalij Satanivskij
Subject: Re: How to debug whats cause to much __mtx_lock_sleep in system
Date: Wed, 23 Oct 2013 10:25:01 -0400

On Monday, October 21, 2013 8:59:49 am Vitalij Satanivskij wrote:
> Hello.
>
> I have 10.0-BETA1 #7 r256765 with a terrible load ("load averages: 23.31,
> 30.53, 31"), which degrades more and more over time.
>
> The kernel is compiled with DTrace support, and the hotkernel script from
> DTraceToolkit-0.99 shows some strange statistics:
>
> zfs.ko`lz4_compress             5045   0.2%
> kernel`0xffffffff80             5185   0.2%
> kernel`uma_zalloc_arg           5302   0.2%
> kernel`bcopy                    5322   0.2%
> kernel`_sx_xlock                7310   0.3%
> kernel`_sx_xunlock              7434   0.3%
> zfs.ko`l2arc_feed_thread        9797   0.4%
> zfs.ko`lzjb_compress            9912   0.4%
> zfs.ko`list_prev               17894   0.7%
> kernel`__rw_wlock_hard         30522   1.2%
> kernel`spinlock_exit           31310   1.3%
> kernel`acpi_cpu_c1            103495   4.1%
> kernel`_sx_xlock_hard         138743   5.5%
> kernel`vmem_xalloc            175869   7.0%
> kernel`cpu_idle               371159  14.8%
> kernel`__mtx_lock_sleep      1345815  53.8%
>
> There is another identical machine with similar data and usage, but running
> an older -CURRENT (r245701), which has no problems with load:
>
> zfs.ko`fletcher_4_native        2366   0.1%
> kernel`uma_zfree_arg            2387   0.1%
> zfs.ko`lzjb_decompress          2392   0.1%
> kernel`__rw_rlock               2477   0.1%
> zfs.ko`dmu_zfetch               2553   0.1%
> kernel`bcopy                    3035   0.1%
> kernel`vm_page_splay            3089   0.1%
> kernel`_mtx_trylock_flags_      3346   0.2%
> kernel`bzero                    3411   0.2%
> kernel`0xffffffff80             3665   0.2%
> kernel`_sx_xunlock              3818   0.2%
> kernel`uma_zalloc_arg           4216   0.2%
> kernel`vmtotal                  4702   0.2%
> kernel`_sx_xlock                5117   0.2%
> kernel`free                     5476   0.2%
> zfs.ko`lzjb_compress            6674   0.3%
> kernel`spinlock_exit           21590   1.0%
> kernel`__mtx_lock_sleep        40819   1.9%
> kernel`acpi_cpu_c1            311077  14.1%
> kernel`cpu_idle              1639418  74.6%
>
> Both servers have the same hardware and the same software, apart from the
> system version, of course.
>
> What is the right way to investigate this problem and find a solution?

__mtx_lock_sleep is the slow path a thread takes when the mutex it wants is
already held by another thread, so 53.8% of the samples landing there points
to heavy contention on one or more mutexes. You need to determine which
mutex(es) are being contested. The LOCK_PROFILING kernel option (see
LOCK_PROFILING(9)) can help you investigate this further.

-- 
John Baldwin
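For reference, a minimal sketch of the LOCK_PROFILING workflow suggested above, assuming a kernel built with "options LOCK_PROFILING" and the sysctls described in LOCK_PROFILING(9): run "sysctl debug.lock.prof.reset=1" and "sysctl debug.lock.prof.enable=1", let the load build up, run "sysctl debug.lock.prof.enable=0", and then dump the records with "sysctl debug.lock.prof.stats". The Python helper below sorts those records by their wait_total column so the acquisition points where threads waited longest come first. The names top_contended and read_live_stats are hypothetical helpers, not FreeBSD tools, and the parser assumes a single header row that contains wait_total and name, with name as the last column; check that against the header your kernel actually prints.

```python
# Minimal sketch: rank LOCK_PROFILING records by total wait time.
# ASSUMPTION: "sysctl debug.lock.prof.stats" prints one header row naming the
# columns (including "wait_total" and "name"), then one whitespace-separated
# record per lock acquisition point, with "name" as the last column.
# The function names are hypothetical helpers, not part of FreeBSD.
import subprocess
from typing import List, Tuple


def read_live_stats() -> str:
    """Fetch the current records (FreeBSD only, kernel with LOCK_PROFILING)."""
    result = subprocess.run(
        ["sysctl", "-n", "debug.lock.prof.stats"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def top_contended(stats: str, limit: int = 10) -> List[Tuple[int, str]]:
    """Return up to `limit` (wait_total, name) pairs, largest wait first."""
    header = None
    records = []
    for line in stats.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines around the table
        if header is None:
            # Ignore everything until the header row is found.
            if "wait_total" in fields and "name" in fields:
                header = fields
                wait_col = header.index("wait_total")
                name_col = header.index("name")
            continue
        if len(fields) <= name_col:
            continue  # truncated or malformed record
        try:
            wait_total = int(fields[wait_col])
        except ValueError:
            continue  # not a numeric record
        # The name column can contain spaces, e.g. "file.c:123 (sleep mutex:foo)",
        # so rejoin every field from the name column onward.
        name = " ".join(fields[name_col:])
        records.append((wait_total, name))
    records.sort(key=lambda rec: rec[0], reverse=True)
    return records[:limit]
```

On an affected machine, something like top_contended(read_live_stats()) would list the records with the largest total wait, and the name field of each one should identify the lock behind the __mtx_lock_sleep time.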