From owner-freebsd-current@FreeBSD.ORG  Wed Oct 23 14:30:53 2013
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Subject: Re: How to debug whats cause to much __mtx_lock_sleep in system
Date: Wed, 23 Oct 2013 10:25:01 -0400
User-Agent: KMail/1.13.5 (FreeBSD/8.4-CBSD-20130906; KDE/4.5.5; amd64; ; )
References: <20131021125949.GB13109@hell.ukr.net>
In-Reply-To: <20131021125949.GB13109@hell.ukr.net>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201310231025.01899.jhb@freebsd.org>
Cc: Vitalij Satanivskij <satan@ukr.net>

On Monday, October 21, 2013 8:59:49 am Vitalij Satanivskij wrote:
> Hello.
> 
> I have 10.0-BETA1 #7 r256765 with a terrible load: "load averages: 23.31, 30.53, 31"
> 
> which keeps getting worse over time.
> 
> The kernel is compiled with DTrace support, and the hotkernel script from DTraceToolkit-0.99 shows some strange statistics:
> 
> zfs.ko`lz4_compress                                      5045   0.2%
> kernel`0xffffffff80                                      5185   0.2%
> kernel`uma_zalloc_arg                                    5302   0.2%
> kernel`bcopy                                             5322   0.2%
> kernel`_sx_xlock                                         7310   0.3%
> kernel`_sx_xunlock                                       7434   0.3%
> zfs.ko`l2arc_feed_thread                                 9797   0.4%
> zfs.ko`lzjb_compress                                     9912   0.4%
> zfs.ko`list_prev                                        17894   0.7%
> kernel`__rw_wlock_hard                                  30522   1.2%
> kernel`spinlock_exit                                    31310   1.3%
> kernel`acpi_cpu_c1                                     103495   4.1%
> kernel`_sx_xlock_hard                                  138743   5.5%
> kernel`vmem_xalloc                                     175869   7.0%
> kernel`cpu_idle                                        371159  14.8%
> kernel`__mtx_lock_sleep                               1345815  53.8%
> 
> 
> 
> There is another identical machine with similar data and usage, but running an older -CURRENT, r245701,
> 
> which has no problems with load:
> 
> zfs.ko`fletcher_4_native                                 2366   0.1%
> kernel`uma_zfree_arg                                     2387   0.1%
> zfs.ko`lzjb_decompress                                   2392   0.1%
> kernel`__rw_rlock                                        2477   0.1%
> zfs.ko`dmu_zfetch                                        2553   0.1%
> kernel`bcopy                                             3035   0.1%
> kernel`vm_page_splay                                     3089   0.1%
> kernel`_mtx_trylock_flags_                               3346   0.2%
> kernel`bzero                                             3411   0.2%
> kernel`0xffffffff80                                      3665   0.2%
> kernel`_sx_xunlock                                       3818   0.2%
> kernel`uma_zalloc_arg                                    4216   0.2%
> kernel`vmtotal                                           4702   0.2%
> kernel`_sx_xlock                                         5117   0.2%
> kernel`free                                              5476   0.2%
> zfs.ko`lzjb_compress                                     6674   0.3%
> kernel`spinlock_exit                                    21590   1.0%
> kernel`__mtx_lock_sleep                                 40819   1.9%
> kernel`acpi_cpu_c1                                     311077  14.1%
> kernel`cpu_idle                                       1639418  74.6%
> 
> 
> 
> Both servers have the same hardware and the same software, except of course for the system version.
> 
> So what is the right way to investigate the problem and find a resolution?

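You need to determine which mutex(es) are being contended:
__mtx_lock_sleep is the slow path taken when a mutex is already held,
so that much time in it points at lock contention.  There is a
LOCK_PROFILING kernel option you can use to investigate this further;
it records per-lock hold and wait statistics.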
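
As a rough sketch of the workflow, assuming the sysctl names documented
in LOCK_PROFILING(9) (check the man page for your version): build a
kernel with "options LOCK_PROFILING", set debug.lock.prof.enable=1
while the load is high, set it back to 0, then save the output of
"sysctl -n debug.lock.prof.stats" to a file.  The small Python script
below is only an illustration, not part of the base system.  It assumes
the stats are nine numeric columns (max, wait_max, total, wait_total,
count, avg, wait_avg, cnt_hold, cnt_lock) followed by the lock name,
and ranks locks by total wait time.  The function names are made up
for this example.

```python
import sys

# Numeric columns, in the order assumed from LOCK_PROFILING(9).
COLUMNS = ["max", "wait_max", "total", "wait_total", "count",
           "avg", "wait_avg", "cnt_hold", "cnt_lock"]


def parse_lock_stats(text):
    """Parse the text of debug.lock.prof.stats into a list of dicts."""
    rows = []
    for line in text.splitlines():
        # The lock name may contain spaces, so split only the numeric part.
        fields = line.split(None, len(COLUMNS))
        if len(fields) != len(COLUMNS) + 1:
            continue  # blank or unexpected line
        try:
            values = [int(f) for f in fields[:len(COLUMNS)]]
        except ValueError:
            continue  # header line or anything else non-numeric
        row = dict(zip(COLUMNS, values))
        row["name"] = fields[-1].strip()
        rows.append(row)
    return rows


def top_contended(rows, n=10):
    """Return the n locks with the largest total wait time."""
    return sorted(rows, key=lambda r: r["wait_total"], reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 rank_locks.py stats.txt
    with open(sys.argv[1]) as f:
        for r in top_contended(parse_lock_stats(f.read())):
            print("%12d %10d  %s" % (r["wait_total"], r["cnt_lock"], r["name"]))
```

The locks at the top of that list are the ones threads spend the most
time waiting on; the file and line in the name column show where the
lock was acquired.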

-- 
John Baldwin