From: Vitalij Satanivskij
To: Vitalij Satanivskij
Cc: freebsd-hackers@freebsd.org
Date: Thu, 24 Oct 2013 10:50:23 +0300
Subject: Re: FreeBSD 10.0-BETA1 #8 r256765M spend too much time in locks
Message-ID: <20131024075023.GA52443@hell.ukr.net>
In-Reply-To: <20131024074826.GA50853@hell.ukr.net>

Eh, I forgot to mention - the old system had no compression enabled on the
main zfs.

Vitalij Satanivskij wrote:
VS> Hello.
VS> 
VS> After upgrading the system from an old current (r245701) to a fresh
VS> current, r255173 (and then switching to stable/10, r256765M), we found
VS> some strange system behavior.
VS> 
VS> The diff between r256765M and r256765 is:
VS> 
VS> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
VS> ===================================================================
VS> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(revision 256765)
VS> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	(working copy)
VS> @@ -5147,7 +5147,7 @@
VS>  	len = l2hdr->b_asize;
VS>  	cdata = zio_data_buf_alloc(len);
VS>  	csize = zio_compress_data(ZIO_COMPRESS_LZ4, l2hdr->b_tmp_cdata,
VS> -	    cdata, l2hdr->b_asize, (size_t)SPA_MINBLOCKSIZE);
VS> +	    cdata, l2hdr->b_asize, (size_t)(1ULL << l2hdr->b_dev->l2ad_vdev->vdev_ashift));
VS>  
VS>  	if (csize == 0) {
VS>  		/* zero block, indicate that there's nothing to write */
VS> 
VS> But the same situation was already present before this patch.
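(A note on what the local patch above changes: the last argument of
zio_compress_data() is the block-size granularity used when compressing
L2ARC buffers, and the patch raises it from SPA_MINBLOCKSIZE (512 bytes)
to the cache vdev's allocation size, 1 << ashift. Below is a tiny
standalone sketch of the arithmetic - an illustration only, not the
actual ZFS code path; the 2600-byte LZ4 output and ashift=12 for the
cache SSDs are assumed values:)

	#include <stdio.h>
	#include <stddef.h>

	/* Round len up to a multiple of the allocation granularity. */
	static size_t
	roundup_gran(size_t len, size_t gran)
	{
		return ((len + gran - 1) / gran * gran);
	}

	int
	main(void)
	{
		size_t csize = 2600;	/* hypothetical LZ4 output for a 4K buffer */

		/*
		 * Old: SPA_MINBLOCKSIZE granularity - 2600 rounds up to
		 * 3072, so compression still saves 512-byte blocks.
		 */
		printf("512B granularity: %zu\n", roundup_gran(csize, 512));
		/*
		 * New: 1 << vdev_ashift (4096 on 4K-sector SSDs) - 2600
		 * rounds back up to 4096, so nothing is saved and such a
		 * buffer may as well stay uncompressed.
		 */
		printf("4K granularity:   %zu\n",
		    roundup_gran(csize, (size_t)1 << 12));
		return (0);
	}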
VS> The system load is too high:
VS> 
VS> CPU:  6.8% user,  0.0% nice, 57.3% system,  0.8% interrupt, 35.1% idle
VS> 
VS> hotkernel (a DTrace script) says:
VS> 
VS> kernel`__mtx_unlock_flags     292   0.3%
VS> kernel`__mtx_lock_flags       315   0.3%
VS> zfs.ko`lzjb_compress          349   0.3%
VS> kernel`__rw_wlock_hard        686   0.7%
VS> kernel`spinlock_exit         1050   1.0%
VS> kernel`vmem_xalloc           7516   7.3%
VS> kernel`_sx_xlock_hard        8664   8.5%
VS> kernel`acpi_cpu_c1           9737   9.5%
VS> kernel`cpu_idle             20189  19.7%
VS> kernel`__mtx_lock_sleep     45952  44.9%
VS> 
VS> Trying to understand where the problem is, I built a kernel with lock
VS> profiling and collected some data for one minute (file attached).
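(For reference: "lock profiling" above is the stock LOCK_PROFILING(9)
machinery. A minimal sketch of taking such a one-minute sample, assuming
the standard kernel option and sysctl names and that this is how the
sample was gathered here:)

	# kernel config
	options 	LOCK_PROFILING

	# at runtime: reset the counters, sample for a minute, dump them
	sysctl debug.lock.prof.reset=1
	sysctl debug.lock.prof.enable=1
	sleep 60
	sysctl debug.lock.prof.enable=0
	sysctl debug.lock.prof.stats > lockprof.txt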
VS> 
VS> Using aggregation, I found that most of the lock time is in:
VS> 
VS> 14,159818 /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena)
VS> 9,553523 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock)
VS> 8,386943 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx)
VS> 8,110206 /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena)
VS> 5,909429 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mru->arcs_locks[i].arcs_lock)
VS> 5,452206 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mfu->arcs_locks[i].arcs_lock)
VS> 5,050224 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:303(sx:tx->tx_cpu[c].tc_open_lock)
VS> 4,232819 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:891(sx:buf_hash_table.ht_locks[i].ht_lock)
VS> 4,211348 /usr/src/sys/kern/vfs_subr.c:2101(lockmgr:zfs)
VS> 4,011656 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:862(sx:buf_hash_table.ht_locks[i].ht_lock)
VS> 3,823698 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx)
VS> 2,697344 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:126(sx:h->hash_mutexes[i])
VS> 2,343242 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock)
VS> 1,752713 /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs)
VS> 1,580856 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i])
VS> 1,534242 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1291(sx:arc_mfu->arcs_locks[i].arcs_lock)
VS> 1,331583 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx)
VS> 1,105058 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:427(sx:vq->vq_lock)
VS> 1,080855 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:396(sx:vq->vq_lock)
VS> 0,858568 /usr/src/sys/kern/vfs_cache.c:488(rw:Name Cache)
VS> 0,831652 /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object)
VS> 0,815439 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1845(sx:buf_hash_table.ht_locks[i].ht_lock)
VS> 0,812613 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mru->arcs_locks[i].arcs_lock)
VS> 0,794274 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1529(lockmgr:zfs)
VS> 0,669845 /usr/src/sys/vm/uma_core.c:2097(sleep mutex:zio_cache)
VS> 
VS> Short system description:
VS> 
VS> CPU: E5-1650
VS> MEM: 128GB DDR3-1600
VS> 
VS> Storage subsystem:
VS> 
VS> 36x 1TB WD RE4 drives on an LSI SAS2308 controller
VS> 3x 180GB Intel SSD 530 series as L2 cache
VS> 
VS> The pool is 18 mirrors, two drives in each mirror, plus the 3 cache
VS> devices, e.g.:
VS> ....
VS>     mirror-14     ONLINE       0     0     0
VS>       gpt/disk28  ONLINE       0     0     0
VS>       gpt/disk29  ONLINE       0     0     0
VS>     mirror-15     ONLINE       0     0     0
VS>       gpt/disk30  ONLINE       0     0     0
VS>       gpt/disk31  ONLINE       0     0     0
VS>     mirror-16     ONLINE       0     0     0
VS>       gpt/disk32  ONLINE       0     0     0
VS>       gpt/disk33  ONLINE       0     0     0
VS>     mirror-17     ONLINE       0     0     0
VS>       gpt/disk34  ONLINE       0     0     0
VS>       gpt/disk35  ONLINE       0     0     0
VS>   cache
VS>     ada1          ONLINE       0     0     0
VS>     ada2          ONLINE       0     0     0
VS>     ada3          ONLINE       0     0     0
VS> 
VS> The pool has two ZFS datasets.
VS> 
VS> The main one has these options (diffs from the defaults):
VS> 
VS>   compression     lz4
VS>   secondarycache  all
VS>   sync            disabled
VS> 
VS> Data size on it is around 6TB (compressed), e.g.:
VS> 
VS> disk1  refcompressratio   1.56x  -
VS> disk1  written            5,99T  -
VS> disk1  logicalused        10,8T  -
VS> disk1  logicalreferenced  9,32T  -
VS> 
VS> The other one has these options:
VS> 
VS>   recordsize      4K (it was 32K before, but the internal software
VS>                   mostly uses 4K data blocks, so we changed it, without
VS>                   reallocating the existing data)
VS>   compression     off
VS>   sync            disabled
VS>   secondarycache  all
VS> 
VS> Data size on it is around 1.4TB.
VS> 
VS> The ARC is configured to use at most 80GB.
VS> 
VS> top most of the time looks like this:
VS> 
VS> Mem: 14G Active, 13G Inact, 94G Wired, 497M Cache, 3297M Buf, 2214M Free
VS> ARC: 80G Total, 2010M MFU, 70G MRU, 49M Anon, 3822M Header, 4288M Other
VS> 
VS> LA's are around 10-20, depending on the time of day.
VS> 
VS> zpool iostat disk1 1:
VS> 
VS>                capacity     operations    bandwidth
VS> pool        alloc   free   read  write   read  write
VS> ----------  -----  -----  -----  -----  -----  -----
VS> disk1       7,45T  8,86T    546  1,49K  16,4M  13,4M
VS> disk1       7,45T  8,86T    272  3,91K  11,7M  27,4M
VS> disk1       7,45T  8,86T    344  2,94K  7,26M  25,2M
VS> disk1       7,45T  8,86T    224  2,02K  9,91M  21,8M
VS> disk1       7,45T  8,86T    244  2,35K  8,23M  18,4M
VS> disk1       7,45T  8,86T    245  2,54K  6,44M  23,4M
VS> disk1       7,45T  8,86T    114  2,94K  3,28M  13,3M
VS> disk1       7,45T  8,86T    288  4,43K  6,09M  33,5M
VS> disk1       7,45T  8,86T    157  1,26K  2,98M  15,7M
VS> disk1       7,45T  8,86T     94    842  3,07M  13,6M
VS> disk1       7,45T  8,86T    327  1,71K  9,63M  8,21M
VS> disk1       7,45T  8,86T    237  1,81K  5,73M  29,3M
VS> disk1       7,45T  8,86T    247  3,47K  5,17M  29,6M
VS> disk1       7,45T  8,86T    165  2,37K  3,22M  16,7M
VS> disk1       7,45T  8,86T    155  3,23K  3,27M  23,9M
VS> 
VS> This is strange, as the timeout is set to 10 seconds.
VS> 
VS> What is interesting: after a reboot the system works fine for a while,
VS> at least until the ARC reaches its 80GB maximum.
VS> The low limit for the ARC is 40GB. Strangely, the old system could take
VS> memory back from the ARC, e.g. like this:
VS> 
VS> Mem: 32G Active, 8797M Inact, 78G Wired, 2370M Cache, 209M Buf, 3977M Free
VS> ARC: 43G Total, 2204M MFU, 28G MRU, 135M Anon, 7898M Header, 5301M Other
VS> 
VS> On the new one the ARC stays at its maximum allowed size.
VS> 
VS> So for now the question is: what can this be, and what can we try in
VS> order to improve the system's performance?
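P.S. To be exact about the ARC limits mentioned above: the 80GB cap and
the 40GB floor are the usual loader tunables (assuming they are set the
standard way, via /boot/loader.conf; the byte values below are just
80GB and 40GB written out):

	# /boot/loader.conf
	vfs.zfs.arc_max="85899345920"	# 80GB cap
	vfs.zfs.arc_min="42949672960"	# 40GB floor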