Date: Thu, 24 Oct 2013 18:14:55 +0100
From: "Steven Hartland" <killing@multiplay.co.uk>
To: "Vitalij Satanivskij" <satan@ukr.net>
Cc: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD 10.0-BETA1 #8 r256765M spend too much time in locks
Message-ID: <6B49A70ECCAB453F8A4EF540EFF1A3A0@multiplay.co.uk>
References: <20131024074826.GA50853@hell.ukr.net> <20131024075023.GA52443@hell.ukr.net> <20131024115519.GA72359@hell.ukr.net> <20131024165218.GA82686@hell.ukr.net>
The following proposed changes to illumos may well be of interest:

Webrev: http://ma.nexenta.com/borisp/arc_locks/
Webrev: http://cr.illumos.org/~webrev/skiselkov/new_buf_hash/
Issue:  https://www.illumos.org/issues/4218

----- Original Message -----
From: "Vitalij Satanivskij" <satan@ukr.net>
To: "Vitalij Satanivskij" <satan@ukr.net>
Cc: <freebsd-hackers@freebsd.org>
Sent: Thursday, October 24, 2013 5:52 PM
Subject: Re: FreeBSD 10.0-BETA1 #8 r256765M spend too much time in locks

> It's me again :)
>
> A little more information.
>
> Using a simple script like this:
>
> #!/bin/sh
>
> time=`date "+%Y-%m-%d_%H:%M"`
> sysctl debug.lock.prof.reset=1
> sysctl debug.lock.prof.enable=1
> sleep 1
> sysctl debug.lock.prof.enable=0
> sysctl debug.lock.prof.stats > locks_stats_$time.log
>
> and running it exactly when ZFS flushes data to the pool, e.g.
>
> disk1       7,53T  8,78T    114     24  2,38M  1,36M
> disk1       7,53T  8,78T     16  4,18K   307K  60,9M
> disk1       7,53T  8,78T    146  7,18K  2,73M  48,2M
> disk1       7,53T  8,78T    372     60  7,71M  2,99M
> disk1       7,53T  8,78T    119     18  4,36M   940K
> disk1       7,53T  8,78T    115     33  2,56M  1,53M
> disk1       7,53T  8,78T     35    942   929K  11,1M
> disk1       7,53T  8,78T     58  5,64K   971K  89,4M   - here
> disk1       7,53T  8,78T    271  2,63K  8,10M  10,9M
> disk1       7,53T  8,78T    218     21  11,7M  1,16M
> disk1       7,53T  8,78T     86     15  2,66M   786K
> disk1       7,53T  8,78T     92     15  2,29M   841K
> disk1       7,53T  8,78T     68     27  1,80M  5,98M
>
> I got some more information.
>
> First, the system part of the load grows up to 60-70% even when the overall load is not big.
>
> Second, using this script:
>
> #!/bin/sh
>
> if [ -z "$1" ]
> then
>     echo "Usage: lock.sh file.txt"
>     exit 1
> fi
>
> count=`awk 'NR > 2 { sum += $4 } END { print sum }' $1`
> awk -v col=$count 'NR > 2 { foo = ($4 / col) * 100; if (foo >= 0.01) { printf "%f\t %s", foo, $10; for (i = 11; i <= NF; ++i) printf "%s ", $i; printf "\n" } }' $1 | sort -nr > res_$1.txt
>
> I found the following top entries:
>
> 26,812861  /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena)
> 21,292701  /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena)
> 13,661495  /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx)
> 13,099382  /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock)
> 4,028993   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx)
> 2,972297   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mru->arcs_locks[i].arcs_lock)
> 2,558227   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:862(sx:buf_hash_table.ht_locks[i].ht_lock)
> 2,328091   /usr/src/sys/kern/vfs_subr.c:2101(lockmgr:zfs)
> 2,294326   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:396(sx:vq->vq_lock)
> 1,418128   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock)
> 1,407655   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mru->arcs_locks[i].arcs_lock)
> 1,176428   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c:41(sx:os->os_obj_lock)
> 0,936418   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mfu->arcs_locks[i].arcs_lock)
> 0,848378   /usr/src/sys/vm/vm_kern.c:400(rw:kmem vm object)
> 0,845051   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1381(sx:zfsvfs->z_hold_mtx[i])
> 0,808122   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx)
> 0,680536   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i])
> 0,616060   /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object)
> 0,609296   /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs)
>
> Another run:
>
> 36,917638  /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena)
> 32,872199  /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena)
> 5,242466   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx)
> 4,885420   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:891(sx:buf_hash_table.ht_locks[i].ht_lock)
> 4,626170   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock)
> 2,125377   /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs)
> 1,831535   /usr/src/sys/vm/vm_kern.c:400(rw:kmem vm object)
> 1,202869   /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object)
> 1,187653   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx)
> 1,183960   /usr/src/sys/vm/uma_core.c:2097(sleep mutex:zio_cache)
> 1,015616   /usr/src/sys/vm/uma_core.c:2605(sleep mutex:zio_cache)
> 0,679882   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i])
> 0,577229   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx)
> 0,555283   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock)
> 0,545886   /usr/src/sys/vm/uma_core.c:2259(sleep mutex:zio_cache)
> 0,510988   /usr/src/sys/vm/uma_core.c:2605(sleep mutex:zio_link_cache)
> 0,503242   /usr/src/sys/vm/uma_core.c:2710(sleep mutex:zio_cache)
>
> Busy on the drives does not grow over 20%.
>
> Maybe somebody has a suggestion about what could be going on, or at least what to check/tune?
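As a starting point for the "what to check/tune" question, the entries above are dominated by the kmem arena and the ARC/L2ARC locks, so it is usually worth looking first at how kernel memory and the ARC are actually sized. A minimal sketch of read-only checks with stock FreeBSD tools (nothing here changes state, and the output file names are only examples):

#!/bin/sh
# Kernel memory limits and the configured ARC bounds (read-only sysctls).
sysctl vm.kmem_size vm.kmem_size_max
sysctl vfs.zfs.arc_max vfs.zfs.arc_min

# Kernel malloc and UMA zone usage; the zio_cache / zio_link_cache zones
# from the profile above appear in the vmstat -z output.
vmstat -m | head -25
vmstat -z | head -25

# ARC statistics, including the header/other overhead visible in top's ARC line.
sysctl kstat.zfs.misc.arcstats | head -40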
>
>
> Vitalij Satanivskij wrote:
> VS> Forget about profiling log
> VS>
> VS>
> VS> Vitalij Satanivskij wrote:
> VS> VS>
> VS> VS>
> VS> VS> EEE, forgot to mention - the old system had no compression on the main ZFS.
> VS> VS>
> VS> VS>
> VS> VS> Vitalij Satanivskij wrote:
> VS> VS> VS> Hello.
> VS> VS> VS>
> VS> VS> VS> After upgrading the system from an old current (r245701) to a fresh current r255173 (and then switching to
> VS> VS> VS> stable/10 r256765M), we found some strange system behavior.
> VS> VS> VS>
> VS> VS> VS> The diff between r256765M and r256765 is:
> VS> VS> VS>
> VS> VS> VS> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> VS> VS> VS> ===================================================================
> VS> VS> VS> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c       (revision 256765)
> VS> VS> VS> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c       (working copy)
> VS> VS> VS> @@ -5147,7 +5147,7 @@
> VS> VS> VS>                 len = l2hdr->b_asize;
> VS> VS> VS>                 cdata = zio_data_buf_alloc(len);
> VS> VS> VS>                 csize = zio_compress_data(ZIO_COMPRESS_LZ4, l2hdr->b_tmp_cdata,
> VS> VS> VS> -                   cdata, l2hdr->b_asize, (size_t)SPA_MINBLOCKSIZE);
> VS> VS> VS> +                   cdata, l2hdr->b_asize, (size_t)(1ULL << l2hdr->b_dev->l2ad_vdev->vdev_ashift));
> VS> VS> VS>
> VS> VS> VS>                 if (csize == 0) {
> VS> VS> VS>                         /* zero block, indicate that there's nothing to write */
> VS> VS> VS>
> VS> VS> VS>
> VS> VS> VS> But the same situation was there before the patch.
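For context, the local change above only touches the L2ARC write path: it replaces the fixed SPA_MINBLOCKSIZE passed to zio_compress_data() with a size derived from the cache vdev's ashift. If you want to double-check what ashift the vdevs actually report and what sector sizes the SSDs advertise, something like the following should do it. This is only a sketch; the pool name disk1 and the cache device ada1 are taken from the output elsewhere in this message:

#!/bin/sh
# Print the cached pool configuration; each vdev entry includes its ashift.
zdb -C disk1 | grep ashift

# Report the sector size / stripe size one of the cache SSDs advertises.
diskinfo -v ada1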
> VS> VS> VS>
> VS> VS> VS>
> VS> VS> VS> The system load is too high:
> VS> VS> VS> CPU:  6.8% user,  0.0% nice, 57.3% system,  0.8% interrupt, 35.1% idle
> VS> VS> VS>
> VS> VS> VS> hotkernel (the DTrace script) says:
> VS> VS> VS>
> VS> VS> VS> kernel`__mtx_unlock_flags        292    0.3%
> VS> VS> VS> kernel`__mtx_lock_flags          315    0.3%
> VS> VS> VS> zfs.ko`lzjb_compress             349    0.3%
> VS> VS> VS> kernel`__rw_wlock_hard           686    0.7%
> VS> VS> VS> kernel`spinlock_exit            1050    1.0%
> VS> VS> VS> kernel`vmem_xalloc              7516    7.3%
> VS> VS> VS> kernel`_sx_xlock_hard           8664    8.5%
> VS> VS> VS> kernel`acpi_cpu_c1              9737    9.5%
> VS> VS> VS> kernel`cpu_idle                20189   19.7%
> VS> VS> VS> kernel`__mtx_lock_sleep        45952   44.9%
> VS> VS> VS>
> VS> VS> VS> Trying to understand where the problem is, I built a kernel with lock profiling
> VS> VS> VS> and collected some data for one minute (file attached).
> VS> VS> VS>
> VS> VS> VS> Aggregating it, most of the lock time is in:
> VS> VS> VS>
> VS> VS> VS> 14,159818  /usr/src/sys/kern/subr_vmem.c:1128(sleep mutex:kmem arena)
> VS> VS> VS> 9,553523   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1597(sx:buf_hash_table.ht_locks[i].ht_lock)
> VS> VS> VS> 8,386943   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:3541(sx:l2arc_buflist_mtx)
> VS> VS> VS> 8,110206   /usr/src/sys/kern/subr_vmem.c:1230(sleep mutex:kmem arena)
> VS> VS> VS> 5,909429   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mru->arcs_locks[i].arcs_lock)
> VS> VS> VS> 5,452206   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1969(sx:arc_mfu->arcs_locks[i].arcs_lock)
> VS> VS> VS> 5,050224   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:303(sx:tx->tx_cpu[c].tc_open_lock)
> VS> VS> VS> 4,232819   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:891(sx:buf_hash_table.ht_locks[i].ht_lock)
> VS> VS> VS> 4,211348   /usr/src/sys/kern/vfs_subr.c:2101(lockmgr:zfs)
> VS> VS> VS> 4,011656   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:862(sx:buf_hash_table.ht_locks[i].ht_lock)
> VS> VS> VS> 3,823698   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:2009(sx:arc_eviction_mtx)
> VS> VS> VS> 2,697344   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:126(sx:h->hash_mutexes[i])
> VS> VS> VS> 2,343242   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mfu->arcs_locks[i].arcs_lock)
> VS> VS> VS> 1,752713   /usr/src/sys/kern/vfs_lookup.c:707(lockmgr:zfs)
> VS> VS> VS> 1,580856   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1136(sx:zfsvfs->z_hold_mtx[i])
> VS> VS> VS> 1,534242   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1291(sx:arc_mfu->arcs_locks[i].arcs_lock)
> VS> VS> VS> 1,331583   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c:129(sx:db->db_mtx)
> VS> VS> VS> 1,105058   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:427(sx:vq->vq_lock)
> VS> VS> VS> 1,080855   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c:396(sx:vq->vq_lock)
> VS> VS> VS> 0,858568   /usr/src/sys/kern/vfs_cache.c:488(rw:Name Cache)
> VS> VS> VS> 0,831652   /usr/src/sys/vm/vm_kern.c:344(rw:kmem vm object)
> VS> VS> VS> 0,815439   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1845(sx:buf_hash_table.ht_locks[i].ht_lock)
> VS> VS> VS> 0,812613   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1256(sx:arc_mru->arcs_locks[i].arcs_lock)
> VS> VS> VS> 0,794274   /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1529(lockmgr:zfs)
> VS> VS> VS> 0,669845   /usr/src/sys/vm/uma_core.c:2097(sleep mutex:zio_cache)
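One note on reproducing the numbers above: the debug.lock.prof.* sysctls used here only exist when the kernel is built with lock profiling support, which GENERIC does not enable. A minimal sketch of the workflow (the one-minute window matches the capture described above; the log file name is just an example):

#!/bin/sh
# Kernel configuration addition (rebuild and reboot required; profiling adds overhead):
#   options LOCK_PROFILING

sysctl debug.lock.prof.reset=1      # drop any previously accumulated statistics
sysctl debug.lock.prof.enable=1     # start collecting
sleep 60                            # capture for one minute
sysctl debug.lock.prof.enable=0     # stop collecting
sysctl debug.lock.prof.stats > lock_prof_1min.log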
> VS> VS> VS>
> VS> VS> VS>
> VS> VS> VS>
> VS> VS> VS> Short system description:
> VS> VS> VS>
> VS> VS> VS> CPU: E5-1650
> VS> VS> VS> MEM: 128GB DDR3-1600
> VS> VS> VS>
> VS> VS> VS> Storage subsystem:
> VS> VS> VS>
> VS> VS> VS> 36x 1TB WD RE4 drives on an LSI SAS2308 controller
> VS> VS> VS> 3x 180GB Intel SSD 530 series as L2 cache
> VS> VS> VS>
> VS> VS> VS> The pool is 18 mirrors, two drives in each mirror, plus 3 cache devices,
> VS> VS> VS> e.g.
> VS> VS> VS> ....
> VS> VS> VS>   mirror-14     ONLINE       0     0     0
> VS> VS> VS>     gpt/disk28  ONLINE       0     0     0
> VS> VS> VS>     gpt/disk29  ONLINE       0     0     0
> VS> VS> VS>   mirror-15     ONLINE       0     0     0
> VS> VS> VS>     gpt/disk30  ONLINE       0     0     0
> VS> VS> VS>     gpt/disk31  ONLINE       0     0     0
> VS> VS> VS>   mirror-16     ONLINE       0     0     0
> VS> VS> VS>     gpt/disk32  ONLINE       0     0     0
> VS> VS> VS>     gpt/disk33  ONLINE       0     0     0
> VS> VS> VS>   mirror-17     ONLINE       0     0     0
> VS> VS> VS>     gpt/disk34  ONLINE       0     0     0
> VS> VS> VS>     gpt/disk35  ONLINE       0     0     0
> VS> VS> VS> cache
> VS> VS> VS>   ada1          ONLINE       0     0     0
> VS> VS> VS>   ada2          ONLINE       0     0     0
> VS> VS> VS>   ada3          ONLINE       0     0     0
> VS> VS> VS>
> VS> VS> VS> The pool has two ZFS filesystems.
> VS> VS> VS>
> VS> VS> VS> The main one, with options (diffs from default):
> VS> VS> VS>   compression     lz4
> VS> VS> VS>   secondarycache  all
> VS> VS> VS>   sync            disabled
> VS> VS> VS>
> VS> VS> VS> Its data size is around 6TB (compressed), e.g.
> VS> VS> VS>   disk1  refcompressratio   1.56x  -
> VS> VS> VS>   disk1  written            5,99T  -
> VS> VS> VS>   disk1  logicalused        10,8T  -
> VS> VS> VS>   disk1  logicalreferenced  9,32T  -
> VS> VS> VS>
> VS> VS> VS> and another one with options:
> VS> VS> VS>   recordsize      4K (it was 32K before, but the internal software mostly uses 4K data blocks, so we tried changing it without reallocating the existing data)
> VS> VS> VS>   compression     off
> VS> VS> VS>   sync            disabled
> VS> VS> VS>   secondarycache  all
> VS> VS> VS>
> VS> VS> VS> Its data size is around 1.4TB.
> VS> VS> VS>
> VS> VS> VS> The ARC is configured to use at most 80GB.
> VS> VS> VS>
> VS> VS> VS> top most of the time looks like this:
> VS> VS> VS>
> VS> VS> VS> Mem: 14G Active, 13G Inact, 94G Wired, 497M Cache, 3297M Buf, 2214M Free
> VS> VS> VS> ARC: 80G Total, 2010M MFU, 70G MRU, 49M Anon, 3822M Header, 4288M Other
> VS> VS> VS>
> VS> VS> VS> Load averages are around 10-20, depending on the time of day.
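For completeness, ARC bounds like the 80GB maximum described above (and the 40GB lower limit mentioned further down) are normally expressed as loader tunables rather than run-time sysctls. A minimal sketch of what that looks like, using only the figures quoted in this message, not recommended values:

# /boot/loader.conf
vfs.zfs.arc_max="80G"    # upper bound on ARC size, as described above
vfs.zfs.arc_min="40G"    # size the ARC will try not to shrink below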
> VS> VS> VS>
> VS> VS> VS>
> VS> VS> VS> zpool iostat disk1 1
> VS> VS> VS>                capacity     operations    bandwidth
> VS> VS> VS> pool        alloc   free   read  write   read  write
> VS> VS> VS> ----------  -----  -----  -----  -----  -----  -----
> VS> VS> VS> disk1       7,45T  8,86T    546  1,49K  16,4M  13,4M
> VS> VS> VS> disk1       7,45T  8,86T    272  3,91K  11,7M  27,4M
> VS> VS> VS> disk1       7,45T  8,86T    344  2,94K  7,26M  25,2M
> VS> VS> VS> disk1       7,45T  8,86T    224  2,02K  9,91M  21,8M
> VS> VS> VS> disk1       7,45T  8,86T    244  2,35K  8,23M  18,4M
> VS> VS> VS> disk1       7,45T  8,86T    245  2,54K  6,44M  23,4M
> VS> VS> VS> disk1       7,45T  8,86T    114  2,94K  3,28M  13,3M
> VS> VS> VS> disk1       7,45T  8,86T    288  4,43K  6,09M  33,5M
> VS> VS> VS> disk1       7,45T  8,86T    157  1,26K  2,98M  15,7M
> VS> VS> VS> disk1       7,45T  8,86T     94    842  3,07M  13,6M
> VS> VS> VS> disk1       7,45T  8,86T    327  1,71K  9,63M  8,21M
> VS> VS> VS> disk1       7,45T  8,86T    237  1,81K  5,73M  29,3M
> VS> VS> VS> disk1       7,45T  8,86T    247  3,47K  5,17M  29,6M
> VS> VS> VS> disk1       7,45T  8,86T    165  2,37K  3,22M  16,7M
> VS> VS> VS> disk1       7,45T  8,86T    155  3,23K  3,27M  23,9M
> VS> VS> VS>
> VS> VS> VS> This is strange, as the timeout is set to 10 seconds.
> VS> VS> VS>
> VS> VS> VS> What is interesting: after a reboot the system works fine for some time, at least until the ARC reaches its 80G size.
> VS> VS> VS> The low limit for the ARC is 40GB. Strangely, the old system could take memory back from the ARC, e.g. like this:
> VS> VS> VS>
> VS> VS> VS> Mem: 32G Active, 8797M Inact, 78G Wired, 2370M Cache, 209M Buf, 3977M Free
> VS> VS> VS> ARC: 43G Total, 2204M MFU, 28G MRU, 135M Anon, 7898M Header, 5301M Other
> VS> VS> VS>
> VS> VS> VS> On the new system the ARC grows to its maximum allowed size.
> VS> VS> VS>
> VS> VS> VS> So for now the question is: what can it be, and what can we try in order to improve system performance?
> VS> VS> VS>
> VS> VS> VS>
> VS> VS> VS> _______________________________________________
> VS> VS> VS> freebsd-hackers@freebsd.org mailing list
> VS> VS> VS> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> VS> VS> VS> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
> VS> VS> _______________________________________________
> VS> VS> freebsd-hackers@freebsd.org mailing list
> VS> VS> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> VS> VS> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>
> VS> _______________________________________________
> VS> freebsd-hackers@freebsd.org mailing list
> VS> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> VS> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.