Date: Tue, 4 Feb 2014 12:08:23 +0200
From: Vitalij Satanivskij <satan@ukr.net>
To: Vitalij Satanivskij <satan@ukr.net>
Cc: Vladimir Sharun <atz@ukr.net>, Current FreeBSD <freebsd-current@freebsd.org>
Subject: Re: ARC "pressured out", how to control/stabilize?
Message-ID: <20140204100823.GA95709@hell.ukr.net>
In-Reply-To: <20140131182637.GA82526@hell.ukr.net>
References: <1389005433.815055146.2dcjke36@frv45.ukr.net>
 <52CA9963.1050507@FreeBSD.org>
 <1389676958.516993176.oq4lbgg7@frv45.ukr.net>
 <52D59E36.9040405@FreeBSD.org>
 <20140115102837.GA98983@hell.ukr.net>
 <52D66DB6.7030807@FreeBSD.org>
 <1390900795.258244476.v35k1338@frv45.ukr.net>
 <52EA3459.3070300@FreeBSD.org>
 <1391083826.948700370.cmzf8475@frv45.ukr.net>
 <20140131182637.GA82526@hell.ukr.net>
Dear Andriy and FreeBSD community,

With the patch, the system panics on boot. After removing the cache device
from the pool, the system boots without problems. After that, the cache
device was added again and soon a kernel panic happened.

A screenshot of the panic is here: http://i61.tinypic.com/30sbx2g.jpg

Vitalij Satanivskij wrote:
VS> Dear Andriy and FreeBSD community,
VS>
VS> Buildworld with the patch failed with this error:
VS>
VS> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4642:13: error: use of
VS> undeclared identifier 'l2hdr'
VS>                 ASSERT3P(l2hdr->b_tmp_cdata, ==, NULL);
VS>                          ^
VS> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:125:40: note: expanded from
VS> macro 'ASSERT3P'
VS> #define ASSERT3P(x, y, z)       VERIFY3_IMPL(x, y, z, uintptr_t)
VS>                                 ^
VS> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:109:29: note: expanded from
VS> macro 'VERIFY3_IMPL'
VS>         const TYPE __left = (TYPE)(LEFT); \
VS>                             ^
VS> 1 error generated.
VS> *** Error code 1
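
The build failure above is mechanical: ASSERT3P() expands through
VERIFY3_IMPL(), which evaluates both operands into locals, so `l2hdr'
must already be declared in the enclosing scope. Below is a minimal
standalone sketch of that mechanism and of the kind of local declaration
that would satisfy the compiler. The structures and field names are
stand-ins modelled on the 2014-era ZFS headers, not the actual arc.c
context or the actual fix:

/*
 * Illustrative sketch only -- not the actual patch.  It mimics how
 * ASSERT3P()/VERIFY3_IMPL() expand (using assert(3) in place of the
 * kernel's panic path) and uses stand-in types, to show why `l2hdr'
 * must be a declared local.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VERIFY3_IMPL(LEFT, OP, RIGHT, TYPE) do {                \
        const TYPE __left = (TYPE)(LEFT);                       \
        const TYPE __right = (TYPE)(RIGHT);                     \
        assert(__left OP __right);                              \
} while (0)
#define ASSERT3P(x, y, z)       VERIFY3_IMPL(x, y, z, uintptr_t)

typedef struct l2arc_buf_hdr {
        void    *b_tmp_cdata;           /* stand-in for the real field */
} l2arc_buf_hdr_t;

typedef struct arc_buf_hdr {
        l2arc_buf_hdr_t *b_l2hdr;       /* stand-in for the real field */
} arc_buf_hdr_t;

static void
check_hdr(arc_buf_hdr_t *hdr)
{
        /*
         * Without this local declaration, the ASSERT3P() below fails to
         * compile -- exactly the "undeclared identifier 'l2hdr'" error.
         */
        l2arc_buf_hdr_t *l2hdr = hdr->b_l2hdr;

        if (l2hdr != NULL)
                ASSERT3P(l2hdr->b_tmp_cdata, ==, NULL);
}

int
main(void)
{
        arc_buf_hdr_t hdr = { NULL };

        check_hdr(&hdr);
        return (0);
}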
VS>
VS> Vladimir Sharun wrote:
VS> VS> Dear Andriy and FreeBSD community,
VS> VS>
VS> VS> L2ARC was temporarily turned off by setting secondarycache=none everywhere it was enabled,
VS> VS> so there has been no more leak for one particular day.
VS> VS>
VS> VS> Here's the top header:
VS> VS> last pid: 89916;  load averages:  2.49,  2.91,  2.89    up 5+19:21:42  14:09:12
VS> VS> 561 processes: 2 running, 559 sleeping
VS> VS> CPU:  5.7% user,  0.0% nice, 14.0% system,  1.0% interrupt, 79.3% idle
VS> VS> Mem: 23G Active, 1017M Inact, 98G Wired, 1294M Cache, 3285M Buf, 1997M Free
VS> VS> ARC: 69G Total, 3498M MFU, 59G MRU, 53M Anon, 1651M Header, 4696M Other
VS> VS> Swap:
VS> VS>
VS> VS> Here's the calculated vmstat -z (meaning only allocations exceeding 100*1024^2 bytes are printed):
VS> VS> UMA Slabs:              199,915M
VS> VS> VM OBJECT:              207,354M
VS> VS> 32:                     205,558M
VS> VS> 64:                     901,122M
VS> VS> 128:                    215,211M
VS> VS> 256:                    242,262M
VS> VS> 4096:                   2316,01M
VS> VS> range_seg_cache:        205,396M
VS> VS> zio_buf_512:            1103,31M
VS> VS> zio_buf_16384:          15697,9M
VS> VS> zio_data_buf_16384:     348,297M
VS> VS> zio_data_buf_24576:     129,352M
VS> VS> zio_data_buf_32768:     104,375M
VS> VS> zio_data_buf_36864:     163,371M
VS> VS> zio_data_buf_53248:     100,496M
VS> VS> zio_data_buf_57344:     105,93M
VS> VS> zio_data_buf_65536:     101,75M
VS> VS> zio_data_buf_73728:     111,938M
VS> VS> zio_data_buf_90112:     104,414M
VS> VS> zio_data_buf_106496:    100,242M
VS> VS> zio_data_buf_131072:    61652,5M
VS> VS> dnode_t:                3203,98M
VS> VS> dmu_buf_impl_t:         797,695M
VS> VS> arc_buf_hdr_t:          1498,76M
VS> VS> arc_buf_t:              105,802M
VS> VS> zfs_znode_cache:        352,61M
VS> VS>
VS> VS> zio_data_buf_131072 (61652M) + zio_buf_16384 (15698M) = 77350M,
VS> VS> which easily exceeds the ARC total (70G).
VS> VS>
VS> VS> Here are the same calculations from exactly the same system, where L2ARC was disabled before reboot:
VS> VS> last pid: 63407;  load averages:  2.35,  2.71,  2.73    up 8+19:42:54  14:17:33
VS> VS> 527 processes: 1 running, 526 sleeping
VS> VS> CPU:  4.8% user,  0.0% nice,  6.6% system,  1.1% interrupt, 87.4% idle
VS> VS> Mem: 21G Active, 1460M Inact, 99G Wired, 1748M Cache, 3308M Buf, 952M Free
VS> VS> ARC: 87G Total, 4046M MFU, 76G MRU, 37M Anon, 2026M Header, 4991M Other
VS> VS> Swap:
VS> VS>
VS> VS> and the vmstat -z filtered:
VS> VS> UMA Slabs:              208,004M
VS> VS> VM OBJECT:              207,392M
VS> VS> 32:                     172,831M
VS> VS> 64:                     752,226M
VS> VS> 128:                    210,024M
VS> VS> 256:                    244,204M
VS> VS> 4096:                   2249,02M
VS> VS> range_seg_cache:        245,711M
VS> VS> zio_buf_512:            1145,25M
VS> VS> zio_buf_16384:          15170,1M
VS> VS> zio_data_buf_16384:     422,766M
VS> VS> zio_data_buf_20480:     120,742M
VS> VS> zio_data_buf_24576:     148,641M
VS> VS> zio_data_buf_28672:     112,848M
VS> VS> zio_data_buf_32768:     117,375M
VS> VS> zio_data_buf_36864:     185,379M
VS> VS> zio_data_buf_45056:     103,168M
VS> VS> zio_data_buf_53248:     105,32M
VS> VS> zio_data_buf_57344:     122,828M
VS> VS> zio_data_buf_65536:     109,25M
VS> VS> zio_data_buf_69632:     100,406M
VS> VS> zio_data_buf_73728:     126,844M
VS> VS> zio_data_buf_77824:     101,086M
VS> VS> zio_data_buf_81920:     100,391M
VS> VS> zio_data_buf_86016:     101,391M
VS> VS> zio_data_buf_90112:     112,836M
VS> VS> zio_data_buf_98304:     100,688M
VS> VS> zio_data_buf_102400:    106,543M
VS> VS> zio_data_buf_106496:    108,875M
VS> VS> zio_data_buf_131072:    63190,5M
VS> VS> dnode_t:                3437,36M
VS> VS> dmu_buf_impl_t:         840,62M
VS> VS> arc_buf_hdr_t:          1870,88M
VS> VS> arc_buf_t:              114,942M
VS> VS> zfs_znode_cache:        353,055M
VS> VS>
VS> VS> Everything seems within the ARC total range.
VS> VS>
VS> VS> We will try the attached patch within a few days and will come back with the results.
VS> VS>
VS> VS> Thank you for your help.
VS> VS>
VS> VS> > on 28/01/2014 11:28 Vladimir Sharun said the following:
VS> VS> > > Dear Andriy and FreeBSD community,
VS> VS> > >
VS> VS> > > After applying this patch, one of the systems runs fine (disk subsystem load low to
VS> VS> > > moderate, 10-20% busy sustained).
VS> VS> > >
VS> VS> > > Then I saw this patch was merged to HEAD, and we applied it to one of the systems
VS> VS> > > with moderate to high disk load, 30-60% busy (11.0-CURRENT #7 r261118: Fri Jan 24 17:25:08 EET 2014).
VS> VS> > >
VS> VS> > > Within 4 days we are experiencing the same leak(?) as without the patch:
VS> VS> > >
VS> VS> > > last pid: 53841;  load averages:  4.47,  4.18,  3.78    up 3+16:37:09  11:24:39
VS> VS> > > 543 processes: 6 running, 537 sleeping
VS> VS> > > CPU:  8.7% user,  0.0% nice, 14.6% system,  1.4% interrupt, 75.3% idle
VS> VS> > > Mem: 22G Active, 1045M Inact, 98G Wired, 1288M Cache, 3284M Buf, 2246M Free
VS> VS> > > ARC: 73G Total, 3763M MFU, 62G MRU, 56M Anon, 1887M Header, 4969M Other
VS> VS> > > Swap:
VS> VS> > >
VS> VS> > > The ARC gets populated to its maximum (90GB) within 30 minutes under load, then starts decreasing.
VS> VS> > >
VS> VS> > > The delta between Wired and ARC total started growing from the typical 10-12GB without L2ARC
VS> VS> > > to 25GB with L2ARC enabled, and counting (4 hours ago the delta was 22GB).
VS> VS> >
VS> VS> > First, have you checked that the vmstat -z output contains the same anomaly as
VS> VS> > in your original report?
VS> VS> >
VS> VS> > If yes, then please try to reproduce the problem with the following debugging patch:
VS> VS> > http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.patch
VS> VS> > Please make sure to compile your kernel (and modules) with INVARIANTS.
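
For reference, INVARIANTS checks are compiled in by adding the two
standard options below to the kernel configuration and then rebuilding
the kernel and modules (both options exist in stock FreeBSD; use
whatever kernel config name you already build from):

options         INVARIANTS
options         INVARIANT_SUPPORT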
VS> VS> >
VS> VS> > --
VS> VS> > Andriy Gapon
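
For anyone who wants to reproduce the per-zone sums quoted above, here
is a small sketch that filters `vmstat -z' output and prints every zone
using more than 100 MiB. It assumes the usual "name: size, limit, used,
free, requests, ..." column layout of vmstat -z (which differs slightly
between releases) and approximates a zone's footprint as item size
times items in use:

/*
 * zsum.c -- sketch: read `vmstat -z' on stdin and print zones whose
 * approximate footprint (item size * items in use) exceeds 100 MiB.
 * The exact column set of vmstat -z varies between FreeBSD releases,
 * so treat the parsing here as an approximation.
 */
#include <stdio.h>
#include <string.h>

int
main(void)
{
        char line[512];

        while (fgets(line, sizeof(line), stdin) != NULL) {
                char *colon;
                unsigned long size, limit, used;
                double mib;

                /* Zone names end with a colon; skip the header line. */
                if ((colon = strchr(line, ':')) == NULL)
                        continue;
                *colon = '\0';
                if (sscanf(colon + 1, " %lu, %lu, %lu",
                    &size, &limit, &used) != 3)
                        continue;
                mib = (double)size * (double)used / (1024.0 * 1024.0);
                if (mib > 100.0)
                        printf("%s: %.3fM\n", line, mib);
        }
        return (0);
}

Compile and run it as, e.g.: cc -o zsum zsum.c && vmstat -z | ./zsum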