Date: Tue, 2 Aug 2011 11:01:22 -0700 From: David P Discher <dpd@bitgravity.com> To: Martin Matuska <mm@FreeBSD.org> Cc: freebsd-fs@FreeBSD.org, Andriy Gapon <avg@freebsd.org> Subject: Re: zfs process hang on pool access Message-ID: <DC3F7EEF-0FD1-4110-85A7-0D24E4F78091@bitgravity.com> In-Reply-To: <4E36A25F.7000000@FreeBSD.org> References: <A14F1C768A41483C876AD77502A864D6@multiplay.co.uk> <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> <4E302204.2030009@FreeBSD.org> <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com> <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk> <3D893A9B-2CD9-40EB-B4A2-5DBCBB72C62E@bitgravity.com> <4E35D2E5.4020108@FreeBSD.org> <0E851F439C384186A1D44A347C19A7D7@multiplay.co.uk> <4E36A25F.7000000@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Aug 1, 2011, at 5:55 AM, Martin Matuska wrote: >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >>> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c (revision >>> 224527) >>> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c (working = copy) >>> @@ -488,7 +488,7 @@ >>> txg_delay(dsl_pool_t *dp, uint64_t txg, int ticks) >>> { >>> tx_state_t *tx =3D &dp->dp_tx; >>> - int timeout =3D ddi_get_lbolt() + ticks; >>> + clock_t timeout =3D ddi_get_lbolt() + ticks; >>=20 >> So you recon that one line will fix the 100+ days overflow David's >> talking about? > No, this bug causes that after 28,5 days of uptime this function won't > delay txg sync threads anymore (ddi_get_lbolt() will be always larger > than timeout).This may lead to a slowdown of writes. Thanks Martin. I see you committed to -head as well. However, a bit confused with my issue still. I've been able to = recreating it and I can't explain why LBOLT is still coming back = negative. I'm in 8.1-RELEASE in this case, with the patches that I = posted before for LBOLT and typedef of clock_t.=20 I added some debugging and am seeing LBOLT still return a negative = number, which then (timeout-LBOLT) because a large number of ticks to = wait. I put some debugging in txg_delay, without fixing the timeout's type: Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c = (revision 3343) +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c = (working copy) @@ -427,6 +427,7 @@ { tx_state_t *tx =3D &dp->dp_tx; int timeout =3D LBOLT + ticks; + /* if ( timeout < 0 ) { printf ("txg_delay: timeout = negative timeout=3D%d, ticks=3D%d\n", timeout, ticks); } */ =20 /* don't delay if this txg could transition to = quiesing immediately */ if (tx->tx_open_txg > txg || @@ -439,10 +440,10 @@ return; } =20 - while (LBOLT < timeout && - tx->tx_syncing_txg < txg-1 && !txg_stalled(dp)) - (void) cv_timedwait(&tx->tx_quiesce_more_cv, = &tx->tx_sync_lock, - timeout - LBOLT); + while (LBOLT < timeout && tx->tx_syncing_txg < txg-1 && = !txg_stalled(dp)) { + printf ("txg_delay::cv_timeoutwait: = timeout negative timeout=3D%d, ticks=3D%d %lld \n", timeout, ticks, = LBOLT ); + (void) = cv_timedwait(&tx->tx_quiesce_more_cv, &tx->tx_sync_lock, timeout - = LBOLT); + } =20 mutex_exit(&tx->tx_sync_lock); } After two hours of doing 9 parallel file creates ... I got it deadlocked = ... well, probably just a long wait ... txg_delay::cv_timeoutwait: timeout negative timeout=3D-518109823, = ticks=3D1 -9182683359751551616 txg_delay::cv_timeoutwait: timeout negative timeout=3D-355729919, = ticks=3D1 -9182562881461551616 txg_delay::cv_timeoutwait: timeout negative timeout=3D1272384705, = ticks=3D1 -9182430354322551616 txg_delay::cv_timeoutwait: timeout negative timeout=3D1272384705, = ticks=3D1 -9182308872293551616 txg_delay::cv_timeoutwait: timeout negative timeout=3D1272384705, = ticks=3D1 -9182180363183551616 txg_delay::cv_timeoutwait: timeout negative timeout=3D1272384705, = ticks=3D1 -9182051853654551616 =09 Grabbed a core, and did some more poking : (kgdb) up #3 0xffffffff804d222b in _cv_timedwait (cvp=3D0xffffff0029524260,= =20 lock=3D0xffffff00295241d0, timo=3D1914894593) at /usr/src/sys/kern/kern_condvar.c:323 323 rval =3D sleepq_timedwait(cvp, 0); (kgdb) up #4 0xffffffff8106ef7e in txg_delay (dp=3D0xffffff0029524000, = txg=3D9264, ticks=3D1) at = /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/= txg.c:445 445 (void) = cv_timedwait(&tx->tx_quiesce_more_cv, &tx->tx_sync_lock, timeout - = LBOLT); (kgdb) info locals timeout =3D Variable "timeout" is not available. (kgdb)=20 As you can see, LBOLT is apparently negative - if I'm getting my printf = formatting correct - but regardless the timo value to cv_timedwait() is = timo=3D1914894593 ... which is trying to wait for : 1914894593 / 1000hz / 60s / 60m /24h =3D 22.16 hours Regardless of setting timeout's type correctly ... why is LBOLT still = negative ? It's not clear to me that fixing timeout's type will fix = this issue. However, I will do so, and re-run this same test.=20 My patches earlier show my changes to = sys/cddl/compat/opensolaris/sys/time.h but in summary : 37 #define NANOSEC 1000000000 39 typedef longlong_t hrtime_t; 41 #define LBOLT (gethrtime() * (NANOSEC/hz)) And gerhrtime() still remains unchanged from 8.1: 51 #ifdef _KERNEL 52 #define lbolt64 (int64_t)(LBOLT) 53=20 54 static __inline hrtime_t 55 gethrtime(void) { 56=20 57 struct timespec ts; 58 hrtime_t nsec; 59=20 60 #if 1 61 getnanouptime(&ts); 62 #else 63 nanouptime(&ts); 64 #endif 65 nsec =3D (hrtime_t)ts.tv_sec * NANOSEC + ts.tv_nsec; 66 return (nsec); 67 } 68=20 69 #define gethrestime_sec() (time_second) 70 #define gethrestime(ts) getnanotime(ts) --- David P. Discher dpd@bitgravity.com * AIM: bgDavidDPD BITGRAVITY * http://www.bitgravity.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?DC3F7EEF-0FD1-4110-85A7-0D24E4F78091>