Date: Sat, 22 Mar 2014 18:56:18 -0500 From: rmtodd@servalan.servalan.com (Richard Todd) To: freebsd-fs@freebsd.org Subject: ZFS panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066 Message-ID: <E1WRVrl-00010b-MZ@ln.servalan.com>
next in thread | raw e-mail | index | archive | help
I just recently updated my main machine to a recent (Wed. morning) 9-STABLE version. While doing backups to an external ZFS-formatted USB drive I got the following panic: panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066 The culprit seems to be the write-throttling code in dmu_tx.c (I've noted thea actual line of the panic with an arrow): #ifdef _KERNEL #ifdef illumos mutex_enter(&curthread->t_delay_lock); while (cv_timedwait_hires(&curthread->t_delay_cv, &curthread->t_delay_lock, wakeup, zfs_delay_resolution_ns, CALLOUT_FLAG_ABSOLUTE | CALLOUT_FLAG_ROUNDUP) > 0) continue; mutex_exit(&curthread->t_delay_lock); #else /* XXX High resolution callouts are not available */ ASSERT(wakeup >= now); <------ panic pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now)); #endif #else hrtime_t delta = wakeup - gethrtime(); struct timespec ts; ts.tv_sec = delta / NANOSEC; ts.tv_nsec = delta % NANOSEC; (void) nanosleep(&ts, NULL); #endif The code is supposed to put the thread to sleep until such time as the time "wakeup" arrives, but apparently under some circumstances when the system is under heavy enough load things get delayed enough so that by the time the code gets to line 1066 above, the actual time is past the "wakeup" time, making the assert fail. (The system was quite busy when this happened, with not only the backups running but a couple of the zfs pools were doing the /etc/periodic-initiated regular scrub at the time.) I've gone ahead and hacked my copy to skip the pause if the wakeup time has already passed instead of doing a panic, logging an informative printf when this happens: diff -r e817c2457f83 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Wed Mar 19 21:08:39 2014 -0500 +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Sat Mar 22 16:28:06 2014 -0500 @@ -1063,8 +1063,11 @@ mutex_exit(&curthread->t_delay_lock); #else /* XXX High resolution callouts are not available */ - ASSERT(wakeup >= now); - pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now)); + if (wakeup < now) { + printf("Warning: dmu_tx_delay: wakeup %lu < now %lu\n", (unsigned long)wakeup, (unsigned long)now); + } else { + pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now)); + } #endif #else hrtime_t delta = wakeup - gethrtime(); Things seem to be running okay with this patch, with an occasional console message like Mar 22 17:43:09 ichotolot kernel: Warning: dmu_tx_delay: wakeup 3965794197537 < now 3975734195460
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E1WRVrl-00010b-MZ>