Date: Sat, 22 Mar 2014 18:56:18 -0500 From: rmtodd@servalan.servalan.com (Richard Todd) To: freebsd-fs@freebsd.org Subject: ZFS panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066 Message-ID: <E1WRVrl-00010b-MZ@ln.servalan.com>
next in thread | raw e-mail | index | archive | help
I just recently updated my main machine to a recent (Wed. morning)
9-STABLE version. While doing backups to an external ZFS-formatted
USB drive I got the following panic:
panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066
The culprit seems to be the write-throttling code in dmu_tx.c (I've noted thea
actual line of the panic with an arrow):
#ifdef _KERNEL
#ifdef illumos
mutex_enter(&curthread->t_delay_lock);
while (cv_timedwait_hires(&curthread->t_delay_cv,
&curthread->t_delay_lock, wakeup, zfs_delay_resolution_ns,
CALLOUT_FLAG_ABSOLUTE | CALLOUT_FLAG_ROUNDUP) > 0)
continue;
mutex_exit(&curthread->t_delay_lock);
#else
/* XXX High resolution callouts are not available */
ASSERT(wakeup >= now); <------ panic
pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
#endif
#else
hrtime_t delta = wakeup - gethrtime();
struct timespec ts;
ts.tv_sec = delta / NANOSEC;
ts.tv_nsec = delta % NANOSEC;
(void) nanosleep(&ts, NULL);
#endif
The code is supposed to put the thread to sleep until such time as the time
"wakeup" arrives, but apparently under some circumstances when the system is
under heavy enough load things get delayed enough so that by the time the
code gets to line 1066 above, the actual time is past the "wakeup" time,
making the assert fail. (The system was quite busy when this happened, with
not only the backups running but a couple of the zfs pools were doing the
/etc/periodic-initiated regular scrub at the time.)
I've gone ahead and hacked my copy to skip the pause
if the wakeup time has already passed instead of doing a panic, logging an
informative printf when this happens:
diff -r e817c2457f83 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Wed Mar 19 21:08:39 2014 -0500
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Sat Mar 22 16:28:06 2014 -0500
@@ -1063,8 +1063,11 @@
mutex_exit(&curthread->t_delay_lock);
#else
/* XXX High resolution callouts are not available */
- ASSERT(wakeup >= now);
- pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
+ if (wakeup < now) {
+ printf("Warning: dmu_tx_delay: wakeup %lu < now %lu\n", (unsigned long)wakeup, (unsigned long)now);
+ } else {
+ pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now));
+ }
#endif
#else
hrtime_t delta = wakeup - gethrtime();
Things seem to be running okay with this patch, with an occasional console
message like
Mar 22 17:43:09 ichotolot kernel: Warning: dmu_tx_delay: wakeup 3965794197537 < now 3975734195460
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E1WRVrl-00010b-MZ>
