From owner-freebsd-fs@FreeBSD.ORG Sun Mar 23 00:02:18 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D83F7E13 for ; Sun, 23 Mar 2014 00:02:18 +0000 (UTC) Received: from ln.servalan.com (ln.servalan.com [IPv6:2600:3c00::f03c:91ff:fe96:62f5]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id B2DAA7D2 for ; Sun, 23 Mar 2014 00:02:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=servalan.com; s=rsadkim; h=Message-Id:Date:Subject:From:To; bh=rKjjnuvBo2rGG4IC8OgWzVp/83/0jlYPvA/ncfG3GhA=; b=so7QBMwSMoUYkUG/UIJWvowofrVlFtx2m0D1lXzD3kjcF3GjxlGtyxhmK53+CU1rlgzOOl+MqJQvxFHqf/7iA1/Igrjud679echrgEPSq8xRzs8b7PZWmBlPnp73muOJGjf0xNW90iEV3KBGNlziPLtbdQ4/QhuaEsmvIeNxKgQ=; Received: from uucp by ln.servalan.com with local-rmail (Exim 4.76) (envelope-from ) id 1WRVrl-00010b-MZ for freebsd-fs@freebsd.org; Sat, 22 Mar 2014 19:02:17 -0500 Received: from localhost ([127.0.0.1]:39758 helo=ichotolot.servalan.com) by servalan.servalan.com with esmtp (Exim 4.82 (FreeBSD)) (envelope-from ) id 1WRVly-00080s-ER for freebsd-fs@freebsd.org; Sat, 22 Mar 2014 18:56:18 -0500 To: freebsd-fs@freebsd.org From: rmtodd@servalan.servalan.com (Richard Todd) Subject: ZFS panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066 Date: Sat, 22 Mar 2014 18:56:18 -0500 Message-Id: X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 23 Mar 2014 00:02:19 -0000 I just recently updated my main machine to a recent (Wed. morning) 9-STABLE version. While doing backups to an external ZFS-formatted USB drive I got the following panic: panic: solaris assert: wakeup >= now, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c, line: 1066 The culprit seems to be the write-throttling code in dmu_tx.c (I've noted thea actual line of the panic with an arrow): #ifdef _KERNEL #ifdef illumos mutex_enter(&curthread->t_delay_lock); while (cv_timedwait_hires(&curthread->t_delay_cv, &curthread->t_delay_lock, wakeup, zfs_delay_resolution_ns, CALLOUT_FLAG_ABSOLUTE | CALLOUT_FLAG_ROUNDUP) > 0) continue; mutex_exit(&curthread->t_delay_lock); #else /* XXX High resolution callouts are not available */ ASSERT(wakeup >= now); <------ panic pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now)); #endif #else hrtime_t delta = wakeup - gethrtime(); struct timespec ts; ts.tv_sec = delta / NANOSEC; ts.tv_nsec = delta % NANOSEC; (void) nanosleep(&ts, NULL); #endif The code is supposed to put the thread to sleep until such time as the time "wakeup" arrives, but apparently under some circumstances when the system is under heavy enough load things get delayed enough so that by the time the code gets to line 1066 above, the actual time is past the "wakeup" time, making the assert fail. (The system was quite busy when this happened, with not only the backups running but a couple of the zfs pools were doing the /etc/periodic-initiated regular scrub at the time.) I've gone ahead and hacked my copy to skip the pause if the wakeup time has already passed instead of doing a panic, logging an informative printf when this happens: diff -r e817c2457f83 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Wed Mar 19 21:08:39 2014 -0500 +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c Sat Mar 22 16:28:06 2014 -0500 @@ -1063,8 +1063,11 @@ mutex_exit(&curthread->t_delay_lock); #else /* XXX High resolution callouts are not available */ - ASSERT(wakeup >= now); - pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now)); + if (wakeup < now) { + printf("Warning: dmu_tx_delay: wakeup %lu < now %lu\n", (unsigned long)wakeup, (unsigned long)now); + } else { + pause("dmu_tx_delay", NSEC_TO_TICK(wakeup - now)); + } #endif #else hrtime_t delta = wakeup - gethrtime(); Things seem to be running okay with this patch, with an occasional console message like Mar 22 17:43:09 ichotolot kernel: Warning: dmu_tx_delay: wakeup 3965794197537 < now 3975734195460