From owner-freebsd-current@FreeBSD.ORG Thu Oct 3 10:54:28 2013
Delivered-To: freebsd-current@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTP id 3D327EC8;
	Thu, 3 Oct 2013 10:54:28 +0000 (UTC)
	(envelope-from kwhite@site.uottawa.ca)
Received: from courriel.site.uottawa.ca (eecsmail.engineering.uottawa.ca [137.122.24.224])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.freebsd.org (Postfix) with ESMTPS id D08DE286B;
	Thu, 3 Oct 2013 10:54:27 +0000 (UTC)
Received: from [10.0.2.15] (dsl-74-51-61-7.vianet.ca [74.51.61.7])
	(authenticated bits=0)
	by courriel.site.uottawa.ca (8.14.5/8.14.4) with ESMTP id r93AsOSb077758
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Thu, 3 Oct 2013 06:54:25 -0400 (EDT)
	(envelope-from kwhite@site.uottawa.ca)
Date: Thu, 3 Oct 2013 06:54:31 -0400 (EDT)
From: Keith White
X-X-Sender: kwhite@localhost.my.domain
To: Andriy Gapon
Subject: Re: ZFS panic with r255937
In-Reply-To: <524D13AB.2020800@FreeBSD.org>
References: <60850.74.51.61.7.1380496284.squirrel@courriel.site.uottawa.ca>
	<524C0DB2.5040202@FreeBSD.org> <524D13AB.2020800@FreeBSD.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-current@FreeBSD.org
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Thu, 03 Oct 2013 10:54:28 -0000

On Thu, 3 Oct 2013, Andriy Gapon wrote:

> on 02/10/2013 20:59 Keith White said the following:
>> On Wed, 2 Oct 2013, Andriy Gapon wrote:
>>
>>> on 30/09/2013 02:11 kwhite@site.uottawa.ca said the following:
>>>> Sorry, debugging this is *way* beyond me.  Any hints, patches to try?
>>>
>>> Please share the stack trace.
>>>
>>> --
>>> Andriy Gapon
>>
>> There's now a pr for this panic: kern/182570
>>
>> Here's the stack trace:
>>
>> root@freebsd10:/usr/src # kgdb /boot/kernel/kernel /var/crash/vmcore.last
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> panic: solaris assert: dn->dn_maxblkid == 0 &&
>> (BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) || dnode_block_freed(dn, 0)), file:
>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c,
>> line: 598
>> cpuid = 1
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00992b3280
>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00992b3330
>> vpanic() at vpanic+0x126/frame 0xfffffe00992b3370
>> panic() at panic+0x43/frame 0xfffffe00992b33d0
>> assfail() at assfail+0x22/frame 0xfffffe00992b33e0
>> dnode_reallocate() at dnode_reallocate+0x225/frame 0xfffffe00992b3430
>> dmu_object_reclaim() at dmu_object_reclaim+0x123/frame 0xfffffe00992b3480
>> dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfffffe00992b36b0
>> zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfffffe00992b3920
>> zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfffffe00992b39c0
>> devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfffffe00992b3a20
>> kern_ioctl() at kern_ioctl+0x2ca/frame 0xfffffe00992b3a90
>> sys_ioctl() at sys_ioctl+0x11f/frame 0xfffffe00992b3ae0
>> amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00992b3bf0
>> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00992b3bf0
>
>
> Thank you very much.
> To me this looks very similar to a problem discovered and fixed in illumos some
> time ago.  Please check if the following change improves the situation for you.
>
> https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659
>
> Raw:
> https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659.patch
> ...

Yes, it does.  send/recv completes with no panic.
That patch fixes kern/182570 for me.

Thanks!

...keith

Once the patch is applied "svn diff" gives me:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c	(revision 255986)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c	(working copy)
@@ -677,6 +677,16 @@
 	if (err != 0)
 		return (err);
 	err = dmu_free_long_range_impl(os, dn, offset, length);
+
+	/*
+	 * It is important to zero out the maxblkid when freeing the entire
+	 * file, so that (a) subsequent calls to dmu_free_long_range_impl()
+	 * will take the fast path, and (b) dnode_reallocate() can verify
+	 * that the entire file has been freed.
+	 */
+	if (offset == 0 && length == DMU_OBJECT_END)
+		dn->dn_maxblkid = 0;
+
 	dnode_rele(dn, FTAG);
 	return (err);
 }
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
===================================================================
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c	(revision 255986)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c	(working copy)
@@ -616,7 +616,7 @@
 	 */
 	if (dn->dn_datablkshift == 0) {
 		if (off != 0 || len < dn->dn_datablksz)
-			dmu_tx_count_write(txh, off, len);
+			dmu_tx_count_write(txh, 0, dn->dn_datablksz);
 	} else {
 		/* first block will be modified if it is not aligned */
 		if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))