From owner-freebsd-sparc64@FreeBSD.ORG Wed Jun  8 21:14:31 2011
Return-Path:
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 7704A1065673;
    Wed, 8 Jun 2011 21:14:31 +0000 (UTC)
    (envelope-from marius@alchemy.franken.de)
Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214])
    by mx1.freebsd.org (Postfix) with ESMTP id EA81A8FC1C;
    Wed, 8 Jun 2011 21:14:30 +0000 (UTC)
Received: from alchemy.franken.de (localhost [127.0.0.1])
    by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id p58LEREM035514;
    Wed, 8 Jun 2011 23:14:27 +0200 (CEST)
    (envelope-from marius@alchemy.franken.de)
Received: (from marius@localhost)
    by alchemy.franken.de (8.14.4/8.14.4/Submit) id p58LER2l035513;
    Wed, 8 Jun 2011 23:14:27 +0200 (CEST)
    (envelope-from marius)
Date: Wed, 8 Jun 2011 23:14:27 +0200
From: Marius Strobl
To: Nathaniel W Filardo
Message-ID: <20110608211427.GA35494@alchemy.franken.de>
References: <20110406080043.GQ609@gradx.cs.jhu.edu>
    <20110603070356.GJ7129@gradx.cs.jhu.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110603070356.GJ7129@gradx.cs.jhu.edu>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-current@freebsd.org, pjd@freebsd.org, freebsd-sparc64@freebsd.org, mm@freebsd.org
Subject: Re: ZFS panic with concurrent recv and read-heavy workload
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Sparc
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 08 Jun 2011 21:14:31 -0000

On Fri, Jun 03, 2011 at 03:03:56AM -0400, Nathaniel W Filardo wrote:
> I just got this on another machine, no heavy workload needed, just booting
> and starting some jails.
> Of interest, perhaps, both this and the machine
> triggering the below panic are SMP V240s with 1.5GHz CPUs (though I will
> confess that the machine in the original report may have had bad RAM).  I
> have run a UP 1.2GHz V240 for months and never seen this panic.
>
> This time the kernel is
> > FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
> csup'd immediately before building.  The full panic this time is
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > l2arc_feed_thread() at l2arc_feed_thread+0xeac
> > fork_exit() at fork_exit+0x9c
> > fork_trampoline() at fork_trampoline+0x8
> >
> > SC Alert: SC Request to send Break to host.
> > KDB: enter: Line break on console
> > [ thread pid 27 tid 100121 ]
> > Stopped at      kdb_enter+0x80: ta      %xcc, 1
> > db> reset
> > ttiimmeeoouutt sshhuuttttiinngg ddoowwnn CCPPUUss..
>
> Half of the memory in this machine is new (well, came with the machine) and
> half is from the aforementioned UP V240 which seemed to work fine (I was
> attempting an upgrade when this happened); none of it (or indeed any of the
> hardware save the disk controller and disks) is common between this and the
> machine reporting below.
>
> Thoughts?  Any help would be greatly appreciated.
> Thanks.
> --nwf;
>
> On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
> >[...]
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > arc_evict() at arc_evict+0x614
> > arc_get_data_buf() at arc_get_data_buf+0x360
> > arc_buf_alloc() at arc_buf_alloc+0x94
> > dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
> > dmu_write() at dmu_write+0xec
> > dmu_recv_stream() at dmu_recv_stream+0x8a8
> > zfs_ioc_recv() at zfs_ioc_recv+0x354
> > zfsdev_ioctl() at zfsdev_ioctl+0xe0
> > devfs_ioctl_f() at devfs_ioctl_f+0xe8
> > kern_ioctl() at kern_ioctl+0x294
> > ioctl() at ioctl+0x198
> > syscallenter() at syscallenter+0x270
> > syscall() at syscall+0x74
> > -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
> > userland() at 0x40e72cc8
> > user trace: trap %o7=0x40c13e24
> > pc 0x40e72cc8, sp 0x7fdffff4641
> > pc 0x40c158f4, sp 0x7fdffff4721
> > pc 0x40c1e878, sp 0x7fdffff47f1
> > pc 0x40c1ce54, sp 0x7fdffff8b01
> > pc 0x40c1dbe0, sp 0x7fdffff9431
> > pc 0x40c1f718, sp 0x7fdffffd741
> > pc 0x10731c, sp 0x7fdffffd831
> > pc 0x10c90c, sp 0x7fdffffd8f1
> > pc 0x103ef0, sp 0x7fdffffe1d1
> > pc 0x4021aff4, sp 0x7fdffffe291
> > done
> >[...]

Apparently this is a locking issue in the ARC code; the ZFS people should
be able to help you.

Marius