From owner-freebsd-sparc64@FreeBSD.ORG Wed Jun  8 21:14:31 2011
Return-Path:
Delivered-To: freebsd-sparc64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 7704A1065673;
    Wed, 8 Jun 2011 21:14:31 +0000 (UTC)
    (envelope-from marius@alchemy.franken.de)
Received: from alchemy.franken.de (alchemy.franken.de [194.94.249.214])
    by mx1.freebsd.org (Postfix) with ESMTP id EA81A8FC1C;
    Wed, 8 Jun 2011 21:14:30 +0000 (UTC)
Received: from alchemy.franken.de (localhost [127.0.0.1])
    by alchemy.franken.de (8.14.4/8.14.4/ALCHEMY.FRANKEN.DE) with ESMTP id p58LEREM035514;
    Wed, 8 Jun 2011 23:14:27 +0200 (CEST)
    (envelope-from marius@alchemy.franken.de)
Received: (from marius@localhost)
    by alchemy.franken.de (8.14.4/8.14.4/Submit) id p58LER2l035513;
    Wed, 8 Jun 2011 23:14:27 +0200 (CEST)
    (envelope-from marius)
Date: Wed, 8 Jun 2011 23:14:27 +0200
From: Marius Strobl
To: Nathaniel W Filardo
Message-ID: <20110608211427.GA35494@alchemy.franken.de>
References: <20110406080043.GQ609@gradx.cs.jhu.edu>
    <20110603070356.GJ7129@gradx.cs.jhu.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110603070356.GJ7129@gradx.cs.jhu.edu>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-current@freebsd.org, pjd@freebsd.org, freebsd-sparc64@freebsd.org, mm@freebsd.org
Subject: Re: ZFS panic with concurrent recv and read-heavy workload
X-BeenThere: freebsd-sparc64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the Sparc
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 08 Jun 2011 21:14:31 -0000

On Fri, Jun 03, 2011 at 03:03:56AM -0400, Nathaniel W Filardo wrote:
> I just got this on another machine, no heavy workload needed, just booting
> and starting some jails.
> Of interest, perhaps, both this and the machine
> triggering the below panic are SMP V240s with 1.5GHz CPUs (though I will
> confess that the machine in the original report may have had bad RAM).  I
> have run a UP 1.2GHz V240 for months and never seen this panic.
>
> This time the kernel is
> > FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
> csup'd immediately before building.  The full panic this time is
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > l2arc_feed_thread() at l2arc_feed_thread+0xeac
> > fork_exit() at fork_exit+0x9c
> > fork_trampoline() at fork_trampoline+0x8
> >
> > SC Alert: SC Request to send Break to host.
> > KDB: enter: Line break on console
> > [ thread pid 27 tid 100121 ]
> > Stopped at      kdb_enter+0x80: ta      %xcc, 1
> > db> reset
> > ttiimmeeoouutt sshhuuttttiinngg ddoowwnn CCPPUUss..
>
> Half of the memory in this machine is new (well, came with the machine) and
> half is from the aforementioned UP V240 which seemed to work fine (I was
> attempting an upgrade when this happened); none of it (or indeed any of the
> hardware save the disk controller and disks) is common between this and the
> machine reporting below.
>
> Thoughts?  Any help would be greatly appreciated.
> Thanks.
> --nwf;
>
> On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
> >[...]
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > arc_evict() at arc_evict+0x614
> > arc_get_data_buf() at arc_get_data_buf+0x360
> > arc_buf_alloc() at arc_buf_alloc+0x94
> > dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
> > dmu_write() at dmu_write+0xec
> > dmu_recv_stream() at dmu_recv_stream+0x8a8
> > zfs_ioc_recv() at zfs_ioc_recv+0x354
> > zfsdev_ioctl() at zfsdev_ioctl+0xe0
> > devfs_ioctl_f() at devfs_ioctl_f+0xe8
> > kern_ioctl() at kern_ioctl+0x294
> > ioctl() at ioctl+0x198
> > syscallenter() at syscallenter+0x270
> > syscall() at syscall+0x74
> > -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
> > userland() at 0x40e72cc8
> > user trace: trap %o7=0x40c13e24
> > pc 0x40e72cc8, sp 0x7fdffff4641
> > pc 0x40c158f4, sp 0x7fdffff4721
> > pc 0x40c1e878, sp 0x7fdffff47f1
> > pc 0x40c1ce54, sp 0x7fdffff8b01
> > pc 0x40c1dbe0, sp 0x7fdffff9431
> > pc 0x40c1f718, sp 0x7fdffffd741
> > pc 0x10731c, sp 0x7fdffffd831
> > pc 0x10c90c, sp 0x7fdffffd8f1
> > pc 0x103ef0, sp 0x7fdffffe1d1
> > pc 0x4021aff4, sp 0x7fdffffe291
> > done
> >[...]

Apparently this is a locking issue in the ARC code; the ZFS people should
be able to help you.

Marius