From owner-freebsd-stable@FreeBSD.ORG  Fri Sep  9 20:10:46 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DE5351065670;
	Fri,  9 Sep 2011 20:10:45 +0000 (UTC) (envelope-from hrs@FreeBSD.org)
Received: from mail.allbsd.org (gatekeeper-int.allbsd.org
	[IPv6:2001:2f0:104:e002::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 2EEB68FC13;
	Fri,  9 Sep 2011 20:10:44 +0000 (UTC)
Received: from alph.allbsd.org ([IPv6:2001:2f0:104:e010:862b:2bff:febc:8956])
	(authenticated bits=128)
	by mail.allbsd.org (8.14.4/8.14.4) with ESMTP id p89KASb9005483;
	Sat, 10 Sep 2011 05:10:38 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Received: from localhost (localhost [IPv6:::1]) (authenticated bits=0)
	by alph.allbsd.org (8.14.4/8.14.4) with ESMTP id p89KARbm026576;
	Sat, 10 Sep 2011 05:10:28 +0900 (JST) (envelope-from hrs@FreeBSD.org)
Date: Sat, 10 Sep 2011 04:48:41 +0900 (JST)
Message-Id: <20110910.044841.232160047547388224.hrs@allbsd.org>
To: pjd@FreeBSD.org, mm@FreeBSD.org, freebsd-stable@FreeBSD.org
From: Hiroki Sato <hrs@FreeBSD.org>
In-Reply-To: <20110907.094717.2272609566853905102.hrs@allbsd.org>
References: <20110903.071908.971549835606878048.hrs@allbsd.org>
	<CAJ-FndAChGndC=LkZNi7i6mOt+Spw3-OftO9rH0+5WNnVWzuBw@mail.gmail.com>
	<20110907.094717.2272609566853905102.hrs@allbsd.org>
X-PGPkey-fingerprint: BDB3 443F A5DD B3D0 A530  FFD7 4F2C D3D8 2793 CF2D
X-Mailer: Mew version 6.3.51 on Emacs 23.3 / Mule 6.0 (HANACHIRUSATO)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: clamav-milter 0.97 at gatekeeper.allbsd.org
X-Virus-Status: Clean
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.3
	(mail.allbsd.org [IPv6:2001:2f0:104:e001::32]);
	Sat, 10 Sep 2011 05:10:42 +0900 (JST)
X-Spam-Status: No, score=-104.6 required=13.0 tests=BAYES_00,
	CONTENT_TYPE_PRESENT, RDNS_NONE, SPF_SOFTFAIL,
	USER_IN_WHITELIST autolearn=no version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	gatekeeper.allbsd.org
Cc: attilio@FreeBSD.org, kib@FreeBSD.org
Subject: ZFS panic on a RELENG_8 NFS server (Was: panic: spin lock held too
 long (RELENG_8 from today))
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 09 Sep 2011 20:10:46 -0000

Hiroki Sato <hrs@freebsd.org> wrote
  in <20110907.094717.2272609566853905102.hrs@allbsd.org>:

hr>  During this investigation an disk has to be replaced and resilvering
hr>  it is now in progress.  A deadlock and a forced reboot after that
hr>  make recovering of the zfs datasets take a long time (for committing
hr>  logs, I think), so I will try to reproduce the deadlock and get a
hr>  core dump after it finished.

 I think I have reproduced the symptoms.  I cannot tell whether they
 are exactly the same as what occurred on my box before, because the
 kernel has since been replaced with one built with some debugging
 options, but they are at least reproducible.

 There are two symptoms.  One is a panic.  The DDB output at the time
 of the panic is as follows:

----
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x100000040
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff8065b926
stack pointer	        = 0x28:0xffffff8257b94d70
frame pointer	        = 0x28:0xffffff8257b94e10
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 992 (nfsd: service)
[thread pid 992 tid 100586 ]
Stopped at      witness_checkorder+0x246:       movl    0x40(%r13),%ebx

db> bt
Tracing pid 992 tid 100586 td 0xffffff00595d9000
witness_checkorder() at witness_checkorder+0x246
_sx_slock() at _sx_slock+0x35
dmu_bonus_hold() at dmu_bonus_hold+0x57
zfs_zget() at zfs_zget+0x237
zfs_dirent_lock() at zfs_dirent_lock+0x488
zfs_dirlook() at zfs_dirlook+0x69
zfs_lookup() at zfs_lookup+0x26b
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x81
vfs_cache_lookup() at vfs_cache_lookup+0xf0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40
lookup() at lookup+0x384
nfsvno_namei() at nfsvno_namei+0x268
nfsrvd_lookup() at nfsrvd_lookup+0xd6
nfsrvd_dorpc() at nfsrvd_dorpc+0x745
nfssvc_program() at nfssvc_program+0x447
svc_run_internal() at svc_run_internal+0x51b
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x8006a031c, rsp = 0x7fffffffe6c8, rbp = 0x6 ---
----

 The complete output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_1/pool-zfs-20110909-1.txt

 The other is a hang on ZFS access.  The kernel keeps running without
 a panic, but any program that accesses a ZFS dataset becomes
 unresponsive.  The DDB output can be found at:

  http://people.allbsd.org/~hrs/zfs_panic_20110909_2/pool-zfs-20110909-2.txt

 The trigger for both was some access to a ZFS dataset from the NFS
 clients.  Because the access pattern was complex I could not narrow
 down the culprit, but it seems to be timing-dependent, and simply
 doing "rm -rf" locally on the server can sometimes trigger them.
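
 For the local "rm -rf" case, a loop along the following lines might
 help provoke it.  This is only a sketch and not the exact commands
 run on the server: the TARGET path, the ROUNDS/DIRS/FILES counts and
 the stress_round helper are all assumptions made for illustration,
 and TARGET would need to point at a scratch directory on the affected
 ZFS dataset.

```shell
#!/bin/sh
# Hypothetical stress loop: repeatedly build a small file tree and
# remove it with "rm -rf".  All names and counts here are assumptions.
TARGET="${TARGET:-/tmp/zfs_rm_stress}"   # assumed scratch directory
ROUNDS="${ROUNDS:-3}"                    # number of create/remove cycles
DIRS="${DIRS:-10}"                       # directories per cycle
FILES="${FILES:-20}"                     # empty files per directory

stress_round() {
    # Populate $1 with DIRS directories holding FILES empty files each.
    d=1
    while [ "$d" -le "$DIRS" ]; do
        mkdir -p "$1/d$d" || return 1
        f=1
        while [ "$f" -le "$FILES" ]; do
            : > "$1/d$d/f$f" || return 1
            f=$((f + 1))
        done
        d=$((d + 1))
    done
    # Remove the whole tree in one go, as in the "rm -rf" case.
    rm -rf "$1"
}

r=1
while [ "$r" -le "$ROUNDS" ]; do
    stress_round "$TARGET/round$r" || exit 1
    echo "round $r done"
    r=$((r + 1))
done

# Remove the scratch directory itself if nothing else is left in it.
rmdir "$TARGET" 2>/dev/null || true
```

 Since the problem looks timing-dependent, larger counts or several
 copies running in parallel may be needed, and a run that triggers
 nothing does not show that the problem is absent.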

 The crash dump and the kernel can be found at the following URLs:

  panic:
    http://people.allbsd.org/~hrs/zfs_panic_20110909_1/

  no panic but unresponsive:
    http://people.allbsd.org/~hrs/zfs_panic_20110909_2/

  kernel:
    http://people.allbsd.org/~hrs/zfs_panic_20110909_kernel/

-- Hiroki