From: John Baldwin
To: freebsd-current@freebsd.org
Cc: Artem Belevich
Date: Wed, 27 May 2009 14:22:29 -0400
Subject: Re: ZFS : panic("sleeping thread")

On Wednesday 27 May 2009 1:58:49 pm Artem Belevich wrote:
> Hi,
>
> While recent ZFS improvements got rid of random hangs I used to see,
> there's still one problem that I keep running into -- panic in ZFS
> under heavy load. I can reproduce it by doing a build with -j16 in a
> jail running i386 binaries on -CURRENT/amd64 running on a box with a
> quad-core CPU. It takes a while to reproduce, but it usually shows up
> within a couple of hours.
>
> Sleeping thread (tid 100606, pid 32147) owns a non-sleepable lock
> sched_switch() at sched_switch+0xed
> mi_switch() at mi_switch+0x16f
> sleepq_wait() at sleepq_wait+0x42
> _sx_xlock_hard() at _sx_xlock_hard+0x1f0
> _sx_xlock() at _sx_xlock+0x4e
> rrw_exit() at rrw_exit+0x1d
> zfs_freebsd_getattr() at zfs_freebsd_getattr+0x2be
> VOP_GETATTR_APV() at VOP_GETATTR_APV+0x44
> filt_vfsread() at filt_vfsread+0x51
> knote() at knote+0xc2
> VOP_WRITE_APV() at VOP_WRITE_APV+0x11f
> vn_write() at vn_write+0x279
> dofilewrite() at dofilewrite+0x85
> kern_writev() at kern_writev+0x60
> write() at write+0x54
> ia32_syscall() at ia32_syscall+0x236
> Xint0x80_syscall() at Xint0x80_syscall+0x85
> --- syscall (4, FreeBSD ELF32, write), rip = 0x78162153, rsp =
> 0xffff945c, rbp = 0xffff9478 ---
>
> It appears that locking within ZFS conflicts with vnode locking. The
> back-trace is always the same.

I think it is the knote locking that is actually the problem. knote()
acquires KQ_LOCK, and I think this lock needs to be dropped while
invoking VOP_GETATTR().

--
John Baldwin
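The fix proposed above (drop the kqueue lock while the filter calls into the filesystem) can be illustrated with a minimal userspace sketch. This is not the FreeBSD kqueue code: `struct kq_model`, `kq_model_init()`, `kq_model_knote()` and `demo_filter()` are hypothetical names, a pthread mutex stands in for KQ_LOCK, and the filter callback stands in for filt_vfsread() calling VOP_GETATTR(), which on ZFS may sleep on an sx lock. The knote is marked busy ("in flux") under the lock, the lock is released around the filter, and it is then reacquired to record the result.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kqueue; kq_lock plays the role of KQ_LOCK. */
struct kq_model {
    pthread_mutex_t kq_lock;
    bool influx;   /* set while the filter runs without kq_lock held */
    int  events;   /* number of times the filter reported "ready" */
};

/* Filter callback; in the kernel case this is where VOP_GETATTR() may sleep. */
typedef bool (*filter_fn)(void *arg);

void kq_model_init(struct kq_model *kq)
{
    pthread_mutex_init(&kq->kq_lock, NULL);
    kq->influx = false;
    kq->events = 0;
}

/*
 * Deliver one event: mark the knote busy under kq_lock, drop kq_lock
 * around the (possibly sleeping) filter, then reacquire it to record
 * the filter's answer.
 */
void kq_model_knote(struct kq_model *kq, filter_fn filter, void *arg)
{
    pthread_mutex_lock(&kq->kq_lock);
    kq->influx = true;                  /* knote must not be detached now */
    pthread_mutex_unlock(&kq->kq_lock);

    bool ready = filter(arg);           /* runs with no kq_lock held */

    pthread_mutex_lock(&kq->kq_lock);
    kq->influx = false;
    if (ready)
        kq->events++;
    pthread_mutex_unlock(&kq->kq_lock);
}

/*
 * Demo filter: reports "ready" only if kq_lock is NOT held by anyone
 * (trylock succeeds) and the caller marked the knote in flux.
 */
bool demo_filter(void *arg)
{
    struct kq_model *kq = arg;

    if (pthread_mutex_trylock(&kq->kq_lock) != 0)
        return false;                   /* lock still held: the bug avoided */
    pthread_mutex_unlock(&kq->kq_lock);
    return kq->influx;
}
```

The in-flux flag matters because, once the lock is dropped, another thread could otherwise detach or free the knote while the filter is still running; in this single-threaded sketch the flag is only recorded, not enforced by other code paths.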