From: Tim Bishop
To: Kostik Belousov
Cc: Tim Bishop, freebsd-stable@freebsd.org
Date: Thu, 13 Nov 2008 00:41:02 +0000
Subject: Re: System deadlock when using mksnap_ffs
Message-ID: <20081113004102.GD24360@carrick.bishnet.net>
In-Reply-To: <20081112194735.GK47073@deviant.kiev.zoral.com.ua>

On Wed, Nov 12, 2008 at 09:47:35PM +0200,
Kostik Belousov wrote:
> On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > I've been playing around with snapshots lately but I've got a problem on
> > one of my servers running 7-STABLE amd64:
> >
> > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb@paladin:/usr/obj/usr/src/sys/PALADIN amd64
> >
> > I run the mksnap_ffs command to take the snapshot and some time later
> > the system completely freezes up:
> >
> > paladin# cd /u2/.snap/
> > paladin# mksnap_ffs /u2 test.1
> >
> > It only happens on this one filesystem, though, which might be to do
> > with its size. It's not over the 2TB marker, but it's pretty close. It's
> > also backed by a hardware RAID system, although a smaller filesystem on
> > the same RAID has no issues.
> >
> > Filesystem   1K-blocks   Used       Avail      Capacity  Mounted on
> > /dev/da0s1a  2078881084  921821396  990749202  48%       /u2
> >
> > To clarify "completely freezes up": unresponsive to all services over
> > the network, except ping. On the console I can switch between the ttys,
> > but none of them respond. The only way out is to hit the reset button.
>
> You need to provide information described in the
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> and especially
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Ok, I've done that, and removed the patch that seemed to fix things.

The first thing I notice after doing this on the console is that I can
still ctrl+t the process:

load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k

But the top and ps I left running on other ttys have all stopped
responding. Also the following kernel message came out:

Expensive timeout(9) function: 0xffffffff802ce380(0xffffff000677ca50) 0.006121001 s

There is also still some disk I/O.

Dropping to ddb worked, but I don't have a serial console so I can't
paste the output.
ps shows mksnap_ffs in newbuf, as we already saw. A trace of mksnap_ffs
looks like this:

Tracing pid 2603 tid 100214 td 0xffffff0006a0e370
sched_switch() at sched_switch+0x2a1
mi_switch() at mi_switch+0x233
sleepq_switch() at sleepq_switch+0xe9
sleepq_wait() at sleepq_wait+0x44
_sleep() at _sleep+0x351
getnewbuf() at getnewbuf+0x2e1
getblk() at getblk+0x30d
setup_allocindir_phase2() at setup_allocindir_phase2+0x338
softdep_setup_allocindir_page() at softdep_setup_allocindir_page+0xa7
ffs_balloc_ufs2() at ffs_balloc_ufs2+0x121e
ffs_snapshot() at ffs_snapshot+0xc52
ffs_mount() at ffs_mount+0x735
vfs_donmount() at vfs_donmount+0xeb5
kernel_mount() at kernel_mount+0xa1
ffs_cmount() at ffs_cmount+0x92
mount() at mount+0x1cc
syscall() at syscall+0x1f6
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (21, FreeBSD ELF64, mount), rip = 0x80068636c, rsp = 0x7fffffffe518, rbp = 0x8008447a0 ---

show pcpu shows cpuid 3 (quad core machine) in thread "swi6: Giant
taskq". All the other cpus are idle.

show locks shows:

exclusive sleep mutex Giant r = 0 (0xffffffff806ae040) locked @ /usr/src/sys/kern/kern_intr.c:1087

There are two other locks shown by show all locks, one for sshd and one
for mysqld, both in kern/uipc_sockbuf.c.

show lockedvnods shows mksnap_ffs has a lock on da0s1a with ffs_vget at
the top of the stack.

Sorry for any typos. I'll sort out a serial cable if more is needed :-)

Tim.

--
Tim Bishop
http://www.bishnet.net/tim/
PGP Key: 0x5AE7D984