Date: Thu, 13 Nov 2008 02:45:14 -0800
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: Kostik Belousov <kostikbel@gmail.com>
Cc: Tim Bishop <tim@bishnet.net>, freebsd-stable@freebsd.org
Subject: Re: System deadlock when using mksnap_ffs
Message-ID: <20081113104514.GA17589@icarus.home.lan>
In-Reply-To: <20081113102642.GQ47073@deviant.kiev.zoral.com.ua>
References: <20081112175826.GD26195@carrick.bishnet.net> <20081112194735.GK47073@deviant.kiev.zoral.com.ua> <20081113004102.GD24360@carrick.bishnet.net> <20081113044200.GA10419@icarus.home.lan> <20081113102642.GQ47073@deviant.kiev.zoral.com.ua>

On Thu, Nov 13, 2008 at 12:26:42PM +0200, Kostik Belousov wrote:
> On Wed, Nov 12, 2008 at 08:42:00PM -0800, Jeremy Chadwick wrote:
> > On Thu, Nov 13, 2008 at 12:41:02AM +0000, Tim Bishop wrote:
> > > On Wed, Nov 12, 2008 at 09:47:35PM +0200, Kostik Belousov wrote:
> > > > On Wed, Nov 12, 2008 at 05:58:26PM +0000, Tim Bishop wrote:
> > > > > I've been playing around with snapshots lately but I've got a problem on
> > > > > one of my servers running 7-STABLE amd64:
> > > > >
> > > > > FreeBSD paladin 7.1-PRERELEASE FreeBSD 7.1-PRERELEASE #8: Mon Nov 10 20:49:51 GMT 2008 tdb@paladin:/usr/obj/usr/src/sys/PALADIN amd64
> > > > >
> > > > > I run the mksnap_ffs command to take the snapshot and some time later
> > > > > the system completely freezes up:
> > > > >
> > > > > paladin# cd /u2/.snap/
> > > > > paladin# mksnap_ffs /u2 test.1
> > > > >
> > > > > It only happens on this one filesystem, though, which might be to do
> > > > > with its size. It's not over the 2TB marker, but it's pretty close. It's
> > > > > also backed by a hardware RAID system, although a smaller filesystem on
> > > > > the same RAID has no issues.
> > > > >
> > > > > Filesystem   1K-blocks       Used     Avail Capacity  Mounted on
> > > > > /dev/da0s1a 2078881084  921821396 990749202    48%    /u2
> > > > >
> > > > > To clarify "completely freezes up": unresponsive to all services over
> > > > > the network, except ping. On the console I can switch between the ttys,
> > > > > but none of them respond. The only way out is to hit the reset button.
> > > >
> > > > You need to provide information described in the
> > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
> > > > and especially
> > > > http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> > >
> > > Ok, I've done that, and removed the patch that seemed to fix things.
> > >
> > > The first thing I notice after doing this on the console is that I can
> > > still ctrl+t the process:
> > >
> > > load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k
> > >
> > > But the top and ps I left running on other ttys have all stopped
> > > responding.
> >
> > Then in my book, the patch didn't fix anything. :-) The system is
> > still "deadlocking"; snapshot generation **should not** wedge the system
> > hard like this.
> You systematically mix two completely different issues:
> - first one is the _deadlock_ experienced by Tim;

Re-read what he wrote. Quote:

"Ok, I've done that, and removed the patch that seemed to fix things.

The first thing I notice after doing this on the console is that I can
still ctrl+t the process:

load: 0.14 cmd: mksnap_ffs 2603 [newbuf] 0.00u 10.75s 0% 1160k

But the top and ps I left running on other ttys have all stopped
responding."

If he can press Control-T, it means SIGINFO can be sent to the
mksnap_ffs process, and the process responds with that information.

So, the system is not deadlocked -- meaning, I believe what he
experiences is what others experience (the system becomes completely
unusable during mksnap_ffs running, but DOES NOT hang or lock up, it
just becomes so god-awful slow that processes on the machine literally
sit and spin for minutes at a time).

> - second one is the slowdown during snapshot creation.
> In fact, I may count third, where dump itself hangs, as a usermode process,
> but kernel still normally operates.
>
> Patch posted should fix or paper over the first issue for practical means.
> Third issue most likely fixed by the subr_sleepqueue race fix.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |