Date: Wed, 29 Apr 2009 10:44:26 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Ben Kelly <ben@wanderview.com>
Cc: freebsd-current@freebsd.org
Subject: Re: [patch] zfs livelock and thread priorities
Message-ID: <20090429104426.153917n5occcc5m0@webmail.leidinger.net>
In-Reply-To: <4D8E4457-89AA-4F19-9960-E090D3B8E319@wanderview.com>
References: <DC9F2088-A0AF-467D-8574-F24A045ABD81@wanderview.com> <ed91d4a80904131636u18c90474w7cdaa57bc7000e02@mail.gmail.com> <08D7DC2A-68BE-47B6-8D5D-5DE6B48F87E5@wanderview.com> <AC3C4C3F-40C6-4AF9-BAF3-2C4D1E444839@wanderview.com> <ed91d4a80904142135n429dea52o672abf51116fa707@mail.gmail.com> <ed91d4a80904241816r28531a04r2dc70fa8960d430e@mail.gmail.com> <bc2d970904241947r50576efbgc93164a9e4dd297d@mail.gmail.com> <ed91d4a80904242059n3642a40aud55df6d1b6a1695@mail.gmail.com> <FC83DB1E-6C08-4BD4-8BC9-437D714FEE9E@wanderview.com> <ed91d4a80904271839l49420c8rbcfd52dd6e72eb83@mail.gmail.com> <ed91d4a80904281111q3b9a3c45vc9fcf129dde8c10d@mail.gmail.com> <F86D3461-3ABD-4A56-B9A6-36857364DF4B@wanderview.com> <4D8E4457-89AA-4F19-9960-E090D3B8E319@wanderview.com>
Quoting Ben Kelly <ben@wanderview.com> (from Tue, 28 Apr 2009 17:19:29 -0400):

>
> On Apr 28, 2009, at 4:52 PM, Ben Kelly wrote:
>
>> On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
>>> My system had eventually deadlocked overnight, though it took much
>>> longer than before to reach that point.
>>>
>>> In the end I've got many many processes sleeping in zio_wait with no
>>> disk activity whatsoever.
>>> I'm not sure if that's the same issue or not.
>>>
>>> Here are stack traces for all processes -- http://pastebin.com/f364e1452
>>> I've got the core saved, so if you want me to dig out some more info,
>>> let me know if/how I could help.
>>
>> It looks like there is a possible deadlock between zfs_zget() and
>> zfs_zinactive(). They both acquire a lock via
>> ZFS_OBJ_HOLD_ENTER(). The zfs_zinactive() path can get called
>> indirectly from within zio_done(). The zfs_zget() can in turn
>> block waiting for zio_done()'s completion while holding the object
>> lock.
>>
>> The following patch might help:
>>
>> http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
>>
>> This simply bails out of the inactive processing if the object lock
>> is already held. I'm not sure if this is 100% correct or not as it
>> cannot verify there are references to the vnode. I also tried
>> executing the zfs_zinactive() logic in a taskqueue to avoid the
>> deadlock, but that caused other deadlocks to occur.
>
> Sorry to reply to my own mail, but I came up with a better solution
> that I think is correct. I just vref() the vnode and then vrele()
> it again from a taskqueue to restart the zfs_zinactive() processing
> if it's still applicable.

This sounds somewhat related to the issues we discussed in the thread about unlimited ARC cache growth. Maybe the large ARC cache size was a red herring, and this is the real cause of the panics / watchdog triggers I see on the system in question.
I'm preparing a kernel with this patch and your zfs-prio patch, but I don't think I can test it fully this week: if I'm lucky I can install the new kernel, but I don't expect to be able to put load on the system.

Bye,
Alexander.

--
The length of a marriage is inversely proportional to the amount spent on the wedding.

http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137