Date: Tue, 14 Jan 2014 16:48:08 +0200 From: Andriy Gapon <avg@FreeBSD.org> To: krichy@tvnetwork.hu Cc: freebsd-fs@FreeBSD.org Subject: Re: kern/184677 / ZFS snapshot handling deadlocks Message-ID: <52D54E28.5020208@FreeBSD.org> In-Reply-To: <alpine.DEB.2.10.1401141317160.27609@krichy.tvnetwork.hu> References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu> <20131220134405.GE1658@garage.freebsd.pl> <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu> <alpine.DEB.2.10.1312250614001.2785@krichy.tvnetwork.hu> <52D5239B.6000203@FreeBSD.org> <alpine.DEB.2.10.1401141317160.27609@krichy.tvnetwork.hu>
next in thread | previous in thread | raw e-mail | index | archive | help
on 14/01/2014 14:25 krichy@tvnetwork.hu said the following: > Dear Andriy, > > I've also sent a detailed explanation of my changes to Will Andrews, I could > also forward that to you. This would be nice. > You are right, these areas are critical areas, though > the patch is very small and solves some of the issues. And that is also true > that I am not a really FreeBSD developer, just I compared illumos and freebsd > code, debugged these locking issues for a while. Then at the end I think the > result patch is reasonably small, and I also tried to follow some freebsd > recommendations. I hope to be able to examine your proposed patch soon-ish. > If I can help somehow I am willing to do. Also I would accept any critics on my > work. It is for a month or two now when we discovered these bugs, and > unfortunately it happened on our production servers, and with these patches our > system works since then. > > But of course, that is not a proof of concept. Just in case, does the patched code pass my concurrent ls -l test? :-) The "test" is here: http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/18924/focus=19056 > > Kojedzinszky Richard > Euronet Magyarorszag Informatikai Zrt. > > On Tue, 14 Jan 2014, Andriy Gapon wrote: > >> Date: Tue, 14 Jan 2014 13:46:35 +0200 >> From: Andriy Gapon <avg@FreeBSD.org> >> To: krichy@tvnetwork.hu >> Cc: freebsd-fs@FreeBSD.org >> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >> >> on 25/12/2013 07:22 krichy@tvnetwork.hu said the following: >>> I've made a new patch again, which fixes most of my earlier issues. Mainly, I've >>> enabled shared vnode locks for GFS vnodes - is it acceptable? And that way, >>> deadlock cases reduced much, and also the patch is more clear (at least to me). >>> One thing still remains, the spa_namespace_lock race I mentioned before, I dont >>> know how to handle that. >>> >>> Waiting for comments on this. >> >> Richard, >> >> first of all, apologies for the delay with a reply and for not having any >> comment on the essence of your patch... >> >> I do have the following meta-comment. >> >> - working with FreeBSD VFS is hard >> - porting code that was written for a very different VFS model to FreeBSD VFS is >> even harder >> - doing the same for the code that plays various tricks, like .zfs support code >> does, is harder again >> - reviewing somebody else's changes to that kind of code is harder still than >> making such changes >> >> This is quite an unfortunate situation. I am not much surprised by the lack of >> followups to your analysis and patches. >> I am saying this as someone who spent some time analyzing and trying to fix the >> .zfs code and ZFS<->VFS interaction in general. See e.g. >> http://thread.gmane.org/gmane.os.freebsd.devel.file-systems/18924/focus=19056 >> >> My opinion is that .zfs code breaks several fundamental FreeBSD VFS contracts. >> The most glaring violation is that VOP_INACTIVE makes a vnode invalid. >> I think that trying to fix .zfs code by patching individual problems here and >> there is an uphill battle. I think that the same also applies to ZFS ZPL code >> but in a less obvious way. >> The code in many cases just pretends to play by VFS rules by satisfying most >> obvious assertions, but it does not really try to obey VFS contracts. For >> example, almost all locks in znode are mostly redundant given VFS vnode locking. >> But in some cases they are not sufficient precisely because VFS expects its >> locking to be used rather than ZFS internal locking. The most obvious example >> is interaction of VOP_RENAME with other VOPs. >> >> In any case, thanks for your work! I am trying to find some time to review it. >> >> -- >> Andriy Gapon >> -- Andriy Gapon
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?52D54E28.5020208>