Date: Wed, 6 Sep 1995 13:33:35 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: rashid@haven.ios.com (Rashid Karimov.)
Cc: hackers@FreeBSD.ORG
Subject: Re: QUOTAs code causes WEIRD system locks in 210Stable
Message-ID: <199509062033.NAA00669@phaeton.artisoft.com>
In-Reply-To: <199509061926.PAA27707@haven.ios.com> from "Rashid Karimov." at Sep 6, 95 03:26:22 pm
> A LOT of processes are sleeping on wmesg "ufslk2" with stat of 3
> and flag of 00004. Saying a lot I mean probably 100-200 of them.

It's a deadlock between VOP_LOCK and VFS_LOCK operations. The lock graph is not hierarchical, and it needs to be, or it's impossible to compute transitive closure over the graph.

You can effectively flatten the graph by putting the quota file in the root of the file system to which the quota entries themselves apply. This will avoid the a->B b->A deadly embrace deadlock caused by directory traversal not unlocking and backing off to the root on failure. (That deadlock is also fixable by asserting the VFS_LOCK and waiting for all VOP_LOCK locks to drain out before granting the VFS_LOCK request to the caller.)

This still leaves the A->a a->A and A->a a->B b->A deadlocks possible. To avoid those, don't run quotas in a submount of a file system that has quotas, i.e.:

    /             <- quotas NOT enabled
    |
    |
    /usr          <- quotas enabled
    |
    |
    /usr/home     <- quotas NOT enabled

This is because the quota files are rooted relative to the system root, and you must traverse directories in order to get to the quota file.

Quota editing probably wants to happen through file system operations other than reads/writes on the quota files themselves, i.e. a system call interface that calls the file system quota ops.

> I tried to see what was on the caddr_t they were sleeping at ( that
> should be i_node struct) and I got dev_t = 400 and inode = 2 (!!?).

Inode 2 is the / inode for any file system. If /usr is a file system mounted on /usr, a subdirectory of /, then the mount point traversal puts you in inode 2 on the /usr file system.

It's the parent/child lock sequence

    lock a / lock A / unlock a / lock b / unlock A / lock B / unlock b

used to get 'B' that causes the deadlock over the mount point traversal. Because of the VFS_LOCK, the lock on the parent vnode taken before the traversal cannot be recovered (unlocked and backed off) when the traversal fails.
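A minimal user-space sketch of the transitive-closure check mentioned above, assuming a lock-order graph where an edge i -> j means "some code path requests lock j while holding lock i". Warshall's algorithm computes reachability; any lock that can reach itself sits on a cycle, i.e. a deadly embrace is possible. The lock names and edges are hypothetical, chosen to echo the / and /usr example rather than traced from the FFS/UFS sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NLOCKS 4

/* Hypothetical lock identifiers, loosely named after the /usr example. */
enum { ROOT_DIR, USR_VFS, USR_ROOT, QUOTA_FILE };

/* g[i][j] is true if some code path requests lock j while holding lock i. */
typedef bool lock_graph[NLOCKS][NLOCKS];

void add_order(lock_graph g, int held, int wanted)
{
    assert(held >= 0 && held < NLOCKS);
    assert(wanted >= 0 && wanted < NLOCKS);
    g[held][wanted] = true;
}

/* Warshall's algorithm: out[i][j] ends up true iff j is reachable from i. */
void transitive_closure(lock_graph in, lock_graph out)
{
    memcpy(out, in, sizeof(lock_graph));
    for (int k = 0; k < NLOCKS; k++)
        for (int i = 0; i < NLOCKS; i++)
            for (int j = 0; j < NLOCKS; j++)
                if (out[i][k] && out[k][j])
                    out[i][j] = true;
}

/* Returns the first lock that can reach itself (it sits on a cycle),
 * or -1 if the ordering is hierarchical (acyclic). */
int find_cycle(lock_graph g)
{
    lock_graph closure;
    transitive_closure(g, closure);
    for (int i = 0; i < NLOCKS; i++)
        if (closure[i][i])
            return i;
    return -1;
}

/* Traversal from / across the mount point into /usr, then to the quota
 * file: every edge points "downward", so there is no cycle. */
int demo_hierarchical(void)
{
    lock_graph g = {{false}};
    add_order(g, ROOT_DIR, USR_VFS);
    add_order(g, USR_VFS, USR_ROOT);
    add_order(g, USR_ROOT, QUOTA_FILE);
    return find_cycle(g);
}

/* Same edges plus a hypothetical path that holds the quota file vnode
 * while locking a directory on / (where that quota file lives). */
int demo_embrace(void)
{
    lock_graph g = {{false}};
    add_order(g, ROOT_DIR, USR_VFS);
    add_order(g, USR_VFS, USR_ROOT);
    add_order(g, USR_ROOT, QUOTA_FILE);
    add_order(g, QUOTA_FILE, ROOT_DIR);
    return find_cycle(g);
}
```

In this model, keeping the quota file in the root of the file system it applies to amounts to removing the QUOTA_FILE -> ROOT_DIR edge, after which find_cycle() reports no cycle.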
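One way to make the lock graph hierarchical by construction is a ranked lock: each lock gets a fixed rank, and a request that goes against the order is refused with EWOULDBLOCK instead of being slept on, so the caller can release and back off. The sketch below assumes ranks 1 to 4 for a/A/b/B in the order of the parent/child sequence above; the rlock/holder types and the rank assignment are hypothetical illustrations, not an existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_HELD 8

struct holder;

/* A lock with a fixed position in the hierarchy; ranks start at 1 and
 * lower ranks must be acquired before higher ones. */
struct rlock {
    int rank;
    const struct holder *owner;   /* NULL when free */
};

/* The locks one process currently holds (hypothetical bookkeeping). */
struct holder {
    struct rlock *held[MAX_HELD];
    int nheld;
};

static int max_held_rank(const struct holder *h)
{
    int max = 0;
    for (int i = 0; i < h->nheld; i++)
        if (h->held[i]->rank > max)
            max = h->held[i]->rank;
    return max;
}

/* Returns 0 if granted; EWOULDBLOCK if the request goes against the
 * hierarchy (caller must release and back off); EBUSY if the request is
 * in order but the lock is taken (a real lock manager would sleep here,
 * which cannot form a cycle because every waiter only waits "upward");
 * EDEADLK on a recursive request. */
int rlock_acquire(struct holder *h, struct rlock *l)
{
    if (l->owner == h)
        return EDEADLK;
    if (l->rank <= max_held_rank(h))
        return EWOULDBLOCK;
    if (l->owner != NULL)
        return EBUSY;
    assert(h->nheld < MAX_HELD);
    l->owner = h;
    h->held[h->nheld++] = l;
    return 0;
}

/* Releases may happen in any order, which permits lock coupling. */
void rlock_release(struct holder *h, struct rlock *l)
{
    assert(l->owner == h);
    for (int i = 0; i < h->nheld; i++) {
        if (h->held[i] == l) {
            h->held[i] = h->held[--h->nheld];
            l->owner = NULL;
            return;
        }
    }
}

/* The coupling sequence lock a/lock A/unlock a/lock b/unlock A/lock B/
 * unlock b stays in rank order, so every request is granted. */
int demo_coupling(void)
{
    struct rlock a = {1, NULL}, A = {2, NULL}, b = {3, NULL}, B = {4, NULL};
    struct holder p = { .nheld = 0 };
    int err = 0;
    err |= rlock_acquire(&p, &a);
    err |= rlock_acquire(&p, &A);
    rlock_release(&p, &a);
    err |= rlock_acquire(&p, &b);
    rlock_release(&p, &A);
    err |= rlock_acquire(&p, &B);
    rlock_release(&p, &b);
    rlock_release(&p, &B);
    return err;   /* 0 if every acquire was granted */
}

/* Holding b and then asking for A goes against the hierarchy. */
int demo_reversal(void)
{
    struct rlock A = {2, NULL}, b = {3, NULL};
    struct holder p = { .nheld = 0 };
    int first = rlock_acquire(&p, &b);
    int second = rlock_acquire(&p, &A);
    rlock_release(&p, &b);
    return first == 0 ? second : -1;
}
```

Returning an error instead of sleeping is what lets a caller unlock and restart from the root, which is the back-off that directory traversal currently lacks.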
So this is the expected behaviour for a quota'd FS mounted on a quota'd FS at mount point traversal time (or at traversal from the FS containing the quota file for the quota'd FS to the quota'd FS).

> What makes me think that was QUOTAs code which caused this is
> that in the beginning there were a couple of processes "edquota"
> which I started to change the QUOTAs for a couple of users.

This should be prevented, or else the VFS_LOCK on the quota'd FS and the VOP_LOCK on the quota'd FS's quota file should be locked in a/A/b/B hierarchy order, with the lock not granted until the other process closes the quota file.

> So that's about it .... I'm not sure who was the author of the
> QUOTAs code and where is the bug exactly.

The QUOTA code is from the original BSD FFS/UFS sources.

> I also know that certain ppl here think that the current
> implementation of QUOTA mechanism sucks and aren't willing to
> change it, voting for rewriting the thing form the scratch.

A certain amount of rewriting is inevitable. Whether this takes the form of enforcing the placement of the quota file in the root of the FS being quota'd plus minor mods to VFS_LOCK, or of a full rewrite and the import of a hierarchical lock manager (which should be there anyway, so that flock() operations can return EWOULDBLOCK on a deadly embrace deadlock), is really immaterial.

My vote for a rewrite would make the quota code a stackable layer, where you mount /usr on /usr and do quota file I/O, so you can put quotas on DOS and other partitions.

Currently, I do not have time for this. I'm chasing issues in the namei() and lookup() code: the unionfs and portalfs use of the bogus nameidata fields for consumption of path components, and the in-place expansion of symlinks into the pathname buffer, which causes NFS-mounted symlinks to exceed MAXPATHLEN when the mount depth differs from that of the source host's FS.

I might have time for this after FreeBSD runs on a couple of platforms and supports SMP and kernel-level file system multithreading.
8-(

> What should we do about it ? I don't have enough time to dedicate
> to this problem and frankly don't have enough kernel programming
> experience to work on it.
> The same time QUOTAs are a must for FBSD to be used as a user
> server.

Limit the way in which you use quotas.

Don't run multiple instances of edquota.

Turn quotas off before running edquota, and back on when you are done editing.

Use the quotactl(2) interface to turn quotas on with specific file paths, so that the quota files live on the drives where quotas are being enforced (i.e. get rid of the 'userquota' option and turn quotas on manually per fs in your /etc/rc after the file systems have been mounted).

That should keep you away from at least the known failure modes.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.