From: "Rashid Karimov."
Message-Id: <199509061926.PAA27707@haven.ios.com>
Subject: QUOTAs code causes WEIRD system locks in 210Stable
To: hackers@freebsd.org
Date: Wed, 6 Sep 1995 15:26:22 -0400 (EDT)
Sender: hackers-owner@freebsd.org

Hi there folx,

It looks like we're getting closer to the biggest problem FreeBSD faces in the server market: system lockups, random and confusing.

A long time ago Terry (Lambert) advised me that the QUOTAs implementation in FreeBSD could cause a system lockup. Citation:

> From looking at the quota code, it looks like it may not take the
> dev_t field into account when computing quotas, which means that it
> doesn't use the right mount record to locate the quota file. So it
> locks entrancy across file systems, but doesn't compute transitive
> closure over the directed graph which is the set of locks held in
> all file systems. Really, it should per-fs lock to guarantee
> reentrancy. Probably this is a 10-15 line fix in the quota code
> alone, if someone spends the time working it out (I've just taken a
> quick pass through the code that does vnode-based I/O that the quota
> stuff uses, and I didn't see any obvious bugs).

=-=-=-= End of citation =-=-=-=

But as it happened, starting with 2.0.5 we all got working QUOTAs, and they still work kind of fine with 2.1-STABLE.

But on certain servers here (they are all pretty much the same: P90-120, PCI Adaptecs/SMC EtherPowers, 4000 users) I often get lockups. The system literally dies.

Now that I have DDB compiled in, I was able to see what is going on when the systems lock up. A LOT of processes are sleeping on wmesg "ufslk2" with a stat of 3 and a flag of 00004. By "a lot" I mean probably 100-200 of them. I tried to see what was at the caddr_t they were sleeping on (that should be the inode struct) and I got dev_t = 400 and inode = 2 (!!?). It looks like those processes went to sleep in the ufs_lock() function (file "ufs_vnops.c").

What makes me think it was the QUOTAs code that caused this is that at the beginning there were a couple of "edquota" processes, which I had started in order to change the quotas for a couple of users. When I ran ps a bit later (the system was still alive at that time), I saw that those processes (and only those!) were sleeping on the same "ufslk2" event. I decided to start an extra "edquota" process, just for the heck of it, and the system locked up within a minute. When I ran "ps" from DDB later, nearly all the processes on the system were sleeping on that same event (probably 90% of them; a few were sleeping on the same wmesg but with different wait channels).

So that's about it .... I'm not sure who the author of the QUOTAs code was or where exactly the bug is. I also know that certain people here think the current implementation of the QUOTA mechanism sucks and aren't willing to change it, voting instead for rewriting the thing from scratch.

What should we do about it? I don't have enough time to dedicate to this problem, and frankly I don't have enough kernel programming experience to work on it. At the same time, QUOTAs are a must if FreeBSD is to be used as a user server.
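
For anyone who wants to repeat the tally described above (how many processes sleep on "ufslk2", and whether they share one wait channel), here is a minimal sketch in Python. It assumes the process listing has been copied by hand into lines of "pid wmesg wchan"; that three-column layout, the sample PIDs and the channel addresses are all made up for illustration and are not the real DDB "ps" output format.

```python
from collections import Counter

# Hypothetical capture: one "pid wmesg wchan" triple per line.
# This is NOT the real DDB ps layout; all values are invented.
SAMPLE = """\
101 ufslk2 f0a1b200
102 ufslk2 f0a1b200
103 ufslk2 f0a1b200
104 ufslk2 f0c3d400
105 select f0123456
"""


def tally_sleepers(listing):
    """Count sleeping processes per wmesg and per (wmesg, wchan) pair."""
    by_wmesg = Counter()
    by_channel = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _pid, wmesg, wchan = fields
        by_wmesg[wmesg] += 1
        by_channel[(wmesg, wchan)] += 1
    return by_wmesg, by_channel


if __name__ == "__main__":
    by_wmesg, by_channel = tally_sleepers(SAMPLE)
    for (wmesg, wchan), count in by_channel.most_common():
        print(f"{count:4d}  {wmesg:8s} {wchan}")
```

If one (wmesg, wchan) pair dominates the listing, that suggests many processes are queued behind a single locked object rather than being spread over unrelated locks.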

A bit more about that system: P90, ASUS P54TP4 motherboard, Adaptec 2940 + 2 Seagate Barracudas. QUOTAs are on:

/dev/sd0a       /       ufs     rw              1 1
/dev/sd0s1b     none    swap    sw              0 0
proc            /proc   procfs  rw              0 0
/dev/sd0s1e     /usr    ufs     rw              1 1
/dev/sd0s1f     /var    ufs     rw,userquota    1 1
/dev/sd1s1e     /u/u1   ufs     rw,userquota    1 1
/dev/sd1s1f     /u/u2   ufs     rw,userquota    1 1
/dev/sd1s1g     /u/u3   ufs     rw,userquota    1 1
/dev/sd1s1h     /u/u4   ufs     rw,userquota    1 1

Rashid
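
To make it easier to see which file systems in the listing above actually have quotas switched on, here is a small Python sketch that picks out the entries carrying the "userquota" option. The fstab text is copied from the message; the function name quota_filesystems is just a name chosen for this example, and the parsing assumes the usual whitespace-separated fstab fields (device, mount point, type, options, dump, pass).

```python
# fstab entries as given in the message above.
FSTAB = """\
/dev/sd0a       /       ufs     rw              1 1
/dev/sd0s1b     none    swap    sw              0 0
proc            /proc   procfs  rw              0 0
/dev/sd0s1e     /usr    ufs     rw              1 1
/dev/sd0s1f     /var    ufs     rw,userquota    1 1
/dev/sd1s1e     /u/u1   ufs     rw,userquota    1 1
/dev/sd1s1f     /u/u2   ufs     rw,userquota    1 1
/dev/sd1s1g     /u/u3   ufs     rw,userquota    1 1
/dev/sd1s1h     /u/u4   ufs     rw,userquota    1 1
"""


def quota_filesystems(fstab_text, option="userquota"):
    """Return (device, mountpoint) pairs whose options include `option`."""
    result = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mountpoint, _fstype, options = fields[:4]
        if option in options.split(","):
            result.append((device, mountpoint))
    return result


if __name__ == "__main__":
    for device, mountpoint in quota_filesystems(FSTAB):
        print(f"{device} on {mountpoint}")
```

On this listing the quota-enabled file systems are /var and /u/u1 through /u/u4, spread over both disks; / and /usr carry no quota option.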