From owner-freebsd-stable Thu May 13 2:54:38 1999 Delivered-To: freebsd-stable@freebsd.org Received: from lazlo.internal.steam.com (lazlo.steam.com [199.108.84.37]) by hub.freebsd.org (Postfix) with ESMTP id C1C8A150E3 for ; Thu, 13 May 1999 02:54:34 -0700 (PDT) (envelope-from cliff@steam.com) Received: from lazlo.internal.steam.com (cliff@lazlo.internal.steam.com [192.168.32.2]) by lazlo.internal.steam.com (8.9.3/8.9.3) with ESMTP id CAA02550 for ; Thu, 13 May 1999 02:54:46 -0700 (PDT) Date: Thu, 13 May 1999 02:54:45 -0700 (PDT) From: Cliff Skolnick X-Sender: cliff@lazlo.internal.steam.com To: freebsd-stable@freebsd.org Subject: help w/ deadlock avoiding code Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-stable@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Could someone please take a look at the following patch to kern_lock.c and tell me if I am insane. I'd like to help find/fix the deadlock problem ASAP. The following code hopefully will limit the time on the sleepq to 10 seconds for a shared lock if there is another holder of a shared lock and LK_CANRECURSE is set. After 10 seconds the shared lock will be granted, even if there are waiting upgrade or exclusive requests. There is a bit of uglyness in how I pass the request type down to acquire, but then again this is to debug only. The code compiles, but I am not running this kernel. Hopefully this will work around the problem while gathering data to figure out what is really happening. I also would like to point out that the commited patch adds a dereference to "p" in the LK_SHARED case. The COUNT() macro checks for NULL, and there is a check against NULL at the top of the function. It just caught my eye, don't know enough to gauge if this is a real problem. Cliff PS if this thread should be moved to freebsd-hackers I'll join that list. *** /sys/kern/kern_lock.c-orig Tue May 11 22:53:19 1999 --- kern_lock.c Thu May 13 02:48:54 1999 *************** *** 41,46 **** --- 41,48 ---- * $Id: kern_lock.c,v 1.23.2.1 1999/05/11 07:54:18 dg Exp $ */ + #define CCS + #include "opt_lint.h" #include *************** *** 48,53 **** --- 50,59 ---- #include #include + #ifdef CCS + #include + #endif CCS + /* * Locking primitives implementation. * Locks provide shared/exclusive sychronization. *************** *** 123,132 **** --- 129,154 ---- return 1; } + static int acquire(struct lock *lkp, int extflags, int wanted) { int s, error; + #ifdef CCS + #define LK_DEADCHECK 10 + + int isshared = 0; + int deadlockavoid = 0; + struct timeval deadlocktime; + + /* + * This is ugly, quick and dirty way to pass in the request. This is + * only to debug. + */ + isshared = extflags & LK_SHARED; + extflags &= ~LK_SHARED; + + #endif CCS if ((extflags & LK_NOWAIT) && (lkp->lk_flags & wanted)) { return EBUSY; } *************** *** 137,148 **** --- 159,190 ---- return 0; } + #ifdef CCS + if ((isshared != 0) && + (lkp->lk_flags & LK_SHARE_NONZERO) && + (extflags & LK_CANRECURSE)) { + deadlockavoid = 1; + getmicrotime(&deadlocktime); + } + #endif CCS + s = splhigh(); while ((lkp->lk_flags & wanted) != 0) { lkp->lk_flags |= LK_WAIT_NONZERO; lkp->lk_waitcount++; simple_unlock(&lkp->lk_interlock); + #ifdef CCS + if ((deadlockavoid == 0) || + ((lkp->lk_timo > 0) && (lkp->lk_timo < LK_DEADCHECK*hz))) { + error = tsleep(lkp, lkp->lk_prio, lkp->lk_wmesg, + lkp->lk_timo); + } else { + error = tsleep(lkp, lkp->lk_prio, lkp->lk_wmesg, + LK_DEADCHECK*hz); + } + #else CCS error = tsleep(lkp, lkp->lk_prio, lkp->lk_wmesg, lkp->lk_timo); + #endif CCS simple_lock(&lkp->lk_interlock); if (lkp->lk_waitcount == 1) { lkp->lk_flags &= ~LK_WAIT_NONZERO; *************** *** 158,163 **** --- 200,230 ---- splx(s); return ENOLCK; } + #ifdef CCS + /* if the lock is still held and deadlock avoiding */ + if ((lkp->lk_flags & wanted) != 0 && + deadlockavoid != 0) { + struct timeval curtime; + + /* looked safe to call at a hight spl */ + getmicrotime(&curtime); + + if ((curtime.tv_sec - deadlocktime.tv_sec) > + LK_DEADCHECK) { + + splx(s); /* drop the priority while printing */ + + /* waited long enough, grant the request now + if no one has an exclusive lock */ + + printf("lock request modified\n"); + lockmgr_printinfo(lkp); + s = splhigh(); + + wanted &= ~(LK_WANT_EXCL | LK_WANT_UPGRADE); + } + } + #endif CCS } splx(s); return 0; *************** *** 218,230 **** --- 285,307 ---- if (p->p_flag & P_DEADLKTREAT) { error = acquire( lkp, + #ifdef CCS + /* ugly, but overload the parameter by adding in the LK_SHARED bit */ + extflags | LK_SHARED, + #else CCS extflags, + #endif LK_HAVE_EXCL ); } else { error = acquire( lkp, + #ifdef CCS + /* ugly again, as above */ + extflags | LK_SHARED, + #else CCS extflags, + #endif LK_HAVE_EXCL | LK_WANT_EXCL | LK_WANT_UPGRADE ); -- Cliff Skolnick | "They that can give up essential liberty to obtain Steam Tunnel Operations | a little temporary safety deserve neither liberty cliff@steam.com | nor safety." http://www.steam.com/ | -- Benjamin Franklin, 1759 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-stable" in the body of the message