From owner-freebsd-stable Wed May 12 22:28:37 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from lazlo.internal.steam.com (lazlo.steam.com [199.108.84.37])
        by hub.freebsd.org (Postfix) with ESMTP
        id BEDEF14D91; Wed, 12 May 1999 22:28:33 -0700 (PDT)
        (envelope-from cliff@steam.com)
Received: from lazlo.internal.steam.com (cliff@lazlo.internal.steam.com [192.168.32.2])
        by lazlo.internal.steam.com (8.9.3/8.9.3) with ESMTP id WAA01555;
        Wed, 12 May 1999 22:28:06 -0700 (PDT)
Date: Wed, 12 May 1999 22:28:06 -0700 (PDT)
From: Cliff Skolnick
X-Sender: cliff@lazlo.internal.steam.com
To: Cy Schubert - ITSD Open Systems Group
Cc: dg@root.com, Mike Tancsa, freebsd-stable@FreeBSD.ORG,
        luoqi@FreeBSD.ORG, Matthew Dillon
Subject: Re: vm_fault deadlock and PR 8416 ... NOT fixed!
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I've had to back this out for 2 reasons:

1) options DEBUG_LOCKS changes the size of the proc structure, which means
"ps" and "w" will not work without a recompile. I don't understand the
proper use of this flag, so I'm not going to try.

2) After looking more deeply into the suggestion of checking the error
return, I now realize that an error is probably not returned. Instead the
thread will most likely just go to sleep. I thought of using LK_NOWAIT,
but that will not work because it returns right off the bat. I looked into
LK_SLEEPFAIL to give it a chance at the lock, but I'm seeing that the
majority of locks are created with a 0 timeout value, so no luck there.
I thought of setting LK_SLEEPFAIL and mucking with the timeout on the
lock structure itself before calling acquire. This is a little beyond my
experience with FreeBSD kernel hacking. I'll wait until someone familiar
with the code gives me a bit of advice on this. Any takers?
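
The distinction drawn in point 2, between a request that never sleeps and one that sleeps with a bound, can be illustrated with a rough userland analogy. This is not kernel code and not the lockmgr interface. It assumes POSIX threads, with pthread_mutex_trylock standing in for a no-wait request and pthread_mutex_timedlock for a sleep that gives up after a timeout. The names lock_nowait, lock_bounded, struct attempt and contend are made up for this sketch, and the 50 ms bound is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* One attempt, never sleeps: 0 on success, EBUSY if the lock is held. */
static int lock_nowait(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m);
}

/* Sleeps for at most timeout_ms: 0 on success, ETIMEDOUT on expiry. */
static int lock_bounded(pthread_mutex_t *m, long timeout_ms)
{
    struct timespec deadline;

    assert(timeout_ms >= 0);
    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline);
}

/* Hypothetical record of what a contending thread observed. */
struct attempt {
    pthread_mutex_t *m;
    int nowait_result;
    int bounded_result;
};

/* Thread body: try both styles against a lock someone else may hold. */
static void *contend(void *arg)
{
    struct attempt *a = arg;

    a->nowait_result = lock_nowait(a->m);
    if (a->nowait_result == 0)
        pthread_mutex_unlock(a->m);

    a->bounded_result = lock_bounded(a->m, 50);
    if (a->bounded_result == 0)
        pthread_mutex_unlock(a->m);

    return NULL;
}
```

If one thread holds the mutex while a second thread runs contend, the first attempt should fail at once with EBUSY and the second should fail with ETIMEDOUT after roughly 50 ms. That mirrors the problem described above: a no-wait request never gets a chance at the lock, and a bounded sleep only helps when a timeout is actually set.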

I'm also thinking that a generic deadlock detector could be placed as debug
code in the acquire/apause procedure. It would be impossible to catch
complex ones, but simple ones should take only a little work. I'm going to
look into that in the next couple of days after I learn a bit more about
the structure of the kernel. I hate to admit that most of my kernel hacking
was in network drivers and the VM system for solaris 2.x (where x <= 3),
along with a bit of SCSI work on 4.0.3. It has been a while.

So many things, so little time,

Cliff

On Wed, 12 May 1999, Cliff Skolnick wrote:

>
> I've added "options DEBUG_LOCKS" to my kernel, and added the following code.
> I'll post any interesting log messages that I see over the next day or so.
>
> If I've done something terribly wrong that will hose my system, please email
> me! Hacking locking code makes me a bit nervous.
>
>
> Here's the diff to kern_lock.c I just added - someone in the know please
> review this, and email if I need to add more instrumentation:
>
> *** kern_lock.c-orig    Tue May 11 22:53:19 1999
> --- kern_lock.c Wed May 12 19:45:59 1999
> ***************
> *** 229,236 ****
>                                 LK_WANT_UPGRADE
>                         );
>                 }
> !               if (error)
> !                       break;
>                 sharelock(lkp, 1);
>                 COUNT(p, 1);
>                 break;
> --- 229,248 ----
>                                 LK_WANT_UPGRADE
>                         );
>                 }
> !
> !               /* we will fall through and grant the lock after printing info */
> !               if (error) {
> !                       if ((lkp->lk_flags & LK_SHARE_NONZERO) != 0 &&
> !                           (flags & LK_CANRECURSE) != 0) {
> !                               printf("deadlocktreat=%d, flags=0x%x\n",
> !                                   p->p_flag & P_DEADLKTREAT,
> !                                   extflags);
> !                               lockmgr_printinfo(lkp);
> !                               /* fall through to grant lock */
> !                       } else {
> !                               break;
> !                       }
> !               }
>                 sharelock(lkp, 1);
>                 COUNT(p, 1);
>                 break;
>
>
> On Wed, 12 May 1999, Cy Schubert - ITSD Open Systems Group wrote:
>
> > Would plan B risk corruption of any data? Could it itself be the likely
> > cause of any potential panics?
> >
> > Assuming plan B has no major risks, this might be a temporary
> > workaround until we can wrap our minds around this one. It's just a
> > rework of Luoqi's patch, just in case we want to try plan B again.
> >
> > --- kern_lock.c.orig    Tue May 11 08:34:52 1999
> > +++ kern_lock.c Wed May 12 05:38:52 1999
> > @@ -215,7 +215,9 @@
> >                          * lock itself ).
> >                          */
> >                         if (lkp->lk_lockholder != pid) {
> > -                               if (p->p_flag & P_DEADLKTREAT) {
> > +                               if ((p->p_flag & P_DEADLKTREAT) ||
> > +                                   ((lkp->lk_flags & LK_SHARE_NONZERO) != 0 &&
> > +                                   (flags & LK_CANRECURSE) != 0)) {
> >                                         error = acquire(
> >                                                 lkp,
> >                                                 extflags,
> >
> > If this workaround doesn't work, then setting error = 0 and allowing
> > the code to fall through to the subsequent sharelock may be our only
> > choice for now.
> >
> > The other point I wish to make for all on this list is that Matt's
> > patch fixes a read()/mmap() deadlock. It doesn't fix a write()/mmap()
> > deadlock.
> >
> >
> > Regards,                       Phone:  (250)387-8437
> > Cy Schubert                      Fax:  (250)387-5766
> > Open Systems Group          Internet:  Cy.Schubert@uumail.gov.bc.ca
> > ITSD                                   Cy.Schubert@gems8.gov.bc.ca
> > Province of BC
> >                       "e**(i*pi)+1=0"
> >
> > In message <199905120755.AAA01361@implode.root.com>, David Greenman
> > writes:
> > > >Well a few minutes ago my system went into deadlock - and this is with the
> > > >kern_lock.c dated 5/11. This patch is different than the one in 8416 that
> > > >solved my problem before. I'd say the problem is still there.
> > >
> > > Time is very short for getting this fixed before the release deadline. I
> > > think Luoqi's patch that was in the PR was susceptible to a priority inversion
> > > problem and has risks associated with using it. The fix that Matt Dillon
> > > made for -current that I back-ported to -stable was an attempt to fix the
> > > problem while minimizing the side effects.
> > > If it doesn't fix the problem
> > > then we'll proceed with plan B which is probably to just go with Luoqi's
> > > fix or to possibly troubleshoot Matt's fix (but as I said, time is short).
> > >
> > > >Once again my server is useless, deadlocked. No panic, responding to pings,
> > > >no ability to do disk I/O or any VM related stuff.
> > > >
> > > >An unhappy freebsd user once again,
> > >
> > > Is this really necessary? It sure doesn't help the debugging process.
> > >
> > > -DG
> > >
> > > David Greenman
> > > Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
> > > Creator of high-performance Internet servers - http://www.terasolutions.com
> > >
> > >
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-stable" in the body of the message
> > >
> >
>

--
Cliff Skolnick             | "They that can give up essential liberty to obtain
Steam Tunnel Operations    |  a little temporary safety deserve neither liberty
cliff@steam.com            |  nor safety."
http://www.steam.com/      |             -- Benjamin Franklin, 1759
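
The simple deadlock detector proposed above for the acquire path can be sketched in userland C as a walk along a wait-for chain. This is only an illustration under strong assumptions: every lock has a single exclusive holder and every thread sleeps on at most one lock. The tables lock_holder and waiting_for and the functions detector_init and would_deadlock are hypothetical names, not anything in kern_lock.c. Shared locks with several holders are among the complex cases it does not cover.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8
#define MAX_LOCKS   8
#define NONE        (-1)

/* Hypothetical bookkeeping, not FreeBSD's struct lock. */
static int lock_holder[MAX_LOCKS];    /* thread holding each lock, or NONE */
static int waiting_for[MAX_THREADS];  /* lock each thread sleeps on, or NONE */

/* Mark every lock as free and every thread as not waiting. */
static void detector_init(void)
{
    for (int i = 0; i < MAX_LOCKS; i++)
        lock_holder[i] = NONE;
    for (int i = 0; i < MAX_THREADS; i++)
        waiting_for[i] = NONE;
}

/*
 * Would thread `tid` deadlock if it went to sleep on `lock`?
 * Follow lock -> holder -> lock the holder sleeps on -> ... and report
 * a cycle if the chain leads back to `tid`.
 */
static bool would_deadlock(int tid, int lock)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    assert(lock >= 0 && lock < MAX_LOCKS);

    /* The step bound stops the walk if other threads already form a cycle. */
    for (int steps = 0; lock != NONE && steps < MAX_THREADS; steps++) {
        int owner = lock_holder[lock];

        if (owner == NONE)
            return false;   /* lock is free, the chain ends here */
        if (owner == tid)
            return true;    /* chain leads back to the requester */
        lock = waiting_for[owner];
    }
    return false;
}
```

A debug build could call a check of this kind just before putting the caller to sleep and print the lock state when it returns true, in the same spirit as the lockmgr_printinfo call in the instrumentation patch quoted earlier in the thread.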