Date:      Wed, 12 May 1999 22:28:06 -0700 (PDT)
From:      Cliff Skolnick <cliff@steam.com>
To:        Cy Schubert - ITSD Open Systems Group <Cy.Schubert@uumail.gov.bc.ca>
Cc:        dg@root.com, Mike Tancsa <mike@sentex.net>, freebsd-stable@FreeBSD.ORG, luoqi@FreeBSD.ORG, Matthew Dillon <dillon@apollo.backplane.com>
Subject:   Re: vm_fault deadlock and PR 8416 ... NOT fixed! 
Message-ID:  <Pine.BSF.4.10.9905122201260.1290-100000@lazlo.internal.steam.com>
In-Reply-To: <Pine.BSF.4.10.9905121942190.40655-100000@lazlo.internal.steam.com>


I've had to back this out for 2 reasons:

1) options DEBUG_LOCKS changes the size of the proc structure, which means
"ps" and "w" will not work without a recompile.  I don't understand the
proper use of this option, so I'm not going to try.

2) After looking more deeply into the suggestion of checking the error
return, I now realize that an error is probably not returned.  Instead the
thread will most likely just go to sleep.  I thought of using LK_NOWAIT,
but that will not work: it returns right away without ever waiting.  I
looked into LK_SLEEPFAIL to give it a chance at the lock, but the majority
of locks are created with a 0 timeout value, so no luck there.

I thought of setting LK_SLEEPFAIL and mucking with the timeout in the
lock structure itself before calling acquire().  This is a little beyond my
experience with FreeBSD kernel hacking, so I'll wait until someone familiar
with the code gives me a bit of advice on this.  Any takers?

I'm also thinking that a generic deadlock detector could be placed as debug
code in the acquire/apause procedure.  It would be impossible to catch
complex deadlocks, but catching simple ones should take only a little work.
I'm going to look into that in the next couple of days, after I learn a bit
more about the structure of the kernel.  I hate to admit that most of my
kernel hacking was in network drivers and the VM system for Solaris 2.x
(where x <= 3), along with a bit of SCSI work on 4.0.3.  It has been a while.
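
To make the "simple ones" concrete, here is a rough userland sketch of the
idea, not kernel code.  It assumes each lock has at most one exclusive holder
and each process sleeps on at most one lock; shared locks with several holders
(the case in this thread) are exactly the complex ones it would miss.  Every
name in it (dbg_lock, waits_on, would_deadlock, MAX_PROCS) is made up for
illustration and does not exist in kern_lock.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8     /* hypothetical table size for this sketch */
#define NO_PROC   (-1)  /* marks a lock that nobody holds */

/* Hypothetical debug bookkeeping: one exclusive owner per lock. */
struct dbg_lock {
	int holder;     /* index of the owning process, or NO_PROC */
};

/* waits_on[i] is the lock process i is sleeping on, or NULL if running. */
static const struct dbg_lock *waits_on[MAX_PROCS];

/*
 * Return 1 if letting process `me` sleep on `want` would close a cycle.
 * Walk the chain: holder of `want` -> lock that holder sleeps on -> its
 * holder -> ...  If the chain leads back to `me`, nobody can make progress.
 * The hop limit keeps the walk finite if a cycle not involving `me` exists.
 */
static int
would_deadlock(int me, const struct dbg_lock *want)
{
	const struct dbg_lock *lk = want;
	int hops;

	assert(me >= 0 && me < MAX_PROCS);
	for (hops = 0; lk != NULL && hops < MAX_PROCS; hops++) {
		int owner = lk->holder;

		if (owner == NO_PROC)
			return (0);     /* lock is free, chain ends */
		if (owner == me)
			return (1);     /* chain leads back to us */
		lk = waits_on[owner];   /* what is the owner waiting for? */
	}
	return (0);
}
```

In debug code the check would run just before the caller goes to sleep, and a
hit would print the lock info instead of sleeping forever.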

So many things, so little time,
Cliff

On Wed, 12 May 1999, Cliff Skolnick wrote:

> 
> I've added "options DEBUG_LOCKS" to my kernel, and added the following code.
> I'll post any interesting log messages that I see over the next day or so.
> 
> If I've done something terribly wrong that will hose my system, please email
> me!  Hacking locking code makes me a bit nervous.
> 
> 
> Here's the diff to kern_lock.c I just added - someone in the know please
> review this, and email if I need to add more instrumentation:
> 
> *** kern_lock.c-orig	Tue May 11 22:53:19 1999
> --- kern_lock.c	Wed May 12 19:45:59 1999
> ***************
> *** 229,236 ****
>   					     LK_WANT_UPGRADE
>   					);
>   			}
> ! 			if (error)
> ! 				break;
>   			sharelock(lkp, 1);
>   			COUNT(p, 1);
>   			break;
> --- 229,248 ----
>   					     LK_WANT_UPGRADE
>   					);
>   			}
> ! 
> ! /* we will fall through and grant the lock after printing info */
> ! 			if (error) {
> ! 				if ((lkp->lk_flags & LK_SHARE_NONZERO) != 0 &&
> ! 					(flags & LK_CANRECURSE) != 0) {
> ! 					printf("deadlocktreat=%d, flags=0x%x\n",
> ! 						p->p_flag & P_DEADLKTREAT,
> ! 						extflags);
> ! 					lockmgr_printinfo(lkp);
> ! 					/* fall through to grant lock */
> ! 				} else {
> ! 					break;
> ! 				}
> ! 			}
>   			sharelock(lkp, 1);
>   			COUNT(p, 1);
>   			break;
> 
> 
> On Wed, 12 May 1999, Cy Schubert - ITSD Open Systems Group wrote:
> 
> > Would plan B risk corruption of any data?  Could it itself be the likely 
> > cause of any potential panics?
> > 
> > Assuming plan B has no major risks, this might be a temporary 
> > workaround until we can wrap our minds around this one.  It's just a 
> > rework of Luoqi's patch, just in case we want to try plan B again.
> > 
> > --- kern_lock.c.orig	Tue May 11 08:34:52 1999
> > +++ kern_lock.c	Wed May 12 05:38:52 1999
> > @@ -215,7 +215,9 @@
> >  		 * lock itself ).
> >  		 */
> >  		if (lkp->lk_lockholder != pid) {
> > -			if (p->p_flag & P_DEADLKTREAT) {
> > +			if ((p->p_flag & P_DEADLKTREAT) ||
> > +			    ((lkp->lk_flags & LK_SHARE_NONZERO) != 0 &&
> > +			    (flags & LK_CANRECURSE) != 0)) {
> >  				error = acquire(
> >  					    lkp,
> >  					    extflags,
> > 
> > If this workaround doesn't work, then setting error = 0 and allowing 
> > the code to fall through to the subsequent sharelock may be our only 
> > choice for now.
> > 
> > The other point I wish to make for all on this list is that Matt's 
> > patch fixes a read()/mmap() deadlock.  It doesn't fix a write()/mmap() 
> > deadlock.
> > 
> > 
> > Regards,                       Phone:  (250)387-8437
> > Cy Schubert                      Fax:  (250)387-5766
> > Open Systems Group          Internet:  Cy.Schubert@uumail.gov.bc.ca
> > ITSD                                   Cy.Schubert@gems8.gov.bc.ca
> > Province of BC
> >                       "e**(i*pi)+1=0"
> > 
> > In message <199905120755.AAA01361@implode.root.com>, David Greenman 
> > writes:
> > > >Well a few minutes ago my system went into deadlock - and this is with the
> > > >kern_lock.c dated 5/11.  This patch is different than the one in 8416 that
> > > >solved my problem before.  I'd say the problem is still there.
> > > 
> > >    Time is very short for getting this fixed before the release deadline. I
> > > think Luoqi's patch that was in the PR was susceptible to a priority inversion
> > > problem and has risks associated with using it. The fix that Matt Dillon
> > > made for -current that I back-ported to -stable was an attempt to fix the
> > > problem while minimizing the side effects. If it doesn't fix the problem
> > > then we'll proceed with plan B which is probably to just go with Luoqi's
> > > fix or to possibly troubleshoot Matt's fix (but as I said, time is short).
> > > 
> > > >Once again my server is useless, deadlocked.  No panic, responding to pings,
> > > >no ability to do disk I/O or any VM related stuff.
> > > >
> > > >An unhappy freebsd user once again,
> > > 
> > >    Is this really necessary? It sure doesn't help the debugging process.
> > > 
> > > -DG
> > > 
> > > David Greenman
> > > Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org
> > > Creator of high-performance Internet servers - http://www.terasolutions.com
> > > 
> > > 
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-stable" in the body of the message
> > > 
> > 
> > 
> > 
> > 
> > 
> 
> --
> Cliff Skolnick          | "They that can give up essential liberty to obtain
> Steam Tunnel Operations |  a little temporary safety deserve neither liberty
> cliff@steam.com         |  nor safety."
> http://www.steam.com/   |                   -- Benjamin Franklin, 1759
> 
> 
> 
> 
> 

--
Cliff Skolnick          | "They that can give up essential liberty to obtain
Steam Tunnel Operations |  a little temporary safety deserve neither liberty
cliff@steam.com         |  nor safety."
http://www.steam.com/   |                   -- Benjamin Franklin, 1759





