Date:      Wed, 15 Sep 2004 10:55:56 -0700
From:      Julian Elischer <julian@elischer.org>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        freebsd-threads@freebsd.org
Subject:   Re: Unkillable KSE threaded proc
Message-ID:  <4148822C.7000902@elischer.org>
In-Reply-To: <16712.20538.804004.90978@grasshopper.cs.duke.edu>
References:  <16703.11479.679335.588170@grasshopper.cs.duke.edu> <16703.12410.319869.29996@grasshopper.cs.duke.edu> <413F55B8.50003@elischer.org> <16703.28031.454342.774229@grasshopper.cs.duke.edu> <413F8DBB.5040502@elischer.org> <16704.40876.708925.425911@grasshopper.cs.duke.edu> <4140AA2A.90605@elischer.org> <16704.45327.42494.922427@grasshopper.cs.duke.edu> <4140C04D.1060906@elischer.org> <16704.49447.290897.602540@grasshopper.cs.duke.edu> <4146AAC1.5020701@elischer.org> <16711.383.448500.578640@grasshopper.cs.duke.edu> <4147FC1E.2010608@elischer.org> <16712.20538.804004.90978@grasshopper.cs.duke.edu>



Andrew Gallatin wrote:

>Julian Elischer writes:
> > either of :
> > http://www.freebsd.org/~julian/q.diff
> > 
> > or
> > 
> > http://www.freebsd.org/~julian/r.diff
> > 
> > Might make some difference.
> > 
> > today's q.diff has a fix that was missing yesterday.
>
>Both seem the same as unpatched head -- app starts, runs normally,
>then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly
>deadlocking the system.
>
>But -- I think I now have a clue as to what's going on.  I started a
>ktrace of the problematic process just before doing the skill -9, and
>afterwards it kept on tracing.
>
>I noticed it was stuck doing this:
>
>   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
>   569 mx_pingpong Events dropped.
>   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
>   569 mx_pingpong Events dropped.
>   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
>
>It turns out that the userspace code is basically doing:
>
>  do {
>    MUTEX_LOCK(&lock);
>    should_exit = work();
>    MUTEX_UNLOCK(&lock);
>    ioctl(fd, DRIVER_WAIT);
>  } while (!should_exit);
>  return NULL;
>
>Changing it to
>
><...>
>    rv = ioctl(fd, DRIVER_WAIT);
>  } while ((rv == 0 || rv == EWOULDBLOCK) && !should_exit);
>  return NULL;
>
>Seems like it works around the problem with your r.diff patch applied
>to head.  The ioctl in the driver boils down to a cv_timedwait_sig(),
>which is where the EINTR is coming from.
>
>Even if this is our bug, I think that a user-level bug like this should
>not be able to deadlock the system... 
>

I agree. The rule is that userland should not be able to crash the
system, so this is a bug either way.
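
For reference, the userland workaround quoted above can be sketched as a
self-contained loop. This is only an illustration under stated
assumptions: struct fake_driver, fake_driver_wait() and progress_loop()
are hypothetical names standing in for the real driver and its
ioctl(fd, DRIVER_WAIT), the mutex and work() call are left out, and the
sketch assumes the wait reports failure as -1 with the reason in errno,
as a plain ioctl() does. The point is that a timeout (EWOULDBLOCK) keeps
the loop going, while any other error, such as the EINTR seen in the
ktrace output, ends it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver; not the real mx driver. */
struct fake_driver {
    int timeouts_before_signal; /* waits that time out before a signal arrives */
    int waits;                  /* number of waits issued so far */
};

/*
 * Simulates ioctl(fd, DRIVER_WAIT): fails with EWOULDBLOCK while the
 * timed wait merely expires, then with EINTR once a signal is pending.
 */
int fake_driver_wait(struct fake_driver *d)
{
    d->waits++;
    if (d->timeouts_before_signal-- > 0) {
        errno = EWOULDBLOCK;
        return -1;
    }
    errno = EINTR;
    return -1;
}

/*
 * Loop in the spirit of the quoted workaround. Returns the number of
 * iterations executed. should_exit never becomes true here, so only
 * the return value of the wait can stop the loop.
 */
int progress_loop(struct fake_driver *d)
{
    int iterations = 0;
    int should_exit = 0;
    int rv;

    assert(d != NULL);
    do {
        iterations++;           /* the real code would lock, work(), unlock */
        rv = fake_driver_wait(d);
        /* Continue on success or timeout; stop on EINTR or anything else. */
    } while (!should_exit &&
             (rv == 0 || (rv == -1 && errno == EWOULDBLOCK)));
    return iterations;
}
```

With three timeouts queued ahead of the signal, progress_loop() makes
four passes and then returns instead of spinning on EINTR forever.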

>
>FWIW, even with the fix to the user-level code, we still have the
>original problem (one lingering thread using no CPU) in RELENG_5.
>
>Drew
>
>
>  
>


