Date: Wed, 15 Sep 2004 10:22:50 -0400 (EDT)
From: Andrew Gallatin <gallatin@cs.duke.edu>
To: Julian Elischer <julian@elischer.org>
Cc: freebsd-threads@freebsd.org
Subject: Re: Unkillable KSE threaded proc
Message-ID: <16712.20538.804004.90978@grasshopper.cs.duke.edu>
In-Reply-To: <4147FC1E.2010608@elischer.org>
References: <16703.11479.679335.588170@grasshopper.cs.duke.edu>
 <16703.12410.319869.29996@grasshopper.cs.duke.edu>
 <413F55B8.50003@elischer.org>
 <16703.28031.454342.774229@grasshopper.cs.duke.edu>
 <413F8DBB.5040502@elischer.org>
 <16704.40876.708925.425911@grasshopper.cs.duke.edu>
 <4140AA2A.90605@elischer.org>
 <16704.45327.42494.922427@grasshopper.cs.duke.edu>
 <4140C04D.1060906@elischer.org>
 <16704.49447.290897.602540@grasshopper.cs.duke.edu>
 <4146AAC1.5020701@elischer.org>
 <16711.383.448500.578640@grasshopper.cs.duke.edu>
 <4147FC1E.2010608@elischer.org>

Julian Elischer writes:
 > either of :
 > http://www.freebsd.org/~julian/q.diff
 >
 > or
 >
 > http://www.freebsd.org/~julian/r.diff
 >
 > Might make some difference.
 >
 > today's q.diff has a fix that was missing yesterday.

Both seem the same as unpatched head -- the app starts, runs normally,
then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly
deadlocking the system.

But -- I think I now have a clue as to what's going on. I started a
ktrace of the problematic process just before doing the skill -9, and
afterwards it kept on tracing. I noticed it was stuck doing this:

   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
   569 mx_pingpong Events dropped.
   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
   569 mx_pingpong Events dropped.
   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call

It turns out that the userspace code is basically doing:

   do {
      MUTEX_LOCK(&lock);
      should_exit = work();
      MUTEX_UNLOCK(&lock);
      ioctl(fd, DRIVER_WAIT);
   } while (!should_exit);
   return NULL;

Changing it to

   <...>
      rv = ioctl(fd, DRIVER_WAIT);
   } while ((rv == 0 || rv == EWOULDBLOCK) && !should_exit);
   return NULL;

seems to work around the problem with your r.diff patch applied to
head.

The ioctl in the driver boils down to a cv_timedwait_sig(), which is
where the EINTR is coming from.

Even if this is our bug, I think that a user-level bug like this
should not be able to deadlock the system...

FWIW, even with the fix to the user-level code, we still have the
original problem (one lingering thread using no CPU) in RELENG_5.

Drew
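
For reference, here is a minimal sketch of the workaround loop described above. It assumes the usual userspace convention that ioctl() returns -1 and reports the error in errno, so the test is made on errno rather than on the return value itself. The callback types, progress_loop(), and the fake_* stubs are hypothetical names introduced only so the loop can be exercised without the driver; the mutex around work() is left out for brevity.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for work() and ioctl(fd, DRIVER_WAIT). */
typedef bool (*work_fn)(void *ctx);  /* returns true when the thread should exit */
typedef int (*wait_fn)(void *ctx);   /* returns 0, or -1 with errno set */

/*
 * Keep looping only while the wait succeeds or merely times out.
 * Any other error (EINTR in particular) ends the loop instead of
 * re-entering the wait forever.
 * Returns 0 on a clean exit, otherwise the errno that stopped the loop.
 */
static int
progress_loop(work_fn work, wait_fn wait, void *ctx)
{
	bool should_exit;
	int err;

	do {
		should_exit = work(ctx);
		err = (wait(ctx) == -1) ? errno : 0;
	} while ((err == 0 || err == EWOULDBLOCK) && !should_exit);

	return (err == EWOULDBLOCK) ? 0 : err;
}

/* Fake driver: times out fail_after times, then reports EINTR. */
struct fake {
	int calls;
	int fail_after;
};

static bool
fake_work(void *ctx)
{
	(void)ctx;
	return false;		/* never asks to exit on its own */
}

static int
fake_wait(void *ctx)
{
	struct fake *f = ctx;

	errno = (++f->calls > f->fail_after) ? EINTR : EWOULDBLOCK;
	return -1;
}
```

With the fake driver, progress_loop(fake_work, fake_wait, &f) returns EINTR once the simulated signal arrives, where the original loop would have kept calling the wait.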