Message-ID: <4148822C.7000902@elischer.org>
Date: Wed, 15 Sep 2004 10:55:56 -0700
From: Julian Elischer
To: Andrew Gallatin
cc: freebsd-threads@freebsd.org
In-Reply-To: <16712.20538.804004.90978@grasshopper.cs.duke.edu>
Subject: Re: Unkillable KSE threaded proc
List-Id: Threading on FreeBSD

Andrew Gallatin
wrote:

>Julian Elischer writes:
> > either of :
> > http://www.freebsd.org/~julian/q.diff
> >
> > or
> >
> > http://www.freebsd.org/~julian/r.diff
> >
> > Might make some difference.
> >
> > today's q.diff has a fix that was missing yesterday.
>
>Both seem the same as unpatched head -- app starts, runs normally,
>then skill -9 -u gallatin leaves threads stuck on the cpu, seemingly
>deadlocking the system.
>
>But -- I think I now have a clue as to what's going on. I started a
>ktrace of the problematic process just before doing the skill -9, and
>afterwards it kept on tracing.
>
>I noticed it was stuck doing this:
>
>   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
>   569 mx_pingpong Events dropped.
>   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
>   569 mx_pingpong Events dropped.
>   569 mx_pingpong RET   ioctl -1 errno 4 Interrupted system call
>
>It turns out that the userspace code is basically doing:
>
>  do {
>    MUTEX_LOCK(&lock);
>    should_exit = work();
>    MUTEX_UNLOCK(&lock);
>    ioctl(fd, DRIVER_WAIT);
>  } while (!should_exit);
>  return NULL;
>
>Changing it to
>
><...>
>    rv = ioctl(fd, DRIVER_WAIT);
>  } while ((rv == 0 || rv == EWOULDBLOCK) && !should_exit);
>  return NULL;
>
>Seems like it works around the problem with your r.diff patch applied
>to head. The ioctl in the driver boils down to a cv_timedwait_sig(),
>which is where the EINTR is coming from.
>
>Even if this is our bug, I think that a user-level bug like this should
>not be able to deadlock the system...
>

I agree. The rule is that userland should not be able to crash the
system, so this is a bug either way.

>
>FWIW, even with the fix to the user-level code, we still have the
>original problem (one lingering thread using no CPU) in RELENG_5.
>
>Drew
>
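
For anyone following along, here is a minimal sketch of the loop shape
discussed above, written so it can be exercised without the driver. It
assumes the wait call follows the ioctl(2) convention of returning -1 and
setting errno, so it checks errno rather than the return value; the quoted
fix compares rv directly, which may reflect a wrapper not shown in the
thread. The names event_loop, work_fn, wait_fn and the demo_* stubs are
hypothetical, and the mutex around work() is left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the application's work() and for
 * ioctl(fd, DRIVER_WAIT); neither signature comes from the thread. */
typedef bool (*work_fn)(void *ctx);       /* true means "please exit" */
typedef int (*wait_fn)(void *ctx);        /* 0, or -1 with errno set */

/*
 * Run the work/wait cycle.  Returns 0 when work() asks to exit, or the
 * errno value that ended the loop.  A timeout (EWOULDBLOCK) is the only
 * failure that is retried, so EINTR from a pending signal ends the loop
 * instead of spinning on it.
 */
int
event_loop(work_fn work, wait_fn wait_event, void *ctx)
{
    assert(work != NULL && wait_event != NULL);

    for (;;) {
        if (work(ctx))
            return 0;
        if (wait_event(ctx) == -1 && errno != EWOULDBLOCK)
            return errno;
    }
}

/* Demo stubs: the wait times out until call number fail_after, then
 * fails with fail_errno, imitating a signal arriving during the wait. */
struct demo {
    int calls;
    int fail_after;
    int fail_errno;
};

static bool
demo_work(void *ctx)
{
    (void)ctx;
    return false;               /* never requests exit on its own */
}

static int
demo_wait(void *ctx)
{
    struct demo *d = ctx;

    d->calls++;
    errno = (d->calls >= d->fail_after) ? d->fail_errno : EWOULDBLOCK;
    return -1;
}
```

With the stubs above, a caller would initialise a struct demo, pass
demo_work and demo_wait to event_loop(), and treat a non-zero return as
the reason the thread should stop. Retrying only on timeout is a design
choice for this sketch; real code might also want to handle other
transient errors.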