From owner-freebsd-stable@FreeBSD.ORG Tue Sep 23 18:55:23 2008
From: "Paul B. Mahol"
To: "Mikhail Teterin"
Cc: stable@freebsd.org
Date: Tue, 23 Sep 2008 20:25:05 +0200
Subject: Re: 7.0-stable: a hung process - scheduler bug?
Message-ID: <3a142e750809231125o579445baufb5f2676e4d9a2ca@mail.gmail.com>
In-Reply-To: <48D92589.8000200@aldan.algebra.com>
References: <48D92589.8000200@aldan.algebra.com>

On 9/23/08, Mikhail Teterin wrote:
> Hello!
>
> I was trying to build OpenOffice using all of my 4 CPUs. To be able to
> do other work on the machine comfortably, I ran the build under nice,
> and assigned real-time priority to the two Xorg processes.
> The build started at about 23:10 last night, and hung at 23:46. The
> procstat output for the make's process group is:
>
>   PID  PPID  PGID   SID  TSID THR LOGIN WCHAN EMUL          COMM
>  8371  2425  8371  2425  2425   1 mi    wait  FreeBSD ELF64 make
> 12254  8371  8371  2425  2425   1 mi    wait  FreeBSD ELF64 sh
> 12255 12254  8371  2425  2425   1 mi    pause FreeBSD ELF64 tcsh
> 12262 12255  8371  2425  2425   1 mi    wait  FreeBSD ELF64 perl5.8.8
> 33010 12262  8371  2425  2425   1 mi    wait  FreeBSD ELF64 perl5.8.8
> 33011 33010  8371  2425  2425   1 mi    wait  FreeBSD ELF64 sh
> 33012 33011  8371  2425  2425   1 mi    wait  FreeBSD ELF64 dmake
> 37126 33012  8371  2425  2425   1 mi    -     FreeBSD ELF64 dmake
>
> The last line worries me greatly... According to "procstat -t", there is
> only one thread there:
>
>   PID    TID COMM  TDNAME CPU PRI STATE WCHAN
> 37126 100724 dmake -        1 193 sleep -
>
> And trying to "ktrace -p 37126" returns (even to root, even in /tmp):
>
> ktrace: ktrace.out: Operation not permitted
>
> There are no problems ktrace-ing 33012, but nothing comes from there, as
> that process simply waits for its child.
> I guess, the child -- 37126 was
> (v)forked to launch a compiler or some such and remains stuck in between
> (v)fork and exec somewhere...
>
> The OS is: FreeBSD 7.0-STABLE/amd64 from Sat Jul 26, 2008 and the box is
> otherwise perfectly functional. The scheduling-related options are set
> as such:
>
> options SCHED_4BSD # 4BSD scheduler
> options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
>
> Let me know, what else I can do to help fix this bug -- I'm going to
> reboot the machine tonight... Should I switch to SCHED_ULE as a
> work-around?

SCHED_4BSD is suboptimal for 4 CPUs, and it has been replaced with SCHED_ULE on 7-STABLE.
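
For reference, switching schedulers would look roughly like the fragment below. This is only a sketch: it assumes a custom kernel configuration file (the name MYKERNEL and its path are hypothetical) and reuses the two option lines quoted above. Only one scheduler option can be compiled into a kernel, so SCHED_4BSD has to be removed or commented out.

```
# Hypothetical custom kernel config, e.g. /usr/src/sys/amd64/conf/MYKERNEL
#options 	SCHED_4BSD			# 4BSD scheduler (commented out)
options 	SCHED_ULE			# ULE scheduler
options 	_KPOSIX_PRIORITY_SCHEDULING	# POSIX P1003_1B real-time extensions
```

After rebuilding and installing the kernel (for example with "make buildkernel KERNCONF=MYKERNEL" and "make installkernel KERNCONF=MYKERNEL" from /usr/src) and rebooting, "sysctl kern.sched.name" should report which scheduler is running.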