From: Mikhail Teterin
Date: Tue, 23 Sep 2008 13:21:13 -0400
To: stable@FreeBSD.org
Subject: 7.0-stable: a hung process - scheduler bug?
Message-ID: <48D92589.8000200@aldan.algebra.com>

Hello!

I was trying to build OpenOffice using all of my 4 CPUs. To be able to do other work on the machine comfortably, I ran the build under nice, and assigned real-time priority to the two Xorg processes.

The build started at about 23:10 last night, and hung at 23:46.

The procstat output for the make's process group is:

      PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
     8371  2425  8371  2425  2425   1 mi       wait      FreeBSD ELF64 make
    12254  8371  8371  2425  2425   1 mi       wait      FreeBSD ELF64 sh
    12255 12254  8371  2425  2425   1 mi       pause     FreeBSD ELF64 tcsh
    12262 12255  8371  2425  2425   1 mi       wait      FreeBSD ELF64 perl5.8.8
    33010 12262  8371  2425  2425   1 mi       wait      FreeBSD ELF64 perl5.8.8
    33011 33010  8371  2425  2425   1 mi       wait      FreeBSD ELF64 sh
    33012 33011  8371  2425  2425   1 mi       wait      FreeBSD ELF64 dmake
    37126 33012  8371  2425  2425   1 mi       -         FreeBSD ELF64 dmake

The last line worries me greatly... According to "procstat -t", there is only one thread there:

      PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
    37126 100724 dmake            -                  1  193 sleep   -

And trying to "ktrace -p 37126" fails (even as root, even in /tmp) with:

    ktrace: ktrace.out: Operation not permitted

There are no problems ktrace-ing 33012, but nothing comes from there, as that process simply waits for its child. I guess the child (37126) was (v)forked to launch a compiler or some such, and remains stuck somewhere between (v)fork and exec...

The OS is FreeBSD 7.0-STABLE/amd64 from Sat Jul 26, 2008, and the box is otherwise perfectly functional. The scheduling-related kernel options are set as follows:

    options SCHED_4BSD                      # 4BSD scheduler
    options _KPOSIX_PRIORITY_SCHEDULING     # POSIX P1003_1B real-time extensions

Let me know what else I can do to help fix this bug. I'm going to reboot the machine tonight...

Should I switch to SCHED_ULE as a work-around?

Thanks! Yours,

    -mi