From owner-freebsd-bugs@FreeBSD.ORG Sat Nov 19 01:20:30 2005
Return-Path:
X-Original-To: freebsd-bugs@hub.freebsd.org
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 121F316A41F
	for ; Sat, 19 Nov 2005 01:20:30 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 8175143D49
	for ; Sat, 19 Nov 2005 01:20:29 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id jAJ1KT6v021253
	for ; Sat, 19 Nov 2005 01:20:29 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.3/8.13.1/Submit) id jAJ1KTXE021247;
	Sat, 19 Nov 2005 01:20:29 GMT (envelope-from gnats)
Resent-Date: Sat, 19 Nov 2005 01:20:29 GMT
Resent-Message-Id: <200511190120.jAJ1KTXE021247@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Garry Belka
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 116D316A41F
	for ; Sat, 19 Nov 2005 01:16:01 +0000 (GMT)
	(envelope-from garry@NetworkPhysics.COM)
Received: from NetworkPhysics.COM (fw.networkphysics.com [205.158.104.176])
	by mx1.FreeBSD.org (Postfix) with ESMTP id AFE1E43D46
	for ; Sat, 19 Nov 2005 01:16:00 +0000 (GMT)
	(envelope-from garry@NetworkPhysics.COM)
Received: from focus5.fractal.networkphysics.com
	(focus5.fractal.networkphysics.com [10.10.0.112])
	by NetworkPhysics.COM (8.12.10/8.12.10) with ESMTP id jAJ1G0aL012612
	for ; Fri, 18 Nov 2005 17:16:00 -0800 (PST)
	(envelope-from garry@NetworkPhysics.COM)
Received: (from garry@localhost)
	by focus5.fractal.networkphysics.com (8.13.1/8.12.10/Submit)
	id jAJ1Fxhg061478; Fri, 18 Nov 2005 17:15:59 -0800 (PST)
	(envelope-from garry)
Message-Id: <200511190115.jAJ1Fxhg061478@focus5.fractal.networkphysics.com>
Date: Fri, 18 Nov 2005 17:15:59 -0800 (PST)
From: Garry Belka
To: FreeBSD-gnats-submit@FreeBSD.org
X-Send-Pr-Version: 3.113
Cc:
Subject: kern/89262: multi-threaded process hangs in kernel in fork()
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Garry Belka
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 19 Nov 2005 01:20:30 -0000

>Number:         89262
>Category:       kern
>Synopsis:       multi-threaded process hangs in kernel in fork()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Nov 19 01:20:28 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Garry Belka
>Release:        FreeBSD 5.4-RELEASE and 6.0 RELEASE i386
>Organization:
Network Physics
>Environment:
System: FreeBSD tempo 5.4-RELEASE SMP
>Description:
We see, not too often, that a Java process hangs and can't be killed
even by SIGKILL. Apparently, one of the process threads forks. fork1()
in kernel attempts to enter a single-threaded mode, but thread_single()
fails to complete and hangs waiting until all threads but
proc->p_singlethread are suspended. One of the remaining threads is not
suspended and has only SLEEP flag set.

 pid   thread      thid    flags     inhib  pflags  comm  wchan
 1982  0xcd150180  100351  00020c00  1      0088    java
       mi_switch + 426 in section .text
       thread_suspend_check + 298 in section .text
       userret + 58 in section .text
       fork_return + 18 in section .text
       fork_exit + 102 in section .text
 1982  0xce120c00  100948  00000c00  1      0880    java
       mi_switch + 426 in section .text
       thread_suspend_check + 298 in section .text
       userret + 58 in section .text
       ast + 844 in section .text
 1982  0xcd740900  100616  00000808  2      0080    java  sbwait cd557320
       mi_switch + 426 in section .text   (SLEEPING, not SUSPENDED)
       sleepq_switch + 164 in section .text
       sleepq_wait_sig + 12 in section .text
       msleep + 566 in section .text
       sbwait + 56 in section .text
       soreceive + 572 in section .text
       soo_read + 65 in section .text
       dofileread + 173 in section .text
       read + 59 in section .text
       syscall + 551 in section .text
 1982  0xc3ae7900  100906  00000808  1      0080    java
       mi_switch + 426 in section .text
       sleepq_switch + 164 in section .text
       sleepq_wait_sig + 12 in section .text
       msleep + 566 in section .text
       sbwait + 56 in section .text
       soreceive + 572 in section .text
       soo_read + 65 in section .text
       dofileread + 173 in section .text
       read + 59 in section .text
       syscall + 551 in section .text
 1982  0xcd719780  100605  00000c00  1      0880    java
       mi_switch + 426 in section .text
       thread_suspend_check + 298 in section .text
       userret + 58 in section .text
       ast + 844 in section .text
 1982  0xcd6d9000  100830  00000000  1      0880    java  (p_singlethread)
       mi_switch + 426 in section .text - line 355
       thread_single + 497 in section .text - line 863
       fork1 + 169 in section .text - line 257
       fork + 24 in section .text
       syscall + 551 in section .text

Signals in singlethread state are not really delivered, SIGKILL stays
with the first thread in the queue, and so we got a deadlock.

I think that we got into this state because the non-suspended thread
was running when singlethread was attempting to put every thread to
sleep. All threads were marked TDF_ASTPENDING.
However, a bit later ast() failed to deal correctly with a thread that
had a non-null td->td_mailbox.

sys/kern/subr_trap.c:ast()
	if ((p->p_flag & P_SA) && (td->td_mailbox == NULL))
		thread_user_enter(td);
>How-To-Repeat:
Start multiple threads in Java on an SMP machine and keep on calling
system() in those threads. It will take some time.
>Fix:

--- single_suspend.patch begins here ---
Index: kern/kern_thread.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/kern/kern_thread.c,v
retrieving revision 1.3
diff -u -r1.3 kern_thread.c
--- kern/kern_thread.c	9 Jul 2005 01:27:18 -0000	1.3
+++ kern/kern_thread.c	15 Nov 2005 03:01:22 -0000
@@ -1001,6 +1001,18 @@
 }
 
 void
+thread_check_single_suspend(struct thread *td)
+{
+	struct proc *p = td->td_proc;
+
+	if (__predict_false(P_SHOULDSTOP(p))) {
+		PROC_LOCK(p);
+		thread_suspend_check(0);
+		PROC_UNLOCK(p);
+	}
+}
+
+void
 thread_unsuspend_one(struct thread *td)
 {
 	struct proc *p = td->td_proc;
Index: kern/subr_trap.c
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/kern/subr_trap.c,v
retrieving revision 1.1.1.2
diff -u -r1.1.1.2 subr_trap.c
--- kern/subr_trap.c	8 Jul 2005 03:01:08 -0000	1.1.1.2
+++ kern/subr_trap.c	15 Nov 2005 03:01:23 -0000
@@ -171,6 +171,8 @@
 
 	if ((p->p_flag & P_SA) && (td->td_mailbox == NULL))
 		thread_user_enter(td);
+	else
+		thread_check_single_suspend(td);
 	/*
 	 * This updates the p_sflag's for the checks below in one
 	 * "atomic" operation with turning off the astpending flag.
Index: sys/proc.h
===================================================================
RCS file: /u1/Repo/FreeBSD/sys/sys/proc.h,v
retrieving revision 1.1.1.5
diff -u -r1.1.1.5 proc.h
--- sys/proc.h	8 Jul 2005 03:07:51 -0000	1.1.1.5
+++ sys/proc.h	15 Nov 2005 03:01:28 -0000
@@ -887,6 +887,7 @@
 void ksegrp_unlink(struct ksegrp *kg);
 void thread_signal_add(struct thread *td, int sig);
 struct thread *thread_alloc(void);
+void thread_check_single_suspend(struct thread *td);
 void thread_exit(void) __dead2;
 int thread_export_context(struct thread *td, int willexit);
 void thread_free(struct thread *td);
--- single_suspend.patch ends here ---

>Release-Note:
>Audit-Trail:
>Unformatted:
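
The wait condition from the description (thread_single() cannot finish
until every thread except p_singlethread is suspended) can be
illustrated with a small user-space model. This is only a sketch, not
kernel code: struct model_thread, model_unsuspended() and the MODEL_*
constants are hypothetical names, and reading inhib value 1 as
"suspended" and 2 as "sleeping" is an assumption taken from the
"(SLEEPING, not SUSPENDED)" annotation in the thread listing.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's inhibitor bits. The values
 * are an assumption read off the "inhib" column of the report.
 */
#define MODEL_TDI_SUSPENDED	0x1
#define MODEL_TDI_SLEEPING	0x2

struct model_thread {
	int		thid;		/* thread id from the report */
	unsigned	inhib;		/* "inhib" column from the report */
	int		is_singlethread; /* 1 for p_singlethread */
};

/*
 * Count the threads, other than the single-threading one, that are not
 * suspended. In this model the fork can only proceed when the result
 * is 0.
 */
size_t
model_unsuspended(const struct model_thread *t, size_t n)
{
	size_t i, left = 0;

	assert(t != NULL);
	for (i = 0; i < n; i++) {
		if (t[i].is_singlethread)
			continue;
		if ((t[i].inhib & MODEL_TDI_SUSPENDED) == 0)
			left++;
	}
	return (left);
}

/* The six threads of pid 1982 as listed in the report. */
const struct model_thread report_threads[] = {
	{ 100351, 1, 0 },
	{ 100948, 1, 0 },
	{ 100616, 2, 0 },	/* sleeping in sbwait, never suspended */
	{ 100906, 1, 0 },
	{ 100605, 1, 0 },
	{ 100830, 1, 1 },	/* p_singlethread, waiting in thread_single */
};
```

Calling model_unsuspended(report_threads, 6) on this data leaves one
thread outstanding (100616), which matches the report: as long as that
thread stays asleep without passing a suspension check, the forking
thread keeps waiting.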