Message-ID: <478FFC91.4050508@elischer.org>
Date: Thu, 17 Jan 2008 17:10:41 -0800
From: Julian Elischer
To: Landon Fuller
Cc: nate@yogotech.com, ivo@scito.com, Alfred Perlstein, Daniel Eischen,
    davidxu@freebsd.org, java@freebsd.org, julian@freebsd.org
Subject: Re: cvs commit: src/lib/libkse/thread thr_kern.c
References: <200711301716.lAUHGEV1064334@repoman.freebsd.org>
    <90584F61-91FE-446E-978E-FD234553E8FC@threerings.net>
In-Reply-To: <90584F61-91FE-446E-978E-FD234553E8FC@threerings.net>
List-Id: Porting Java to FreeBSD

Landon Fuller wrote:
>
> On Dec 2, 2007, at 09:31, Arno J. Klaassen wrote:
>
>> For info, the attached patch, which partially reverts mfc of rev 1.286
>> of kern_fork.c, seems to work as well (without the above patch to be
>> clear),
>
> I just upgraded our 8-core build server from pre-november 6-STABLE to
> 6.3-RELEASE, and ran into this issue, causing our fork-heavy builder
> processes to lock up regularly.
>
> Your suggested patch (reverting the 1.286 MFC to sys/kern/kern_fork.c)
> allows our builds to run to completion; I'll try digging into this
> further. Given how easy this is to reproduce, I'm hoping this is
> possible to fix before 6.3 is officially released?

This is a problem. The reason it was changed is that the previous code
resulted in heavily loaded threaded processes that fork hanging in
indefinite lockups IN THE KERNEL. Eventually the whole machine would
become unusable, in particular when NFS is in use, but in other
situations too.

So I'm damned if I do and damned if I don't on this.

We were able to prove to ourselves that if a program got into this
state it was a definite programming error. As was stated in the
discussion of this change:

"The change is trying to protect the user from doing something that
they shouldn't be doing anyhow."

The previous kernel tried to stop all other threads from running, and
thus stop them from changing anything, while the kernel copied the
memory into the child process. The fact is that the kernel can't really
protect the process from doing this, and the other threads in the
parent can still leave things in a state that will screw up the child.

I gather it is the PARENT that hangs here?

It's possible that the answer is that the library needs to be changed
as well. Dan, what is the library doing here?

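To make the "state that will screw up the child" point concrete, here
is a minimal sketch in plain POSIX C. It is not what libpthread/libkse
does internally, and every name in it (state_lock, worker,
fork_with_handlers) is made up for illustration. It assumes a lock that
another thread keeps taking, and uses pthread_atfork() so the forking
thread owns that lock across fork() and the child inherits it in a
known state rather than locked by a thread that no longer exists.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for any lock (e.g. the malloc lock)
 * that another thread may hold at the moment of fork(). */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;

/* prepare handler: the forking thread takes the lock before fork(). */
static void before_fork(void) { pthread_mutex_lock(&state_lock); }

/* parent and child handler: release the lock taken by before_fork(). */
static void after_fork(void) { pthread_mutex_unlock(&state_lock); }

/* Register the handlers exactly once; registering twice would make
 * the prepare handler lock the same mutex twice and deadlock. */
static void register_handlers(void)
{
    pthread_atfork(before_fork, after_fork, after_fork);
}

/* Worker that keeps taking and dropping the lock until told to stop. */
static void *worker(void *arg)
{
    atomic_int *stop = arg;
    while (!atomic_load(stop)) {
        pthread_mutex_lock(&state_lock);
        pthread_mutex_unlock(&state_lock);
    }
    return NULL;
}

/* Fork once while the worker is busy with the lock.
 * Returns the child's exit status, or -1 on failure. */
int fork_with_handlers(void)
{
    pthread_t thr;
    atomic_int stop = 0;
    int status = -1;

    pthread_once(&handlers_once, register_handlers);
    if (pthread_create(&thr, NULL, worker, &stop) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the forking thread exists here. The lock is
         * usable because the handlers left it unlocked. */
        pthread_mutex_lock(&state_lock);
        pthread_mutex_unlock(&state_lock);
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);

    atomic_store(&stop, 1);
    pthread_join(thr, NULL);
    return (pid > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

Without the handlers, the child can be created at an instant when the
worker holds state_lock; the worker does not exist in the child, so
nothing there would ever release it. The handlers only cover locks the
program knows about, which is why a threading library has to do the
equivalent for its own internal locks.
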
>
> Here's a simple reproduction case that results in instant spinning
> sub-processes:
>
> #0  0x0000000800648b13 in mutex_lock_common (curthread=0x0,
>     m=0x8007616e8, abstime=0x0)
>     at /usr/src/lib/libpthread/thread/thr_mutex.c:503
> #1  0x000000080064ac25 in _pthread_mutex_lock (m=0x8007616e8)
>     at /usr/src/lib/libpthread/thread/thr_mutex.c:868
> #2  0x000000080063e9ce in _spinlock (lck=0x8009ac200)
>     at /usr/src/lib/libpthread/thread/thr_spinlock.c:97
> #3  0x00000008007eafc3 in pubrealloc (ptr=0x0, size=24,
>     func=0x8008802b7 " in malloc():")
>     at /usr/src/lib/libc/stdlib/malloc.c:1090
> #4  0x00000008007eb1e1 in malloc (size=24)
>     at /usr/src/lib/libc/stdlib/malloc.c:1150
> #5  0x000000080065ab8c in _lockuser_init (lu=0x52e068, priv=0x52e000)
>     at /usr/src/lib/libpthread/sys/lock.c:99
> #6  0x000000080065ac69 in _lockuser_reinit (lu=0x52e068, priv=0x52e000)
>     at /usr/src/lib/libpthread/sys/lock.c:128
> #7  0x000000080064d6d0 in _kse_single_thread (curthread=0x50cc00)
>     at /usr/src/lib/libpthread/thread/thr_kern.c:343
> #8  0x000000080063b627 in _fork ()
>     at /usr/src/lib/libpthread/thread/thr_fork.c:101
> #9  0x00000000004008f1 in forker ()
> #10 0x000000080064516e in thread_start (curthread=0x50cc00,
>     start_routine=0x4008e0, arg=0x0)
>     at /usr/src/lib/libpthread/thread/thr_create.c:341
> #11 0x00000008007b3cd9 in makectx_wrapper (ucp=0x800530860,
>     func=0x800645150, args=0x7fffff7fcfd0)
>     at /usr/src/lib/libc/amd64/gen/makecontext.c:100
> #12 0x0000000000000000 in ?? ()
> #13 0x000000000050cc00 in ?? ()
> #14 0x00000000004008e0 in frame_dummy ()
>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> /* Each thread forks in a tight loop and reaps the child. */
> void *forker (void *arg) {
>     while (1) {
>         pid_t pid = fork();
>         if (pid == 0) {
>             /* Child: exit immediately. */
>             exit(0);
>         } else if (pid > 0) {
>             int status;
>             waitpid(pid, &status, 0);
>         } else {
>             printf("Fork failed\n");
>             abort();
>         }
>     }
> }
>
> int main(void) {
>     int i = 0;
>     /* Four threads forking concurrently. */
>     for (i = 0; i < 4; i++) {
>         pthread_t thr;
>         pthread_create(&thr, NULL, forker, NULL);
>         pthread_detach(thr);
>     }
>
>     while(1)
>         sleep(1000);
> }