Date:      02 Dec 2007 18:31:12 +0100
From:      "Arno J. Klaassen" <arno@heho.snv.jussieu.fr>
To:        Daniel Eischen <deischen@freebsd.org>
Cc:        nate@yogotech.com, java@freebsd.org, ivo@scito.com, julian@freebsd.org, davidxu@freebsd.org
Subject:   Re: cvs commit: src/lib/libkse/thread thr_kern.c
Message-ID:  <wphcj1dmvz.fsf@heho.snv.jussieu.fr>
In-Reply-To: <Pine.GSO.4.64.0712011824130.11446@sea.ntplx.net>
References:  <200711301716.lAUHGEV1064334@repoman.freebsd.org> <wpprxrto0s.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0711301659060.5465@sea.ntplx.net> <wpwsrz9uyr.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0711301849310.6581@sea.ntplx.net> <wphcj2plsx.fsf@heho.snv.jussieu.fr> <Pine.GSO.4.64.0712011824130.11446@sea.ntplx.net>

--=-=-=


Hello,


Daniel Eischen <deischen@freebsd.org> writes:

> On Sat, 1 Dec 2007, Arno J. Klaassen wrote:
> 
> > Daniel Eischen <deischen@freebsd.org> writes:
> >
> >>> Arno J. Klaassen wrote:
> >>>
> >>> [ ... ]
> >>> That gives :
> >>>
> >>> #0  0x000000080075d151 in _pthread_sigmask (how=3, set=0x813cc6e10, oset=0x0)
> >>>    at /files/bsd/src7/lib/libkse/thread/thr_sigmask.c:52
> >>> #1  0x000000080075d103 in _sigprocmask (how=3, set=0x813cc6e10, oset=0x0)
> >>>    at /files/bsd/src7/lib/libkse/thread/thr_sigprocmask.c:49
> >>> #2  0x000000080076c423 in _kse_single_thread (curthread=0x813cc6c00)
> >>>    at /files/bsd/src7/lib/libkse/thread/thr_kern.c:361
> >>> #3  0x0000000800758f29 in _fork ()
> >>>    at /files/bsd/src7/lib/libkse/thread/thr_fork.c:101
> >>> #4  0x0000000801e43158 in jdk_fork_wrapper ()
> >>>    at ../../../src/solaris/native/java/lang/UNIXProcess_md.c:437
> >>>
> >>> Hope this is better
> >>
> >> Yes, this would seem to be a kernel problem, as _get_curthread()
> >> seems to be returning garbage.
> >
> > (gdb) p curthread
> > $1 = (struct pthread *) 0x0
> >
> >
> >> This is a libkse MD function,
> >> that relies on %gs (for i386/amd64) to point to something
> >> that was initialized in the parent.
> >>
> >> Julian, David, got any ideas?
> >
> > I can publish the full java_g.core if helpful.
> 
> You could of course try this hack to work-around the problem:
> 
> Index: thr_kern.c
> ===================================================================
> RCS file: /home/ncvs/src/lib/libkse/thread/thr_kern.c,v
> retrieving revision 1.127
> diff -u -r1.127 thr_kern.c
> --- thr_kern.c	30 Nov 2007 17:16:14 -0000	1.127
> +++ thr_kern.c	1 Dec 2007 23:23:42 -0000
> @@ -361,6 +361,13 @@
>   	curthread->kse->k_kcb->kcb_kmbx.km_curthread = NULL;
>   	curthread->attr.flags |= PTHREAD_SCOPE_SYSTEM;
> 
> +	/*
> +	 * This shouldn't be necessary.  It sometimes gets corrupted
> +	 * after a fork() in SMP.
> +	 */
> +	_kcb_set(curthread->kse->k_kcb);
> +	_tcb_set(curthread->kse->k_kcb, curthread->tcb);
> +
>   	/* After a fork(), there child should have no pending signals. */
>   	sigemptyset(&curthread->sigpend);
> 
> 

Yes, this works. Thanks!
Is it safe to apply this to releng_6 as well?
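The workaround above re-installs, in the child, the kcb/tcb pointers (which _get_curthread() reads through %gs on i386/amd64) instead of trusting the copy inherited from the parent. The same defensive idea can be expressed in ordinary user code with a pthread_atfork() child handler. A minimal sketch, assuming a hypothetical library that caches per-process state; here the cached value is just the PID, standing in for libkse's pointers, and `cache_init()`, `cache_pid()` and `refresh_in_child()` are illustrative names, not libkse API:

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-process state cached by a library.  It stands in for
 * libkse's kcb/tcb pointers; it is NOT the real libkse data structure. */
static pid_t cached_pid;

/* Runs in the child right after fork(): rebuild the cache from scratch
 * instead of trusting the value inherited from the parent. */
static void refresh_in_child(void)
{
    cached_pid = getpid();
}

/* Initialise the cache once and register the child-side handler.
 * Returns 0 on success, or the error from pthread_atfork(). */
int cache_init(void)
{
    cached_pid = getpid();
    return pthread_atfork(NULL, NULL, refresh_in_child);
}

/* Accessor used by the rest of the (hypothetical) library. */
pid_t cache_pid(void)
{
    return cached_pid;
}
```

After cache_init(), a child created with fork() sees its own PID from cache_pid() rather than the parent's, because the handler re-establishes the state on the child side.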

For info, the attached patch, which partially reverts the MFC of rev 1.286
of kern_fork.c, also seems to work (to be clear: without the patch above
applied), or at least makes the problem much harder to trigger. Judging
only from its comments, it idles the other threads while user space is
copied, before the copy is accessed, which seems to be enough in my setup.

I hope this helps track down the real culprit. (I also have problems
with libthr and Java on a 2x2 SMP machine that I do not see elsewhere,
but they are much harder to trigger and I have not yet been able to
find a simple test setup that reproduces them reliably.)

Thank you very much for your help.

Best, Arno




--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=thread_single_rev.patch

Index: sys/kern/kern_fork.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.282.2.1
diff -u -r1.282.2.1 kern_fork.c
--- sys/kern/kern_fork.c	6 Nov 2007 02:59:40 -0000	1.282.2.1
+++ sys/kern/kern_fork.c	1 Dec 2007 14:17:03 -0000
@@ -246,6 +246,34 @@
 		return (0);
 	}
 
+	/*
+	 * Note 1:1 allows for forking with one thread coming out on the
+	 * other side with the expectation that the process is about to
+	 * exec.
+	 */
+	if (p1->p_flag & P_HADTHREADS) {
+		/*
+		 * Idle the other threads for a second.
+		 * Since the user space is copied, it must remain stable.
+		 * In addition, all threads (from the user perspective)
+		 * need to either be suspended or in the kernel,
+		 * where they will try restart in the parent and will
+		 * be aborted in the child.
+		 */
+		PROC_LOCK(p1);
+		if (thread_single(SINGLE_NO_EXIT)) {
+			/* Abort. Someone else is single threading before us. */
+			PROC_UNLOCK(p1);
+			return (ERESTART);
+		}
+		PROC_UNLOCK(p1);
+		/*
+		 * All other activity in this process
+		 * is now suspended at the user boundary,
+		 * (or other safe places if we think of any).
+		 */
+	}
+
 	/* Allocate new proc. */
 	newproc = uma_zalloc(proc_zone, M_WAITOK);
 #ifdef MAC
@@ -694,6 +722,15 @@
 	PROC_UNLOCK(p2);
 
 	/*
+	 * If other threads are waiting, let them continue now.
+	 */
+	if (p1->p_flag & P_HADTHREADS) {
+		PROC_LOCK(p1);
+		thread_single_end();
+		PROC_UNLOCK(p1);
+	}
+
+	/*
 	 * Return child proc pointer to parent.
 	 */
 	*procp = p2;
@@ -708,6 +745,11 @@
 	mac_destroy_proc(newproc);
 #endif
 	uma_zfree(proc_zone, newproc);
+	if (p1->p_flag & P_HADTHREADS) {
+		PROC_LOCK(p1);
+		thread_single_end();
+		PROC_UNLOCK(p1);
+	}
 	pause("fork", hz / 2);
 	return (error);
 }

--=-=-=--


