Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 17 Mar 2017 11:27:52 +0000
From:      Steven Hartland <killing@multiplay.co.uk>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "K. Macy" <kmacy@freebsd.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
Message-ID:  <180a601b-5481-bb41-f7fc-67976aabe451@multiplay.co.uk>
In-Reply-To: <20170317082333.GP16105@kib.kiev.ua>
References:  <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua> <e160381c-9935-6edf-04a9-1ff78e95d818@multiplay.co.uk> <CAHM0Q_Mg662u9D0KJ9knEWWqi9Ydy38qKDnjLt6XaS0ks%2B9-iw@mail.gmail.com> <18b40a69-4460-faf2-c0ce-7491eca92782@multiplay.co.uk> <20170317082333.GP16105@kib.kiev.ua>

index | next in thread | previous in thread | raw e-mail

On 17/03/2017 08:23, Konstantin Belousov wrote:
> On Fri, Mar 17, 2017 at 06:30:49AM +0000, Steven Hartland wrote:
>> Ok I think I've identified the cause.
>>
>> If an alternative signal stack is applied to a non-main thread and that
>> thread calls execve then the signal stack is not cleared.
>>
>> This results in all sorts of badness.
>>
>> Full details, including a small C reproduction case can be found here:
>> https://github.com/golang/go/issues/15658#issuecomment-287276856
>>
>> So looks like its kernel bug. If anyone has an ideas about that before I
>> look tomorrow that would be appreciated.
> Yes, there is definitely a kernel bug, which should be fixed by the patch
> below.
>
> Still, what I saw when I looked at the issue, is not quite resembling
> potential consequences of the bug.  Using wrong memory for signal stack
> would result either in much more significant memory corruption if the
> alt stack range is mapped and used for something unrelated, or in killed
> process on signal delivery, if the range is not mapped.  While I saw a
> systematic 'off by 0x10' in some gc structures.
>
> Anyway, patch for the issue you identified:
>
> diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
> index 29d5dd4b132..9bf3ba66f5c 100644
> --- a/sys/kern/kern_sig.c
> +++ b/sys/kern/kern_sig.c
> @@ -976,7 +976,6 @@ execsigs(struct proc *p)
>   	 * and are now ignored by default).
>   	 */
>   	PROC_LOCK_ASSERT(p, MA_OWNED);
> -	td = FIRST_THREAD_IN_PROC(p);
>   	ps = p->p_sigacts;
>   	mtx_lock(&ps->ps_mtx);
>   	while (SIGNOTEMPTY(ps->ps_sigcatch)) {
> @@ -1007,6 +1006,8 @@ execsigs(struct proc *p)
>   	 * Reset stack state to the user stack.
>   	 * Clear set of signals caught on the signal stack.
>   	 */
> +	td = curthread;
> +	MPASS(td->td_proc == p);
>   	td->td_sigstk.ss_flags = SS_DISABLE;
>   	td->td_sigstk.ss_size = 0;
>   	td->td_sigstk.ss_sp = 0;
Thanks Kostik, pretty obvious now looking at :)

Testing here we've seen all sorts of corruption looking things, mainly 
around random signals from SIGILL to SIGSEGV but also random kernel 
messages including:
pid 4603 (test): sigreturn copying xfpustate failed
pid 5013 (test): sigreturn xfpusave_len = 0x44d9bb

I'm currently running a test, but its looking good as the test case 
usually crashes in a matter of seconds.

Would you mind if I committed it?

I'm guessing given its nature this is something we'd want MFC'ed and 
Errata's issued for all supported versions?

     Regards
     Steve


home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?180a601b-5481-bb41-f7fc-67976aabe451>