Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
To: Konstantin Belousov
Cc: "K. Macy", "freebsd-hackers@freebsd.org"
From: Steven Hartland
Date: Fri, 17 Mar 2017 13:00:12 +0000
In-Reply-To: <20170317124437.GR16105@kib.kiev.ua>

On 17/03/2017 12:44, Konstantin Belousov wrote:
> On Fri, Mar 17, 2017 at 11:27:52AM +0000, Steven Hartland wrote:
>> On 17/03/2017 08:23, Konstantin Belousov wrote:
>>> On Fri, Mar 17, 2017 at 06:30:49AM +0000, Steven Hartland wrote:
>>>> Ok I think I've identified the cause.
>>>>
>>>> If an alternative signal stack is applied to a non-main thread and that
>>>> thread calls execve then the signal stack is not cleared.
>>>>
>>>> This results in all sorts of badness.
>>>>
>>>> Full details, including a small C reproduction case can be found here:
>>>> https://github.com/golang/go/issues/15658#issuecomment-287276856
>>>>
>>>> So it looks like it's a kernel bug. If anyone has any ideas about that
>>>> before I look tomorrow that would be appreciated.
>>> Yes, there is definitely a kernel bug, which should be fixed by the patch
>>> below.
>>>
>>> Still, what I saw when I looked at the issue, is not quite resembling
>>> potential consequences of the bug. Using wrong memory for signal stack
>>> would result either in much more significant memory corruption if the
>>> alt stack range is mapped and used for something unrelated, or in killed
>>> process on signal delivery, if the range is not mapped. While I saw a
>>> systematic 'off by 0x10' in some gc structures.
>>>
>>> Anyway, patch for the issue you identified:
>>>
>>> diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
>>> index 29d5dd4b132..9bf3ba66f5c 100644
>>> --- a/sys/kern/kern_sig.c
>>> +++ b/sys/kern/kern_sig.c
>>> @@ -976,7 +976,6 @@ execsigs(struct proc *p)
>>>  	 * and are now ignored by default).
>>>  	 */
>>>  	PROC_LOCK_ASSERT(p, MA_OWNED);
>>> -	td = FIRST_THREAD_IN_PROC(p);
>>>  	ps = p->p_sigacts;
>>>  	mtx_lock(&ps->ps_mtx);
>>>  	while (SIGNOTEMPTY(ps->ps_sigcatch)) {
>>> @@ -1007,6 +1006,8 @@ execsigs(struct proc *p)
>>>  	 * Reset stack state to the user stack.
>>>  	 * Clear set of signals caught on the signal stack.
>>>  	 */
>>> +	td = curthread;
>>> +	MPASS(td->td_proc == p);
>>>  	td->td_sigstk.ss_flags = SS_DISABLE;
>>>  	td->td_sigstk.ss_size = 0;
>>>  	td->td_sigstk.ss_sp = 0;
>> Thanks Kostik, pretty obvious now looking at it :)
>>
>> Testing here we've seen all sorts of corruption-looking things, mainly
>> around random signals from SIGILL to SIGSEGV but also random kernel
>> messages including:
>> pid 4603 (test): sigreturn copying xfpustate failed
>> pid 5013 (test): sigreturn xfpusave_len = 0x44d9bb
>>
>> I'm currently running a test, but it's looking good as the test case
>> usually crashes in a matter of seconds.
>>
>> Would you mind if I committed it?
> I am capable of committing the patches.
No problem, wouldn't ever suggest otherwise, just didn't want to add to
your workload ;-)
>
>> I'm guessing given its nature this is something we'd want MFC'ed and
>> Errata issued for all supported versions?
> MFC will be done for sure. I am not so sure about EN, this is a routine
> bugfix. For some reasons 10.3 errata might be indeed the only way to get
> this for 10.x users, but I do not see why bother re/so with 11.0.
My argument for doing an EN for 11.0 as well as 10.3 would be twofold:
1. It exposes other processes' memory, so could it be considered a
security issue?
2. Given it's causing quite a bit of pain for golang users (random
crashes), and golang is getting used more and more now, it would be good
to get a fix out sooner rather than later; 11.1 is still over 4 months off.

Regards
Steve