Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
To: Konstantin Belousov
Cc: "freebsd-hackers@freebsd.org"
From: Steven Hartland
Date: Tue, 6 Dec 2016 17:07:53 +0000

On 06/12/2016 14:35, Konstantin Belousov wrote:
> On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote:
>> On 06/12/2016 12:59, Konstantin Belousov wrote:
>>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote:
>>>> Hi guys I'm trying to help identify / fix an issue with golang whereby
>>>> fork results in memory corruption.
>>>>
>>>> Details of the issue can be found here:
>>>> https://github.com/golang/go/issues/15658
>>>>
>>>> In summary, when a fork is done in golang it has a chance of causing
>>>> memory corruption in the parent, resulting in a process crash once detected.
>>>>
>>>> It's believed that this only affects FreeBSD.
>>>>
>>>> This has similarities to other reported issues such as this one which
>>>> impacted perl during 10.x:
>>>> https://rt.perl.org/Public/Bug/Display.html?id=122199
>>> I cannot judge any similarities when all the description provided
>>> is 'memory corruption'. BTW, the perl issue described, where the child segfaults
>>> after the fork, is more likely to be caused by the set of problems referenced
>>> in FreeBSD-EN-16:17.vm.
>>>
>>>> And more recently the issue with nginx on 11.x:
>>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html
>>> Which does not affect anything unless aio is used on Sandy/Ivy.
>>>
>>>> It's possible, some believe likely, that this is a kernel bug around fork
>>>> / vm that golang stresses, but I've not been able to confirm.
>>>>
>>>> I can reproduce the issue at will; it takes between 5 minutes and 1 hour
>>>> using 16 threads, and it definitely seems like an interaction between
>>>> fork and other memory operations.
>>> Which arch is the kernel and the process which demonstrates the behaviour?
>>> I mean i386/amd64.
>> amd64
> How large is the machine, how many cores, what is the physical memory size?
24 cores, 32GB RAM.

CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.06-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d  Stepping=7
  Features=0xbfebfbff
  Features2=0x1fbee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  XSAVE Features=0x1
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33209896960 (31671 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads

The HEAD box I'm just updating to run the test on has the same CPU but
128GB of RAM.
>>>> I've tried reproducing the issue in C but also no joy (captured in the bug).
>>>>
>>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kib's PCID fix
>>>> (#306350).
>>> Switch to HEAD kernel, for start.
>>> Show the memory map of the failed process.
>>> Are you able to take ktrace of the process while still producing the bug?
>> Whenever I've tried ktrace the issue doesn't present itself.
>>
>> I can try and run it for an extended period to see if it does eventually
>> but I did run it for a few hours without any joy.
>>
>> I'm currently testing with an 11.0-RELEASE debug kernel, witness,
>> invariants etc. to see if that would detect anything; however so far it's
>> taking longer than usual to reproduce so it may simply not occur with a
>> debug kernel.
>>
>>> Where does the memory corruption happen? Is it in go runtime structures,
>>> or in the application data?
>> It's usually detected by the runtime GC, which panics with a number of
>> errors, e.g.
>> fatal error: all goroutines are asleep - deadlock!
>>
>> fatal error: workbuf is empty
>>
>> runtime: nelems=256 nfree=233 nalloc=23 previous allocCount=18 nfreed=65531
>> fatal error: sweep increased allocation count
>>
>> runtime: failed MSpanList_Remove 0x800698500 0x800b46d40 0x53adb0 0x53ada0
>> fatal error: MSpanList_Remove
>>
>> As the test is very basic, an issue in the application data is
>> unlikely.
>>
>>> Can somebody knowledgeable of either the go runtime or the app
>>> try to identify the initial corrupted userspace data?
>> The golang developers have looked but were unable to reproduce on
>> freebsd-amd64-gce101 gomote running FreeBSD 10.1. This could be a factor
>> of the VM; it's unclear.
> This is not what I asked. I am asking is it possible to make an educated
> guess at what initial corruption could be to cause the outcome. Like,
> if this variable suddenly becomes zero, we get the errors.
I'll have a look through the crash dumps I have to see if things point
to null / zeroed memory.
> Does go runtime use FreeBSD libc and threading library?
No it doesn't; each built binary is totally standalone and uses asm for
core system calls. The runtime directly creates kernel threads with
thr_new, which it then manages internally.

One possibly important difference between golang and C apps is that it
uses goroutines, which are lightweight so-called green threads mapped
onto a set of kernel threads.

A pretty good write-up of this can be found here:
http://blog.nindalf.com/how-goroutines-work/

    Regards
    Steve