From: Steven Hartland
To: Konstantin Belousov
Cc: "freebsd-hackers@freebsd.org"
Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
Date: Tue, 6 Dec 2016 20:35:04 +0000
Message-ID: <9b40c93a-871f-bb32-668c-39bc3e31e385@multiplay.co.uk>
In-Reply-To: <20161206143532.GR54029@kib.kiev.ua>
List-Id: Technical Discussions relating to FreeBSD

On 06/12/2016 14:35, Konstantin Belousov wrote:
> On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote:
>> On 06/12/2016 12:59, Konstantin Belousov wrote:
>>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote:
>>>> Hi guys, I'm trying to help identify / fix an issue with golang whereby
>>>> fork results in memory corruption.
>>>>
>>>> Details of the issue can be found here:
>>>> https://github.com/golang/go/issues/15658
>>>>
>>>> In summary, when a fork is done in golang it has a chance of causing
>>>> memory corruption in the parent, resulting in a process crash once detected.
>>>>
>>>> It's believed that this only affects FreeBSD.
>>>>
>>>> This has similarities to other reported issues such as this one which
>>>> impacted perl during 10.x:
>>>> https://rt.perl.org/Public/Bug/Display.html?id=122199
>>> I cannot judge about any similarities when all the description provided
>>> is 'memory corruption'. BTW, the perl issue described, where child segfaults
>>> after the fork, is more likely to be caused by the set of problems referenced
>>> in the FreeBSD-EN-16:17.vm.
>>>
>>>> And more recently the issue with nginx on 11.x:
>>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html
>>> Which does not affect anything unless aio is used on Sandy/Ivy.
>>>
>>>> It's possible, some believe likely, that this is a kernel bug around fork
>>>> / vm that golang stresses, but I've not been able to confirm.
>>>>
>>>> I can reproduce the issue at will, takes between 5mins and 1hour using
>>>> 16 threads, and it definitely seems like an interaction between fork and
>>>> other memory operations.
>>> Which arch is the kernel and the process which demonstrates the behaviour ?
>>> I mean i386/amd64.
>> amd64
> How large is the machine, how many cores, what is the physical memory size ?
>
>>>> I've tried reproducing the issue in C but also no joy (captured in the bug).
>>>>
>>>> For reference I'm currently testing on 11.0-RELEASE-p3 + kib's PCID fix
>>>> (#306350).
>>> Switch to HEAD kernel, for start.
>>> Show the memory map of the failed process.

No sign of zeroed memory that I can tell.

This error was caused by hitting the following validation in gc:

func (list *mSpanList) remove(span *mspan) {
        if span.prev == nil || span.list != list {
                println("runtime: failed MSpanList_Remove", span, span.prev, span.list, list)
                throw("MSpanList_Remove")
        }

runtime: failed MSpanList_Remove 0x80052e580 0x80052e300 0x53e9c0 0x53e9b0
fatal error: MSpanList_Remove

(gdb) print list
$4 = (runtime.mSpanList *) 0x53e9b0
(gdb) print span.list
$5 = (runtime.mSpanList *) 0x53e9c0
(gdb) print span.prev
$6 = (struct runtime.mspan **) 0x80052e300
(gdb) print *list
$7 = {first = 0x80052e580, last = 0x8008aa180}
(gdb) print *span.list
$8 = {first = 0x8007ea7e0, last = 0x80052e580}

procstat -v test.core.1481054183
  PID          START            END PRT  RES PRES REF SHD FLAG TP PATH
 1178       0x400000       0x49b000 r-x  115  223   3   1 CN-- vn /root/test
 1178       0x49b000       0x528000 r--   97  223   3   1 CN-- vn /root/test
 1178       0x528000       0x539000 rw-   10    0   1   0 C--- vn /root/test
 1178       0x539000       0x55a000 rw-   16   16   1   0 C--- df
 1178    0x800528000    0x800a28000 rw-  118  118   1   0 C--- df
 1178    0x800a28000    0x800a68000 rw-    1    1   1   0 CN-- df
 1178    0x800a68000    0x800aa8000 rw-    2    2   1   0 CN-- df
 1178    0x800aa8000    0x800c08000 rw-   50   50   1   0 CN-- df
 1178    0x800c08000    0x800c48000 rw-    2    2   1   0 CN-- df
 1178    0x800c48000    0x800c88000 rw-    1    1   1   0 CN-- df
 1178    0x800c88000    0x800cc8000 rw-    1    1   1   0 CN-- df
 1178   0xc000000000   0xc000001000 rw-    1    1   1   0 CN-- df
 1178   0xc41ffe0000   0xc41ffe8000 rw-    8    8   1   0 CN-- df
 1178   0xc41ffe8000   0xc41fff0000 rw-    8    8   1   0 CN-- df
 1178   0xc41fff0000   0xc41fff8000 rw-    8    8   1   0 C--- df
 1178   0xc41fff8000   0xc420300000 rw-  553  553   1   0 C--- df
 1178   0xc420300000   0xc420400000 rw-  234  234   1   0 C--- df
 1178 0x7ffffffdf000 0x7ffffffff000 rwx    2    2   1   0 C--D df
 1178 0x7ffffffff000 0x800000000000 r-x    1    1  33   0 ---- ph

This is from FreeBSD 12.0-CURRENT #36 r309618M

ktrace on 11.0-RELEASE is still running, 6 hours so far.

    Regards
    Steve
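
P.S. For anyone not familiar with the span lists, below is a minimal sketch of the kind of check that fired. It is not the runtime's code: the two types are cut-down stand-ins with only the fields visible in the gdb output, insertBack is a hypothetical helper, and remove returns an error where the real runtime calls throw.

```go
package main

import (
	"errors"
	"fmt"
)

// mSpanList and mspan are simplified stand-ins (assumption): only the
// fields that appear in the gdb output are modelled here.
type mSpanList struct {
	first *mspan  // first span on the list, nil when empty
	last  **mspan // address of the final next pointer, or of first when empty
}

type mspan struct {
	next *mspan     // next span on the list
	prev **mspan    // address of the pointer that points at this span
	list *mSpanList // list this span believes it is on
}

var errNotOnList = errors.New("MSpanList_Remove: span is not on this list")

func (list *mSpanList) init() {
	list.first = nil
	list.last = &list.first
}

// insertBack is a hypothetical helper that appends span to list.
func (list *mSpanList) insertBack(span *mspan) {
	span.next = nil
	span.prev = list.last
	*list.last = span
	list.last = &span.next
	span.list = list
}

// remove applies the same condition as the quoted validation, but
// reports the failure as an error instead of throwing.
func (list *mSpanList) remove(span *mspan) error {
	if span.prev == nil || span.list != list {
		return errNotOnList
	}
	if span.next != nil {
		span.next.prev = span.prev
	} else {
		list.last = span.prev
	}
	*span.prev = span.next
	span.next, span.prev, span.list = nil, nil, nil
	return nil
}

func main() {
	var a, b mSpanList
	a.init()
	b.init()

	s := &mspan{}
	b.insertBack(s)

	// Removing through a list the span does not belong to is rejected.
	fmt.Println(a.remove(s) != nil)
	// Removing through the owning list succeeds.
	fmt.Println(b.remove(s) == nil)
}
```

In the core above it is the second half of the condition that trips: span.list is 0x53e9c0 while the list being operated on is 0x53e9b0, even though *list shows first = 0x80052e580, which is the span itself.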