From: Steven Hartland
To: Konstantin Belousov
Cc: "K. Macy", "freebsd-hackers@freebsd.org"
Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
Date: Tue, 28 Mar 2017 00:16:38 +0100

On 27/03/2017 17:49, Konstantin Belousov wrote:
> On Mon, Mar 27, 2017 at 05:33:49PM +0100, Steven Hartland wrote:
>> On 27/03/2017 17:18, Konstantin Belousov wrote:
>>> On Mon, Mar 27, 2017 at 12:47:11PM +0100, Steven Hartland wrote:
>>>> OK now the similar but unrelated issue with signal stacks is solved I've
>>>> moved back to the initial issue.
>>>>
>>>> I've made some progress with a reproduction case as detailed here:
>>>> https://github.com/golang/go/issues/15658#issuecomment-288747812
>>>>
>>>> In short it seems that having a running child, while the parent runs GC,
>>>> is some how responsible for memory corruption in the parent.
>>>>
>>>> The reason I believe this is if I run the same GC in the parent after
>>>> the child exits instead of while its running, I've been unable to
>>>> reproduce the issue.
>>>>
>>>> As the memory segments are COW then the issue might be in VM subsystem.
>>> Well, it might be, but it is a strange corruption mode to believe.
>> Indeed, but would you agree the evidence seems to indicate that this may
>> be the case, as otherwise I would have expected that running the GC
>> after the child process has exited would have zero impact on the issue.
>>>> In order to confirm / deny this I was wondering if there was a way to
>>>> force a full copy of all segments for the child instead of using the COW
>>>> optimisation.
>>> No, there is no. By design, copying only occurs on faults, when VM
>>> detects that the map entry needs copying. Doing the actual copy at fork
>>> time would require writing a lot of new code.
>> I noticed in vm_map_copy_entry the following:
>>     /*
>>      * We don't want to make writeable wired pages copy-on-write.
>>      * Immediately copy these pages into the new map by simulating
>>      * page faults. The new pages are pageable.
>>      */
>>     vm_fault_copy_entry(dst_map, src_map, dst_entry, src_entry,
>>         fork_charge);
>>
>> I wondered if I could use vm_fault_copy_entry to force the copy on fork?
> No, the vm_fault_copy_entry() only works with wired entries, e.g. it cannot
> page in not yet touched page, and the result is also wired.
>
>>> Does go have FreeBSD/i386 port ? If yes, is the issue reproducable there ?
>> Yes it does, I don't currently have i386 machine to test with, I'm
>> assuming testing i386 on amd64 kernel, would likely not have any effect.
> Only if the bug is in kernel and not in the go runtime. I am still not
> convinced that the kernel is the culprit.
>
>>> Another blind experiment to try is to comment out call to
>>> vm_object_collapse() in sys/vm/vm_map.c:vm_map_copy_entry() and see if
>>> it changes anything.
>> I'll do that shortly.
It still crashed with the vm_object_collapse() call commented out. Here's
the parent's procstat -v output:

  PID          START            END PRT  RES PRES REF SHD FLAG TP PATH
69713       0x400000       0x70e000 r-x  306  601   3   1 CN-- vn /root/golang/src/test5/test5
69713       0x70e000       0x951000 r--  263  601   3   1 CN-- vn /root/golang/src/test5/test5
69713       0x951000       0x988000 rw-   31    0   1   0 C--- vn /root/golang/src/test5/test5
69713       0x988000       0x9ab000 rw-   18   18   1   0 C--- df
69713    0x800951000    0x800b51000 rw-   41   41   1   0 C--- df
69713    0x800b51000    0x800c21000 rw-   27   27   1   0 C--- df
69713    0x800c21000    0x800c31000 rw-   16   16   1   0 C--- df
69713    0x800c31000    0x800c71000 rw-    1    1   1   0 C--- df
69713    0x800c71000    0x800cf1000 rw-    5    5   1   0 C--- df
69713    0x800cf1000    0x800d31000 rw-    1    1   1   0 CN-- df
69713    0x800d31000    0x800d71000 rw-    1    1   1   0 C--- df
69713    0x800d71000    0x800e31000 rw-    3    3   1   0 C--- df
69713    0x800e31000    0x800eb1000 rw-    3    3   1   0 C--- df
69713    0x800eb1000    0x800ef1000 rw-    2    2   1   0 C--- df
69713   0xc000000000   0xc000001000 rw-    1    1   1   0 CN-- df
69713   0xc41fff0000   0xc41fff8000 rw-    3    3   1   0 CN-- df
69713   0xc41fff8000   0xc420200000 rw-  267  267   1   0 C--- df
69713 0x7ffffffdf000 0x7ffffffff000 rwx    2    2   1   0 C--D df
69713 0x7ffffffff000 0x800000000000 r-x    1    1  27   0 ---- ph

    Regards
    Steve