Date:      Tue, 6 Dec 2016 13:53:52 +0000
From:      Steven Hartland <killing@multiplay.co.uk>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
Message-ID:  <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk>
In-Reply-To: <20161206125919.GQ54029@kib.kiev.ua>
References:  <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua>

On 06/12/2016 12:59, Konstantin Belousov wrote:
> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote:
>> Hi guys, I'm trying to help identify / fix an issue with golang whereby
>> fork results in memory corruption.
>>
>> Details of the issue can be found here:
>> https://github.com/golang/go/issues/15658
>>
>> In summary, when a fork is done in golang it has a chance of causing
>> memory corruption in the parent, resulting in a process crash once detected.
>>
>> It's believed that this only affects FreeBSD.
>>
>> This has similarities to other reported issues such as this one which
>> impacted perl during 10.x:
>> https://rt.perl.org/Public/Bug/Display.html?id=122199
> I cannot judge any similarities when all the description provided
> is 'memory corruption'. BTW, the perl issue described, where child segfaults
> after the fork, is more likely to be caused by the set of problems referenced
> in the FreeBSD-EN-16:17.vm.
>
>> And more recently the issue with nginx on 11.x:
>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html
> Which does not affect anything unless aio is used on Sandy/Ivy.
>
>> It's possible, some believe likely, that this is a kernel bug around fork
>> / vm that golang stresses, but I've not been able to confirm.
>>
>> I can reproduce the issue at will, takes between 5mins and 1hour using
>> 16 threads, and it definitely seems like an interaction between fork and
>> other memory operations.
> Which arch is the kernel and the process which demonstrates the behaviour  ?
> I mean i386/amd64.
amd64
>
>> I've tried reproducing the issue in C but also no joy (captured in the bug).
>>
>> For reference I'm currently testing on 11.0-RELEASE-p3 + kibs PCID fix
>> (#306350).
> Switch to HEAD kernel, for start.
> Show the memory map of the failed process.
> Are you able to take ktrace of the process while still producing the bug ?
Whenever I've tried ktrace, the issue doesn't present itself.

I can try running it for an extended period to see if it eventually does,
but I did run it for a few hours without any joy.

I'm currently testing with an 11.0-RELEASE debug kernel (WITNESS,
INVARIANTS, etc.) to see if that would detect anything; however, so far it's
taking longer than usual to reproduce, so it may simply not occur with a
debug kernel.

> Where does the memory corruption happen ? Is it in go runtime structures,
> or in the application data ?
It's usually detected by the runtime GC, which panics with one of a number
of errors, e.g.:
fatal error: all goroutines are asleep - deadlock!

fatal error: workbuf is empty

runtime: nelems=256 nfree=233 nalloc=23 previous allocCount=18 nfreed=65531
fatal error: sweep increased allocation count

runtime: failed MSpanList_Remove 0x800698500 0x800b46d40 0x53adb0 0x53ada0
fatal error: MSpanList_Remove

As the test is very basic, it's unlikely that an issue would show up in the
application data.

> Can somebody knowledgeable about either the go runtime or the app,
> try to identify the initial corrupted userspace data ?
The golang developers have looked but were unable to reproduce it on the
freebsd-amd64-gce101 gomote running FreeBSD 10.1. This could be a factor
of the VM; it's unclear.

The app is a tiny test binary, which I'm currently running with GOGC=2:
package main

import (
         "fmt"
         "os/exec"
         "runtime"
         "time"
)

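var (
         // gcPeriod is how long each forking phase runs before a forced GC.
         gcPeriod     = time.Second * 10
         // forkRoutines is the number of goroutines forking concurrently.
         forkRoutines = 16
)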

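// run forks and execs /usr/bin/true, waits for it to exit, then signals
// completion on done. Errors from Start and Wait are ignored.
func run(done chan struct{}) {
         cmd := exec.Command("/usr/bin/true")
         cmd.Start()
         cmd.Wait()

         done <- struct{}{}
}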

func main() {
         fmt.Printf("Starting %v forking goroutines...\n", forkRoutines)
         fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

         done := make(chan struct{}, forkRoutines*2)

         for i := 0; i < forkRoutines; i++ {
                 go run(done)
         }

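         for {
                 start := time.Now()
                 active := forkRoutines
                 // Until gcPeriod has elapsed, every finished run is replaced
                 // by a new one; after that, drain the remaining goroutines.
         forking:
                 for range done {
                         if time.Since(start) > gcPeriod {
                                 active--
                                 if active == 0 {
                                         break forking
                                 }
                         } else {
                                 go run(done)
                         }
                 }

                 // Force a collection once all forking goroutines are done.
                 runtime.GC()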

                 for i := 0; i < forkRoutines; i++ {
                         go run(done)
                 }
         }
}
