Date:      Tue, 6 Dec 2016 17:07:53 +0000
From:      Steven Hartland <killing@multiplay.co.uk>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
Message-ID:  <e160381c-9935-6edf-04a9-1ff78e95d818@multiplay.co.uk>
In-Reply-To: <20161206143532.GR54029@kib.kiev.ua>
References:  <27e1a828-5cd9-0755-50ca-d7143e7df117@multiplay.co.uk> <20161206125919.GQ54029@kib.kiev.ua> <8b502580-4d2d-1e1f-9e05-61d46d5ac3b1@multiplay.co.uk> <20161206143532.GR54029@kib.kiev.ua>

On 06/12/2016 14:35, Konstantin Belousov wrote:
> On Tue, Dec 06, 2016 at 01:53:52PM +0000, Steven Hartland wrote:
>> On 06/12/2016 12:59, Konstantin Belousov wrote:
>>> On Tue, Dec 06, 2016 at 12:31:47PM +0000, Steven Hartland wrote:
>>>> Hi guys, I'm trying to help identify / fix an issue with golang whereby
>>>> fork results in memory corruption.
>>>>
>>>> Details of the issue can be found here:
>>>> https://github.com/golang/go/issues/15658
>>>>
>>>> In summary, when a fork is done in golang it has a chance of causing
>>>> memory corruption in the parent, resulting in a process crash once detected.
>>>>
>>>> It's believed that this only affects FreeBSD.
>>>>
>>>> This has similarities to other reported issues such as this one which
>>>> impacted perl during 10.x:
>>>> https://rt.perl.org/Public/Bug/Display.html?id=122199
>>> I cannot judge any similarities when all the description provided
>>> is 'memory corruption'. BTW, the perl issue described, where child segfaults
>>> after the fork, is more likely to be caused by the set of problems referenced
>>> in the FreeBSD-EN-16:17.vm.
>>>
>>>> And more recently the issue with nginx on 11.x:
>>>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-September/085540.html
>>> Which does not affect anything unless aio is used on Sandy/Ivy.
>>>
>>>> It's possible, some believe likely, that this is a kernel bug around fork
>>>> / vm that golang stresses, but I've not been able to confirm.
>>>>
>>>> I can reproduce the issue at will, takes between 5mins and 1hour using
>>>> 16 threads, and it definitely seems like an interaction between fork and
>>>> other memory operations.
>>> Which arch is the kernel and the process which demonstrates the behaviour  ?
>>> I mean i386/amd64.
>> amd64
> How large is the machine, how many cores, what is the physical memory size ?
24 logical CPUs (2 packages x 6 cores x 2 hardware threads), 32GB RAM.

CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz (2500.06-MHz K8-class CPU)
   Origin="GenuineIntel"  Id=0x206d7  Family=0x6  Model=0x2d Stepping=7
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> 
Features2=0x1fbee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX>
   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
   AMD Features2=0x1<LAHF>
   XSAVE Features=0x1<XSAVEOPT>
   VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
   TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33209896960 (31671 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <DELL   DCSRADON>
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads

The HEAD box I'm just updating to run the test on has the same CPU but 
128GB of RAM.
>>>> I've tried reproducing the issue in C but also no joy (captured in the bug).
>>>>
>>>> For reference, I'm currently testing on 11.0-RELEASE-p3 + kib's PCID fix
>>>> (#306350).
>>> Switch to HEAD kernel, for start.
>>> Show the memory map of the failed process.
>>> Are you able to take ktrace of the process while still producing the bug ?
>> Whenever I've tried ktrace the issue doesn't present itself.
>>
>> I can try and run it for an extended period to see if it does eventually
>> but I did run it for a few hours without any joy.
>>
>> I'm currently testing with an 11.0-RELEASE debug kernel (witness,
>> invariants etc.) to see if that would detect anything; however, so far it's
>> taking longer than usual to reproduce, so it may simply not occur with a
>> debug kernel.
>>
>>> Where does the memory corruption happen ? Is it in go runtime structures,
>>> or in the application data ?
>> It's usually detected by the runtime GC, which panics with one of a number of
>> errors, e.g.
>> fatal error: all goroutines are asleep - deadlock!
>>
>> fatal error: workbuf is empty
>>
>> runtime: nelems=256 nfree=233 nalloc=23 previous allocCount=18 nfreed=65531
>> fatal error: sweep increased allocation count
>>
>> runtime: failed MSpanList_Remove 0x800698500 0x800b46d40 0x53adb0 0x53ada0
>> fatal error: MSpanList_Remove
>>
>> As the test is very basic, it's unlikely to see an issue in the
>> application data.
>>
>>> Can somebody knowledgeable of either the go runtime or the app,
>>> try to identify the initial corrupted userspace data ?
>> The golang developers have looked but were unable to reproduce on
>> freebsd-amd64-gce101 gomote running FreeBSD 10.1. This could be a factor
>> of the VM; it's unclear.
> This is not what I asked.  I am asking is it possible to make an educated
> guess at what initial corruption could be to cause the outcome.  Like,
> if this variable suddenly becomes zero, we get the errors.
I'll have a look through the crash dumps I have to see if things point 
to null / zeroed memory.
> Does go runtime use FreeBSD libc and threading library ?
No, it doesn't; each built binary is totally standalone and uses asm for 
core system calls.

The runtime directly creates kernel threads with thr_new, which it then 
manages internally.

One possibly important difference between golang and C apps is that it uses 
goroutines: lightweight, so-called green threads which are multiplexed 
onto a set of kernel threads. A pretty good write-up of this can 
be found here: http://blog.nindalf.com/how-goroutines-work/

     Regards
     Steve
