From: Konstantin Belousov
To: Steven Hartland
Cc: "K. Macy", "freebsd-hackers@freebsd.org"
Date: Tue, 28 Mar 2017 14:38:59 +0300
Subject: Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
Message-ID: <20170328113859.GS43712@kib.kiev.ua>
In-Reply-To: <5aa653ba-30e1-c9de-46ce-bad74d78c40c@multiplay.co.uk>

On Tue, Mar 28, 2017 at 09:48:23AM +0100, Steven Hartland wrote:
> On 28/03/2017 09:38, Konstantin Belousov wrote:
> > On Tue, Mar 28, 2017 at 09:23:24AM +0100, Steven Hartland wrote:
> >> As I stopped the panic before that I couldn't tell so I've re-run with
> >> some debug added just before the panic to capture the addresses of the
> >> workbuf structure that the issue was detected in, here goes (parent:
> >> 62620, child: 98756):
> >>
> >> workbuf: 0x800b51800
> >> fatal error: workbuf is not empty
> >> workbuf: 0x800a72000
> >> fatal error: workbuf is empty
> >> workbuf: 0x800a72000
> >> fatal error: workbuf is not empty
> > I do not understand.
> > Why do you show several addresses? Wouldn't the
> > runtime panic after detecting the discrepancy, so there could be only one
> > address?
> There are several goroutines (threads) running; each detected an error,
> as I'm blocking the panic by entering a sleep in the faulting goroutine
> to enable the capture of procstat, other routines continue and detect an
> error too.
Ok.

So I tried to simulate the load with an isolated test. The code below is
naive, but it should illustrate the idea. The parent allocates a number
of private-mapped areas, then starts threads which write bytes into the
areas. Simultaneously the parent forks children which write a distinct
byte into the same anonymous memory. The parent checks that it cannot
see a byte written by the children.

So far it has not tripped on my test machine.

Feel free to play with it; if you have more insight into what the Go
runtime does, modify the code to simulate the failing test more
accurately.

/* $Id: cowfail.c,v 1.1 2017/03/28 11:29:58 kostik Exp kostik $ */

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <err.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static char **areas;
static int nareas, nchildren, children, nthreads;
static size_t areasz;

/* Bytes the parent writes; the child writes a byte the parent never uses. */
static const char parent_chars[] = "ab";
static const char child_char = 'c';

static int
gen_idx(void)
{

	return (random() % nareas);
}

static void
fill_area(int idx, bool parent)
{
	char *area;
	char f;

	area = areas[idx];
	/* sizeof - 1 excludes the terminating NUL of parent_chars. */
	f = parent ? parent_chars[random() % (sizeof(parent_chars) - 1)] :
	    child_char;
	memset(area, f, areasz);
}

/* Fail if the parent observes a byte that only a child could have written. */
static void
check_area(int idx)
{
	char *area;
	size_t i;

	area = areas[idx];
	for (i = 0; i < areasz; i++) {
		if (area[i] == child_char)
			errx(1, "corrupted area");
	}
}

static void
child(void)
{
	int i, idx;

	for (i = 0; i < 100; i++) {
		idx = gen_idx();
		fill_area(idx, false);
	}
	_exit(0);
}

/* Parent-side writer thread: keeps dirtying and checking random areas. */
static void *
wthread(void *arg __unused)
{

	for (;;) {
		fill_area(gen_idx(), true);
		check_area(gen_idx());
	}
	return (NULL);
}

int
main(void)
{
	pthread_t thr;
	sigset_t sigs;
	pid_t pid;
	int error, i, status;

	nareas = 1024;
	nchildren = 8;
	nthreads = 4;
	areasz = 1024 * 1024;

	sigemptyset(&sigs);
	sigaddset(&sigs, SIGCHLD);
	error = sigprocmask(SIG_BLOCK, &sigs, NULL);
	if (error == -1)
		err(1, "sigprocmask");

	areas = calloc(nareas, sizeof(char *));
	if (areas == NULL)
		err(1, "calloc nareas");
	for (i = 0; i < nareas; i++) {
		areas[i] = mmap(NULL, areasz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANON, -1, 0);
		if (areas[i] == MAP_FAILED)
			err(1, "mmap %d", i);
	}

	for (i = 0; i < nthreads; i++) {
		error = pthread_create(&thr, NULL, wthread, NULL);
		if (error != 0)
			errc(1, error, "pthread_create");
	}

	/* Keep up to nchildren forked writers alive at any time. */
	for (;;) {
		if (children < nchildren) {
			pid = fork();
			if (pid == -1) {
				err(1, "fork");
			} else if (pid == 0) {
				child();
			} else {
				children++;
			}
		} else {
			pid = waitpid(-1, &status, 0);
			if (pid == -1) {
				if (errno != EINTR)
					err(1, "waitpid");
			} else {
				children--;
			}
		}
	}
}