Date: Mon, 26 Jan 2015 20:32:22 +0100
From: Mark Martinec
To: freebsd-current@freebsd.org, perl@freebsd.org
Subject: Memory corruption in a master perl process after child exits - only under FreeBSD 10.0 amd64 (not in 10.1 or 9.*)
Organization: J. Stefan Institute
Message-ID: <1ac9f02be1360da3969ddb9501d0375a@mailbox.ijs.si>

A problem report has been open in the Perl bug tracker since July 2014. It seems to affect only FreeBSD 10.0 amd64, regardless of the Perl version or of whether clang or gcc is used as the compiler:

https://rt.perl.org/Ticket/Display.html?id=122199

I wonder if someone intimately familiar with the handling of virtual memory, fork, swap, and process exit / wait(2) under FreeBSD would be able to recognize what has changed in these areas between 9.2 -> 10.0 and 10.0 -> 10.1, so that only 10.0 misbehaves while 10.1 apparently fixed the problem again.

Below is my short summary of the issue (it is the last comment in the referenced problem report). Further details are in that PR.

It's been a real mystery, difficult to reproduce, but definitely there. It might be a Perl bug, but it looks ever more likely that it is a FreeBSD issue.

Mark


After upgrading to FreeBSD 10.1 (from 10.0) and running the same application with the same version of Perl for two months now, with the master process periodically retiring child processes and spawning replacements as it did previously under FreeBSD 9.x, I can now confirm that the problem no longer occurs.
I can also confirm that the problem under 10.0 can be avoided by not letting child processes exit voluntarily, so that the master process never sees a child termination in wait() and never needs to spawn (fork) another child process.

A brief summary of the problem:

Setup: an application consisting of a master perl process spawning worker child processes, which periodically self-terminate voluntarily and are each replaced by a fresh child process forked from the master process.

Environment:
- occurs only on FreeBSD 10.0 amd64, with any recent version of perl, built with gcc or clang;
- does not occur on FreeBSD 9.x or 10.1, nor on i386; not reproducible on Linux.

What seems to be happening:
- a child process, after doing some work (possibly touching swap), does a normal exit;
- the parent process gets a SIGCHLD signal, handles it with a wait(), and for some obscure reason some of its memory gets corrupted;
- the parent process forks, creating a new worker child process, which inherits the corrupted sections of the parent's memory. This later leads to the child crashing if it happens to use that part of the memory (opcodes or data structures) during its normal work.

Any newly born child process inherits the same memory corruption and crashes in the same way.

So it seems the problem is somehow connected with how FreeBSD 10.0 on amd64 manages virtual memory (fork, exit, wait, possibly involving swap). The problem is apparently fixed in 10.1, and is not present in 9.x.

Does anybody have a sound explanation?
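
For anyone who wants to poke at this, here is a minimal sketch of the fork / exit / wait / re-fork pattern described above, with a checksum over a parent-owned buffer re-verified after every reaped child. It is not the actual application (which is written in Perl) and it is not a known reproducer. The names run_master, checksum and sentinel, the buffer size, and the use of a blocking wait instead of a SIGCHLD handler are all assumptions made for illustration.

```python
import hashlib
import os


def checksum(buf):
    """Digest of a buffer, used to detect any change in its contents."""
    return hashlib.sha256(buf).hexdigest()


def run_master(workers=2, total_children=6, buf_size=1 << 20):
    """Fork workers, reap each exit, respawn, and re-verify parent memory.

    Hypothetical probe: returns a list of (child_exit_code, parent_intact)
    tuples, one per reaped child.
    """
    # Parent-owned data that every forked child inherits.
    sentinel = bytearray(os.urandom(buf_size))
    expected = checksum(sentinel)
    spawned = 0
    live = set()
    results = []

    def spawn():
        nonlocal spawned
        pid = os.fork()
        if pid == 0:
            # Child: check the inherited copy, then exit voluntarily.
            code = 1
            try:
                code = 0 if checksum(sentinel) == expected else 1
            finally:
                os._exit(code)  # never fall through into the master loop
        spawned += 1
        live.add(pid)

    for _ in range(workers):
        spawn()

    while live:
        pid, status = os.wait()  # blocking reap (assumption: no SIGCHLD handler)
        if pid not in live:
            continue
        live.discard(pid)
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
        # Re-verify the master's own memory right after the wait().
        results.append((code, checksum(sentinel) == expected))
        if spawned < total_children:
            spawn()  # replace the retired worker

    return results


if __name__ == "__main__":
    for code, intact in run_master():
        print("child exit code:", code, "parent memory intact:", intact)
```

A child exit code of 1, or a False in the second column, would indicate that the buffer changed without anyone writing to it. Since the original problem possibly involves swap and a long-running workload, a short run like this may well report nothing even on an affected system.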