From: Maxim Sobolev
Date: Mon, 4 Jul 2016 22:26:25 -0700
Subject: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
To: stable@freebsd.org, hackers@freebsd.org

Hi all,

We are investigating some random postgresql-9.1.21 server crashes on FreeBSD 10.3. We started seeing them after upgrading from postgres 9.1.18, and on more than one system, so a hardware problem (e.g. bad RAM) is very unlikely.

I suspect that postgres is at fault. However, I am also curious how it could be that the kernel is not capable of generating a core file when an application does something silly. Is it that some ELF-related data structures got corrupted, or something else? Are we protecting the page where the ELF header is mapped with the R/O flag? I am looking at possibly recreating this by poking around in the ELF header(s) to see if I can corrupt them in a similar manner reliably. Any pointers or suggestions are appreciated.

Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11

#define EFAULT 14 /* Bad address */

The resulting files are truncated and not really usable for anything. We've seen the same issue:

-rw-------  1 pgsql  wheel  1310720 Jun 27 04:10 postgres.41361.core
-rw-------  1 pgsql  wheel  1310720 Jul  1 05:21 postgres.1722.core

[ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd10.3".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...(no debugging symbols found)...done.
BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720.
[New LWP 100261]
Core was generated by `postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
(gdb) where
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
(gdb) q

-Max
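
The two diagnostics quoted above (error 14 from the kernel, and the truncation warning from BFD) can be cross-checked with a short script. What follows is only a minimal sketch under stated assumptions: the core is a little-endian ELF64 file (amd64), it has fewer than 65535 program headers (the PN_XNUM extension is not handled), and the helper names describe_errno and expected_core_size are hypothetical, not part of any existing tool. The size estimate is simply the largest p_offset + p_filesz found in the program headers, which may not match BFD's own calculation exactly.

```python
import errno
import os
import struct

# ELF64 little-endian layouts; field order follows the ELF specification.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # 64 bytes
ELF64_PHDR = struct.Struct("<IIQQQQQQ")          # 56 bytes


def describe_errno(code):
    """Return (symbolic name, message) for a numeric errno value."""
    return errno.errorcode.get(code, "?"), os.strerror(code)


def expected_core_size(path):
    """Return (expected, actual) sizes in bytes for an ELF64 core file.

    'expected' is the smallest file size that can hold the program header
    table and every segment it describes (max of p_offset + p_filesz).
    Assumption: e_phnum is stored directly (no PN_XNUM indirection).
    """
    actual = os.path.getsize(path)
    with open(path, "rb") as f:
        raw = f.read(ELF64_EHDR.size)
        if len(raw) < ELF64_EHDR.size:
            raise ValueError("file too short for an ELF64 header")
        fields = ELF64_EHDR.unpack(raw)
        ident = fields[0]
        # Magic, EI_CLASS == ELFCLASS64 (2), EI_DATA == ELFDATA2LSB (1).
        if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError("not a little-endian ELF64 file")
        e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

        expected = e_phoff + e_phnum * e_phentsize
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            raw = f.read(ELF64_PHDR.size)
            if len(raw) < ELF64_PHDR.size:
                raise ValueError("program header table itself is truncated")
            p = ELF64_PHDR.unpack(raw)
            p_offset, p_filesz = p[2], p[5]
            expected = max(expected, p_offset + p_filesz)
    return expected, actual
```

Calling describe_errno(14) confirms the symbolic name behind "error 14", and expected_core_size("/var/tmp/postgres.1722.core") would give a pair to compare against the figures BFD printed (expected >= 517120000, found 1310720). Because the program headers sit near the start of the file, they are usually still readable in a core that was cut short while segment data was being written.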