From: Maxim Sobolev
Sender: sobomax@sippysoft.com
Date: Mon, 4 Jul 2016 22:26:25 -0700
Subject: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
To: stable@freebsd.org, hackers@freebsd.org

Hi all,

We are investigating some random postgresql-9.1.21 server crashes on FreeBSD 10.3. We started seeing them after upgrading from postgres 9.1.18, and on more than one system, so a hardware problem (e.g. bad RAM) is very unlikely. I suspect that postgres is at fault; however, I am also curious how it can be that the kernel is not capable of generating a core file when an application does something silly.
Is it that some ELF-related data structures got corrupted, or something else? Are we protecting the page where the ELF header is mapped with the R/O flag? I am looking at possibly recreating this by poking around the ELF header(s) to see if I can reliably corrupt it in a similar manner. Any pointers or suggestions are appreciated.

Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11

#define EFAULT 14 /* Bad address */

The resulting files are truncated and not really usable for anything. We've seen the same issue:

-rw-------  1 pgsql  wheel  1310720 Jun 27 04:10 postgres.41361.core
-rw-------  1 pgsql  wheel  1310720 Jul  1 05:21 postgres.1722.core

[ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd10.3".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
Find the GDB manual and other documentation resources online at:
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...(no debugging symbols found)...done.
BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720.
[New LWP 100261]
Core was generated by `postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
(gdb) where
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
(gdb) q

-Max
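
For anyone who wants to check a core for truncation without starting gdb, here is a rough sketch of the kind of size check behind the BFD warning above. It assumes a little-endian ELF64 core and simply takes the largest p_offset + p_filesz over the program headers; I have not verified that this matches BFD's exact logic. The names expected_core_size and synthetic_core are hypothetical, and the two-segment layout in the usage note is invented purely so that the total comes out to the 517120000 figure from the warning.

```python
import struct

# ELF64 little-endian layouts (field order follows the ELF specification).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # program header, 56 bytes


def expected_core_size(image: bytes) -> int:
    """Smallest file size that holds all segment data the headers promise."""
    if len(image) < EHDR.size:
        raise ValueError("too short for an ELF64 header")
    fields = EHDR.unpack_from(image, 0)
    ident = fields[0]
    # EI_CLASS == 2 means ELF64, EI_DATA == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    end = e_phoff + e_phentsize * e_phnum  # the header table itself must fit
    if len(image) < end:
        raise ValueError("program header table is truncated")
    for i in range(e_phnum):
        phdr = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_offset, p_filesz = phdr[2], phdr[5]
        end = max(end, p_offset + p_filesz)
    return end


def synthetic_core(segments):
    """Hypothetical helper: ELF and program headers only, no segment data.

    segments is a list of (p_offset, p_filesz) pairs.
    """
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    ehdr = EHDR.pack(ident, 4, 62, 1, 0, EHDR.size, 0, 0,
                     EHDR.size, PHDR.size, len(segments), 0, 0, 0)
    phdrs = b"".join(PHDR.pack(1, 6, off, 0, 0, size, size, 4096)
                     for off, size in segments)
    return ehdr + phdrs
```

With the made-up layout synthetic_core([(4096, 1306624), (1310720, 515809280)]), expected_core_size() returns 517120000. Against a real file, reading just the first few kilobytes and comparing the result with the size reported by os.path.getsize() should be enough, as long as the program header table sits near the start of the core.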