From owner-freebsd-hackers@freebsd.org Tue Jul 5 11:48:16 2016
Date: Tue, 5 Jul 2016 14:48:08 +0300
From: Konstantin Belousov
To: Maxim Sobolev
Cc: stable@freebsd.org, hackers@freebsd.org
Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Message-ID: <20160705114808.GN38613@kib.kiev.ua>
List-Id: Technical Discussions relating to FreeBSD

On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
> Hi all, I am investigating some random postgresql-9.1.21 server crashes
> on FreeBSD 10.3. We started seeing them after upgrading from postgres
> 9.1.18 on more than one system, so hardware problems (e.g. RAM issues)
> are very unlikely. I suspect that postgres is at fault, however I am
> also curious how it could be that the kernel is not capable of
> generating a core file when an application does something silly. Is it
> that some ELF-related data structures got corrupted, or something else?
> Are we protecting the page where the ELF header is mapped with the R/O
> flag? I am looking at possibly recreating this by poking around the ELF
> header(s) and seeing if I can corrupt it in a similar manner reliably.
> Any pointers or suggestions are appreciated.
>
> Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
> Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
> Jul 1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
> Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11
>
> #define EFAULT 14 /* Bad address */
>
> The resulting files are truncated and are not really usable for anything.
> We've seen the same issue
>
> -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core
> -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 postgres.1722.core
>
> [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
> GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-portbld-freebsd10.3".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from postgres...(no debugging symbols found)...done.
> BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file
> size >= 517120000, found: 1310720.
> [New LWP 100261]
> Core was generated by `postgres'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> (gdb) where
> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
> (gdb) q
>
> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html
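
The "truncated: expected core file size >= ..., found: ..." warning quoted
above can be reproduced without gdb. The following is only a rough sketch
of that kind of check, not BFD's actual code: it assumes a 64-bit
little-endian ELF core, takes the largest p_offset + p_filesz over the
program headers as the minimum size, and ignores the extended numbering
used when a core has more than 65534 segments. The names
expected_core_size and check_core are made up for this illustration.

```python
import os
import struct

ELF_MAGIC = b"\x7fELF"
EHDR_SIZE = 64   # sizeof(Elf64_Ehdr)
PHDR_SIZE = 56   # sizeof(Elf64_Phdr)


def expected_core_size(path):
    """Return the minimum file size implied by the ELF64 program headers."""
    with open(path, "rb") as f:
        ehdr = f.read(EHDR_SIZE)
        if len(ehdr) < EHDR_SIZE or ehdr[:4] != ELF_MAGIC:
            raise ValueError("not an ELF file")
        # e_ident[EI_CLASS] == 2 is ELFCLASS64, e_ident[EI_DATA] == 1 is LSB.
        if ehdr[4] != 2 or ehdr[5] != 1:
            raise ValueError("only 64-bit little-endian ELF is handled")
        (e_phoff,) = struct.unpack_from("<Q", ehdr, 32)
        e_phentsize, e_phnum = struct.unpack_from("<HH", ehdr, 54)

        needed = 0
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            phdr = f.read(PHDR_SIZE)
            if len(phdr) < PHDR_SIZE:
                raise ValueError("program header table is itself truncated")
            (p_offset,) = struct.unpack_from("<Q", phdr, 8)
            (p_filesz,) = struct.unpack_from("<Q", phdr, 32)
            # Each segment must fit entirely inside the file.
            needed = max(needed, p_offset + p_filesz)
    return needed


def check_core(path):
    """Return (is_complete, needed_bytes, actual_bytes) for a core file."""
    actual = os.path.getsize(path)
    needed = expected_core_size(path)
    return actual >= needed, needed, actual
```

Calling check_core("postgres.1722.core") on one of the files listed above
would be the intended use. Because the program headers sit at the start of
the file, they are normally still readable in a core that was cut short
while the segment contents were being written.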