Date: Tue, 5 Jul 2016 07:43:52 -0700
From: Maxim Sobolev <sobomax@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: stable@freebsd.org, hackers@freebsd.org
Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Message-ID: <CAH7qZfvKt7b__M_tM9eBD7VjxbaAQPj5kgurrkFkY36eR3qrAg@mail.gmail.com>
In-Reply-To: <20160705114808.GN38613@kib.kiev.ua>
References: <CAH7qZfu=XveZCAgS0%2BdzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com> <20160705114808.GN38613@kib.kiev.ua>
Seems like candidate for the MFC into releng/10.3 and appropriate errata entry?

-Max

On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
> > Hi all, investigating some random postgresql-9.1.21 server crashes on
> > FreeBSD 10.3, we've started seeing those after upgrading from postgres
> > 9.1.18 on more than one system, so hardware (e.g. RAM issues) are very
> > unlikely. I suspect that postgres is at fault, however I am also curious
> > how could it be that kernel is not capable of generating core file when
> > application does something silly? Is it that some ELF-related data
> > structures got corrupted or something else? Are we protecting the page
> > where ELF header is mapped with R/O flag? I am looking at possibly
> > recreating this by poking around elf header(s), seeing if I can corrupt it
> > in a similar manner reliably, any pointers or suggestions are appreciated.
> >
> > Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
> > Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
> > Jul 1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
> > Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11
> >
> > #define EFAULT 14 /* Bad address */
> >
> > The resulting files are truncated and is not really usable for anything.
> > We've seen the same issue
> >
> > -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core
> > -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 postgres.1722.core
> >
> > [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
> > GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
> > Copyright (C) 2016 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-portbld-freebsd10.3".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> > <http://www.gnu.org/software/gdb/documentation/>.
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from postgres...(no debugging symbols found)...done.
> > BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file
> > size >= 517120000, found: 1310720.
> > [New LWP 100261]
> > Core was generated by `postgres'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> > (gdb) where
> > #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> > Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
> > (gdb) q
> >
> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html
>
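
The BFD warning quoted above compares the size the core's ELF program headers call for with the size actually on disk. A minimal sketch of that check, assuming a little-endian ELF64 core (as on amd64) and fewer than 0xffff program headers; the function names `expected_core_size` and `is_truncated` are hypothetical and this is not how GDB or the kernel implements it:

```python
import os
import struct

# ELF64 file header and program header layouts (little-endian).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def expected_core_size(path):
    """Return the minimum file size implied by the ELF program headers."""
    with open(path, "rb") as f:
        raw = f.read(ELF64_EHDR.size)
        if len(raw) < ELF64_EHDR.size:
            raise ValueError("file too short for an ELF64 header")
        fields = ELF64_EHDR.unpack(raw)
        ident = fields[0]
        # Magic, then EI_CLASS == 2 (64-bit) and EI_DATA == 1 (little-endian).
        if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError("not a little-endian ELF64 file")
        e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
        # The program header table itself must fit in the file.
        end = e_phoff + e_phentsize * e_phnum
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            raw = f.read(ELF64_PHDR.size)
            if len(raw) < ELF64_PHDR.size:
                raise ValueError("program header table is truncated")
            p_offset, p_filesz = ELF64_PHDR.unpack(raw)[2], ELF64_PHDR.unpack(raw)[5]
            # Each segment occupies [p_offset, p_offset + p_filesz) in the file.
            end = max(end, p_offset + p_filesz)
    return end


def is_truncated(path):
    """True if the file on disk is shorter than its headers require."""
    return os.path.getsize(path) < expected_core_size(path)
```

Calling `is_truncated("/var/tmp/postgres.1722.core")` on a core like the ones listed above would flag it without starting a debugger, since the headers are written before the segment data.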