Date:      Tue, 5 Jul 2016 07:43:52 -0700
From:      Maxim Sobolev <sobomax@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        stable@freebsd.org, hackers@freebsd.org
Subject:   Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Message-ID:  <CAH7qZfvKt7b__M_tM9eBD7VjxbaAQPj5kgurrkFkY36eR3qrAg@mail.gmail.com>
In-Reply-To: <20160705114808.GN38613@kib.kiev.ua>
References:  <CAH7qZfu=XveZCAgS0%2BdzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com> <20160705114808.GN38613@kib.kiev.ua>

This seems like a candidate for an MFC into releng/10.3 and an appropriate
errata entry?

-Max

On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
> > Hi all, we are investigating some random postgresql-9.1.21 server crashes
> > on FreeBSD 10.3. We started seeing them after upgrading from postgres
> > 9.1.18 on more than one system, so a hardware cause (e.g. RAM issues) is
> > very unlikely. I suspect that postgres is at fault; however, I am also
> > curious how it could be that the kernel is not capable of generating a
> > core file when an application does something silly. Is it that some
> > ELF-related data structures got corrupted, or something else? Are we
> > protecting the page where the ELF header is mapped with the R/O flag? I am
> > looking at possibly recreating this by poking around the ELF header(s) to
> > see if I can corrupt it in a similar manner reliably. Any pointers or
> > suggestions are appreciated.
> >
> > Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
> > Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
> > Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
> > Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11
> >
> > #define EFAULT          14              /* Bad address */
> >
> > The resulting files are truncated and are not really usable for anything.
> > We've seen the same issue
> >
> > -rw-------    1 pgsql     wheel     1310720 Jun 27 04:10 postgres.41361.core
> > -rw-------    1 pgsql     wheel     1310720 Jul  1 05:21 postgres.1722.core
> >
> > [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
> > GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
> > Copyright (C) 2016 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-portbld-freebsd10.3".
> > Type "show configuration" for configuration details.
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Find the GDB manual and other documentation resources online at:
> > <http://www.gnu.org/software/gdb/documentation/>.
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from postgres...(no debugging symbols found)...done.
> > BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file
> > size >= 517120000, found: 1310720.
> > [New LWP 100261]
> > Core was generated by `postgres'.
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> > (gdb) where
> > #0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
> > Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
> > (gdb) q
> >
> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html
>
>


