Date:      Mon, 4 Jul 2016 22:26:25 -0700
From:      Maxim Sobolev <sobomax@freebsd.org>
To:        stable@freebsd.org, hackers@freebsd.org
Subject:   A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Message-ID:  <CAH7qZfu=XveZCAgS0%2BdzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>

Hi all, I'm investigating some random postgresql-9.1.21 server crashes on
FreeBSD 10.3. We started seeing them after upgrading from postgres
9.1.18, on more than one system, so a hardware problem (e.g. bad RAM) is
very unlikely. I suspect that postgres is at fault; however, I am also
curious how it can be that the kernel is not capable of generating a core
file when an application does something silly. Is it that some ELF-related
data structures got corrupted, or something else? Are we protecting the
page where the ELF header is mapped with the R/O flag? I am looking at
possibly recreating this by poking around the ELF header(s) to see if I
can corrupt them in a similar manner reliably. Any pointers or suggestions
are appreciated.

Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11

#define EFAULT          14              /* Bad address */
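
For anyone grepping their own logs for the same failure, here is a minimal sketch that pulls the process name and error number out of such a kernel message and maps the number to its symbolic name. The function name decode_core_failure is hypothetical, and the errno table is the one of the machine running the script, so it only matches the kernel's meaning when both agree (14 is EFAULT on FreeBSD and on Linux).

```python
import errno
import os
import re

# Matches the kernel message quoted above; the process name is assumed
# to contain no whitespace.
CORE_FAIL_RE = re.compile(
    r"Failed to write core file for process (\S+) \(error (\d+)\)"
)


def decode_core_failure(line):
    """Return (process, code, symbolic name, description), or None if no match."""
    m = CORE_FAIL_RE.search(line)
    if m is None:
        return None
    code = int(m.group(2))
    # errno.errorcode is the local platform's table, not the remote kernel's.
    name = errno.errorcode.get(code, "UNKNOWN")
    return m.group(1), code, name, os.strerror(code)
```

Feeding it one of the log lines above (joined onto a single line) should yield the process name postgres, the code 14 and the name EFAULT.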

The resulting core files are truncated and not really usable for anything.
We've seen the same issue

-rw-------    1 pgsql     wheel     1310720 Jun 27 04:10 postgres.41361.core
-rw-------    1 pgsql     wheel     1310720 Jul  1 05:21 postgres.1722.core

[ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd10.3".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...(no debugging symbols found)...done.
BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720.
[New LWP 100261]
Core was generated by `postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
(gdb) where
#0  0x0000000800cfba67 in ?? () from /lib/libthr.so.3
Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
(gdb) q
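
The truncation that BFD warns about can also be checked without gdb. The sketch below assumes a little-endian ELF64 core (as on amd64) and assumes the expected size is the largest p_offset + p_filesz over all program headers, which is my understanding of what the warning compares against, not something I have verified in the BFD source. The function names expected_core_size and is_truncated are made up for illustration.

```python
import os
import struct

# ELF64 file header (64 bytes) and program header (56 bytes), little-endian.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def expected_core_size(path):
    """Smallest file size that still holds every segment the headers describe."""
    with open(path, "rb") as f:
        raw = f.read(ELF64_EHDR.size)
        if len(raw) < ELF64_EHDR.size:
            raise ValueError("file too short for an ELF64 header")
        fields = ELF64_EHDR.unpack(raw)
        ident = fields[0]
        # ident[4] == 2 is ELFCLASS64, ident[5] == 1 is ELFDATA2LSB.
        if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError("not a little-endian ELF64 file")
        e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

        # The program header table itself must fit in the file.
        end = e_phoff + e_phentsize * e_phnum
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            raw = f.read(ELF64_PHDR.size)
            if len(raw) < ELF64_PHDR.size:
                break  # the header table itself is cut off
            _, _, p_offset, _, _, p_filesz, _, _ = ELF64_PHDR.unpack(raw)
            end = max(end, p_offset + p_filesz)
    return end


def is_truncated(path):
    """True if the file on disk is shorter than its program headers require."""
    return os.path.getsize(path) < expected_core_size(path)
```

Calling is_truncated("/var/tmp/postgres.1722.core") on the file above would presumably return True, given the sizes gdb reports.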

-Max


