Date: Mon, 4 Jul 2016 22:26:25 -0700
From: Maxim Sobolev <sobomax@freebsd.org>
To: stable@freebsd.org, hackers@freebsd.org
Subject: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Message-ID: <CAH7qZfu=XveZCAgS0%2BdzQ_jLs9JiktEV3rER88gwqTiW_Fc9dg@mail.gmail.com>

Hi all,

We are investigating some random postgresql-9.1.21 server crashes on FreeBSD 10.3. We started seeing them after upgrading from postgres 9.1.18, and on more than one system, so a hardware problem (e.g. bad RAM) is very unlikely. I suspect that postgres is at fault; however, I am also curious how it could be that the kernel is not capable of generating a core file when an application does something silly. Is it that some ELF-related data structures got corrupted, or something else? Are we protecting the page where the ELF header is mapped with the R/O flag?

I am looking at possibly recreating this by poking around the ELF header(s) to see if I can reliably corrupt them in a similar manner. Any pointers or suggestions are appreciated.

Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
Jul  1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
Jul  1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11

Error 14 is EFAULT:

#define EFAULT 14 /* Bad address */

The resulting files are truncated and not really usable for anything. We've seen the same issue with both core files:

-rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core
-rw------- 1 pgsql wheel 1310720 Jul  1 05:21 postgres.1722.core

[ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd10.3".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...(no debugging symbols found)...done.
BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file size >= 517120000, found: 1310720.
[New LWP 100261]
Core was generated by `postgres'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
(gdb) where
#0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
(gdb) q

-Max
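
For anyone who wants to check other cores for the same symptom without starting gdb, here is a minimal sketch of the kind of consistency check that appears to be behind the BFD "is truncated" warning above: it walks the ELF64 program headers and compares the largest p_offset + p_filesz with the actual file size. This is an approximation written for illustration, not BFD's actual code. The names expected_core_size and check_core are hypothetical, and the sketch assumes a little-endian ELF64 core with fewer than 65535 program headers (the PN_XNUM extension is not handled).

```python
import os
import struct

# Little-endian ELF64 layouts: 64-byte file header, 56-byte program header.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def expected_core_size(path):
    """Return the minimum file size implied by the ELF64 program headers."""
    with open(path, "rb") as f:
        raw = f.read(ELF64_EHDR.size)
        if len(raw) < ELF64_EHDR.size:
            raise ValueError("file too short for an ELF64 header")
        fields = ELF64_EHDR.unpack(raw)
        ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
        # EI_CLASS == 2 means 64-bit, EI_DATA == 1 means little-endian.
        if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError("not a little-endian ELF64 file")
        if phentsize < ELF64_PHDR.size:
            raise ValueError("unexpected program header entry size")

        end = 0
        for i in range(phnum):
            f.seek(phoff + i * phentsize)
            raw = f.read(ELF64_PHDR.size)
            if len(raw) < ELF64_PHDR.size:
                raise ValueError("program header table is itself truncated")
            # Only p_offset and p_filesz matter for the size check.
            _type, _flags, offset, _va, _pa, filesz, _memsz, _align = \
                ELF64_PHDR.unpack(raw)
            end = max(end, offset + filesz)
        return end


def check_core(path):
    """Return (expected_size, actual_size, is_truncated) for a core file."""
    expected = expected_core_size(path)
    actual = os.path.getsize(path)
    return expected, actual, actual < expected
```

Calling check_core("/var/tmp/postgres.1722.core") returns a tuple whose last element is True when the file is shorter than its own program headers say it should be. Because the check only needs the headers at the start of the file, it works even when the segment data was never written.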