From: Julian Elischer
To: freebsd-hackers@freebsd.org
Subject: Re: A faulty program corrupts some its data preventing correct core generation (Failed to write core file for process postgres (error 14))
Date: Thu, 7 Jul 2016 16:30:54 +0800
Message-ID: <39cd0468-8301-06eb-4363-a57b18c60dbb@freebsd.org>
References: <20160705114808.GN38613@kib.kiev.ua>

On 5/07/2016 10:43 PM, Maxim Sobolev wrote:
> Seems like a candidate for the MFC into releng/10.3 and an appropriate
> errata entry?
>
> -Max

Quite possibly. It sounds like a problem that needs to be fixed.

>
> On Tue, Jul 5, 2016 at 4:48 AM, Konstantin Belousov wrote:
>
>> On Mon, Jul 04, 2016 at 10:26:25PM -0700, Maxim Sobolev wrote:
>>> Hi all, we are investigating some random postgresql-9.1.21 server
>>> crashes on FreeBSD 10.3. We started seeing them after upgrading from
>>> postgres 9.1.18 on more than one system, so hardware problems (e.g.
>>> RAM issues) are very unlikely. I suspect that postgres is at fault;
>>> however, I am also curious how it could be that the kernel is not
>>> capable of generating a core file when an application does something
>>> silly. Is it that some ELF-related data structures got corrupted, or
>>> something else? Are we protecting the page where the ELF header is
>>> mapped with the R/O flag? I am looking at possibly recreating this by
>>> poking around the ELF header(s), seeing if I can corrupt it in a
>>> similar manner reliably. Any pointers or suggestions are appreciated.
>>>
>>> Jun 27 04:10:18 dal12 kernel: Failed to write core file for process postgres (error 14)
>>> Jun 27 04:10:18 dal12 kernel: pid 41361 (postgres), uid 70: exited on signal 11
>>> Jul 1 05:21:46 dal12 kernel: Failed to write core file for process postgres (error 14)
>>> Jul 1 05:21:46 dal12 kernel: pid 1722 (postgres), uid 70: exited on signal 11
>>>
>>> #define EFAULT 14 /* Bad address */
>>>
>>> The resulting files are truncated and are not really usable for
>>> anything. We've seen the same issue
>>>
>>> -rw------- 1 pgsql wheel 1310720 Jun 27 04:10 postgres.41361.core
>>> -rw------- 1 pgsql wheel 1310720 Jul 1 05:21 postgres.1722.core
>>>
>>> [ssp-root@dal12 /var/tmp]$ sudo gdb711 postgres postgres.1722.core
>>> GNU gdb (GDB) 7.11 [GDB v7.11 for FreeBSD]
>>> Copyright (C) 2016 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-portbld-freebsd10.3".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> .
>>> Find the GDB manual and other documentation resources online at:
>>> .
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from postgres...(no debugging symbols found)...done.
>>> BFD: Warning: /var/tmp/postgres.1722.core is truncated: expected core file
>>> size >= 517120000, found: 1310720.
>>> [New LWP 100261]
>>> Core was generated by `postgres'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
>>> (gdb) where
>>> #0 0x0000000800cfba67 in ?? () from /lib/libthr.so.3
>>> Backtrace stopped: Cannot access memory at address 0x7fffffffdd08
>>> (gdb) q
>>>
>> https://lists.freebsd.org/pipermail/freebsd-stable/2016-June/084877.html
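
For anyone wanting to check a suspect core without starting gdb: the "is truncated: expected core file size >= ..." warning quoted above appears to come from comparing the file's size on disk with the furthest byte that its ELF program headers describe. Below is a minimal sketch of that comparison. It assumes a little-endian ELF64 core, as on amd64; the function names are hypothetical, and it is not the code BFD or the kernel actually uses.

```python
import os
import struct

# ELF64 little-endian layouts (Elf64_Ehdr is 64 bytes, Elf64_Phdr is 56 bytes).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")


def expected_core_size(f):
    """Return the smallest file size that holds every segment the
    program headers describe: max(p_offset + p_filesz)."""
    f.seek(0)
    raw = f.read(EHDR.size)
    if len(raw) < EHDR.size:
        raise ValueError("file is shorter than an ELF64 header")
    fields = EHDR.unpack(raw)
    ident = fields[0]
    # Assumption: only ELFCLASS64 (2) + ELFDATA2LSB (1) is handled here.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]

    high = 0
    for i in range(e_phnum):
        f.seek(e_phoff + i * e_phentsize)
        raw = f.read(PHDR.size)
        if len(raw) < PHDR.size:
            raise ValueError("program header table itself is truncated")
        _type, _flags, p_offset, _va, _pa, p_filesz, _memsz, _align = PHDR.unpack(raw)
        high = max(high, p_offset + p_filesz)
    return high


def is_truncated(path):
    """True if the file on disk is smaller than its headers promise."""
    with open(path, "rb") as f:
        return os.path.getsize(path) < expected_core_size(f)
```

Calling `is_truncated("/var/tmp/postgres.1722.core")` would be the equivalent of the check behind the warning; the gdb session above reports 517120000 bytes expected against 1310720 found for that file.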