From owner-freebsd-threads@freebsd.org Tue Nov 10 12:46:42 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 204426] Processes terminating cannot access memory
Date: Tue, 10 Nov 2015 12:46:42 +0000
X-Bugzilla-Who: rblayzor@inoc.net

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204426

            Bug ID: 204426
           Summary: Processes terminating cannot access memory
           Product: Base System
           Version: 10.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: freebsd-threads@FreeBSD.org
          Reporter: rblayzor@inoc.net

Since upgrading from 10.1-RELEASE to 10.2-RELEASE, we have noticed processes randomly terminating with signal 11 (segmentation fault). This has been an ongoing issue, and we cannot track it down to any particular bug. Since -p7 covers FreeBSD-EN-15:20.vm, we thought that might be the issue we were seeing; however, -p7 did not fix the problem.

Our environment consists of several FreeBSD 10.2 amd64 VMs running mail servers with Exim and Dovecot. The hypervisor is VMware ESXi 5.1.

Several times a day (or sometimes not at all), Exim and/or Dovecot exit with signal 11 and similar backtraces, usually in a function in libc or libthr. A couple of example backtraces are below.

Occasionally (though rarely), we have seen other processes crash with similar "cannot access memory" segfaults. We have not seen this since -p7, but we are monitoring closely.
Exim backtrace:

#0  0x000000080119e4b6 in pthread_suspend_all_np () from /lib/libthr.so.3
[New Thread 803006400 (LWP 100098/)]
(gdb) bt
#0  0x000000080119e4b6 in pthread_suspend_all_np () from /lib/libthr.so.3
#1  0x00000008011a126a in pthread_getspecific () from /lib/libthr.so.3
#2  0x00000008011a5c96 in __pthread_cxa_finalize () from /lib/libthr.so.3
#3  0x0000000000423536 in daemon_go ()
#4  0x0000000000438ee9 in main ()

Dovecot backtrace:

#0  0x000000080061b6bc in _rtld_is_dlopened () from /libexec/ld-elf.so.1
(gdb) bt
#0  0x000000080061b6bc in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#1  0x000000080061b2ab in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#2  0x0000000800614c8d in _r_debug_postinit () from /libexec/ld-elf.so.1
#3  0x000000080061246d in .text () from /libexec/ld-elf.so.1
#4  0x000000000040abf8 in service_process_create ()
#5  0x000000000040a38a in services_monitor_reap_children ()
#6  0x00000008008c2bd3 in io_loop_call_io () from /usr/local/lib/dovecot/libdovecot.so.0
#7  0x00000008008c46ef in io_loop_handler_run_internal () from /usr/local/lib/dovecot/libdovecot.so.0
#8  0x00000008008c30d4 in io_loop_handler_run () from /usr/local/lib/dovecot/libdovecot.so.0
#9  0x00000008008c2eb8 in io_loop_run () from /usr/local/lib/dovecot/libdovecot.so.0
#10 0x000000080085f1d8 in master_service_run () from /usr/local/lib/dovecot/libdovecot.so.0
#11 0x0000000000406512 in main ()

After checking with the Exim and Dovecot communities, all indications are that the problem lies somewhere in the system libraries. We have tried fresh installs of the base OS/VMs from a fresh SVN checkout with a complete world rebuild; however, the problem still persists.
Sysctl vars:

kern.corefile=/var/tmp/%N.core
kern.timecounter.hardware=ACPI-fast
#
kern.ipc.maxsockbuf=2097152
kern.ipc.somaxconn=2048
kern.maxfiles=65536
kern.maxfilesperproc=16384
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.maxdgram=16384
#
net.inet.tcp.msl=7500
net.inet.tcp.fast_finwait2_recycle=1
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.delayed_ack=0
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.log_arp_wrong_iface=0
kern.sugid_coredump=1
#
net.inet.tcp.keepidle=60000
net.inet.tcp.keepintvl=10000

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-threads@freebsd.org Wed Nov 11 14:50:12 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 200992] proccess won't die in thread_suspend_switch
Date: Wed, 11 Nov 2015 14:50:12 +0000
X-Bugzilla-Who: johan@300.nl

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200992

johans changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |johan@300.nl

--- Comment #15 from johans ---
I've been seeing almost identical problems on 10.2-RELEASE machines running under KVM virtualisation. As a workaround, I've had to reboot roughly two machines per day, because processes such as Apache, HAProxy and Postfix get stuck; their listening sockets stay trapped with the stuck processes, which blocks any new instances from starting up.

Applying the patch from this PR fixed all our problems. Are there any concrete reasons or problems not to go forward with this patch? If not, I would like to suggest getting it committed with an MFC to stable/10 and nominating it for an errata notice. This problem makes running 10.2-RELEASE highly problematic in virtualised environments with a decent workload.
From owner-freebsd-threads@freebsd.org Wed Nov 11 15:42:59 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 200992] proccess won't die in thread_suspend_switch
Date: Wed, 11 Nov 2015 15:42:59 +0000
X-Bugzilla-Who: johan@300.nl

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200992

--- Comment #16 from johans ---
To give a little more context on how this patch has been tested: I'm now running it on multiple machines in production, each handling in excess of a few hundred Mbit/s, all functioning as load balancers. The first rollout of this patch has already been running for a few days without any issues.

From owner-freebsd-threads@freebsd.org Wed Nov 11 16:34:00 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 200992] proccess won't die in thread_suspend_switch
Date: Wed, 11 Nov 2015 16:33:59 +0000
X-Bugzilla-Who: swills@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200992

--- Comment #17 from Steve Wills ---
(In reply to johans from comment #15)
The delay may be partly my fault, as I hadn't responded to the request for feedback on whether the patch solved my original issue. I hadn't experienced the issue very frequently (not nearly as often as you), so even though I had been running my system with the patch for a while, I wasn't confident it had been resolved.

That said, it has been a while now and I haven't seen the issue. Given that it has solved the issue for you as well, we should definitely look at getting it committed and merged, and perhaps an errata notice issued.
From owner-freebsd-threads@freebsd.org Thu Nov 12 07:50:55 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 200992] proccess won't die in thread_suspend_switch
Date: Thu, 12 Nov 2015 07:50:55 +0000
X-Bugzilla-Who: linimon@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200992

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch

--- Comment #18 from Mark Linimon ---
Adding a note from a recent posting to freebsd-stable@:

  From: Johan Schuijt-Li
  The patch attached [here] solved all our problems.

From owner-freebsd-threads@freebsd.org Thu Nov 12 08:29:27 2015
From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 200992] proccess won't die in thread_suspend_switch
Date: Thu, 12 Nov 2015 08:29:27 +0000
X-Bugzilla-Who: johan@300.nl

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200992

--- Comment #19 from johans ---
(In reply to Mark Linimon from comment #18)
Hey Mark, that would be me; I was already active in this PR ;)