From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 204426] Processes terminating cannot access memory
Date: Tue, 10 Nov 2015 12:46:42 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204426

            Bug ID: 204426
           Summary: Processes terminating cannot access memory
           Product: Base System
           Version: 10.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: freebsd-threads@FreeBSD.org
          Reporter: rblayzor@inoc.net

After upgrading from 10.1-RELEASE to 10.2-RELEASE, we have noticed processes
randomly terminating with signal 11 (segmentation fault). This has been an
ongoing issue, and we have not been able to track it down to any particular
bug. Since -p7 covers FreeBSD-EN-15:20.vm, we thought that might be the issue
we were seeing. However, -p7 did not fix the problem.

Our environment is several FreeBSD 10.2 amd64 VMs running a multi-mail server
with Exim and Dovecot. The hypervisor is VMware ESXi 5.1. Several times a day
(or sometimes not at all), Exim and/or Dovecot will exit with signal 11 with
similar backtraces, usually from a function in libc or libthr. A couple of
example backtraces are below.

Occasionally (although rarely) we have seen other processes crash with similar
"cannot access memory" segfaults. However, we have not yet seen this since
-p7, though we are monitoring closely.

Exim backtrace:

#0  0x000000080119e4b6 in pthread_suspend_all_np () from /lib/libthr.so.3
[New Thread 803006400 (LWP 100098/)]
(gdb) bt
#0  0x000000080119e4b6 in pthread_suspend_all_np () from /lib/libthr.so.3
#1  0x00000008011a126a in pthread_getspecific () from /lib/libthr.so.3
#2  0x00000008011a5c96 in __pthread_cxa_finalize () from /lib/libthr.so.3
#3  0x0000000000423536 in daemon_go ()
#4  0x0000000000438ee9 in main ()

Dovecot backtrace:

#0  0x000000080061b6bc in _rtld_is_dlopened () from /libexec/ld-elf.so.1
(gdb) bt
#0  0x000000080061b6bc in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#1  0x000000080061b2ab in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#2  0x0000000800614c8d in _r_debug_postinit () from /libexec/ld-elf.so.1
#3  0x000000080061246d in .text () from /libexec/ld-elf.so.1
#4  0x000000000040abf8 in service_process_create ()
#5  0x000000000040a38a in services_monitor_reap_children ()
#6  0x00000008008c2bd3 in io_loop_call_io () from /usr/local/lib/dovecot/libdovecot.so.0
#7  0x00000008008c46ef in io_loop_handler_run_internal () from /usr/local/lib/dovecot/libdovecot.so.0
#8  0x00000008008c30d4 in io_loop_handler_run () from /usr/local/lib/dovecot/libdovecot.so.0
#9  0x00000008008c2eb8 in io_loop_run () from /usr/local/lib/dovecot/libdovecot.so.0
#10 0x000000080085f1d8 in master_service_run () from /usr/local/lib/dovecot/libdovecot.so.0
#11 0x0000000000406512 in main ()

After checking with the Exim and Dovecot communities, all indications are that
this is library-related in some way. We have tried fresh installs of the base
OS/VMs from a fresh SVN checkout and a complete world rebuild. However, the
problem persists.

Sysctl vars:

kern.corefile=/var/tmp/%N.core
kern.timecounter.hardware=ACPI-fast
#
kern.ipc.maxsockbuf=2097152
kern.ipc.somaxconn=2048
kern.maxfiles=65536
kern.maxfilesperproc=16384
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.maxdgram=16384
#
net.inet.tcp.msl=7500
net.inet.tcp.fast_finwait2_recycle=1
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.delayed_ack=0
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.log_arp_wrong_iface=0
kern.sugid_coredump=1
#
net.inet.tcp.keepidle=60000
net.inet.tcp.keepintvl=10000