Date:      Tue, 10 Nov 2015 12:46:42 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-threads@FreeBSD.org
Subject:   [Bug 204426] Processes terminating cannot access memory
Message-ID:  <bug-204426-16@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204426

            Bug ID: 204426
           Summary: Processes terminating cannot access memory
           Product: Base System
           Version: 10.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: freebsd-threads@FreeBSD.org
          Reporter: rblayzor@inoc.net

Since upgrading from 10.1-RELEASE to 10.2-RELEASE, we have noticed processes
randomly terminating with signal 11 (segmentation fault).

This has been an ongoing issue, and we cannot track it down to any particular
bug. Because -p7 includes FreeBSD-EN-15:20.vm, we thought that erratum might
cover the issue we were seeing. However, -p7 did not fix the problem.

Our environment is several FreeBSD 10.2 amd64 VMs running a multi-mail server
with Exim and Dovecot. The hypervisor is VMware ESXi 5.1.

Several times a day (though on some days not at all), Exim and/or Dovecot will
exit with signal 11 and similar backtraces, usually from a function in libc or
libthr. A couple of example backtraces are below.

Occasionally (although rarely) we've seen other processes crash with similar
"cannot access memory" seg faults. We have not seen this in other processes
since -p7, though we are monitoring closely.



Exim backtrace:

#0  0x000000080119e4b6 in pthread_suspend_all_np () from /lib/libthr.so.3
[New Thread 803006400 (LWP 100098/<unknown>)]
(gdb) bt
#0  0x000000080119e4b6 in pthread_suspend_all_np () from /lib/libthr.so.3
#1  0x00000008011a126a in pthread_getspecific () from /lib/libthr.so.3
#2  0x00000008011a5c96 in __pthread_cxa_finalize () from /lib/libthr.so.3
#3  0x0000000000423536 in daemon_go ()
#4  0x0000000000438ee9 in main ()



Dovecot backtrace:

#0  0x000000080061b6bc in _rtld_is_dlopened () from /libexec/ld-elf.so.1
(gdb) bt
#0  0x000000080061b6bc in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#1  0x000000080061b2ab in _rtld_is_dlopened () from /libexec/ld-elf.so.1
#2  0x0000000800614c8d in _r_debug_postinit () from /libexec/ld-elf.so.1
#3  0x000000080061246d in .text () from /libexec/ld-elf.so.1
#4  0x000000000040abf8 in service_process_create ()
#5  0x000000000040a38a in services_monitor_reap_children ()
#6  0x00000008008c2bd3 in io_loop_call_io () from /usr/local/lib/dovecot/libdovecot.so.0
#7  0x00000008008c46ef in io_loop_handler_run_internal () from /usr/local/lib/dovecot/libdovecot.so.0
#8  0x00000008008c30d4 in io_loop_handler_run () from /usr/local/lib/dovecot/libdovecot.so.0
#9  0x00000008008c2eb8 in io_loop_run () from /usr/local/lib/dovecot/libdovecot.so.0
#10 0x000000080085f1d8 in master_service_run () from /usr/local/lib/dovecot/libdovecot.so.0
#11 0x0000000000406512 in main ()



After checking with the Exim and Dovecot communities, all indications are that
the problem is library-related rather than specific to either application.

We have tried fresh installs of the base OS on the VMs, built from a fresh SVN
checkout with a complete world rebuild. However, the problem persists.



Sysctl vars:

kern.corefile=/var/tmp/%N.core
kern.timecounter.hardware=ACPI-fast
#
kern.ipc.maxsockbuf=2097152
kern.ipc.somaxconn=2048
kern.maxfiles=65536
kern.maxfilesperproc=16384
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
net.inet.udp.recvspace=131072
net.inet.udp.maxdgram=16384
#
net.inet.tcp.msl=7500
net.inet.tcp.fast_finwait2_recycle=1
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.delayed_ack=0
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.log_arp_wrong_iface=0
kern.sugid_coredump=1
#
net.inet.tcp.keepidle=60000
net.inet.tcp.keepintvl=10000



