Date: Thu, 10 Jan 2019 20:31:10 +0100
From: Dave Cottlehuber <dch@skunkwerks.at>
To: freebsd-questions@freebsd.org
Subject: repeated segfault of daemon after ~245 minutes of OS uptime, or multiples thereof (12.0Rp1 amd64)
Message-ID: <1547148670.2385923.1631171800.2ED4526F@webmail.messagingengine.com>
Does this rather unusual duration remind anybody of some periodic counter cycle? I'm stumped.

Per the subject, roughly 245 minutes after host boot, and repeatedly after that at the same interval (+/- a couple of minutes), a jailed Erlang runtime (databases/couchdb2) segfaults, on multiple systems. AFAICT the failure is periodic relative to the host OS boot time, and *not* the daemon's own runtime, but I'm not 100% sure of that yet.

All hosts are low-end 8-core, non-HT x86_64 Atom C2750 CPUs with 8GB RAM, and are well within normal limits for CPU, RAM and disk I/O.

I've looked at 245 minutes in hex and binary, expressed as minutes, seconds, milliseconds and microseconds, and none of these resembles a nibble-aligned counter that might conceivably overflow.

All ports are built via a custom poudriere stack, albeit not very far off standard ports: we need a handful of custom settings and packages. The URL below has details. There are quite a few sysctls, but they are largely identical to those I run elsewhere, admittedly on larger boxes. Networking is internet-facing BGP plus a private fc00::/7 IPv6 VPN for the cluster nodes to communicate.

lldb stacktraces and further notes are at https://hackmd.io/elgRy4IWSR-FhViJDXHbDA

Also, my lldb skills are limited; if somebody can recommend a resource to bone up on, that would be awesome.

A+
Dave
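
For anyone who wants to repeat the hex/binary check described above, here is a minimal sketch in Python. It is not what was actually run for this report: the unit table and the helper names `nearest_power_of_two` and `describe` are made up for illustration, and 245 is only the approximate interval given above. It prints the interval in each unit as decimal, hex and binary, along with its ratio to the closest power of two.

```python
# Sketch: express the observed ~245 minute interval in several time units
# and show each value in hex, binary, and relative to the nearest power of two.
# All names here are hypothetical; only the 245 minute figure is from the report.

INTERVAL_MINUTES = 245  # approximate; observed interval varies by a couple of minutes

# Multipliers from minutes to each unit.
UNITS = {
    "minutes": 1,
    "seconds": 60,
    "milliseconds": 60 * 1000,
    "microseconds": 60 * 1000 * 1000,
}


def nearest_power_of_two(n):
    """Return (exponent, value) of the power of two closest to positive int n."""
    exp = n.bit_length()  # 2**(exp-1) <= n < 2**exp
    lower, upper = 1 << (exp - 1), 1 << exp
    if n - lower <= upper - n:
        return exp - 1, lower
    return exp, upper


def describe(minutes):
    """Return one row per unit: (name, value, hex, binary, exponent, ratio)."""
    rows = []
    for name, factor in UNITS.items():
        value = minutes * factor
        exp, power = nearest_power_of_two(value)
        rows.append((name, value, hex(value), bin(value), exp, value / power))
    return rows


if __name__ == "__main__":
    for name, value, hx, bn, exp, ratio in describe(INTERVAL_MINUTES):
        print(f"{name:>12}: {value:>14}  {hx:>12}  = {ratio:.3f} x 2^{exp}")
        print(f"{'':>12}  {bn}")
```

A ratio very close to 1.000 in some unit would hint at a power-of-two counter wrapping in that unit. Since the interval is only known to within a couple of minutes, it may be worth calling `describe` for the neighbouring values (say 243 to 247) as well.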