Date:      Thu, 10 Jan 2019 20:31:10 +0100
From:      Dave Cottlehuber <dch@skunkwerks.at>
To:        freebsd-questions@freebsd.org
Subject:   repeated segfault of daemon after ~245 minutes of OS uptime, or multiples thereof (12.0Rp1 amd64)
Message-ID:  <1547148670.2385923.1631171800.2ED4526F@webmail.messagingengine.com>

Does this rather unusual duration remind somebody of some periodic counter cycle? I'm stumped. AFAICT the failure is periodic based on the host OS boot time, and *not* the runtime, but I'm not 100% sure on that yet.

Per the subject, roughly 245 minutes after host boot, and repeatedly after that at the same interval (+- a couple of minutes), a jailed Erlang runtime (databases/couchdb2) segfaults, on multiple systems. All are low-end, 8-core, non-HT x86_64 Atom C2750 CPUs with 8GB RAM, and are well within normal limits for CPU, RAM, and disk I/O.
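One way to test the boot-time hypothesis is to measure how far each crash falls from the nearest multiple of the interval, counted from a chosen reference time. The sketch below is only an illustration: the function name is made up, and the timestamps are synthetic placeholders, not data from the affected hosts.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(minutes=245)  # approximate interval from the report


def max_phase_deviation(reference, crash_times, period=PERIOD):
    """Largest distance of any crash from the nearest multiple of `period`
    counted from `reference` (e.g. host boot, or daemon start)."""
    worst = timedelta(0)
    for crashed_at in crash_times:
        remainder = (crashed_at - reference) % period
        worst = max(worst, min(remainder, period - remainder))
    return worst


# Synthetic timestamps for illustration only; NOT taken from the real systems.
boot = datetime(2019, 1, 1, 0, 0)
crashes = [
    boot + PERIOD * n + timedelta(minutes=drift)
    for n, drift in [(1, 1), (2, -2), (3, 0)]
]
print(max_phase_deviation(boot, crashes))
```

If the deviation stays at a couple of minutes with host boot as the reference, but grows large with the daemon's own start time as the reference, that would point at the host clock or uptime rather than the process runtime.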

I've looked at 245 minutes in hex & binary, as minutes, seconds, milli- and micro-, and none of these resemble some sort of nibble-aligned counter that might conceivably overflow.
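For anyone who wants to repeat that check, here is a minimal sketch of the arithmetic. It assumes exactly 245 minutes (the real interval varies by a couple of minutes), and the helper names are mine, purely for illustration.

```python
INTERVAL_MINUTES = 245  # approximate; observed interval is +- a couple of minutes


def unit_values(minutes):
    """Express an interval given in minutes in the units considered above."""
    seconds = minutes * 60
    return {
        "minutes": minutes,
        "seconds": seconds,
        "milliseconds": seconds * 1_000,
        "microseconds": seconds * 1_000_000,
    }


def nearest_power_of_two(value):
    """Return (exponent, relative_error) for the power of two closest to value."""
    lower = value.bit_length() - 1
    upper = lower + 1
    exp = lower if value - 2**lower <= 2**upper - value else upper
    return exp, (value - 2**exp) / 2**exp


for unit, value in unit_values(INTERVAL_MINUTES).items():
    exp, err = nearest_power_of_two(value)
    print(f"{unit:>12}: {value:>14} hex={value:#x} nearest 2^{exp} ({err:+.1%})")
```

This only compares against plain powers of two; a counter ticking at some other frequency would need the tick rate factored in before the comparison means anything.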

All ports are built via a custom poudriere stack, albeit not very far off standard ports - we need a handful of custom settings and packages. URL below has details.

There are quite a few sysctls set, but they are largely identical to those I run elsewhere, admittedly on larger boxes.

Networking is internet-facing BGP, plus a private fc00::/7 IPv6 VPN for the cluster nodes to communicate.

lldb stacktraces and further notes at https://hackmd.io/elgRy4IWSR-FhViJDXHbDA

Also, my lldb skills are limited; if somebody can recommend a resource to bone up on it, that would be awesome.

A+
Dave


