Date: Thu, 23 Feb 2017 17:23:33 +0000
From: bugzilla-noreply@freebsd.org
To: freebsd-ports-bugs@FreeBSD.org
Subject: [Bug 217313] net/libzmq4: EHOSTDOWN from getsockopt must not cause assertion abort; causes SaltStack crashes
Message-ID: <bug-217313-13@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217313

    Bug ID:    217313
    Summary:   net/libzmq4: EHOSTDOWN from getsockopt must not cause
               assertion abort; causes SaltStack crashes
    Product:   Ports & Packages
    Version:   Latest
    Hardware:  Any
    OS:        Any
    Status:    New
    Severity:  Affects Some People
    Priority:  ---
    Component: Individual Port(s)
    Assignee:  koobs@FreeBSD.org
    Reporter:  Mark.Martinec@ijs.si
    Flags:     maintainer-feedback?(koobs@FreeBSD.org)

Created attachment 180246
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=180246&action=edit
Adds a missing "errno == EHOSTDOWN" exemption to assert

Environment: 11.0-RELEASE-p7 amd64, zeromq-4.1.5, py27-salt-2016.11.2.

For some time I have been observing instability of a running salt-minion
on several of our hosts: after a couple of days the Salt master loses
contact with most FreeBSD hosts, while Linux hosts remain connected.

After such an event one can find a coredump of a salt-minion process
(written in Python) on the affected systems, and salt_minion has to be
restarted manually.

After adding some debugging and capturing stdout and stderr of a
non-daemonized salt-minion process, the following is the last reported
line:

    Host is down (src/tcp_connecter.cpp:359)

So it seems there was some minor network glitch or outage, and libzmq
decided to abort the process with an assertion. That assertion is
supposed to exempt all usual network socket-related problems and catch
only a potential application problem.

The problem is that the exemption list is missing the EHOSTDOWN error
code, which should not be a cause of a process abort.

The essential code snippet is shown here:

/usr/ports/net/libzmq4/work/zeromq-4.1.5/src/tcp_connecter.cpp :

    zmq::fd_t zmq::tcp_connecter_t::connect ()
    {
        // Async connect has finished. Check whether an error occurred
        int err = 0;
        socklen_t len = sizeof err;

        const int rc = getsockopt (s, SOL_SOCKET, SO_ERROR, (char*) &err, &len);

        // Assert if the error was caused by 0MQ bug.
        // Networking problems are OK. No need to assert.
    [...]
        // Following code should handle both Berkeley-derived socket
        // implementations and Solaris.
        if (rc == -1)
            err = errno;
        if (err != 0) {
            errno = err;
            errno_assert (
                errno == ECONNREFUSED ||
                errno == ECONNRESET ||
                errno == ETIMEDOUT ||
                errno == EHOSTUNREACH ||
    +           errno == EHOSTDOWN ||
                errno == ENETUNREACH ||
                errno == ENETDOWN ||
                errno == EINVAL);
            return retired_fd;
        }

So, please add the missing "errno == EHOSTDOWN ||" exemption to the
errno_assert list. A patch is included.

-- 
You are receiving this mail because:
You are the assignee for the bug.
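For illustration only, the exemption check described in this report can be sketched as a stand-alone function. This is not libzmq code: the helper names is_expected_connect_error and pending_socket_error are hypothetical, and the list of error codes is simply the one from the snippet above with EHOSTDOWN added. It assumes a POSIX-like system (FreeBSD or Linux) where EHOSTDOWN is defined in the errno header.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical helper (not part of libzmq): returns true when `err` is one
// of the network-level errors that the report says must not abort the
// process. The list mirrors the errno_assert exemptions, plus EHOSTDOWN.
bool is_expected_connect_error (int err)
{
    switch (err) {
        case ECONNREFUSED:
        case ECONNRESET:
        case ETIMEDOUT:
        case EHOSTUNREACH:
#ifdef EHOSTDOWN
        case EHOSTDOWN: // the code missing from the exemption list
#endif
        case ENETUNREACH:
        case ENETDOWN:
        case EINVAL:
            return true;
        default:
            return false;
    }
}

// Hypothetical helper: reads the pending error of a socket via SO_ERROR.
// If getsockopt itself fails (rc == -1), the error is taken from errno,
// which covers both Berkeley-derived and Solaris behaviour as in the
// quoted snippet. Returns 0 when no error is pending.
int pending_socket_error (int fd)
{
    int err = 0;
    socklen_t len = sizeof err;
    const int rc = getsockopt (fd, SOL_SOCKET, SO_ERROR, &err, &len);
    if (rc == -1)
        err = errno;
    return err;
}
```

A caller could treat any error for which is_expected_connect_error returns true as an ordinary failed connection attempt (close the socket and retry later) and reserve the abort for everything else.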