Date:      Thu, 23 Feb 2017 17:23:33 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-ports-bugs@FreeBSD.org
Subject:   [Bug 217313] net/libzmq4: EHOSTDOWN from getsockopt must not cause assertion abort; causes SaltStack crashes
Message-ID:  <bug-217313-13@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217313

            Bug ID: 217313
           Summary: net/libzmq4: EHOSTDOWN from getsockopt must not cause
                    assertion abort; causes SaltStack crashes
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: Individual Port(s)
          Assignee: koobs@FreeBSD.org
          Reporter: Mark.Martinec@ijs.si
             Flags: maintainer-feedback?(koobs@FreeBSD.org)

Created attachment 180246
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=180246&action=edit
Adds a missing "errno == EHOSTDOWN" exemption to assert

Environment: 11.0-RELEASE-p7 amd64, zeromq-4.1.5, py27-salt-2016.11.2.

For some time I have been observing instability of a running salt-minion
on several of our hosts: after a couple of days the Salt master loses
contact with most FreeBSD hosts, while Linux hosts remain connected.
After such an event one can find a core dump of the salt-minion process
(written in Python) on the affected systems, and salt_minion has to be
restarted manually.

After I added some debugging and captured stdout and stderr of
a non-daemonized salt-minion process, the last reported line was:

  Host is down (src/tcp_connecter.cpp:359)

So it seems there was a minor network glitch or outage, and libzmq
aborted the process through an assertion. That assertion is supposed
to exempt all the usual network-socket-related errors and catch only
genuine application bugs.

The problem is that the exemption list is missing the EHOSTDOWN
error code, which, like the other network errors, should not cause
the process to abort.
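The exemption list can be read as a predicate that separates network conditions from likely application bugs. A minimal sketch, assuming a hypothetical helper named is_network_error() (not part of libzmq) that checks the same error codes as the errno_assert() call in tcp_connecter.cpp, plus EHOSTDOWN. Note that EHOSTDOWN is not required by POSIX, but FreeBSD and Linux both define it in <errno.h>.

```cpp
#include <cerrno>

// Hypothetical helper (not part of libzmq) mirroring the errno_assert()
// exemption list in zmq::tcp_connecter_t::connect(), with the proposed
// EHOSTDOWN entry. Returns true for errors a failed asynchronous TCP
// connect can report because of network conditions; anything else
// (e.g. EBADF) points to a bug in the caller rather than the network.
constexpr bool is_network_error(int err) {
    return err == ECONNREFUSED ||
           err == ECONNRESET ||
           err == ETIMEDOUT ||
           err == EHOSTUNREACH ||
           err == EHOSTDOWN ||   // the entry requested in this report
           err == ENETUNREACH ||
           err == ENETDOWN ||
           err == EINVAL;
}

// Compile-time checks: "Host is down" is tolerated, a bad descriptor is not.
static_assert(is_network_error(EHOSTDOWN), "EHOSTDOWN must not abort");
static_assert(!is_network_error(EBADF), "EBADF indicates a bug");
```

With such a predicate, the patched check amounts to errno_assert (is_network_error (errno)); without the EHOSTDOWN line, a "Host is down" result falls through to the abort.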

The essential code snippet is shown here:


/usr/ports/net/libzmq4/work/zeromq-4.1.5/src/tcp_connecter.cpp :

zmq::fd_t zmq::tcp_connecter_t::connect ()
{
    //  Async connect has finished. Check whether an error occurred
    int err = 0;
    socklen_t len = sizeof err;

    const int rc = getsockopt (s, SOL_SOCKET, SO_ERROR, (char*) &err, &len);

    //  Assert if the error was caused by 0MQ bug.
    //  Networking problems are OK. No need to assert.
[...]
    //  Following code should handle both Berkeley-derived socket
    //  implementations and Solaris.
    if (rc == -1)
        err = errno;
    if (err != 0) {
        errno = err;
        errno_assert (
            errno == ECONNREFUSED ||
            errno == ECONNRESET ||
            errno == ETIMEDOUT ||
            errno == EHOSTUNREACH ||
  +         errno == EHOSTDOWN ||
            errno == ENETUNREACH ||
            errno == ENETDOWN ||
            errno == EINVAL);
        return retired_fd;
    }
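The comment about Berkeley-derived stacks and Solaris refers to two ways of reporting a failed asynchronous connect: getsockopt(SO_ERROR) either succeeds and stores the pending error in err, or fails and reports it through errno. A sketch of that step on its own, assuming a hypothetical helper pending_socket_error() (not part of libzmq) that returns the error to the caller instead of passing it to errno_assert():

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of libzmq): returns the error pending on
// socket fd after an asynchronous connect, or 0 if there is none.
// Unlike errno_assert(), it leaves the decision to the caller.
int pending_socket_error(int fd) {
    int err = 0;
    socklen_t len = sizeof err;
    const int rc = getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    // Berkeley-derived stacks return 0 and store the pending error in err;
    // Solaris returns -1 and reports the pending error through errno.
    // A failure of getsockopt() itself (e.g. EBADF for an invalid
    // descriptor) also ends up here, so the caller sees that errno too.
    if (rc == -1)
        err = errno;
    return err;
}
```

For a socket with no pending error the helper returns 0, while an invalid descriptor yields EBADF: the kind of programming error the assertion is meant to catch. An EHOSTDOWN result, by contrast, reflects the state of the network and should be treated like the other exempted codes.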


So, please add the missing "errno == EHOSTDOWN ||" exemption
to the errno_assert list.

A patch is attached (attachment 180246).

-- 
You are receiving this mail because:
You are the assignee for the bug.


