Date:      Wed, 25 Jun 2003 03:00:23 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Varshavchick Alexander <alex@metrocom.ru>
Cc:        Lev Walkin <vlm@netli.lan>
Subject:   Re: How to delete unix socket entries
Message-ID:  <3EF972B7.D03BC818@mindspring.com>
References:  <Pine.GSO.4.33.0306251225020.27984-100000@apache.metrocom.ru>

Varshavchick Alexander wrote:
> On Wed, 25 Jun 2003, Lev Walkin wrote:
> > So, the only sane method of removing them would be to find the parent of
> > these processes and kill -9 it, as suggested by Terry.
> 
> But we're talking not about process, but about data structures which
> netstat reports to be active and connected with the above mentioned stream
> socket file:

You have to understand how resource tracking works.

When a process dies, it gives all the resources back to the
system, including references on open files.

A UNIX domain socket is an open file reference.

As I said, there *may* be a bug; if there is, it has to do
with the fact that both endpoints are on the same system,
and therefore they are in a deadly embrace, probably with
data outstanding.

I notice from your last post that I was correct in my
assumption that there was outstanding data in the sockets
(I see 17 bytes each in the Recv-Q of the last two you
posted about).

It's possible that the one endpoint is waiting for the
other to drain the socket, and the other end is waiting for
a close and is in half close state.
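
To make "outstanding data" and "half close" concrete, here is a
minimal sketch, not taken from the program under discussion.  It
assumes a connected pair of UNIX domain stream sockets in a single
process; the function name drain_after_half_close is hypothetical.
One end writes 17 bytes and shuts down its write side, and the data
sits in the peer's receive queue until the peer drains it:

```python
import socket


def drain_after_half_close(payload=b"x" * 17):
    """Write payload on one end, half-close it, then drain the other end."""
    # Connected pair of UNIX domain stream sockets, both in this process.
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        writer.sendall(payload)
        # Half close: the writer will send nothing more, but the socket
        # itself stays open.  Until the reader calls recv(), the payload
        # is what netstat would report in the reader's Recv-Q.
        writer.shutdown(socket.SHUT_WR)

        chunks = []
        while True:
            chunk = reader.recv(1024)
            if not chunk:  # empty read means EOF: peer's write side is shut
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        writer.close()
        reader.close()
```

If the reader never gets as far as calling recv(), the bytes stay
queued, which is at least consistent with the Recv-Q counts above.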

The correct fix for this is to disable keepalives on UNIX
domain sockets.  If this is what's happening, then doing
the following *BEFORE* you get into this wedged state may
help:

	sysctl net.inet.tcp.always_keepalive=0

Alternately, you could explicitly disable it with a call to
setsockopt() in the source code of the problem program, and
recompile it.
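
As a rough sketch of what the setsockopt() call might look like
(shown in Python for brevity; the C call takes the same level and
option names), assuming the program has the socket object in hand.
The helper name disable_keepalive is hypothetical, and whether the
option has any practical effect on a UNIX domain socket depends on
the platform:

```python
import socket


def disable_keepalive(sock):
    """Clear SO_KEEPALIVE on sock and return the value read back."""
    # SO_KEEPALIVE is a socket-level option, so the level is SOL_SOCKET.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 0)
    # Read the option back so the caller can confirm it is now off (0).
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
```

The call would be made once per socket, right after it is created
or accepted, before the connection has a chance to wedge.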

As I said before, if this is the case, then the root cause
is likely that both endpoints are on the same machine, and/or
in the same program (are you talking to yourself using this
as a means of communicating between threads?  That could
easily shoot you in the foot this way...).

If you want more help, you will have to either post a very
short program that can duplicate the error (e.g. ~10 lines),
or you will have to post a URL for a longer program that can
do it (so you don't SPAM the list with a 5M tarball).

-- Terry

