Date:      Sun, 09 Dec 2012 09:57:30 -0800
From:      Richard Sharpe <rsharpe@richardsharpe.com>
To:        Alfred Perlstein <bright@mu.org>
Cc:        freebsd-hackers@freebsd.org, Andre Oppermann <andre@freebsd.org>
Subject:   Re: Possible obscure socket leak when system under load and listener is slow to accept
Message-ID:  <1355075850.6752.15.camel@localhost.localdomain>
In-Reply-To: <50C4475D.9020300@mu.org>
References:  <50C3D22D.3060008@freebsd.org> <1355015131.6752.12.camel@localhost.localdomain>  <50C4475D.9020300@mu.org>

On Sun, 2012-12-09 at 00:10 -0800, Alfred Perlstein wrote:
> On 12/8/12 5:05 PM, Richard Sharpe wrote:
> > On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
> >>> Hi folks,
> >>>
> >>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
> >>> lot of cases where accept returns ECONNABORTED because the system load
> >>> is high and Samba has a large listen backlog.
> >>>
> >>> Every now and then we get a crash in smbd or in winbindd and winbindd
> >>> complains of too many open files in the system.
> >>>
> >>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
> >>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
> >>> is the only error returned from tcp_usr_accept.
> >>>
> >>> It seems that the socket taken off so_comp is never freed in this case,
> >>> and that soref() has also been called on it, so something like the
> >>> following is needed in the error path:
> >>>
> >>> ==== //some-path/freebsd/sys/kern/uipc_syscalls.c#1
> >>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c ====
> >>> @@ -433,6 +433,14 @@
> >>>                   */
> >>>                  if (name)
> >>>                          *namelen = 0;
> >>> +               /*
> >>> +                * We need to close the socket we unlinked
> >>> +                * so we do not leak it.
> >>> +                */
> >>> +               ACCEPT_LOCK();
> >>> +               SOCK_LOCK(so);
> >>> +               soclose(so);
> >>>                  goto noconnection;
> >>>          }
> >>>          if (sa == NULL) {
> >>>
> >>> I think an soclose is needed at this point because soisconnected has
> >>> been called on the socket.
> >>>
> >>> Do you think this analysis is reasonable?
> >>>
> >>> We are using FreeBSD 8.0, but the same appears to be true for 9.0.
> >>> However, I may be wrong, since I am not sure whether the fdclose() call
> >>> frees the socket; a quick look suggested that it does not.
> >> The fdclose should properly tear down the file descriptor.  The call
> >> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
> >> soclose() -> sorele() -> sofree() -> sodealloc().
> >>
> >> A socket leak would not count against "kern.maxfiles" unless the file
> >> descriptor leaks as well.  So it is unlikely that this is the problem.
> > OK, thanks for the feedback. I will keep looking.
> >
> >> Samba may open a large number of files (real files and sockets) and
> >> you may run into the maxfiles limit.  You can check the limit with
> >> "sysctl kern.maxfiles" and increase it at boot time in /boot/loader.conf
> >> with "kern.maxfiles=100000", for example.
> > Well, some of the smbds are dying, but it is possible that there is a
> > file leak in Samba or our VFS that we are tripping as well.
> 
> lsof and sockstat can be helpful.  lsof may be able to help determine if
> there's a leak, because it MAY find sockets not associated with a
> process.
> 
> Hope this helps.

Thanks Alfred. After following through the call graph and confirming
(with the code) that it was correct, I am now pretty convinced that I
was wrong in assuming that it was a socket leak.
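
The reason no extra soclose() is needed is that teardown is driven by reference counts: dropping the descriptor's last file reference closes the socket, and dropping the socket's last reference frees it. Below is a minimal userspace model of that chain, collapsing fdclose -> fdrop -> _fdrop -> soo_close -> soclose -> sorele -> sofree -> sodealloc into two functions. The names `xfile`, `xsocket`, `xaccept`, `xfdrop` and `xsorele` are hypothetical; this is not the FreeBSD kernel code and it ignores locking and the accept queue entirely.

```c
#include <stdlib.h>

/* Hypothetical userspace model, NOT the kernel's struct file/struct socket. */
static int sockets_freed = 0;   /* lets a caller observe that teardown happened */

struct xsocket {
    int so_count;               /* references held on the socket */
};

struct xfile {
    int f_count;                /* references held on the file */
    struct xsocket *f_data;     /* socket this descriptor refers to */
};

/* Models sorele(): the last reference frees the socket (sofree/sodealloc). */
static void xsorele(struct xsocket *so)
{
    if (--so->so_count == 0) {
        free(so);
        sockets_freed++;
    }
}

/* Models fdrop(): the last file reference closes the socket it owns. */
static void xfdrop(struct xfile *fp)
{
    if (--fp->f_count == 0) {
        xsorele(fp->f_data);
        free(fp);
    }
}

/* Models the accept path: a new socket with one reference, owned by a new file. */
static struct xfile *xaccept(void)
{
    struct xsocket *so = malloc(sizeof *so);
    struct xfile *fp = malloc(sizeof *fp);

    if (so == NULL || fp == NULL) {
        free(so);
        free(fp);
        return NULL;
    }
    so->so_count = 1;
    fp->f_count = 1;
    fp->f_data = so;
    return fp;
}
```

In this model an error path that only drops the file (as fdclose() does) still frees the socket, while an additional explicit close would release a reference the error path never took.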

However, lsof will be useful in allowing me to see how many FDs each
smbd in this test is using. We have, I am told, kern.maxfiles set to
65536, which I think might be a little low for the test they are
running.
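
lsof and sockstat show per-process descriptor usage from outside. As a complement, a process can count its own open descriptors by probing each slot with fcntl(F_GETFD). This is a minimal sketch; `count_open_fds` is a hypothetical helper, and note that it counts one process, whereas kern.maxfiles is a system-wide limit shared by every process.

```c
#include <fcntl.h>
#include <sys/resource.h>

#define PROBE_CAP (1L << 20)    /* keep the probe bounded on huge limits */

/* Hypothetical helper: count descriptors open in the calling process by
 * probing every slot below the soft RLIMIT_NOFILE limit (capped at
 * PROBE_CAP). Descriptors above that bound are not counted. */
long count_open_fds(void)
{
    struct rlimit rl;
    long limit = 1024;          /* fallback if getrlimit() fails */
    long n = 0;

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)PROBE_CAP)
            limit = PROBE_CAP;
        else
            limit = (long)rl.rlim_cur;
    }

    for (long fd = 0; fd < limit; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)  /* -1 with EBADF: slot not open */
            n++;
    }
    return n;
}
```

Logging this value periodically from each smbd would show whether a single process's count climbs steadily (a leak) or whether many processes together simply approach the system-wide limit.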
