Date:      Sun, 09 Dec 2012 00:10:05 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        Richard Sharpe <rsharpe@richardsharpe.com>
Cc:        freebsd-hackers@freebsd.org, Andre Oppermann <andre@freebsd.org>
Subject:   Re: Possible obscure socket leak when system under load and listener is slow to accept
Message-ID:  <50C4475D.9020300@mu.org>
In-Reply-To: <1355015131.6752.12.camel@localhost.localdomain>
References:  <50C3D22D.3060008@freebsd.org> <1355015131.6752.12.camel@localhost.localdomain>

On 12/8/12 5:05 PM, Richard Sharpe wrote:
> On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
>>> Hi folks,
>>>
>>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
>>> lot of cases where accept returns ECONNABORTED because the system load
>>> is high and Samba has a large listen backlog.
>>>
>>> Every now and then we get a crash in smbd or in winbindd and winbindd
>>> complains of too many open files in the system.
>>>
>>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
>>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
>>> is the only error returned from tcp_usr_accept.
>>>
>>> It seems like the socket taken off so_comp is never freed in this case
>>> and that there has been a call on soref on it as well, so that something
>>> like the following is needed in the error path:
>>>
>>> ==== //some-path/freebsd/sys/kern/uipc_syscalls.c#1
>>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c ====
>>> @@ -433,6 +433,14 @@
>>>                   */
>>>                  if (name)
>>>                          *namelen = 0;
>>> +               /*
>>> +                * We need to close the socket we unlinked
>>> +                * so we do not leak it.
>>> +                */
>>> +               ACCEPT_LOCK();
>>> +               SOCK_LOCK(so);
>>> +               soclose(so);
>>>                  goto noconnection;
>>>          }
>>>          if (sa == NULL) {
>>>
>>> I think an soclose is needed at this point because soisconnected has
>>> been called on the socket.
>>>
>>> Do you think this analysis is reasonable?
>>>
>>> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
>>> maybe I am wrong since I am not sure if the fdclose call would free the
>>> socket, but a quick look suggested that it doesn't.
>> The fdclose should properly tear down the file descriptor.  The call
>> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
>> soclose() -> sorele() -> sofree() -> sodealloc().
>>
>> A socket leak would not count against "kern.maxfiles" unless the file
>> descriptor leaks as well.  So it is unlikely that this is the problem.
> OK, thanks for the feedback. I will keep looking.
>
>> Samba may open a large number of files (real files and sockets) and
>> you may run into the maxfiles limit.  You can check the limit with
>> "sysctl kern.maxfiles" and increase it at boot time in /boot/loader.conf
>> with "kern.maxfiles=100000" for example.
> Well, some of the smbds are dying, but it is possible that there is a
> file leak in Samba or our VFS that we are tripping as well.

lsof and sockstat can be helpful here.  lsof in particular may help you
determine whether there is a leak, because it may find sockets that are
not associated with any process.
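
If you suspect the descriptor leak is inside a particular process, a rough
cross-check is to have that process count its own open descriptors at
intervals and watch whether the number keeps climbing.  Below is a minimal
sketch in C, not anything taken from Samba or the kernel: count_open_fds()
is a hypothetical helper name, and the 65536 cap on the scan is an arbitrary
assumption to keep the loop bounded.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the descriptors currently open in this
 * process by probing each slot with fcntl(F_GETFD).  A slot that is
 * not open fails with EBADF; anything else is counted as open.
 */
static int
count_open_fds(void)
{
	long max_fd = sysconf(_SC_OPEN_MAX);
	int count = 0;

	/* Assumption: cap the scan so a huge or unknown limit stays cheap. */
	if (max_fd < 0 || max_fd > 65536)
		max_fd = 65536;

	for (int fd = 0; fd < (int)max_fd; fd++) {
		if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
			count++;
	}
	return (count);
}
```

Logging this value before and after a burst of failed accept() calls would
show whether descriptors are being left behind in that process.  It says
nothing about sockets held only inside the kernel, which is where lsof and
sockstat come in.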

Hope this helps.

-Alfred



