From: Alfred Perlstein
Date: Sun, 09 Dec 2012 00:10:05 -0800
To: Richard Sharpe
Cc: freebsd-hackers@freebsd.org, Andre Oppermann
Subject: Re: Possible obscure socket leak when system under load and listener is slow to accept
Message-ID: <50C4475D.9020300@mu.org>

On 12/8/12 5:05 PM, Richard Sharpe wrote:
> On Sun, 2012-12-09 at 00:50 +0100, Andre Oppermann wrote:
>>> Hi folks,
>>>
>>> Our QA group (at xxx) using Samba and smbtorture has been seeing a
>>> lot of cases where accept returns ECONNABORTED because the system load
>>> is high and Samba has a large listen backlog.
>>>
>>> Every now and then we get a crash in smbd or in winbindd and winbindd
>>> complains of too many open files in the system.
>>>
>>> In looking at kern_accept, it seems to me that FreeBSD can leak a socket
>>> when kern_accept calls soaccept on it but gets ECONNABORTED. This error
>>> is the only error returned from tcp_usr_accept.
>>>
>>> It seems like the socket taken off so_comp is never freed in this case
>>> and that there has been a call on soref on it as well, so that something
>>> like the following is needed in the error path:
>>>
>>> ==== //some-path/freebsd/sys/kern/uipc_syscalls.c#1
>>> - /home/rsharpe/dev-src/packages/freebsd/sys/kern/uipc_syscalls.c ====
>>> @@ -433,6 +433,14 @@
>>>                   */
>>>                  if (name)
>>>                          *namelen = 0;
>>> +                /*
>>> +                 * We need to close the socket we unlinked
>>> +                 * so we do not leak it.
>>> +                 */
>>> +                ACCEPT_LOCK();
>>> +                SOCK_LOCK(so);
>>> +                soclose(so);
>>>                  goto noconnection;
>>>          }
>>>          if (sa == NULL) {
>>>
>>> I think an soclose is needed at this point because soisconnected has
>>> been called on the socket.
>>>
>>> Do you think this analysis is reasonable?
>>>
>>> We are using FreeBSD 8.0 but it seems the same is true for 9.0. However,
>>> maybe I am wrong since I am not sure if the fdclose call would free the
>>> socket, but a quick look suggested that it doesn't.
>> The fdclose should properly tear down the file descriptor. The call
>> graph is: fdclose() -> fdrop() -> _fdrop() -> fo_close()/soo_close() ->
>> soclose() -> sorele() -> sofree() -> sodealloc().
>>
>> A socket leak would not count against "kern.maxfiles" unless the file
>> descriptor leaks as well. So it is unlikely that this is the problem.
> OK, thanks for the feedback. I will keep looking.
>
>> Samba may open a large number of files (real files and sockets) and
>> you may run into the maxfiles limit. You can check the limit with
>> "sysctl kern.maxfiles" and increase it at boot time in boot/loader.conf
>> with "kern.maxfiles=100000" for example.
> Well, some of the smbds are dying, but it is possible that there is a
> file leak in Samba or our VFS that we are tripping as well.

lsof and sockstat can be helpful. lsof may be able to help determine
whether there's a leak, because it may find sockets that are not
associated with a process.

Hope this helps.

-Alfred
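
To complement lsof and sockstat from inside a suspect process, here is a minimal sketch that counts the descriptors the process currently holds, so a count that keeps growing under load would point at a descriptor leak in userland rather than in the kernel. The helper names count_open_fds and demo_fd_accounting and the FD_PROBE_CAP limit are hypothetical and are not part of FreeBSD or Samba. It only sees descriptors owned by the calling process, so it cannot find sockets that are no longer associated with any process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed upper bound on descriptor numbers to probe; tune as needed. */
#define FD_PROBE_CAP 65536L

/*
 * Count the descriptors open in this process by probing each possible
 * descriptor number with fcntl(F_GETFD). Closed slots fail with EBADF.
 * Descriptors numbered at or above FD_PROBE_CAP are not counted.
 */
static int
count_open_fds(void)
{
	long max_fd = sysconf(_SC_OPEN_MAX);
	int count = 0;

	if (max_fd < 0 || max_fd > FD_PROBE_CAP)
		max_fd = FD_PROBE_CAP;
	for (long fd = 0; fd < max_fd; fd++) {
		if (fcntl((int)fd, F_GETFD) != -1 || errno != EBADF)
			count++;
	}
	return (count);
}

/*
 * Illustration: creating a pipe adds exactly two descriptors, and
 * closing both ends brings the count back to the baseline.
 */
static void
demo_fd_accounting(void)
{
	int p[2];
	int before = count_open_fds();

	if (pipe(p) != 0)
		return;
	assert(count_open_fds() == before + 2);
	close(p[0]);
	close(p[1]);
	assert(count_open_fds() == before);
}
```

Logging count_open_fds() at intervals, for example after each batch of accepted connections, gives a rough trend line to compare against what lsof and sockstat report from outside the process.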