Date: Wed, 14 May 2008 13:14:41 +0300
From: Mikolaj Golub <to.my.trociny@gmail.com>
To: <freebsd-hackers@freebsd.org>
Subject: Re: Socket leak
Message-ID: <81zlqtfazy.fsf@zhuzha.ua1>
In-Reply-To: <482A2639.7000401@datapipe.com> (Mark Saad's message of "Tue\, 13 May 2008 19\:37\:29 -0400")
References: <482A2639.7000401@datapipe.com>
On Tue, 13 May 2008 19:37:29 -0400 Mark Saad wrote:

 MS> I started logging the values of kern.ipc.numopensockets and I noticed
 MS> that something is leaking sockets. Here is a sample of the log

 MS> 2008-04-29--15:04.10 ____ kern.ipc.numopensockets: 1501
 MS> 2008-04-29--16:04.01 ____ kern.ipc.numopensockets: 1535
 MS> 2008-04-29--17:04.00 ____ kern.ipc.numopensockets: 1617
 MS> 2008-04-29--18:04.00 ____ kern.ipc.numopensockets: 1710

 MS> This continues until kern.ipc.maxsockets its reached or the box is
 MS> rebooted.

 MS> The other thing we looked at was the output from vmstat -z
 MS> The first thing was the high amount of malloc 128 bucket failures

 MS> 128 Bucket: 524, 0, 2489, 80, 8364, 23055239

 MS> I also logged the mbuf clusters, we never reached the max mbuf clusters

 MS> Its almost like there are stale sockets. Here is a snapshot of the server now

 MS> ewr# sockstat -4u |wc -l
 MS> 139
 MS> ewr# sysctl kern.ipc.numopensockets
 MS> kern.ipc.numopensockets: 13935
 MS> ewr# uptime
 MS> 7:30PM up 6 days, 26 mins, 3 users, load averages: 0.18, 0.25, 0.17

We had the same problem on one of our hosts running 6.2-RELEASE-p11. The
situation was complicated by the fact that I didn't have root access to the
host, so it was difficult to get more debugging output or to run tcpdump.

Eventually it turned out that the problem was caused by proftpd. One of our
clients connected to the ftp server every five minutes looking for a new file
to download. When the file was there, everything was fine. But when it wasn't,
some strange things happened. In the proftpd logs we had:

  FTP session opened.
  mod_delay/0.5: delaying for 28 usecs
  user fake authenticated by mod_auth_pam.c
  USER fake: Login successful.
  Preparing to chroot to directory '/var/ftp/fake'
  Environment successfully chroot()ed.
  mod_delay/0.5: delaying for 621 usecs
  Entering Passive Mode (XX,YY,ZZ,213,241,70).
  FTP session closed.

i.e. the client connected to the server, logged in successfully, opened a new
DATA connection in passive mode, and then exited.
But although proftpd reported that the connection was closed and the proftpd
process had exited, we still had this orphaned connection in the system,
reported by netstat in the ESTABLISHED state. sockstat did not display these
connections. Some of them could be in the CLOSE_WAIT state instead of
ESTABLISHED. Such a connection was visible in netstat for several hours and
then disappeared, but I suspect that the socket buffer was not freed and the
numopensockets counter did not decrease.

Unfortunately, I did not manage to persuade the admin to increase DebugLevel
in proftpd.conf and run tcpdump to investigate further what was going on.

It turned out that we had proftpd built for FREEBSD5_4:

  Compile-time Settings:
    Version: 1.3.0
    Platform: FREEBSD5 (FREEBSD5_4)
    Built With:
      configure --localstatedir=/var/run --sysconfdir=/usr/local/etc --disable-sendfile --disable-ipv6 --with-modules=mod_ratio:mod_readme:mod_rewrite:mod_wrap:mod_ifsession --prefix=/usr/local i386-portbld-freebsd5.4

Upgrading to a more recent proftpd built for the proper platform resolved the
problem.

So I would recommend looking for the process that could be causing this leak.
In my case, carefully going through the history of netstat output and
comparing it with sockstat output helped to find the culprit. Maybe it would
help to restart the daemons one by one and see if the sockets are freed.

You can certainly increase kern.ipc.maxsockets as a workaround until you
identify what causes the problem.

-- 
Mikolaj Golub
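
P.S. The netstat/sockstat comparison described above can be roughly
automated. What follows is only a minimal sketch in Python, not the procedure
I actually used (I did it by hand). It assumes that "netstat -an" prints
addresses as host.port in the columns proto, recv-q, send-q, local, foreign,
state, and that sockstat prints host:port in the columns user, command, pid,
fd, proto, local, foreign. Check this against the output on your system. The
function names and the sample data are made up for illustration.

```python
def to_dot(addr):
    """Convert sockstat-style 'host:port' to netstat-style 'host.port'."""
    return ".".join(addr.rsplit(":", 1))


def netstat_pairs(text):
    """Map (local, foreign) -> state for the TCP lines of netstat-style text."""
    pairs = {}
    for line in text.splitlines():
        f = line.split()
        # Assumed columns: proto recv-q send-q local foreign state
        if len(f) >= 6 and f[0].startswith("tcp"):
            pairs[(f[3], f[4])] = f[5]
    return pairs


def sockstat_pairs(text):
    """Set of (local, foreign) for the TCP lines of sockstat-style text."""
    pairs = set()
    for line in text.splitlines():
        f = line.split()
        # Assumed columns: user command pid fd proto local foreign
        if len(f) >= 7 and f[4].startswith("tcp"):
            pairs.add((to_dot(f[5]), to_dot(f[6])))
    return pairs


def orphaned(netstat_text, sockstat_text,
             states=("ESTABLISHED", "CLOSE_WAIT")):
    """Connections netstat shows in `states` that no process in sockstat owns."""
    owned = sockstat_pairs(sockstat_text)
    return sorted(
        (local, foreign, state)
        for (local, foreign), state in netstat_pairs(netstat_text).items()
        if state in states and (local, foreign) not in owned
    )


# Made-up sample output, only to show the shape of the input.
NETSTAT = """\
tcp4       0      0 10.0.0.1.21            192.0.2.7.40001        ESTABLISHED
tcp4       0      0 10.0.0.1.54001         192.0.2.7.40002        ESTABLISHED
tcp4       0      0 10.0.0.1.54002         192.0.2.9.40100        CLOSE_WAIT
tcp4       0      0 *.21                   *.*                    LISTEN
"""

SOCKSTAT = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     proftpd    812   0  tcp4   *:21                  *:*
ftp      proftpd    9001  0  tcp4   10.0.0.1:21           192.0.2.7:40001
"""

for local, foreign, state in orphaned(NETSTAT, SOCKSTAT):
    print(local, foreign, state)
```

Feeding it saved netstat and sockstat output taken at the same moment, and
repeating that over several snapshots, should show which local ports the
ownerless connections cluster around. In my case those were the passive-mode
data ports opened by proftpd.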