From: Oliver Fromme <olli@lurza.secnetix.de>
To: freebsd-stable@FreeBSD.ORG, scrappy@FreeBSD.ORG
Date: Tue, 8 May 2007 15:14:29 +0200 (CEST)
Subject: Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)
Message-Id: <200705081314.l48DETdC084404@lurza.secnetix.de>

Marc G. Fournier wrote:
 > Oliver Fromme wrote:
 > > If I remember correctly, you wrote that 11k sockets are
 > > in use with 90 jails.
 > > That's about 120 sockets per jail,
 > > which isn't out of the ordinary.  Of course it depends on
 > > what is running in those jails, but my guess is that you
 > > just need to increase the limit on the number of sockets
 > > (i.e. kern.ipc.maxsockets).
 >
 > The problem is that if I compare it to another server, running 2/3 as
 > many jails, I'm finding it's using 1/4 as many sockets, after over 60
 > days of uptime:
 >
 > kern.ipc.numopensockets: 3929
 > kern.ipc.maxsockets: 12328

What kind of jails are those?  What applications are running
inside them?  It's quite possible that the processes on one
machine use 120 sockets per jail, while on a different machine
they use only half that many per jail, on average.  Of course,
I can't tell for sure without knowing what is running in those
jails.

 > But, let's try what I think it was Matt suggested ...

Yes, that was a good suggestion.

 > right now, I'm at just over 11k sockets on that machine, so I'm going
 > to shut down everything except a bare-minimum server (all jails shut
 > off) and see where sockets drop to after that ...
 >
 > I'm down to ~7400 sockets:
 >
 > kern.ipc.numopensockets: 7400
 > kern.ipc.maxsockets: 12328
 >
 > ps looks like:
 >
 > mars# ps aux
 > USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED     TIME COMMAND
 > [kernel threads omitted]
 > root       1  0.0  0.0   768   232  ??  ILs  Sat12PM  3:22.01 /sbin/init --
 > root     480  0.0  0.0   528   244  ??  Is   Sat12PM  0:04.32 /sbin/devd
 > root     539  0.0  0.0  1388   848  ??  Ss   Sat12PM  0:07.21 /usr/sbin/syslogd -l /var/run/log -l /var/named/var/run/log -s -s
 > daemon   708  0.0  0.0  1316   748  ??  Ss   Sat12PM  0:02.49 /usr/sbin/rwhod
 > root     749  0.0  0.0  3532  1824  ??  Is   Sat12PM  0:07.60 /usr/sbin/sshd
 > root     768  0.0  0.0  1412   920  ??  Is   Sat12PM  0:02.23 /usr/sbin/cron -s
 > root    2087  0.0  0.0  2132  1360  ??  Ss   Sat01PM  0:04.73 screen -R
 > root   88103  0.0  0.1  6276  2600  ??  Ss   11:41PM  0:00.62 sshd: root@ttyp0 (sshd)
 > root   91218  0.0  0.1  6276  2664  ??  Ss   11:49PM  0:00.24 sshd: root@ttyp4 (sshd)
 > root     813  0.0  0.0  1352   748  v0  Is+  Sat12PM  0:00.00 /usr/libexec/getty Pc ttyv0
 > root   88106  0.0  0.1  5160  2516  p0  Ss   11:41PM  0:00.20 -tcsh (tcsh)
 > root   97563  0.0  0.0  1468   804  p0  R+   12:17AM  0:00.00 ps aux
 > root    2088  0.0  0.1  5352  2368  p2  Is+  Sat01PM  0:00.03 /bin/tcsh
 > root    2112  0.0  0.1  5220  2360  p3  Ss+  Sat01PM  0:00.04 /bin/tcsh
 > root   91221  0.0  0.1  5140  2440  p4  Ss+  11:49PM  0:00.12 -tcsh (tcsh)

I don't think those processes should consume 7400 sockets.
Indeed, this really looks like a leak in the kernel.

 > And netstat -n -funix shows 7355 lines similar to:
 >
 > d05f1000 stream      0      0        0 d05f1090        0        0
 > d05f1090 stream      0      0        0 d05f1000        0        0
 > cf1be000 stream      0      0        0 cf1bdea0        0        0
 > cf1bdea0 stream      0      0        0 cf1be000        0        0
 > cec42bd0 stream      0      0        0 cf2ac480        0        0
 > cf2ac480 stream      0      0        0 cec42bd0        0        0
 >
 > with the final few associated with running processes:

How do you determine that?  You _cannot_ tell from netstat
which sockets are associated with running processes.

 > I'm willing to shut everything down like this again the next time it
 > happens (in 2-3 days) if someone has some other command / output
 > they'd like for me to provide the output of?

Maybe "sockstat -u" and/or "fstat | grep -w local" (both of
those commands should basically list the same kind of
information).  My guess is that the output will be rather
short, i.e. much shorter than 7355 lines.  If that's true,
it is another indication that the problem is caused by a
kernel leak.

 > And, I have the following outputs as of the above, where everything is
 > shut down and it's running on minimal processes:
 >
 > # ls -lt
 > total 532
 > -rw-r--r--  1 root  wheel   11142 May  8 00:20 fstat.out
 > -rw-r--r--  1 root  wheel     742 May  8 00:20 netstat_m.out
 > -rw-r--r--  1 root  wheel  486047 May  8 00:20 netstat_na.out
 > -rw-r--r--  1 root  wheel     735 May  8 00:20 sockstat.out
                                 ^^^
Aha.  :-)

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"C++ is the only current language making COBOL look good."
        -- Bertrand Meyer
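[Archive editor's note: for anyone replaying this diagnosis, the two checks discussed in the thread — the per-jail socket average, and counting unix-domain stream sockets — can be sketched in a few lines of shell. This is a hedged sketch: the figures (11k sockets, 90 jails) and the sample netstat lines are the ones quoted above, standing in for the live FreeBSD commands `sysctl -n kern.ipc.numopensockets` and `netstat -n -f unix`.]

```shell
#!/bin/sh
# Sketch of the two checks from the thread above.  The numbers and the
# sample lines are the ones quoted in the mail; the commented-out
# commands are what you would run on the live FreeBSD box instead.

# 1. Rough per-jail average: ~11,000 open sockets across 90 jails.
#    Live:  numopensockets=$(sysctl -n kern.ipc.numopensockets)
numopensockets=11000
jails=90
echo "average sockets per jail: $((numopensockets / jails))"

# 2. Count unix-domain stream sockets, the same check the thread applies
#    to the saved netstat_na.out file.
#    Live:  netstat -n -f unix | grep -c stream
sample='d05f1000 stream 0 0 0 d05f1090 0 0
d05f1090 stream 0 0 0 d05f1000 0 0
cf1be000 stream 0 0 0 cf1bdea0 0 0'
printf '%s\n' "$sample" | grep -c stream
```

If the stream count from the kernel greatly exceeds what `sockstat -u` attributes to running processes, that difference is the leaked-socket population the thread is hunting.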