Date: Wed, 24 Dec 2003 11:21:29 -0500 (EST)
From: Robert Watson <rwatson@freebsd.org>
To: Oliver Brandmueller <ob@e-Gitt.NET>
Cc: freebsd-current@freebsd.org
Subject: Re: file descriptor leak in 5.2-RC
Message-ID: <Pine.NEB.3.96L.1031224112102.66152D-100000@fledge.watson.org>
In-Reply-To: <Pine.NEB.3.96L.1031224110359.66152B-100000@fledge.watson.org>

As a follow-up, it would also be interesting to know if you're using linux
emulation or some other kernel emulation.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research

On Wed, 24 Dec 2003, Robert Watson wrote:

>
> On Wed, 24 Dec 2003, Oliver Brandmueller wrote:
>
> > Hi.
> >
> > I just started (by accident) a new thread regarding the same topic...
>
> Hmm.  So this makes multiple reports, so we definitely have a problem.
> Are you using any sort of threaded applications -- if so, which threading
> packages are you using (linuxthreads, libc_r, libkse, et al).  Do you know
> if you're making use of /dev/fd/*, or /dev/std* in scripts on your system?
> Do you have any reports of unusual process exits (via signals, etc)?  If
> you look at the output of lsof or fstat while the system is actively
> running, it might be interesting to get a list of the kinds of sockets in
> use.  Somewhere, presumably we're slipping a file descriptor reference,
> perhaps in a failure mode that turns up frequently in your environment.
> Helping to identify what differentiates your environment from the ones
> where this doesn't turn up may help track down the problem.  The areas
> I've asked you to look at above are "interesting" file descriptor handling
> cases, and the problem might well be in one of these.
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert@fledge.watson.org      Senior Research Scientist, McAfee Research
>
> > On Sat, Dec 20, 2003 at 09:38:11PM +0100, Poul-Henning Kamp wrote:
> > > In message <Pine.NEB.3.96L.1031220105954.46326Q-100000@fledge.watson.org>,
> > > Robert Watson writes:
> > >
> > > >[...] so if we actually have a leak,
> > > >fstat(8) should show a small number of files, but the sysctl
> > > >kern.openfiles should reveal a large number of files open.
> > >
> > > sysctl kern.malloc | grep "file desc" ?
> >
> > I can with no problems reproduce this behaviour.
> >
> > The machine is a mail filtering server running exim, amavisd +
> > SpamAssassin and ClamAV. I do have the machine currently in a testing
> > environment and thus can do some experimentation.
> >
> > The machine gets the whole feed of messages we usually have (but just
> > does not deliver any mail back to the main servers after filtering). This
> > means about 3-5 mails per second going through the machine, which seems
> > enough to reproduce the effect very fast.
> >
> > The following values were read (with SCHED_4BSD; SCHED_ULE gives the
> > same) in single user mode after the machine had been up for about 25
> > minutes and did 10 minutes of mail filtering. Of course none of the
> > daemons are running anymore:
> >
> > # sysctl kern.openfiles
> > kern.openfiles: 4715
> > # lsof | wc -l
> >       35
> > # fstat | wc -l
> >       23
> > # sysctl kern.malloc | grep "file desc"
> > file desc to leader     0     0K      1K        3  32
> >           file desc   102    26K     58K    15408  256
> > # ps ax
> >   PID  TT  STAT      TIME COMMAND
> >     0  ??  DLs    0:00.11  (swapper)
> >     1  ??  ILs    0:00.64 /sbin/init --
> >     2  ??  DL     0:00.11  (g_event)
> >     3  ??  DL     0:02.30  (g_up)
> >     4  ??  DL     0:01.70  (g_down)
> >     5  ??  DL     0:00.00  (taskqueue)
> >     6  ??  IL     0:00.00  (acpi_task0)
> >     7  ??  IL     0:00.00  (acpi_task1)
> >     8  ??  IL     0:00.00  (acpi_task2)
> >     9  ??  DL     0:00.00  (pagedaemon)
> >    10  ??  DL     0:00.00  (ktrace)
> >    11  ??  RL    26:37.86  (idle: cpu3)
> >    12  ??  RL    26:33.18  (idle: cpu2)
> >    13  ??  RL    25:53.23  (idle: cpu1)
> >    14  ??  RL    25:26.75  (idle: cpu0)
> >    27  ??  WL     0:00.00  (irq14: ata0)
> >    29  ??  WL     0:01.34  (irq16: uhci0)
> >    37  ??  WL     0:01.61  (irq24: twe0)
> >    61  ??  WL     0:02.00  (irq48: em0)
> >    86  ??  WL     0:01.65  (swi8: tty:sio clock)
> >    88  ??  WL     0:03.32  (swi1: net)
> >    89  ??  DL     0:00.43  (random)
> >    91  ??  WL     0:00.00  (swi7: acpitaskq)
> >    92  ??  WL     0:00.00  (swi7: task queue)
> >    94  ??  WL     0:00.00  (swi0: tty:sio)
> >    95  ??  DL     0:05.38  (pagezero)
> >    96  ??  DL     0:00.02  (bufdaemon)
> >    97  ??  DL     0:00.01  (vnlru)
> >    98  ??  DL     0:00.88  (syncer)
> >   415  ??  DL     0:00.00  (usb0)
> >   416  ??  DL     0:00.00  (usbtask)
> > 15403  d0  Ss     0:00.01 -sh (sh)
> > 15415  d0  R+     0:00.00 ps ax
> > # uname -a
> > FreeBSD lupin 5.2-CURRENT FreeBSD 5.2-CURRENT #13: Wed Dec 24 15:31:44 CET 2003 root@lupin.eusc.inter.net:/usr/obj/usr/src/sys/MOMAIL i386
> > # uptime
> >  4:35PM  up 29 mins, 1 user, load averages: 0.10, 0.75, 0.63
> >
> > There are no debugging options in the kernel and malloc.conf is linked
> > to aj since I needed to do the performance testing. The machine has to
> > go into production state on Sunday; I would like to stay with FBSD 5 due
> > to the better SMP performance and the ability to do FS snapshots. Only
> > in the worst case I'd put a 4-STABLE on it. So I will give any help I
> > can to solve the issue.
> >
> > Greetinx, merry x-mas, Oliver
> >
> > --
> > | Oliver Brandmueller | Offenbacher Str. 1  | Germany       D-14197 Berlin |
> > | Fon +49-172-3130856 | Fax +49-172-3145027 | WWW:   http://the.addict.de/ |
> > |               Ich bin das Internet. Sowahr ich Gott helfe.               |
> > | Eine gewerbliche Nutzung aller enthaltenen Adressen ist nicht gestattet! |
> > _______________________________________________
> > freebsd-current@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
> >
>
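
The check used in this thread (compare the kernel's kern.openfiles count with what fstat can attribute to running processes) can be sketched as a small sh helper. This is only an illustration under stated assumptions: the function name fd_gap is hypothetical and does not come from the thread, and the helper takes the two figures as arguments so it can be tried with the numbers quoted above. A small gap may be normal; a large gap that persists after the daemons have exited is what would point toward leaked references.

```shell
#!/bin/sh
# Sketch only: fd_gap is a hypothetical helper, not part of FreeBSD.
#
# fd_gap KERNEL_COUNT FSTAT_LINES
#   KERNEL_COUNT  value reported by kern.openfiles
#   FSTAT_LINES   raw line count of fstat output, which includes one
#                 header line
# Prints how many open files the kernel counts beyond those that fstat
# lists for processes.
fd_gap() {
    kernel_count=$1
    fstat_lines=$2

    # Drop the header line from the fstat line count.
    visible=$((fstat_lines - 1))
    if [ "$visible" -lt 0 ]; then
        visible=0
    fi

    echo $((kernel_count - visible))
}

# On a FreeBSD system the inputs would come from the same commands used
# in the thread:
#   fd_gap "$(sysctl -n kern.openfiles)" "$(fstat | wc -l)"
#
# With the figures reported above (4715 open files, 23 fstat lines):
fd_gap 4715 23
```

With those figures, almost all of the files the kernel counts as open are not attributable to any process, which matches the leak described in the report.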