Date: Sat, 30 Oct 2004 13:36:14 -0300 (ADT)
From: "Marc G. Fournier" <scrappy@hub.org>
To: freebsd-stable@freebsd.org
Subject: Re: vnode 'leak' in 4.x ...
Message-ID: <20041030133328.V6085@ganymede.hub.org>
In-Reply-To: <20041030133044.O6085@ganymede.hub.org>
References: <20041030131002.O6085@ganymede.hub.org> <20041030133044.O6085@ganymede.hub.org>
And, just before rebooting the server in question, the process listing looks like:

USER      PID %CPU %MEM    VSZ   RSS  TT  STAT STARTED     TIME COMMAND
ipaudit 34190  0.0  0.0      0     0  ??  Z     1:30PM  0:00.00 (sh)
root    34521  0.0  0.0    444   280  p3  R+    1:33PM  0:00.00 ps aux
root    34520  0.0  0.0   3080  1452  ??  S     1:33PM  0:00.00 sendmail: startup with [218.17.67.38] (sendmail)
root    34222  0.0  0.1   5024  4644  ??  S     1:30PM  0:00.04 /usr/local/ipaudit/bin/ipaudit -g /usr/local/ipaudit/ipaudit-web.conf -o /usr/local/ipaudit/data/30min/2004-10-30-13:30
ipaudit 34221  0.0  0.0    636   256  ??  I     1:30PM  0:00.00 /bin/sh cron/cron30min
ipaudit 34220  0.0  0.0    636   256  ??  I     1:30PM  0:00.00 /bin/sh cron/cron30min
root    34186  0.0  0.0   1032   632  ??  I     1:30PM  0:00.00 cron: running job (cron)
root    26779  0.0  0.0   1324   900  p0  Ss+   1:15PM  0:00.12 -csh (csh)
root    26777  0.0  0.0   5296  1668  ??  S     1:15PM  0:00.16 sshd: root@ttyp0 (sshd)
root    26725  0.0  0.0   1080   604  p2  S+    1:15PM  0:00.02 grep vnode
root    26724  0.0  0.0    916   416  p2  S+    1:15PM  0:00.04 tail -f /var/log/syswatch
root    19666  0.0  0.0   3116  1828  ??  I     1:05PM  0:00.02 sendmail: server [219.236.18.233] cmd read (sendmail)
root    18305  0.0  0.0   1352   948  p3  Ss   12:59PM  0:00.40 -csh (csh)
root    18303  0.0  0.0   5296  1668  ??  S    12:59PM  0:00.62 sshd: root@ttyp3 (sshd)
root    18291  0.0  0.1   5328  3056  ??  Ss   12:59PM  0:00.26 /usr/local/sbin/named
root    18038  0.0  0.0   1328   924  p2  Is   12:58PM  0:00.52 -csh (csh)
root    18036  0.0  0.0   5296  1668  ??  S    12:58PM  0:00.30 sshd: root@ttyp2 (sshd)
root      208  0.0  0.0    956     8  v7  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv7
root      207  0.0  0.0    956     8  v6  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv6
root      206  0.0  0.0    956     8  v5  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv5
root      205  0.0  0.0    956     8  v4  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv4
root      204  0.0  0.0    956     8  v3  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv3
root      203  0.0  0.0    956     8  v2  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv2
root      202  0.0  0.0    956     8  v1  Is+   5Sep04  0:00.00 /usr/libexec/getty Pc ttyv1
root      201  0.0  0.0    956     8  v0  Is+   5Sep04  0:00.01 /usr/libexec/getty Pc ttyv0
root      190  0.0  0.0   1980   416  ??  I     5Sep04  0:26.67 /usr/local/sbin/upclient
smmsp     145  0.0  0.0   2936   652  ??  Is    5Sep04  0:05.20 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
root      142  0.0  0.0   3056  1048  ??  Ss    5Sep04 12:31.00 sendmail: accepting connections (sendmail)
root      111  0.0  0.0   2596   672  ??  Is    5Sep04  3:07.15 /usr/sbin/sshd
root      109  0.0  0.0   1032   528  ??  Ss    5Sep04  2:05.55 /usr/sbin/cron
daemon    103  0.0  0.0    912   364  ??  Ss    5Sep04  1:17.51 rwhod
root      101  0.0  0.0 263152   428  ??  Is    5Sep04  2:29.86 rpc.statd
root       99  0.0  0.0    360     0  ??  I     5Sep04 21:19.41 nfsd: server (nfsd)
root       98  0.0  0.0    360     0  ??  I     5Sep04 105:39.79 nfsd: server (nfsd)
root       97  0.0  0.0    360     0  ??  I     5Sep04 291:27.28 nfsd: server (nfsd)
root       96  0.0  0.0    360     0  ??  I     5Sep04 1453:56.62 nfsd: server (nfsd)
root       95  0.0  0.0    368     0  ??  Is    5Sep04  0:00.00 nfsd: master (nfsd)
root       92  0.0  0.0    588   252  ??  Is    5Sep04  2:30.06 mountd -r
daemon     90  0.0  0.0   1012   460  ??  Is    5Sep04  2:35.51 /usr/sbin/portmap
root       85  0.0  0.0    996   388  ??  Ss    5Sep04 22:39.76 /usr/sbin/syslogd -ss
root        7  0.0  0.0      0     0  ??  DL    5Sep04 655:25.39 (vnlru)
root        6  0.0  0.0      0     0  ??  DL    5Sep04 1088:06.50 (syncer)
root        5  0.0  0.0      0     0  ??  DL    5Sep04  2:57.05 (bufdaemon)
root        4  0.0  0.0      0     0  ??  DL    5Sep04  0:00.00 (vmdaemon)
root        3  0.0  0.0      0     0  ??  DL    5Sep04 23:36.87 (pagedaemon)
root        2  0.0  0.0      0     0  ??  DL    5Sep04  0:00.00 (taskqueue)
root        1  0.0  0.0    552    72  ??  SLs   5Sep04  1:54.30 /sbin/init --
root        0  0.0  0.0      0     0  ??  DLs   5Sep04  0:00.00 (swapper)
root    34522  0.0  0.0    344   184  p3  R+    1:33PM  0:00.00 less

Shutting down all the other processes on the server, and umounting everything but the required file systems (i.e. umounting the heavily used one), freed up about 30k vnodes, after which it hovered around 55k free ... once I restarted everything, it fell back down to the 20k mark or so, and vnlru was constantly in a vlrup state :(

Oct 30 13:21:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 19384 - debug.vnlru_nowhere: 209881 - vlrup
Oct 30 13:22:01 venus root: debug.numvnodes: 522265 - debug.freevnodes: 19935 - debug.vnlru_nowhere: 209901 - vlrup
Oct 30 13:23:01 venus root: debug.numvnodes: 522265 - debug.freevnodes: 22739 - debug.vnlru_nowhere: 209920 - vlrup
Oct 30 13:24:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 22031 - debug.vnlru_nowhere: 209940 - vlrup
Oct 30 13:25:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 31552 - debug.vnlru_nowhere: 209960 - vlrup
Oct 30 13:26:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 26440 - debug.vnlru_nowhere: 209980 - vlrup
Oct 30 13:27:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 50454 - debug.vnlru_nowhere: 209986 - vlrup
Oct 30 13:28:01 venus root: debug.numvnodes: 522265 - debug.freevnodes: 52263 - debug.vnlru_nowhere: 210005 - vlruwt
Oct 30 13:29:01 venus root: debug.numvnodes: 522265 - debug.freevnodes: 51269 - debug.vnlru_nowhere: 210017 - vlrup
Oct 30 13:30:01 venus root: debug.numvnodes: 522265 - debug.freevnodes: 52146 - debug.vnlru_nowhere: 210027 - vlruwt
Oct 30 13:31:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 54789 - debug.vnlru_nowhere: 210027 - vlruwt
Oct 30 13:32:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 54938 - debug.vnlru_nowhere: 210027 - vlruwt
Oct 30 13:33:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 54932 - debug.vnlru_nowhere: 210027 - vlruwt
Oct 30 13:34:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 54935 - debug.vnlru_nowhere: 210027 - vlruwt

On Sat, 30 Oct 2004, Marc G. Fournier wrote:

> Just to give an idea of what a second server, with less uptime, looks
> like, with approx. the same # of VMs on her:
>
> Oct 30 13:29:00 neptune root: debug.numvnodes: 462882 - debug.freevnodes: 132826 - debug.vnlru_nowhere: 0 - vlruwt
> Oct 30 13:30:00 neptune root: debug.numvnodes: 462882 - debug.freevnodes: 151976 - debug.vnlru_nowhere: 0 - vlruwt
>
> But she's only been up 7 days so far ...
>
> On Sat, 30 Oct 2004, Marc G. Fournier wrote:
>
>> A little while ago, I reported a suspicion that vnodes just weren't being
>> freed up on long-running servers ... after 55 days of uptime on one of my
>> servers, here is what I'm dealing with ...
>>
>> 793 'samples' today (one every minute)
>> 786 with vnlru in a vlrup state
>>
>> I shut down all of the VMs running on the large hard drive (the only
>> place unionfs is being used) and umount'd the drive ... it was suggested
>> back then that this might/should free everything back up again ... but
>> it didn't:
>>
>> Oct 30 13:06:02 venus root: debug.numvnodes: 522265 - debug.freevnodes: 57966 - debug.vnlru_nowhere: 209679 - vlruwt
>> Oct 30 13:07:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 57268 - debug.vnlru_nowhere: 209679 - vlruwt
>> Oct 30 13:08:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 52335 - debug.vnlru_nowhere: 209679 - vlruwt
>> Oct 30 13:09:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 50228 - debug.vnlru_nowhere: 209682 - vlrup
>> Oct 30 13:10:01 venus root: debug.numvnodes: 522265 - debug.freevnodes: 44407 - debug.vnlru_nowhere: 209690 - vlrup
>> Oct 30 13:11:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 35424 - debug.vnlru_nowhere: 209697 - vlrup
>> Oct 30 13:12:02 venus root: debug.numvnodes: 522265 - debug.freevnodes: 34626 - debug.vnlru_nowhere: 209708 - vlrup
>> Oct 30 13:13:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 29214 - debug.vnlru_nowhere: 209727 - vlrup
>> Oct 30 13:14:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 24414 - debug.vnlru_nowhere: 209746 - vlrup
>> Oct 30 13:15:00 venus root: debug.numvnodes: 522265 - debug.freevnodes: 26994 - debug.vnlru_nowhere: 209766 - vlrup
>>
>> The 'vlruwt' states above are from while I had everything shut down ...
>> the vlrup's all started again once I mounted the drive and began
>> restarting the VMs themselves ...
>>
>> I expect a high # of vnodes to be used ... that isn't the issue ... the
>> issue is that even after getting rid of the major mount point, so that
>> only /, /tmp, /usr and /var are left mounted, the large # of vnodes that
>> were in use on that mount point aren't being freed by vnlru :(
>>
>> I hate to reboot the server, but it looks like I've got no choice at
>> this point ... is there something else that I can do, in 50 days or so,
>> to provide more information?
>>
>> Thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
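[Editor's note: the per-minute syswatch sampler behind the log lines above was not posted in the thread. The sketch below is a hypothetical reconstruction, assuming the FreeBSD 4.x sysctls quoted in the logs (debug.numvnodes, debug.freevnodes, debug.vnlru_nowhere) and reading the vnlru wait channel (vlrup/vlruwt) from ps; the function name fmt_sample is invented for illustration.]

```shell
#!/bin/sh
# Hypothetical sketch of a once-a-minute vnode monitor producing the
# "debug.numvnodes: ... - vlrup" lines seen in the thread.

# Format one sample in the same layout as the quoted syswatch log lines.
fmt_sample() {
    # $1=numvnodes  $2=freevnodes  $3=vnlru_nowhere  $4=vnlru wait channel
    printf 'debug.numvnodes: %s - debug.freevnodes: %s - debug.vnlru_nowhere: %s - %s\n' \
        "$1" "$2" "$3" "$4"
}

# On the FreeBSD host itself this would be wired up roughly as (run from cron):
#   numv=$(sysctl -n debug.numvnodes)
#   freev=$(sysctl -n debug.freevnodes)
#   nowhere=$(sysctl -n debug.vnlru_nowhere)
#   wchan=$(ps -axo wchan,comm | awk '/vnlru/ {print $1; exit}')
#   fmt_sample "$numv" "$freev" "$nowhere" "$wchan" | logger
# Demo with the values from the first log line above:
fmt_sample 522265 19384 209881 vlrup
# -> debug.numvnodes: 522265 - debug.freevnodes: 19384 - debug.vnlru_nowhere: 209881 - vlrup
```

Logging the vnlru wait channel alongside the counters is what lets the thread distinguish the healthy idle state (vlruwt) from the stuck reclaim loop (vlrup).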