Date:      Wed, 15 May 2013 23:47:17 +0200
From:      dennis berger <db@nipsi.de>
To:        Jeremy Chadwick <jdc@koitsu.org>
Cc:        FreeBSD stable <freebsd-stable@freebsd.org>, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: still mbuf leak in 9.0 / 9.1?
Message-ID:  <696B5622-A95D-4187-A027-07ECC9B5AD1F@nipsi.de>
In-Reply-To: <20130515211436.GA42790@icarus.home.lan>
References:  <FDFFFCCB-BDF8-4E27-AF9D-D14D7E0D426D@nipsi.de> <CAFOYbcmF5WybuyJ9DuotcJf_u1FxwBKOLtHvpnT-05cVG6ES=A@mail.gmail.com> <004BC6EA-D8E6-473E-851C-9CDA7578510A@nipsi.de> <20130515211436.GA42790@icarus.home.lan>


On 15.05.2013 at 23:14, Jeremy Chadwick wrote:

> On Wed, May 15, 2013 at 10:13:04PM +0200, dennis berger wrote:
>> Hi jack,
>>
>> so the increasing number of "mbufs in use" or mbuf clusters in use is normal, you would say?
>> jumbo frames are of size 9k. I know that they're from different pools, I also checked that pool.
>> nmb are:
>>
>> #cat loader.conf
>>
>> #tuning network
>> hw.intr_storm_threshold=9000
>> kern.ipc.nmbclusters=262144
>> kern.ipc.nmbjumbop=262144
>> kern.ipc.nmbjumbo9=65536
>> kern.ipc.nmbjumbo16=32768
>>
>>
>> 14-05-2013-14-09.txt:9246/4918/14164/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-15-09.txt:9256/4856/14112/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-16-09.txt:9266/4846/14112/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-17-09.txt:9276/4836/14112/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-18-09.txt:9286/4826/14112/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-19-09.txt:9296/4734/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-20-09.txt:9306/4724/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-21-09.txt:9316/4714/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-22-09.txt:9326/4704/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 14-05-2013-23-09.txt:9336/4694/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-00-09.txt:9346/4684/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-01-09.txt:9356/4674/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-02-09.txt:9366/4664/14030/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-03-09.txt:9379/4279/13658/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-04-09.txt:9384/4086/13470/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-05-09.txt:9394/4076/13470/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-06-09.txt:9404/4066/13470/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-07-09.txt:9414/5040/14454/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-08-09.txt:9424/5030/14454/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-09-09.txt:9434/4898/14332/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-10-09.txt:9444/4850/14294/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-11-09.txt:9454/5000/14454/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-12-09.txt:9464/4874/14338/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-13-09.txt:9474/4856/14330/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-14-09.txt:17674/4460/22134/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-15-09.txt:17684/4450/22134/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-16-09.txt:17694/4696/22390/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-17-09.txt:17704/4686/22390/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-18-09.txt:17714/4658/22372/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-19-09.txt:17724/4648/22372/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-20-09.txt:17734/4638/22372/262144 mbuf clusters in use (current/cache/total/max)
>> 15-05-2013-21-09.txt:17744/4628/22372/262144 mbuf clusters in use (current/cache/total/max)
>>
>> Please see the link to http://knownhosts.org/reports-14-15.tgz in my original post, there is the full information including 9k jumbo frames.
>>
>> it's the driver version 2.4.8 which should be from 9.1-release directly
>> yes TWINAX is correct.
>>
>> I'll replace the driver with the latest one.
>>
>> best regards and thanks,
>> dennis
>>
>>
>> On 15.05.2013 at 19:00, Jack Vogel wrote:
>>
>>> So, you stop getting 10G transmission and so you are looking at mbuf leaks? I don't see
>>> anything in your data that makes it look like you've run out of available mbufs.  You said
>>> you're running jumbos, what size? You do realize that if you do this the clusters are coming
>>> from different pools and you are not displaying those. What are all your nmb limits set to?
>>>
>>> So, this is 9.1 RELEASE, or stable? If you are using the driver from release I would first off
>>> suggest you test the code from HEAD.
>>>
>>> What is the 10G device, I see its using Twinax, and I have been told there is a problem at
>>> times with those that is corrected in recent shared code, this is why you should try the
>>> latest code.
>>>
>>> Cheers,
>>>
>>> Jack
>>>
>>>
>>>
>>> On Wed, May 15, 2013 at 2:00 AM, dennis berger <db@nipsi.de> wrote:
>>> Hi list,
>>> since we activated 10gbe on ixgbe cards + jumbo frames(9k) on 9.0 and now on 9.1 we recognize that after a random period of time, sometimes a week, sometimes only a day, the
>>> system doesn't send any packets out. The phenomenon is that you can't login via ssh, nfs and istgt is not operative. Yet you can login on the console and execute commands.
>>> A clean shutdown isn't possible though. It hangs after vnode cleaning, normally you would see detaching of usb devices here, or other devices maybe?
>>> I've read the other post on this ML about mbuf leak in the arp handling code in if_ether.c line 558. We don't see any of those notices in dmesg so I don't think that glebius fix would apply for us.
>>> I'm collecting system and memory information every hour.
>>>
>>>
>>> Script looks like this.
>>> less /etc/periodic/hourly/100.report-memory.sh
>>> #!/bin/sh
>>>
>>> reporttimestamp=`date +%d-%m-%Y-%H-%M`
>>> reportname=${reporttimestamp}.txt
>>>
>>> cd /root/memory-mon
>>>
>>> top -b > $reportname
>>> echo "" >> $reportname
>>> vmstat -m >> $reportname
>>> echo "" >> $reportname
>>> vmstat -z >> $reportname
>>> echo "" >> $reportname
>>> netstat -Q >> $reportname
>>> echo "" >> $reportname
>>> netstat -n -x >> $reportname
>>> echo "" >> $reportname
>>> netstat -m >> $reportname
>>> /usr/bin/perl /usr/local/bin/zfs-stats -a >> $reportname
>>>
>>> When you grep for mbuf or mbuf usage you will see this for example:
>>>
>>> root@freenas:/root/memory-mon # grep mbuf_packet: *
>>> 14-05-2013-14-09.txt:mbuf_packet:            256,      0,    9246,    2786,201700429,   0,   0
>>> 14-05-2013-15-09.txt:mbuf_packet:            256,      0,    9256,    2776,201773122,   0,   0
>>> 14-05-2013-16-09.txt:mbuf_packet:            256,      0,    9266,    2766,201871553,   0,   0
>>> 14-05-2013-17-09.txt:mbuf_packet:            256,      0,    9276,    2756,201915405,   0,   0
>>> 14-05-2013-18-09.txt:mbuf_packet:            256,      0,    9286,    2746,201927956,   0,   0
>>> 14-05-2013-19-09.txt:mbuf_packet:            256,      0,    9296,    2352,201935681,   0,   0
>>> 14-05-2013-20-09.txt:mbuf_packet:            256,      0,    9306,    2342,201943754,   0,   0
>>> 14-05-2013-21-09.txt:mbuf_packet:            256,      0,    9316,    2332,201950961,   0,   0
>>> 14-05-2013-22-09.txt:mbuf_packet:            256,      0,    9326,    2450,201958150,   0,   0
>>> 14-05-2013-23-09.txt:mbuf_packet:            256,      0,    9336,    2440,201967178,   0,   0
>>> 15-05-2013-00-09.txt:mbuf_packet:            256,      0,    9346,    2430,201974561,   0,   0
>>> 15-05-2013-01-09.txt:mbuf_packet:            256,      0,    9356,    2420,201982105,   0,   0
>>> 15-05-2013-02-09.txt:mbuf_packet:            256,      0,    9366,    2410,201989463,   0,   0
>>> 15-05-2013-03-09.txt:mbuf_packet:            256,      0,    9378,    1502,203019168,   0,   0
>>> 15-05-2013-04-09.txt:mbuf_packet:            256,      0,    9384,    1624,205953601,   0,   0
>>> 15-05-2013-05-09.txt:mbuf_packet:            256,      0,    9394,    1870,205959258,   0,   0
>>> 15-05-2013-06-09.txt:mbuf_packet:            256,      0,    9404,    2500,205969396,   0,   0
>>> 15-05-2013-07-09.txt:mbuf_packet:            256,      0,    9414,    3386,207945161,   0,   0
>>> 15-05-2013-08-09.txt:mbuf_packet:            256,      0,    9424,    3376,208094689,   0,   0
>>> 15-05-2013-09-09.txt:mbuf_packet:            256,      0,    9434,    2982,208172465,   0,   0
>>> 15-05-2013-10-09.txt:mbuf_packet:            256,      0,    9444,    3100,208270369,   0,   0
>>>
>>> and
>>>
>>> root@freenas:/root/memory-mon # grep "mbufs in use" *
>>> 14-05-2013-14-09.txt:58444/5816/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-15-09.txt:58455/5805/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-16-09.txt:58464/5796/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-17-09.txt:58475/5785/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-18-09.txt:58484/5776/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-19-09.txt:58493/5767/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-20-09.txt:58503/5757/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-21-09.txt:58513/5747/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-22-09.txt:58523/5737/64260 mbufs in use (current/cache/total)
>>> 14-05-2013-23-09.txt:58533/5727/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-00-09.txt:58543/5717/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-01-09.txt:58554/5706/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-02-09.txt:58563/5697/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-03-09.txt:58639/5621/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-04-09.txt:58581/5679/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-05-09.txt:58591/5669/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-06-09.txt:58602/5658/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-07-09.txt:58613/5647/64260 mbufs in use (current/cache/total)
>>> 15-05-2013-08-09.txt:58623/6027/64650 mbufs in use (current/cache/total)
>>> 15-05-2013-09-09.txt:58634/6016/64650 mbufs in use (current/cache/total)
>>> 15-05-2013-10-09.txt:58645/6005/64650 mbufs in use (current/cache/total)
>>>
>>>
>>> This increasing number of used mbuf_packets and mbufs in use makes me nervous.
>>> See the complete reports http://knownhosts.org:/reports-14-15.tgz
>>>
>>> Thanks for help,
>>>
>>> -dennis
>>>
>>>
>>>
>>> --------------BEGIN System information---------------
>>> It's a stock FreeBSD 9.1, yet the hostname is called freenas. Don't be confused.
>>>
>>>
>>> igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>        ether 00:25:90:34:c1:12
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>        media: Ethernet autoselect (1000baseT <full-duplex>)
>>>        status: active
>>> igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>        ether 00:25:90:34:c1:13
>>>        inet 172.16.1.6 netmask 0xfffff000 broadcast 172.16.15.255
>>>        inet6 fe80::225:90ff:fe34:c113%igb1 prefixlen 64 scopeid 0x2
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>        media: Ethernet autoselect (1000baseT <full-duplex>)
>>>        status: active
>>> ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>>>        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>        ether 00:1b:21:cc:12:8b
>>>        inet 10.254.254.242 netmask 0xfffffffc broadcast 10.254.254.243
>>>        inet6 fe80::21b:21ff:fecc:128b%ix0 prefixlen 64 scopeid 0xb
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
>>>        status: active
>>> ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>>>        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>        ether 00:1b:21:cc:12:8a
>>>        inet 10.254.254.254 netmask 0xfffffffc broadcast 10.254.254.255
>>>        inet6 fe80::21b:21ff:fecc:128a%ix1 prefixlen 64 scopeid 0xc
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
>>>        status: active
>>> ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
>>>        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>        ether 00:1b:21:cc:12:b3
>>>        inet 10.254.254.246 netmask 0xfffffffc broadcast 10.254.254.247
>>>        inet6 fe80::21b:21ff:fecc:12b3%ix2 prefixlen 64 scopeid 0xd
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>        media: Ethernet autoselect
>>>        status: no carrier
>>> ix3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
>>>        ether 00:1b:21:cc:12:b2
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>        media: Ethernet autoselect
>>>        status: no carrier
>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
>>>        inet6 ::1 prefixlen 128
>>>        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xf
>>>        inet 127.0.0.1 netmask 0xff000000
>>>        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
>>>
>>> #dmesg
>>> ...
>>> mfi0: 21294 (421879975s/0x0008/info) - Battery started charging
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>> ix1: link state changed to DOWN
>>> ix1: link state changed to UP
>>>
>>>
>>> I should add that the servers that are directly connected to this freebsd server reboot every night. This is why you see ix0 UP/DOWN
>>> messages in dmesg.
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------- END System information------------
>
> 1. You appear convinced that the issue is related to mbuf exhaustion,
> but you haven't provided evidence that you're hitting the mbuf maximum
> (in your case 262144).
>
> What you *have* shown is your mbuf count gradually increasing (sans
> 15-05-2013-13-09.txt vs. 15-05-2013-14-09.txt which shows mbufs almost
> doubling (!)), which could indicate a leak but then again might not.
>
> If you reach mbuf maximum, then yes, network I/O can cease or stall
> (possibly indefinitely).  However, broken/busted network I/O can also
> happen due to other issues unrelated to mbufs, such as network stack
> issues, firewall stack issues, or network driver bugs.  Are you using
> pf, ipfw, or ipfilter on this system?

I'll watch this over a longer period of time and come back.
No pf, ipfw etc. on the system.
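
For watching this over a longer period, here is a minimal sketch that summarizes the "mbuf clusters in use" lines from the hourly reports: first and last "current" value, average growth per report, and how far "current" is from "max". It is not part of the report script above; the function name mbuf_growth is made up for illustration, and it assumes the lines look exactly like the grep output quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): summarize growth of
# "mbuf clusters in use" lines such as
#   14-05-2013-14-09.txt:9246/4918/14164/262144 mbuf clusters in use (current/cache/total/max)
# Input is read from stdin in the order given.
mbuf_growth() {
    awk '
    /mbuf clusters in use/ {
        counts = $1
        sub(/^.*:/, "", counts)       # drop the "file:" prefix added by grep
        split(counts, f, "/")         # current/cache/total/max
        if (n == 0) first = f[1]
        last = f[1]; max = f[4]; n++
    }
    END {
        if (n < 2 || max == 0) { print "not enough samples"; exit 1 }
        printf "samples=%d first=%d last=%d max=%d avg_delta=%.1f used_pct=%.1f\n",
            n, first, last, max, (last - first) / (n - 1), 100 * last / max
    }'
}
```

It could be run as `grep "mbuf clusters in use" /root/memory-mon/*.txt | mbuf_growth`. Note that the dd-mm-yyyy file names only sort chronologically within a single month, so the average is only meaningful if the input is in time order.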

>
> 2. I think we'd all appreciate if you disclosed **exactly** what version
> of FreeBSD you're using (Subject says "9.0 or 9.1" which is
> insufficient).  Please provide "uname -a" output (you can XXX out the
> hostname if you want) -- and if you're still using csup/cvsup and built
> your own kernel/world, we'll need to know exactly what date your src
> files were from when you rebuilt.
>
> I'm wary of CC'ing folks who can help troubleshoot mbuf exhaustion
> issues until answers to the above can be provided, as I don't want to
> waste their time.

FreeBSD  9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec  4 09:23:10 UTC 2012     root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64


>
> 3. Regarding this:
>
>>> A clean shutdown isn't possible though. It hangs after vnode
>>> cleaning, normally you would see detaching of usb devices here, or
>>> other devices maybe?
>
> Please don't conflate this with your above issue.  This is almost
> certainly unrelated.  Please start a new thread about that if desired.

Maybe this is a misunderstanding: normally this system shuts down cleanly, of course.
The hang only appears after the network problem above.

-dennis

>
> --
> | Jeremy Chadwick                                   jdc@koitsu.org |
> | UNIX Systems Administrator                http://jdc.koitsu.org/ |
> | Mountain View, CA, US                                            |
> | Making life hard for others since 1977.             PGP 4BD6C0CB |
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

Dipl.-Inform. (FH)
Dennis Berger

email:   db@bsdsystems.de
mobile: +491791231509
fon: +494054001817



