Date: Wed, 15 May 2013 14:14:36 -0700
From: Jeremy Chadwick
To: dennis berger
Subject: Re: still mbuf leak in 9.0 / 9.1?
Message-ID: <20130515211436.GA42790@icarus.home.lan>
References: <004BC6EA-D8E6-473E-851C-9CDA7578510A@nipsi.de>
In-Reply-To: <004BC6EA-D8E6-473E-851C-9CDA7578510A@nipsi.de>
Cc: FreeBSD stable, Jack Vogel
List-Id: Production branch of FreeBSD source code

On Wed, May 15, 2013 at 10:13:04PM +0200, dennis berger wrote:
> Hi jack,
>
> so the increasing number of "mbufs in use" or mbuf clusters in use is normal, you would say?
> jumbo frames are of size 9k. I know that they're from different pools, I also checked that pool.
> nmb are:
>
> #cat loader.conf
>
> #tuning network
> hw.intr_storm_threshold=9000
> kern.ipc.nmbclusters=262144
> kern.ipc.nmbjumbop=262144
> kern.ipc.nmbjumbo9=65536
> kern.ipc.nmbjumbo16=32768
>
>
> 14-05-2013-14-09.txt:9246/4918/14164/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-15-09.txt:9256/4856/14112/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-16-09.txt:9266/4846/14112/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-17-09.txt:9276/4836/14112/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-18-09.txt:9286/4826/14112/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-19-09.txt:9296/4734/14030/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-20-09.txt:9306/4724/14030/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-21-09.txt:9316/4714/14030/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-22-09.txt:9326/4704/14030/262144 mbuf clusters in use (current/cache/total/max)
> 14-05-2013-23-09.txt:9336/4694/14030/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-00-09.txt:9346/4684/14030/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-01-09.txt:9356/4674/14030/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-02-09.txt:9366/4664/14030/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-03-09.txt:9379/4279/13658/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-04-09.txt:9384/4086/13470/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-05-09.txt:9394/4076/13470/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-06-09.txt:9404/4066/13470/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-07-09.txt:9414/5040/14454/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-08-09.txt:9424/5030/14454/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-09-09.txt:9434/4898/14332/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-10-09.txt:9444/4850/14294/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-11-09.txt:9454/5000/14454/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-12-09.txt:9464/4874/14338/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-13-09.txt:9474/4856/14330/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-14-09.txt:17674/4460/22134/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-15-09.txt:17684/4450/22134/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-16-09.txt:17694/4696/22390/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-17-09.txt:17704/4686/22390/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-18-09.txt:17714/4658/22372/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-19-09.txt:17724/4648/22372/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-20-09.txt:17734/4638/22372/262144 mbuf clusters in use (current/cache/total/max)
> 15-05-2013-21-09.txt:17744/4628/22372/262144 mbuf clusters in use (current/cache/total/max)
>
> Please see the link to http://knownhosts.org/reports-14-15.tgz in my original post, there is the full information including 9k jumbo frames.
>
> it's the driver version 2.4.8 which should be from 9.1-release directly
> yes TWINAX is correct.
>
> I'll replace the driver with the latest one.
>
> best regards and thanks,
> dennis
>
>
> On 15.05.2013 at 19:00, Jack Vogel wrote:
>
> > So, you stop getting 10G transmission and so you are looking at mbuf leaks? I don't see
> > anything in your data that makes it look like you've run out of available mbufs. You said
> > you're running jumbos, what size? You do realize that if you do this the clusters are coming
> > from different pools and you are not displaying those. What are all your nmb limits set to?
> >
> > So, this is 9.1 RELEASE, or stable? If you are using the driver from release I would first off
> > suggest you test the code from HEAD.
> >
> > What is the 10G device, I see its using Twinax, and I have been told there is a problem at
> > times with those that is corrected in recent shared code, this is why you should try the
> > latest code.
> >
> > Cheers,
> >
> > Jack
> >
> >
> >
> > On Wed, May 15, 2013 at 2:00 AM, dennis berger wrote:
> > Hi list,
> > since we activated 10gbe on ixgbe cards + jumbo frames(9k) on 9.0 and now on 9.1 we recognize that after a random period of time, sometimes a week, sometimes only a day, the
> > system doesn't send any packets out. The phenomenon is that you can't login via ssh, nfs and istgt is not operative. Yet you can login on the console and execute commands.
> > A clean shutdown isn't possible though. It hangs after vnode cleaning, normally you would see detaching of usb devices here, or other devices maybe?
> > I've read the other post on this ML about mbuf leak in the arp handling code in if_ether.c line 558. We don't see any of those notices in dmesg so I don't think that glebius fix would apply for us.
> > I'm collecting system and memory information every hour.
> >
> >
> > Script looks like this.
> > less /etc/periodic/hourly/100.report-memory.sh
> > #!/bin/sh
> >
> > reporttimestamp=`date +%d-%m-%Y-%H-%M`
> > reportname=${reporttimestamp}.txt
> >
> > cd /root/memory-mon
> >
> > top -b > $reportname
> > echo "" >> $reportname
> > vmstat -m >> $reportname
> > echo "" >> $reportname
> > vmstat -z >> $reportname
> > echo "" >> $reportname
> > netstat -Q >> $reportname
> > echo "" >> $reportname
> > netstat -n -x >> $reportname
> > echo "" >> $reportname
> > netstat -m >> $reportname
> > /usr/bin/perl /usr/local/bin/zfs-stats -a >> $reportname
> >
> > When you grep for mbuf or mbuf usage you will see this for example:
> >
> > root@freenas:/root/memory-mon # grep mbuf_packet: *
> > 14-05-2013-14-09.txt:mbuf_packet: 256, 0, 9246, 2786,201700429, 0, 0
> > 14-05-2013-15-09.txt:mbuf_packet: 256, 0, 9256, 2776,201773122, 0, 0
> > 14-05-2013-16-09.txt:mbuf_packet: 256, 0, 9266, 2766,201871553, 0, 0
> > 14-05-2013-17-09.txt:mbuf_packet: 256, 0, 9276, 2756,201915405, 0, 0
> > 14-05-2013-18-09.txt:mbuf_packet: 256, 0, 9286, 2746,201927956, 0, 0
> > 14-05-2013-19-09.txt:mbuf_packet: 256, 0, 9296, 2352,201935681, 0, 0
> > 14-05-2013-20-09.txt:mbuf_packet: 256, 0, 9306, 2342,201943754, 0, 0
> > 14-05-2013-21-09.txt:mbuf_packet: 256, 0, 9316, 2332,201950961, 0, 0
> > 14-05-2013-22-09.txt:mbuf_packet: 256, 0, 9326, 2450,201958150, 0, 0
> > 14-05-2013-23-09.txt:mbuf_packet: 256, 0, 9336, 2440,201967178, 0, 0
> > 15-05-2013-00-09.txt:mbuf_packet: 256, 0, 9346, 2430,201974561, 0, 0
> > 15-05-2013-01-09.txt:mbuf_packet: 256, 0, 9356, 2420,201982105, 0, 0
> > 15-05-2013-02-09.txt:mbuf_packet: 256, 0, 9366, 2410,201989463, 0, 0
> > 15-05-2013-03-09.txt:mbuf_packet: 256, 0, 9378, 1502,203019168, 0, 0
> > 15-05-2013-04-09.txt:mbuf_packet: 256, 0, 9384, 1624,205953601, 0, 0
> > 15-05-2013-05-09.txt:mbuf_packet: 256, 0, 9394, 1870,205959258, 0, 0
> > 15-05-2013-06-09.txt:mbuf_packet: 256, 0, 9404, 2500,205969396, 0, 0
> > 15-05-2013-07-09.txt:mbuf_packet: 256, 0, 9414, 3386,207945161, 0, 0
> > 15-05-2013-08-09.txt:mbuf_packet: 256, 0, 9424, 3376,208094689, 0, 0
> > 15-05-2013-09-09.txt:mbuf_packet: 256, 0, 9434, 2982,208172465, 0, 0
> > 15-05-2013-10-09.txt:mbuf_packet: 256, 0, 9444, 3100,208270369, 0, 0
> >
> > and
> >
> > root@freenas:/root/memory-mon # grep "mbufs in use" *
> > 14-05-2013-14-09.txt:58444/5816/64260 mbufs in use (current/cache/total)
> > 14-05-2013-15-09.txt:58455/5805/64260 mbufs in use (current/cache/total)
> > 14-05-2013-16-09.txt:58464/5796/64260 mbufs in use (current/cache/total)
> > 14-05-2013-17-09.txt:58475/5785/64260 mbufs in use (current/cache/total)
> > 14-05-2013-18-09.txt:58484/5776/64260 mbufs in use (current/cache/total)
> > 14-05-2013-19-09.txt:58493/5767/64260 mbufs in use (current/cache/total)
> > 14-05-2013-20-09.txt:58503/5757/64260 mbufs in use (current/cache/total)
> > 14-05-2013-21-09.txt:58513/5747/64260 mbufs in use (current/cache/total)
> > 14-05-2013-22-09.txt:58523/5737/64260 mbufs in use (current/cache/total)
> > 14-05-2013-23-09.txt:58533/5727/64260 mbufs in use (current/cache/total)
> > 15-05-2013-00-09.txt:58543/5717/64260 mbufs in use (current/cache/total)
> > 15-05-2013-01-09.txt:58554/5706/64260 mbufs in use (current/cache/total)
> > 15-05-2013-02-09.txt:58563/5697/64260 mbufs in use (current/cache/total)
> > 15-05-2013-03-09.txt:58639/5621/64260 mbufs in use (current/cache/total)
> > 15-05-2013-04-09.txt:58581/5679/64260 mbufs in use (current/cache/total)
> > 15-05-2013-05-09.txt:58591/5669/64260 mbufs in use (current/cache/total)
> > 15-05-2013-06-09.txt:58602/5658/64260 mbufs in use (current/cache/total)
> > 15-05-2013-07-09.txt:58613/5647/64260 mbufs in use (current/cache/total)
> > 15-05-2013-08-09.txt:58623/6027/64650 mbufs in use (current/cache/total)
> > 15-05-2013-09-09.txt:58634/6016/64650 mbufs in use (current/cache/total)
> > 15-05-2013-10-09.txt:58645/6005/64650 mbufs in use (current/cache/total)
> >
> >
> > This increasing number of used mbuf_packets and mbufs in use makes me nervous.
> > See the complete reports http://knownhosts.org:/reports-14-15.tgz
> >
> > Thanks for help,
> >
> > -dennis
> >
> >
> >
> > --------------BEGIN System information---------------
> > It's a stock FreeBSD 9.1, yet the hostname is called freenas. Don't be confused.
> >
> >
> > igb0: flags=8c02 metric 0 mtu 1500
> >         options=401bb
> >         ether 00:25:90:34:c1:12
> >         nd6 options=21
> >         media: Ethernet autoselect (1000baseT )
> >         status: active
> > igb1: flags=8843 metric 0 mtu 1500
> >         options=401bb
> >         ether 00:25:90:34:c1:13
> >         inet 172.16.1.6 netmask 0xfffff000 broadcast 172.16.15.255
> >         inet6 fe80::225:90ff:fe34:c113%igb1 prefixlen 64 scopeid 0x2
> >         nd6 options=21
> >         media: Ethernet autoselect (1000baseT )
> >         status: active
> > ix0: flags=8843 metric 0 mtu 9000
> >         options=401bb
> >         ether 00:1b:21:cc:12:8b
> >         inet 10.254.254.242 netmask 0xfffffffc broadcast 10.254.254.243
> >         inet6 fe80::21b:21ff:fecc:128b%ix0 prefixlen 64 scopeid 0xb
> >         nd6 options=21
> >         media: Ethernet autoselect (10Gbase-Twinax )
> >         status: active
> > ix1: flags=8843 metric 0 mtu 9000
> >         options=401bb
> >         ether 00:1b:21:cc:12:8a
> >         inet 10.254.254.254 netmask 0xfffffffc broadcast 10.254.254.255
> >         inet6 fe80::21b:21ff:fecc:128a%ix1 prefixlen 64 scopeid 0xc
> >         nd6 options=21
> >         media: Ethernet autoselect (10Gbase-Twinax )
> >         status: active
> > ix2: flags=8843 metric 0 mtu 9000
> >         options=401bb
> >         ether 00:1b:21:cc:12:b3
> >         inet 10.254.254.246 netmask 0xfffffffc broadcast 10.254.254.247
> >         inet6 fe80::21b:21ff:fecc:12b3%ix2 prefixlen 64 scopeid 0xd
> >         nd6 options=21
> >         media: Ethernet autoselect
> >         status: no carrier
> > ix3: flags=8802 metric 0 mtu 1500
> >         options=401bb
> >         ether 00:1b:21:cc:12:b2
> >         nd6 options=21
> >         media: Ethernet autoselect
> >         status: no carrier
> > lo0: flags=8049 metric 0 mtu 16384
> >         options=600003
> >         inet6 ::1 prefixlen 128
> >         inet6 fe80::1%lo0 prefixlen 64 scopeid 0xf
> >         inet 127.0.0.1 netmask 0xff000000
> >         nd6 options=21
> >
#dmesg > > ….. > > mfi0: 21294 (421879975s/0x0008/info) - Battery started charging > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > ix1: link state changed to DOWN > > ix1: link state changed to UP > > > > > > I should add that the servers that are directly connected to this freebsd server reboot every night. This is why you see ix0 UP/DOWN > > messages in dmesg. > > > > > > > > > > > > > > ------------- END System information------------ 1. You appear convinced that the issue is related to mbuf exhaustion, but you haven't provided evidence that you're hitting the mbuf maximum (in your case 262144). 
What you *have* shown is your mbuf count gradually increasing (sans
15-05-2013-13-09.txt vs. 15-05-2013-14-09.txt, which shows mbuf clusters
almost doubling (!)), which could indicate a leak but then again might
not.

If you reach mbuf maximum, then yes, network I/O can cease or stall
(possibly indefinitely).  However, broken/busted network I/O can also
happen due to other issues unrelated to mbufs, such as network stack
issues, firewall stack issues, or network driver bugs.  Are you using
pf, ipfw, or ipfilter on this system?

2. I think we'd all appreciate it if you disclosed **exactly** what
version of FreeBSD you're using (Subject says "9.0 / 9.1" which is
insufficient).  Please provide "uname -a" output (you can XXX out the
hostname if you want).  If you're still using csup/cvsup and built your
own kernel/world, we'll need to know exactly what date your src files
were from when you rebuilt.

I'm wary of CC'ing folks who can help troubleshoot mbuf exhaustion
issues until answers to the above can be provided, as I don't want to
waste their time.

3. Regarding this:

> > A clean shutdown isn't possible though. It hangs after vnode
> > cleaning, normally you would see detaching of usb devices here, or
> > other devices maybe?

Please don't conflate this with your above issue.  This is almost
certainly unrelated.  Please start a new thread about that if desired.

-- 
| Jeremy Chadwick                                   jdc@koitsu.org |
| UNIX Systems Administrator                http://jdc.koitsu.org/ |
| Mountain View, CA, US                                            |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
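
A minimal sketch of the headroom check from point 1, assuming the hourly
reports are grepped exactly as in the quoted output, i.e. each input line
looks like "<file>:<current>/<cache>/<total>/<max> mbuf clusters in use".
The function name cluster_usage and its output layout are made up for
illustration; they are not part of netstat or any FreeBSD tool. For each
sample it prints the current count, the configured maximum, the
percentage of the maximum in use, and the change since the previous
sample:

```sh
#!/bin/sh
# cluster_usage: hypothetical helper, not a FreeBSD utility.
# Reads lines such as
#   15-05-2013-13-09.txt:9474/4856/14330/262144 mbuf clusters in use (...)
# on stdin. Assumes the file name contains no ':', '/' or space.
cluster_usage() {
    awk -F'[:/ ]' '
        /mbuf clusters in use/ {
            cur = $2 + 0          # "current" field
            max = $5 + 0          # "max" field (kern.ipc.nmbclusters)
            if (max == 0) next    # skip malformed lines
            # change since the previous sample; "-" for the first one
            delta = seen ? cur - prev : "-"
            printf "%s current=%d max=%d used=%.2f%% delta=%s\n", $1, cur, max, cur * 100 / max, delta
            prev = cur
            seen = 1
        }'
}
```

From the report directory this could be run as
grep "mbuf clusters in use" *.txt | cluster_usage
A "used" figure far below 100% means the cluster limit has not been hit,
whatever the trend looks like. The 9k jumbo pool would need its own
pattern and field positions, since that line's format isn't shown in
this thread.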