Date: Thu, 06 Aug 2009 00:47:34 +0400
From: "Marat N.Afanasyev" <amarat@ksu.ru>
To: alexpalias-bsdstable@yahoo.com, "freebsd-stable@freebsd.org >> FreeBSD-STABLE Mailing List" <freebsd-stable@freebsd.org>
Subject: Re: em driver input errors
Message-ID: <4A79EFE6.5080001@ksu.ru>
In-Reply-To: <27085.65527.qm@web56408.mail.re3.yahoo.com>
References: <27085.65527.qm@web56408.mail.re3.yahoo.com>
alexpalias-bsdstable@yahoo.com wrote:
> Good day
>
> I'm looking for suggestions for tuning my setup in order to get rid of the input errors I'm seeing on em0, em1 and em2 when using vlans.
>
> [This message (excluding the description of the second machine at the end) has also been sent to the freebsd-net mailing list, a few days ago]
>
> I'm running a FreeBSD 7.2 router and I am seeing a lot of input errors on one of the em interfaces (em0), coupled with (at approximately the same times) much fewer errors on em1 and em2. Monitoring is done with SNMP from another machine, and the CPU load as reported via SNMP is mostly below 30%, with a couple of spikes up to 35%.
>
> Software description:
>
> - FreeBSD 7.2-RELEASE-p2, amd64
> - bsnmpd with modules: hostres and (from ports) snmp_ucd
> - quagga 0.99.12 (running only zebra and bgpd)
> - netgraph (ng_ether and ng_netflow)
>
> Hardware description:
>
> - Dell machine, dual Xeon 3.20 GHz, 4 GB RAM
> - 2 x built-in gigabit interfaces (em0, em1)
> - 1 x dual-port gigabit interface, PCI-X (em2, em3) [see pciconf near the end]
>
>
> The machine receives the global routing table ("netstat -nr | wc -l" gives 289115 currently).
>
> All of the em interfaces are just configured "up", with various vlan interfaces on them. Note that I use "kpps" to mean "thousands of packets per second", sorry if that's the wrong shorthand.
>
> - em0 sees a traffic of 10...22 kpps in, and 15...35 kpps out. In bits, it's 30...120Mbits/s in, and 100...210Mbits/s out. Vlans configured are vlan100 and vlan200, and most of the traffic is on vlan100 (vlan200 sees 4kpps in / 0.5kpps out maximum, with the average at about one third of this).
> em0 is the external interface, and its traffic corresponds to the sum of traffic through em1 and em2
>
> - em1 has 5 vlans, and sees about 22kpps in / 11kpps out (maximum)
>
> - em2 has a single VLAN, and sees about 4...13kpps both in and out (almost equal in/out during most of the day)
>
> - em3 is a backup interface, with 2 VLANS, and is the only one which has seen no errors.
>
> Only the vlans on em0 are analyzed by ng_netflow, and the errors I'm seeing have started appearing days before netgraph was even loaded in the kernel.
>
> Tuning done:
>
> /boot/loader.conf:
> hw.em.rxd=4096
> hw.em.txd=4096
>
> Without the above we were seeing way more errors, now they are reduced, but still come in bursts of over 1000 errors on em0.
>
> /etc/sysctl.conf:
> net.inet.ip.fastforwarding=1
> dev.em.0.rx_processing_limit=300
> dev.em.1.rx_processing_limit=300
> dev.em.2.rx_processing_limit=300
> dev.em.3.rx_processing_limit=300
>
> Still seeing errors, after some searching the mailing lists we also added:
>
> # the four lines below are repeated for em1, em2, em3
> dev.em.0.rx_int_delay=0
> dev.em.0.rx_abs_int_delay=0
> dev.em.0.tx_int_delay=0
> dev.em.0.tx_abs_int_delay=0
>
> Still getting errors, so I also added:
>
> net.inet.ip.intr_queue_maxlen=4096
> net.route.netisr_maxqlen=1024
>
> and
>
> kern.ipc.nmbclusters=655360
>
>
> Also tried with rx_processing_limit set to -1 on all em interfaces, still getting errors.
>
> Looking at the shape of the error and packet graphs, there seems to be a correlation between the number of packets per second on em0 and the height of the error "spikes" on the error graph. These spikes are spread throughout the day, with spaces (zones with no errors) of various lengths (10 minutes ... 2 hours spaces within the last 24 hours), but sometimes there are errors even in the lowest kpps times of the day.
>
> em0 and em1 error times are correlated, with all errors on the graph for em0 having a smaller corresponding error spike on em1 at the same time, and sometimes an error spike on em2.
>
> The old router was seeing about the same traffic, and had em0, em1, re0 and re1 network cards, and was only seeing errors on the em cards. It was running 7.2-PRERELEASE/i386
>
>
> Any suggestions would be greatly appreciated. Please note that this is a live router, and I can't reboot it (unless absolutely necessary). Tuning that can be applied without rebooting will be tried first.
>
> Here are some more details:
>
> Trimmed output of netstat -ni (sorry if there are line breaks):
> Name  Mtu Network  Address                 Ipkts    Ierrs       Opkts Oerrs Coll
> em0  1500 <Link#1> 00:14:22:xx:xx:xx 19744458839 15494721 24284439443     0    0
> em1  1500 <Link#2> 00:14:22:xx:xx:xx 12832245469   123181 10105031790     0    0
> em2  1500 <Link#3> 00:04:23:xx:xx:xx 12082552403    10964 10339416865     0    0
> em3  1500 <Link#4> 00:04:23:xx:xx:xx    79912337        0    48178737     0    0
>
> Relevant part of pciconf -vl:
>
> em0@pci0:6:7:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82541EI Gigabit Ethernet Controller'
>     class = network
>     subclass = ethernet
> em1@pci0:7:8:0: class=0x020000 card=0x016d1028 chip=0x10768086 rev=0x05 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82541EI Gigabit Ethernet Controller'
>     class = network
>     subclass = ethernet
> em2@pci0:9:4:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class = network
>     subclass = ethernet
> em3@pci0:9:4:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82546EB Dual Port Gigabit Ethernet Controller (Copper)'
>     class = network
>     subclass = ethernet
>
> Kernel messages after sysctl dev.em.0.stats=1:
> (note that I've removed the lines which only
> showed zeros in the second and third outputs)
>
> em0: Excessive collisions = 0
> em0: Sequence errors = 0
> em0: Defer count = 0
> em0: Missed Packets = 15435312
> em0: Receive No Buffers = 16446113
> em0: Receive Length Errors = 0
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: Alignment errors = 0
> em0: Collision/Carrier extension errors = 0
> em0: RX overruns = 96826
> em0: watchdog timeouts = 0
> em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
> em0: XON Rcvd = 0
> em0: XON Xmtd = 0
> em0: XOFF Rcvd = 0
> em0: XOFF Xmtd = 0
> em0: Good Packets Rcvd = 19002068797
> em0: Good Packets Xmtd = 23168462599
> em0: TSO Contexts Xmtd = 0
> em0: TSO Contexts Failed = 0
>
> [later]
> em0: Excessive collisions = 0
> em0: Missed Packets = 15459111
> em0: Receive No Buffers = 16447082
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: RX overruns = 96835
> em0: Good Packets Rcvd = 19165047284
> em0: Good Packets Xmtd = 23386976960
>
> [later]
> em0: Excessive collisions = 0
> em0: Missed Packets = 15470583
> em0: Receive No Buffers = 16447686
> em0: Receive errors = 1
> em0: Crc errors = 2
> em0: RX overruns = 96840
> em0: Good Packets Rcvd = 19255466068
> em0: Good Packets Xmtd = 23519004546
>
>
> Machine #2
>
> I'm also seeing input errors on another machine, a Core 2 Duo E8200 CPU with 2 em cards, this time connected via PCI Express. This machine handles less traffic, and errors mostly appear on em0, which does NOT use vlans. All of the traffic goes on to em1 which does use vlans, but has recorded about 10% of the errors on em0. Also, netgraph is not used at all on this machine. Only hw.em.rxd, hw.em.txd and the dev.em.*.rx_processing_limit tunables were set for this machine.
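
Regarding the three em0 dumps for the first machine: these counters only grow from one dump to the next, so the increase between dumps is easier to reason about than the totals. A minimal Python sketch that computes those increases from the numbers quoted above. The dictionary keys and function names are labels invented here, and expressing loss as "Missed Packets" per million "Good Packets Rcvd" is only one assumed way to put a rate on it:

```python
# Counter values copied from the three "sysctl dev.em.0.stats=1" dumps quoted above.
# The key names are hypothetical labels, not identifiers used by the em driver.
SNAPSHOTS = [
    {"missed": 15435312, "no_buffers": 16446113, "rx_overruns": 96826, "good_rcvd": 19002068797},
    {"missed": 15459111, "no_buffers": 16447082, "rx_overruns": 96835, "good_rcvd": 19165047284},
    {"missed": 15470583, "no_buffers": 16447686, "rx_overruns": 96840, "good_rcvd": 19255466068},
]


def deltas(snapshots):
    """Return the per-counter increase between each pair of consecutive dumps."""
    return [
        {key: cur[key] - prev[key] for key in cur}
        for prev, cur in zip(snapshots, snapshots[1:])
    ]


def missed_per_million(delta):
    """Missed packets per million good received packets within one interval."""
    return 1e6 * delta["missed"] / delta["good_rcvd"]


if __name__ == "__main__":
    for number, delta in enumerate(deltas(SNAPSHOTS), start=1):
        print(
            f"interval {number}: missed +{delta['missed']}, "
            f"no buffers +{delta['no_buffers']}, "
            f"rx overruns +{delta['rx_overruns']}, "
            f"{missed_per_million(delta):.1f} missed per million good packets"
        )
```

The time between dumps is not stated in the message, so the sketch reports ratios rather than per-second rates.
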
>
> Relevant info for machine #2:
> Name  Mtu Network  Address                Ipkts Ierrs      Opkts Oerrs Coll
> em0  1500 <Link#1> 00:1b:21:xx:xx:xx 3095638890 12762 2604519812     0    0
> em1  1500 <Link#2> 00:1b:21:xx:xx:xx 2608953742  1636 2998185465     0    0
>
> em0@pci0:4:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82572EI PRO/1000 PT Desktop Adapter (Copper)'
>     class = network
>     subclass = ethernet
> em1@pci0:3:0:0: class=0x020000 card=0x10838086 chip=0x10b98086 rev=0x06 hdr=0x00
>     vendor = 'Intel Corporation'
>     device = '82572EI PRO/1000 PT Desktop Adapter (Copper)'
>     class = network
>     subclass = ethernet
>
> em0: Excessive collisions = 0
> em0: Sequence errors = 0
> em0: Defer count = 402
> em0: Missed Packets = 12762
> em0: Receive No Buffers = 0
> em0: Receive Length Errors = 0
> em0: Receive errors = 0
> em0: Crc errors = 0
> em0: Alignment errors = 0
> em0: Collision/Carrier extension errors = 0
> em0: RX overruns = 237
> em0: watchdog timeouts = 0
> em0: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
> em0: XON Rcvd = 249
> em0: XON Xmtd = 244
> em0: XOFF Rcvd = 402
> em0: XOFF Xmtd = 261
> em0: Good Packets Rcvd = 3092053709
> em0: Good Packets Xmtd = 2622962119
> em0: TSO Contexts Xmtd = 12760095
> em0: TSO Contexts Failed = 0
>
>
> Thank you for your time.
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>

Do you have enough mbufs? Look at the 'mbuf clusters' line in the netstat -m output: if 'current' is close to 'total', input packets will be dropped. Also take a look at the 'jumbo clusters' lines in netstat -m.
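
A rough way to script that check, as a minimal Python sketch. It assumes the "N/N/N/N ... clusters in use (current/cache/total/max)" line layout printed by netstat -m, so check it against your release. The sample text is made up for illustration (it is not from your router), and the function names and the 90% threshold are arbitrary choices:

```python
import re

# Matches lines such as
#   "1024/1126/2150/25600 mbuf clusters in use (current/cache/total/max)".
# This layout is an assumption about netstat -m output; verify it locally.
CLUSTER_LINE = re.compile(
    r"^(\d+)/(\d+)/(\d+)/(\d+)\s+(.*clusters) in use \(current/cache/total/max\)",
    re.MULTILINE,
)


def cluster_usage(netstat_m_output):
    """Return {pool name: (current, total, max)} for every cluster line found."""
    usage = {}
    for cur, _cache, total, limit, name in CLUSTER_LINE.findall(netstat_m_output):
        usage[name] = (int(cur), int(total), int(limit))
    return usage


def near_exhaustion(usage, threshold=0.9):
    """List pools whose 'current' count is at least threshold * 'total'."""
    return [
        name
        for name, (cur, total, _limit) in usage.items()
        if total > 0 and cur >= threshold * total
    ]


# Made-up sample for illustration only; NOT output from the router in this thread.
SAMPLE = """\
1026/1329/2355 mbufs in use (current/cache/total)
2040/110/2150/25600 mbuf clusters in use (current/cache/total/max)
0/0/0/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
"""

if __name__ == "__main__":
    pools = cluster_usage(SAMPLE)
    for name in near_exhaustion(pools):
        cur, total, limit = pools[name]
        print(f"{name}: current={cur} total={total} max={limit}")
```

On a live machine the text would come from running netstat -m instead of SAMPLE. Pools with a total of zero, such as unused jumbo clusters, are skipped rather than flagged.
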
--
SY, Marat