Date: Sat, 02 Oct 2010 10:32:02 -0400
From: Mike Tancsa <mike@sentex.net>
To: Jack Vogel <jfvogel@gmail.com>
Cc: pyunyh@gmail.com, freebsd-stable@freebsd.org
Subject: Re: RELENG_7 em problems (and RELENG_8)
Message-ID: <201010021432.o92EWAIs033670@lava.sentex.ca>
In-Reply-To: <AANLkTik5mzeKPYrp3_80Ng9ByFj%2BLSHsd3xT2JCP98E%2B@mail.gmail.com>
References: <201006102031.o5AKVCH2016467@lava.sentex.ca>
 <201007021739.o62HdMOU092319@lava.sentex.ca>
 <20100702193654.GD10862@michelle.cdnetworks.com>
 <201008162107.o7GL76pA080191@lava.sentex.ca>
 <20100817185208.GA6482@michelle.cdnetworks.com>
 <201008171955.o7HJt67T087902@lava.sentex.ca>
 <20100817200020.GE6482@michelle.cdnetworks.com>
 <201009141759.o8EHxcZ0013539@lava.sentex.ca>
 <AANLkTimiTmA1HHeWmGm1MAFf-H=OqC17vwZvFWpgcHCZ@mail.gmail.com>
 <201009262157.o8QLvR0L012171@lava.sentex.ca>
 <AANLkTinxLScVxQ2ib%2BcLXEBGTATAU36%2BOKr7%2B5SQXE89@mail.gmail.com>
 <201009262343.o8QNhgDG012676@lava.sentex.ca>
 <AANLkTik5mzeKPYrp3_80Ng9ByFj%2BLSHsd3xT2JCP98E%2B@mail.gmail.com>

Hi Jack,

Two quick notes about the new driver. On the server that was having NIC lockups, so far so good. On Saturday mornings the box takes a lot of level 0 dumps and also does about 70Mb/s of outbound rsync traffic. By now, the NIC would have wedged at least once. So far so good!

On a different, new box, I decided to try HEAD with the new driver, and ran into problems with the onboard NIC:

em0@pci0:0:25:0: class=0x020000 card=0x00368086 chip=0x10f08086 rev=0x06 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

em0: <Intel(R) PRO/1000 Network Connection 7.0.5> port 0xf020-0xf03f mem 0xfe500000-0xfe51ffff,0xfe527000-0xfe527fff irq 20 at device 25.0 on pci0
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 70:71:bc:09:5e:aa

This is an Intel-branded desktop board:

acpi0: <INTEL DH55TC> on motherboard

I find I have to disable rx and tx csum on the interface, otherwise there are a lot of retransmits due to missed packets. tcpdump implies the packets are going out, but they never seem to actually get out. The motherboard is at the office on an unmanaged switch right now, so I don't have any stats from the switch. But tcpdump shows a lot of outbound retransmits. Turning off rxcsum and txcsum fixes the problem.

dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.0.8
dev.em.0.%driver: em
dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.GBE_
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10f0 subvendor=0x8086 subdevice=0x0036 class=0x020000
dev.em.0.%parent: pci0
dev.em.0.nvm: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.0.link_irq: 0
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.rx_overruns: 0
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1074790976
dev.em.0.rx_control: 67141634
dev.em.0.fc_high_water: 8192
dev.em.0.fc_low_water: 6692
dev.em.0.queue0.txd_head: 15
dev.em.0.queue0.txd_tail: 17
dev.em.0.queue0.tx_irq: 0
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 843
dev.em.0.queue0.rxd_tail: 842
dev.em.0.queue0.rx_irq: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.defer_count: 0
dev.em.0.mac_stats.missed_packets: 0
dev.em.0.mac_stats.recv_no_buff: 0
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 80
dev.em.0.mac_stats.xon_txd: 0
dev.em.0.mac_stats.xoff_recvd: 82
dev.em.0.mac_stats.xoff_txd: 0
dev.em.0.mac_stats.total_pkts_recvd: 35697
dev.em.0.mac_stats.good_pkts_recvd: 35535
dev.em.0.mac_stats.bcast_pkts_recvd: 231
dev.em.0.mac_stats.mcast_pkts_recvd: 85
dev.em.0.mac_stats.rx_frames_64: 0
dev.em.0.mac_stats.rx_frames_65_127: 0
dev.em.0.mac_stats.rx_frames_128_255: 0
dev.em.0.mac_stats.rx_frames_256_511: 0
dev.em.0.mac_stats.rx_frames_512_1023: 0
dev.em.0.mac_stats.rx_frames_1024_1522: 0
dev.em.0.mac_stats.good_octets_recvd: 14878015
dev.em.0.mac_stats.good_octets_txd: 14051783
dev.em.0.mac_stats.total_pkts_txd: 45313
dev.em.0.mac_stats.good_pkts_txd: 45313
dev.em.0.mac_stats.bcast_pkts_txd: 3
dev.em.0.mac_stats.mcast_pkts_txd: 5
dev.em.0.mac_stats.tx_frames_64: 0
dev.em.0.mac_stats.tx_frames_65_127: 0
dev.em.0.mac_stats.tx_frames_128_255: 0
dev.em.0.mac_stats.tx_frames_256_511: 0
dev.em.0.mac_stats.tx_frames_512_1023: 0
dev.em.0.mac_stats.tx_frames_1024_1522: 0
dev.em.0.mac_stats.tso_txd: 2788
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.interrupts.asserts: 48733
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.rx_overrun: 0
dev.em.0.wake: 0

At 08:00 PM 9/26/2010, Jack Vogel wrote:
>The system I've had stress tests running on has 82574 LOMs, so I hope it
>will solve the problem, will see tomorrow morning at how things have held
>up...
>
>Jack
>
>
>On Sun, Sep 26, 2010 at 4:43 PM, Mike Tancsa <mike@sentex.net> wrote:
>At 06:19 PM 9/26/2010, Jack Vogel wrote:
>Your em1 is using MSI not MSIX and thus can't have multiple queues. I'm
>not sure whats broken from what you show here. I will try to get the new
>driver out shortly for you to try.
>
>
>With this particular NIC, it will wedge under high load. I tried 2
>different motherboards and chipsets the same behaviour.
>
>         ---Mike
>
>
>Jack
>
>
>
>On Sun, Sep 26, 2010 at 2:57 PM, Mike Tancsa <mike@sentex.net> wrote:
>At 06:36 PM 9/24/2010, Jack Vogel wrote:
>There is a new revision of the em driver coming next week, its going thru some
>stress pounding over the weekend, if no issues show up I'll put it into HEAD.
>
>Yongari's changes in TX context handling which effects checksum and tso
>are added. I've also decided that multiple queues in 82574 just are a source
>of problems without a lot of benefit, so it still uses MSIX but with
>only 3 vectors,
>meaning it seperates TX and RX but has a single queue.
>
>
>Thanks, looking forward to trying it out! With respect to the
>multiple queues, I thought the driver already used just the one on
>RELENG_8 ? If not, is there a way to force the existing driver to
>use just the one queue ?
>
>On the box that has the NIC locking up, it shows
>
>em1@pci0:9:0:0: class=0x020000 card=0x34ec8086 chip=0x10d38086 rev=0x00 hdr=0x00
>    vendor     = 'Intel Corporation'
>    device     = 'Intel 82574L Gigabit Ethernet Controller (82574L)'
>    class      = network
>    subclass   = ethernet
>    cap 01[c8] = powerspec 2  supports D0 D3  current D0
>    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
>
>and
>
>vmstat -i shows
>
>irq256: em0                      5129063        353
>irq257: em1                       531251         36
>
>in a wedged state, stats look like
>
>dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.0.5
>dev.em.1.%driver: em
>dev.em.1.%location: slot=0 function=0 handle=\_SB_.PCI0.PEX4.HART
>dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0x34ec class=0x020000
>dev.em.1.%parent: pci9
>dev.em.1.nvm: -1
>dev.em.1.rx_int_delay: 0
>dev.em.1.tx_int_delay: 66
>dev.em.1.rx_abs_int_delay: 66
>dev.em.1.tx_abs_int_delay: 66
>dev.em.1.rx_processing_limit: 100
>dev.em.1.link_irq: 0
>dev.em.1.mbuf_alloc_fail: 0
>dev.em.1.cluster_alloc_fail: 0
>dev.em.1.dropped: 0
>dev.em.1.tx_dma_fail: 0
>dev.em.1.fc_high_water: 18432
>dev.em.1.fc_low_water: 16932
>dev.em.1.mac_stats.excess_coll: 0
>dev.em.1.mac_stats.symbol_errors: 0
>dev.em.1.mac_stats.sequence_errors: 0
>dev.em.1.mac_stats.defer_count: 0
>dev.em.1.mac_stats.missed_packets: 41522
>dev.em.1.mac_stats.recv_no_buff: 19
>dev.em.1.mac_stats.recv_errs: 0
>dev.em.1.mac_stats.crc_errs: 0
>dev.em.1.mac_stats.alignment_errs: 0
>dev.em.1.mac_stats.coll_ext_errs: 0
>dev.em.1.mac_stats.rx_overruns: 41398
>dev.em.1.mac_stats.watchdog_timeouts: 0
>dev.em.1.mac_stats.xon_recvd: 0
>dev.em.1.mac_stats.xon_txd: 0
>dev.em.1.mac_stats.xoff_recvd: 0
>dev.em.1.mac_stats.xoff_txd: 0
>dev.em.1.mac_stats.total_pkts_recvd: 95229129
>dev.em.1.mac_stats.good_pkts_recvd: 95187607
>dev.em.1.mac_stats.bcast_pkts_recvd: 79244
>dev.em.1.mac_stats.mcast_pkts_recvd: 0
>dev.em.1.mac_stats.rx_frames_64: 93680
>dev.em.1.mac_stats.rx_frames_65_127: 1516349
>dev.em.1.mac_stats.rx_frames_128_255: 4464941
>dev.em.1.mac_stats.rx_frames_256_511: 4024
>dev.em.1.mac_stats.rx_frames_512_1023: 2096067
>dev.em.1.mac_stats.rx_frames_1024_1522: 87012546
>dev.em.1.mac_stats.good_octets_recvd: 0
>dev.em.1.mac_stats.good_octest_txd: 0
>dev.em.1.mac_stats.total_pkts_txd: 66775098
>dev.em.1.mac_stats.good_pkts_txd: 66775098
>dev.em.1.mac_stats.bcast_pkts_txd: 509
>dev.em.1.mac_stats.mcast_pkts_txd: 7
>dev.em.1.mac_stats.tx_frames_64: 48038472
>dev.em.1.mac_stats.tx_frames_65_127: 13402833
>dev.em.1.mac_stats.tx_frames_128_255: 5324413
>dev.em.1.mac_stats.tx_frames_256_511: 957
>dev.em.1.mac_stats.tx_frames_512_1023: 319
>dev.em.1.mac_stats.tx_frames_1024_1522: 8104
>dev.em.1.mac_stats.tso_txd: 1069
>dev.em.1.mac_stats.tso_ctx_fail: 0
>dev.em.1.interrupts.asserts: 0
>dev.em.1.interrupts.rx_pkt_timer: 0
>dev.em.1.interrupts.rx_abs_timer: 0
>dev.em.1.interrupts.tx_pkt_timer: 0
>dev.em.1.interrupts.tx_abs_timer: 0
>dev.em.1.interrupts.tx_queue_empty: 0
>dev.em.1.interrupts.tx_queue_min_thresh: 0
>dev.em.1.interrupts.rx_desc_min_thresh: 0
>dev.em.1.interrupts.rx_overrun: 0
>dev.em.1.host.breaker_tx_pkt: 0
>dev.em.1.host.host_tx_pkt_discard: 0
>dev.em.1.host.rx_pkt: 0
>dev.em.1.host.breaker_rx_pkts: 0
>dev.em.1.host.breaker_rx_pkt_drop: 0
>dev.em.1.host.tx_good_pkt: 0
>dev.em.1.host.breaker_tx_pkt_drop: 0
>dev.em.1.host.rx_good_bytes: 0
>dev.em.1.host.tx_good_bytes: 0
>dev.em.1.host.length_errors: 0
>dev.em.1.host.serdes_violation_pkt: 0
>dev.em.1.host.header_redir_missed: 0
>
>ifconfig down/up just panics or locks up the box when its in this
>state. I also have IPMI enabled on this nic, but it shows the same
>issue with it disabled.
>
>         ---Mike

--------------------------------------------------------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet since 1994 www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike
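
A side note on reading the counters quoted in this message: in both dumps, total_pkts_recvd minus good_pkts_recvd lines up exactly with another counter (XON plus XOFF frames on em0, missed_packets on the wedged em1). The Python sketch below only redoes that arithmetic on values copied from the dumps above. The helper names parse_counters and rx_gap are made up for illustration, and treating the match as an explanation of where the non-good packets went is an assumption, not something checked against the driver or the datasheet.

```python
def parse_counters(text):
    """Parse 'name: value' sysctl lines into a dict, keeping integer values only.

    Leading '>' quote markers (as in the quoted em1 dump) are stripped.
    """
    counters = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()
        name, sep, value = line.partition(": ")
        if not sep:
            continue
        try:
            counters[name] = int(value)
        except ValueError:
            continue  # non-numeric entries such as %desc are skipped
    return counters


def rx_gap(counters, unit):
    """Return total_pkts_recvd minus good_pkts_recvd for dev.em.<unit>."""
    prefix = f"dev.em.{unit}.mac_stats."
    return counters[prefix + "total_pkts_recvd"] - counters[prefix + "good_pkts_recvd"]


# Values copied from the em0 dump (new box, HEAD driver).
EM0 = """\
dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.0.8
dev.em.0.mac_stats.xon_recvd: 80
dev.em.0.mac_stats.xoff_recvd: 82
dev.em.0.mac_stats.total_pkts_recvd: 35697
dev.em.0.mac_stats.good_pkts_recvd: 35535
"""

# Values copied from the quoted em1 dump (wedged 82574L).
EM1 = """\
>dev.em.1.mac_stats.missed_packets: 41522
>dev.em.1.mac_stats.rx_overruns: 41398
>dev.em.1.mac_stats.total_pkts_recvd: 95229129
>dev.em.1.mac_stats.good_pkts_recvd: 95187607
"""

em0 = parse_counters(EM0)
em1 = parse_counters(EM1)

pause_frames = em0["dev.em.0.mac_stats.xon_recvd"] + em0["dev.em.0.mac_stats.xoff_recvd"]
print("em0 gap:", rx_gap(em0, 0), "pause frames:", pause_frames)
print("em1 gap:", rx_gap(em1, 1), "missed:", em1["dev.em.1.mac_stats.missed_packets"])
```

On a live system the same dictionary could be built by feeding the text printed by `sysctl dev.em.0` to parse_counters, since that output is one "name: value" pair per line.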