From owner-freebsd-net@FreeBSD.ORG Tue Nov 8 17:21:31 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8103A106566B for ; Tue, 8 Nov 2011 17:21:31 +0000 (UTC) (envelope-from hoomanfazaeli@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id CFB178FC13 for ; Tue, 8 Nov 2011 17:21:30 +0000 (UTC) Received: by bkbzs8 with SMTP id zs8so854454bkb.13 for ; Tue, 08 Nov 2011 09:21:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type; bh=6VLOHIUAni7G/xc05I+A/KG6Y9ACNr0GhitFJ1sEo3Q=; b=uIGUqkkFn98RMIsdNMTa7HmszvGY1Tv8UAxFyRwLpXIDuqaRZLetDeb3blIHgaOa3k KJmWH6WTrG7ddo2XqzD0FQfINISJivEC3qOM/fhD109J7ugjacYUuQLaWGYdNHNQZBEs NG8gaYF/LWCJzziuuKc83kUiT5d0zubyAQy2Q= Received: by 10.205.114.65 with SMTP id ez1mr23107458bkc.99.1320772889725; Tue, 08 Nov 2011 09:21:29 -0800 (PST) Received: from [127.0.0.1] ([84.241.57.181]) by mx.google.com with ESMTPS id fu17sm2199712bkc.9.2011.11.08.09.21.24 (version=SSLv3 cipher=OTHER); Tue, 08 Nov 2011 09:21:28 -0800 (PST) Message-ID: <4EB96511.50701@gmail.com> Date: Tue, 08 Nov 2011 20:51:21 +0330 From: Hooman Fazaeli User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.23) Gecko/20110920 Thunderbird/3.1.15 MIME-Version: 1.0 To: Jason Wolfe References: <4E8F157A.40702@sentex.net> <4E8F51D4.1060509@sentex.net> <4EA7E203.3020306@sepehrs.com> <4EA80818.3030504@sentex.net> <4EA80F88.4000400@hotplug.ru> <4EA82715.2000404@gmail.com> <4EA8FA40.7010504@hotplug.ru> <4EA91836.2040508@gmail.com> <4EA959EE.2070806@hotplug.ru> <4EAD116A.8090006@gmail.com> <4EAE58A2.9040803@gmail.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: pyunyh@gmail.com, freebsd-net@freebsd.org, Jack Vogel , Emil Muratov Subject: Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Nov 2011 17:21:31 -0000 On 11/7/2011 9:24 PM, Jason Wolfe wrote: > On Mon, Oct 31, 2011 at 1:13 AM, Hooman Fazaeli > wrote: > > > Attached is a patch for if_em.c. It flushes interface queue when it is full > and link is not active. Please note that when this happens, drops are increasing > on interface and this will trigger your scripts as before. You need to change > a little the scripts as follows: > > check interface TX status > if (interface TX seems hung) { > sleep 5 > check interface TX status > if (interface TX seems hung) { > reset the interface. > } > } > > For MULTIQUEUE, it just disables the check for link status (which is not good). > so pls. test in non-MULTIQUEUE mode. > > The patch also contains some minor fixups to compile on 7 plus > a fix from r1.69 which addressed RX hang problem (the fix was > later removed in r1.70). I included it for Emil to give it a > try. > > Pls. let me know if you have any problems with patch. > > > Hooman, > > Unfortunately one of the server just had a wedge event a couple hours ago with this patch. To confirm your changes should cause a recovery within the time I'm allowing, here is the current format: > > check interface TX status > if (interface TX seems hung) { > sleep 3 > check packets out > sleep 2 > check packets out > if (packets not incrementing) { > reset the interface > } > } > > I bounced em0 because dropped packets incremented 1749543 to 1749708 and the interface is not incrementing packets out. > > 4:10AM up 6 days, 15:23, 0 users, load averages: 0.02, 0.12, 0.14 > > em0: flags=8843 metric 0 mtu 1500 > options=219b > ether X > inet6 X%em0 prefixlen 64 scopeid 0x1 > nd6 options=1 > media: Ethernet autoselect (1000baseT ) > status: active > > em1: flags=8843 metric 0 mtu 1500 > options=219b > ether X > inet6 X%em1 prefixlen 64 scopeid 0x2 > nd6 options=3 > media: Ethernet autoselect (1000baseT ) > status: active > > ipfw0: flags=8801 metric 0 mtu 65536 > > lo0: flags=8049 metric 0 mtu 16384 > options=3 > inet 127.0.0.1 netmask 0xff000000 > inet6 ::1 prefixlen 128 > inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 > inet X.X.X.X netmask 0xffffffff > inet X.X.X.X netmask 0xffffffff > inet X.X.X.X netmask 0xffffffff > nd6 options=3 > > lagg0: flags=8843 metric 0 mtu 1500 > options=219b > ether X > inet X.X.X.X netmask 0xffffff00 broadcast X.X.X.X > inet6 X%lagg0 prefixlen 64 scopeid 0x5 > inet6 X prefixlen 64 autoconf > nd6 options=3 > media: Ethernet autoselect > status: active > laggproto loadbalance > laggport: em0 flags=4 > laggport: em1 flags=4 > > interrupt total rate > irq3: uart1 3810 0 > cpu0: timer 1147568087 2000 > irq256: em0:rx 0 59779710 104 > irq257: em0:tx 0 2771888652 4831 > irq258: em0:link 1 0 > irq259: em1:rx 0 3736828886 6512 > irq260: em1:tx 0 2790566376 4863 > irq261: em1:link 27286 0 > irq262: mps0 395687386 689 > cpu1: timer 1147559894 2000 > cpu2: timer 1147559901 2000 > cpu3: timer 1147559902 2000 > Total 14345029891 25001 > > 13466/4144/17610 mbufs in use (current/cache/total) > 2567/2635/5202/5853720 mbuf clusters in use (current/cache/total/max) > 2567/633 mbuf+clusters out of packet secondary zone in use (current/cache) > 6798/554/7352/2926859 4k (page size) jumbo clusters in use (current/cache/total/max) > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) > 35692K/8522K/44214K bytes allocated to network (current/cache/total) > 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) > 0/0/0 requests for jumbo clusters denied (4k/9k/16k) > 0/0/0 sfbufs in use (current/peak/max) > 0 requests for sfbufs denied > 0 requests for sfbufs delayed > 0 requests for I/O initiated by sendfile > 0 calls to protocol drain routines > > Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop > em0 1500 00:25:90:2b:e5:75 60747643 0 0 11246408092 0 0 1750763 > em0 1500 fe80:1::225:9 fe80:1::225:90ff: 0 - - 4 - - - > em1 1500 00:25:90:2b:e5:75 11237195776 123950 0 11344722383 0 0 545682 > em1 1500 fe80:2::225:9 fe80:2::225:90ff: 0 - - 1 - - - > lagg0 1500 00:25:90:2b:e5:75 11297850142 0 0 22588666102 2296445 0 0 > lagg0 1500 69.164.38.0/2 69.164.38.83 10189108030 - - 22592881776 - - - > lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 24 - - 28 - - - > lagg0 1500 2607:f4e8:310 2607:f4e8:310:12: 19578 - - 19591 - - - > > kern.msgbuf: > > Nov 7 04:10:06 cds1033 kernel: Interface is RUNNING and INACTIVE > Nov 7 04:10:06 cds1033 kernel: em0: hw tdh = 558, hw tdt = 558 > Nov 7 04:10:06 cds1033 kernel: em0: hw rdh = 889, hw rdt = 888 > Nov 7 04:10:06 cds1033 kernel: em0: Tx Queue Status = 0 > Nov 7 04:10:06 cds1033 kernel: em0: TX descriptors avail = 1024 > Nov 7 04:10:06 cds1033 kernel: em0: Tx Descriptors avail failure = 0 > Nov 7 04:10:06 cds1033 kernel: em0: RX discarded packets = 0 > Nov 7 04:10:06 cds1033 kernel: em0: RX Next to Check = 889 > Nov 7 04:10:06 cds1033 kernel: em0: RX Next to Refresh = 888 > Nov 7 04:10:06 cds1033 kernel: em0: Link state: active > Nov 7 04:10:06 cds1033 kernel: Interface is RUNNING and INACTIVE > Nov 7 04:10:06 cds1033 kernel: em1: hw tdh = 837, hw tdt = 837 > Nov 7 04:10:06 cds1033 kernel: em1: hw rdh = 198, hw rdt = 189 > Nov 7 04:10:06 cds1033 kernel: em1: Tx Queue Status = 0 > Nov 7 04:10:06 cds1033 kernel: em1: TX descriptors avail = 1021 > Nov 7 04:10:06 cds1033 kernel: em1: Tx Descriptors avail failure = 0 > Nov 7 04:10:06 cds1033 kernel: em1: RX discarded packets = 0 > Nov 7 04:10:06 cds1033 kernel: em1: RX Next to Check = 303 > Nov 7 04:10:06 cds1033 kernel: em1: RX Next to Refresh = 302 > Nov 7 04:10:06 cds1033 kernel: em1: Link state: active > > net.inet.ip.intr_queue_maxlen: 512 > net.inet.ip.intr_queue_drops: 0 > dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 > dev.em.0.%driver: em > dev.em.0.%location: slot=0 function=0 > dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000 > dev.em.0.%parent: pci1 > dev.em.0.nvm: -1 > dev.em.0.debug: -1 > dev.em.0.rx_int_delay: 0 > dev.em.0.tx_int_delay: 66 > dev.em.0.rx_abs_int_delay: 66 > dev.em.0.tx_abs_int_delay: 66 > dev.em.0.rx_processing_limit: 100 > dev.em.0.flow_control: 0 > dev.em.0.eee_control: 0 > dev.em.0.link_irq: 1 > dev.em.0.mbuf_alloc_fail: 0 > dev.em.0.cluster_alloc_fail: 0 > dev.em.0.dropped: 0 > dev.em.0.tx_dma_fail: 0 > dev.em.0.rx_overruns: 0 > dev.em.0.watchdog_timeouts: 0 > dev.em.0.link_toggles: 3 > dev.em.0.device_control: 1074790984 > dev.em.0.rx_control: 67141634 > dev.em.0.fc_high_water: 18432 > dev.em.0.fc_low_water: 16932 > dev.em.0.queue0.txd_head: 558 > dev.em.0.queue0.txd_tail: 558 > dev.em.0.queue0.tx_irq: 2771888613 > dev.em.0.queue0.no_desc_avail: 0 > dev.em.0.queue0.rxd_head: 102 > dev.em.0.queue0.rxd_tail: 101 > dev.em.0.queue0.rx_irq: 59779941 > dev.em.0.mac_stats.excess_coll: 0 > dev.em.0.mac_stats.single_coll: 0 > dev.em.0.mac_stats.multiple_coll: 0 > dev.em.0.mac_stats.late_coll: 0 > dev.em.0.mac_stats.collision_count: 0 > dev.em.0.mac_stats.symbol_errors: 0 > dev.em.0.mac_stats.sequence_errors: 0 > dev.em.0.mac_stats.defer_count: 0 > dev.em.0.mac_stats.missed_packets: 0 > dev.em.0.mac_stats.recv_no_buff: 0 > dev.em.0.mac_stats.recv_undersize: 0 > dev.em.0.mac_stats.recv_fragmented: 0 > dev.em.0.mac_stats.recv_oversize: 0 > dev.em.0.mac_stats.recv_jabber: 0 > dev.em.0.mac_stats.recv_errs: 0 > dev.em.0.mac_stats.crc_errs: 0 > dev.em.0.mac_stats.alignment_errs: 0 > dev.em.0.mac_stats.coll_ext_errs: 0 > dev.em.0.mac_stats.xon_recvd: 0 > dev.em.0.mac_stats.xon_txd: 0 > dev.em.0.mac_stats.xoff_recvd: 0 > dev.em.0.mac_stats.xoff_txd: 0 > dev.em.0.mac_stats.total_pkts_recvd: 60768005 > dev.em.0.mac_stats.good_pkts_recvd: 60768005 > dev.em.0.mac_stats.bcast_pkts_recvd: 60744833 > dev.em.0.mac_stats.mcast_pkts_recvd: 2948 > dev.em.0.mac_stats.rx_frames_64: 60744822 > dev.em.0.mac_stats.rx_frames_65_127: 3491 > dev.em.0.mac_stats.rx_frames_128_255: 19293 > dev.em.0.mac_stats.rx_frames_256_511: 364 > dev.em.0.mac_stats.rx_frames_512_1023: 35 > dev.em.0.mac_stats.rx_frames_1024_1522: 0 > dev.em.0.mac_stats.good_octets_recvd: 3890774076 > dev.em.0.mac_stats.good_octets_txd: 15295538422911 > dev.em.0.mac_stats.total_pkts_txd: 11246435226 > dev.em.0.mac_stats.good_pkts_txd: 11246435226 > dev.em.0.mac_stats.bcast_pkts_txd: 4 > dev.em.0.mac_stats.mcast_pkts_txd: 3830 > dev.em.0.mac_stats.tx_frames_64: 60338532 > dev.em.0.mac_stats.tx_frames_65_127: 897135811 > dev.em.0.mac_stats.tx_frames_128_255: 13933787 > dev.em.0.mac_stats.tx_frames_256_511: 22107540 > dev.em.0.mac_stats.tx_frames_512_1023: 116374045 > dev.em.0.mac_stats.tx_frames_1024_1522: 10136545511 > dev.em.0.mac_stats.tso_txd: 0 > dev.em.0.mac_stats.tso_ctx_fail: 0 > dev.em.0.interrupts.asserts: 3 > dev.em.0.interrupts.rx_pkt_timer: 0 > dev.em.0.interrupts.rx_abs_timer: 0 > dev.em.0.interrupts.tx_pkt_timer: 0 > dev.em.0.interrupts.tx_abs_timer: 0 > dev.em.0.interrupts.tx_queue_empty: 0 > dev.em.0.interrupts.tx_queue_min_thresh: 0 > dev.em.0.interrupts.rx_desc_min_thresh: 0 > dev.em.0.interrupts.rx_overrun: 0 > dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 > dev.em.1.%driver: em > dev.em.1.%location: slot=0 function=0 > dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000 > dev.em.1.%parent: pci2 > dev.em.1.nvm: -1 > dev.em.1.debug: -1 > dev.em.1.rx_int_delay: 0 > dev.em.1.tx_int_delay: 66 > dev.em.1.rx_abs_int_delay: 66 > dev.em.1.tx_abs_int_delay: 66 > dev.em.1.rx_processing_limit: 100 > dev.em.1.flow_control: 0 > dev.em.1.eee_control: 0 > dev.em.1.link_irq: 27207 > dev.em.1.mbuf_alloc_fail: 0 > dev.em.1.cluster_alloc_fail: 0 > dev.em.1.dropped: 0 > dev.em.1.tx_dma_fail: 0 > dev.em.1.rx_overruns: 0 > dev.em.1.watchdog_timeouts: 0 > dev.em.1.link_toggles: 3 > dev.em.1.device_control: 1074790984 > dev.em.1.rx_control: 67141634 > dev.em.1.fc_high_water: 18432 > dev.em.1.fc_low_water: 16932 > dev.em.1.queue0.txd_head: 804 > dev.em.1.queue0.txd_tail: 804 > dev.em.1.queue0.tx_irq: 2790571888 > dev.em.1.queue0.no_desc_avail: 0 > dev.em.1.queue0.rxd_head: 686 > dev.em.1.queue0.rxd_tail: 686 > dev.em.1.queue0.rx_irq: 3729597000 > dev.em.1.mac_stats.excess_coll: 0 > dev.em.1.mac_stats.single_coll: 0 > dev.em.1.mac_stats.multiple_coll: 0 > dev.em.1.mac_stats.late_coll: 0 > dev.em.1.mac_stats.collision_count: 0 > dev.em.1.mac_stats.symbol_errors: 0 > dev.em.1.mac_stats.sequence_errors: 0 > dev.em.1.mac_stats.defer_count: 0 > dev.em.1.mac_stats.missed_packets: 123950 > dev.em.1.mac_stats.recv_no_buff: 7063 > dev.em.1.mac_stats.recv_undersize: 0 > dev.em.1.mac_stats.recv_fragmented: 0 > dev.em.1.mac_stats.recv_oversize: 0 > dev.em.1.mac_stats.recv_jabber: 0 > dev.em.1.mac_stats.recv_errs: 0 > dev.em.1.mac_stats.crc_errs: 0 > dev.em.1.mac_stats.alignment_errs: 0 > dev.em.1.mac_stats.coll_ext_errs: 0 > dev.em.1.mac_stats.xon_recvd: 0 > dev.em.1.mac_stats.xon_txd: 0 > dev.em.1.mac_stats.xoff_recvd: 0 > dev.em.1.mac_stats.xoff_txd: 0 > dev.em.1.mac_stats.total_pkts_recvd: 11237327404 > dev.em.1.mac_stats.good_pkts_recvd: 11237203454 > dev.em.1.mac_stats.bcast_pkts_recvd: 60741559 > dev.em.1.mac_stats.mcast_pkts_recvd: 2935 > dev.em.1.mac_stats.rx_frames_64: 4479481614 > dev.em.1.mac_stats.rx_frames_65_127: 5124633522 > dev.em.1.mac_stats.rx_frames_128_255: 13228214 > dev.em.1.mac_stats.rx_frames_256_511: 23321029 > dev.em.1.mac_stats.rx_frames_512_1023: 74960116 > dev.em.1.mac_stats.rx_frames_1024_1522: 1521578959 > dev.em.1.mac_stats.good_octets_recvd: 3024997755633 > dev.em.1.mac_stats.good_octets_txd: 15349574964083 > dev.em.1.mac_stats.total_pkts_txd: 11344737635 > dev.em.1.mac_stats.good_pkts_txd: 11344737635 > dev.em.1.mac_stats.bcast_pkts_txd: 2501 > dev.em.1.mac_stats.mcast_pkts_txd: 8 > dev.em.1.mac_stats.tx_frames_64: 62839963 > dev.em.1.mac_stats.tx_frames_65_127: 962219224 > dev.em.1.mac_stats.tx_frames_128_255: 14620849 > dev.em.1.mac_stats.tx_frames_256_511: 20410906 > dev.em.1.mac_stats.tx_frames_512_1023: 117272600 > dev.em.1.mac_stats.tx_frames_1024_1522: 10167374093 > dev.em.1.mac_stats.tso_txd: 0 > dev.em.1.mac_stats.tso_ctx_fail: 0 > dev.em.1.interrupts.asserts: 19749 > dev.em.1.interrupts.rx_pkt_timer: 1 > dev.em.1.interrupts.rx_abs_timer: 0 > dev.em.1.interrupts.tx_pkt_timer: 0 > dev.em.1.interrupts.tx_abs_timer: 0 > dev.em.1.interrupts.tx_queue_empty: 0 > dev.em.1.interrupts.tx_queue_min_thresh: 0 > dev.em.1.interrupts.rx_desc_min_thresh: 0 > dev.em.1.interrupts.rx_overrun: 18 > hw.em.rx_hang_fixup: 0 > hw.em.eee_setting: 0 > hw.em.fc_setting: 0 > hw.em.rx_process_limit: 100 > hw.em.enable_msix: 1 > hw.em.sbp: 0 > hw.em.smart_pwr_down: 0 > hw.em.txd: 1024 > hw.em.rxd: 1024 > hw.em.rx_abs_int_delay: 66 > hw.em.tx_abs_int_delay: 66 > hw.em.rx_int_delay: 0 > hw.em.tx_int_delay: 66 > > Jason I have allocated more time to the problem and guess I can explain what your problem is. With MSIX disabled, the driver uses fast interrupt handler (em_irq_fast) which calls rx/tx task and then checks for link status change. This implies that rx/tx task is executed with every link state change. This is not efficient, as it is a waste of time to start transmission when link is down. However, it has the effect that after a temporary link loss (active->inactive->active), _start is executed and transmission continues normally. The value of link_toggles (3) clearly indicates that you had such a transition when the problem occured. With MSIX enabled, the link task (em_handle_link) does _not_ triggers _start when the link changes state from inactive to active (which it should). If if_snd quickly fills up during a temporary link loss, transmission is stopped forever and the driver never recovers from that state. The last patch should have reduced the frequency of the problem but it assumes every IFQ_ENQUEUE is followed by a if_start which is not a true assumption. If you are willing to test, I can prepare another patch for you to fix the issue in a different and more reliable way.