From owner-freebsd-net@FreeBSD.ORG Mon Oct 31 04:03:06 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A44E21065673 for ; Mon, 31 Oct 2011 04:03:06 +0000 (UTC) (envelope-from nitroboost@gmail.com) Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com [209.85.213.54]) by mx1.freebsd.org (Postfix) with ESMTP id 56D758FC17 for ; Mon, 31 Oct 2011 04:03:06 +0000 (UTC) Received: by ywt32 with SMTP id 32so7276145ywt.13 for ; Sun, 30 Oct 2011 21:03:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=3Xfj0aPsLJMitNnEXOOZCNHeXL7sULiErvsBfbmXO+w=; b=gE+SkZi5bavyuuOgc4r3SuOKWnnmIcx3mdFU9JkgTxWutt9/VSGu+/nRbEFkjvJgFE 0AHkrU1GoN/WiXExRcUnfsq67zXNVQ8GcSzIJLVHF/Cm76ojlPIpcCD1z0D5+SlFTQnP OVXvfjvzkQj0bkq7rTUb+Wyd/380dhPiYijPk= MIME-Version: 1.0 Received: by 10.182.17.103 with SMTP id n7mr2521686obd.68.1320033785354; Sun, 30 Oct 2011 21:03:05 -0700 (PDT) Received: by 10.182.35.193 with HTTP; Sun, 30 Oct 2011 21:03:05 -0700 (PDT) In-Reply-To: <4EAD116A.8090006@gmail.com> References: <4E8F157A.40702@sentex.net> <4E8F51D4.1060509@sentex.net> <4EA7E203.3020306@sepehrs.com> <4EA80818.3030504@sentex.net> <4EA80F88.4000400@hotplug.ru> <4EA82715.2000404@gmail.com> <4EA8FA40.7010504@hotplug.ru> <4EA91836.2040508@gmail.com> <4EA959EE.2070806@hotplug.ru> <4EAD116A.8090006@gmail.com> Date: Sun, 30 Oct 2011 21:03:05 -0700 Message-ID: From: Jason Wolfe To: Hooman Fazaeli Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: pyunyh@gmail.com, freebsd-net@freebsd.org, Jack Vogel , Emil Muratov Subject: Re: Intel 82574L interface wedging on em 7.1.9/7.2.3 when MSIX enabled X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Oct 2011 04:03:06 -0000 On Sun, Oct 30, 2011 at 1:57 AM, Hooman Fazaeli wrote: > > I finally managed to re-produce an affect similar to Jason's case. It > may not be the exact same issue, but it is a serious problem and must > be addressed. > > 1. Push out packet on em/igb with high rate. > 2. Disconnect cable and wait for a few seconds. "netstat -ind" shows that > Drops are increasing. > 3. Re-connect the cable. Both sides of like re-negotiate and the links > comes up. > 4. But ..., no packets is ever transmitted again and Drops still > increasing! > > This is because em/lem/igb and some other interfaces (i.e., bce) have > a check at the very beginning of their _start function > which checks link status and immediately returns if it is inactive. > This behavior causes if_snd to fills up in step 2 and when this happens, > IFQ_HANDOFF never calls if_start again, even when the link becomes > active again. > > A cable unplug is not necessary to trigger the issue. Any temporary > link loss (e.i., during re-negotiation) can potentially lead to > aforementioned problem. > > IMHO, this is not a driver issue and the real fix would be to change > IFQ_HANDOFF to call if_start when the queue is full. > > Jason, If you are interested, I can prepare a patch for you > to address this issue in if_em and see if it helps. > Hooman, Thanks for looking into this. I'd be happy to test any patch thrown my way, but keep in mind my issue is only tickled when MSI-X is enabled. My interfaces aren't bouncing, though it might be possible some unique path in the MSI-X code is causing a throughput hang akin to connectivity loss? Jack is the delta your speaking to the 7.2.4 code? I did manage to get the code from Intel compiled with a couple minutes of work, but haven't loaded it up yet as I didn't see anything that caught my untrained eye in the diffs. I'll wait until its ported over and would be happy to test if needed. Conveniently enough I just received another report from my test boxes with a pretty stock loader.conf. I had forgotten to remove the advanced options from the interfaces after I cycled them to pick up the fc_setting=0. Fixed that up just meow. hw.em.fc_setting="0" cc_cubic_load="YES" I bounced em0 because dropped packets incremented 368756 to 369124 and the interface is not incrementing packets out. 5:35PM up 2 days, 17:45, 0 users, load averages: 0.34, 0.45, 0.48 em0: flags=8843 metric 0 mtu 1500 options=219b ether X inet6 X%em0 prefixlen 64 scopeid 0x1 nd6 options=1 media: Ethernet autoselect (1000baseT ) status: active em1: flags=8843 metric 0 mtu 1500 options=219b ether X inet6 X%em1 prefixlen 64 scopeid 0x2 inet6 X prefixlen 64 autoconf nd6 options=3 media: Ethernet autoselect (1000baseT ) status: active ipfw0: flags=8801 metric 0 mtu 65536 lo0: flags=8049 metric 0 mtu 16384 options=3 inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff inet X.X.X.X netmask 0xffffffff nd6 options=3 lagg0: flags=8843 metric 0 mtu 1500 options=219b ether X inet X.X.X.X netmask 0xffffff00 broadcast X.X.X.X inet6 X%lagg0 prefixlen 64 scopeid 0x5 nd6 options=3 media: Ethernet autoselect status: active laggproto loadbalance laggport: em0 flags=4 laggport: em1 flags=4 interrupt total rate irq3: uart1 3456 0 cpu0: timer 473404250 2000 irq256: em0:rx 0 24614350 103 irq257: em0:tx 0 1220810972 5157 irq258: em0:link 1 0 irq259: em1:rx 0 1533295149 6477 irq260: em1:tx 0 1194032538 5044 irq261: em1:link 3272 0 irq262: mps0 189602667 801 cpu3: timer 473396089 2000 cpu1: timer 473396089 2000 cpu2: timer 473396081 2000 Total 6055954914 25585 32999/8476/41475 mbufs in use (current/cache/total) 4064/3398/7462/5872038 mbuf clusters in use (current/cache/total/max) 4064/800 mbuf+clusters out of packet secondary zone in use (current/cache) 24900/669/25569/2936019 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) 115977K/11591K/127568K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/0/0 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 61 requests for I/O initiated by sendfile 0 calls to protocol drain routines Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop em0 1500 00:25:90:2a:a2:d7 24946787 0 0 5734180355 0 0 369844 em0 1500 fe80:1::225:9 fe80:1::225:90ff: 0 - - 2 - - - em1 1500 00:25:90:2a:a2:d7 5220869518 15996 0 5429971995 0 0 37009 em1 1500 fe80:2::225:9 fe80:2::225:90ff: 0 - - 1 - - - em1 1500 2607:f4e8:310 2607:f4e8:310:12: 0 - - 0 - - - lagg0 1500 00:25:90:2a:a2:d7 5245767782 0 0 11162877037 406853 0 0 lagg0 1500 69.164.38.0/2 69.164.38.69 4776881809 - - 11164303625 - - - lagg0 1500 fe80:5::225:9 fe80:5::225:90ff: 0 - - 3 - - - kern.msgbuf: Oct 30 17:08:38 cds1019 kernel: ifa_add_loopback_route: insertion failed Oct 30 17:12:10 cds1019 kernel: ifa_add_loopback_route: insertion failed Oct 30 17:20:20 cds1019 last message repeated 3 times Oct 30 17:32:13 cds1019 last message repeated 4 times Oct 30 17:34:27 cds1019 kernel: ifa_add_loopback_route: insertion failed Oct 30 17:35:03 cds1019 kernel: Interface is RUNNING and INACTIVE Oct 30 17:35:03 cds1019 kernel: em0: hw tdh = 818, hw tdt = 818 Oct 30 17:35:03 cds1019 kernel: em0: hw rdh = 99, hw rdt = 98 Oct 30 17:35:03 cds1019 kernel: em0: Tx Queue Status = 0 Oct 30 17:35:03 cds1019 kernel: em0: TX descriptors avail = 1024 Oct 30 17:35:03 cds1019 kernel: em0: Tx Descriptors avail failure = 0 Oct 30 17:35:03 cds1019 kernel: em0: RX discarded packets = 0 Oct 30 17:35:03 cds1019 kernel: em0: RX Next to Check = 110 Oct 30 17:35:03 cds1019 kernel: em0: RX Next to Refresh = 109 Oct 30 17:35:03 cds1019 kernel: em0: Link state: active Oct 30 17:35:03 cds1019 kernel: Interface is RUNNING and INACTIVE Oct 30 17:35:03 cds1019 kernel: em1: hw tdh = 688, hw tdt = 688 Oct 30 17:35:03 cds1019 kernel: em1: hw rdh = 861, hw rdt = 860 Oct 30 17:35:03 cds1019 kernel: em1: Tx Queue Status = 1 Oct 30 17:35:03 cds1019 kernel: em1: TX descriptors avail = 1024 Oct 30 17:35:03 cds1019 kernel: em1: Tx Descriptors avail failure = 0 Oct 30 17:35:03 cds1019 kernel: em1: RX discarded packets = 0 Oct 30 17:35:03 cds1019 kernel: em1: RX Next to Check = 19 Oct 30 17:35:03 cds1019 kernel: em1: RX Next to Refresh = 85 Oct 30 17:35:03 cds1019 kernel: em1: Link state: active net.inet.ip.intr_queue_maxlen: 512 net.inet.ip.intr_queue_drops: 0 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.0.%driver: em dev.em.0.%location: slot=0 function=0 dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000 dev.em.0.%parent: pci1 dev.em.0.nvm: -1 dev.em.0.debug: -1 dev.em.0.rx_int_delay: 0 dev.em.0.tx_int_delay: 66 dev.em.0.rx_abs_int_delay: 66 dev.em.0.tx_abs_int_delay: 66 dev.em.0.rx_processing_limit: 100 dev.em.0.flow_control: 0 dev.em.0.eee_control: 0 dev.em.0.link_irq: 1 dev.em.0.mbuf_alloc_fail: 0 dev.em.0.cluster_alloc_fail: 0 dev.em.0.dropped: 0 dev.em.0.tx_dma_fail: 0 dev.em.0.rx_overruns: 0 dev.em.0.watchdog_timeouts: 0 dev.em.0.device_control: 1074790984 dev.em.0.rx_control: 67141634 dev.em.0.fc_high_water: 18432 dev.em.0.fc_low_water: 16932 dev.em.0.queue0.txd_head: 818 dev.em.0.queue0.txd_tail: 818 dev.em.0.queue0.tx_irq: 1220810942 dev.em.0.queue0.no_desc_avail: 0 dev.em.0.queue0.rxd_head: 314 dev.em.0.queue0.rxd_tail: 313 dev.em.0.queue0.rx_irq: 24614539 dev.em.0.mac_stats.excess_coll: 0 dev.em.0.mac_stats.single_coll: 0 dev.em.0.mac_stats.multiple_coll: 0 dev.em.0.mac_stats.late_coll: 0 dev.em.0.mac_stats.collision_count: 0 dev.em.0.mac_stats.symbol_errors: 0 dev.em.0.mac_stats.sequence_errors: 0 dev.em.0.mac_stats.defer_count: 0 dev.em.0.mac_stats.missed_packets: 0 dev.em.0.mac_stats.recv_no_buff: 0 dev.em.0.mac_stats.recv_undersize: 0 dev.em.0.mac_stats.recv_fragmented: 0 dev.em.0.mac_stats.recv_oversize: 0 dev.em.0.mac_stats.recv_jabber: 0 dev.em.0.mac_stats.recv_errs: 0 dev.em.0.mac_stats.crc_errs: 0 dev.em.0.mac_stats.alignment_errs: 0 dev.em.0.mac_stats.coll_ext_errs: 0 dev.em.0.mac_stats.xon_recvd: 0 dev.em.0.mac_stats.xon_txd: 0 dev.em.0.mac_stats.xoff_recvd: 0 dev.em.0.mac_stats.xoff_txd: 0 dev.em.0.mac_stats.total_pkts_recvd: 24955229 dev.em.0.mac_stats.good_pkts_recvd: 24955229 dev.em.0.mac_stats.bcast_pkts_recvd: 24945466 dev.em.0.mac_stats.mcast_pkts_recvd: 1377 dev.em.0.mac_stats.rx_frames_64: 24945502 dev.em.0.mac_stats.rx_frames_65_127: 1606 dev.em.0.mac_stats.rx_frames_128_255: 7963 dev.em.0.mac_stats.rx_frames_256_511: 157 dev.em.0.mac_stats.rx_frames_512_1023: 1 dev.em.0.mac_stats.rx_frames_1024_1522: 0 dev.em.0.mac_stats.good_octets_recvd: 1597800626 dev.em.0.mac_stats.good_octets_txd: 7770925595880 dev.em.0.mac_stats.total_pkts_txd: 5734191501 dev.em.0.mac_stats.good_pkts_txd: 5734191501 dev.em.0.mac_stats.bcast_pkts_txd: 4 dev.em.0.mac_stats.mcast_pkts_txd: 1584 dev.em.0.mac_stats.tx_frames_64: 25504167 dev.em.0.mac_stats.tx_frames_65_127: 485744914 dev.em.0.mac_stats.tx_frames_128_255: 5963478 dev.em.0.mac_stats.tx_frames_256_511: 9021090 dev.em.0.mac_stats.tx_frames_512_1023: 55803019 dev.em.0.mac_stats.tx_frames_1024_1522: 5152154833 dev.em.0.mac_stats.tso_txd: 0 dev.em.0.mac_stats.tso_ctx_fail: 0 dev.em.0.interrupts.asserts: 3 dev.em.0.interrupts.rx_pkt_timer: 0 dev.em.0.interrupts.rx_abs_timer: 0 dev.em.0.interrupts.tx_pkt_timer: 0 dev.em.0.interrupts.tx_abs_timer: 0 dev.em.0.interrupts.tx_queue_empty: 0 dev.em.0.interrupts.tx_queue_min_thresh: 0 dev.em.0.interrupts.rx_desc_min_thresh: 0 dev.em.0.interrupts.rx_overrun: 0 dev.em.1.%desc: Intel(R) PRO/1000 Network Connection 7.2.3 dev.em.1.%driver: em dev.em.1.%location: slot=0 function=0 dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x10d3 class=0x020000 dev.em.1.%parent: pci2 dev.em.1.nvm: -1 dev.em.1.debug: -1 dev.em.1.rx_int_delay: 0 dev.em.1.tx_int_delay: 66 dev.em.1.rx_abs_int_delay: 66 dev.em.1.tx_abs_int_delay: 66 dev.em.1.rx_processing_limit: 100 dev.em.1.flow_control: 0 dev.em.1.eee_control: 0 dev.em.1.link_irq: 3272 dev.em.1.mbuf_alloc_fail: 0 dev.em.1.cluster_alloc_fail: 0 dev.em.1.dropped: 0 dev.em.1.tx_dma_fail: 0 dev.em.1.rx_overruns: 0 dev.em.1.watchdog_timeouts: 0 dev.em.1.device_control: 1074790984 dev.em.1.rx_control: 67141634 dev.em.1.fc_high_water: 18432 dev.em.1.fc_low_water: 16932 dev.em.1.queue0.txd_head: 9 dev.em.1.queue0.txd_tail: 14 dev.em.1.queue0.tx_irq: 1194045687 dev.em.1.queue0.no_desc_avail: 0 dev.em.1.queue0.rxd_head: 317 dev.em.1.queue0.rxd_tail: 315 dev.em.1.queue0.rx_irq: 1528821076 dev.em.1.mac_stats.excess_coll: 0 dev.em.1.mac_stats.single_coll: 0 dev.em.1.mac_stats.multiple_coll: 0 dev.em.1.mac_stats.late_coll: 0 dev.em.1.mac_stats.collision_count: 0 dev.em.1.mac_stats.symbol_errors: 0 dev.em.1.mac_stats.sequence_errors: 0 dev.em.1.mac_stats.defer_count: 0 dev.em.1.mac_stats.missed_packets: 15996 dev.em.1.mac_stats.recv_no_buff: 1035 dev.em.1.mac_stats.recv_undersize: 0 dev.em.1.mac_stats.recv_fragmented: 0 dev.em.1.mac_stats.recv_oversize: 0 dev.em.1.mac_stats.recv_jabber: 0 dev.em.1.mac_stats.recv_errs: 0 dev.em.1.mac_stats.crc_errs: 0 dev.em.1.mac_stats.alignment_errs: 0 dev.em.1.mac_stats.coll_ext_errs: 0 dev.em.1.mac_stats.xon_recvd: 0 dev.em.1.mac_stats.xon_txd: 0 dev.em.1.mac_stats.xoff_recvd: 0 dev.em.1.mac_stats.xoff_txd: 0 dev.em.1.mac_stats.total_pkts_recvd: 5220907923 dev.em.1.mac_stats.good_pkts_recvd: 5220891927 dev.em.1.mac_stats.bcast_pkts_recvd: 24943828 dev.em.1.mac_stats.mcast_pkts_recvd: 1375 dev.em.1.mac_stats.rx_frames_64: 1897964107 dev.em.1.mac_stats.rx_frames_65_127: 2457591621 dev.em.1.mac_stats.rx_frames_128_255: 5971619 dev.em.1.mac_stats.rx_frames_256_511: 11235726 dev.em.1.mac_stats.rx_frames_512_1023: 32267871 dev.em.1.mac_stats.rx_frames_1024_1522: 815860983 dev.em.1.mac_stats.good_octets_recvd: 1562681996232 dev.em.1.mac_stats.good_octets_txd: 7334946827215 dev.em.1.mac_stats.total_pkts_txd: 5430018684 dev.em.1.mac_stats.good_pkts_txd: 5430018682 dev.em.1.mac_stats.bcast_pkts_txd: 940 dev.em.1.mac_stats.mcast_pkts_txd: 12 dev.em.1.mac_stats.tx_frames_64: 24843579 dev.em.1.mac_stats.tx_frames_65_127: 483343565 dev.em.1.mac_stats.tx_frames_128_255: 5819327 dev.em.1.mac_stats.tx_frames_256_511: 7998506 dev.em.1.mac_stats.tx_frames_512_1023: 44787728 dev.em.1.mac_stats.tx_frames_1024_1522: 4863225979 dev.em.1.mac_stats.tso_txd: 0 dev.em.1.mac_stats.tso_ctx_fail: 0 dev.em.1.interrupts.asserts: 2398 dev.em.1.interrupts.rx_pkt_timer: 1 dev.em.1.interrupts.rx_abs_timer: 0 dev.em.1.interrupts.tx_pkt_timer: 0 dev.em.1.interrupts.tx_abs_timer: 0 dev.em.1.interrupts.tx_queue_empty: 0 dev.em.1.interrupts.tx_queue_min_thresh: 0 dev.em.1.interrupts.rx_desc_min_thresh: 0 dev.em.1.interrupts.rx_overrun: 2 hw.em.eee_setting: 0 hw.em.fc_setting: 0 hw.em.rx_process_limit: 100 hw.em.enable_msix: 1 hw.em.sbp: 0 hw.em.smart_pwr_down: 0 hw.em.txd: 1024 hw.em.rxd: 1024 hw.em.rx_abs_int_delay: 66 hw.em.tx_abs_int_delay: 66 hw.em.rx_int_delay: 0 hw.em.tx_int_delay: 66 Jason