From owner-freebsd-net@FreeBSD.ORG Thu Mar 31 13:51:02 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3D053106564A for ; Thu, 31 Mar 2011 13:51:02 +0000 (UTC) (envelope-from lacombar@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id EF30D8FC23 for ; Thu, 31 Mar 2011 13:51:01 +0000 (UTC) Received: by yie12 with SMTP id 12so1130783yie.13 for ; Thu, 31 Mar 2011 06:51:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to:cc :content-type; bh=X6/KdAkdds5B5SYumr54RS6nOrTuCYENRfu1eRrrwkg=; b=ZGut0t72s9uJSNcdOwnKgyAxHR4asARi8GfbCjYTxxsohrLX8/P7+DG2uNplySuBmm W/uvCcKCpGDdArSNfG0uLXTX6JSq4uCY2djbO3vgL3EzZNeX4iZAzJ5E1wHvY8q1/8tp RY55Znbdx/eGXmvZaemRSOvyR5MapYVcThjso= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:cc:content-type; b=RCGYM8CE86n4bynWRQhNL8MIOSImb0FTgJlQHj3LK/6poi4tPBXqlMOIoHPN7lP9lT VHkxq8xV5yfPoARFiy5UnfVp7jsZBK77/pyqxK47sqhlwSr4Gzsdzjm4cU04rh1uOPW8 ANzqcEhG/n78gLEFkuDDu1TVkTJJJJBsiNe8Y= MIME-Version: 1.0 Received: by 10.43.63.72 with SMTP id xd8mr2901691icb.215.1301579460835; Thu, 31 Mar 2011 06:51:00 -0700 (PDT) Received: by 10.42.146.72 with HTTP; Thu, 31 Mar 2011 06:51:00 -0700 (PDT) Date: Thu, 31 Mar 2011 09:51:00 -0400 Message-ID: From: Arnaud Lacombe To: Jack Vogel Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org Subject: em(4) hang [Was: Re: igb(4) won't start with "igb0: Could not setup receive structures"] X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 31 Mar 2011 13:51:02 -0000 Hi [let's start a new thread :)] On Wed, Mar 30, 2011 at 2:22 PM, Jack Vogel wrote: > Read the code in HEAD, em_local_timer() has a test of ALL the rx queues and > will schedule a task that refreshes mbufs if they are empty. This has > exactly the > same effect as checking for some interrupt cause, a cause that is not > available > when using MSIX on 82574, but this approach works for everything. > ok, it took me a long time to reproduce the issue with em(4) version 7.1.9, about 3h rather than a few minutes a month ago and only got ~875 allocations failure vs. several thousand before, here are some stats: # sysctl -a | grep missed dev.em.0.mac_stats.missed_packets: 1917112 dev.em.1.mac_stats.missed_packets: 0 dev.em.2.mac_stats.missed_packets: 0 dev.em.3.mac_stats.missed_packets: 0 dev.em.4.mac_stats.missed_packets: 0 dev.em.5.mac_stats.missed_packets: 0 # sysctl dev.em.0.debug=1 dev.em.0.debug: I-1nterface is RUNNING and INACTIVE em0: hw tdh = 861, hw tdt = 861 em0: hw rdh = 929, hw rdt = 929 em0: Tx Queue Status = 0 em0: TX descriptors avail = 1024 em0: Tx Descriptors avail failure = 0 em0: RX discarded packets = 0 em0: RX Next to Check = 929 em0: RX Next to Refresh = 930 -> -1 I backported the -current driver to 7.1 and re-ran the test overnight. Now, the box is running 7.2.2. The box was hung this morning: dev.em.0.mac_stats.missed_packets: 25513991 dev.em.1.mac_stats.missed_packets: 0 dev.em.2.mac_stats.missed_packets: 0 dev.em.3.mac_stats.missed_packets: 0 dev.em.4.mac_stats.missed_packets: 0 dev.em.5.mac_stats.missed_packets: 0 There has been about 1000 mbuf allocation denial. I changed some relevant field of the RX soft stat in the sysctl output of the device [Of course, the only field of interest, `next_to_check' is invalid because of a typo... I should not change code past a certain hour :)], here it is: # sysctl dev.em.0 dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.2 dev.em.0.%driver: em dev.em.0.%location: slot=0 function=0 dev.em.0.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x8086 subdevice=0x0000 class=0x020000 dev.em.0.%parent: pci1 dev.em.0.nvm: -1 dev.em.0.debug: -1 dev.em.0.rx_int_delay: 0 dev.em.0.tx_int_delay: 66 dev.em.0.rx_abs_int_delay: 66 dev.em.0.tx_abs_int_delay: 66 dev.em.0.rx_processing_limit: 100 dev.em.0.flow_control: 3 dev.em.0.eee_control: 0 dev.em.0.link_irq: 11621474 dev.em.0.mbuf_alloc_fail: 0 dev.em.0.cluster_alloc_fail: 0 dev.em.0.dropped: 0 dev.em.0.tx_dma_fail: 0 dev.em.0.rx_overruns: 0 dev.em.0.watchdog_timeouts: 0 dev.em.0.device_control: 1477444168 dev.em.0.rx_control: 67141634 dev.em.0.fc_high_water: 18432 dev.em.0.fc_low_water: 16932 dev.em.0.queue0.txd_head: 904 dev.em.0.queue0.txd_tail: 904 dev.em.0.queue0.tx_irq: 10291170 dev.em.0.queue0.no_desc_avail: 0 dev.em.0.queue0.rxd_head: 766 dev.em.0.queue0.rxd_tail: 767 dev.em.0.queue0.rx_irq: 6937760 dev.em.0.queue0.rx_discarded: 0 dev.em.0.queue0.rx_forced_refill: 0 dev.em.0.queue0.next_to_check: 6937760 ^^^ this field is invalid... bad code... :( dev.em.0.queue0.next_to_refresh: 767 dev.em.0.mac_stats.excess_coll: 0 dev.em.0.mac_stats.single_coll: 0 dev.em.0.mac_stats.multiple_coll: 0 dev.em.0.mac_stats.late_coll: 0 dev.em.0.mac_stats.collision_count: 0 dev.em.0.mac_stats.symbol_errors: 0 dev.em.0.mac_stats.sequence_errors: 0 dev.em.0.mac_stats.defer_count: 0 dev.em.0.mac_stats.missed_packets: 25752895 dev.em.0.mac_stats.recv_no_buff: 3 dev.em.0.mac_stats.recv_undersize: 0 dev.em.0.mac_stats.recv_fragmented: 0 dev.em.0.mac_stats.recv_oversize: 0 dev.em.0.mac_stats.recv_jabber: 0 dev.em.0.mac_stats.recv_errs: 0 dev.em.0.mac_stats.crc_errs: 0 dev.em.0.mac_stats.alignment_errs: 0 dev.em.0.mac_stats.coll_ext_errs: 0 dev.em.0.mac_stats.xon_recvd: 0 dev.em.0.mac_stats.xon_txd: 0 dev.em.0.mac_stats.xoff_recvd: 0 dev.em.0.mac_stats.xoff_txd: 25752073 dev.em.0.mac_stats.total_pkts_recvd: 39996734 dev.em.0.mac_stats.good_pkts_recvd: 14243839 dev.em.0.mac_stats.bcast_pkts_recvd: 5 dev.em.0.mac_stats.mcast_pkts_recvd: 0 dev.em.0.mac_stats.rx_frames_64: 13878627 dev.em.0.mac_stats.rx_frames_65_127: 365212 dev.em.0.mac_stats.rx_frames_128_255: 0 dev.em.0.mac_stats.rx_frames_256_511: 0 dev.em.0.mac_stats.rx_frames_512_1023: 0 dev.em.0.mac_stats.rx_frames_1024_1522: 0 dev.em.0.mac_stats.good_octets_recvd: 916346006 dev.em.0.mac_stats.good_octets_txd: 21377046229 dev.em.0.mac_stats.total_pkts_txd: 44415008 dev.em.0.mac_stats.good_pkts_txd: 18661905 dev.em.0.mac_stats.bcast_pkts_txd: 24822815 dev.em.0.mac_stats.mcast_pkts_txd: 0 dev.em.0.mac_stats.tx_frames_64: 1278447 dev.em.0.mac_stats.tx_frames_65_127: 1221602 dev.em.0.mac_stats.tx_frames_128_255: 503121 dev.em.0.mac_stats.tx_frames_256_511: 770073 dev.em.0.mac_stats.tx_frames_512_1023: 1921953 dev.em.0.mac_stats.tx_frames_1024_1522: 12966709 dev.em.0.mac_stats.tso_txd: 0 dev.em.0.mac_stats.tso_ctx_fail: 0 dev.em.0.interrupts.asserts: 5297765 dev.em.0.interrupts.rx_pkt_timer: 0 dev.em.0.interrupts.rx_abs_timer: 0 dev.em.0.interrupts.tx_pkt_timer: 1 dev.em.0.interrupts.tx_abs_timer: 1 dev.em.0.interrupts.tx_queue_empty: 0 dev.em.0.interrupts.tx_queue_min_thresh: 0 dev.em.0.interrupts.rx_desc_min_thresh: 0 dev.em.0.interrupts.rx_overrun: 3186 I added `rx_forced_refill', which account the number of time the RX taskqueue is scheduled from em_local_timer(). Now, if I read the patched em_local_timer() correctly, the relevant code being: /* trigger tq to refill rx ring queue if it is empty */ for (int i = 0; i < adapter->num_queues; i++, rxr++) { if (rxr->next_to_check == rxr->next_to_refresh) { rxr->rx_forced_refill++; taskqueue_enqueue(rxr->tq, &rxr->rx_task); } } and assuming that we have the same pattern between `next_to_check' and `next_to_refresh' as we did with 7.1.9, it is understandable the driver hang. I'll remove part of the changes I made to keep only `rx_forced_refill' and the associated sysctl, re-run the tests and come back with correct value, hopefully in a few hours. - Arnaud