From: Willem Jan Withagen
Date: Thu, 10 Nov 2011 12:51:42 +0100
To: Jeremy Chadwick
Cc: "stable@freebsd.org", "Vogel, Jack"
Subject: Re: em0 watchdog timeout
Message-ID: <4EBBBACE.3020900@digiware.nl>
In-Reply-To: <20111110095041.GA73812@icarus.home.lan>
References: <4EBB97DF.3020803@digiware.nl> <20111110095041.GA73812@icarus.home.lan>
List-Id: Production branch of FreeBSD source code

On 10-11-2011 10:50, Jeremy Chadwick wrote:
> On Thu, Nov 10,
> 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
>> Still running this file server on ZFS, and every now and then em0
>> goes down, and is not revivable.... Nothing goes in or out the
>> box...
>>
>> Any suggestions as how to (help) fix this?
>
> CC'ing Jack Vogel of Intel.
>
> We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
> regard).

em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xdf900000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xdf924000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

dmidecode gives:

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Supermicro
        Product Name: C2SBX
        Version: 0123456789
        Serial Number: 0123456789
        UUID: 53D1A494-D663-A0E7-890B-003048DE97CD
        Wake-up Type: Power Switch
        SKU Number: Not Specified
        Family: Not Specified

> Also, please do "sysctl dev.em.0.debug=1", which will show nothing
> useful in the output, however "dmesg" shortly after should have a bunch
> of driver-level debugging information that should help (output starts
> with "Interface is ...". Please provide that too.

The system has been rebooted, so currently nothing is in serious trouble.
But trying to switch it on does not seem to work?

# sysctl dev.em.0.debug=1
dev.em.0.debug: -1 -> -1
# sysctl -a | grep debug | grep em
dev.em.0.debug: -1

Or is it just meant to dump this:

Nov 10 12:44:27 zfs kernel: Interface is RUNNING and INACTIVE
Nov 10 12:44:27 zfs kernel: em0: hw tdh = 965, hw tdt = 965
Nov 10 12:44:27 zfs kernel: em0: hw rdh = 586, hw rdt = 585
Nov 10 12:44:27 zfs kernel: em0: Tx Queue Status = 0
Nov 10 12:44:27 zfs kernel: em0: TX descriptors avail = 1024
Nov 10 12:44:27 zfs kernel: em0: Tx Descriptors avail failure = 0
Nov 10 12:44:27 zfs kernel: em0: RX discarded packets = 0
Nov 10 12:44:27 zfs kernel: em0: RX Next to Check = 586
Nov 10 12:44:27 zfs kernel: em0: RX Next to Refresh = 585

I always tell everybody that they should go for Intel Ethernet devices,
because "they just work". And I'm still very much convinced of this.
So I'll be more than happy to do any debugging and/or testing required.

The only thing I cannot afford at the moment is to leave this box in a
disconnected state. And note that this problem only rears its nasty head
every few weeks...

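For anyone wanting to compare several of these dumps, here is a minimal sketch in Python that pulls the ring pointers out of the "hw tdh / hw tdt" and "hw rdh / hw rdt" lines shown above. It is not part of the em driver; the function name `ring_pointers` and the regular expression are assumptions made for illustration, based only on the line format in this dump.

```python
import re

# Matches e.g. "em0: hw tdh = 965, hw tdt = 965" (t = transmit, r = receive).
# The backreference \1 makes sure head and tail belong to the same ring.
RING_RE = re.compile(r"hw ([tr])dh = (\d+), hw \1dt = (\d+)")


def ring_pointers(log_text):
    """Return {'tx': (head, tail), 'rx': (head, tail)} for rings found in log_text."""
    rings = {}
    for match in RING_RE.finditer(log_text):
        kind = "tx" if match.group(1) == "t" else "rx"
        rings[kind] = (int(match.group(2)), int(match.group(3)))
    return rings


# Two lines copied from the debug dump in this message.
SAMPLE = """\
Nov 10 12:44:27 zfs kernel: em0: hw tdh = 965, hw tdt = 965
Nov 10 12:44:27 zfs kernel: em0: hw rdh = 586, hw rdt = 585
"""

if __name__ == "__main__":
    head, tail = ring_pointers(SAMPLE)["tx"]
    # Head equal to tail means no transmit descriptors are outstanding.
    print("TX ring idle" if head == tail else "TX ring has pending descriptors")
```

With the dump above, the transmit head and tail are both 965, which matches the "TX descriptors avail = 1024" line: nothing was queued at that moment.
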
--WjW

>
>> Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
>> Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187
>> Nov 10 09:11:32 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
>> Nov 10 09:11:32 zfs kernel: em0: TX(0) desc avail = 1012,Next TX to Clean = 139
>> Nov 10 09:16:05 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:16:05 zfs kernel: em0: Queue(0) tdh = 152, hw tdt = 163
>> Nov 10 09:16:05 zfs kernel: em0: TX(0) desc avail = 1013,Next TX to Clean = 152
>> Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176
>> Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160
>> Nov 10 09:53:18 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:53:18 zfs kernel: em0: Queue(0) tdh = 157, hw tdt = 172
>> Nov 10 09:53:18 zfs kernel: em0: TX(0) desc avail = 1009,Next TX to Clean = 157
>>
>> Device is:
>> Nov 10 10:07:27 zfs kernel: em0: port 0x1820-0x183f mem 0xdf900000-0xdf91ffff,0xdf924000-0xdf924fff irq 16 at device 25.0 on pci0
>> Nov 10 10:07:27 zfs kernel: em0: Using an MSI interrupt
>> Nov 10 10:07:27 zfs kernel: em0: [FILTER]
>>
>> pciconf -lv:
>> em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9
>> chip=0x10bd8086 rev=0x02 hdr=0x00
>> vendor = 'Intel Corporation'
>> device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
>> class = network
>> subclass = ethernet
>>
>> uname:
>> 8.2-STABLE FreeBSD 8.2-STABLE #12: Sun Oct 2 13:36:55 CEST 2011
>> amd64
>>
>> sysctl -a | grep em.0:
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
>> dev.em.0.%driver: em
>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.LAN_
>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10bd subvendor=0x15d9
>> subdevice=0x10bd class=0x020000
>> dev.em.0.%parent: pci0
>> dev.em.0.nvm: -1
>> dev.em.0.debug: -1
>> dev.em.0.rx_int_delay: 0
>> dev.em.0.tx_int_delay: 66
>> dev.em.0.rx_abs_int_delay: 66
>> dev.em.0.tx_abs_int_delay: 66
>> dev.em.0.rx_processing_limit: 100
>> dev.em.0.flow_control: 3
>> dev.em.0.eee_control: 0
>> dev.em.0.link_irq: 0
>> dev.em.0.mbuf_alloc_fail: 0
>> dev.em.0.cluster_alloc_fail: 0
>> dev.em.0.dropped: 0
>> dev.em.0.tx_dma_fail: 0
>> dev.em.0.rx_overruns: 6
>> dev.em.0.watchdog_timeouts: 5
>> dev.em.0.device_control: 1074790976
>> dev.em.0.rx_control: 67141634
>> dev.em.0.fc_high_water: 8192
>> dev.em.0.fc_low_water: 6692
>> dev.em.0.queue0.txd_head: 78
>> dev.em.0.queue0.txd_tail: 78
>> dev.em.0.queue0.tx_irq: 0
>> dev.em.0.queue0.no_desc_avail: 0
>> dev.em.0.queue0.rxd_head: 376
>> dev.em.0.queue0.rxd_tail: 375
>> dev.em.0.queue0.rx_irq: 0
>> dev.em.0.mac_stats.excess_coll: 0
>> dev.em.0.mac_stats.single_coll: 0
>> dev.em.0.mac_stats.multiple_coll: 0
>> dev.em.0.mac_stats.late_coll: 0
>> dev.em.0.mac_stats.collision_count: 0
>> dev.em.0.mac_stats.symbol_errors: 0
>> dev.em.0.mac_stats.sequence_errors: 0
>> dev.em.0.mac_stats.defer_count: 0
>> dev.em.0.mac_stats.missed_packets: 9
>> dev.em.0.mac_stats.recv_no_buff: 0
>> dev.em.0.mac_stats.recv_undersize: 0
>> dev.em.0.mac_stats.recv_fragmented: 0
>> dev.em.0.mac_stats.recv_oversize: 0
>> dev.em.0.mac_stats.recv_jabber: 0
>> dev.em.0.mac_stats.recv_errs: 1
>> dev.em.0.mac_stats.crc_errs: 1
>> dev.em.0.mac_stats.alignment_errs: 0
>> dev.em.0.mac_stats.coll_ext_errs: 0
>> dev.em.0.mac_stats.xon_recvd: 0
>> dev.em.0.mac_stats.xon_txd: 0
>> dev.em.0.mac_stats.xoff_recvd: 0
>> dev.em.0.mac_stats.xoff_txd: 0
>> dev.em.0.mac_stats.total_pkts_recvd: 160062850
>> dev.em.0.mac_stats.good_pkts_recvd: 160062840
>> dev.em.0.mac_stats.bcast_pkts_recvd: 79648
>> dev.em.0.mac_stats.mcast_pkts_recvd: 10220
>> dev.em.0.mac_stats.rx_frames_64: 0
>> dev.em.0.mac_stats.rx_frames_65_127: 0
>> dev.em.0.mac_stats.rx_frames_128_255: 0
>> dev.em.0.mac_stats.rx_frames_256_511: 0
>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>> dev.em.0.mac_stats.good_octets_recvd: 107143604749
>> dev.em.0.mac_stats.good_octets_txd: 129876768158
>> dev.em.0.mac_stats.total_pkts_txd: 179010567
>> dev.em.0.mac_stats.good_pkts_txd: 179010567
>> dev.em.0.mac_stats.bcast_pkts_txd: 14608
>> dev.em.0.mac_stats.mcast_pkts_txd: 206
>> dev.em.0.mac_stats.tx_frames_64: 0
>> dev.em.0.mac_stats.tx_frames_65_127: 0
>> dev.em.0.mac_stats.tx_frames_128_255: 0
>> dev.em.0.mac_stats.tx_frames_256_511: 0
>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>> dev.em.0.mac_stats.tso_txd: 3691806
>> dev.em.0.mac_stats.tso_ctx_fail: 0
>> dev.em.0.interrupts.asserts: 130023913
>> dev.em.0.interrupts.rx_pkt_timer: 0
>> dev.em.0.interrupts.rx_abs_timer: 0
>> dev.em.0.interrupts.tx_pkt_timer: 0
>> dev.em.0.interrupts.tx_abs_timer: 0
>> dev.em.0.interrupts.tx_queue_empty: 0
>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>> dev.em.0.interrupts.rx_overrun: 0
>> dev.em.0.wake: 0
>
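
The watchdog reports quoted above can be summarized the same way. The following is a rough Python sketch, not anything taken from the driver: it reads the "Queue(n) tdh = ..., hw tdt = ..." lines and computes how many transmit descriptors sat between head and tail when the watchdog fired. The ring size of 1024 is an assumption inferred from the "TX descriptors avail = 1024" line in the idle debug dump, and the names `pending_descriptors` and `RING_SIZE` are made up for illustration.

```python
import re

# Assumption: 1024 TX descriptors, inferred from "TX descriptors avail = 1024".
RING_SIZE = 1024

# Matches e.g. "em0: Queue(0) tdh = 187, hw tdt = 189" from the watchdog report.
WATCHDOG_RE = re.compile(r"Queue\((\d+)\) tdh = (\d+), hw tdt = (\d+)")


def pending_descriptors(log_text, ring_size=RING_SIZE):
    """Return a list of (queue, head, tail, pending) tuples, one per report."""
    reports = []
    for match in WATCHDOG_RE.finditer(log_text):
        queue, head, tail = (int(group) for group in match.groups())
        # The ring wraps around, so take the distance modulo the ring size.
        reports.append((queue, head, tail, (tail - head) % ring_size))
    return reports


# Two lines copied from the quoted watchdog messages.
SAMPLE = """\
Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
"""

if __name__ == "__main__":
    for queue, head, tail, pending in pending_descriptors(SAMPLE):
        print(f"queue {queue}: head={head} tail={tail} pending={pending}")
```

For the five timeouts quoted above this gives between 2 and 15 pending descriptors, so each time the hardware head had stopped short of the tail, which is consistent with the "desc avail" values in the same reports.
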