Date: Thu, 10 Nov 2011 12:51:42 +0100
From: Willem Jan Withagen <wjw@digiware.nl>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: "stable@freebsd.org" <stable@freebsd.org>, "Vogel, Jack" <jack.vogel@intel.com>
Subject: Re: em0 watchdog timeout
Message-ID: <4EBBBACE.3020900@digiware.nl>
In-Reply-To: <20111110095041.GA73812@icarus.home.lan>
References: <4EBB97DF.3020803@digiware.nl> <20111110095041.GA73812@icarus.home.lan>

On 10-11-2011 10:50, Jeremy Chadwick wrote:
> On Thu, Nov 10, 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
>> Still running this file server on ZFS, and every now and then em0
>> goes down, and is not revivable.... Nothing goes in or out the
>> box...
>>
>> Any suggestions as how to (help) fix this?
>
> CC'ing Jack Vogel of Intel.
>
> We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
> regard).

em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xdf900000, size 131072, enabled
    bar [14] = type Memory, range 32, base 0xdf924000, size 4096, enabled
    bar [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

dmidecode gives:
Handle 0x0001, DMI type 1, 27 bytes
System Information
    Manufacturer: Supermicro
    Product Name: C2SBX
    Version: 0123456789
    Serial Number: 0123456789
    UUID: 53D1A494-D663-A0E7-890B-003048DE97CD
    Wake-up Type: Power Switch
    SKU Number: Not Specified
    Family: Not Specified

> Also, please do "sysctl dev.em.0.debug=1", which will show nothing
> useful in the output, however "dmesg" shortly after should have a bunch
> of driver-level debugging information that should help (output starts
> with "Interface is ...". Please provide that too.

The system has been rebooted, so currently nothing is seriously in
trouble. But trying to switch it on does not seem to work?

# sysctl dev.em.0.debug=1
dev.em.0.debug: -1 -> -1
# sysctl -a | grep debug | grep em
dev.em.0.debug: -1

Or is it just to dump this:
Nov 10 12:44:27 zfs kernel: Interface is RUNNING and INACTIVE
Nov 10 12:44:27 zfs kernel: em0: hw tdh = 965, hw tdt = 965
Nov 10 12:44:27 zfs kernel: em0: hw rdh = 586, hw rdt = 585
Nov 10 12:44:27 zfs kernel: em0: Tx Queue Status = 0
Nov 10 12:44:27 zfs kernel: em0: TX descriptors avail = 1024
Nov 10 12:44:27 zfs kernel: em0: Tx Descriptors avail failure = 0
Nov 10 12:44:27 zfs kernel: em0: RX discarded packets = 0
Nov 10 12:44:27 zfs kernel: em0: RX Next to Check = 586
Nov 10 12:44:27 zfs kernel: em0: RX Next to Refresh = 585

I always tell everybody that they should go for Intel ethernet
devices, because "they just work". And I'm still very much convinced of
this. So I'll be more than happy to do any debugging and/or testing
required. The only thing I cannot afford at the moment is to leave this
box in a disconnected state. And note that this problem only rears its
nasty head every few weeks...

--WjW

>
>> Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
>> Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187
>> Nov 10 09:11:32 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
>> Nov 10 09:11:32 zfs kernel: em0: TX(0) desc avail = 1012,Next TX to Clean = 139
>> Nov 10 09:16:05 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:16:05 zfs kernel: em0: Queue(0) tdh = 152, hw tdt = 163
>> Nov 10 09:16:05 zfs kernel: em0: TX(0) desc avail = 1013,Next TX to Clean = 152
>> Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176
>> Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160
>> Nov 10 09:53:18 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:53:18 zfs kernel: em0: Queue(0) tdh = 157, hw tdt = 172
>> Nov 10 09:53:18 zfs kernel: em0: TX(0) desc avail = 1009,Next TX to Clean = 157
>>
>> Device is:
>> Nov 10 10:07:27 zfs kernel: em0:<Intel(R) PRO/1000 Network Connection 7.2.3> port 0x1820-0x183f mem 0xdf900000-0xdf91ffff,0xdf924000-0xdf924fff irq 16 at device 25.0 on pci0
>> Nov 10 10:07:27 zfs kernel: em0: Using an MSI interrupt
>> Nov 10 10:07:27 zfs kernel: em0: [FILTER]
>>
>> pciconf -lv:
>> em0@pci0:0:25:0: class=0x020000 card=0x10bd15d9
>> chip=0x10bd8086 rev=0x02 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
>>     class = network
>>     subclass = ethernet
>>
>> uname:
>> 8.2-STABLE FreeBSD 8.2-STABLE #12: Sun Oct 2 13:36:55 CEST 2011
>> amd64
>>
>> sysctl -a | grep em.0:
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
>> dev.em.0.%driver: em
>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.LAN_
>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10bd subvendor=0x15d9
>> subdevice=0x10bd class=0x020000
>> dev.em.0.%parent: pci0
>> dev.em.0.nvm: -1
>> dev.em.0.debug: -1
>> dev.em.0.rx_int_delay: 0
>> dev.em.0.tx_int_delay: 66
>> dev.em.0.rx_abs_int_delay: 66
>> dev.em.0.tx_abs_int_delay: 66
>> dev.em.0.rx_processing_limit: 100
>> dev.em.0.flow_control: 3
>> dev.em.0.eee_control: 0
>> dev.em.0.link_irq: 0
>> dev.em.0.mbuf_alloc_fail: 0
>> dev.em.0.cluster_alloc_fail: 0
>> dev.em.0.dropped: 0
>> dev.em.0.tx_dma_fail: 0
>> dev.em.0.rx_overruns: 6
>> dev.em.0.watchdog_timeouts: 5
>> dev.em.0.device_control: 1074790976
>> dev.em.0.rx_control: 67141634
>> dev.em.0.fc_high_water: 8192
>> dev.em.0.fc_low_water: 6692
>> dev.em.0.queue0.txd_head: 78
>> dev.em.0.queue0.txd_tail: 78
>> dev.em.0.queue0.tx_irq: 0
>> dev.em.0.queue0.no_desc_avail: 0
>> dev.em.0.queue0.rxd_head: 376
>> dev.em.0.queue0.rxd_tail: 375
>> dev.em.0.queue0.rx_irq: 0
>> dev.em.0.mac_stats.excess_coll: 0
>> dev.em.0.mac_stats.single_coll: 0
>> dev.em.0.mac_stats.multiple_coll: 0
>> dev.em.0.mac_stats.late_coll: 0
>> dev.em.0.mac_stats.collision_count: 0
>> dev.em.0.mac_stats.symbol_errors: 0
>> dev.em.0.mac_stats.sequence_errors: 0
>> dev.em.0.mac_stats.defer_count: 0
>> dev.em.0.mac_stats.missed_packets: 9
>> dev.em.0.mac_stats.recv_no_buff: 0
>> dev.em.0.mac_stats.recv_undersize: 0
>> dev.em.0.mac_stats.recv_fragmented: 0
>> dev.em.0.mac_stats.recv_oversize: 0
>> dev.em.0.mac_stats.recv_jabber: 0
>> dev.em.0.mac_stats.recv_errs: 1
>> dev.em.0.mac_stats.crc_errs: 1
>> dev.em.0.mac_stats.alignment_errs: 0
>> dev.em.0.mac_stats.coll_ext_errs: 0
>> dev.em.0.mac_stats.xon_recvd: 0
>> dev.em.0.mac_stats.xon_txd: 0
>> dev.em.0.mac_stats.xoff_recvd: 0
>> dev.em.0.mac_stats.xoff_txd: 0
>> dev.em.0.mac_stats.total_pkts_recvd: 160062850
>> dev.em.0.mac_stats.good_pkts_recvd: 160062840
>> dev.em.0.mac_stats.bcast_pkts_recvd: 79648
>> dev.em.0.mac_stats.mcast_pkts_recvd: 10220
>> dev.em.0.mac_stats.rx_frames_64: 0
>> dev.em.0.mac_stats.rx_frames_65_127: 0
>> dev.em.0.mac_stats.rx_frames_128_255: 0
>> dev.em.0.mac_stats.rx_frames_256_511: 0
>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>> dev.em.0.mac_stats.good_octets_recvd: 107143604749
>> dev.em.0.mac_stats.good_octets_txd: 129876768158
>> dev.em.0.mac_stats.total_pkts_txd: 179010567
>> dev.em.0.mac_stats.good_pkts_txd: 179010567
>> dev.em.0.mac_stats.bcast_pkts_txd: 14608
>> dev.em.0.mac_stats.mcast_pkts_txd: 206
>> dev.em.0.mac_stats.tx_frames_64: 0
>> dev.em.0.mac_stats.tx_frames_65_127: 0
>> dev.em.0.mac_stats.tx_frames_128_255: 0
>> dev.em.0.mac_stats.tx_frames_256_511: 0
>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>> dev.em.0.mac_stats.tso_txd: 3691806
>> dev.em.0.mac_stats.tso_ctx_fail: 0
>> dev.em.0.interrupts.asserts: 130023913
>> dev.em.0.interrupts.rx_pkt_timer: 0
>> dev.em.0.interrupts.rx_abs_timer: 0
>> dev.em.0.interrupts.tx_pkt_timer: 0
>> dev.em.0.interrupts.tx_abs_timer: 0
>> dev.em.0.interrupts.tx_queue_empty: 0
>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>> dev.em.0.interrupts.rx_overrun: 0
>> dev.em.0.wake: 0
>
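
Since the hang only shows up every few weeks, it may help to pull the watchdog events out of the logs and compare them over time. The following is only a rough sketch in Python, not anything the em driver or FreeBSD ships. The function name parse_watchdog_events and the RING_SIZE constant are made up for illustration, and the ring size of 1024 is an assumption taken from the "TX descriptors avail = 1024" line in the debug dump. It pairs each "Queue(0)" line with the "TX(0)" line that follows it and computes how many TX descriptors were still outstanding, measured from "Next TX to Clean" to the tail (tdt). In the five events quoted above, that count plus "desc avail" adds up to 1024 each time.

```python
import re

# Patterns for the two detail lines logged with each watchdog timeout in this thread.
QUEUE_RE = re.compile(r"Queue\((\d+)\) tdh = (\d+), hw tdt = (\d+)")
AVAIL_RE = re.compile(r"TX\((\d+)\) desc avail = (\d+),\s*Next TX to Clean = (\d+)")

# Assumption: ring size taken from "TX descriptors avail = 1024" in the debug dump.
RING_SIZE = 1024

# Two of the events from the log above, copied verbatim.
SAMPLE = [
    "Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting",
    "Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189",
    "Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187",
    "Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting",
    "Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176",
    "Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160",
]


def parse_watchdog_events(lines):
    """Return one dict per watchdog event with head, tail and pending counts."""
    events = []
    current = None
    for line in lines:
        m = QUEUE_RE.search(line)
        if m:
            # Start of a new event: remember hardware head and tail.
            current = {"tdh": int(m.group(2)), "tdt": int(m.group(3))}
            continue
        m = AVAIL_RE.search(line)
        if m and current is not None:
            current["avail"] = int(m.group(2))
            current["next_to_clean"] = int(m.group(3))
            # Descriptors queued but not yet cleaned, modulo the ring size.
            current["pending"] = (current["tdt"] - current["next_to_clean"]) % RING_SIZE
            events.append(current)
            current = None
    return events


if __name__ == "__main__":
    for ev in parse_watchdog_events(SAMPLE):
        print(ev)
```

Feeding it the output of "grep em0 /var/log/messages" instead of SAMPLE would give one record per timeout, assuming the log lines keep the format shown in this message.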