Date:      Thu, 10 Nov 2011 12:51:42 +0100
From:      Willem Jan Withagen <wjw@digiware.nl>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        "stable@freebsd.org" <stable@freebsd.org>, "Vogel, Jack" <jack.vogel@intel.com>
Subject:   Re: em0 watchdog timeout
Message-ID:  <4EBBBACE.3020900@digiware.nl>
In-Reply-To: <20111110095041.GA73812@icarus.home.lan>
References:  <4EBB97DF.3020803@digiware.nl> <20111110095041.GA73812@icarus.home.lan>

On 10-11-2011 10:50, Jeremy Chadwick wrote:
> On Thu, Nov 10, 2011 at 10:22:39AM +0100, Willem Jan Withagen wrote:
>> Still running this file server on ZFS, and every now and then em0
>> goes down, and is not revivable.... Nothing goes in or out the
>> box...
>>
>> Any suggestions as how to (help) fix this?
>
> CC'ing Jack Vogel of Intel.
>
> We need "pciconf -lvbc" output (-lv by itself isn't sufficient in this
> regard).

em0@pci0:0:25:0:        class=0x020000 card=0x10bd15d9 chip=0x10bd8086 rev=0x02 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
     class      = network
     subclass   = ethernet
     bar   [10] = type Memory, range 32, base 0xdf900000, size 131072, enabled
     bar   [14] = type Memory, range 32, base 0xdf924000, size 4096, enabled
     bar   [18] = type I/O Port, range 32, base 0x1820, size 32, enabled
     cap 01[c8] = powerspec 2  supports D0 D3  current D0
     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
     cap 13[e0] = PCI Advanced Features: FLR TP

dmidecode gives:
Handle 0x0001, DMI type 1, 27 bytes
System Information
         Manufacturer: Supermicro
         Product Name: C2SBX
         Version: 0123456789
         Serial Number: 0123456789
         UUID: 53D1A494-D663-A0E7-890B-003048DE97CD
         Wake-up Type: Power Switch
         SKU Number: Not Specified
         Family: Not Specified

> Also, please do "sysctl dev.em.0.debug=1", which will show nothing
> useful in the output, however "dmesg" shortly after should have a bunch
> of driver-level debugging information that should help (output starts
> with "Interface is ...".  Please provide that too.

The system has been rebooted, so currently nothing is seriously wrong.
But trying to switch it on does not seem to work?

# sysctl dev.em.0.debug=1
dev.em.0.debug: -1 -> -1
# sysctl -a | grep debug | grep em
dev.em.0.debug: -1

Or is it just supposed to dump this:

Nov 10 12:44:27 zfs kernel: Interface is RUNNING and INACTIVE
Nov 10 12:44:27 zfs kernel: em0: hw tdh = 965, hw tdt = 965
Nov 10 12:44:27 zfs kernel: em0: hw rdh = 586, hw rdt = 585
Nov 10 12:44:27 zfs kernel: em0: Tx Queue Status = 0
Nov 10 12:44:27 zfs kernel: em0: TX descriptors avail = 1024
Nov 10 12:44:27 zfs kernel: em0: Tx Descriptors avail failure = 0
Nov 10 12:44:27 zfs kernel: em0: RX discarded packets = 0
Nov 10 12:44:27 zfs kernel: em0: RX Next to Check = 586
Nov 10 12:44:27 zfs kernel: em0: RX Next to Refresh = 585
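
For reading the head/tail values in dumps like the one above, here is a
minimal sketch. It assumes a transmit ring of 1024 descriptors (inferred
from "TX descriptors avail = 1024", not checked against the driver
source) and that (tdt - tdh) modulo the ring size is the number of
descriptors queued but not yet picked up by the hardware. The function
name pending_tx is made up for illustration.

```shell
# pending_tx [RING_SIZE] < logfile
# Hypothetical helper: for every log line carrying "tdh = N, hw tdt = M",
# print both values and the number of descriptors between head and tail.
# RING_SIZE defaults to 1024, which is an assumption (see above).
pending_tx() {
    awk -v size="${1:-1024}" '
        /tdh = [0-9]+, hw tdt = [0-9]+/ {
            for (i = 1; i <= NF; i++) {
                # The value is two fields after the name; adding 0
                # drops the trailing comma from "187,".
                if ($i == "tdh") tdh = $(i + 2) + 0
                if ($i == "tdt") tdt = $(i + 2) + 0
            }
            # Adding size before the modulo handles tail wrap-around.
            printf "tdh=%d tdt=%d pending=%d\n", tdh, tdt, (tdt - tdh + size) % size
        }'
}
```

Something like "grep em0 /var/log/messages | pending_tx 1024" would then
summarise each watchdog event. A non-zero pending count at the moment of
a timeout would suggest the hardware stopped fetching descriptors, but
that interpretation is a guess on top of the assumptions above.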

I always tell everybody that they should go for Intel Ethernet
devices, because "they just work". And I'm still very much convinced of
this. So I'll be more than happy to do any debugging and/or testing
required. The only thing I cannot afford at the moment is to leave this
box in a disconnected state.

And note that this problem only rears its nasty head every few weeks...
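
Since the failure is weeks apart, one way to catch it without watching
the box would be a small poller. This is a sketch only, not something
that has been run on this machine: the function names, the log path and
the 60 second interval are all made up, and it relies on the
dev.em.0.watchdog_timeouts counter from the sysctl output quoted in
this thread and on writing 1 to dev.em.0.debug to trigger the driver's
dump into the kernel log, as requested earlier in the thread.

```shell
# Succeeds when the counter value CUR ($2) is larger than PREV ($1).
wd_increased() {
    [ "$2" -gt "$1" ]
}

# monitor_em0 [LOGFILE] [INTERVAL_SECONDS]
# Hypothetical poller: append a snapshot of em0 state to LOGFILE each
# time the watchdog counter grows. Path and interval are assumptions.
monitor_em0() {
    log=${1:-/var/log/em0-watchdog.log}
    interval=${2:-60}
    prev=$(sysctl -n dev.em.0.watchdog_timeouts) || return 1
    while sleep "$interval"; do
        cur=$(sysctl -n dev.em.0.watchdog_timeouts) || continue
        if wd_increased "$prev" "$cur"; then
            {
                date
                sysctl dev.em.0           # full counter set for the device
                sysctl dev.em.0.debug=1   # ask the driver for its debug dump
                dmesg | tail -n 20        # the dump lands in the kernel log
            } >> "$log" 2>&1
        fi
        # Track the latest value even when it did not grow, so a counter
        # reset (e.g. after a driver reload) does not mask later events.
        prev=$cur
    done
}
```

Run as root in the background, e.g. "monitor_em0 /var/log/em0-watchdog.log 30 &".
Writing to local disk matters here, because the network is exactly
what is gone when the event fires.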

--WjW

>
>> Nov 10 09:07:41 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:07:41 zfs kernel: em0: Queue(0) tdh = 187, hw tdt = 189
>> Nov 10 09:07:41 zfs kernel: em0: TX(0) desc avail = 1022,Next TX to Clean = 187
>> Nov 10 09:11:32 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:11:32 zfs kernel: em0: Queue(0) tdh = 139, hw tdt = 151
>> Nov 10 09:11:32 zfs kernel: em0: TX(0) desc avail = 1012,Next TX to Clean = 139
>> Nov 10 09:16:05 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:16:05 zfs kernel: em0: Queue(0) tdh = 152, hw tdt = 163
>> Nov 10 09:16:05 zfs kernel: em0: TX(0) desc avail = 1013,Next TX to Clean = 152
>> Nov 10 09:33:10 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:33:10 zfs kernel: em0: Queue(0) tdh = 161, hw tdt = 176
>> Nov 10 09:33:10 zfs kernel: em0: TX(0) desc avail = 1008,Next TX to Clean = 160
>> Nov 10 09:53:18 zfs kernel: em0: Watchdog timeout -- resetting
>> Nov 10 09:53:18 zfs kernel: em0: Queue(0) tdh = 157, hw tdt = 172
>> Nov 10 09:53:18 zfs kernel: em0: TX(0) desc avail = 1009,Next TX to Clean = 157
>>
>> Device is:
>> Nov 10 10:07:27 zfs kernel: em0:<Intel(R) PRO/1000 Network Connection 7.2.3>  port 0x1820-0x183f mem 0xdf900000-0xdf91ffff,0xdf924000-0xdf924fff irq 16 at device 25.0 on pci0
>> Nov 10 10:07:27 zfs kernel: em0: Using an MSI interrupt
>> Nov 10 10:07:27 zfs kernel: em0: [FILTER]
>>
>> pciconf -lv:
>> em0@pci0:0:25:0:        class=0x020000 card=0x10bd15d9
>> chip=0x10bd8086 rev=0x02 hdr=0x00
>>      vendor     = 'Intel Corporation'
>>      device     = 'Intel 82566DM Gigabit Ethernet Adapter (82566DM)'
>>      class      = network
>>      subclass   = ethernet
>>
>> uname:
>> 	8.2-STABLE FreeBSD 8.2-STABLE #12: Sun Oct  2 13:36:55 CEST 2011
>> 	amd64
>>
>> sysctl -a | grep em.0:
>> dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.2.3
>> dev.em.0.%driver: em
>> dev.em.0.%location: slot=25 function=0 handle=\_SB_.PCI0.LAN_
>> dev.em.0.%pnpinfo: vendor=0x8086 device=0x10bd subvendor=0x15d9
>> subdevice=0x10bd class=0x020000
>> dev.em.0.%parent: pci0
>> dev.em.0.nvm: -1
>> dev.em.0.debug: -1
>> dev.em.0.rx_int_delay: 0
>> dev.em.0.tx_int_delay: 66
>> dev.em.0.rx_abs_int_delay: 66
>> dev.em.0.tx_abs_int_delay: 66
>> dev.em.0.rx_processing_limit: 100
>> dev.em.0.flow_control: 3
>> dev.em.0.eee_control: 0
>> dev.em.0.link_irq: 0
>> dev.em.0.mbuf_alloc_fail: 0
>> dev.em.0.cluster_alloc_fail: 0
>> dev.em.0.dropped: 0
>> dev.em.0.tx_dma_fail: 0
>> dev.em.0.rx_overruns: 6
>> dev.em.0.watchdog_timeouts: 5
>> dev.em.0.device_control: 1074790976
>> dev.em.0.rx_control: 67141634
>> dev.em.0.fc_high_water: 8192
>> dev.em.0.fc_low_water: 6692
>> dev.em.0.queue0.txd_head: 78
>> dev.em.0.queue0.txd_tail: 78
>> dev.em.0.queue0.tx_irq: 0
>> dev.em.0.queue0.no_desc_avail: 0
>> dev.em.0.queue0.rxd_head: 376
>> dev.em.0.queue0.rxd_tail: 375
>> dev.em.0.queue0.rx_irq: 0
>> dev.em.0.mac_stats.excess_coll: 0
>> dev.em.0.mac_stats.single_coll: 0
>> dev.em.0.mac_stats.multiple_coll: 0
>> dev.em.0.mac_stats.late_coll: 0
>> dev.em.0.mac_stats.collision_count: 0
>> dev.em.0.mac_stats.symbol_errors: 0
>> dev.em.0.mac_stats.sequence_errors: 0
>> dev.em.0.mac_stats.defer_count: 0
>> dev.em.0.mac_stats.missed_packets: 9
>> dev.em.0.mac_stats.recv_no_buff: 0
>> dev.em.0.mac_stats.recv_undersize: 0
>> dev.em.0.mac_stats.recv_fragmented: 0
>> dev.em.0.mac_stats.recv_oversize: 0
>> dev.em.0.mac_stats.recv_jabber: 0
>> dev.em.0.mac_stats.recv_errs: 1
>> dev.em.0.mac_stats.crc_errs: 1
>> dev.em.0.mac_stats.alignment_errs: 0
>> dev.em.0.mac_stats.coll_ext_errs: 0
>> dev.em.0.mac_stats.xon_recvd: 0
>> dev.em.0.mac_stats.xon_txd: 0
>> dev.em.0.mac_stats.xoff_recvd: 0
>> dev.em.0.mac_stats.xoff_txd: 0
>> dev.em.0.mac_stats.total_pkts_recvd: 160062850
>> dev.em.0.mac_stats.good_pkts_recvd: 160062840
>> dev.em.0.mac_stats.bcast_pkts_recvd: 79648
>> dev.em.0.mac_stats.mcast_pkts_recvd: 10220
>> dev.em.0.mac_stats.rx_frames_64: 0
>> dev.em.0.mac_stats.rx_frames_65_127: 0
>> dev.em.0.mac_stats.rx_frames_128_255: 0
>> dev.em.0.mac_stats.rx_frames_256_511: 0
>> dev.em.0.mac_stats.rx_frames_512_1023: 0
>> dev.em.0.mac_stats.rx_frames_1024_1522: 0
>> dev.em.0.mac_stats.good_octets_recvd: 107143604749
>> dev.em.0.mac_stats.good_octets_txd: 129876768158
>> dev.em.0.mac_stats.total_pkts_txd: 179010567
>> dev.em.0.mac_stats.good_pkts_txd: 179010567
>> dev.em.0.mac_stats.bcast_pkts_txd: 14608
>> dev.em.0.mac_stats.mcast_pkts_txd: 206
>> dev.em.0.mac_stats.tx_frames_64: 0
>> dev.em.0.mac_stats.tx_frames_65_127: 0
>> dev.em.0.mac_stats.tx_frames_128_255: 0
>> dev.em.0.mac_stats.tx_frames_256_511: 0
>> dev.em.0.mac_stats.tx_frames_512_1023: 0
>> dev.em.0.mac_stats.tx_frames_1024_1522: 0
>> dev.em.0.mac_stats.tso_txd: 3691806
>> dev.em.0.mac_stats.tso_ctx_fail: 0
>> dev.em.0.interrupts.asserts: 130023913
>> dev.em.0.interrupts.rx_pkt_timer: 0
>> dev.em.0.interrupts.rx_abs_timer: 0
>> dev.em.0.interrupts.tx_pkt_timer: 0
>> dev.em.0.interrupts.tx_abs_timer: 0
>> dev.em.0.interrupts.tx_queue_empty: 0
>> dev.em.0.interrupts.tx_queue_min_thresh: 0
>> dev.em.0.interrupts.rx_desc_min_thresh: 0
>> dev.em.0.interrupts.rx_overrun: 0
>> dev.em.0.wake: 0
>



