Date:      Thu, 08 Jan 2015 11:05:17 +0100
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: igb(4) watchdog timeout, lagg(4) fails
Message-ID:  <54AE565D.50208@omnilan.de>
In-Reply-To: <54ACC6A2.1050400@omnilan.de>
References:  <54ACC6A2.1050400@omnilan.de>


Regarding Harry Schmalzbauer's message of 07.01.2015 06:39 (localtime):
>  Hello,
>
> recently I upgraded one server from 9.1 to 10.1. There are two 82576
> (one port of two Intel ET Dual-Port GbE [kawela]), driven by igb(4).
> I've never seen any watchdog timeout with FreeBSD-9.1 but suddenly (with
> 10-stable) I see:
> igb0: Watchdog timeout -- resetting
> igb0: Queue(0) tdh = 2974, hw tdt = 2973
> igb0: TX(0) desc avail = 0,Next TX to Clean = 0
>
> My biggest problem is that lagg(4) doesn't detect the problem with
> igb0. It's configured with 'lagghash l2', and most connections were
> interrupted until I manually did 'ifconfig igb0 down'. Then lagg did its
> job and connectivity was restored via the remaining igb1.
>
> Is there a way to auto-if-down an interface which suffers from watchdog
> timeouts? And any way to really reset it without rebooting the machine?

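A minimal sketch of the "auto-if-down" idea from the quoted question, not a tested fix. It assumes that dev.igb.N.watchdog_timeouts (which appears in the sysctl output further down) increases on every watchdog event, and that taking the port down is enough for lagg to fail over, as it was when done by hand. The function names and the polling interval are made up for illustration.

```python
import subprocess
import time


def read_watchdog_count(unit: int) -> int:
    """Return dev.igb.<unit>.watchdog_timeouts as reported by sysctl(8)."""
    out = subprocess.run(
        ["sysctl", "-n", f"dev.igb.{unit}.watchdog_timeouts"],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def watchdog_fired(previous: int, current: int) -> bool:
    """True when the counter has grown since the last poll."""
    return current > previous


def monitor(unit: int, poll_seconds: float = 5.0) -> None:
    """Poll the counter and take the port down on the first new timeout."""
    last = read_watchdog_count(unit)
    while True:
        time.sleep(poll_seconds)
        now = read_watchdog_count(unit)
        if watchdog_fired(last, now):
            # Same manual step as in the report: ifconfig igb0 down
            subprocess.run(["ifconfig", f"igb{unit}", "down"], check=True)
            return
        last = now
```

Running monitor(0) as root would watch igb0. Whether the counter is bumped early enough to be useful here is an open question.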

The igb watchdog timeout happened again :-( ~48 hours after the last one, with
very moderate-to-low average traffic.

This time I could fetch the dev.igb sysctls before igb0 was reset by the watchdog.
It's showing strange IRQ load:

dev.igb.%parent:
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.PE60.S1F0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x8086
subdevice=0xa03c class=0x020000
dev.igb.0.%parent: pci7
dev.igb.0.nvm: -1
dev.igb.0.enable_aim: 1
dev.igb.0.fc: 3
dev.igb.0.rx_processing_limit: 100
dev.igb.0.link_irq: 5
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.watchdog_timeouts: 1
dev.igb.0.device_control: 1488978497
dev.igb.0.rx_control: 67272738
dev.igb.0.interrupt_mask: 4
dev.igb.0.extended_int_mask: 2147483679
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 47488
dev.igb.0.fc_low_water: 47472
dev.igb.0.queue0.interrupt_rate: 8000
dev.igb.0.queue0.txd_head: 0
dev.igb.0.queue0.txd_tail: 468
dev.igb.0.queue0.no_desc_avail: 41
dev.igb.0.queue0.tx_packets: 90807
dev.igb.0.queue0.rxd_head: 0
dev.igb.0.queue0.rxd_tail: 4095
dev.igb.0.queue0.rx_packets: 443307
dev.igb.0.queue0.rx_bytes: 0
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.queue1.interrupt_rate: 8000
dev.igb.0.queue1.txd_head: 0
dev.igb.0.queue1.txd_tail: 221
dev.igb.0.queue1.no_desc_avail: 0
dev.igb.0.queue1.tx_packets: 300702
dev.igb.0.queue1.rxd_head: 0
dev.igb.0.queue1.rxd_tail: 4095
dev.igb.0.queue1.rx_packets: 734853
dev.igb.0.queue1.rx_bytes: 0
dev.igb.0.queue1.lro_queued: 0
dev.igb.0.queue1.lro_flushed: 0
dev.igb.0.queue2.interrupt_rate: 8000
dev.igb.0.queue2.txd_head: 0
dev.igb.0.queue2.txd_tail: 116
dev.igb.0.queue2.no_desc_avail: 0
dev.igb.0.queue2.tx_packets: 635285
dev.igb.0.queue2.rxd_head: 0
dev.igb.0.queue2.rxd_tail: 4095
dev.igb.0.queue2.rx_packets: 163156
dev.igb.0.queue2.rx_bytes: 0
dev.igb.0.queue2.lro_queued: 0
dev.igb.0.queue2.lro_flushed: 0
dev.igb.0.queue3.interrupt_rate: 8000
dev.igb.0.queue3.txd_head: 0
dev.igb.0.queue3.txd_tail: 199
dev.igb.0.queue3.no_desc_avail: 0
dev.igb.0.queue3.tx_packets: 177701
dev.igb.0.queue3.rxd_head: 0
dev.igb.0.queue3.rxd_tail: 4095
dev.igb.0.queue3.rx_packets: 209749
dev.igb.0.queue3.rx_bytes: 0
dev.igb.0.queue3.lro_queued: 0
dev.igb.0.queue3.lro_flushed: 0
dev.igb.0.mac_stats.excess_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.recv_length_errors: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.tx_no_crs: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.unsupported_fc_recvd: 0
dev.igb.0.mac_stats.mgmt_pkts_recvd: 0
dev.igb.0.mac_stats.mgmt_pkts_drop: 0
dev.igb.0.mac_stats.mgmt_pkts_txd: 0
dev.igb.0.mac_stats.total_pkts_recvd: 1707305
dev.igb.0.mac_stats.good_pkts_recvd: 1551183
dev.igb.0.mac_stats.bcast_pkts_recvd: 179491
dev.igb.0.mac_stats.mcast_pkts_recvd: 1868
dev.igb.0.mac_stats.rx_frames_64: 212
dev.igb.0.mac_stats.rx_frames_65_127: 843418
dev.igb.0.mac_stats.rx_frames_128_255: 116516
dev.igb.0.mac_stats.rx_frames_256_511: 81391
dev.igb.0.mac_stats.rx_frames_512_1023: 14010
dev.igb.0.mac_stats.rx_frames_1024_1522: 495636
dev.igb.0.mac_stats.good_octets_recvd: 4228681579
dev.igb.0.mac_stats.total_octets_recvd: 4239899893
dev.igb.0.mac_stats.good_octets_txd: 3039302164
dev.igb.0.mac_stats.total_octets_txd: 3039302164
dev.igb.0.mac_stats.total_pkts_txd: 1424648
dev.igb.0.mac_stats.good_pkts_txd: 1424648
dev.igb.0.mac_stats.bcast_pkts_txd: 412
dev.igb.0.mac_stats.mcast_pkts_txd: 6
dev.igb.0.mac_stats.tx_frames_64: 639519
dev.igb.0.mac_stats.tx_frames_65_127: 253844
dev.igb.0.mac_stats.tx_frames_128_255: 180022
dev.igb.0.mac_stats.tx_frames_256_511: 873
dev.igb.0.mac_stats.tx_frames_512_1023: 292
dev.igb.0.mac_stats.tx_frames_1024_1522: 350098
dev.igb.0.mac_stats.tso_txd: 95280
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.interrupts.asserts: 3323144
dev.igb.0.interrupts.rx_pkt_timer: 1551160
dev.igb.0.interrupts.rx_abs_timer: 0
dev.igb.0.interrupts.tx_pkt_timer: 0
dev.igb.0.interrupts.tx_abs_timer: 1551069
dev.igb.0.interrupts.tx_queue_empty: 1424637
dev.igb.0.interrupts.tx_queue_min_thresh: 0
dev.igb.0.interrupts.rx_desc_min_thresh: 0
dev.igb.0.interrupts.rx_overrun: 0
dev.igb.0.host.breaker_tx_pkt: 0
dev.igb.0.host.host_tx_pkt_discard: 0
dev.igb.0.host.rx_pkt: 23
dev.igb.0.host.breaker_rx_pkts: 0
dev.igb.0.host.breaker_rx_pkt_drop: 0
dev.igb.0.host.tx_good_pkt: 11
dev.igb.0.host.breaker_tx_pkt_drop: 0
dev.igb.0.host.rx_good_bytes: 4228681579
dev.igb.0.host.tx_good_bytes: 3039302164
dev.igb.0.host.length_errors: 0
dev.igb.0.host.serdes_violation_pkt: 0
dev.igb.0.host.header_redir_missed: 0

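To compare such dumps across samples, a small parser can help. The sketch below reads `sysctl dev.igb` style lines and lists the queues whose txd_head and txd_tail differ in one snapshot. The helper names are made up, and a single snapshot with head != tail is normal under load. Only a difference that persists over several samples would hint at a stalled queue.

```python
import re
from collections import defaultdict

# Matches lines such as "dev.igb.0.queue0.txd_tail: 468"
QUEUE_LINE = re.compile(r"^dev\.igb\.(\d+)\.queue(\d+)\.(\w+):\s*(-?\d+)\s*$")


def parse_queue_stats(text):
    """Map (unit, queue) -> {counter: value} from sysctl dev.igb output."""
    stats = defaultdict(dict)
    for line in text.splitlines():
        m = QUEUE_LINE.match(line.strip())
        if m:
            unit, queue, name, value = m.groups()
            stats[(int(unit), int(queue))][name] = int(value)
    return dict(stats)


def pending_tx(stats):
    """Return queues whose TX head and tail differ in this snapshot."""
    return sorted(
        key for key, q in stats.items()
        if "txd_head" in q and "txd_tail" in q
        and q["txd_head"] != q["txd_tail"]
    )


# A few lines taken from the two dumps in this message
sample = """\
dev.igb.0.queue0.txd_head: 0
dev.igb.0.queue0.txd_tail: 468
dev.igb.0.queue0.no_desc_avail: 41
dev.igb.1.queue0.txd_head: 1879
dev.igb.1.queue0.txd_tail: 1879
dev.igb.1.queue0.no_desc_avail: 0
"""

stats = parse_queue_stats(sample)
print(pending_tx(stats))  # [(0, 0)]
```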

Also, igb1 was quite busy at that time, but igb1 never hung:

dev.igb.1.queue0.interrupt_rate: 10526
dev.igb.1.queue0.txd_head: 1879
dev.igb.1.queue0.txd_tail: 1879
dev.igb.1.queue0.no_desc_avail: 0
dev.igb.1.queue0.tx_packets: 8694
dev.igb.1.queue0.rxd_head: 1116
dev.igb.1.queue0.rxd_tail: 1115
dev.igb.1.queue0.rx_packets: 181340
dev.igb.1.queue0.rx_bytes: 11819287
dev.igb.1.queue0.lro_queued: 0
dev.igb.1.queue0.lro_flushed: 0
dev.igb.1.queue1.interrupt_rate: 76923
dev.igb.1.queue1.txd_head: 945
dev.igb.1.queue1.txd_tail: 945
dev.igb.1.queue1.no_desc_avail: 0
dev.igb.1.queue1.tx_packets: 9295572
dev.igb.1.queue1.rxd_head: 203
dev.igb.1.queue1.rxd_tail: 202
dev.igb.1.queue1.rx_packets: 18239691
dev.igb.1.queue1.rx_bytes: 23591559819
dev.igb.1.queue1.lro_queued: 0
dev.igb.1.queue1.lro_flushed: 0
dev.igb.1.queue2.interrupt_rate: 43478
dev.igb.1.queue2.txd_head: 4027
dev.igb.1.queue2.txd_tail: 4027
dev.igb.1.queue2.no_desc_avail: 0
dev.igb.1.queue2.tx_packets: 7335
dev.igb.1.queue2.rxd_head: 2158
dev.igb.1.queue2.rxd_tail: 2157
dev.igb.1.queue2.rx_packets: 2153
dev.igb.1.queue2.rx_bytes: 413198
dev.igb.1.queue2.lro_queued: 0
dev.igb.1.queue2.lro_flushed: 0
dev.igb.1.queue3.interrupt_rate: 43478

Should I consider tuning "hw.igb.max_interrupt_rate"?

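For context on the interrupt_rate values, the arithmetic below converts the reported per-queue rates into the average gap between interrupts. It is plain arithmetic on the numbers from the dumps in this message, assuming interrupt_rate means interrupts per second. It says nothing about which value hw.igb.max_interrupt_rate should take.

```python
def interrupt_interval_us(rate_per_second: int) -> float:
    """Average time between interrupts, in microseconds."""
    return 1_000_000 / rate_per_second


# Values copied from the dev.igb.N.queueM.interrupt_rate lines in this message
rates = {
    "igb0 queue0": 8000,
    "igb1 queue0": 10526,
    "igb1 queue1": 76923,
    "igb1 queue2": 43478,
}

for name, rate in rates.items():
    print(f"{name}: {rate}/s -> {interrupt_interval_us(rate):.1f} us")
```

So all igb0 queues sit at 125 us per interrupt, while the busiest igb1 queue is at roughly 13 us.
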
Any help highly appreciated!
As mentioned initially, I've never had this issue with FreeBSD 9.1
in exactly the same environment/workload.

Thanks,

-Harry

