Date: Thu, 08 Jan 2015 11:05:17 +0100
From: Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To: FreeBSD Stable <freebsd-stable@freebsd.org>
Subject: Re: igb(4) watchdog timeout, lagg(4) fails
Message-ID: <54AE565D.50208@omnilan.de>
In-Reply-To: <54ACC6A2.1050400@omnilan.de>
References: <54ACC6A2.1050400@omnilan.de>
Regarding Harry Schmalzbauer's message of 07.01.2015 06:39 (localtime):
> Hello,
>
> recently I upgraded one server from 9.1 to 10.1. There are two 82576
> (one port of two Intel ET Dual-Port GbE [kawela]), driven by igb(4).
> I've never seen any watchdog timeout with FreeBSD-9.1 but suddenly (with
> 10-stable) I see:
> igb0: Watchdog timeout -- resetting
> igb0: Queue(0) tdh = 2974, hw tdt = 2973
> igb0: TX(0) desc avail = 0,Next TX to Clean = 0
>
> My biggest problem is that lagg(4) doesn't detect the problem with
> igb0. It's configured with 'lagghash l2' and most connections were
> interrupted until I manually ran 'ifconfig igb0 down'. Then lagg did its
> job and connectivity was restored via the remaining igb1.
>
> Is there a way to auto-if-down an interface which suffers from watchdog
> timeouts? And any way to really reset it without rebooting the machine?

The igb watchdog timeout happened again :-( about 48 hours after the last
one, with moderate-to-low average traffic.
This time I could fetch the dev.igb sysctls before igb0 was reset by the
watchdog. It's showing a strange IRQ load:

dev.igb.%parent:
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.4.0
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=0 function=0 handle=\_SB_.PCI0.PE60.S1F0
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10c9 subvendor=0x8086 subdevice=0xa03c class=0x020000
dev.igb.0.%parent: pci7
dev.igb.0.nvm: -1
dev.igb.0.enable_aim: 1
dev.igb.0.fc: 3
dev.igb.0.rx_processing_limit: 100
dev.igb.0.link_irq: 5
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.rx_overruns: 0
dev.igb.0.watchdog_timeouts: 1
dev.igb.0.device_control: 1488978497
dev.igb.0.rx_control: 67272738
dev.igb.0.interrupt_mask: 4
dev.igb.0.extended_int_mask: 2147483679
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 47488
dev.igb.0.fc_low_water: 47472
dev.igb.0.queue0.interrupt_rate: 8000
dev.igb.0.queue0.txd_head: 0
dev.igb.0.queue0.txd_tail: 468
dev.igb.0.queue0.no_desc_avail: 41
dev.igb.0.queue0.tx_packets: 90807
dev.igb.0.queue0.rxd_head: 0
dev.igb.0.queue0.rxd_tail: 4095
dev.igb.0.queue0.rx_packets: 443307
dev.igb.0.queue0.rx_bytes: 0
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.queue1.interrupt_rate: 8000
dev.igb.0.queue1.txd_head: 0
dev.igb.0.queue1.txd_tail: 221
dev.igb.0.queue1.no_desc_avail: 0
dev.igb.0.queue1.tx_packets: 300702
dev.igb.0.queue1.rxd_head: 0
dev.igb.0.queue1.rxd_tail: 4095
dev.igb.0.queue1.rx_packets: 734853
dev.igb.0.queue1.rx_bytes: 0
dev.igb.0.queue1.lro_queued: 0
dev.igb.0.queue1.lro_flushed: 0
dev.igb.0.queue2.interrupt_rate: 8000
dev.igb.0.queue2.txd_head: 0
dev.igb.0.queue2.txd_tail: 116
dev.igb.0.queue2.no_desc_avail: 0
dev.igb.0.queue2.tx_packets: 635285
dev.igb.0.queue2.rxd_head: 0
dev.igb.0.queue2.rxd_tail: 4095
dev.igb.0.queue2.rx_packets: 163156
dev.igb.0.queue2.rx_bytes: 0
dev.igb.0.queue2.lro_queued: 0
dev.igb.0.queue2.lro_flushed: 0
dev.igb.0.queue3.interrupt_rate: 8000
dev.igb.0.queue3.txd_head: 0
dev.igb.0.queue3.txd_tail: 199
dev.igb.0.queue3.no_desc_avail: 0
dev.igb.0.queue3.tx_packets: 177701
dev.igb.0.queue3.rxd_head: 0
dev.igb.0.queue3.rxd_tail: 4095
dev.igb.0.queue3.rx_packets: 209749
dev.igb.0.queue3.rx_bytes: 0
dev.igb.0.queue3.lro_queued: 0
dev.igb.0.queue3.lro_flushed: 0
dev.igb.0.mac_stats.excess_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.recv_length_errors: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.tx_no_crs: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.unsupported_fc_recvd: 0
dev.igb.0.mac_stats.mgmt_pkts_recvd: 0
dev.igb.0.mac_stats.mgmt_pkts_drop: 0
dev.igb.0.mac_stats.mgmt_pkts_txd: 0
dev.igb.0.mac_stats.total_pkts_recvd: 1707305
dev.igb.0.mac_stats.good_pkts_recvd: 1551183
dev.igb.0.mac_stats.bcast_pkts_recvd: 179491
dev.igb.0.mac_stats.mcast_pkts_recvd: 1868
dev.igb.0.mac_stats.rx_frames_64: 212
dev.igb.0.mac_stats.rx_frames_65_127: 843418
dev.igb.0.mac_stats.rx_frames_128_255: 116516
dev.igb.0.mac_stats.rx_frames_256_511: 81391
dev.igb.0.mac_stats.rx_frames_512_1023: 14010
dev.igb.0.mac_stats.rx_frames_1024_1522: 495636
dev.igb.0.mac_stats.good_octets_recvd: 4228681579
dev.igb.0.mac_stats.total_octets_recvd: 4239899893
dev.igb.0.mac_stats.good_octets_txd: 3039302164
dev.igb.0.mac_stats.total_octets_txd: 3039302164
dev.igb.0.mac_stats.total_pkts_txd: 1424648
dev.igb.0.mac_stats.good_pkts_txd: 1424648
dev.igb.0.mac_stats.bcast_pkts_txd: 412
dev.igb.0.mac_stats.mcast_pkts_txd: 6
dev.igb.0.mac_stats.tx_frames_64: 639519
dev.igb.0.mac_stats.tx_frames_65_127: 253844
dev.igb.0.mac_stats.tx_frames_128_255: 180022
dev.igb.0.mac_stats.tx_frames_256_511: 873
dev.igb.0.mac_stats.tx_frames_512_1023: 292
dev.igb.0.mac_stats.tx_frames_1024_1522: 350098
dev.igb.0.mac_stats.tso_txd: 95280
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.interrupts.asserts: 3323144
dev.igb.0.interrupts.rx_pkt_timer: 1551160
dev.igb.0.interrupts.rx_abs_timer: 0
dev.igb.0.interrupts.tx_pkt_timer: 0
dev.igb.0.interrupts.tx_abs_timer: 1551069
dev.igb.0.interrupts.tx_queue_empty: 1424637
dev.igb.0.interrupts.tx_queue_min_thresh: 0
dev.igb.0.interrupts.rx_desc_min_thresh: 0
dev.igb.0.interrupts.rx_overrun: 0
dev.igb.0.host.breaker_tx_pkt: 0
dev.igb.0.host.host_tx_pkt_discard: 0
dev.igb.0.host.rx_pkt: 23
dev.igb.0.host.breaker_rx_pkts: 0
dev.igb.0.host.breaker_rx_pkt_drop: 0
dev.igb.0.host.tx_good_pkt: 11
dev.igb.0.host.breaker_tx_pkt_drop: 0
dev.igb.0.host.rx_good_bytes: 4228681579
dev.igb.0.host.tx_good_bytes: 3039302164
dev.igb.0.host.length_errors: 0
dev.igb.0.host.serdes_violation_pkt: 0
dev.igb.0.host.header_redir_missed: 0

igb1 was also quite busy at that time, but it never hung:

dev.igb.1.queue0.interrupt_rate: 10526
dev.igb.1.queue0.txd_head: 1879
dev.igb.1.queue0.txd_tail: 1879
dev.igb.1.queue0.no_desc_avail: 0
dev.igb.1.queue0.tx_packets: 8694
dev.igb.1.queue0.rxd_head: 1116
dev.igb.1.queue0.rxd_tail: 1115
dev.igb.1.queue0.rx_packets: 181340
dev.igb.1.queue0.rx_bytes: 11819287
dev.igb.1.queue0.lro_queued: 0
dev.igb.1.queue0.lro_flushed: 0
dev.igb.1.queue1.interrupt_rate: 76923
dev.igb.1.queue1.txd_head: 945
dev.igb.1.queue1.txd_tail: 945
dev.igb.1.queue1.no_desc_avail: 0
dev.igb.1.queue1.tx_packets: 9295572
dev.igb.1.queue1.rxd_head: 203
dev.igb.1.queue1.rxd_tail: 202
dev.igb.1.queue1.rx_packets: 18239691
dev.igb.1.queue1.rx_bytes: 23591559819
dev.igb.1.queue1.lro_queued: 0
dev.igb.1.queue1.lro_flushed: 0
dev.igb.1.queue2.interrupt_rate: 43478
dev.igb.1.queue2.txd_head: 4027
dev.igb.1.queue2.txd_tail: 4027
dev.igb.1.queue2.no_desc_avail: 0
dev.igb.1.queue2.tx_packets: 7335
dev.igb.1.queue2.rxd_head: 2158
dev.igb.1.queue2.rxd_tail: 2157
dev.igb.1.queue2.rx_packets: 2153
dev.igb.1.queue2.rx_bytes: 413198
dev.igb.1.queue2.lro_queued: 0
dev.igb.1.queue2.lro_flushed: 0
dev.igb.1.queue3.interrupt_rate: 43478

Should I consider tuning "hw.igb.max_interrupt_rate"?

Any help is highly appreciated!
As mentioned initially, I never had this issue with FreeBSD 9.1 in
exactly the same environment and under the same workload.

Thanks,

-Harry
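
For readers comparing the two dumps, here is a minimal Python sketch (not part of the original message) that scans `sysctl dev.igb` output for TX queues whose txd_head reads 0 while txd_tail is non-zero. That is the pattern all four igb0 queues show above, whereas head and tail match on the igb1 queues listed. The function names `parse_sysctl` and `suspect_tx_queues` are made up for illustration, and treating this pattern as a hint of a stalled ring is an assumption, not documented driver behaviour.

```python
import re


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of name -> value strings."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            stats[name.strip()] = value.strip()
    return stats


def suspect_tx_queues(stats, unit=0):
    """Return sorted queue indices of igb<unit> where txd_head == 0
    while txd_tail != 0 (heuristic only, see the assumption above)."""
    head_re = re.compile(r"^dev\.igb\.%d\.queue(\d+)\.txd_head$" % unit)
    suspects = []
    for name, value in stats.items():
        match = head_re.match(name)
        if match is None:
            continue
        queue = int(match.group(1))
        tail = stats.get("dev.igb.%d.queue%d.txd_tail" % (unit, queue))
        if tail is not None and int(value) == 0 and int(tail) != 0:
            suspects.append(queue)
    return sorted(suspects)


# A few values copied from the dumps above.
SAMPLE = """\
dev.igb.0.queue0.txd_head: 0
dev.igb.0.queue0.txd_tail: 468
dev.igb.0.queue1.txd_head: 0
dev.igb.0.queue1.txd_tail: 221
dev.igb.1.queue0.txd_head: 1879
dev.igb.1.queue0.txd_tail: 1879
"""
```

With the sample values, `suspect_tx_queues(parse_sysctl(SAMPLE), unit=0)` flags both igb0 queues and `unit=1` flags none. On a live system the text could come from the output of `sysctl dev.igb`.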