Date: Sun, 09 Feb 2014 20:56:21 +0400
From: Boris Samorodov <bsam@passap.ru>
To: pyunyh@gmail.com
Cc: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID: <52F7B335.4010509@passap.ru>
In-Reply-To: <52F3C275.6000106@passap.ru>
References: <526FBA53.9000208@passap.ru> <20131030021650.GA3106@michelle.cdnetworks.com> <52725C3D.2030602@passap.ru> <52ECADF3.4020909@passap.ru> <20140206020003.GC2810@michelle.cdnetworks.com> <52F3C275.6000106@passap.ru>
06.02.2014 21:12, Boris Samorodov wrote:
> 06.02.2014 06:00, Yonghyeon PYUN wrote:
>> On Sat, Feb 01, 2014 at 12:18:59PM +0400, Boris Samorodov wrote:
>>> Hi Yonghyeon and All,
>>>
>>> (this time it's a CURRENT issue)
>>>
>>> 31.10.2013 17:33, Boris Samorodov wrote:
>>>> 30.10.2013 06:16, Yonghyeon PYUN wrote:
>>>>> On Tue, Oct 29, 2013 at 05:38:27PM +0400, Boris Samorodov wrote:
>>>>
>>>>>> From time to time I use a notebook and boot FreeBSD from USB
>>>>>> stick. FreeBSD 9.2-i386 works OK. So I tried to use
>>>>>> FreeBSD 10.0-i386 BETA2 and the network adapter works for
>>>>>> some 10-15 seconds and then stops with diagnostic message
>>>>>> "msk0:watchdog timeout". I've found similar case at
>>>>>> freebsd-current@ with no workaround. Yes, there is an
>>>>>> interrupt storm as well.
>>>>>
>>>>> There had been no functional changes for very long time so I'm not
>>>>> sure what's going on here. I've attached local change I have at
>>>>> this moment but I'm afraid it wouldn't address the issue above.
>>>>>
>>>>> I recall jhb also reported interrupt storm in the past but the root
>>>>> cause was not identified yet. Could you change msk_intr() and let
>>>>> me know which interrupt is firing?
>>>>
>>>> I've yet to organize a build.
>>>>
>>>>>> Here is some additional info:
>>>>>> -----
>>>>>> mskc0@pci0:3:0:0: class=0x020000 card=0xff501179 chip=0x435511ab
>>>>>> rev=0x12 hdr=0x00
>>>>>> vendor = 'Marvell Technology Group Ltd.'
>>>>>> device = '88E8040T PCI-E Fast Ethernet Controller'
>>>>>> class = network
>>>>>> subclass = ethernet
>>>>>> cap 01[48] = powerspec 3 supports D0 D1 D2 D3 current D0
>>>>>> cap 05[5c] = MSI supports 1 message, 64 bit enabled with 1 message
>>>>>> cap 10[c0] = PCI-Express 2 legacy endpoint max data 128(128) link x1(x1)
>>>>>> speed 2.5(2.5) ASPM disabled(L0s/L1)
>>>>>> ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>> ecap 0003[130] = Serial 1 b8b063ffff681e00
>>>>>> -----
>>>>
>>>> Meanwhile some more investigations, "vmstat -i" for calm and storm:
>>>> -----
>>>> interrupt                  total       rate
>>>> irq1: atkbd0                1025          2
>>>> irq9: acpi0                  204          0
>>>> irq14: ata0                  327          0
>>>> irq16: uhci0+                246          0
>>>> irq20: hpet0               22472         52
>>>> irq23: uhci2 ehci1         10341         24
>>>> irq256: hdac0                 52          0
>>>> irq257: mskc0                258          0
>>>> irq258: ahci0                221          0
>>>> Total                      35146         81
>>>> -----
>>>> interrupt                  total       rate
>>>> irq1: atkbd0                1508          2
>>>> irq9: acpi0                  234          0
>>>> irq14: ata0                  409          0
>>>> irq16: uhci0+                246          0
>>>> irq20: hpet0               72288        131
>>>> irq23: uhci2 ehci1         10846         19
>>>> irq256: hdac0                 52          0
>>>> irq257: mskc0            4419760       8021
>>>> irq258: ahci0                221          0
>>>> Total                    4505564       8177
>>>> -----
>>>>
>>>> And "vmstat -w1" for calm and storm:
>>>> -----
>>>> procs      memory         page                    disks        faults           cpu
>>>> r b w     avm     fre  flt  re  pi  po   fr  sr  mm0  ad0      in   sy      cs  us  sy   id
>>>> 0 0 0  206928  956040  277   0   2   0  330   4    0    0     117  476     454   0   1   99
>>>> 0 0 0  206928  956036    0   0   0   0    8   4    0    0      50  123     137   0   0  100
>>>> 0 0 0  206928  956036    0   0   0   0    0   4    0    0      47  120      92   0   1   99
>>>> 0 0 0  206928  956036    0   0   0   0    0   4    0    0      43  123     119   0   1   99
>>>> 0 0 0  206928  956036    0   0   0   0    0   4    0    0      55  132     123   0   1   99
>>>> 0 0 0  206928  956004    0   0   0   0    0   4    0    0      68  123     185   0   1   99
>>>> 0 0 0  206928  956036    0   0   0   0    8   4    0    0      86  123     266   0   1   99
>>>> 0 0 0  206928  956036    0   0   0   0    0   4    0    0      44  125     124   0   0  100
>>>> 0 0 0  206928  956036    0   0   0   0    0   4    0    0      64  128     164   0   1   99
>>>> 0 0 0  206928  956036    0   0   0   0    0   4    0    0      42  131     101   0   1   99
>>>> -----
>>>> procs      memory         page                    disks        faults           cpu
>>>> r b w     avm     fre  flt  re  pi  po   fr  sr  mm0  ad0      in   sy      cs  us  sy   id
>>>> 0 0 0  213648  954676  104   0   1   0  121   4    0    0   22299  204   44262   0  10   90
>>>> 0 0 0  213648  954672    0   0   0   0    8   4    0    0  112259  123  222379   0  44   56
>>>> 0 0 0  213648  954672    0   0   0   0    0   4    0    0  111792  123  221489   0  43   57
>>>> 0 0 0  213648  954672    1   0   0   0    0   4    0    0  109887  183  217754   0  43   57
>>>> 0 0 0  213648  954668    2   0   0   0    0   4    0    0  109543  146  216963   0  44   56
>>>> 0 0 0  213648  954668    0   0   0   0    0   4    0    0  110142  123  218187   0  45   55
>>>> 0 0 0  213648  954660  472   0   0   0  474   4    0    0  109340  717  216674   0  42   57
>>>> 0 0 0  213648  954656    2   0   0   0    0   4    0    0  109459  147  216831   0  43   57
>>>> 0 0 0  213648  954656    0   0   0   0    0   4    0    0  109462  131  216827   0  43   57
>>>> 0 0 0  213648  954656    0   0   0   0    0   4    0    0  109454  123  216803   0  42   58
>>>> -----
>>>>
>>>> Dmesg is here: ftp://ftp.wart.ru/pub/misc/tos.dmesg.boot.txt .
>>>>
>>>> BTW, some more observations. While downloading a file the system
>>>> hits the watchdog timeout rather quickly, but the system works. If I
>>>> try to upload files the system works much longer (for a couple of
>>>> minutes) but then freezes. No ctrl-alt-esc. Only cold restart works.
>>>
>>> I've successfully upgraded to 10.0-RELEASE. Then I tried CURRENT
>>> (verbose dmesg is here: ftp://ftp.wart.ru/pub/misc/dmesg.boot.a300.txt )
>>> and I've got watchdog timeouts. The situation is very much alike
>>> (see previous diagnostics). Just uploads happen very quickly and
>>> the machine does not freeze and operates well.
>>>
>>> This time I have sources and can test patches (if any) rather
>>> quickly.
>>>
>>
>> There is no driver code difference between CURRENT and
>> 10.0-RELEASE. If you don't encounter watchdog timeouts on
>> 10.0-RELEASE I have no idea what's going on there.
>> I recall a couple of users are seeing msk(4) watchdog timeouts on
>> 10.0-RELEASE/CURRENT so I started to think about r234666 which was
>> not merged to stable/9 and stable/8.
>>
>> Could you back out r234666 and let me know whether it makes any
>> difference for you?
>
> Thank you!
>
> That was it. The system survived svn up of /usr/src, rebuild/reinstall
> and almost 25000 patches were downloaded by portsnap.

Some additional info. As of r261651 at CURRENT the driver works for me if:
. disable multi-core at BIOS (so kern.smp.cpus: 1);
. do not load driver at /boot/loader.conf (i.e. use the builtin kernel
  driver);
. disable WITNESS* and INVARIANTS* (GENERIC does not work even with
  single CPU).

-- 
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
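
For anyone retesting under the conditions listed above, the interrupt storm can be spotted by comparing two "vmstat -i" snapshots. The following is only a minimal Python sketch: the function names and the growth factor of 100 are illustrative assumptions, not part of any FreeBSD tool, and the sample rows are copied from the calm and storm output quoted earlier in this message.

```python
import re

def parse_vmstat_i(text):
    """Map each interrupt source to its (total, rate) pair from `vmstat -i` text."""
    counts = {}
    for line in text.splitlines():
        # Data rows look like "irq257: mskc0   4419760   8021";
        # the header and the "Total" row do not match and are skipped.
        m = re.match(r"\s*(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            irq, name, total, rate = m.groups()
            counts[f"{irq}: {name}"] = (int(total), int(rate))
    return counts

def storm_suspects(before, after, factor=100):
    """Return sources whose total grew more than `factor` times (hypothetical threshold)."""
    suspects = []
    for source, (total_after, _rate) in after.items():
        total_before = before.get(source, (0, 0))[0]
        # Treat a missing or zero baseline as 1 to avoid dividing by zero.
        if total_after > factor * max(total_before, 1):
            suspects.append(source)
    return sorted(suspects)

# Sample rows taken from the quoted calm and storm snapshots.
CALM = """
interrupt total rate
irq20: hpet0 22472 52
irq23: uhci2 ehci1 10341 24
irq257: mskc0 258 0
Total 35146 81
"""
STORM = """
interrupt total rate
irq20: hpet0 72288 131
irq23: uhci2 ehci1 10846 19
irq257: mskc0 4419760 8021
Total 4505564 8177
"""

if __name__ == "__main__":
    print(storm_suspects(parse_vmstat_i(CALM), parse_vmstat_i(STORM)))
```

On a live system the two snapshots would come from running "vmstat -i" before and after starting a transfer; the factor would need tuning to how far apart the snapshots are taken.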