From owner-freebsd-current@FreeBSD.ORG Thu Feb 6 17:12:26 2014
Message-ID: <52F3C275.6000106@passap.ru>
Date: Thu, 06 Feb 2014 21:12:21 +0400
From: Boris Samorodov
Organization: ЗАО "ВАРТ"
To: pyunyh@gmail.com
Cc: FreeBSD CURRENT
Subject: Re: regression: msk0 watchdog timeout and interrupt storm
References: <526FBA53.9000208@passap.ru>
 <20131030021650.GA3106@michelle.cdnetworks.com>
 <52725C3D.2030602@passap.ru> <52ECADF3.4020909@passap.ru>
 <20140206020003.GC2810@michelle.cdnetworks.com>
In-Reply-To: <20140206020003.GC2810@michelle.cdnetworks.com>
List-Id: Discussions about the use of FreeBSD-current

06.02.2014 06:00, Yonghyeon PYUN wrote:
> On Sat, Feb 01, 2014 at 12:18:59PM +0400, Boris Samorodov wrote:
>> Hi Yonghyeon and All,
>>
>> (this time it's a CURRENT issue)
>>
>> 31.10.2013 17:33, Boris Samorodov wrote:
>>> 30.10.2013 06:16, Yonghyeon PYUN wrote:
>>>> On Tue, Oct 29, 2013 at 05:38:27PM +0400, Boris Samorodov wrote:
>>>
>>>>> From time to time I use a notebook and boot FreeBSD from a USB
>>>>> stick. FreeBSD 9.2-i386 works OK. So I tried to use
>>>>> FreeBSD 10.0-i386 BETA2 and the network adapter works for
>>>>> some 10-15 seconds and then stops with the diagnostic message
>>>>> "msk0:watchdog timeout". I've found a similar case at
>>>>> freebsd-current@ with no workaround. Yes, there is an
>>>>> interrupt storm as well.
>>>>
>>>> There have been no functional changes for a very long time so I'm
>>>> not sure what's going on here. I've attached the local change I
>>>> have at this moment but I'm afraid it wouldn't address the issue
>>>> above.
>>>>
>>>> I recall jhb also reported an interrupt storm in the past but the
>>>> root cause was not identified yet. Could you change msk_intr() and
>>>> let me know which interrupt is firing?
>>>
>>> I've yet to organize a build.
>>>
>>>>> Here is some additional info:
>>>>> -----
>>>>> mskc0@pci0:3:0:0: class=0x020000 card=0xff501179 chip=0x435511ab rev=0x12 hdr=0x00
>>>>>     vendor     = 'Marvell Technology Group Ltd.'
>>>>>     device     = '88E8040T PCI-E Fast Ethernet Controller'
>>>>>     class      = network
>>>>>     subclass   = ethernet
>>>>>     cap 01[48] = powerspec 3 supports D0 D1 D2 D3 current D0
>>>>>     cap 05[5c] = MSI supports 1 message, 64 bit enabled with 1 message
>>>>>     cap 10[c0] = PCI-Express 2 legacy endpoint max data 128(128) link x1(x1)
>>>>>                  speed 2.5(2.5) ASPM disabled(L0s/L1)
>>>>>     ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>     ecap 0003[130] = Serial 1 b8b063ffff681e00
>>>>> -----
>>>
>>> Meanwhile some more investigations, "vmstat -i" for calm and storm:
>>> -----
>>> interrupt                    total    rate
>>> irq1: atkbd0                  1025       2
>>> irq9: acpi0                    204       0
>>> irq14: ata0                    327       0
>>> irq16: uhci0+                  246       0
>>> irq20: hpet0                 22472      52
>>> irq23: uhci2 ehci1           10341      24
>>> irq256: hdac0                   52       0
>>> irq257: mskc0                  258       0
>>> irq258: ahci0                  221       0
>>> Total                        35146      81
>>> -----
>>> interrupt                    total    rate
>>> irq1: atkbd0                  1508       2
>>> irq9: acpi0                    234       0
>>> irq14: ata0                    409       0
>>> irq16: uhci0+                  246       0
>>> irq20: hpet0                 72288     131
>>> irq23: uhci2 ehci1           10846      19
>>> irq256: hdac0                   52       0
>>> irq257: mskc0              4419760    8021
>>> irq258: ahci0                  221       0
>>> Total                      4505564    8177
>>> -----
>>>
>>> And "vmstat -w1" for calm and storm:
>>> -----
>>> procs     memory              page                disks     faults            cpu
>>> r b w     avm    fre   flt  re  pi  po    fr  sr mm0 ad0     in   sy     cs us sy  id
>>> 0 0 0  206928 956040   277   0   2   0   330   4   0   0    117  476    454  0  1  99
>>> 0 0 0  206928 956036     0   0   0   0     8   4   0   0     50  123    137  0  0 100
>>> 0 0 0  206928 956036     0   0   0   0     0   4   0   0     47  120     92  0  1  99
>>> 0 0 0  206928 956036     0   0   0   0     0   4   0   0     43  123    119  0  1  99
>>> 0 0 0  206928 956036     0   0   0   0     0   4   0   0     55  132    123  0  1  99
>>> 0 0 0  206928 956004     0   0   0   0     0   4   0   0     68  123    185  0  1  99
>>> 0 0 0  206928 956036     0   0   0   0     8   4   0   0     86  123    266  0  1  99
>>> 0 0 0  206928 956036     0   0   0   0     0   4   0   0     44  125    124  0  0 100
>>> 0 0 0  206928 956036     0   0   0   0     0   4   0   0     64  128    164  0  1  99
>>> 0 0 0  206928 956036     0   0   0   0     0   4   0   0     42  131    101  0  1  99
>>> -----
>>> procs     memory              page                disks     faults            cpu
>>> r b w     avm    fre   flt  re  pi  po    fr  sr mm0 ad0     in   sy     cs us sy  id
>>> 0 0 0  213648 954676   104   0   1   0   121   4   0   0  22299  204  44262  0 10  90
>>> 0 0 0  213648 954672     0   0   0   0     8   4   0   0 112259  123 222379  0 44  56
>>> 0 0 0  213648 954672     0   0   0   0     0   4   0   0 111792  123 221489  0 43  57
>>> 0 0 0  213648 954672     1   0   0   0     0   4   0   0 109887  183 217754  0 43  57
>>> 0 0 0  213648 954668     2   0   0   0     0   4   0   0 109543  146 216963  0 44  56
>>> 0 0 0  213648 954668     0   0   0   0     0   4   0   0 110142  123 218187  0 45  55
>>> 0 0 0  213648 954660   472   0   0   0   474   4   0   0 109340  717 216674  0 42  57
>>> 0 0 0  213648 954656     2   0   0   0     0   4   0   0 109459  147 216831  0 43  57
>>> 0 0 0  213648 954656     0   0   0   0     0   4   0   0 109462  131 216827  0 43  57
>>> 0 0 0  213648 954656     0   0   0   0     0   4   0   0 109454  123 216803  0 42  58
>>> -----
>>>
>>> Dmesg is here: ftp://ftp.wart.ru/pub/misc/tos.dmesg.boot.txt .
>>>
>>> BTW, some more observations. While downloading a file the system
>>> hits the watchdog timeout rather quickly, but the system keeps
>>> working. If I try to upload files the system works much longer
>>> (for a couple of minutes) but then freezes. No ctrl-alt-esc. Only
>>> a cold restart works.
>>
>> I've successfully upgraded to 10.0-RELEASE. Then I tried CURRENT
>> (verbose dmesg is here: ftp://ftp.wart.ru/pub/misc/dmesg.boot.a300.txt )
>> and I've got watchdog timeouts. The situation is very much alike
>> (see previous diagnostics). Just uploads happen very quickly and
>> the machine does not freeze and operates well.
>>
>> This time I have sources and can test patches (if any) rather
>> quickly.
>>
>
> There is no driver code difference between CURRENT and
> 10.0-RELEASE. If you don't encounter watchdog timeouts on
> 10.0-RELEASE I have no idea what's going on there.
> I recall a couple of users are seeing msk(4) watchdog timeouts on
> 10.0-RELEASE/CURRENT so I started to think about r234666 which was
> not merged to stable/9 and stable/8.
>
> Could you back out r234666 and let me know whether it makes any
> difference for you?

Thank you! That was it.
The system survived an svn up of /usr/src, a rebuild/reinstall, and
portsnap downloading almost 25000 patches.

-- 
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
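
For anyone who wants to compare the two "vmstat -i" snapshots quoted above mechanically, here is a minimal sketch. It is not part of the original exchange. The function names (parse_vmstat_i, storm_suspects) and the 1000 interrupts-per-second cutoff are assumptions made for illustration only; the numbers are copied from the calm and storm listings in this thread.

```python
"""Compare two `vmstat -i` snapshots and list sources that look stormy."""

# Calm and storm snapshots, copied from the quoted `vmstat -i` output.
CALM = """\
interrupt                    total    rate
irq1: atkbd0                  1025       2
irq9: acpi0                    204       0
irq14: ata0                    327       0
irq16: uhci0+                  246       0
irq20: hpet0                 22472      52
irq23: uhci2 ehci1           10341      24
irq256: hdac0                   52       0
irq257: mskc0                  258       0
irq258: ahci0                  221       0
Total                        35146      81
"""

STORM = """\
interrupt                    total    rate
irq1: atkbd0                  1508       2
irq9: acpi0                    234       0
irq14: ata0                    409       0
irq16: uhci0+                  246       0
irq20: hpet0                 72288     131
irq23: uhci2 ehci1           10846      19
irq256: hdac0                   52       0
irq257: mskc0              4419760    8021
irq258: ahci0                  221       0
Total                      4505564    8177
"""


def parse_vmstat_i(text):
    """Map each interrupt source to (total, rate); skip header and Total rows."""
    sources = {}
    for line in text.splitlines():
        # Split from the right: the source name itself may contain spaces,
        # e.g. "irq23: uhci2 ehci1".
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts[0].strip(), parts[1], parts[2]
        if not (total.isdigit() and rate.isdigit()) or name == "Total":
            continue
        sources[name] = (int(total), int(rate))
    return sources


def storm_suspects(calm, storm, threshold=1000):
    """Return (name, calm_rate, storm_rate) for sources at or above threshold.

    The threshold is an arbitrary illustrative cutoff, not a kernel setting.
    """
    suspects = []
    for name, (_total, rate) in storm.items():
        calm_rate = calm.get(name, (0, 0))[1]
        if rate >= threshold and rate > calm_rate:
            suspects.append((name, calm_rate, rate))
    return sorted(suspects, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    calm, storm = parse_vmstat_i(CALM), parse_vmstat_i(STORM)
    for name, before, after in storm_suspects(calm, storm):
        print(f"{name}: {before}/s -> {after}/s")
```

With the snapshots above, the only source that crosses the cutoff is irq257 (mskc0), which matches the msk(4) interrupt storm described in the thread.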