From: Yonghyeon PYUN
Date: Thu, 6 Feb 2014 11:00:03 +0900
To: Boris Samorodov
Cc: FreeBSD CURRENT
Subject: Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID: <20140206020003.GC2810@michelle.cdnetworks.com>
In-Reply-To: <52ECADF3.4020909@passap.ru>
References: <526FBA53.9000208@passap.ru> <20131030021650.GA3106@michelle.cdnetworks.com> <52725C3D.2030602@passap.ru> <52ECADF3.4020909@passap.ru>
Reply-To: pyunyh@gmail.com

On Sat, Feb 01, 2014 at 12:18:59PM +0400, Boris Samorodov wrote:
> Hi Yonghyeon and All,
>
> (this time it's a CURRENT issue)
>
> 31.10.2013 17:33, Boris Samorodov wrote:
> > 30.10.2013 06:16, Yonghyeon PYUN wrote:
> >> On Tue, Oct 29, 2013 at 05:38:27PM +0400, Boris Samorodov wrote:
> >
> >>> From time to time I use a notebook and boot FreeBSD from USB
> >>> stick. FreeBSD 9.2-i386 works OK. So I tried to use
> >>> FreeBSD 10.0-i386 BETA2 and the network adapter works for
> >>> some 10-15 seconds and then stops with diagnostic message
> >>> "msk0:watchdog timeout". I've found similar case at
> >>> freebsd-current@ with no workaround. Yes, there is an
> >>> interrupt storm as well.
> >>
> >> There had been no functional changes for very long time so I'm not
> >> sure what's going on here.
> >> I've attached local change I have at
> >> this moment but I'm afraid it wouldn't address the issue above.
> >>
> >> I recall jhb also reported interrupt storm in the past but the root
> >> cause was not identified yet. Could you change msk_intr() and let
> >> me know which interrupt is firing?
> >
> > I've yet to organize a build.
> >
> >>> Here is some additional info:
> >>> -----
> >>> mskc0@pci0:3:0:0: class=0x020000 card=0xff501179 chip=0x435511ab rev=0x12 hdr=0x00
> >>>     vendor     = 'Marvell Technology Group Ltd.'
> >>>     device     = '88E8040T PCI-E Fast Ethernet Controller'
> >>>     class      = network
> >>>     subclass   = ethernet
> >>>     cap 01[48] = powerspec 3 supports D0 D1 D2 D3 current D0
> >>>     cap 05[5c] = MSI supports 1 message, 64 bit enabled with 1 message
> >>>     cap 10[c0] = PCI-Express 2 legacy endpoint max data 128(128) link x1(x1)
> >>>                  speed 2.5(2.5) ASPM disabled(L0s/L1)
> >>>     ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
> >>>     ecap 0003[130] = Serial 1 b8b063ffff681e00
> >>> -----
> >
> > Meanwhile some more investigations, "vmstat -i" for calm and storm:
> > -----
> > interrupt                          total       rate
> > irq1: atkbd0                        1025          2
> > irq9: acpi0                          204          0
> > irq14: ata0                          327          0
> > irq16: uhci0+                        246          0
> > irq20: hpet0                       22472         52
> > irq23: uhci2 ehci1                 10341         24
> > irq256: hdac0                         52          0
> > irq257: mskc0                        258          0
> > irq258: ahci0                        221          0
> > Total                              35146         81
> > -----
> > interrupt                          total       rate
> > irq1: atkbd0                        1508          2
> > irq9: acpi0                          234          0
> > irq14: ata0                          409          0
> > irq16: uhci0+                        246          0
> > irq20: hpet0                       72288        131
> > irq23: uhci2 ehci1                 10846         19
> > irq256: hdac0                         52          0
> > irq257: mskc0                    4419760       8021
> > irq258: ahci0                        221          0
> > Total                            4505564       8177
> > -----
> >
> > And "vmstat -w1" for calm and storm:
> > -----
> > procs   memory      page                   disks      faults           cpu
> > r b w    avm    fre  flt re pi po   fr sr mm0 ad0     in   sy     cs us sy  id
> > 0 0 0 206928 956040  277  0  2  0  330  4   0   0    117  476    454  0  1  99
> > 0 0 0 206928 956036    0  0  0  0    8  4   0   0     50  123    137  0  0 100
> > 0 0 0 206928 956036    0  0  0  0    0  4   0   0     47  120     92  0  1  99
> > 0 0 0 206928 956036    0  0  0  0    0  4   0   0     43  123    119  0  1  99
> > 0 0 0 206928 956036    0  0  0  0    0  4   0   0     55  132    123  0  1  99
> > 0 0 0 206928 956004    0  0  0  0    0  4   0   0     68  123    185  0  1  99
> > 0 0 0 206928 956036    0  0  0  0    8  4   0   0     86  123    266  0  1  99
> > 0 0 0 206928 956036    0  0  0  0    0  4   0   0     44  125    124  0  0 100
> > 0 0 0 206928 956036    0  0  0  0    0  4   0   0     64  128    164  0  1  99
> > 0 0 0 206928 956036    0  0  0  0    0  4   0   0     42  131    101  0  1  99
> > -----
> > procs   memory      page                   disks      faults           cpu
> > r b w    avm    fre  flt re pi po   fr sr mm0 ad0     in   sy     cs us sy  id
> > 0 0 0 213648 954676  104  0  1  0  121  4   0   0  22299  204  44262  0 10  90
> > 0 0 0 213648 954672    0  0  0  0    8  4   0   0 112259  123 222379  0 44  56
> > 0 0 0 213648 954672    0  0  0  0    0  4   0   0 111792  123 221489  0 43  57
> > 0 0 0 213648 954672    1  0  0  0    0  4   0   0 109887  183 217754  0 43  57
> > 0 0 0 213648 954668    2  0  0  0    0  4   0   0 109543  146 216963  0 44  56
> > 0 0 0 213648 954668    0  0  0  0    0  4   0   0 110142  123 218187  0 45  55
> > 0 0 0 213648 954660  472  0  0  0  474  4   0   0 109340  717 216674  0 42  57
> > 0 0 0 213648 954656    2  0  0  0    0  4   0   0 109459  147 216831  0 43  57
> > 0 0 0 213648 954656    0  0  0  0    0  4   0   0 109462  131 216827  0 43  57
> > 0 0 0 213648 954656    0  0  0  0    0  4   0   0 109454  123 216803  0 42  58
> > -----
> >
> > Dmesg is here: ftp://ftp.wart.ru/pub/misc/tos.dmesg.boot.txt .
> >
> > BTW, some more observations. While downloading a file the system
> > goes into watchdog timeout rather quickly, but the system works. If I
> > try to upload files the system works much longer (for a couple of
> > minutes) but then freezes. No ctrl-alt-esc. Only cold restart works.
>
> I've successfully upgraded to 10.0-RELEASE. Then I tried CURRENT
> (verbose dmesg is here: ftp://ftp.wart.ru/pub/misc/dmesg.boot.a300.txt )
> and I've got watchdog timeouts. The situation is very much alike
> (see previous diagnostics). Just uploads happen very quickly and
> the machine does not freeze and operates well.
>
> This time I have sources and can test patches (if any) rather
> quickly.
>

There is no driver code difference between CURRENT and
10.0-RELEASE. If you don't encounter watchdog timeouts on
10.0-RELEASE, I have no idea what's going on there.
I recall that a couple of users are seeing msk(4) watchdog timeouts on
10.0-RELEASE/CURRENT, so I started to think about r234666, which was
not merged to stable/9 and stable/8. Could you back out r234666
and let me know whether it makes any difference for you?
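
When re-testing after a change, it may help to repeat the calm/storm "vmstat -i" comparison quoted earlier in this thread. The following is only a minimal sketch of how two such snapshots could be diffed; it is not part of the driver or the base system. The function names parse_vmstat_i and busiest_source are made up for illustration, the sample data is copied from the tables quoted in this thread, and the script assumes both snapshots were taken during the same boot so that the totals are cumulative.

```python
# Sketch: find which interrupt source grew the most between two
# `vmstat -i` snapshots. Helper names are hypothetical.

CALM = """\
interrupt total rate
irq1: atkbd0 1025 2
irq9: acpi0 204 0
irq14: ata0 327 0
irq16: uhci0+ 246 0
irq20: hpet0 22472 52
irq23: uhci2 ehci1 10341 24
irq256: hdac0 52 0
irq257: mskc0 258 0
irq258: ahci0 221 0
Total 35146 81
"""

STORM = """\
interrupt total rate
irq1: atkbd0 1508 2
irq9: acpi0 234 0
irq14: ata0 409 0
irq16: uhci0+ 246 0
irq20: hpet0 72288 131
irq23: uhci2 ehci1 10846 19
irq256: hdac0 52 0
irq257: mskc0 4419760 8021
irq258: ahci0 221 0
Total 4505564 8177
"""


def parse_vmstat_i(text):
    """Return {source: total} parsed from `vmstat -i` output."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "irq257: mskc0 4419760 8021"; the header
        # and the "Total" row do not start with "irq" and are skipped.
        if len(fields) < 3 or not fields[0].startswith("irq"):
            continue
        # Everything except the last two fields (total, rate) is the name.
        source = " ".join(fields[:-2])
        totals[source] = int(fields[-2])
    return totals


def busiest_source(before, after):
    """Return (source, delta) for the source whose total grew the most."""
    deltas = {src: after[src] - before.get(src, 0) for src in after}
    return max(deltas.items(), key=lambda item: item[1])


if __name__ == "__main__":
    source, delta = busiest_source(parse_vmstat_i(CALM), parse_vmstat_i(STORM))
    print(f"{source}: +{delta} interrupts")
```

With the figures quoted in this thread, the source it picks out is irq257: mskc0, which matches the storm seen on the msk(4) controller.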