From owner-freebsd-stable@FreeBSD.ORG  Sat Jan 25 19:35:14 2014
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Message-Id: <201401251935.s0PJZAwH048013@maildrop2.v6ds.occnc.com>
To: Vitaly Magerya <vmagerya@gmail.com>
From: Curtis Villamizar <curtis@ipv6.occnc.com>
Subject: Re: Any news about "msk0 watchdog timeout" regression in 10-RELEASE?
In-reply-to: Your message of "Sat, 25 Jan 2014 14:23:40 +0200."
 <52E3ACCC.1080707@gmail.com>
Date: Sat, 25 Jan 2014 14:35:10 -0500
Cc: Yonghyeon PYUN <pyunyh@gmail.com>, freebsd-stable@freebsd.org,
 curtis@ipv6.occnc.com
Reply-To: curtis@ipv6.occnc.com


In message <52E3ACCC.1080707@gmail.com>
Vitaly Magerya writes:
> 
> On 01/21/14 21:56, Curtis Villamizar wrote:
> > I have mine working but I haven't done a lot of reboots to see if it
> > is a "fix" or luck.
> > 
> > There is a lot of junk that you won't need in the code that is running
> > well for me.  But here it is, as-is warts and all.
> > 
> > I've been swamped lately and haven't had time to look at this further.
>  
> I've tried the patch, and the testing went like this:
> 1) Reboot into fixed kernel => msk0 shows watchdog timeouts.
> 2) Reboot again => no timeouts, but the interrupt storm is still there.
> 3) Disable the machine completely for 15 minutes (take out the battery
>    too; it's a laptop), boot fixed kernel => msk works fine.
> 4) Reboot one more time => msk still works fine.
> 5) Reboot into 10-RELEASE kernel => watchdog timeouts.
> 6) Disable the machine completely for 15 minutes, boot fixed kernel =>
>    still watchdog timeouts.
> 7) Disable the machine for 30 minutes, boot fixed kernel => nope, still
>    doesn't work.
>  
> So, there was a success once (step 3), but I was not able to reproduce
> it after that. Seems to be random.


In my case I didn't have a problem as long as I didn't reboot into the
original kernel, but I only tried a few reboots.  I can't see how a
chip could retain any state after 30 minutes without power, so you are
right that we don't have a fix.  I haven't had time to look at this
further and don't generally reboot this machine (uptime is 16 days
since I last looked at this).

When I'm no longer quite so swamped I'll look at this again.  It seems
we are the only two reporting this problem.  Please send lines of
this form from dmesg:

  mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> port 0xe800-0xe8ff
  mem 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2

  msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00>
  on mskc0

That may indicate we have very similar chips.  If not, this msk
problem may be more widespread.
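
If it helps, here is a rough sketch of one way to pull just those
probe lines out of dmesg.  It assumes dmesg prints each device on a
single line (the examples above were wrapped by my mailer) and that
the probe lines start with "mskcN: <" or "mskN: <".  The function
names are only for illustration; they are not part of any FreeBSD
tool.

```python
import re
import subprocess

# Probe lines look like "mskc0: <...>" or "msk0: <...>".  Requiring the
# "<" skips later messages such as "msk0: watchdog timeout".
MSK_PROBE = re.compile(r"^mskc?\d+: <")


def msk_probe_lines(dmesg_text):
    """Return the msk/mskc device probe lines found in dmesg output."""
    return [line for line in dmesg_text.splitlines() if MSK_PROBE.match(line)]


def read_dmesg():
    """Run dmesg and return its output as text (assumes dmesg is in PATH)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling msk_probe_lines(read_dmesg()) on the affected machine and
printing the result should give the two lines I'm asking for, if the
assumptions above hold.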

Curtis