FreeBSD Mail Archives

Date:      Thu, 09 Jan 2014 14:21:44 -0500
From:      Curtis Villamizar <curtis@ipv6.occnc.com>
To:        pyunyh@gmail.com
Cc:        freebsd-stable@freebsd.org, Curtis Villamizar <curtis@ipv6.occnc.com>
Subject:   Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID:  <201401091921.s09JLiQd076047@maildrop2.v6ds.occnc.com>
In-Reply-To: Your message of "Thu, 09 Jan 2014 17:27:59 %2B0900." <20140109082759.GA1371@michelle.cdnetworks.com>

index | next in thread | previous in thread | raw e-mail



In message <20140109082759.GA1371@michelle.cdnetworks.com>
Yonghyeon PYUN writes:
 
> On Thu, Jan 09, 2014 at 12:14:50AM -0500, Curtis Villamizar wrote:
> > 
> > In message <20140109011235.GA2813@michelle.cdnetworks.com>
> > Yonghyeon PYUN writes:
> >  
> > > On Tue, Jan 07, 2014 at 10:11:32PM -0500, Curtis Villamizar wrote:
> > > > 
> > > > In message <20140107084938.GA1361@michelle.cdnetworks.com>
> > > > Yonghyeon PYUN writes:
> > > >  
> > > > > On Mon, Jan 06, 2014 at 10:20:40AM -0500, Curtis Villamizar wrote:
> > > > >  
> > > > > [...]
> > > > >  
> > > > > > Here are some relevant parts of dmesg.  Is there anything else you want?
> > > > > > 
> > > > > > real memory  = 2147483648 (2048 MB)
> > > > > > avail memory = 2061438976 (1965 MB)
> > > > > > Event timer "LAPIC" quality 400
> > > > > > ACPI APIC Table: <LENOVO TC-9I   >
> > > > > > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> > > > > > FreeBSD/SMP: 1 package(s) x 2 core(s)
> > > > > >  cpu0 (BSP): APIC ID:  0
> > > > > >  cpu1 (AP): APIC ID:  1
> > > > > > 
> > > > > > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > > > > > pci2: <ACPI PCI bus> on pcib2
> > > > > >  on pci1
> > > > > > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > > > > > pci2: <ACPI PCI bus> on pcib2
> > > > > > mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> port 0xe800-0xe8ff mem
> > > > > > 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2
> > > > > > msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00>
> > > > > > on mskc0
> > > > > > msk0: Ethernet address: c8:9c:dc:56:38:ef
> > > > > > miibus0: <MII bus> on msk0
> > > > > > e1000phy0: <Marvell 88E1149 Gigabit PHY> PHY 0 on miibus0
> > > > > > e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
> > > > > > 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master,
> > > > > > auto, auto-flow
> > > > > > 
> > > > >  
> > > > > Thank you for the info.
> > > > >  
> > > > > > The computer is a Lenovo ThinkCenter (small tower) and not an uncommon
> > > > > > machine so others are likely to run into this.
> > > > > > 
> > > > > > > > Please let me know what I could do to help debug this.
> > > > > > > > 
> > > > > > >  
> > > > > > > If you have more than 4GB memory, try reducing the amount of
> > > > > > > memory(e.g. 3G) in /boot/loader.conf and let me know whether that
> > > > > > > makes any difference for you.
> > > > > > > Note, in order to test this you have to back out your local
> > > > > > > changes.
> > > > > > 
> > > > > > Only have 2 GB memory.
> > > > > > 
> > > > >  
> > > > > Ok, that means my wild guess was not right. :-(
> > > > >  
> > > > >  
> > > > > [...]
> > > > >  
> > > > > > > I'm under the impression that the controller may have additional
> > > > > > > DMA addressing limitation where TX/RX and status LEs should have
> > > > > > > the same high DMA address.  Due to the lack of documentation I'm
> > > > > > > not sure about that.  If the issue does not happen with 3GB memory,
> > > > > > > we have to use 32bit DMA addressing.
> > > > > > 
> > > > > > We have 2 GB memory so the problem with the original code does happen
> > > > > > with less than 4 GB memory.  Everything has the same high address of
> > > > > > zero.
> > > > > > 
> > > > >  
> > > > > Right.
> > > > >  
> > > > > > Is there anything else you want me to try?
> > > > >  
> > > > > msk(4) uses 4KB alignment for status/TX/RX rings.  Your local change
> > > > > will reduce the number of status LEs to be 1024.  Stock msk(4) will
> > > > > use 2048 entries for status LEs and you said the cons variable is
> > > > > stuck with 1024 in this case.  I have no idea this can happen at
> > > > > this moment.
> > > > > Did msk(4) ever work on your box?  If the answer is yes, would you
> > > > > back out both r258780 and your local change?
> > > > 
> > > > This host worked for a few years under FreeBSD 8.x and FreeBSD 9.x,
> > > > most recently 9.2.  I have other machines running stable_10 at about
> > > > the 10.0.beta3 vintage.  I had mostly good luck building the ports I
> > > > use (except openoffice never seems to build).
> > > > 
> > > > I transferred a bunch of small stuff over after upgrading to
> > > > 10.0.beta3 on this machine but then with the big move of a tar backup
> > > > through the GbE, it locked up consisitently.
> > > > 
> > > > I tried my patch and symptom gone.
> > > > 
> > > > > I have a small local diff which was made after seeing r258780.  But
> > > > > I'm not sure whether it makes any difference.
> > > > 
> > > > So it seems what you want me to do is:
> > > > 
> > > >   1.  verify whether just backing out r258780 on if_mskreg.h fixes this.
> > > > 
> > > >   2.  if so, then put back r258780 and try your patch below and see if
> > > >       that fixes it.
> > > > 
> > >  
> > > Correct.
> > 
> > Bad news:
> > 
> >   backing out didn't help.
> > 
> >   the patch didn't help.
> > 
> > Worse news:
> > 
> >   putting the code back to where it was didn't help.
> > 
> > It seems this thing gets wedged in a state where the LE stat entries
> > get written in 0x0 to 0x200 or 0x400, either way to the halfway point
> > in the ring, and then start writing at zero again.  Even though the
> > writes seem to start over at zero, the counter continues to increment
> > past the halfway mark.  Rebooting many times can sometime unwedge it.
> > 
> > Any thoughts on how this could occur?
> > 
> > I'm assuming this is a chip problem because once wedged it is mostly
> > unaffected by reboots.  A short power cycle yields a better chance of
> > coming up with no problem but still a poor chance.
> > 
> > The best way to unwedge it seems to be to put in the code to make
> > msk_stat_count end up as 2048, reboot, and then change the code so
> > that max_stat_count ends up as 1024 and reboot again.
> > 
> > Once unwedged msk0 works fine indefinitely until a reboot.  It may
> > work for a few reboots, then gets wedged again.  This could be a hint
> > as to finding a sequence of writes, perhaps to STAT_LAST_IDX so it
> > consistently comes up unwedged.  Maybe a DELAY(1) after the reset
> > (which is shortly before writing STAT_LAST_IDX).
> > 
> > If you want debug traces showing the contents of msk_stat_ring when
> > the watchdog occurs I can send some.
> > 
>  
> The only wild guess I have is to try removing forced status change
> request.  I've attached the diff.  The code was introduced to
> workaround unreliable detection of link status change but I guess
> e1000phy(4) received many fixes after this change.
>  
> > Can I do this without rebooting to save time?  Maybe config out msk
> > and use kldload and kldunload?
> > 
>  
> Probably yes.  However you want to cold boot your system once your
> controller is wedged(Unplug the power cable and wait tens of
> seconds).  msk(4) highly relies on sane default values and cold
> boot will let controller firmware handle the job.
>  
> > I'll experiment a bit later, maybe tomorrow.  Right now things are
> > running and I have other work to do.
> > 
> > Does anyone have any insights into the code after the write to
> > STAT_LAST_IDX and how the watermark and timer values were picked?
> > 
>  
> The watermark and timer values were borrowed from the vendor's
> FreeBSD driver.
>  
> > > > I think I can find some time to do this maybe immediately or at least
> > > > very soon.  After doing that I will report back.  Please stand by.
> > > > 
> > >  
> > > Thank you.
> > 
> > Thanks for the help with this.
> > 
> > > > > > Curtis
> > > > > > 
> > > > > > btw - I added someone from Marvell on the Bcc in case he wants to join
> > > > > > in on the conversation or give us a hint in private email.
>  
> --qMm9M+Fa2AknHoGS
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="msk.statchg.diff"
>  
> Index: sys/dev/msk/if_msk.c
> ===================================================================
> --- sys/dev/msk/if_msk.c	(revision 260428)
> +++ sys/dev/msk/if_msk.c	(working copy)
> @@ -3399,8 +3399,6 @@ msk_tick(void *xsc_if)
>  	mii = device_get_softc(sc_if->msk_miibus);
>  
>  	mii_tick(mii);
> -	if ((sc_if->msk_flags & MSK_FLAG_LINK) == 0)
> -		msk_miibus_statchg(sc_if->msk_if_dev);
>  	msk_handle_events(sc_if->msk_softc);
>  	msk_watchdog(sc_if);
>  	callout_reset(&sc_if->msk_tick_ch, hz, msk_tick, sc_if);


As of this morning I have a kernel which has not experienced the msk
driver wedge for more than 24 hours.  In the past I was able to get
essentially the same code to wedge.  So here is a review of the
history.

Original driver:

  worked for a while during install of software

  reboot after some configuration changes

  immediately wedged after reboot transmiting a large file

  many reboots and a few power cycles with unchanged driver

  make some changes to source to figure out what is going on

  observe consistent hang when msk_stat_cons hits 0x400

Driver changes:

  tried a few things that did not work

  backed out the RX and TX sizes to 256 each which forces
  msk_stat_count to 1024.

  immediately after going to msk_stat_count = 1024, driver works fine
  and continues to work for days

Further testing:

  back out changes to earlier revision (r226478) to see if problem
  goes away.  problem returns.

  add first set of patches.  problem remains.

  restore previously working code.  problem remains.

  add lots of instrumentation to source code.  problem remains.

Current driver (patches available if you want them)

  go back to original code sizes, then make change to force
  msk_stat_count to 1024.  problem goes away.

  many reboots.  problem does not return.  a few power cycles.
  problem does not return.

Note that the last time I thought I had a driver that worked perfectly
I got things wedged and then couldn't get it unwedged by going back to
what I thought was the same source code.  So I'm not sure if once
again I've just been lucky in the last day or so.

By power cycle I means "shutdown -p now" followed by shutting off the
power strip.  In some cases walk away and come back then power up.  I
realize that due to BIOS power on LAN features the chip may still have
power when the power button is off but the line cord has power, even
if the power on LAN features are disabled.

This isn't making much sense to me.  And its consuming a lot of time.
If you have any other insights/guesses, please let me know.

The only thing we know about the sympton is that when not working the
following happens.  At half the msk_stat_count size the counter keeps
going up but the writes into msk_stat_ring start wrapping back to the
beginning of the ring.  When it works, that wrap does not occur.

Tomorrow I'll go back to original code (current code on stable/10) and
see if the problem returns.  Then I'll try your new patch.

Then hopefully I can do something that doesn't require me to waste
hours rebuilding kernels and rebooting before getting the box
unwedged.  I may also test out my idea of writing a larger value into
STAT_LAST_IDX and then writing the value we want and/or adding a
DELAY(1) here and there in that part of the initialization code.  I'm
taking some wild guesses at what could fix this.

Curtis

home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201401091921.s09JLiQd076047>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation