Date:      Thu, 9 Jan 2014 10:12:35 +0900
From:      Yonghyeon PYUN <pyunyh@gmail.com>
To:        Curtis Villamizar <curtis@ipv6.occnc.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID:  <20140109011235.GA2813@michelle.cdnetworks.com>
In-Reply-To: <201401080311.s083BWf9038444@maildrop2.v6ds.occnc.com>
References:  <20140107084938.GA1361@michelle.cdnetworks.com> <201401080311.s083BWf9038444@maildrop2.v6ds.occnc.com>

On Tue, Jan 07, 2014 at 10:11:32PM -0500, Curtis Villamizar wrote:
> 
> In message <20140107084938.GA1361@michelle.cdnetworks.com>
> Yonghyeon PYUN writes:
>  
> > On Mon, Jan 06, 2014 at 10:20:40AM -0500, Curtis Villamizar wrote:
> >  
> > [...]
> >  
> > > Here are some relevant parts of dmesg.  Is there anything else you want?
> > > 
> > > real memory  = 2147483648 (2048 MB)
> > > avail memory = 2061438976 (1965 MB)
> > > Event timer "LAPIC" quality 400
> > > ACPI APIC Table: <LENOVO TC-9I   >
> > > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> > > FreeBSD/SMP: 1 package(s) x 2 core(s)
> > >  cpu0 (BSP): APIC ID:  0
> > >  cpu1 (AP): APIC ID:  1
> > > 
> > > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > > pci2: <ACPI PCI bus> on pcib2
> > >  on pci1
> > > mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> port 0xe800-0xe8ff mem
> > > 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2
> > > msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00>
> > > on mskc0
> > > msk0: Ethernet address: c8:9c:dc:56:38:ef
> > > miibus0: <MII bus> on msk0
> > > e1000phy0: <Marvell 88E1149 Gigabit PHY> PHY 0 on miibus0
> > > e1000phy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
> > > 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master,
> > > auto, auto-flow
> > > 
> >  
> > Thank you for the info.
> >  
> > > The computer is a Lenovo ThinkCentre (small tower), not an uncommon
> > > machine, so others are likely to run into this.
> > > 
> > > > > Please let me know what I could do to help debug this.
> > > > > 
> > > >  
> > > > If you have more than 4GB memory, try reducing the amount of
> > > > memory (e.g. 3G) in /boot/loader.conf and let me know whether that
> > > > makes any difference for you.
> > > > Note, in order to test this you have to back out your local
> > > > changes.
> > > 
> > > Only have 2 GB memory.
> > > 
> >  
> > Ok, that means my wild guess was not right. :-(
> >  
> >  
> > [...]
> >  
> > > > I'm under the impression that the controller may have an additional
> > > > DMA addressing limitation where TX/RX and status LEs should have
> > > > the same high DMA address.  Due to the lack of documentation I'm
> > > > not sure about that.  If the issue does not happen with 3GB memory,
> > > > we have to use 32-bit DMA addressing.
> > > 
> > > We have 2 GB memory so the problem with the original code does happen
> > > with less than 4 GB memory.  Everything has the same high address of
> > > zero.
> > > 
> >  
> > Right.
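
To make the suspected constraint concrete, here is a minimal sketch of what
"the same high DMA address" would mean for a set of rings.  It is not taken
from msk(4); the helper names and the idea of checking both ends of each ring
are assumptions made for illustration only.

```c
#include <stdint.h>

/* Upper 32 bits of a 64-bit bus address. */
static uint32_t
dma_hi(uint64_t addr)
{
	return ((uint32_t)(addr >> 32));
}

/*
 * Hypothetical check: returns 1 when the first and last byte of every
 * ring share one high 32-bit DMA address, 0 otherwise.
 * base[i] is the bus address of ring i, len[i] its size in bytes.
 * Requires n >= 1 and len[i] >= 1.
 */
static int
rings_share_high_addr(const uint64_t *base, const uint64_t *len, int n)
{
	uint32_t hi = dma_hi(base[0]);
	int i;

	for (i = 0; i < n; i++) {
		if (dma_hi(base[i]) != hi)
			return (0);
		if (dma_hi(base[i] + len[i] - 1) != hi)
			return (0);
	}
	return (1);
}
```

On a machine with 2 GB of memory every ring address is below 4 GB, so the
high part is zero for all of them and such a check passes trivially.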
> >  
> > > Is there anything else you want me to try?
> >  
> > msk(4) uses 4KB alignment for status/TX/RX rings.  Your local change
> > will reduce the number of status LEs to 1024.  Stock msk(4) will
> > use 2048 entries for status LEs and you said the cons variable is
> > stuck at 1024 in this case.  I have no idea how this can happen at
> > the moment.
> > Did msk(4) ever work on your box?  If the answer is yes, would you
> > back out both r258780 and your local change?
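
The ring arithmetic discussed above can be sketched as follows.  This is an
illustration, not driver code: the 8-byte size of a status LE and all names
here are assumptions, and the sketch does not claim to explain why the
consumer index stops at 1024.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_ALIGN	4096u	/* 4KB ring alignment mentioned above */
#define STAT_LE_SIZE	8u	/* assumed size of one status LE in bytes */

/* Size in bytes of a status ring holding 'count' LEs. */
static size_t
stat_ring_bytes(uint32_t count)
{
	return ((size_t)count * STAT_LE_SIZE);
}

/* Returns 1 when a ring base address honours the 4KB alignment. */
static int
ring_is_aligned(uint64_t base)
{
	return ((base & (RING_ALIGN - 1)) == 0);
}

/*
 * Advance a consumer index by one entry with wrap-around.
 * 'count' must be a power of two (1024 and 2048 both are).
 */
static uint32_t
ring_next(uint32_t idx, uint32_t count)
{
	return ((idx + 1) & (count - 1));
}
```

Under these assumptions a 2048-entry status ring occupies 16KB and a
1024-entry ring 8KB.  In a 2048-entry ring, index 1023 advances to 1024;
in a 1024-entry ring it wraps to 0, so a consumer index of 1024 can only
occur with the larger ring.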
> 
> This host worked for a few years under FreeBSD 8.x and FreeBSD 9.x,
> most recently 9.2.  I have other machines running stable_10 at about
> the 10.0.beta3 vintage.  I had mostly good luck building the ports I
> use (except openoffice never seems to build).
> 
> I transferred a bunch of small stuff over after upgrading to
> 10.0.beta3 on this machine but then with the big move of a tar backup
> through the GbE, it locked up consistently.
> 
> I tried my patch and the symptom was gone.
> 
> > I have a small local diff which was made after seeing r258780.  But
> > I'm not sure whether it makes any difference.
> 
> So it seems what you want me to do is:
> 
>   1.  verify whether just backing out r258780 on if_mskreg.h fixes this.
> 
>   2.  if so, then put back r258780 and try your patch below and see if
>       that fixes it.
> 

Correct.

> I think I can find some time to do this maybe immediately or at least
> very soon.  After doing that I will report back.  Please stand by.
> 

Thank you.

> > > Curtis
> > > 
> > > btw - I added someone from Marvell on the Bcc in case he wants to join
> > > in on the conversation or give us a hint in private email.


