Date:      Mon, 6 Jan 2014 14:04:00 +0900
From:      Yonghyeon PYUN <pyunyh@gmail.com>
To:        Curtis Villamizar <curtis@ipv6.occnc.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID:  <20140106050400.GA1372@michelle.cdnetworks.com>
In-Reply-To: <201401060430.s064UjCG090668@maildrop2.v6ds.occnc.com>
References:  <201401012144.s01LivSi099164@maildrop2.v6ds.occnc.com> <201401060430.s064UjCG090668@maildrop2.v6ds.occnc.com>

On Sun, Jan 05, 2014 at 11:30:45PM -0500, Curtis Villamizar wrote:
> 
> Pyun,
> 
> Replying to self since I did not get your reply but saw it on the
> stable10 mailing list archive.  I pasted in your responses so it's
> really a reply to you.
> 
> Sorry for the delay to your email on Jan 2.  I had some email trouble
> (self induced by DNS change) that should be fixed now.
> 

Ok.

[...]

> >  
> > Marvell calls its DMA descriptors LEs. The controller supports at
> > most 4096 status LEs, which should be large enough to hold status
> > LE updates (on dual-port controllers, the status DMA block is
> > shared between both ports).
> 
> Yes.  I am aware of this, but regardless I ran into this bug and
> forcing MSK_TX_RING_CNT and MSK_RX_RING_CNT removed the symptom.
> 

Ok.

> > > This does seem to me like a regression in 10.0 caused by the change to
> > > if_mskreg.h (Nov 16).  The workaround so far has been fine for me.
> >  
> > If you revert the change made in r258790, does the issue go away?
> > Are you running amd64?  Because you touched the
> > #if (BUS_SPACE_MAXADDR > 0xFFFFFFFF) block in if_mskreg.h I guess
> > you're running amd64, but I need confirmation.  If your system has
> > more than 4GB of memory on amd64, could you reduce the amount of
> > available memory to less than 4GB (i.e. set hw.physmem in
> > loader.conf)?
> > Also, would you show me the dmesg(8) output (msk(4) and e1000phy(4)
> > only) so I can identify the exact Yukon controller model?
> 
> Yes it is AMD64.
> 
> uname -m
> amd64
> 
> CPU: AMD Athlon(tm) II X2 B24 Processor (2992.58-MHz K8-class CPU)
>  Origin = "AuthenticAMD" Id = 0x100f63 Family = 0x10 Model = 0x6 Stepping = 3
>  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>  Features2=0x802009<SSE3,MON,CX16,POPCNT>
>  AMD
>  Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>  AMD
>  Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
>  TSC: P-state invariant
> 
> pciconf -lcv
> [...]
> mskc0@pci0:2:0:0:       class=0x020000 card=0x305817aa chip=0x438011ab
>   rev=0x10 hdr=0x00
>     vendor     = 'Marvell Technology Group Ltd.'
>     device     = '88E8057 PCI-E Gigabit Ethernet Controller'
>     class      = network
>     subclass   = ethernet
>     cap 01[48] = powerspec 3  supports D0 D1 D2 D3  current D0
>     cap 05[5c] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 10[c0] = PCI-Express 2 legacy endpoint max data 128(128) link x1(x1)
>                  speed 2.5(2.5) ASPM disabled(L0s/L1)
>     ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
>     ecap 0003[130] = Serial 1 ef3856ffffdc9cc8
> 

dmesg(8) output will show more useful information than pciconf(8)
in this case.  There are too many Yukon II variants.

> Please let me know what I could do to help debug this.
> 

If you have more than 4GB of memory, try limiting the amount of
usable memory to less than 4GB (e.g. 3GB, via hw.physmem in
/boot/loader.conf) and let me know whether that makes any
difference for you.
Note that in order to test this you have to back out your local
changes first.
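
For reference, a minimal loader.conf fragment for this test might look
like the following.  The 3G value is only the example figure mentioned
above; any value below 4GB should serve the same purpose.

```
# /boot/loader.conf
# Cap usable physical memory below 4GB for the msk(4) DMA test.
hw.physmem="3G"
```

A reboot is needed for the setting to take effect, and the line should
be removed again once the test is done.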

> > > involved.
> I did not back out the change entirely (yet).  I only effectively
> backed out the change to the two constants MSK_TX_RING_CNT and
> MSK_RX_RING_CNT and that was enough to make the problem go away.
> 

I'm under the impression that the controller may have an additional
DMA addressing limitation where the TX/RX and status LEs must all
have the same high (upper 32 bits) DMA address.  Due to the lack of
documentation I'm not sure about that.  If the issue does not happen
with 3GB of memory, we will have to use 32-bit DMA addressing.
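
To make the suspected limitation concrete, here is a minimal sketch in
C of what "same high DMA address" would mean.  It is not taken from
the msk(4) source; the helper names (dma_hi32, dma_in_one_4g_window,
dma_rings_share_hi32) are hypothetical, and the constraint itself is
only a guess, as said above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit bus address. */
static inline uint32_t
dma_hi32(uint64_t addr)
{
	return ((uint32_t)(addr >> 32));
}

/* True if the block [addr, addr + len) stays inside one 4GB window. */
static inline bool
dma_in_one_4g_window(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return (dma_hi32(addr) == dma_hi32(addr + len - 1));
}

/*
 * True if the TX, RX and status rings all start in the same 4GB
 * window, i.e. they share one high address (hypothetical constraint).
 */
static inline bool
dma_rings_share_hi32(uint64_t tx, uint64_t rx, uint64_t stat)
{
	return (dma_hi32(tx) == dma_hi32(rx) &&
	    dma_hi32(rx) == dma_hi32(stat));
}
```

With all memory below 4GB every ring has a high address of 0, so the
check passes trivially.  That is why the 3GB test would help tell
whether such a limitation exists.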


