Date: Tue, 07 Jan 2014 22:11:32 -0500
From: Curtis Villamizar <curtis@ipv6.occnc.com>
To: pyunyh@gmail.com
Cc: freebsd-stable@freebsd.org, Curtis Villamizar <curtis@ipv6.occnc.com>
Subject: Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID: <201401080311.s083BWf9038444@maildrop2.v6ds.occnc.com>
In-Reply-To: Your message of "Tue, 07 Jan 2014 17:49:38 +0900." <20140107084938.GA1361@michelle.cdnetworks.com>

In message <20140107084938.GA1361@michelle.cdnetworks.com>
Yonghyeon PYUN writes:

> On Mon, Jan 06, 2014 at 10:20:40AM -0500, Curtis Villamizar wrote:
> > [...]
>
> > Here are some relevant parts of dmesg. Is there anything else you want?
> >
> > real memory = 2147483648 (2048 MB)
> > avail memory = 2061438976 (1965 MB)
> > Event timer "LAPIC" quality 400
> > ACPI APIC Table: <LENOVO TC-9I >
> > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> > FreeBSD/SMP: 1 package(s) x 2 core(s)
> > cpu0 (BSP): APIC ID: 0
> > cpu1 (AP): APIC ID: 1
> >
> > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > pci2: <ACPI PCI bus> on pcib2
> > on pci1
> > pcib2: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
> > pci2: <ACPI PCI bus> on pcib2
> > mskc0: <Marvell Yukon 88E8057 Gigabit Ethernet> port 0xe800-0xe8ff mem
> > 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2
> > msk0: <Marvell Technology Group Ltd. Yukon Ultra 2 Id 0xba Rev 0x00>
> > on mskc0
> > msk0: Ethernet address: c8:9c:dc:56:38:ef
> > miibus0: <MII bus> on msk0
> > e1000phy0: <Marvell 88E1149 Gigabit PHY> PHY 0 on miibus0
> > e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
> > 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master,
> > auto, auto-flow
> >
>
> Thank you for the info.
>
> > The computer is a Lenovo ThinkCenter (small tower) and not an uncommon
> > machine so others are likely to run into this.
> >
> > > > Please let me know what I could do to help debug this.
> > > >
> > >
> > > If you have more than 4GB memory, try reducing the amount of
> > > memory(e.g. 3G) in /boot/loader.conf and let me know whether that
> > > makes any difference for you.
> > > Note, in order to test this you have to back out your local
> > > changes.
> >
> > Only have 2 GB memory.
> >
>
> Ok, that means my wild guess was not right. :-(
>
> > [...]
>
> > > I'm under the impression that the controller may have additional
> > > DMA addressing limitation where TX/RX and status LEs should have
> > > the same high DMA address. Due to the lack of documentation I'm
> > > not sure about that. If the issue does not happen with 3GB memory,
> > > we have to use 32bit DMA addressing.
> >
> > We have 2 GB memory so the problem with the original code does happen
> > with less than 4 GB memory. Everything has the same high address of
> > zero.
> >
>
> Right.
>
> > Is there anything else you want me to try?
>
> msk(4) uses 4KB alignment for status/TX/RX rings. Your local change
> will reduce the number of status LEs to be 1024. Stock msk(4) will
> use 2048 entries for status LEs and you said the cons variable is
> stuck with 1024 in this case. I have no idea this can happen at
> this moment.
> Did msk(4) ever work on your box? If the answer is yes, would you
> back out both r258780 and your local change?

This host worked for a few years under FreeBSD 8.x and FreeBSD 9.x,
most recently 9.2.

I have other machines running stable_10 at about the 10.0.beta3
vintage. I had mostly good luck building the ports I use (except
openoffice never seems to build). I transferred a bunch of small
stuff over after upgrading to 10.0.beta3 on this machine, but then
with the big move of a tar backup through the GbE, it locked up
consistently. I tried my patch and the symptom was gone.

> I have a small local diff which was made after seeing r258780. But
> I'm not sure whether it makes any difference.

So it seems what you want me to do is:

  1. verify whether just backing out r258780 on if_mskreg.h fixes
     this.

  2. if so, then put back r258780 and try your patch below and see
     if that fixes it.

I think I can find some time to do this, maybe immediately or at
least very soon. After doing that I will report back. Please stand
by.
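
To make the "same high DMA address" point above concrete, here is a minimal sketch of the check being discussed. It is not taken from msk(4); the helper names and the sample addresses are made up for illustration. It only shows why, when every ring lives below 4 GB, the upper 32 bits of all ring addresses are zero and therefore trivially equal.

```c
#include <stdint.h>

/*
 * Upper 32 bits of a 64-bit bus address.
 * Hypothetical helper for illustration, not part of msk(4).
 */
static uint32_t
addr_hi(uint64_t paddr)
{
	return ((uint32_t)(paddr >> 32));
}

/*
 * Returns 1 when the status, TX and RX ring base addresses all share
 * the same upper 32 bits, 0 otherwise.  This models the suspected
 * (undocumented) controller restriction mentioned in the thread.
 */
static int
rings_share_high_addr(uint64_t stat, uint64_t tx, uint64_t rx)
{
	return (addr_hi(stat) == addr_hi(tx) && addr_hi(tx) == addr_hi(rx));
}
```

With made-up 4KB-aligned addresses such as 0x7f000000, 0x7f001000 and 0x7f002000, the check passes because every high word is zero. It would only fail if one ring were placed at or above 4 GB while another stayed below.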

> > Curtis
> >
> > btw - I added someone from Marvell on the Bcc in case he wants to join
> > in on the conversation or give us a hint in private email.
>
> --ikeVEW9yuYc//A+q
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="msk.type.diff"
>
> Index: sys/dev/msk/if_msk.c
> ===================================================================
> --- sys/dev/msk/if_msk.c (revision 260362)
> +++ sys/dev/msk/if_msk.c (working copy)
> @@ -3600,7 +3600,8 @@ msk_handle_events(struct msk_softc *sc)
>   int rxput[2];
>   struct msk_stat_desc *sd;
>   uint32_t control, status;
> - int cons, len, port, rxprog;
> + int len, port, rxprog;
> + uint16_t cons;
>
>   if (sc->msk_stat_cons == CSR_READ_2(sc, STAT_PUT_IDX))
>   return (0);
> Index: sys/dev/msk/if_mskreg.h
> ===================================================================
> --- sys/dev/msk/if_mskreg.h (revision 260362)
> +++ sys/dev/msk/if_mskreg.h (working copy)
> @@ -2539,8 +2539,8 @@ struct msk_softc {
>   bus_addr_t msk_stat_ring_paddr;
>   int msk_int_holdoff;
>   int msk_process_limit;
> - int msk_stat_cons;
> - int msk_stat_count;
> + uint16_t msk_stat_cons;
> + uint16_t msk_stat_count;
>   struct mtx msk_mtx;
>  };
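
For readers following the diff above, here is a rough sketch of the index arithmetic a status ring consumer relies on when the index is held in a uint16_t and compared against a 16-bit put index. It is an illustration only, not the msk(4) code: the function names are hypothetical, and the 2048-entry count is just the stock figure quoted in the thread. It does not reproduce the hang; it only shows that in plain modular arithmetic nothing is special about index 1024.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance a consumer index by one entry, wrapping to 0 at the ring
 * size.  Hypothetical helper, not taken from msk(4).
 */
static uint16_t
stat_ring_next(uint16_t cons, uint16_t count)
{
	assert(count != 0);	/* guard the modulo below */
	return ((uint16_t)((cons + 1) % count));
}

/*
 * Number of entries the hardware has produced that the driver has not
 * yet consumed, given the consumer index and the put index.
 */
static uint16_t
stat_ring_pending(uint16_t cons, uint16_t put, uint16_t count)
{
	assert(count != 0);
	/* Operands promote to int, so put + count - cons cannot go negative. */
	return ((uint16_t)((put + count - cons) % count));
}
```

With a 2048-entry ring, stepping from 1023 yields 1024 and stepping from 2047 wraps to 0, so a consumer stuck at 1024 would point at something outside this arithmetic, which is consistent with the "I have no idea" remark quoted earlier.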