From owner-freebsd-stable@FreeBSD.ORG Mon Jan 6 15:20:43 2014
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CEB274EC;
 Mon, 6 Jan 2014 15:20:43 +0000 (UTC)
Received: from maildrop2.v6ds.occnc.com (maildrop2.v6ds.occnc.com [IPv6:2001:470:88e6:3::232])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 94C921A2D;
 Mon, 6 Jan 2014 15:20:43 +0000 (UTC)
Received: from harbor3.ipv6.occnc.com (harbor3.v6ds.occnc.com [IPv6:2001:470:88e6:3::239])
 (authenticated bits=128)
 by maildrop2.v6ds.occnc.com (8.14.7/8.14.7) with ESMTP id s06FKeVG009399;
 Mon, 6 Jan 2014 10:20:40 -0500 (EST)
 (envelope-from curtis@ipv6.occnc.com)
Message-Id: <201401061520.s06FKeVG009399@maildrop2.v6ds.occnc.com>
To: pyunyh@gmail.com
From: Curtis Villamizar
Subject: Re: regression: msk0 watchdog timeout and interrupt storm
In-reply-to: Your message of "Mon, 06 Jan 2014 14:04:00 +0900." <20140106050400.GA1372@michelle.cdnetworks.com>
Date: Mon, 06 Jan 2014 10:20:40 -0500
Cc: freebsd-stable@freebsd.org, Curtis Villamizar
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
Reply-To: curtis@ipv6.occnc.com
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Mon, 06 Jan 2014 15:20:43 -0000

In message <20140106050400.GA1372@michelle.cdnetworks.com>
Yonghyeon PYUN writes:

> On Sun, Jan 05, 2014 at 11:30:45PM -0500, Curtis Villamizar wrote:
> >
> > Pyun,
> >
> > Replying to self since I did not get your reply but saw it on the
> > stable10 mailing list archive. I pasted in your responses so its
> > really a reply to you.
> >
> > Sorry for the delay to your email on Jan 2.
> > I had some email trouble
> > (self induced by DNS change) that should be fixed now.
> >
>
> Ok.
>
> [...]
>
> > > Marvell calls DMA descriptors as LEs. The maximum number of status
> > > LEs supported by controller is 4096 and it should be large enough
> > > to hold status LE update(for dual-port controllers, the status
> > > DMA block is shared between each port).
> >
> > Yes. I am aware of this, but regardless I ran into this bug and
> > forcing MSK_TX_RING_CNT and MSK_RX_RING_CNT removed the symptom.
> >
>
> Ok.
>
> > > > This does seem to me like a regression in 10.0 caused by the change to
> > > > if_mskreg.h (Nov 16). The workaround so far has been fine for me.
> > >
> > > If you revert the change made in r258790, does the issue go away?
> > > Are you running amd64? Because you touched #if (BUS_SPACE_MAXADDR >
> > > 0xFFFFFFFF) block in if_mskreg.h I guess you're running amd64 but
> > > I need confirmation. If your system have more than 4GB memory on
> > > amd64, could you reduce amount of available memory to be less than
> > > 4GB?(i.e. set hw.physmem in loader.conf)
> > > Also would you show me dmesg(8) output(msk(4) and e1000phy(4) only)
> > > to know exact Yukon controller model?
> >
> > Yes it is AMD64.
> >
> > uname -m
> > amd64
> >
> > CPU: AMD Athlon(tm) II X2 B24 Processor (2992.58-MHz K8-class CPU)
> > Origin = "AuthenticAMD" Id = 0x100f63 Family = 0x10 Model = 0x6 Stepping = 3
> > Features=0x178bfbff
> > Features2=0x802009
> > AMD Features=0xee500800
> > AMD Features2=0x37ff
> > TSC: P-state invariant
> >
> > pciconf -lcv
> > [...]
> > mskc0@pci0:2:0:0: class=0x020000 card=0x305817aa chip=0x438011ab
> > rev=0x10 hdr=0x00
> > vendor = 'Marvell Technology Group Ltd.'
> > device = '88E8057 PCI-E Gigabit Ethernet Controller'
> > class = network
> > subclass = ethernet
> > cap 01[48] = powerspec 3 supports D0 D1 D2 D3 current D0
> > cap 05[5c] = MSI supports 1 message, 64 bit enabled with 1 message
> > cap 10[c0] = PCI-Express 2 legacy endpoint max data 128(128) link x1(x1)
> > speed 2.5(2.5) ASPM disabled(L0s/L1)
> > ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
> > ecap 0003[130] = Serial 1 ef3856ffffdc9cc8
> >
>
> dmesg(8) output will show more useful information than pciconf(8)
> in this case. There are too many Yukon II variants.

Here are some relevant parts of dmesg. Is there anything else you
want?

real memory = 2147483648 (2048 MB)
avail memory = 2061438976 (1965 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1

pcib2: irq 19 at device 7.0 on pci0
pci2: on pcib2 on pci1
pcib2: irq 19 at device 7.0 on pci0
pci2: on pcib2
mskc0: port 0xe800-0xe8ff mem 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2
msk0: on mskc0
msk0: Ethernet address: c8:9c:dc:56:38:ef
miibus0: on msk0
e1000phy0: PHY 0 on miibus0
e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

The computer is a Lenovo ThinkCentre (small tower) and not an uncommon
machine, so others are likely to run into this.

> > Please let me know what I could do to help debug this.
> >
>
> If you have more than 4GB memory, try reducing the amount of
> memory(e.g. 3G) in /boot/loader.conf and let me know whether that
> makes any difference for you.
> Note, in order to test this you have to back out your local
> changes.

Only have 2 GB memory.

> > > > involved.
> > I did not back out the change entirely (yet).
> > I only effectively
> > backed out the change to the two constants MSK_TX_RING_CNT and
> > MSK_RX_RING_CNT and that was enough to make the problem go away.
> >
>
> I'm under the impression that the controller may have additional
> DMA addressing limitation where TX/RX and status LEs should have
> the same high DMA address. Due to the lack of documentation I'm
> not sure about that. If the issue does not happen with 3GB memory,
> we have to use 32bit DMA addressing.

We have 2 GB memory so the problem with the original code does happen
with less than 4 GB memory. Everything has the same high address of
zero.

Is there anything else you want me to try?

Curtis

btw - I added someone from Marvell on the Bcc in case he wants to join
in on the conversation or give us a hint in private email.
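
For reference, the "same high DMA address" condition discussed above
can be sketched in C as below. This is only an illustration of the
arithmetic, not code from if_msk.c: the helper names (addr_hi,
same_high_addr, fits_32bit_dma) are hypothetical, and it assumes that
"high address" means the upper 32 bits of a 64-bit bus address.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit bus address (hypothetical helper). */
static uint32_t
addr_hi(uint64_t paddr)
{
	return ((uint32_t)(paddr >> 32));
}

/*
 * Return 1 if every address in addrs[] has the same upper 32 bits,
 * 0 otherwise.  An empty or single-entry array trivially passes.
 */
static int
same_high_addr(const uint64_t *addrs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (addr_hi(addrs[i]) != addr_hi(addrs[0]))
			return (0);
	}
	return (1);
}

/*
 * Return 1 if a buffer of len bytes starting at paddr lies entirely
 * below 4 GB, i.e. is reachable with 32-bit DMA addressing.
 * Written so that paddr + len cannot overflow.
 */
static int
fits_32bit_dma(uint64_t paddr, uint64_t len)
{
	return (len <= 0x100000000ULL && paddr <= 0x100000000ULL - len);
}
```

If all RAM sits below 4 GB, as reported for the 2 GB machine in this
thread, addr_hi() is 0 for every ring base and same_high_addr()
returns 1. That matches the observation that the problem occurs even
though everything has a high address of zero.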