From: Gareth Wyn Roberts
To: "pyunyh@gmail.com"
CC: "freebsd-stable@freebsd.org"
Subject: RE: msk msk0 watchdog timeout freeze hang lock stop problem
Date: Wed, 15 Apr 2015 21:52:09 +0000

I've inserted code to print some values which show the differences between specifying 4096 or 8192 for MSK_STAT_ALIGN. In both cases the status buffer has length 0x4000 (8x2048=16K) but the alignments are different as expected, with start addresses 0x5c3b000 and 0xbdc2c000 respectively.

The following values were output from functions msk_status_dma_alloc(), msk_dmamap_cb() and msk_handle_events().
The "Break #n" labels refer to breaks in msk_handle_events(). "#1" occurs if ((control & HW_OWNER) == 0), "#5" is OP_RXSTAT and "#6" is OP_TXINDEXLE.

The first output is for MSK_STAT_ALIGN=8192. It continues normally. Although not shown here, it reaches cons=2047 then cons=0 as expected.

The second output is for MSK_STAT_ALIGN=4096. Although there can be isolated occurrences of "Break #1" (e.g. cons=196) (are these to be expected?), it continues normally until cons=512. At this point it continually invokes the "#1" block because the msk_control from msk_stat_ring[512] is always zero, and the network hangs immediately. This suggests the Yukon Ultra 2 88E8057 can't access the next 4096-byte memory block, but why not?

Please let me know if any further information would be helpful.

------------ Start of MSK_STAT_ALIGN=8192 output -----------------------------
mskc0: mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
mskc0: Successful creation of DMA tag
mskc0: sc->msk_stat_count=2048
mskc0: stat_sz=16384
mskc0: sc->msk_stat_tag=0xfffff800050b99a0
mskc0: Successful allocation of DMA'able memory for status ring
mskc0: sc->msk_stat_map=0xfffff800050b99a8
msk_dmamap_cb (stat): nseg=1
msk_dmamap_cb (stat): error=0
msk_dmamap_cb (stat): segs[0].ds_addr=3183656960=0xbdc2c000
msk_dmamap_cb (stat): segs[0].ds_len=16384=0x4000
mskc0: Successful load of DMA'able memory for status ring
mskc0: sc->msk_stat_ring_paddr=3183656960=0xbdc2c000
msk0: on mskc0
msk0: Ethernet address: 00:13:77:e9:df:eb
miibus0: on msk0
e1000phy0: PHY 0 on miibus0
e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
...
mskc0: msk_handle_events: Break #6 cons=0 csrread=1
mskc0: msk_handle_events: Break #5 cons=1 csrread=2
mskc0: msk_handle_events: Break #6 cons=2 csrread=3
mskc0: msk_handle_events: Break #5 cons=3 csrread=5
mskc0: msk_handle_events: Break #6 cons=4 csrread=6
mskc0: msk_handle_events: Break #6 cons=5 csrread=6
mskc0: msk_handle_events: Break #6 cons=6 csrread=7
mskc0: msk_handle_events: Break #5 cons=7 csrread=8
mskc0: msk_handle_events: Break #5 cons=8 csrread=10
mskc0: msk_handle_events: Break #6 cons=9 csrread=10
...
mskc0: msk_handle_events: Break #5 cons=510 csrread=511
mskc0: msk_handle_events: Break #6 cons=511 csrread=512
mskc0: msk_handle_events: Break #5 cons=512 csrread=513
mskc0: msk_handle_events: Break #5 cons=513 csrread=514
mskc0: msk_handle_events: Break #6 cons=514 csrread=515
mskc0: msk_handle_events: Break #5 cons=515 csrread=516
...etc.

------------ Start of MSK_STAT_ALIGN=4096 output -----------------------------
mskc0: mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
mskc0: Successful creation of DMA tag
mskc0: sc->msk_stat_count=2048
mskc0: stat_sz=16384
mskc0: sc->msk_stat_tag=0xfffff800050b99a0
mskc0: Successful allocation of DMA'able memory for status ring
mskc0: sc->msk_stat_map=0xfffff800050b99a8
msk_dmamap_cb (stat): nseg=1
msk_dmamap_cb (stat): error=0
msk_dmamap_cb (stat): segs[0].ds_addr=96710656=0x5c3b000
msk_dmamap_cb (stat): segs[0].ds_len=16384=0x4000
mskc0: Successful load of DMA'able memory for status ring
mskc0: sc->msk_stat_ring_paddr=96710656=0x5c3b000
msk0: on mskc0
msk0: Ethernet address: 00:13:77:e9:df:eb
miibus0: on msk0
e1000phy0: PHY 0 on miibus0
e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
...
mskc0: msk_handle_events: Break #5 cons=0 csrread=2
mskc0: msk_handle_events: Break #5 cons=1 csrread=2
mskc0: msk_handle_events: Break #5 cons=2 csrread=3
mskc0: msk_handle_events: Break #5 cons=3 csrread=4
mskc0: msk_handle_events: Break #5 cons=4 csrread=5
mskc0: msk_handle_events: Break #5 cons=5 csrread=7
mskc0: msk_handle_events: Break #5 cons=6 csrread=7
mskc0: msk_handle_events: Break #5 cons=7 csrread=9
mskc0: msk_handle_events: Break #5 cons=8 csrread=9
mskc0: msk_handle_events: Break #5 cons=9 csrread=10
mskc0: msk_handle_events: Break #5 cons=10 csrread=11
...
mskc0: msk_handle_events: Break #6 cons=194 csrread=197
mskc0: msk_handle_events: Break #5 cons=195 csrread=197
mskc0: msk_handle_events: Break #1 cons=196 csrread=197
mskc0: msk_handle_events: sd=0xfffffe011e23b620 sd->msk_control=1610612806 control=1610612806
mskc0: msk_handle_events: Break #5 cons=196 csrread=197
mskc0: msk_handle_events: Break #5 cons=197 csrread=198
...
mskc0: msk_handle_events: Break #5 cons=510 csrread=511
mskc0: msk_handle_events: Break #5 cons=511 csrread=512
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=513
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
...
mskc0: msk_handle_events: Break #1 cons=512 csrread=519
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
mskc0: msk_handle_events: Break #1 cons=512 csrread=519
mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
...etc

________________________________________
From: owner-freebsd-stable@freebsd.org [owner-freebsd-stable@freebsd.org] on behalf of Yonghyeon PYUN [pyunyh@gmail.com]
Sent: 13 April 2015 09:13
To: Gareth Wyn Roberts
Cc: freebsd-stable@freebsd.org
Subject: Re: msk msk0 watchdog timeout freeze hang lock stop problem

On Sun, Apr 12, 2015 at 05:57:34PM +0000, Gareth Wyn Roberts wrote:
> I've run into problems using the msk device where initially it works well enough to set DHCP etc. but stops/freezes as soon as any appreciable network traffic occurs.
> There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).
>
> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
>
> mskc0: mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
> msk0: on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0: on msk0
> e1000phy0: PHY 0 on miibus0
> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
>
> The network worked when using the i386 release, but failed for the amd64 release (as reported previously) which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another unrelated part of the kernel was changed (a usb driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.
>
> It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode.
> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.

Thanks for letting me know your findings. I really appreciate that.
I recall that the alignment requirement of status LEs (List Elements in Marvell terms) is 2048 and the maximum size of the status LEs is 4096 bytes (the actual alignment seems to be a much lower value like 32 or 64 bytes, but alignment 2048 is chosen to avoid silicon bugs). Later experiments showed some variants of Yukon II require 4096 bytes alignment and I changed the alignment to 4096 in the past.
It seems your finding indicates msk(4) needs 8192 alignment for status LEs. However, this does not explain how and why the same code in 8.x/9.x works well. In addition, it's not common to require an alignment greater than PAGE_SIZE on x86 given that the maximum size of the DMA buffer is 4096 bytes. I have to check whether there was a change in bus_dma(9) between 8.x/9.x and 10.x, but that will take a while because I have little spare time.

Probably you can verify the DMA address of status LEs meets the following requirements both on i386 and amd64.
 - Alignment is 4096.
 - Number of DMA segments is 1.
 - DMA segment base address plus DMA segment size does not cross a PAGE_SIZE boundary.

>
> Here's the patch to if_mskreg.h
> --- if_mskreg.h-orig 2014-11-11 20:02:58.000000000 +0000
> +++ if_mskreg.h 2015-04-12 18:47:20.000000000 +0100
> @@ -2179,9 +2179,11 @@
>  * At first I guessed 8 bytes, the size of a single descriptor, would be
>  * required alignment constraints. But, it seems that Yukon II have 4096
>  * bytes boundary alignment constraints.
> + * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
> + * requires 8192 byte alignment to prevent locking.
>  */
>  #define MSK_RING_ALIGN 4096
> -#define MSK_STAT_ALIGN 4096
> +#define MSK_STAT_ALIGN 8192
>
>
> The patches to both files which also implement a MSK_64BIT_DMA_DISABLE flag are attached. Perhaps the developers would consider committing these as it may be useful for future debugging.
>

If you have more than 4GB of memory installed and disable 64bit DMA addressing, msk(4) will use bounce buffers. Passing packets through bounce buffers involves a copy operation, which is expensive. You can check the hw.busdma sysctl node to see whether there are drivers that use bounce buffers. And if you want to disable 64bit DMA on 64bit architectures, add '#undef MSK_64BIT_DMA' just below the BUS_SPACE_MAXADDR check in if_mskreg.h.