Message-ID: <55B36E1A.3040806@gmail.com>
Date: Sat, 25 Jul 2015 14:08:10 +0300
From: Alnis Morics
To:
freebsd-stable@freebsd.org
Subject: Re: msk msk0 watchdog timeout freeze hang lock stop problem

On 04/16/2015 12:52 AM, Gareth Wyn Roberts wrote:
> I've inserted code to print some values which show the differences between specifying 4096 or 8192 for MSK_STAT_ALIGN. In both cases the status buffer has length 0x4000 (8x2048=16K) but the alignments are different as expected, respectively start addresses 0x5c3b000 or 0xbdc2c000.
>
> The following values were output from functions msk_status_dma_alloc(), msk_dmamap_cb() and msk_handle_events().
> The "Break #n" labels refer to breaks in msk_handle_events(). "#1" occurs if ((control & HW_OWNER) == 0), "#5" is OP_RXSTAT and "#6" is OP_TXINDEXLE.
>
> The first output is for MSK_STAT_ALIGN=8192. It continues normally. Although not shown here, it reaches cons=2047 then cons=0 as expected.
>
> The second output is for MSK_STAT_ALIGN=4096. Although there can be isolated occurrences of "Break #1" (e.g. cons=196) (?are these to be expected?), it continues normally until cons=512. At this point it continually invokes the "#1" block because the msk_control from msk_stat_ring[512] is always zero and the network hangs immediately. This suggests the Yukon Ultra 2 88E8057 can't access the next 4096 memory block, but why not?
>
> Please let me know if any further information would be helpful.
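A small C sketch of the address arithmetic behind these figures. Each status element is 16384 / 2048 = 8 bytes, so element 512 begins exactly 4096 bytes into the ring, at the first byte of its second 4 KB page. 0x5c3b000 is 4096-aligned but not 8192-aligned, while 0xbdc2c000 happens to be 16384-aligned. The helper names are hypothetical (not from if_msk.c), and PAGE_BYTES assumes the usual 4 KB x86 page.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the debug output: 2048 status elements in a 16 KB buffer. */
#define STAT_RING_BYTES 16384u                          /* stat_sz */
#define STAT_COUNT      2048u                           /* sc->msk_stat_count */
#define STAT_LE_BYTES   (STAT_RING_BYTES / STAT_COUNT)  /* 8 bytes per element */
#define PAGE_BYTES      4096u                           /* assumed x86 PAGE_SIZE */

/* Largest power of two dividing addr, i.e. the alignment addr actually has. */
static uint64_t natural_alignment(uint64_t addr)
{
    assert(addr != 0);
    return addr & (~addr + 1);  /* keep only the lowest set bit */
}

/* Address of status element `index` in a ring that starts at `base`. */
static uint64_t entry_paddr(uint64_t base, uint32_t index)
{
    assert(index < STAT_COUNT);
    return base + (uint64_t)index * STAT_LE_BYTES;
}

/* Which 4 KB page of the ring (0..3) holds status element `index`. */
static uint32_t ring_page(uint32_t index)
{
    assert(index < STAT_COUNT);
    return (uint32_t)((uint64_t)index * STAT_LE_BYTES / PAGE_BYTES);
}
```

For cons=196 and cons=512 on the 4096-aligned ring this gives 0x5c3b620 and 0x5c3c000. The two sd pointers printed in the 4096 log (0xfffffe011e23b620 and 0xfffffe011e23c000) are likewise 0x9e0 = (512 - 196) * 8 bytes apart.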
>
> ------------ Start of MSK_STAT_ALIGN=8192 output -----------------------------
>
> mskc0: mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
> mskc0: Successful creation of DMA tag
> mskc0: sc->msk_stat_count=2048
> mskc0: stat_sz=16384
> mskc0: sc->msk_stat_tag=0xfffff800050b99a0
> mskc0: Successful allocation of DMA'able memory for status ring
> mskc0: sc->msk_stat_map=0xfffff800050b99a8
> msk_dmamap_cb (stat): nseg=1
> msk_dmamap_cb (stat): error=0
> msk_dmamap_cb (stat): segs[0].ds_addr=3183656960=0xbdc2c000
> msk_dmamap_cb (stat): segs[0].ds_len=16384=0x4000
> mskc0: Successful load of DMA'able memory for status ring
> mskc0: sc->msk_stat_ring_paddr=3183656960=0xbdc2c000
> msk0: on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0: on msk0
> e1000phy0: PHY 0 on miibus0
> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> ...
> mskc0: msk_handle_events: Break #6 cons=0 csrread=1
> mskc0: msk_handle_events: Break #5 cons=1 csrread=2
> mskc0: msk_handle_events: Break #6 cons=2 csrread=3
> mskc0: msk_handle_events: Break #5 cons=3 csrread=5
> mskc0: msk_handle_events: Break #6 cons=4 csrread=6
> mskc0: msk_handle_events: Break #6 cons=5 csrread=6
> mskc0: msk_handle_events: Break #6 cons=6 csrread=7
> mskc0: msk_handle_events: Break #5 cons=7 csrread=8
> mskc0: msk_handle_events: Break #5 cons=8 csrread=10
> mskc0: msk_handle_events: Break #6 cons=9 csrread=10
> ...
> mskc0: msk_handle_events: Break #5 cons=510 csrread=511
> mskc0: msk_handle_events: Break #6 cons=511 csrread=512
> mskc0: msk_handle_events: Break #5 cons=512 csrread=513
> mskc0: msk_handle_events: Break #5 cons=513 csrread=514
> mskc0: msk_handle_events: Break #6 cons=514 csrread=515
> mskc0: msk_handle_events: Break #5 cons=515 csrread=516
> ...etc.
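The "Break #1" condition can be modelled on its own. The C sketch below is a simplified, hypothetical consumer loop in the spirit of msk_handle_events(), not the driver's code: it walks the ring from cons and stops at the first element whose HW_OWNER bit is clear. The constant values are assumptions, named after the if_mskreg.h macros mentioned in this thread; verify them against the header. Under those values, the control=1610612806 printed for cons=196 in the 4096 log is 0x60000046: owner bit clear, opcode byte OP_RXSTAT.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMED values, named after the if_mskreg.h macros cited in the thread;
 * check them against the real header before reusing this sketch.
 */
#define HW_OWNER     0x80000000u  /* element has been handed over by the NIC */
#define STLE_OP_MASK 0xff000000u  /* opcode lives in the top byte */
#define OP_RXSTAT    0x60000000u  /* "Break #5" in the logs */

#define STAT_COUNT   2048u        /* sc->msk_stat_count from the logs */

/* Hypothetical 8-byte status list element; le32toh() conversion is omitted. */
struct stat_le {
    uint32_t status;
    uint32_t control;
};

/* Opcode of a control word, ignoring the owner bit. */
static uint32_t le_opcode(uint32_t control)
{
    return control & ~HW_OWNER & STLE_OP_MASK;
}

/*
 * Consume up to `budget` elements starting at `cons`.  Stops early at the
 * first element whose HW_OWNER bit is clear (the "Break #1" case).  Each
 * consumed element gets HW_OWNER cleared.  Returns the new consumer index,
 * wrapping from STAT_COUNT - 1 back to 0.
 */
static uint32_t consume(struct stat_le *ring, uint32_t cons, uint32_t budget)
{
    assert(cons < STAT_COUNT);
    while (budget-- > 0) {
        uint32_t control = ring[cons].control;
        if ((control & HW_OWNER) == 0)
            break;
        ring[cons].control = control & ~HW_OWNER;
        cons = (cons + 1) % STAT_COUNT;
    }
    return cons;
}
```

If elements 0..511 are marked as owned and element 512 stays zero, consume() advances to 512 and every later call returns 512 at once. That matches the 4096 log, where cons stays at 512 while csrread (presumably the hardware put index) keeps moving from 513 to 519.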
>
> ------------ Start of MSK_STAT_ALIGN=4096 output -----------------------------
>
> mskc0: mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
> mskc0: Successful creation of DMA tag
> mskc0: sc->msk_stat_count=2048
> mskc0: stat_sz=16384
> mskc0: sc->msk_stat_tag=0xfffff800050b99a0
> mskc0: Successful allocation of DMA'able memory for status ring
> mskc0: sc->msk_stat_map=0xfffff800050b99a8
> msk_dmamap_cb (stat): nseg=1
> msk_dmamap_cb (stat): error=0
> msk_dmamap_cb (stat): segs[0].ds_addr=96710656=0x5c3b000
> msk_dmamap_cb (stat): segs[0].ds_len=16384=0x4000
> mskc0: Successful load of DMA'able memory for status ring
> mskc0: sc->msk_stat_ring_paddr=96710656=0x5c3b000
> msk0: on mskc0
> msk0: Ethernet address: 00:13:77:e9:df:eb
> miibus0: on msk0
> e1000phy0: PHY 0 on miibus0
> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> ...
> mskc0: msk_handle_events: Break #5 cons=0 csrread=2
> mskc0: msk_handle_events: Break #5 cons=1 csrread=2
> mskc0: msk_handle_events: Break #5 cons=2 csrread=3
> mskc0: msk_handle_events: Break #5 cons=3 csrread=4
> mskc0: msk_handle_events: Break #5 cons=4 csrread=5
> mskc0: msk_handle_events: Break #5 cons=5 csrread=7
> mskc0: msk_handle_events: Break #5 cons=6 csrread=7
> mskc0: msk_handle_events: Break #5 cons=7 csrread=9
> mskc0: msk_handle_events: Break #5 cons=8 csrread=9
> mskc0: msk_handle_events: Break #5 cons=9 csrread=10
> mskc0: msk_handle_events: Break #5 cons=10 csrread=11
> ...
> mskc0: msk_handle_events: Break #6 cons=194 csrread=197
> mskc0: msk_handle_events: Break #5 cons=195 csrread=197
> mskc0: msk_handle_events: Break #1 cons=196 csrread=197
> mskc0: msk_handle_events: sd=0xfffffe011e23b620 sd->msk_control=1610612806 control=1610612806
> mskc0: msk_handle_events: Break #5 cons=196 csrread=197
> mskc0: msk_handle_events: Break #5 cons=197 csrread=198
> ...
> mskc0: msk_handle_events: Break #5 cons=510 csrread=511
> mskc0: msk_handle_events: Break #5 cons=511 csrread=512
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=513
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> ...
> mskc0: msk_handle_events: Break #1 cons=512 csrread=519
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> mskc0: msk_handle_events: Break #1 cons=512 csrread=519
> mskc0: msk_handle_events: sd=0xfffffe011e23c000 sd->msk_control=0 control=0
> ...etc
>
>
> ________________________________________
> From: owner-freebsd-stable@freebsd.org [owner-freebsd-stable@freebsd.org] on behalf of Yonghyeon PYUN [pyunyh@gmail.com]
> Sent: 13 April 2015 09:13
> To: Gareth Wyn Roberts
> Cc: freebsd-stable@freebsd.org
> Subject: Re: msk msk0 watchdog timeout freeze hang lock stop problem
>
> On Sun, Apr 12, 2015 at 05:57:34PM +0000, Gareth Wyn Roberts wrote:
>> I've run into problems using the msk device where initially it works well enough to set up DHCP etc. but stops/freezes as soon as any appreciable network traffic occurs.
>> There are several threads describing similar symptoms over the past two years or more. I've been following several false leads but have finally found a solution (at least it solves my problem).
>>
>> I'm running a standard FreeBSD 10.1-RELEASE and the NIC is detected as:
>>
>> mskc0: mem 0xfa000000-0xfa003fff irq 19 at device 0.0 on pci6
>> msk0: on mskc0
>> msk0: Ethernet address: 00:13:77:e9:df:eb
>> miibus0: on msk0
>> e1000phy0: PHY 0 on miibus0
>> e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
>>
>> The network worked when using the i386 release, but failed for the amd64 release (as reported previously), which prompted me to disable 64-bit DMA (the patch for this is attached below). This worked for the first kernel built but mysteriously failed when another unrelated part of the kernel was changed (a USB driver) and the kernel recompiled. So identical msk driver code worked in one kernel but not the second! This suggested that alignment differences between the two kernels were causing the msk driver to fail. Others have reported varying behaviour depending on different circumstances.
>>
>> It transpires that changing just one value in the if_mskreg.h file solved all my problems. Subsequently I have not been able to make it fail under heavy network traffic in either 32-bit or 64-bit mode.
>> I'm working on 10.1-RELEASE source, i.e. if_msk.c revision 262524 and if_mskreg.h revision 264442.
> Thanks for letting me know your findings. I really appreciate
> that.
> I recall that the alignment requirement of status LEs (List Elements
> in Marvell terms) is 2048 and the maximum size of the status LEs is
> 4096 bytes (actual alignment seems to be a much lower value like 32 or
> 64 bytes, but alignment 2048 is chosen to avoid silicon bugs).
> Later experiments showed some variants of Yukon II require 4096
> bytes alignment and I changed the alignment to 4096 in the past.
> It seems your finding indicates msk(4) needs 8192 alignment for
> status LEs.
>
> However this does not explain how and why the same code in 8.x/9.x
> works well. In addition, it's not common to require alignment size
> greater than PAGE_SIZE on x86 given that the maximum size of DMA
> buffer is 4096 bytes. I have to check whether there was a change
> in bus_dma(9) between 8.x/9.x and 10.x but it needs more time due
> to lack of spare time. Probably you can verify the DMA address of
> status LEs meets the following requirements both on i386 and amd64.
> - Alignment is 4096.
> - Number of DMA segments is 1.
> - DMA segment base address plus DMA segment size does not cross
>   a PAGE_SIZE boundary.
>
>> Here's the patch to if_mskreg.h
>> --- if_mskreg.h-orig 2014-11-11 20:02:58.000000000 +0000
>> +++ if_mskreg.h 2015-04-12 18:47:20.000000000 +0100
>> @@ -2179,9 +2179,11 @@
>>   * At first I guessed 8 bytes, the size of a single descriptor, would be
>>   * required alignment constraints. But, it seems that Yukon II have 4096
>>   * bytes boundary alignment constraints.
>> + * And it seems that the DMA status region for the Yukon Ultra 2 (88E8057)
>> + * requires 8192 byte alignment to prevent locking.
>>   */
>>  #define MSK_RING_ALIGN 4096
>> -#define MSK_STAT_ALIGN 4096
>> +#define MSK_STAT_ALIGN 8192
>>
>>
>> The patches to both files, which also implement an MSK_64BIT_DMA_DISABLE flag, are attached. Perhaps the developers would consider committing these as it may be useful for future debugging.
>>
> If you have more than 4GB memory installed and disable 64bit DMA
> addressing, msk(4) will use bounce buffers. Passing packets
> through bounce buffers involves a copy operation and it costs a lot.
> You can check the hw.busdma sysctl node to see whether there are
> drivers that use bounce buffers. And if you want to disable 64bit
> DMA on 64bit architectures, add '#undef MSK_64BIT_DMA' just below
> the BUS_SPACE_MAXADDR check in if_mskreg.h.
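The three conditions listed in the reply above can be written as a predicate and fed the values from the debug output quoted earlier in this message (ds_len=0x4000, nseg=1). This is a hypothetical C sketch: check_stat_dma() and the REQ_* flags are not part of msk(4) or bus_dma(9), and PAGE_BYTES assumes the 4 KB x86 PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed x86 PAGE_SIZE */

/* One flag per requirement that is NOT met. */
enum {
    REQ_BAD_ALIGN  = 1 << 0,  /* base is not a multiple of `align` */
    REQ_MULTI_SEG  = 1 << 1,  /* more than one DMA segment */
    REQ_CROSSES_PG = 1 << 2   /* [base, base + len) spans a page boundary */
};

/* Returns 0 when all three requirements hold, else an OR of REQ_* flags. */
static unsigned check_stat_dma(uint64_t base, uint64_t len, int nseg,
                               uint64_t align)
{
    unsigned bad = 0;

    /* align must be a non-zero power of two; len must be non-zero. */
    assert(len > 0 && align > 0 && (align & (align - 1)) == 0);
    if ((base & (align - 1)) != 0)
        bad |= REQ_BAD_ALIGN;
    if (nseg != 1)
        bad |= REQ_MULTI_SEG;
    /* The first and last byte of the segment must sit in the same page. */
    if (base / PAGE_BYTES != (base + len - 1) / PAGE_BYTES)
        bad |= REQ_CROSSES_PG;
    return bad;
}
```

With the logged values, both rings meet the first two conditions at 4096 alignment (only 0x5c3b000 fails at 8192), and both trip the third, because any 16384-byte buffer spans four 4 KB pages whatever its alignment. A 4096-byte buffer at 0x5c3b000 would pass all three.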
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

Just tried 10.2-RC1 amd64 GENERIC, and the problem seems to be gone. I was even able to scp a 500 MB file. Could it be related to this fix in BETA2, as mentioned in the announcement: "The watchdog(4) device has been fixed to print to the correct buffer."?

pciconf -lv
[..]
mskc0@pci0:9:0:0: class=0x020000 card=0xc072144d chip=0x435411ab rev=0x00 hdr=0x00
    vendor = 'Marvell Technology Group Ltd.'
    device = '88E8040 PCI-E Fast Ethernet Controller'
    class = network
    subclass = ethernet

-Alnis