Date: Thu, 31 Mar 2011 11:30:54 -0700
From: YongHyeon PYUN <pyunyh@gmail.com>
To: Yamagi Burmeister <lists@yamagi.org>
Cc: freebsd-net@freebsd.org, yongari@freebsd.org
Subject: Re: Kernel memory corruption(?) with age(4)
Message-ID: <20110331183054.GC11981@michelle.cdnetworks.com>
In-Reply-To: <20110331181651.GB11981@michelle.cdnetworks.com>
References: <alpine.BSF.2.00.1103301620110.17846@saya.home.yamagi.org>
 <20110330173145.GB8601@michelle.cdnetworks.com>
 <alpine.BSF.2.00.1103302137330.1646@maka.home.yamagi.org>
 <20110330202858.GC8601@michelle.cdnetworks.com>
 <alpine.BSF.2.00.1103310859310.3082@saya.home.yamagi.org>
 <20110331171302.GA11981@michelle.cdnetworks.com>
 <alpine.BSF.2.00.1103311951060.2217@maka.home.yamagi.org>
 <20110331181651.GB11981@michelle.cdnetworks.com>

--JgQwtEuHJzHdouWu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Mar 31, 2011 at 11:16:52AM -0700, YongHyeon PYUN wrote:
> On Thu, Mar 31, 2011 at 08:07:17PM +0200, Yamagi Burmeister wrote:
> > On Thu, 31 Mar 2011, YongHyeon PYUN wrote:
> >
> > >>All boxes are quadcore machines with 8GB RAM, running FreeBSD/amd64.
> > >>After limiting the memory via hw.physmem to 3GB the problems are gone.
> > >>The box is running crashfree for more than 6 hours and has served over
> > >>300GB of data via age(4).
> > >>
> > >
> > >Thanks for testing. Remove the hw.physmem configuration and try
> > >attached patch and let me know how it goes.
> >
> > Thanks for your help, but the patch doesn't work. Another random panic -
> > this time "page fault in kernel mode" - with nothing age(4) or network
> > stack related stuff in the backtrace...
> >
> > Maybe it'll help to know about a bug fix in the linux atl1 driver, now
> > replaced by atlx. In git commit 5f08e46b621a769e52a9545a23ab1d5fb2aec1d4
> > 64 bit DMA was disabled:
> >
> >   64-bit DMA causes data corruption with atl1. We don't know why, and
> >   Atheros is working on it. For now, just use 32-bit DMA. This is a big
> >   hack that is probably wrong, but it stops the bleeding.
> >
> > There was no later follow up on it. I think that this can't be problem
> > on FreeBSD but maybe I'm reading the driver code wrong. The kernel.org
> > gitweb URL is:
> >
> > http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.23.y.git;a=commitdiff;h=5f08e46b621a769e52a9545a23ab1d5fb2aec1d4
> >
>
> Thanks a lot! It seems the L1 controller has data corruption issue
> when 64bit DMA addressing is used. Try this one.

Oops, there was a bug in previous patch. Try this instead.

--JgQwtEuHJzHdouWu
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="age.dma.diff3"

Index: sys/dev/age/if_age.c
===================================================================
--- sys/dev/age/if_age.c	(revision 220116)
+++ sys/dev/age/if_age.c	(working copy)
@@ -1092,11 +1092,14 @@
 	 * Create Tx/Rx buffer parent tag.
 	 * L1 supports full 64bit DMA addressing in Tx/Rx buffers
 	 * so it needs separate parent DMA tag.
+	 * XXX
+	 * It seems enabling 64bit DMA causes data corruption. Limit
+	 * DMA address space to 32bit.
 	 */
 	error = bus_dma_tag_create(
 	    bus_get_dma_tag(sc->age_dev), /* parent */
 	    1, 0,			/* alignment, boundary */
-	    BUS_SPACE_MAXADDR,		/* lowaddr */
+	    BUS_SPACE_MAXADDR_32BIT,	/* lowaddr */
 	    BUS_SPACE_MAXADDR,		/* highaddr */
 	    NULL, NULL,			/* filter, filterarg */
 	    BUS_SPACE_MAXSIZE_32BIT,	/* maxsize */
@@ -2452,6 +2455,9 @@
 		/* Update the consumer index. */
 		sc->age_cdata.age_rr_cons = rr_cons;
 
+		bus_dmamap_sync(sc->age_cdata.age_rx_ring_tag,
+		    sc->age_cdata.age_rx_ring_map,
+		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 		/* Sync descriptors. */
 		bus_dmamap_sync(sc->age_cdata.age_rr_ring_tag,
 		    sc->age_cdata.age_rr_ring_map,
--JgQwtEuHJzHdouWu--
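
For readers following the thread, here is a rough userland sketch of what
the lowaddr change in the patch means. It is not driver code and does not
call bus_dma(9). It only models the documented rule that physical addresses
above lowaddr and up to highaddr are treated as unreachable by the device,
so buffers there get bounced into lower memory. The names SKETCH_MAXADDR,
SKETCH_MAXADDR_32BIT and sketch_needs_bounce() are made up for this
illustration, and the constant values are assumed for amd64. The real code
also considers alignment, boundary and filter arguments, which are ignored
here.

```c
#include <stdint.h>

/* Hypothetical stand-ins for BUS_SPACE_MAXADDR{,_32BIT}; amd64 values assumed. */
#define SKETCH_MAXADDR		UINT64_C(0xFFFFFFFFFFFFFFFF)
#define SKETCH_MAXADDR_32BIT	UINT64_C(0x00000000FFFFFFFF)

/*
 * Simplified model of the bus_dma(9) exclusion window: a physical
 * address in (lowaddr, highaddr] is not handed to the device directly.
 * Returns 1 if a buffer at paddr would have to be bounced, 0 otherwise.
 */
static int
sketch_needs_bounce(uint64_t paddr, uint64_t lowaddr, uint64_t highaddr)
{

	return (paddr > lowaddr && paddr <= highaddr);
}
```

With lowaddr set to SKETCH_MAXADDR nothing is ever excluded, so on an 8GB
box a buffer at, say, 6GB goes to the controller as a 64bit address. With
SKETCH_MAXADDR_32BIT that same buffer falls inside the window and is
bounced. It would also fit the report that limiting hw.physmem to 3GB hid
the problem, if that setting left no pages above 4GB to hand out.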