Date: Thu, 16 Apr 2009 16:24:51 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: Damian Gerow <dgerow@afflictions.org>, Richard Todd <rmtodd@ichotolot.servalan.com>
Subject: Re: ZFS checksum errors on umass(4) insertion
Message-ID: <200904161624.51920.jhb@freebsd.org>
In-Reply-To: <x7myagjvi7.fsf@ichotolot.servalan.com>
References: <49BD117B.2080706@163.com> <20090416144251.GA1605@plebeian.afflictions.org> <x7myagjvi7.fsf@ichotolot.servalan.com>

On Thursday 16 April 2009 2:36:48 pm Richard Todd wrote:
> Damian Gerow <dgerow@afflictions.org> writes:
> > 1) Reverting the extended attribute locking change (r189967) does not change
> > the situation for me. I still experience checksum issues and data loss.
> > (Unsurprisingly.)
> >
> > 2) Without umass loaded, I have been completely unable to trigger the issue.
> >
> > 3) Once umass is loaded, and the symptoms start cropping up, unloading umass
> > does not make them go away (again, unsurprisingly). What I haven't yet
> > tested, but am currently working towards, is whether removing umass stops
> > further checksum errors from occurring.
> >
> > 4) r189967 does remove some LORs for me, even though I don't use (that I
> > know of) extended attributes.
> >
> > 5) It seems that so long as umass is used at all, the symptoms will
> > eventually show up. I've been able to trigger the symptoms by inserting
> > then removing a umass device immediately after boot, then ramping up the
> > workload.
> >
> > 6) The only difference made by vfs.zfs.debug=1 is that zfs reclaims are
> > logged.
> >
> > I'm at a bit of a loss as to what to test next, other than checking for an
> > increased number of checksum errors after unloading umass. However, I'm not
> > convinced this is going to highlight the actual problem. I'm all ears as to
> > what to test for at this point, as I'm running out of ideas.
>
> I have a question or two, and an idea.
>
> The questions:
>
> 1) How much RAM do you have, is it 4G or more? (I'm guessing the
> answer is "yes".)
>
> 2) What does "sysctl -a | grep bounced" say? Check this both before and after
> loading umass and seeing the bug triggered.
>
> My idea: I suspect a bug in the bounce-buffer code that does I/O to memory
> space beyond the area a given piece of hardware can access directly thru DMA.
> I've had some similar issues with checksum errors, and they seem to have gone
> away since lowering hw.physmem to 3400M in loader.conf, which cuts memory
> usage down below the point where anything needs to use bounce buffers.
> You might try lowering hw.physmem and see if that helps; check with the
> "sysctl -a | grep bounced" command, you should be seeing something like
>
> hw.busdma.zone0.total_bounced: 0
> hw.busdma.zone1.total_bounced: 0
> hw.busdma.zone2.total_bounced: 0
>
> if no bounce-buffer usage is going on. (The number of zones may be different
> on your system.)

Can you please try http://www.FreeBSD.org/~jhb/patches/dma_pg.patch? This lines
up with your analysis in that it fixes a problem in the bounce buffer code that
was introduced with the new USB stack (and only triggers when the USB code has
to use a bounce buffer).

--
John Baldwin
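
The before/after check of the total_bounced counters suggested in the quoted
text could be scripted roughly as follows. This is only a sketch: it assumes
the "sysctl -a | grep bounced" output has been saved as text before and after
loading umass, and that each line has the "hw.busdma.zoneN.total_bounced: N"
shape shown above. The function names parse_bounced and bounced_delta are
made up for illustration, and the non-zero counter in the example is a
hypothetical value, not a measurement from this thread.

```python
import re

# Matches lines such as "hw.busdma.zone0.total_bounced: 0" (format taken
# from the sysctl output quoted in the message above).
BOUNCED_RE = re.compile(r"^hw\.busdma\.zone(\d+)\.total_bounced:\s*(\d+)$")


def parse_bounced(sysctl_text):
    """Return {zone_number: total_bounced} parsed from saved sysctl output."""
    counts = {}
    for line in sysctl_text.splitlines():
        match = BOUNCED_RE.match(line.strip())
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


def bounced_delta(before_text, after_text):
    """Return per-zone growth of total_bounced between two snapshots."""
    before = parse_bounced(before_text)
    after = parse_bounced(after_text)
    # A zone missing from the first snapshot is treated as starting at 0.
    return {zone: after[zone] - before.get(zone, 0) for zone in after}


if __name__ == "__main__":
    # Hypothetical snapshots; the value 12 is invented for illustration.
    before = (
        "hw.busdma.zone0.total_bounced: 0\n"
        "hw.busdma.zone1.total_bounced: 0\n"
        "hw.busdma.zone2.total_bounced: 0\n"
    )
    after = (
        "hw.busdma.zone0.total_bounced: 0\n"
        "hw.busdma.zone1.total_bounced: 12\n"
        "hw.busdma.zone2.total_bounced: 0\n"
    )
    print(bounced_delta(before, after))  # prints {0: 0, 1: 12, 2: 0}
```

A non-zero delta for any zone would mean bounce buffers were used between the
two snapshots; all zeros would match the "no bounce-buffer usage" case
described above.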