From: Robert Noland
To: Damian Gerow
Cc: freebsd-current@freebsd.org
Subject: Re: [PATCH] Possible fix to recent data corruption on HEAD since USB2
Date: Fri, 17 Apr 2009 14:49:57 -0500
Organization: FreeBSD
Message-Id: <1239997797.24514.6.camel@balrog.2hip.net>
In-Reply-To: <20090417103634.GD1186@plebeian.afflictions.org>
References: <200904161336.18557.jhb@freebsd.org>
 <20090416184738.GA60409@wep4035.physik.uni-wuerzburg.de>
 <200904161558.56919.jhb@freebsd.org> <49E79F49.6000606@samsco.org>
 <20090417103634.GD1186@plebeian.afflictions.org>

On Fri, 2009-04-17 at 06:36 -0400, Damian Gerow wrote:
> Scott Long wrote:
> : John Baldwin wrote:
> : > On Thursday 16 April 2009 2:47:38 pm Alexey Shuvaev wrote:
> : >> On Thu, Apr 16, 2009 at 01:36:18PM -0400, John Baldwin wrote:
> : >>> Due to some good sleuthing by avg@, there is a patch that might
> : >>> fix the recent reports of data corruption on current.  It would
> : >>> explain some of the recent reports where a file that was read
> : >>> would have gaps of missing bytes.  The problem is with the
> : >>> BUS_DMA_KEEP_PG_OFFSET changes to bus_dma.  When a bounce page
> : >>> was used by USB2, the changes to bus_dma would actually change
> : >>> the starting virtual and physical addresses of the bounce page.
> : >>> When the bounce page was no longer needed it was left in this
> : >>> bogus state.  Later, if another device used the same bounce page
> : >>> for DMA, it would use the wrong offset and address.  The issue
> : >>> there is if the second device was doing a full page of I/O.  In
> : >>> that case the DMA from the device would actually spill over into
> : >>> the next page, which could in theory be used by another DMA
> : >>> request.  It could also break alignment assumptions (since the
> : >>> previous PG_OFFSET may not be aligned and the bus_dma code
> : >>> assumes bounce pages for the !PG_OFFSET case are page aligned).
> : >>> The quick fix is to always restore the bounce page to the normal
> : >>> state when a PG_OFFSET DMA request is finished.  I'd actually
> : >>> prefer not ever touching the page's starting addresses, but
> : >>> those changes would be more invasive I believe.
> : >>>
> : >>> http://www.FreeBSD.org/~jhb/patches/dma_sg.patch
> : >>>
> : >> Am I right that the hardware prerequisite for observing these
> : >> problems is amd64 + 4 GB or more of RAM?
> : >
> : > Well, i386 with PAE would do it as well.  Basically, you need USB
> : > + one other device that uses bounce pages, and the other device
> : > ends up with corruption.
> : >
> : >> Is it possible to fabricate some (artificial) test case to stress
> : >> this particular situation (interleaved use of bounce pages by USB
> : >> and some other device (?HDD?))?
> : >
> : > I haven't constructed one though it might be possible to do so.
> : >
> : >> Asking because as I understand the data corruption is silent
> : >> and the affected consumer (of bounce pages) should have some
> : >> mechanism of detecting this (e.g. zfs' CRCs).
> : >> In my case stress testing an unpatched system till the UFS
> : >> filesystems are dead is no fun...
> : >
> : > Understood.  I know some other folks are going to test this and if
> : > there is early success that may make the risk easier to take.
> : >
> :
> : I have pretty high confidence that John and Andriy found the problem
> : and fixed it with this patch.  It'll be good to get it tested, but I
> : think that the risk to testers will be pretty low.
>
> Having been running the patch for sixteen hours now, I can safely say
> that it fixes my issues.

I think that I agree...  I crashed my amd64 box a few times last night
and haven't had massive damage, which is refreshing...  I haven't been
brave enough to panic with more than a USB keyboard though...

robert.

> - Damian

-- 
Robert Noland
FreeBSD
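
To make the failure John describes in the quoted text a bit more concrete,
here is a toy model of it.  This is only a sketch under stated assumptions,
not the bus_dma code and not the patch: the names BouncePage, dma_write and
spilled_bytes are made up for illustration, "memory" is just a bytearray
standing in for two adjacent physical pages, and the 0x200 offset is an
arbitrary choice.

```python
PAGE_SIZE = 4096


class BouncePage:
    """Hypothetical stand-in for a bounce page (not the kernel structure)."""

    def __init__(self, base):
        self.base = base  # page-aligned start of the page
        self.addr = base  # address currently handed out for DMA


def dma_write(memory, page, data):
    """Model a device writing len(data) bytes starting at page.addr."""
    memory[page.addr:page.addr + len(data)] = data


def spilled_bytes(restore_after_offset_use, offset=0x200):
    """Return how many bytes of the neighbouring page get overwritten."""
    # Two adjacent pages: page 0 is the bounce page, page 1 belongs to
    # some other DMA request and should never be touched here.
    memory = bytearray(2 * PAGE_SIZE)
    page = BouncePage(base=0)

    # First user (USB-like, keeps the page offset): the page's start
    # address is shifted by the offset for a short transfer.
    page.addr = page.base + offset
    dma_write(memory, page, b"\x11" * 64)

    # The quick fix described above: put the page back to its normal
    # state once the offset request is finished.
    if restore_after_offset_use:
        page.addr = page.base

    # Second user: a full page of I/O that assumes a page-aligned start.
    dma_write(memory, page, b"\xaa" * PAGE_SIZE)

    neighbour = memory[PAGE_SIZE:]
    return sum(1 for b in neighbour if b != 0)
```

In this model, calling spilled_bytes(False) leaves the stale offset in
place, so the full-page transfer runs past the end of the bounce page and
overwrites the first "offset" bytes of the neighbouring page, while
spilled_bytes(True) leaves the neighbour untouched.  A real reproduction
would still need actual hardware using bounce pages, as discussed above.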