Date: Wed, 29 May 2013 18:13:14 +0530
From: Ajit Jain <ajit.jain@cloudbyte.com>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: seeing data corruption with zfs trim functionality
Message-ID: <CAA71u6a3TJ_sO3Q%2BiJa8EHKE2iM0MKh31D37pGAoua7QU_6xYg@mail.gmail.com>
In-Reply-To: <2C2F5CAAE72B4658BFA09E4694A21375@multiplay.co.uk>
References: <CAA71u6Y5dKZ9O0rqxCpx-9t7DYgTnPZSoNy-iHOnmzrOUYp%2Bvw@mail.gmail.com>
 <60316751643743738AB83DABC6A5934B@multiplay.co.uk>
 <20130429105143.GA1492@icarus.home.lan>
 <3AD1AB31003D49B2BF2EA7DD411B38A2@multiplay.co.uk>
 <C6AA4D0A7C49469ABB3C7440B1BCC108@multiplay.co.uk>
 <CAA71u6Zh7BbbdC=utqfR2MD1Nn=9euUDXHKqqu9NyBG-Jx%2B=Ow@mail.gmail.com>
 <9681E07546D348168052D4FC5365B4CD@multiplay.co.uk>
 <CAA71u6ZuO9CF0ECFS4z07-E5qPea-6SfNwkvhr_g6pFT5MV5yQ@mail.gmail.com>
 <CAA71u6YKGHDRVg6W_xnCNaA68bJvAZ2Lkp-UisiPqb1vKjJhfA@mail.gmail.com>
 <3E9CA9334E6F433A8F135ACD5C237340@multiplay.co.uk>
 <CAA71u6YZAKrmfTLU32f8UmYecmydwiqRT-OrR1ukZ9V6PGsU%2Bw@mail.gmail.com>
 <A05ACD84EB974E80B7142CE9982E479C@multiplay.co.uk>
 <93D0677B373A452BAF58C8EA6823783D@multiplay.co.uk>
 <CAA71u6bZ_4fb9FxYSwcrHBBApkZog30iQJGyTERi-xFMksud1g@mail.gmail.com>
 <35ABA7AAEB7F4D86A1ED54C4C47FEB49@multiplay.co.uk>
 <CAA71u6ahzRai=uUp5L6nDQxxEZC=d5jd4jBBfPNa2k29OwTZDg@mail.gmail.com>
 <2C2F5CAAE72B4658BFA09E4694A21375@multiplay.co.uk>
Hi Steven,

Sorry for the long delay, and it may take a little longer still. I think
the corruption was caused by my source tree being out of date, especially
the cam directory.

Please don't hold back the MFC just because of the issue I reported. I'll
update my src tree and rerun the experiments I was running; if I still see
an issue, we can fix the bug rather than stopping the MFC.

thanks,
ajit

On Wed, May 29, 2013 at 5:19 PM, Steven Hartland <killing@multiplay.co.uk> wrote:

> Sorry to pester, but any update on this Ajit?
>
> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9, and I've
> been unable to reproduce this issue even with your testing code on working
> FW versions.
>
> Regards
> Steve
>
> ----- Original Message -----
> From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>
>> Sure Steven,
>> I'll apply the patches and update ASAP.
>>
>> thanks
>> ajit
>>
>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>
>>> I've attached the two patch sets I'm looking to MFC to stable-9: one
>>> adds the BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>
>>> They should both apply cleanly to stable-9. Could you test with
>>> those on your machine and let me know?
>>>
>>> Regards
>>> Steve
>>>
>>> ----- Original Message -----
>>> From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>
>>>> Hi Steven,
>>>>
>>>> The FW version on the setup is P15.
>>>> I will upgrade the FW to P16, but I think my best bet will be to
>>>> update the code base to stable/9 since, unlike you, I was seeing
>>>> corruption with all three delete methods.
>>>>
>>>> thanks
>>>> ajit
>>>>
>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>
>>>>> ----- Original Message -----
>>>>> From: "Steven Hartland" <killing@multiplay.co.uk>
>>>>>
>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>> moaning big time on the test box. So we checked, and there was zpool
>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>> believe we have reproduced your issue.
>>>>>>
>>>>>> After recovering the machine, I created 3 pools on 3 different disks,
>>>>>> each running a different delete_method.
>>>>>>
>>>>>> We then re-ran the tests, which left the pool running with
>>>>>> delete_method WS16 so broken that it had suspended IO. After a reboot,
>>>>>> gpart once again reported no partition table.
>>>>>>
>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>
>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>> with the SBC-3 spec, which didn't identify any obvious issues.
>>>>>>
>>>>>> Given we're both using LSI 2008 based controllers, it could be a FW
>>>>>> issue specific to WS16, but that's just speculation atm, so I'll
>>>>>> continue to investigate.
>>>>>>
>>>>>> If you could re-test at your end without using WS16, to see if you can
>>>>>> reproduce the problem with either UNMAP or ATA_TRIM, that would be a
>>>>>> very useful data point.
>>>>>
>>>>> After much playing, I narrowed the test case down to a single delete
>>>>> which was causing disk corruption for us (it deleted the partition
>>>>> table instead of data in the middle of the disk).
>>>>>
>>>>> The conclusion is that an LSI 2008 HBA with FW below P13 will eat the
>>>>> data on your SATA disks if you use WS16, due to the following bug:
>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>> support SCT write same may write wrong region.
>>>>>
>>>>> After updating to P16 here (which we would generally be running, but
>>>>> the test box was new and hadn't been updated yet), the corruption issue
>>>>> is no longer reproducible.
>>>>>
>>>>> So Ajit, please check your FW version. I'm hoping to hear you're on
>>>>> something below P13, P12 possibly?
>>>>>
>>>>> If so, then this is your issue; to fix it, simply update to P16 and the
>>>>> problem should be gone.
>>>>>
>>>>> Regards
>>>>> Steve
>>>>>
>>>>> ==============================
>>>>>
>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>>>>> the person or entity to whom it is addressed. In the event of
>>>>> misdirection, the recipient is prohibited from using, copying, printing
>>>>> or otherwise disseminating it or any information contained in it.
>>>>> In the event of misdirection, illegible or incomplete transmission
>>>>> please telephone +44 845 868 1337
>>>>> or return the E.mail to postmaster@multiplay.co.uk.