From owner-freebsd-fs@FreeBSD.ORG Tue Jun 4 08:35:24 2013
From: "Steven Hartland"
To: "Ajit Jain", "freebsd-fs"
Subject: Re: seeing data corruption with zfs trim functionality
Date: Tue, 4 Jun 2013 09:35:11 +0100

Those are all just symlinks
so unless you're extracting to the correct location then it's likely just
moaning because the absolutely pathed file doesn't exist.

If you just want to update that machine, run:-
rm -rf /usr/src /usr/obj
tar -xzPf stable-9-r251096.tar.gz

Regards
Steve

----- Original Message -----
From: "Ajit Jain"
To: "freebsd-fs"; "Steven Hartland"
Sent: Tuesday, June 04, 2013 8:39 AM
Subject: Fwd: seeing data corruption with zfs trim functionality

> Hi Steven,
>
> I am not able to send the full output file to freebsd-fs.
> I am just sending the error file in this mail and will
> send you another mail which contains the full untar output.
>
> regards,
> ajit
>
> ---------- Forwarded message ----------
> From: Ajit Jain
> Date: Mon, Jun 3, 2013 at 11:51 PM
> Subject: Re: seeing data corruption with zfs trim functionality
> To: Steven Hartland
> Cc: freebsd-fs
>
> Hi Steven,
>
> untar of the tarball is throwing the error below:
> tar: Error exit delayed from previous errors.
>
> I have downloaded the file from the link 3 times, every time I am seeing the
> same issue.
> Please find the tar output file and error (grep from the tar output file)
> attached with mail.
>
> checksum of tar ball (after unzip, on freebsd) is:
> root@everest:/pool_9stable/obj_src/new # cksum stable-9-r251096.tar
> 2972813925 3474278400 stable-9-r251096.tar
>
> regards,
> ajit
>
> On Fri, May 31, 2013 at 4:12 AM, Steven Hartland wrote:
>
>> Tar archive of /usr/src and /usr/obj with built world and GENERIC kernel
>> for amd64 can be found here:-
>> http://blog.multiplay.co.uk/dropzone/freebsd/stable-9-r251096.tar.gz
>>
>> This is based off r251096 with current proposed MFC of CAM BIO_DELETE &
>> ZFS TRIM.
>>
>> Regards
>> Steve
>>
>> ----- Original Message ----- From: "Ajit Jain"
>>
>>> Hi Steven,
>>>
>>> That would be really great. I'll install the build provided by you and can
>>> quickly update the result. I am kind of feeling that I am asking too much
>>> of a favour from you.
>>>
>>> thanks for the help and bearing with me,
>>> ajit
>>>
>>> On Wed, May 29, 2013 at 6:39 PM, Steven Hartland wrote:
>>>
>>>> Unfortunately FS corruption is a serious matter so even though I'm 99.99%
>>>> convinced there isn't a problem I'd still prefer to confirm this was indeed
>>>> an issue with your code base and not an issue with the current code prior
>>>> to MFC'ing.
>>>>
>>>> Would a pre-patched stable/9 source / build help? If so I can look at
>>>> making that available for you.
>>>>
>>>> Regards
>>>> Steve
>>>>
>>>> ----- Original Message ----- From: "Ajit Jain"
>>>>
>>>>> Hi Steven,
>>>>>
>>>>> Sorry for the long delay, but I might delay even further.
>>>>> I think the reason for the corruption was that my code
>>>>> was not updated, especially the cam directory.
>>>>>
>>>>> I request please do not stop just because of the issue I reported.
>>>>> I'll update my src tree and rerun the experiments I was running;
>>>>> if I see some issue then probably we fix the bug rather than stopping
>>>>> for MFC.
>>>>>
>>>>> thanks,
>>>>> ajit
>>>>>
>>>>> On Wed, May 29, 2013 at 5:19 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>
>>>>>> Sorry to pester, but any update on this Ajit?
>>>>>>
>>>>>> I ask as it's currently blocking the MFC of TRIM to stable/8 & 9 and I've
>>>>>> been unable to reproduce this issue even with your testing code on working
>>>>>> FW versions.
>>>>>>
>>>>>> Regards
>>>>>> Steve
>>>>>>
>>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>>>>
>>>>>>> Sure Steven,
>>>>>>>
>>>>>>> I'll apply the patches and update ASAP.
>>>>>>>
>>>>>>> thanks
>>>>>>> ajit
>>>>>>>
>>>>>>> On Thu, May 23, 2013 at 3:03 PM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>>>
>>>>>>>> I've attached the two patch sets I'm looking to MFC to stable-9, one
>>>>>>>> adds BIO_DELETE CAM changes and the other is ZFS TRIM support.
>>>>>>>>
>>>>>>>> They should both apply cleanly to stable-9, if you could test with
>>>>>>>> those on your machine and let me know.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Steve
>>>>>>>>
>>>>>>>> ----- Original Message ----- From: "Ajit Jain" <ajit.jain@cloudbyte.com>
>>>>>>>>
>>>>>>>>> Hi Steven,
>>>>>>>>>
>>>>>>>>> FW version on the setup is P15.
>>>>>>>>> I will upgrade the FW to P16, but I think my
>>>>>>>>> best bet will be to update code base to 9 stable as unlike you,
>>>>>>>>> I was seeing corruption for all three delete methods.
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>> ajit
>>>>>>>>>
>>>>>>>>> On Sat, May 18, 2013 at 4:15 AM, Steven Hartland <killing@multiplay.co.uk> wrote:
>>>>>>>>>
>>>>>>>>>> ----- Original Message ----- From: "Steven Hartland" <killing@multiplay.co.uk>
>>>>>>>>>>
>>>>>>>>>>> After initially seeing no issues, our overnight monitoring started
>>>>>>>>>>> moaning big time on the test box. So we checked and there was zpool
>>>>>>>>>>> corruption as well as a missing boot loader and a corrupt GPT, so I
>>>>>>>>>>> believe we have reproduced your issue.
>>>>>>>>>>>
>>>>>>>>>>> After recovering the machine I created 3 pools on 3 different disks
>>>>>>>>>>> each running a different delete_method.
>>>>>>>>>>>
>>>>>>>>>>> We then re-ran the tests which resulted in the pool running with
>>>>>>>>>>> delete_method WS16 being so broken it had suspended IO.
>>>>>>>>>>> A reboot resulted in it once again
>>>>>>>>>>> reporting no partition table via gpart.
>>>>>>>>>>>
>>>>>>>>>>> A third test run again produced a corrupt pool for WS16.
>>>>>>>>>>>
>>>>>>>>>>> I've conducted a preliminary review of the CAM WS16 code path along
>>>>>>>>>>> with the SBC-3 spec which didn't identify any obvious issues.
>>>>>>>>>>>
>>>>>>>>>>> Given we're both using LSI 2008 based controllers it could be a FW
>>>>>>>>>>> issue specific to WS16 but that's just speculation atm, so I'll
>>>>>>>>>>> continue to investigate.
>>>>>>>>>>>
>>>>>>>>>>> If you could re-test your end without using WS16 to see if you can
>>>>>>>>>>> reproduce the problem with either UNMAP or ATA_TRIM that would be a
>>>>>>>>>>> very useful data point.
>>>>>>>>>>
>>>>>>>>>> After much playing I narrowed down a test case of one delete which was
>>>>>>>>>> causing disc corruption for us (deleted the partition table instead of
>>>>>>>>>> data in the middle of the disk).
>>>>>>>>>>
>>>>>>>>>> The conclusion is LSI 2008 HBA with FW below P13 will eat the data on
>>>>>>>>>> your SATA disks if you use WS16 due to the following bug:-
>>>>>>>>>> SCGCQ00230159 (DFCT) - Write same command to a SATA drive that doesn't
>>>>>>>>>> support SCT write same may write wrong region.
>>>>>>>>>>
>>>>>>>>>> After updating here to P16, which we would generally be running but the
>>>>>>>>>> test box was new and hadn't been updated yet, the corruption issue is
>>>>>>>>>> no longer reproducible.
>>>>>>>>>>
>>>>>>>>>> So Ajit please check your FW version, I'm hoping to hear you're on
>>>>>>>>>> something below P13, P12 possibly?
>>>>>>>>>>
>>>>>>>>>> If so then this is your issue, to fix simply update to P16 and the
>>>>>>>>>> problem should be gone.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>> ================================================
>>>>>>>>>> This e.mail is private and confidential between Multiplay (UK) Ltd. and
>>>>>>>>>> the person or entity to whom it is addressed. In the event of
>>>>>>>>>> misdirection, the recipient is prohibited from using, copying, printing
>>>>>>>>>> or otherwise disseminating it or any information contained in it.
>>>>>>>>>> In the event of misdirection, illegible or incomplete transmission
>>>>>>>>>> please telephone +44 845 868 1337
>>>>>>>>>> or return the E.mail to postmaster@multiplay.co.uk.