From: "Steven Hartland" <killing@multiplay.co.uk>
To: "O. Hartmann"
Cc: FreeBSD Current <freebsd-current@freebsd.org>, d@delphij.net
Subject: Re: CURRENT r250636: ZFS pool destroyed while scrubbing in action and shutdown
Date: Thu, 16 May 2013 23:11:51 +0100
Message-ID: <8AE0891D2A494CFCBE85874965362BBD@multiplay.co.uk>
References: <1368638448.1549.5.camel@thor.walstatt.dyndns.org> <5193C844.2050404@delphij.net> <1368733082.4643.36.camel@thor.walstatt.dyndns.org>

----- Original Message -----
From: "O. Hartmann"

On Thu, 2013-05-16 at 19:42 +0100, Steven Hartland wrote:
>> ----- Original Message -----
>> From: "Xin Li"
>>
>> > On 05/15/13 10:20, O. Hartmann wrote:
>> >> Several machines running FreeBSD 10.0-CURRENT #0 r250636: Tue May
>> >> 14 21:13:19 CEST 2013 amd64 were scrubbing their pools over the
>> >> past two days. Since that takes a while, I was sure I could shut
>> >> down the boxes and scrubbing would restart automatically on the
>> >> next boot.
>> >>
>> >> Not this time! On ALL(!) three systems the pools remain
>> >> destroyed/corrupted, showing this message (as a representative,
>> >> I will present only one):
>>
>> Can you confirm the HW you're running there?
>>
>> If you're using CAM-backed disks, can you let me know what you're
>> seeing for:
>> 1. sysctl kern.cam.da | grep delete_method
>> 2. sysctl vfs.zfs.trim
>>
>> The reason I ask is that I'm investigating an issue with ZFS TRIM,
>> reported by Ajit Jain, and the tests that have just completed
>> potentially indicate an issue with either CAM or LSI's firmware when
>> processing Write Same requests.
>> Such requests may be used by ZFS TRIM depending on the underlying HW.
>>
>> Regards
>> Steve
>
> Hello Steven.
>
> Below is hopefully the requested information; if you need more, please
> ask.
>
> The scenario on all boxes was the same: scrubbing hadn't finished the
> day before, so the boxes were shut down over night. In the morning I
> started them up for a couple of minutes before I left for work, shut
> them down again, and at that point the "crash" happened.
>
> At work (no access at the moment) the box (the third one) is an
> LGA2011 system based upon the X79 chipset (ASUS P9X79 WS). One pool, a
> ZFS JBOD, finished scrubbing before I shut down and rebooted the box,
> but, similar to the above-mentioned Core2Duo box, there is a
> single-disk 3 TB ZFS BACKUP pool and it showed the same symptoms.
>
> Since no activities were performed on those pools in the short period
> of activity, there seems to be no harm done so far to the pools.
>
> Hardware:
>
> Box a)
> (This box uses the single-disk pool.)
>
> Disk in question: scbus6 target 0 lun 0 (pass3,ada3)
>
> root@thor:/usr/src # camcontrol devlist
> at scbus3 target 0 lun 0 (pass0,ada0)
> at scbus4 target 0 lun 0 (pass1,ada1)
> at scbus5 target 0 lun 0 (pass2,ada2)
> at scbus6 target 0 lun 0 (pass3,ada3)
> at scbus7 target 0 lun 0 (pass4,ada4)
> at scbus8 target 0 lun 0 (pass5,cd0)
> at scbus9 target 0 lun 0 (pass6,ses0)
> at scbus11 target 0 lun 0 (da0,pass7)
>
> Core2Duo, SATA chipset is ICH10:
>
> root@thor:/usr/src # dmesg | grep ahci
> ahci0: at channel -1 on atapci0
> ahci0: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
> ahcich0: at channel 0 on ahci0
> ahcich1: at channel 1 on ahci0
> ahci1: port 0xac00-0xac07,0xa880-0xa883,0xa800-0xa807,0xa480-0xa483,0xa400-0xa41f mem 0xfbffe800-0xfbffefff irq 19 at device 31.2 on pci0
> ahci1: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
> ahcich2: at channel 0 on ahci1
> ahcich3: at channel 1 on ahci1
> ahcich4: at channel 2 on ahci1
> ahcich5: at channel 3 on ahci1
> ahcich6: at channel 4 on ahci1
> ahcich7: at channel 5 on ahci1
> ahciem0: on ahci1
> ses0 at ahciem0 bus 0 scbus9 target 0 lun 0
> ada0 at ahcich2 bus 0 scbus3 target 0 lun 0
> ada1 at ahcich3 bus 0 scbus4 target 0 lun 0
> ada2 at ahcich4 bus 0 scbus5 target 0 lun 0
> ada3 at ahcich5 bus 0 scbus6 target 0 lun 0
> ada4 at ahcich6 bus 0 scbus7 target 0 lun 0
> cd0 at ahcich7 bus 0 scbus8 target 0 lun 0
>
> root@thor:/usr/src # sysctl kern.cam.da | grep delete_method
> kern.cam.da.0.delete_method: NONE
>
> root@thor:/usr/src # sysctl vfs.zfs.trim
> vfs.zfs.trim.enabled: 1
> vfs.zfs.trim.txg_delay: 32
> vfs.zfs.trim.timeout: 30
> vfs.zfs.trim.max_interval: 1
>
> ////////////////////////////////////////////
>
> Box b)
> (Box with the RAIDZ-1 pool as reported in the initial message.)
> This pool/machine has a log disk (Samsung SSD 830, 64GB) for ZFS,
> which obviously doesn't matter, since the other boxes don't have one.
>
> Disks in question: scbus4 + scbus5 + scbus6
>
> root@gate [src] camcontrol devlist
> at scbus0 target 0 lun 0 (ada0,pass0)
> at scbus2 target 0 lun 0 (ada1,pass1)
> at scbus4 target 0 lun 0 (ada2,pass2)
> at scbus5 target 0 lun 0 (ada3,pass3)
> at scbus6 target 0 lun 0 (ada4,pass4)
> at scbus7 target 0 lun 0 (pass5,cd0)
> at scbus8 target 0 lun 0 (pass6,ses0)
>
> CPU i3-3220, chipset Intel Z77 SATA
>
> root@gate [src] dmesg | grep ahci
> ahci0: port 0xc050-0xc057,0xc040-0xc043,0xc030-0xc037,0xc020-0xc023,0xc000-0xc01f mem 0xf7c00000-0xf7c001ff irq 19 at device 0.0 on pci4
> ahci0: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported
> ahcich0: at channel 0 on ahci0
> ahcich1: at channel 1 on ahci0
> ahci1: port 0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem 0xf7f16000-0xf7f167ff irq 19 at device 31.2 on pci0
> ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
> ahcich2: at channel 0 on ahci1
> ahcich3: at channel 1 on ahci1
> ahcich4: at channel 2 on ahci1
> ahcich5: at channel 3 on ahci1
> ahcich6: at channel 4 on ahci1
> ahcich7: at channel 5 on ahci1
> ahciem0: on ahci1
> ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada2 at ahcich4 bus 0 scbus4 target 0 lun 0
> cd0 at ahcich7 bus 0 scbus7 target 0 lun 0
> ada3 at ahcich5 bus 0 scbus5 target 0 lun 0
> ada4 at ahcich6 bus 0 scbus6 target 0 lun 0
>
> root@gate [src] sysctl kern.cam.da | grep delete_method
> root@gate [src]
>
> root@gate [src] sysctl vfs.zfs.trim
> vfs.zfs.trim.enabled: 1
> vfs.zfs.trim.txg_delay: 32
> vfs.zfs.trim.timeout: 30
> vfs.zfs.trim.max_interval: 1

Thanks for the info; that's not something I believe could be affected
at this time, but it's always good to check.
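For what it's worth, while this is being looked into, ZFS TRIM can be
switched off entirely as a diagnostic step. This is only a sketch of a
workaround, not a confirmed fix; the tunable is the same
vfs.zfs.trim.enabled knob visible in the sysctl output above:

```
# /boot/loader.conf -- disable ZFS TRIM while the Write Same issue is
# investigated (tunable name as shown in the sysctl output above;
# takes effect on the next boot)
vfs.zfs.trim.enabled=0
```

Running with that set for a while would at least show whether the pools
stay healthy with TRIM out of the picture before re-enabling it.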
If my diagnosis changes I'll of course let you know.

Regards
Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and
the person or entity to whom it is addressed. In the event of
misdirection, the recipient is prohibited from using, copying, printing
or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission
please telephone +44 845 868 1337 or return the E.mail to
postmaster@multiplay.co.uk.