From owner-freebsd-stable@freebsd.org Tue Oct 3 15:34:06 2017
Message-ID: <59D3ADEB.3010205@omnilan.de>
Date: Tue, 03 Oct 2017 17:34:03 +0200
From: Harry Schmalzbauer
Organization: OmniLAN
To: Andriy Gapon
CC: freebsd-stable@FreeBSD.org
Subject: Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
In-Reply-To: <59D3A131.8040803@omnilan.de>

Regarding Harry Schmalzbauer's message of 03.10.2017 16:39 (localtime):
> Regarding Andriy Gapon's message of 03.10.2017 16:28 (localtime):
>> On 03/10/2017 17:19, Harry Schmalzbauer wrote:
>>> Have tried several different txg IDs, but the latest 5 or so lead to the
>>> panic and some other randomly picked ones all claim missing devices...
>>> Doh, if I had only known about -T some days ago, when I had all 4 devices
>>> available.
>> I don't think that the error is really about the missing devices.
>> Most likely the real problem is that you are going too far back in history where
>> the data required to import the pool is not present. It's just that there is no
>> special error code to report that condition distinctly, so it gets interpreted
>> as a missing device condition.
> Sounds reasonable.
> When the RAM corruption happened, a live update was started, where
> several pool availability checks were done. No data was written.
> The last data write was a few KBytes some minutes before the corruption, and
> the last significant amount written to that pool was a long time before that.
> So I still have hope to find an importable txg ID.
>
> Are they strictly serialized?

Seems so.

Just for the record: I couldn't recover any data yet. But in general, if
a pool isn't damaged too badly, the following steps look promising; they
are the ones that got me closest.

I have attached dumps of the physical disks as md2 and md3.
'zpool import' offers

        cetusPsys                 DEGRADED
          mirror-0                DEGRADED
            8178308212021996317   UNAVAIL  cannot open
            md3                   ONLINE
          mirror-1                DEGRADED
            md2p5                 ONLINE
            4036286347185017167   UNAVAIL  cannot open

Which is known to be corrupt.
This time I also attached zdb(8) dumps (sparse files) of the remaining
two disks or partitions.

Now import offers this:

   pool: cetusPsys
     id: 13207378952432032998
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        cetusPsys   ONLINE
          mirror-0  ONLINE
            md5     ONLINE
            md3     ONLINE
          mirror-1  ONLINE
            md2p5   ONLINE
            md4     ONLINE

'zdb -ue cetusPsys' showed me the latest txg ID (3757573 in my case).

So I decremented the txg ID by one and repeated until the following
fatal indicator, which precedes the panic, vanished:

loading space map for vdev 1 of 2, metaslab 108 of 109 ...
WARNING: blkptr at 0x80e0ead00 has invalid CHECKSUM 1
WARNING: blkptr at 0x80e0ead00 has invalid COMPRESS 0
WARNING: blkptr at 0x80e0ead00 DVA 0 has invalid VDEV 2337865727
WARNING: blkptr at 0x80e0ead00 DVA 1 has invalid VDEV 289407040
WARNING: blkptr at 0x80e0ead00 DVA 2 has invalid VDEV 3959586324

The first txg ID without these warnings was 3757569, i.e.
'zdb -c -t 3757569 -AAA -e cetusPsys':

Traversing all blocks to verify metadata checksums and verify nothing
leaked ...

loading space map for vdev 1 of 2, metaslab 108 of 109 ...
89.0M completed ( 6MB/s) estimated time remaining: 3hr 34min 47sec
zdb_blkptr_cb: Got error 122 reading <69, 0, 0, c> -- skipping
86.8G completed ( 588MB/s) estimated time remaining: 0hr 00min 00sec
Error counts:

        errno  count
          122      1
leaked space: vdev 0, offset 0xa01084200, size 512
leaked space: vdev 0, offset 0xd0dc23c00, size 512
leaked space: vdev 0, offset 0x2380182200, size 3072
leaked space: vdev 0, offset 0x2380189a00, size 1536
leaked space: vdev 0, offset 0x2380183000, size 1536
leaked space: vdev 0, offset 0x238039a200, size 2560
leaked space: vdev 0, offset 0x238039be00, size 18944
leaked space: vdev 0, offset 0x23801b3200, size 9216
leaked space: vdev 0, offset 0x33122a8800, size 512
leaked space: vdev 1, offset 0x2808f1600, size 512
leaked space: vdev 1, offset 0x2808f1e00, size 512
leaked space: vdev 1, offset 0x2808f2e00, size 4096
leaked space: vdev 1, offset 0x2808f1a00, size 512
leaked space: vdev 1, offset 0x9010e6c00, size 512
leaked space: vdev 1, offset 0x23c5ad9c00, size 512
leaked space: vdev 1, offset 0x2e00ad4800, size 512
leaked space: vdev 1, offset 0x2f0030b200, size 50176
leaked space: vdev 1, offset 0x2f000ca800, size 512
leaked space: vdev 1, offset 0x2f003a9800, size 15360
leaked space: vdev 1, offset 0x2f003af600, size 13312
leaked space: vdev 1, offset 0x2f00715c00, size 1024
leaked space: vdev 1, offset 0x2f003adc00, size 6144
leaked space: vdev 1, offset 0x2f00363600, size 38912
block traversal size 93540302336 != alloc 93540473344 (leaked 171008)

        bp count:         3670624
        ganged count:           0
        bp logical:   96083156992      avg:  26176
        bp physical:  93308853248      avg:  25420     compression:   1.03
        bp allocated: 93540302336      avg:  25483     compression:   1.03
        bp deduped:             0    ref>1:      0   deduplication:   1.00
        SPA allocated: 93540473344     used: 19.98%

        additional, non-pointer bps of type 0:      48879
        Dittoed blocks on same vdev: 23422

In my case, import didn't work with the highest non-panicking txg ID:

zpool import -o readonly=on -R /mnt -T 3757569 cetusPsys
cannot import 'cetusPsys': one or more devices is currently unavailable

Maybe somebody else will have more luck... just keep the "-T" parameter
of zpool(8)'s import command in mind.

thanks,

-harry
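
The txg search described above (decrement the txg ID until zdb no longer prints the "blkptr ... has invalid" warnings) could be scripted roughly as follows. This is only a sketch under assumptions: the function names, the ZDB override variable and the step limit are made up for illustration, and matching on the string "has invalid" is simply taken from the warnings quoted above. The zdb flags are the ones used in this message.

```shell
#!/bin/sh
# Sketch only: helper names, ZDB and the step limit are illustrative assumptions.
# ZDB may be overridden, e.g. to point at a stub for a dry run.
ZDB=${ZDB:-zdb}

# txg_looks_clean POOL TXG
# Succeeds if zdb's output for that txg contains no "has invalid" warning.
txg_looks_clean() {
    # zdb may exit non-zero even when it finishes; only its output is inspected.
    out=$($ZDB -c -t "$2" -AAA -e "$1" 2>&1) || true
    case $out in
        *"has invalid"*) return 1 ;;
    esac
    return 0
}

# find_clean_txg POOL START_TXG MAX_STEPS
# Prints the first txg at or below START_TXG that looks clean,
# checking at most MAX_STEPS candidates; fails if none is found.
find_clean_txg() {
    pool=$1
    txg=$2
    steps=$3
    while [ "$steps" -gt 0 ]; do
        if txg_looks_clean "$pool" "$txg"; then
            echo "$txg"
            return 0
        fi
        txg=$((txg - 1))
        steps=$((steps - 1))
    done
    return 1
}
```

For example, 'find_clean_txg cetusPsys 3757573 10' would start at the latest txg reported by 'zdb -ue cetusPsys' and check up to ten candidates. Each candidate means a full 'zdb -c' traversal, so this can take a while. The txg it prints can then be tried with 'zpool import -o readonly=on -R /mnt -T <txg> cetusPsys', although, as reported above, a clean zdb run does not guarantee that the import succeeds.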