Date:      Tue, 03 Oct 2017 17:34:03 +0200
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        Andriy Gapon <avg@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org
Subject:   Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
Message-ID:  <59D3ADEB.3010205@omnilan.de>
In-Reply-To: <59D3A131.8040803@omnilan.de>
References:  <59CFC6A6.6030600@omnilan.de> <59CFD37A.8080009@omnilan.de> <59D00EE5.7090701@omnilan.de> <493e3eec-53c6-3846-0386-d5d7f4756b11@FreeBSD.org> <59D28550.3070700@omnilan.de> <59D34DA0.802@omnilan.de> <e8d8084b-5740-2645-69ae-a4e3967c7e59@FreeBSD.org> <59D39C88.4040501@omnilan.de> <4c144055-600c-89cf-13d5-0bf161726d1a@FreeBSD.org> <59D3A131.8040803@omnilan.de>

 Regarding Harry Schmalzbauer's message of 03.10.2017 16:39 (localtime):
> Regarding Andriy Gapon's message of 03.10.2017 16:28 (localtime):
>> On 03/10/2017 17:19, Harry Schmalzbauer wrote:
>>> I have tried several different txg IDs, but the latest 5 or so lead to the
>>> panic, and some other randomly picked ones all claim missing devices...
>>> Doh, if I had only known about -T some days ago, when I had all 4 devices
>>> available.
>> I don't think that the error is really about the missing devices.
>> Most likely the real problem is that you are going too far back in history where
>> the data required to import the pool is not present.  It's just that there is no
>> special error code to report that condition distinctly, so it gets interpreted
>> as a missing device condition.
> Sounds reasonable.
> When the RAM-corruption happened, a live update was started, where
> several pool availability checks were done. No data write.
> The last data write was a few KBytes some minutes before the corruption, and
> the last significant amount written to that pool was a long time before that.
> So I still have hope to find an importable txg ID.
>
> Are they strictly serialized?

Seems so.
Just for the record: I couldn't recover any data yet. But in general,
if a pool isn't damaged too badly, the following steps look promising;
they are the ones that got me closest:

I have attached dumps of the physical disks as md2 and md3.
'zpool import' offers
    cetusPsys                DEGRADED
      mirror-0               DEGRADED
        8178308212021996317  UNAVAIL  cannot open
        md3                  ONLINE
      mirror-1               DEGRADED
        md2p5                ONLINE
        4036286347185017167  UNAVAIL  cannot open

Which is known to be corrupt.
This time I also attached zdb(8) dumps (sparse files) of the remaining
two devices (one disk and one partition).
Now import offers this:
   pool: cetusPsys
     id: 13207378952432032998
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

    cetusPsys   ONLINE
      mirror-0  ONLINE
        md5     ONLINE
        md3     ONLINE
      mirror-1  ONLINE
        md2p5   ONLINE
        md4     ONLINE

'zdb -ue cetusPsys' showed me the latest txg ID (3757573 in my case).
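
To avoid copying the txg ID by hand, it can be pulled out of the uberblock
dump with a small filter. This is only a sketch: the function name latest_txg
is made up for illustration, and it assumes that zdb prints the uberblock
fields as lines of the form "txg = 3757573".

```sh
# latest_txg: print the txg of the first uberblock found on stdin.
# Assumption: the dump contains a line like "txg = 3757573"
# (field name, equals sign, value, separated by whitespace).
latest_txg() {
    awk '$1 == "txg" && $2 == "=" { print $3; exit }'
}

# Possible use on a system where the pool's devices are visible:
#   zdb -ue cetusPsys | latest_txg
```

If the output format differs on a given release, the awk pattern has to be
adjusted accordingly.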

So I decremented the txg ID by one and repeated until the following
fatal panic indicator vanished:
loading space map for vdev 1 of 2, metaslab 108 of 109 ...
WARNING: blkptr at 0x80e0ead00 has invalid CHECKSUM 1
WARNING: blkptr at 0x80e0ead00 has invalid COMPRESS 0
WARNING: blkptr at 0x80e0ead00 DVA 0 has invalid VDEV 2337865727
WARNING: blkptr at 0x80e0ead00 DVA 1 has invalid VDEV 289407040
WARNING: blkptr at 0x80e0ead00 DVA 2 has invalid VDEV 3959586324

That happened with 'zdb -c -t 3757569 -AAA -e cetusPsys':

Traversing all blocks to verify metadata checksums and verify nothing
leaked ...

loading space map for vdev 1 of 2, metaslab 108 of 109 ...
89.0M completed (   6MB/s) estimated time remaining: 3hr 34min 47sec
zdb_blkptr_cb: Got error 122 reading <69, 0, 0, c>  -- skipping
86.8G completed ( 588MB/s) estimated time remaining: 0hr 00min 00sec       
Error counts:

    errno  count
      122  1
leaked space: vdev 0, offset 0xa01084200, size 512
leaked space: vdev 0, offset 0xd0dc23c00, size 512
leaked space: vdev 0, offset 0x2380182200, size 3072
leaked space: vdev 0, offset 0x2380189a00, size 1536
leaked space: vdev 0, offset 0x2380183000, size 1536
leaked space: vdev 0, offset 0x238039a200, size 2560
leaked space: vdev 0, offset 0x238039be00, size 18944
leaked space: vdev 0, offset 0x23801b3200, size 9216
leaked space: vdev 0, offset 0x33122a8800, size 512
leaked space: vdev 1, offset 0x2808f1600, size 512
leaked space: vdev 1, offset 0x2808f1e00, size 512
leaked space: vdev 1, offset 0x2808f2e00, size 4096
leaked space: vdev 1, offset 0x2808f1a00, size 512
leaked space: vdev 1, offset 0x9010e6c00, size 512
leaked space: vdev 1, offset 0x23c5ad9c00, size 512
leaked space: vdev 1, offset 0x2e00ad4800, size 512
leaked space: vdev 1, offset 0x2f0030b200, size 50176
leaked space: vdev 1, offset 0x2f000ca800, size 512
leaked space: vdev 1, offset 0x2f003a9800, size 15360
leaked space: vdev 1, offset 0x2f003af600, size 13312
leaked space: vdev 1, offset 0x2f00715c00, size 1024
leaked space: vdev 1, offset 0x2f003adc00, size 6144
leaked space: vdev 1, offset 0x2f00363600, size 38912
block traversal size 93540302336 != alloc 93540473344 (leaked 171008)

    bp count:         3670624
    ganged count:           0
    bp logical:    96083156992      avg:  26176
    bp physical:   93308853248      avg:  25420     compression:   1.03
    bp allocated:  93540302336      avg:  25483     compression:   1.03
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated: 93540473344     used: 19.98%

    additional, non-pointer bps of type 0:      48879
    Dittoed blocks on same vdev: 23422
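
As a cross-check of the numbers above: the sizes of the 23 "leaked space"
ranges add up to 171008 bytes, which is exactly alloc minus block traversal
size (93540473344 - 93540302336). A minimal sketch for doing that sum on
saved zdb output follows; the function name sum_leaked is hypothetical, and
it assumes the size is the last field of each "leaked space:" line, as in
the listing above.

```sh
# sum_leaked: add up the size field of every "leaked space:" line on stdin.
# Assumption: each such line ends in "size <bytes>", so the byte count is
# the last whitespace-separated field.
sum_leaked() {
    awk '/^leaked space:/ { total += $NF } END { printf "%d\n", total }'
}

# Possible use on a saved log (the file name is only an example):
#   sum_leaked < zdb-3757569.log
```

If the sum does not match the "(leaked N)" figure in the summary line, the
log is probably truncated.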


In my case, import didn't work with the highest non-panicking txg ID:
zpool import -o readonly=on -R /mnt -T 3757569 cetusPsys
cannot import 'cetusPsys': one or more devices is currently unavailable
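
The manual search described above (decrement the txg ID, rerun zdb, stop
when the invalid blkptr warnings are gone) could be scripted roughly like
this. It is a sketch under assumptions, not a tested recovery tool: the
names find_txg and zdb_clean are invented here, the check simply greps the
zdb output for "has invalid", and every zdb -c run traverses the whole pool,
so each step can take a long time.

```sh
# find_txg POOL START TRIES CHECK
# Walk backwards from txg START for at most TRIES steps and print the first
# txg for which "CHECK POOL TXG" succeeds. Returns 1 if none is found.
find_txg() {
    pool=$1
    txg=$2
    tries=$3
    check=$4
    while [ "$tries" -gt 0 ]; do
        if "$check" "$pool" "$txg"; then
            echo "$txg"
            return 0
        fi
        txg=$((txg - 1))
        tries=$((tries - 1))
    done
    return 1
}

# zdb_clean POOL TXG (hypothetical check): succeeds if zdb's output for
# this txg contains no "blkptr ... has invalid ..." warning.
zdb_clean() {
    ! zdb -c -t "$2" -AAA -e "$1" 2>&1 | grep -q 'has invalid'
}

# Possible use, starting from the latest txg and trying 10 candidates:
#   txg=$(find_txg cetusPsys 3757573 10 zdb_clean) &&
#       zpool import -o readonly=on -R /mnt -T "$txg" cetusPsys
```

As seen above, a txg that passes the zdb check is not guaranteed to be
importable, so the import step may still fail.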

Maybe somebody else will have more luck... just keep the "-T" parameter
of zpool(8)'s import command in mind.

thanks,

-harry


