Date:      Mon, 20 Jun 2022 15:40:33 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        John Doherty <bsdlists@jld3.net>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: "spare-X" device remains after resilvering
Message-ID:  <CAOtMX2iv3g-pA=XciiFCoH-6y+=RKeJ61TnOvJm2bPNoc_WwEg@mail.gmail.com>
In-Reply-To: <34A91D31-1883-40AE-82F3-57B783532ED7@jld3.net>
References:  <34A91D31-1883-40AE-82F3-57B783532ED7@jld3.net>

On Mon, Jun 20, 2022 at 7:42 AM John Doherty <bsdlists@jld3.net> wrote:
>
> Hi, I have a zpool that currently looks like this (some lines elided for
> brevity; all omitted devices are online and apparently fine):
>
>    pool: zp1
>   state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>          Sufficient replicas exist for the pool to continue functioning in a
>          degraded state.
> action: Online the device using 'zpool online' or replace the device with
>          'zpool replace'.
>    scan: resilvered 1.76T in 1 days 00:38:14 with 0 errors on Sun Jun 19 22:31:46 2022
> config:
>
>          NAME                       STATE     READ WRITE CKSUM
>          zp1                        DEGRADED     0     0     0
>            raidz2-0                 ONLINE       0     0     0
>              gpt/disk0              ONLINE       0     0     0
>              gpt/disk1              ONLINE       0     0     0
>              ...
>              gpt/disk9              ONLINE       0     0     0
>            raidz2-1                 ONLINE       0     0     0
>              gpt/disk10             ONLINE       0     0     0
>              ...
>              gpt/disk19             ONLINE       0     0     0
>            raidz2-2                 ONLINE       0     0     0
>              gpt/disk20             ONLINE       0     0     0
>              ...
>              gpt/disk29             ONLINE       0     0     0
>            raidz2-3                 DEGRADED     0     0     0
>              gpt/disk30             ONLINE       0     0     0
>              3343132967577870793    OFFLINE      0     0     0  was /dev/gpt/disk31
>              ...
>              spare-9                DEGRADED     0     0     0
>                6960108738988598438  OFFLINE      0     0     0  was /dev/gpt/disk39
>                gpt/disk41           ONLINE       0     0     0
>          spares
>            16713572025248921080     INUSE     was /dev/gpt/disk41
>            gpt/disk42               AVAIL
>            gpt/disk43               AVAIL
>            gpt/disk44               AVAIL
>
> My question is why the "spare-9" device still exists after the
> resilvering completed. Based on past experience, my expectation was that
> it would exist for the duration of the resilvering and after that, only
> the "gpt/disk41" device would appear in the output of "zpool status."
>
> I also expected that when the resilvering completed, the "was
> /dev/gpt/disk41" device would be removed from the list of spares.
>
> I took the "was /dev/gpt/disk31" device offline deliberately because it
> was causing a lot of "CAM status: SCSI Status Error" errors. Next step
> for this pool is to replace that with one of the available spares but
> I'd like to get things looking a little cleaner before doing that.
>
> I don't have much in the way of ideas here. One thought was to export
> the pool and then do "zpool import zp1 -d /dev/gpt" and see if that
> cleaned things up.
>
> This system is running 12.2-RELEASE-p4, which I know is a little out of
> date. I'm going to update it to 13.1-RELEASE soon, but the more immediate
> need is to get this zpool in good shape.
>
> Any insights or advice much appreciated. Happy to provide any further
> info that might be helpful. Thanks.

This is expected behavior.  I take it you were expecting
6960108738988598438 to be removed from the configuration, replaced by
gpt/disk41, and gpt/disk41 to disappear from the spare list?  That
didn't happen because ZFS considers anything in the spare list to be a
permanent spare.  It will never automatically remove a disk from the
spare list.  Instead, ZFS expects you to provide a permanent
replacement for the failed disk.  Once resilvering to the permanent
replacement is complete, it will automatically detach the spare.
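
For example, a typical flow (just a sketch; gpt/disk45 here stands in
for whatever partition you actually create on the replacement disk)
would be:

zpool replace zp1 6960108738988598438 gpt/disk45

Once that resilver completes, ZFS will detach the spare by itself and
gpt/disk41 will go back to AVAIL in the spares list, which you can
confirm with "zpool status zp1".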

OTOH, if you really want gpt/disk41 to be the permanent replacement, I
think you can accomplish that with some combination of the following
commands:

zpool detach zp1 6960108738988598438
zpool remove zp1 gpt/disk41
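
That is, the detach should promote gpt/disk41 to an ordinary member of
raidz2-3 in place of the OFFLINE disk, and the remove should then drop
its leftover entry from the spares list.  I'd check with

zpool status zp1

after each step before you move on to replacing disk31.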

-Alan


