Date: Sat, 14 Mar 2020 17:14:10 +0200
From: Andriy Gapon <avg@FreeBSD.org>
To: Willem Jan Withagen <wjw@digiware.nl>, FreeBSD Filesystems <freebsd-fs@FreeBSD.org>
Subject: Re: ZFS pools in "trouble"
Message-ID: <24916dd7-f22c-b55b-73ae-1a2bfe653f9c@FreeBSD.org>
In-Reply-To: <15bde4a5-0a2e-9984-dfd6-fce39f079f52@digiware.nl>
References: <71e1f22a-1261-67d9-e41d-0f326bf81469@digiware.nl> <91e1cd09-b6b8-f107-537f-ae2755aba087@FreeBSD.org> <15bde4a5-0a2e-9984-dfd6-fce39f079f52@digiware.nl>
On 14/03/2020 13:00, Willem Jan Withagen wrote:
> On 27-2-2020 09:11, Andriy Gapon wrote:
>> On 26/02/2020 19:09, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> I'm using my pools in perhaps a rather awkward way as underlying storage
>>> for my ceph cluster:
>>> 1 disk per pool, with log and cache on SSD
>>>
>>> For one reason or another one of the servers has crashed and does not
>>> really want to read several of the pools:
>>> ----
>>>   pool: osd_2
>>>  state: UNAVAIL
>>> Assertion failed: (reason == ZPOOL_STATUS_OK), file
>>> /usr/src/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c, line 5098.
>>> Abort (core dumped)
>>> ----
>>>
>>> The code there is like:
>>> ----
>>>         default:
>>>                 /*
>>>                  * The remaining errors can't actually be generated, yet.
>>>                  */
>>>                 assert(reason == ZPOOL_STATUS_OK);
>>>
>>> ----
>>> And this already on 3 disks.
>>> Running:
>>> FreeBSD 12.1-STABLE (GENERIC) #0 r355208M: Fri Nov 29 10:43:47 CET 2019
>>>
>>> Now this is a test cluster, so no harm there in matters of data loss.
>>> And the ceph cluster probably can rebuild everything if I do not lose
>>> too many disks.
>>>
>>> But the problem also lies in the fact that not all disks are recognized
>>> by the kernel, and not all disks end up mounted. So I need to remove a
>>> pool first to get more disks online.
>>>
>>> Is there anything I can do to get them back online?
>>> Or is this a lost cause?
>>
>> Depends on what 'reason' is.
>> I mean the value of the variable.
>
> I ran into the same problem, even though I deleted the zpool in error.
>
> So I augmented this code with a printf:
>
> Error: Reason not found: 5

It seems that 5 is ZPOOL_STATUS_BAD_GUID_SUM and there is a discrepancy
between what the code in status_callback() expects and what actually happens.

Looks like check_status() can actually return ZPOOL_STATUS_BAD_GUID_SUM:

	/*
	 * Check that the config is complete.
	 */
	if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
	    vs->vs_aux == VDEV_AUX_BAD_GUID_SUM)
		return (ZPOOL_STATUS_BAD_GUID_SUM);

I think that VDEV_AUX_BAD_GUID_SUM typically means that a device is missing
from the pool, e.g., a log device.
Or there is some other discrepancy between the expected pool vdevs and the
found pool vdevs.

-- 
Andriy Gapon
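For reference, the printf augmentation Willem describes could look roughly like
the sketch below. The exact patch was not posted to the list; this assumes the
default case of the status switch in status_callback() in zpool_main.c quoted
above, and matches the "Error: Reason not found: 5" output he saw:

----
        default:
                /*
                 * The remaining errors can't actually be generated, yet.
                 * Report the unexpected reason instead of aborting, so the
                 * remaining pools can still be listed while debugging.
                 */
                (void) printf("Error: Reason not found: %d\n", (int)reason);
                break;
----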