Date:      Fri, 29 Nov 2024 13:41:16 -0500
From:      Dennis Clarke <dclarke@blastwave.org>
To:        freebsd-current@freebsd.org
Subject:   Re: zpools no longer exist after boot
Message-ID:  <1d22bbb4-85fc-4817-a0ee-d1b25a55d220@blastwave.org>
In-Reply-To: <754754561.9245.1732891767670@localhost>
References:  <5798b0db-bc73-476a-908a-dd1f071bfe43@blastwave.org> <CAOtMX2hKCYrx92SBLQOtekKiBWMgBy_n93ZGQ_NVLq=6puRhOg@mail.gmail.com> <22187e59-b6e9-4f2e-ba9b-f43944d1a37b@blastwave.org> <754754561.9245.1732891767670@localhost>

On 11/29/24 09:49, Ronald Klop wrote:
 > Van: Dennis Clarke <dclarke@blastwave.org>
 > Datum: donderdag, 28 november 2024 15:45
 > Aan: Alan Somers <asomers@freebsd.org>
 > CC: Current FreeBSD <freebsd-current@freebsd.org>
 > Onderwerp: Re: zpools no longer exist after boot
 >>
 >> On 11/28/24 08:52, Alan Somers wrote:
 >> > On Thu, Nov 28, 2024, 7:06AM Dennis Clarke <dclarke@blastwave.org>
 >> wrote:
 >> >
 >> >>
 >> >> This is a baffling problem wherein two zpools no longer exist after
 >> >> boot. This is :
 >> .
 >> .
 >> .
 >> > Do you have zfs_enable="YES" set in /etc/rc.conf? If not then
 >> nothing will
 >> > get imported.
 >> >
 >> > Regarding the cachefile property, it's expected that "zpool import"
 >> will
 >> > change it, unless you do "zpool import -O cachefile=whatever".
 >> >
 >>
 >> The rc script seems to do something slightly different with zpool
 >> import -c $FOOBAR thus :
 >>
 >>
 >> titan# cat  /etc/rc.d/zpool
 >> #!/bin/sh
 >> #
 >> #
 >>
 >> # PROVIDE: zpool
 >> # REQUIRE: hostid disks
 >> # BEFORE: mountcritlocal
 >> # KEYWORD: nojail
 >>
 >> . /etc/rc.subr
 >>
 >> name="zpool"
 >> desc="Import ZPOOLs"
 >> rcvar="zfs_enable"
 >> start_cmd="zpool_start"
 >> required_modules="zfs"
 >>
 >> zpool_start()
 >> {
 >>          local cachefile
 >>
 >>          for cachefile in /etc/zfs/zpool.cache /boot/zfs/zpool.cache; do
 >>                  if [ -r $cachefile ]; then
 >>                          zpool import -c $cachefile -a -N
 >>                          if [ $? -ne 0 ]; then
 >>                                  echo "Import of zpool cache
 >> ${cachefile} failed," \
 >>                                      "will retry after root mount hold
 >> release"
 >>                                  root_hold_wait
 >>                                  zpool import -c $cachefile -a -N
 >>                          fi
 >>                          break
 >>                  fi
 >>          done
 >> }
 >>
 >> load_rc_config $name
 >> run_rc_command "$1"
 >> titan#
 >>
 >>
 >>
 >> I may as well nuke the pre-existing cache file and start over :
 >>
 >>
 >> titan# ls -l /etc/zfs/zpool.cache /boot/zfs/zpool.cache
 >> -rw-r--r--  1 root wheel 1424 Jan 16  2024 /boot/zfs/zpool.cache
 >> -rw-r--r--  1 root wheel 4960 Nov 28 14:15 /etc/zfs/zpool.cache
 >> titan#
 >> titan#
 >> titan# rm /boot/zfs/zpool.cache
 >> titan# zpool set cachefile="/boot/zfs/zpool.cache" t0
 >> titan#
 >> titan# ls -l /boot/zfs/zpool.cache
 >> -rw-r--r--  1 root wheel 1456 Nov 28 14:27 /boot/zfs/zpool.cache
 >> titan#
 >> titan# zpool set cachefile="/boot/zfs/zpool.cache" leaf
 >> titan#
 >> titan# ls -l /boot/zfs/zpool.cache
 >> -rw-r--r--  1 root wheel 3536 Nov 28 14:28 /boot/zfs/zpool.cache
 >> titan#
 >> titan# zpool set cachefile="/boot/zfs/zpool.cache" proteus
 >> titan#
 >> titan# ls -l /boot/zfs/zpool.cache
 >> -rw-r--r--  1 root wheel 4960 Nov 28 14:28 /boot/zfs/zpool.cache
 >> titan#
 >> titan# zpool get cachefile t0
 >> NAME  PROPERTY   VALUE                  SOURCE
 >> t0    cachefile  /boot/zfs/zpool.cache  local
 >> titan#
 >> titan# zpool get cachefile leaf
 >> NAME  PROPERTY   VALUE                  SOURCE
 >> leaf  cachefile  /boot/zfs/zpool.cache  local
 >> titan#
 >> titan# zpool get cachefile proteus
 >> NAME     PROPERTY   VALUE                  SOURCE
 >> proteus  cachefile  /boot/zfs/zpool.cache  local
 >> titan#
 >>
 >> titan#
 >> titan# reboot
 >> Nov 28 14:34:05 Waiting (max 60 seconds) for system process `vnlru' to
 >> stop... done
 >> Waiting (max 60 seconds) for system process `syncer' to stop...
 >> Syncing disks, vnodes remaining... 0 0 0 0 0 0 done
 >> All buffers synced.
 >> Uptime: 2h38m57s
 >> GEOM_MIRROR: Device swap: provider destroyed.
 >> GEOM_MIRROR: Device swap destroyed.
 >> uhub5: detached
 >> uhub1: detached
 >> uhub4: detached
 >> uhub2: detached
 >> uhub3: detached
 >> uhub6: detached
 >> uhub0: detached
 >> ix0: link state changed to DOWN
 >> .
 >> .
 >> .
 >>
 >> Starting iscsid.
 >> Starting iscsictl.
 >> Clearing /tmp.
 >> Updating /var/run/os-release done.
 >> Updating motd:.
 >> Creating and/or trimming log files.
 >> Starting syslogd.
 >> No core dumps found.
 >> Starting local daemons:failed to open cache file: No such file or
 >> directory
 >> .
 >> Starting ntpd.
 >> Starting powerd.
 >> Mounting late filesystems:.
 >> Starting cron.
 >> Performing sanity check on sshd configuration.
 >> Starting sshd.
 >> Starting background file system
 >> FreeBSD/amd64 (titan) (ttyu0)
 >>
 >> login: root
 >> Password:
 >> Nov 28 14:36:29 titan login[4162]: ROOT LOGIN (root) ON ttyu0
 >> Last login: Thu Nov 28 14:33:45 on ttyu0
 >> FreeBSD 15.0-CURRENT (GENERIC-NODEBUG) #1
 >> main-n273749-4b65481ac68a-dirty: Wed Nov 20 15:08:52 GMT 2024
 >>
 >> Welcome to FreeBSD!
 >>
 >> Release Notes, Errata: https://www.FreeBSD.org/releases/
 >> Security Advisories:   https://www.FreeBSD.org/security/
 >> FreeBSD Handbook:      https://www.FreeBSD.org/handbook/
 >> FreeBSD FAQ:           https://www.FreeBSD.org/faq/
 >> Questions List:        https://www.FreeBSD.org/lists/questions/
 >> FreeBSD Forums:        https://forums.FreeBSD.org/
 >>
 >> Documents installed with the system are in the
 >> /usr/local/share/doc/freebsd/
 >> directory, or can be installed later with:  pkg install en-freebsd-doc
 >> For other languages, replace "en" with a language code like de or fr.
 >>
 >> Show the version of FreeBSD installed:  freebsd-version ; uname -a
 >> Please include that output and any error messages when posting questions.
 >> Introduction to manual pages:  man man
 >> FreeBSD directory layout:      man hier
 >>
 >> To change this login announcement, see motd(5).
 >> You have new mail.
 >> titan#
 >> titan# zpool list
 >> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP
 >> HEALTH  ALTROOT
 >> leaf     18.2T   984K  18.2T        -         -     0%     0%  1.00x
 >> ONLINE  -
 >> proteus  1.98T   361G  1.63T        -         -     1%    17%  1.00x
 >> ONLINE  -
 >> t0        444G  91.2G   353G        -         -    27%    20%  1.00x
 >> ONLINE  -
 >> titan#
 >>
 >> This is progress ... however the cachefile property is wiped out again :
 >>
 >> titan# zpool get cachefile t0
 >> NAME  PROPERTY   VALUE      SOURCE
 >> t0    cachefile  -          default
 >> titan# zpool get cachefile leaf
 >> NAME  PROPERTY   VALUE      SOURCE
 >> leaf  cachefile  -          default
 >> titan# zpool get cachefile proteus
 >> NAME     PROPERTY   VALUE      SOURCE
 >> proteus  cachefile  -          default
 >> titan#
 >>
 >> Also, strangely, none of the filesystem in proteus are mounted :
 >>
 >> titan#
 >> titan# zfs list -o name,exec,checksum,canmount,mounted,mountpoint -r
 >> proteus
 >> NAME                EXEC  CHECKSUM   CANMOUNT  MOUNTED  MOUNTPOINT
 >> proteus             on    sha512     on        no       none
 >> proteus/bhyve       off   sha512     on        no       /bhyve
 >> proteus/bhyve/disk  off   sha512     on        no       /bhyve/disk
 >> proteus/bhyve/isos  off   sha512     on        no       /bhyve/isos
 >> proteus/obj         on    sha512     on        no       /usr/obj
 >> proteus/src         on    sha512     on        no       /usr/src
 >> titan#
 >>
 >> If I reboot again without doing anything will the zpools re-appear ?
 >>
 >>
 >> titan#
 >> titan# Nov 28 14:37:08 titan su[4199]: admsys to root on /dev/pts/0
 >>
 >> titan# reboot
 >> Nov 28 14:40:29 Waiting (max 60 seconds) for system process `vnlru' to
 >> stop... done
 >> Waiting (max 60 seconds) for system process `syncer' to stop...
 >> Syncing disks, vnodes remaining... 0 0 0 0 0 done
 >> All buffers synced.
 >> Uptime: 4m50s
 >> GEOM_MIRROR: Device swap: provider destroyed.
 >> GEOM_MIRROR: Device swap destroyed.
 >> uhub4: detached
 >> uhub1: detached
 >> uhub5: detached
 >> uhub0: detached
 >> uhub3: detached
 >> uhub6: detached
 >> uhub2: detached
 >> ix0: link state changed to DOWN
 >> .
 >> .
 >> .
 >> Starting iscsid.
 >> Starting iscsictl.
 >> Clearing /tmp.
 >> Updating /var/run/os-release done.
 >> Updating motd:.
 >> Creating and/or trimming log files.
 >> Starting syslogd.
 >> No core dumps found.
 >> Starting local daemons:failed to open cache file: No such file or
 >> directory
 >> .
 >> Starting ntpd.
 >> Starting powerd.
 >> Mounting late filesystems:.
 >> Starting cron.
 >> Performing sanity check on sshd configuration.
 >> Starting sshd.
 >> Starting background file system
 >> FreeBSD/amd64 (titan) (ttyu0)
 >>
 >> login: root
 >> Password:
 >> Nov 28 14:43:01 titan login[4146]: ROOT LOGIN (root) ON ttyu0
 >> Last login: Thu Nov 28 14:36:29 on ttyu0
 >> FreeBSD 15.0-CURRENT (GENERIC-NODEBUG) #1
 >> main-n273749-4b65481ac68a-dirty: Wed Nov 20 15:08:52 GMT 2024
 >>
 >> Welcome to FreeBSD!
 >>
 >> Release Notes, Errata: https://www.FreeBSD.org/releases/
 >> Security Advisories:   https://www.FreeBSD.org/security/
 >> FreeBSD Handbook:      https://www.FreeBSD.org/handbook/
 >> FreeBSD FAQ:           https://www.FreeBSD.org/faq/
 >> Questions List:        https://www.FreeBSD.org/lists/questions/
 >> FreeBSD Forums:        https://forums.FreeBSD.org/
 >>
 >> Documents installed with the system are in the
 >> /usr/local/share/doc/freebsd/
 >> directory, or can be installed later with:  pkg install en-freebsd-doc
 >> For other languages, replace "en" with a language code like de or fr.
 >>
 >> Show the version of FreeBSD installed:  freebsd-version ; uname -a
 >> Please include that output and any error messages when posting questions.
 >> Introduction to manual pages:  man man
 >> FreeBSD directory layout:      man hier
 >>
 >> To change this login announcement, see motd(5).
 >> You have new mail.
 >> titan#
 >> titan# zpool list
 >> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP
 >> HEALTH  ALTROOT
 >> leaf     18.2T  1.01M  18.2T        -         -     0%     0%  1.00x
 >> ONLINE  -
 >> proteus  1.98T   361G  1.63T        -         -     1%    17%  1.00x
 >> ONLINE  -
 >> t0        444G  91.2G   353G        -         -    27%    20%  1.00x
 >> ONLINE  -
 >> titan#
 >> titan# zfs list -o name,exec,checksum,canmount,mounted,mountpoint -r
 >> proteus
 >> NAME                EXEC  CHECKSUM   CANMOUNT  MOUNTED  MOUNTPOINT
 >> proteus             on    sha512     on        no       none
 >> proteus/bhyve       off   sha512     on        no       /bhyve
 >> proteus/bhyve/disk  off   sha512     on        no       /bhyve/disk
 >> proteus/bhyve/isos  off   sha512     on        no       /bhyve/isos
 >> proteus/obj         on    sha512     on        no       /usr/obj
 >> proteus/src         on    sha512     on        no       /usr/src
 >> titan#
 >>
 >> Okay, so the zpools appear to be back in spite of the strange situation
 >> where the cachefile property is empty everywhere.  My guess is the zpool
 >> rc script is bringing in information during early boot.
 >>
 >> Why do the zfs filesystems on proteus not mount? Well, that is a strange
 >> problem, but at least the zpool can be used.
 >>
 >> --
 >> --
 >> Dennis Clarke
 >> RISC-V/SPARC/PPC/ARM/CISC
 >> UNIX and Linux spoken
 >>
 >>
 >>
 >>
 >>
 >
 >
 > Hi,
 >
 > The output you provide contains this line:
 > "Starting local daemons:failed to open cache file: No such file or
 > directory"
 >
 > Where does that output come from? What is in your /etc/rc.local file?
 >
 > Regards,
 > Ronald.
 >
Ah ha!

I should really keep better documentation of this machine's config. Sure
enough, there is something there to handle the iSCSI-based zpool:

titan# ls -la  /etc/rc.local
-rw-r--r--  1 root wheel 92 Mar 12  2024 /etc/rc.local
titan#
titan# cat  /etc/rc.local
zpool import -a -c /var/cache/iscsi-zpools.cache -o cachefile=/var/cache/iscsi-zpools.cache

This looks familiar: the iSCSI-based zpool would not be available at
boot time. At some point in the past, late 2023 I think, I was trying
to get the iSCSI services working and saw that the iSCSI device(s)
were not available right after boot. It took a bit of wrangling to get
things running in an order where the zpool was at least visible and
could then be imported; hence the one-liner in rc.local above.
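
For the record, what rc.local really needs to do here is wait for the
iSCSI LUN to show up and then cope with a missing cache file. A minimal,
untested sketch only (the /dev/da0 device name and the 30 second limit
are placeholders, and it assumes proteus is the iSCSI-backed pool):

#!/bin/sh
# Untested sketch.  Wait up to 30 seconds for the iSCSI LUN to appear,
# then import from the cache file if it exists, otherwise scan for the
# pool by name and recreate the cache file.  /dev/da0 is a placeholder
# for whatever device name the LUN gets on this machine.
cache=/var/cache/iscsi-zpools.cache
n=0
while [ ! -e /dev/da0 ] && [ "$n" -lt 30 ]; do
        sleep 1
        n=$((n + 1))
done
if [ -r "$cache" ]; then
        zpool import -a -c "$cache" -o cachefile="$cache"
else
        zpool import -o cachefile="$cache" proteus
fi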


titan# zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
leaf     18.2T   267G  17.9T        -         -     0%     1%  1.00x  ONLINE  -
proteus  1.98T   365G  1.63T        -         -     1%    17%  1.00x  ONLINE  -
t0        444G   152G   292G        -         -    31%    34%  1.00x  ONLINE  -

titan# zpool get cachefile proteus
NAME     PROPERTY   VALUE      SOURCE
proteus  cachefile  -          default

titan# ls /var/cache/iscsi-zpools.cache
ls: /var/cache/iscsi-zpools.cache: No such file or directory


Somehow that cache file vanished, which is where the "failed to open
cache file: No such file or directory" message during "Starting local
daemons" comes from. I suspect it happened when I also moved the ccache
location around. A mistake on my part. I wanted the ccache location to
not use sync=standard, and setting sync=disabled made some sense there.
To me that is normally a scary idea; however, I and some others felt it
was acceptable for a ccache dataset. So I made a new zfs filesystem on
the local NVMe boot device just for cache operations, and
/var/cache/iscsi-zpools.cache must have been lost in the shuffle.
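
For completeness, that cache dataset was created along these lines
(reconstructed from memory, so the pool/dataset name and mountpoint are
approximate rather than a verbatim record):

# Approximate reconstruction only: a dedicated dataset on the local NVMe
# boot pool (t0) for cache traffic, with synchronous writes disabled.
zfs create -o sync=disabled -o mountpoint=/var/cache t0/cache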


titan# zpool set cachefile=/var/cache/iscsi-zpools.cache proteus

titan# ls -l /var/cache/iscsi-zpools.cache
-rw-r--r--  1 root wheel 1440 Nov 29 18:31 /var/cache/iscsi-zpools.cache

Perhaps now, at the next reboot, all the zpools will exist and the zfs
filesystems on the iSCSI-based storage will be mounted as required. At
the moment the machine has a large poudriere bulk build running and
will likely be busy for half of today.
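
Once the build finishes and the box can be rebooted, the checks will
simply be:

zpool list                                                # all three pools present
zpool get cachefile t0 leaf proteus                       # did the property survive the reboot
zfs list -o name,canmount,mounted,mountpoint -r proteus   # are the datasets actually mounted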

Thank you for the excellent catch there!

-- 
--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken




