Date: Fri, 29 Nov 2024 13:41:16 -0500
From: Dennis Clarke <dclarke@blastwave.org>
To: freebsd-current@freebsd.org
Subject: Re: zpools no longer exist after boot
Message-ID: <1d22bbb4-85fc-4817-a0ee-d1b25a55d220@blastwave.org>
In-Reply-To: <754754561.9245.1732891767670@localhost>
References: <5798b0db-bc73-476a-908a-dd1f071bfe43@blastwave.org>
 <CAOtMX2hKCYrx92SBLQOtekKiBWMgBy_n93ZGQ_NVLq=6puRhOg@mail.gmail.com>
 <22187e59-b6e9-4f2e-ba9b-f43944d1a37b@blastwave.org>
 <754754561.9245.1732891767670@localhost>
On 11/29/24 09:49, Ronald Klop wrote:
> From: Dennis Clarke <dclarke@blastwave.org>
> Date: Thursday, 28 November 2024 15:45
> To: Alan Somers <asomers@freebsd.org>
> CC: Current FreeBSD <freebsd-current@freebsd.org>
> Subject: Re: zpools no longer exist after boot
>>
>> On 11/28/24 08:52, Alan Somers wrote:
>> > On Thu, Nov 28, 2024, 7:06AM Dennis Clarke <dclarke@blastwave.org> wrote:
>> >
>> >>
>> >> This is a baffling problem wherein two zpools no longer exist after
>> >> boot. This is :
>> .
>> .
>> .
>> > Do you have zfs_enable="YES" set in /etc/rc.conf? If not then
>> > nothing will get imported.
>> >
>> > Regarding the cachefile property, it's expected that "zpool import"
>> > will change it, unless you do "zpool import -O cachefile=whatever".
>> >
>>
>> The rc script seems to do something slightly different with
>> zpool import -c $FOOBAR thus :
>>
>>
>> titan# cat /etc/rc.d/zpool
>> #!/bin/sh
>> #
>> #
>>
>> # PROVIDE: zpool
>> # REQUIRE: hostid disks
>> # BEFORE: mountcritlocal
>> # KEYWORD: nojail
>>
>> . /etc/rc.subr
>>
>> name="zpool"
>> desc="Import ZPOOLs"
>> rcvar="zfs_enable"
>> start_cmd="zpool_start"
>> required_modules="zfs"
>>
>> zpool_start()
>> {
>>         local cachefile
>>
>>         for cachefile in /etc/zfs/zpool.cache /boot/zfs/zpool.cache; do
>>                 if [ -r $cachefile ]; then
>>                         zpool import -c $cachefile -a -N
>>                         if [ $? -ne 0 ]; then
>>                                 echo "Import of zpool cache ${cachefile} failed," \
>>                                     "will retry after root mount hold release"
>>                                 root_hold_wait
>>                                 zpool import -c $cachefile -a -N
>>                         fi
>>                         break
>>                 fi
>>         done
>> }
>>
>> load_rc_config $name
>> run_rc_command "$1"
>> titan#
>>
>>
>>
>> I may as well nuke the pre-existing cache file and start over :
>>
>>
>> titan# ls -l /etc/zfs/zpool.cache /boot/zfs/zpool.cache
>> -rw-r--r--  1 root  wheel  1424 Jan 16  2024 /boot/zfs/zpool.cache
>> -rw-r--r--  1 root  wheel  4960 Nov 28 14:15 /etc/zfs/zpool.cache
>> titan#
>> titan#
>> titan# rm /boot/zfs/zpool.cache
>> titan# zpool set cachefile="/boot/zfs/zpool.cache" t0
>> titan#
>> titan# ls -l /boot/zfs/zpool.cache
>> -rw-r--r--  1 root  wheel  1456 Nov 28 14:27 /boot/zfs/zpool.cache
>> titan#
>> titan# zpool set cachefile="/boot/zfs/zpool.cache" leaf
>> titan#
>> titan# ls -l /boot/zfs/zpool.cache
>> -rw-r--r--  1 root  wheel  3536 Nov 28 14:28 /boot/zfs/zpool.cache
>> titan#
>> titan# zpool set cachefile="/boot/zfs/zpool.cache" proteus
>> titan#
>> titan# ls -l /boot/zfs/zpool.cache
>> -rw-r--r--  1 root  wheel  4960 Nov 28 14:28 /boot/zfs/zpool.cache
>> titan#
>> titan# zpool get cachefile t0
>> NAME  PROPERTY   VALUE                  SOURCE
>> t0    cachefile  /boot/zfs/zpool.cache  local
>> titan#
>> titan# zpool get cachefile leaf
>> NAME  PROPERTY   VALUE                  SOURCE
>> leaf  cachefile  /boot/zfs/zpool.cache  local
>> titan#
>> titan# zpool get cachefile proteus
>> NAME     PROPERTY   VALUE                  SOURCE
>> proteus  cachefile  /boot/zfs/zpool.cache  local
>> titan#
>>
>> titan#
>> titan# reboot
>> Nov 28 14:34:05 Waiting (max 60 seconds) for system process `vnlru' to stop... done
>> Waiting (max 60 seconds) for system process `syncer' to stop...
>> Syncing disks, vnodes remaining... 0 0 0 0 0 0 done
>> All buffers synced.
>> Uptime: 2h38m57s
>> GEOM_MIRROR: Device swap: provider destroyed.
>> GEOM_MIRROR: Device swap destroyed.
>> uhub5: detached
>> uhub1: detached
>> uhub4: detached
>> uhub2: detached
>> uhub3: detached
>> uhub6: detached
>> uhub0: detached
>> ix0: link state changed to DOWN
>> .
>> .
>> .
>>
>> Starting iscsid.
>> Starting iscsictl.
>> Clearing /tmp.
>> Updating /var/run/os-release done.
>> Updating motd:.
>> Creating and/or trimming log files.
>> Starting syslogd.
>> No core dumps found.
>> Starting local daemons:failed to open cache file: No such file or directory
>> .
>> Starting ntpd.
>> Starting powerd.
>> Mounting late filesystems:.
>> Starting cron.
>> Performing sanity check on sshd configuration.
>> Starting sshd.
>> Starting background file system
>> FreeBSD/amd64 (titan) (ttyu0)
>>
>> login: root
>> Password:
>> Nov 28 14:36:29 titan login[4162]: ROOT LOGIN (root) ON ttyu0
>> Last login: Thu Nov 28 14:33:45 on ttyu0
>> FreeBSD 15.0-CURRENT (GENERIC-NODEBUG) #1
>> main-n273749-4b65481ac68a-dirty: Wed Nov 20 15:08:52 GMT 2024
>>
>> Welcome to FreeBSD!
>>
>> Release Notes, Errata:  https://www.FreeBSD.org/releases/
>> Security Advisories:    https://www.FreeBSD.org/security/
>> FreeBSD Handbook:       https://www.FreeBSD.org/handbook/
>> FreeBSD FAQ:            https://www.FreeBSD.org/faq/
>> Questions List:         https://www.FreeBSD.org/lists/questions/
>> FreeBSD Forums:         https://forums.FreeBSD.org/
>>
>> Documents installed with the system are in the /usr/local/share/doc/freebsd/
>> directory, or can be installed later with:  pkg install en-freebsd-doc
>> For other languages, replace "en" with a language code like de or fr.
>>
>> Show the version of FreeBSD installed:  freebsd-version ; uname -a
>> Please include that output and any error messages when posting questions.
>> Introduction to manual pages:  man man
>> FreeBSD directory layout:      man hier
>>
>> To change this login announcement, see motd(5).
>> You have new mail.
>> titan#
>> titan# zpool list
>> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> leaf     18.2T   984K  18.2T        -         -     0%     0%  1.00x  ONLINE  -
>> proteus  1.98T   361G  1.63T        -         -     1%    17%  1.00x  ONLINE  -
>> t0        444G  91.2G   353G        -         -    27%    20%  1.00x  ONLINE  -
>> titan#
>>
>> This is progress ... however the cachefile property is wiped out again :
>>
>> titan# zpool get cachefile t0
>> NAME  PROPERTY   VALUE  SOURCE
>> t0    cachefile  -      default
>> titan# zpool get cachefile leaf
>> NAME  PROPERTY   VALUE  SOURCE
>> leaf  cachefile  -      default
>> titan# zpool get cachefile proteus
>> NAME     PROPERTY   VALUE  SOURCE
>> proteus  cachefile  -      default
>> titan#
>>
>> Also, strangely, none of the filesystems in proteus are mounted :
>>
>> titan#
>> titan# zfs list -o name,exec,checksum,canmount,mounted,mountpoint -r proteus
>> NAME                EXEC  CHECKSUM  CANMOUNT  MOUNTED  MOUNTPOINT
>> proteus               on    sha512        on       no  none
>> proteus/bhyve        off    sha512        on       no  /bhyve
>> proteus/bhyve/disk   off    sha512        on       no  /bhyve/disk
>> proteus/bhyve/isos   off    sha512        on       no  /bhyve/isos
>> proteus/obj           on    sha512        on       no  /usr/obj
>> proteus/src           on    sha512        on       no  /usr/src
>> titan#
>>
>> If I reboot again without doing anything will the zpools re-appear ?
>>
>>
>> titan#
>> titan# Nov 28 14:37:08 titan su[4199]: admsys to root on /dev/pts/0
>>
>> titan# reboot
>> Nov 28 14:40:29 Waiting (max 60 seconds) for system process `vnlru' to stop... done
>> Waiting (max 60 seconds) for system process `syncer' to stop...
>> Syncing disks, vnodes remaining... 0 0 0 0 0 done
>> All buffers synced.
>> Uptime: 4m50s
>> GEOM_MIRROR: Device swap: provider destroyed.
>> GEOM_MIRROR: Device swap destroyed.
>> uhub4: detached
>> uhub1: detached
>> uhub5: detached
>> uhub0: detached
>> uhub3: detached
>> uhub6: detached
>> uhub2: detached
>> ix0: link state changed to DOWN
>> .
>> .
>> .
>> Starting iscsid.
>> Starting iscsictl.
>> Clearing /tmp.
>> Updating /var/run/os-release done.
>> Updating motd:.
>> Creating and/or trimming log files.
>> Starting syslogd.
>> No core dumps found.
>> Starting local daemons:failed to open cache file: No such file or directory
>> .
>> Starting ntpd.
>> Starting powerd.
>> Mounting late filesystems:.
>> Starting cron.
>> Performing sanity check on sshd configuration.
>> Starting sshd.
>> Starting background file system
>> FreeBSD/amd64 (titan) (ttyu0)
>>
>> login: root
>> Password:
>> Nov 28 14:43:01 titan login[4146]: ROOT LOGIN (root) ON ttyu0
>> Last login: Thu Nov 28 14:36:29 on ttyu0
>> FreeBSD 15.0-CURRENT (GENERIC-NODEBUG) #1
>> main-n273749-4b65481ac68a-dirty: Wed Nov 20 15:08:52 GMT 2024
>>
>> Welcome to FreeBSD!
>>
>> Release Notes, Errata:  https://www.FreeBSD.org/releases/
>> Security Advisories:    https://www.FreeBSD.org/security/
>> FreeBSD Handbook:       https://www.FreeBSD.org/handbook/
>> FreeBSD FAQ:            https://www.FreeBSD.org/faq/
>> Questions List:         https://www.FreeBSD.org/lists/questions/
>> FreeBSD Forums:         https://forums.FreeBSD.org/
>>
>> Documents installed with the system are in the /usr/local/share/doc/freebsd/
>> directory, or can be installed later with:  pkg install en-freebsd-doc
>> For other languages, replace "en" with a language code like de or fr.
>>
>> Show the version of FreeBSD installed:  freebsd-version ; uname -a
>> Please include that output and any error messages when posting questions.
>> Introduction to manual pages:  man man
>> FreeBSD directory layout:      man hier
>>
>> To change this login announcement, see motd(5).
>> You have new mail.
>> titan#
>> titan# zpool list
>> NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>> leaf     18.2T  1.01M  18.2T        -         -     0%     0%  1.00x  ONLINE  -
>> proteus  1.98T   361G  1.63T        -         -     1%    17%  1.00x  ONLINE  -
>> t0        444G  91.2G   353G        -         -    27%    20%  1.00x  ONLINE  -
>> titan#
>> titan# zfs list -o name,exec,checksum,canmount,mounted,mountpoint -r proteus
>> NAME                EXEC  CHECKSUM  CANMOUNT  MOUNTED  MOUNTPOINT
>> proteus               on    sha512        on       no  none
>> proteus/bhyve        off    sha512        on       no  /bhyve
>> proteus/bhyve/disk   off    sha512        on       no  /bhyve/disk
>> proteus/bhyve/isos   off    sha512        on       no  /bhyve/isos
>> proteus/obj           on    sha512        on       no  /usr/obj
>> proteus/src           on    sha512        on       no  /usr/src
>> titan#
>>
>> Okay, so the zpools appear to be back in spite of the strange situation
>> where the cachefile property is empty everywhere. My guess is the zpool
>> rc script is bringing in the information during early boot.
>>
>> Why do the zfs filesystems on proteus not mount? That is a strange
>> problem, but at least the zpool can be used.
>>
>> --
>> --
>> Dennis Clarke
>> RISC-V/SPARC/PPC/ARM/CISC
>> UNIX and Linux spoken
>>
>>
>
>
> Hi,
>
> The output you provide contains this line:
> "Starting local daemons:failed to open cache file: No such file or directory"
>
> Where does that output come from? What is in your /etc/rc.local file?
>
> Regards,
> Ronald.
>

Ah ha! I really should keep better documentation of this machine's config.
Sure enough there is something in there to handle the iSCSI based zpool :

titan# ls -la /etc/rc.local
-rw-r--r--  1 root  wheel  92 Mar 12  2024 /etc/rc.local
titan#
titan# cat /etc/rc.local
zpool import -a -c /var/cache/iscsi-zpools.cache -o cachefile=/var/cache/iscsi-zpools.cache

This seems familiar because the iSCSI based zpool would not be available
at boot time. At some point in the past, late 2023 I think, I was trying
to get the iSCSI services working and I saw that the iSCSI device(s)
were not available after boot. It took a bit of wrangling to get that
working in an order where at least I can see the zpool and then import it.
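In hindsight, a slightly more defensive /etc/rc.local would probably avoid
that "failed to open cache file" noise at boot. Something like the sketch
below; to be clear, the [ -r ] guard and the trailing "zfs mount -a" are
only my guesses at an improvement, not what is in the file today:

    #!/bin/sh
    # /etc/rc.local runs late in boot ("Starting local daemons"), after
    # iscsid/iscsictl have started, so the iSCSI devices should exist by now.
    if [ -r /var/cache/iscsi-zpools.cache ]; then
            # import every pool recorded in the iSCSI cache file and keep the
            # cachefile property pointed at that same file
            zpool import -a -c /var/cache/iscsi-zpools.cache \
                    -o cachefile=/var/cache/iscsi-zpools.cache
            # mount any datasets that did not get mounted during the import
            zfs mount -a
    fi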
titan# zpool list
NAME      SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
leaf     18.2T   267G  17.9T        -         -     0%     1%  1.00x  ONLINE  -
proteus  1.98T   365G  1.63T        -         -     1%    17%  1.00x  ONLINE  -
t0        444G   152G   292G        -         -    31%    34%  1.00x  ONLINE  -
titan# zpool get cachefile proteus
NAME     PROPERTY   VALUE  SOURCE
proteus  cachefile  -      default
titan# ls /var/cache/iscsi-zpools.cache
ls: /var/cache/iscsi-zpools.cache: No such file or directory

Somehow that cache file vanished, and I suspect it happened when I also
moved the ccache location around. A mistake on my part. I wanted the
ccache location to not have sync=standard, and so it made some scary
sense to set sync=disabled. To me that is a scary idea. However I, and
some others, felt that it was okay for the ccache location. Therefore I
made a new zfs filesystem on the local NVMe boot device just for cache
operations, and /var/cache/iscsi-zpools.cache must have been lost in the
shuffle.

titan# zpool set cachefile=/var/cache/iscsi-zpools.cache proteus
titan# ls -l /var/cache/iscsi-zpools.cache
-rw-r--r--  1 root  wheel  1440 Nov 29 18:31 /var/cache/iscsi-zpools.cache

Perhaps now, at the next reboot, all the zpools will exist and the zfs
filesystems on the iSCSI based storage will exist and be mounted as
required. At the moment the machine has a large poudriere bulk build
running and will likely be busy for half of today.

Thank you for the excellent catch there!

--
--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
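P.S. As Alan pointed out earlier in the thread, a plain "zpool import" is
expected to change the cachefile property, which would explain why it shows
up as "-" with SOURCE "default" after every boot: /etc/rc.d/zpool imports
with "zpool import -c $cachefile -a -N" and never passes -o cachefile=. If
I wanted the property to survive that import, I would presumably need
something along the lines of this untested sketch:

    zpool import -c /etc/zfs/zpool.cache -o cachefile=/etc/zfs/zpool.cache -a -N

but for the locally attached pools the default cache file handling is
probably fine, and only the iSCSI pool needs the explicit setting in
/etc/rc.local.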