From owner-netperf-users@freebsd.org Fri Nov 20 22:47:03 2020
From: mike tancsa
To: Mateusz Guzik
Cc: Philip Paeps, "Bjoern A. Zeeb", netperf-admin@freebsd.org,
    netperf-users@freebsd.org, Allan Jude
Subject: Re: zoo reboot Friday Nov 20 14:00 UTC
Date: Fri, 20 Nov 2020 17:47:03 -0500
Message-ID: <163d1815-fc4a-7987-30c5-0a21e8383c93@sentex.net>
List-Id: "Announcements and discussions related to the netperf cluster."

Unfortunately no luck :(

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool zroot
gptzfsboot: failed to mount default pool zroot

FreeBSD/x86 boot

What's odd is that it doesn't post all the drives....

Zoo predated EFI, so it was booting legacy BIOS.  Are the boot blocks
that you installed assuming that?

On 11/20/2020 1:27 PM, Mateusz Guzik wrote:
> swap and boot partitions resized, the ada0p3 partition got removed
> from the pool and inserted back, it is rebuilding now:
>
> root@zoo2:~ # zpool status
>   pool: zroot
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Fri Nov 20 23:13:28 2020
>         459G scanned at 1.00G/s, 291G issued at 650M/s, 3.47T total
>         0B resilvered, 8.17% done, 01:25:48 to go
> config:
>
> NAME                                            STATE     READ WRITE CKSUM
> zroot                                           DEGRADED     0     0     0
>   mirror-0                                      DEGRADED     0     0     0
>     replacing-0                                 DEGRADED     0     0     0
>       1517819109053923011                       OFFLINE      0     0     0  was /dev/ada0p3/old
>       ada0p3                                    ONLINE       0     0     0
>     ada1                                        ONLINE       0     0     0
>   mirror-1                                      ONLINE       0     0     0
>     ada3p3                                      ONLINE       0     0     0
>     ada4p3                                      ONLINE       0     0     0
>   mirror-2                                      ONLINE       0     0     0
>     ada5p3                                      ONLINE       0     0     0
>     ada6p3                                      ONLINE       0     0     0
> special
>   mirror-3                                      ONLINE       0     0     0
>     gptid/db15e826-1a9c-11eb-8d25-0cc47a1f2fa0  ONLINE       0     0     0
>     mfid1p2                                     ONLINE       0     0     0
>
> errors: No known data errors
>
> One pickle: i did 'zpool export zroot' to replace the drive, otherwise
> zfs protested.
> subsequent zpool import was done slightly carelessly
> and it mounted over /, meaning i lost access to original ufs. Should
> there be a need to boot from it again someone will have to boot single
> user and make sure to comment out swap in /etc/fstab or we will have
> to replace the drive again.
>
> That said, as I understand we are in position to take out the ufs
> drive and reboot to be back in business.
>
> The ufs drive will have to be mounted somewhere to sort out that swap.
>
> On 11/20/20, mike tancsa wrote:
>> On 11/20/2020 1:00 PM, Mateusz Guzik wrote:
>>> So this happened after boot:
>>>
>>> root@zoo2:/home/mjg # swapinfo
>>> Device          1K-blocks     Used      Avail Capacity
>>> /dev/ada0p3    2928730500        0 2928730500     0%
>>>
>>> which i presume might have corrupted some of it.
>> Oh, that makes sense now. When it was installed in the back, the drive
>> posted as ada0. When we put it in zoo, it was on a farther down port,
>> hence it came up as ada7. I had to manually mount / off ada7p2. I
>> updated fstab so as not to do that again already. That mystery solved.
>>
>> ---Mike
>>
>>
>>> Allan pasted some one-liners to resize the boot and swap partition.
>>>
>>> With your permission I would like to run them and then offline/online
>>> the disk to have it rebuild.
>>>
>>> As for longer plans what to do with it i think that's a different
>>> subject, whatever new drives end up being used I'm sure the FreeBSD
>>> Foundation can reimburse you with no difficulty.
>>>
>>>
>>> On 11/20/20, mike tancsa wrote:
>>>> It's a bit of an evolutionary mess the current state of zoo. I wonder if
>>>> we are better off re-installing the base OS fresh on a pair of SSD
>>>> drives and have the base OS on it and leave all the user data on the
>>>> current "zroot"...
>>>> Considering 240G SSDs are $35 CDN it might be easier
>>>> to just install fresh on it and not have to worry about resizing etc.
>>>>
>>>> ---Mike
>>>>
>>>> On 11/20/2020 12:49 PM, Mateusz Guzik wrote:
>>>>> On 11/20/20, Mateusz Guzik wrote:
>>>>>> CC'ing Allan Jude
>>>>>>
>>>>>> So:
>>>>>>
>>>>>>   pool: zroot
>>>>>>  state: DEGRADED
>>>>>> status: One or more devices could not be opened.  Sufficient replicas
>>>>>>         exist for the pool to continue functioning in a degraded state.
>>>>>> action: Attach the missing device and online it using 'zpool online'.
>>>>>>    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
>>>>>>   scan: scrub repaired 0B in 05:17:02 with 0 errors on Tue Aug 18
>>>>>>         15:19:00 2020
>>>>>> config:
>>>>>>
>>>>>> NAME                                            STATE     READ WRITE CKSUM
>>>>>> zroot                                           DEGRADED     0     0     0
>>>>>>   mirror-0                                      DEGRADED     0     0     0
>>>>>>     1517819109053923011                         UNAVAIL      0     0     0  was /dev/ada0p3
>>>>>>     ada1                                        ONLINE       0     0     0
>>>>>>   mirror-1                                      ONLINE       0     0     0
>>>>>>     ada3p3                                      ONLINE       0     0     0
>>>>>>     ada4p3                                      ONLINE       0     0     0
>>>>>>   mirror-2                                      ONLINE       0     0     0
>>>>>>     ada5p3                                      ONLINE       0     0     0
>>>>>>     ada6p3                                      ONLINE       0     0     0
>>>>>> special
>>>>>>   mirror-3                                      ONLINE       0     0     0
>>>>>>     gptid/db15e826-1a9c-11eb-8d25-0cc47a1f2fa0  ONLINE       0     0     0
>>>>>>     mfid1p2                                     ONLINE       0     0     0
>>>>>>
>>>>>> errors: No known data errors
>>>>>>
>>>>>> # dmesg | grep ada0
>>>>>> Trying to mount root from ufs:/dev/ada0p2 [rw]...
>>>>>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>>>>>> ada0: ACS-2 ATA SATA 3.x device
>>>>>> ada0: Serial Number WD-WCC137TALF5K
>>>>>> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
>>>>>> ada0: Command Queueing enabled
>>>>>> ada0: 2861588MB (5860533168 512 byte sectors)
>>>>>> ada0: quirks=0x1<4K>
>>>>>> Mounting from ufs:/dev/ada0p2 failed with error 2; retrying for 3 more
>>>>>> seconds
>>>>>> Mounting from ufs:/dev/ada0p2 failed with error 2.
>>>>>> vfs.root.mountfrom=ufs:/dev/ada0p2
>>>>>> GEOM_PART: Partition 'ada0p3' not suitable for kernel dumps (wrong
>>>>>> type?)
>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>
>>>>>> # gpart show ada0
>>>>>> =>        34  5860533101  ada0  GPT  (2.7T)
>>>>>>           34           6        - free -  (3.0K)
>>>>>>           40          88     1  freebsd-boot  (44K)
>>>>>>          128     3072000     2  freebsd-swap  (1.5G)
>>>>>>      3072128  5857461000     3  freebsd-zfs  (2.7T)
>>>>>>   5860533128           7        - free -  (3.5K)
>>>>>>
>>>>>> Running naive dd if=/dev/ada0p3 works, so I don't know what zfs
>>>>>> complains about.
>>>>>>
>>>>> Also note Philip's point boot partition of 44k. Is that too small now?
>>>>>
>>>>>> On 11/20/20, mike tancsa wrote:
>>>>>>> On 11/20/2020 11:40 AM, Philip Paeps wrote:
>>>>>>>> On 2020-11-21 00:04:19 (+0800), Mateusz Guzik wrote:
>>>>>>>>
>>>>>>>>> Oh, that's a bummer. I wonder if there is a regression in the boot
>>>>>>>>> loader though.
>>>>>>>>>
>>>>>>>>> Does the pool mount if you boot the system from a cd/over the
>>>>>>>>> network/whatever?
>>>>>>>> It's worth checking if the freebsd-boot partition is large enough.  I
>>>>>>>> noticed during the cluster refresh that we often use 108k for
>>>>>>>> freebsd-boot but recent head wants 117k. I've been bumping the
>>>>>>>> bootblocks to 236k.
>>>>>>>>
>>>>>>>> So far, all the cluster machines I've upgraded booted though .. so ...
>>>>>>>> I might be talking ex recto. :)
>>>>>>>>
>>>>>>> I put in an ssd drive and booted from it. One of the drives might have
>>>>>>> gotten loose or died in the power cycles, but there is still redundancy
>>>>>>> and I was able to mount the pool. Not sure why it can't find the file?
>>>>>>>
>>>>>>> root@zoo2:~ # diff /boot/lua/loader.lua /mnt/boot/lua/loader.lua
>>>>>>> 29c29
>>>>>>> < -- $FreeBSD$
>>>>>>> ---
>>>>>>> > -- $FreeBSD: head/stand/lua/loader.lua 359371 2020-03-27 17:37:31Z freqlabs $
>>>>>>> root@zoo2:~ #
>>>>>>>
>>>>>>>
>>>>>>> % ls -l /mnt/boot/lua/
>>>>>>> total 110
>>>>>>> -r--r--r--  1 root  wheel   4300 Nov 20 08:41 cli.lua
>>>>>>> -r--r--r--  1 root  wheel   3288 Nov 20 08:41 color.lua
>>>>>>> -r--r--r--  1 root  wheel  18538 Nov 20 08:41 config.lua
>>>>>>> -r--r--r--  1 root  wheel  12610 Nov 20 08:41 core.lua
>>>>>>> -r--r--r--  1 root  wheel  11707 Nov 20 08:41 drawer.lua
>>>>>>> -r--r--r--  1 root  wheel   2456 Nov 20 08:41 gfx-beastie.lua
>>>>>>> -r--r--r--  1 root  wheel   2235 Nov 20 08:41 gfx-beastiebw.lua
>>>>>>> -r--r--r--  1 root  wheel   1958 Nov 20 08:41 gfx-fbsdbw.lua
>>>>>>> -r--r--r--  1 root  wheel   2413 Nov 20 08:41 gfx-orb.lua
>>>>>>> -r--r--r--  1 root  wheel   2140 Nov 20 08:41 gfx-orbbw.lua
>>>>>>> -r--r--r--  1 root  wheel   3324 Nov 20 08:41 hook.lua
>>>>>>> -r--r--r--  1 root  wheel   2395 Nov 20 08:41 loader.lua
>>>>>>> -r--r--r--  1 root  wheel   2429 Sep 24 09:09 logo-beastie.lua
>>>>>>> -r--r--r--  1 root  wheel   2203 Sep 24 09:09 logo-beastiebw.lua
>>>>>>> -r--r--r--  1 root  wheel   1958 Sep 24 09:09 logo-fbsdbw.lua
>>>>>>> -r--r--r--  1 root  wheel   2397 Sep 24 09:09 logo-orb.lua
>>>>>>> -r--r--r--  1 root  wheel   2119 Sep 24 09:09 logo-orbbw.lua
>>>>>>> -r--r--r--  1 root  wheel  14201 Nov 20 08:41 menu.lua
>>>>>>> -r--r--r--  1 root  wheel   4299 Nov 20 08:41 password.lua
>>>>>>> -r--r--r--  1 root  wheel   2227 Nov 20 08:41 screen.lua
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> --
>>>>>> Mateusz Guzik
>>>>>>
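
A small worked check of the sizes quoted in this thread may help. The sketch below parses the `gpart show ada0` output pasted above and compares the freebsd-boot partition against the 117k that Philip says recent head wants. It also compares the swapinfo block count with the freebsd-zfs partition. The assumptions here are not from the thread: "117k" is read as KiB, the sector size is taken from the dmesg line ("512 byte sectors"), and the function names are made up for illustration.

```python
# Sector size taken from the dmesg line "5860533168 512 byte sectors".
SECTOR_BYTES = 512

# Output of "gpart show ada0" as quoted in the thread.
GPART_SHOW = """\
=>        34  5860533101  ada0  GPT  (2.7T)
          34           6        - free -  (3.0K)
          40          88     1  freebsd-boot  (44K)
         128     3072000     2  freebsd-swap  (1.5G)
     3072128  5857461000     3  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5K)
"""


def parse_gpart_show(text):
    """Return {index: (type, start_sector, size_sectors)} for partition rows.

    Hypothetical helper: it only understands the simple layout shown above.
    """
    parts = {}
    for line in text.splitlines():
        fields = line.split()
        # Partition rows look like: start size index type (human-size).
        # The "=>" header and "- free -" rows fail the digit checks.
        if len(fields) >= 4 and fields[0].isdigit() and fields[2].isdigit():
            parts[int(fields[2])] = (fields[3], int(fields[0]), int(fields[1]))
    return parts


def size_bytes(parts, index):
    """Size of partition `index` in bytes."""
    return parts[index][2] * SECTOR_BYTES


parts = parse_gpart_show(GPART_SHOW)

# Assumption: "recent head wants 117k" means 117 KiB.
REQUIRED_BOOT_BYTES = 117 * 1024
boot_bytes = size_bytes(parts, 1)
boot_too_small = boot_bytes < REQUIRED_BOOT_BYTES

# swapinfo reported 2928730500 1K-blocks on /dev/ada0p3.
SWAPINFO_1K_BLOCKS = 2928730500
swap_covers_zfs = SWAPINFO_1K_BLOCKS * 1024 == size_bytes(parts, 3)

print("freebsd-boot bytes:", boot_bytes)
print("smaller than 117 KiB:", boot_too_small)
print("swapinfo size equals freebsd-zfs partition:", swap_covers_zfs)
```

Under those assumptions, 88 sectors of 512 bytes is 44 KiB, which is below 117 KiB, and the swap device size matches partition 3 exactly. That is consistent with swap having been enabled on the ZFS partition rather than on ada0p2.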