From: Mateusz Guzik <mjguzik@gmail.com>
Date: Sat, 21 Nov 2020 00:02:42 +0100
Subject: Re: zoo reboot Friday Nov 20 14:00 UTC
To: mike tancsa
Cc: Philip Paeps, "Bjoern A. Zeeb", netperf-admin@freebsd.org,
 netperf-users@freebsd.org, Allan Jude
List-Id: "Announcements and discussions related to the netperf cluster."

Should you go through with it, please make sure zsh is installed; then
I can get in and install some other stuff as needed.

On 11/20/20, mike tancsa wrote:
> That looks good to my newbie eyes. I think tomorrow I will just bite
> the bullet and put in a pair of 240G SSDs to boot from.
> I will install HEAD onto them on another machine, then put them in
> zoo, boot from there, and adjust the home directory mounts
> accordingly, if that's OK with everyone?
>
> ---Mike
>
> On 11/20/2020 5:53 PM, Mateusz Guzik wrote:
>> I ran this one-liner:
>>
>> gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
>>
>> which according to https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot
>> should be fine. Hopefully Allan will know better.
>>
>> On 11/20/20, mike tancsa wrote:
>>> Unfortunately no luck :(
>>>
>>> ZFS: i/o error - all block copies unavailable
>>> ZFS: can't read MOS of pool zroot
>>> gptzfsboot: failed to mount default pool zroot
>>>
>>> FreeBSD/x86 boot
>>>
>>> What's odd is that it doesn't post all the drives...
>>>
>>> Zoo predated EFI, so it was booting legacy BIOS. Are the boot
>>> blocks that you installed assuming that?
>>>
>>> On 11/20/2020 1:27 PM, Mateusz Guzik wrote:
>>>> swap and boot partitions resized, the ada0p3 partition got removed
>>>> from the pool and inserted back; it is rebuilding now:
>>>>
>>>> root@zoo2:~ # zpool status
>>>>   pool: zroot
>>>>  state: DEGRADED
>>>> status: One or more devices is currently being resilvered. The
>>>>         pool will continue to function, possibly in a degraded
>>>>         state.
>>>> action: Wait for the resilver to complete.
>>>>   scan: resilver in progress since Fri Nov 20 23:13:28 2020
>>>>         459G scanned at 1.00G/s, 291G issued at 650M/s, 3.47T total
>>>>         0B resilvered, 8.17% done, 01:25:48 to go
>>>> config:
>>>>
>>>>   NAME                                            STATE    READ WRITE CKSUM
>>>>   zroot                                           DEGRADED    0     0     0
>>>>     mirror-0                                      DEGRADED    0     0     0
>>>>       replacing-0                                 DEGRADED    0     0     0
>>>>         1517819109053923011                       OFFLINE     0     0     0  was /dev/ada0p3/old
>>>>         ada0p3                                    ONLINE      0     0     0
>>>>       ada1                                        ONLINE      0     0     0
>>>>     mirror-1                                      ONLINE      0     0     0
>>>>       ada3p3                                      ONLINE      0     0     0
>>>>       ada4p3                                      ONLINE      0     0     0
>>>>     mirror-2                                      ONLINE      0     0     0
>>>>       ada5p3                                      ONLINE      0     0     0
>>>>       ada6p3                                      ONLINE      0     0     0
>>>>   special
>>>>     mirror-3                                      ONLINE      0     0     0
>>>>       gptid/db15e826-1a9c-11eb-8d25-0cc47a1f2fa0  ONLINE      0     0     0
>>>>       mfid1p2                                     ONLINE      0     0     0
>>>>
>>>> errors: No known data errors
>>>>
>>>> One pickle: I did 'zpool export zroot' to replace the drive,
>>>> otherwise zfs protested. The subsequent zpool import was done
>>>> slightly carelessly and it mounted over /, meaning I lost access
>>>> to the original ufs. Should there be a need to boot from it again,
>>>> someone will have to boot single user and make sure to comment out
>>>> swap in /etc/fstab, or we will have to replace the drive again.
>>>>
>>>> That said, as I understand it, we are in a position to take out
>>>> the ufs drive and reboot to be back in business.
>>>>
>>>> The ufs drive will have to be mounted somewhere to sort out that
>>>> swap.
>>>>
>>>> On 11/20/20, mike tancsa wrote:
>>>>> On 11/20/2020 1:00 PM, Mateusz Guzik wrote:
>>>>>> So this happened after boot:
>>>>>>
>>>>>> root@zoo2:/home/mjg # swapinfo
>>>>>> Device          1K-blocks     Used      Avail Capacity
>>>>>> /dev/ada0p3    2928730500        0 2928730500     0%
>>>>>>
>>>>>> which I presume might have corrupted some of it.
>>>>> Oh, that makes sense now. When it was installed in the back, the
>>>>> drive posted as ada0. When we put it in zoo, it was on a farther
>>>>> down port, hence it came up as ada7. I had to manually mount /
>>>>> off ada7p2.
>>>>> I already updated fstab so as not to do that again. That mystery
>>>>> is solved.
>>>>>
>>>>> ---Mike
>>>>>
>>>>>> Allan pasted some one-liners to resize the boot and swap
>>>>>> partitions.
>>>>>>
>>>>>> With your permission I would like to run them and then
>>>>>> offline/online the disk to have it rebuild.
>>>>>>
>>>>>> As for longer-term plans for what to do with it, I think that's
>>>>>> a different subject; whatever new drives end up being used, I'm
>>>>>> sure the FreeBSD Foundation can reimburse you with no
>>>>>> difficulty.
>>>>>>
>>>>>> On 11/20/20, mike tancsa wrote:
>>>>>>> The current state of zoo is a bit of an evolutionary mess. I
>>>>>>> wonder if we are better off re-installing the base OS fresh on
>>>>>>> a pair of SSD drives, keeping the base OS on them and leaving
>>>>>>> all the user data on the current "zroot"... Considering 240G
>>>>>>> SSDs are $35 CDN, it might be easier to just install fresh and
>>>>>>> not have to worry about resizing, etc.
>>>>>>>
>>>>>>> ---Mike
>>>>>>>
>>>>>>> On 11/20/2020 12:49 PM, Mateusz Guzik wrote:
>>>>>>>> On 11/20/20, Mateusz Guzik wrote:
>>>>>>>>> CC'ing Allan Jude
>>>>>>>>>
>>>>>>>>> So:
>>>>>>>>>
>>>>>>>>>   pool: zroot
>>>>>>>>>  state: DEGRADED
>>>>>>>>> status: One or more devices could not be opened. Sufficient
>>>>>>>>>         replicas exist for the pool to continue functioning
>>>>>>>>>         in a degraded state.
>>>>>>>>> action: Attach the missing device and online it using 'zpool
>>>>>>>>>         online'.
>>>>>>>>>    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
>>>>>>>>>   scan: scrub repaired 0B in 05:17:02 with 0 errors on
>>>>>>>>>         Tue Aug 18 15:19:00 2020
>>>>>>>>> config:
>>>>>>>>>
>>>>>>>>>   NAME                                          STATE    READ WRITE CKSUM
>>>>>>>>>   zroot                                         DEGRADED    0     0     0
>>>>>>>>>     mirror-0                                    DEGRADED    0     0     0
>>>>>>>>>       1517819109053923011                       UNAVAIL     0     0     0  was /dev/ada0p3
>>>>>>>>>       ada1                                      ONLINE      0     0     0
>>>>>>>>>     mirror-1                                    ONLINE      0     0     0
>>>>>>>>>       ada3p3                                    ONLINE      0     0     0
>>>>>>>>>       ada4p3                                    ONLINE      0     0     0
>>>>>>>>>     mirror-2                                    ONLINE      0     0     0
>>>>>>>>>       ada5p3                                    ONLINE      0     0     0
>>>>>>>>>       ada6p3                                    ONLINE      0     0     0
>>>>>>>>>   special
>>>>>>>>>     mirror-3                                    ONLINE      0     0     0
>>>>>>>>>       gptid/db15e826-1a9c-11eb-8d25-0cc47a1f2fa0  ONLINE    0     0     0
>>>>>>>>>       mfid1p2                                   ONLINE      0     0     0
>>>>>>>>>
>>>>>>>>> errors: No known data errors
>>>>>>>>>
>>>>>>>>> # dmesg | grep ada0
>>>>>>>>> Trying to mount root from ufs:/dev/ada0p2 [rw]...
>>>>>>>>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>>>>>>>>> ada0: ACS-2 ATA SATA 3.x device
>>>>>>>>> ada0: Serial Number WD-WCC137TALF5K
>>>>>>>>> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
>>>>>>>>> ada0: Command Queueing enabled
>>>>>>>>> ada0: 2861588MB (5860533168 512 byte sectors)
>>>>>>>>> ada0: quirks=0x1<4K>
>>>>>>>>> Mounting from ufs:/dev/ada0p2 failed with error 2; retrying
>>>>>>>>> for 3 more seconds
>>>>>>>>> Mounting from ufs:/dev/ada0p2 failed with error 2.
>>>>>>>>> vfs.root.mountfrom=ufs:/dev/ada0p2
>>>>>>>>> GEOM_PART: Partition 'ada0p3' not suitable for kernel dumps
>>>>>>>>> (wrong type?)
>>>>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>>>>>
>>>>>>>>> # gpart show ada0
>>>>>>>>> =>        34  5860533101  ada0  GPT  (2.7T)
>>>>>>>>>           34           6        - free -  (3.0K)
>>>>>>>>>           40          88     1  freebsd-boot  (44K)
>>>>>>>>>          128     3072000     2  freebsd-swap  (1.5G)
>>>>>>>>>      3072128  5857461000     3  freebsd-zfs  (2.7T)
>>>>>>>>>   5860533128           7        - free -  (3.5K)
>>>>>>>>>
>>>>>>>>> Running a naive dd if=/dev/ada0p3 works, so I don't know what
>>>>>>>>> zfs complains about.
>>>>>>>>>
>>>>>>>> Also note Philip's point about the 44k boot partition. Is that
>>>>>>>> too small now?
>>>>>>>>
>>>>>>>>> On 11/20/20, mike tancsa wrote:
>>>>>>>>>> On 11/20/2020 11:40 AM, Philip Paeps wrote:
>>>>>>>>>>> On 2020-11-21 00:04:19 (+0800), Mateusz Guzik wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Oh, that's a bummer. I wonder if there is a regression in
>>>>>>>>>>>> the boot loader though.
>>>>>>>>>>>>
>>>>>>>>>>>> Does the pool mount if you boot the system from a CD /
>>>>>>>>>>>> over the network / whatever?
>>>>>>>>>>> It's worth checking if the freebsd-boot partition is large
>>>>>>>>>>> enough. I noticed during the cluster refresh that we often
>>>>>>>>>>> use 108k for freebsd-boot but recent head wants 117k. I've
>>>>>>>>>>> been bumping the bootblocks to 236k.
>>>>>>>>>>>
>>>>>>>>>>> So far, all the cluster machines I've upgraded booted
>>>>>>>>>>> though ... so ... I might be talking ex recto. :)
>>>>>>>>>>>
>>>>>>>>>> I put in an ssd drive and booted from it. One of the drives
>>>>>>>>>> might have gotten loose or died in the power cycles, but
>>>>>>>>>> there is still redundancy and I was able to mount the pool.
>>>>>>>>>> Not sure why it can't find the file?
>>>>>>>>>>
>>>>>>>>>> root@zoo2:~ # diff /boot/lua/loader.lua /mnt/boot/lua/loader.lua
>>>>>>>>>> 29c29
>>>>>>>>>> < -- $FreeBSD$
>>>>>>>>>> ---
>>>>>>>>>> > -- $FreeBSD: head/stand/lua/loader.lua 359371 2020-03-27 17:37:31Z freqlabs $
>>>>>>>>>> root@zoo2:~ #
>>>>>>>>>>
>>>>>>>>>> % ls -l /mnt/boot/lua/
>>>>>>>>>> total 110
>>>>>>>>>> -r--r--r--  1 root  wheel   4300 Nov 20 08:41 cli.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   3288 Nov 20 08:41 color.lua
>>>>>>>>>> -r--r--r--  1 root  wheel  18538 Nov 20 08:41 config.lua
>>>>>>>>>> -r--r--r--  1 root  wheel  12610 Nov 20 08:41 core.lua
>>>>>>>>>> -r--r--r--  1 root  wheel  11707 Nov 20 08:41 drawer.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2456 Nov 20 08:41 gfx-beastie.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2235 Nov 20 08:41 gfx-beastiebw.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   1958 Nov 20 08:41 gfx-fbsdbw.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2413 Nov 20 08:41 gfx-orb.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2140 Nov 20 08:41 gfx-orbbw.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   3324 Nov 20 08:41 hook.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2395 Nov 20 08:41 loader.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2429 Sep 24 09:09 logo-beastie.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2203 Sep 24 09:09 logo-beastiebw.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   1958 Sep 24 09:09 logo-fbsdbw.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2397 Sep 24 09:09 logo-orb.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2119 Sep 24 09:09 logo-orbbw.lua
>>>>>>>>>> -r--r--r--  1 root  wheel  14201 Nov 20 08:41 menu.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   4299 Nov 20 08:41 password.lua
>>>>>>>>>> -r--r--r--  1 root  wheel   2227 Nov 20 08:41 screen.lua
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Mateusz Guzik
>>>
>>
>
--
Mateusz Guzik
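[Editor's note] On the ada0/ada7 renumbering that left swap configured
on top of a pool member: one way to keep that from recurring is to
reference swap by GPT label rather than by adaN unit number (labels
can be set with `gpart modify -l`). A hypothetical /etc/fstab
fragment; the label name `swap0` is an example, not a label that
exists on zoo:

```
# /etc/fstab: reference swap by GPT label instead of adaN unit number,
# so the entry survives the disk landing on a different SATA port.
# Device          Mountpoint  FStype  Options  Dump  Pass#
/dev/gpt/swap0    none        swap    sw       0     0
```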
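[Editor's note] The "01:25:48 to go" in the resilver status quoted
earlier in the thread can be roughly cross-checked from the numbers
zpool prints (3.47T total, 291G issued, 650M/s issue rate). A minimal
sketch; the sizes are rounded to whole gigabytes, so it lands a few
seconds off zpool's own figure:

```shell
# Cross-check the resilver ETA from the zpool status output above.
total_mb=$((3553 * 1024))   # 3.47T, rounded to whole GB, expressed in MB
issued_mb=$((291 * 1024))   # 291G already issued
rate_mb=650                 # issue rate in M/s
secs=$(( (total_mb - issued_mb) / rate_mb ))
printf '%02d:%02d:%02d to go\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
# prints 01:25:38 to go, close to zpool's own 01:25:48 estimate
```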
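[Editor's note] Philip's bootblock-size concern checks out against the
gpart output in the thread: the freebsd-boot partition is 88 sectors
(44K), while he reports recent head wanting around 117K for
gptzfsboot. A sketch of the arithmetic; on the machine itself one
would compare `stat -f %z /boot/gptzfsboot` against `gpart show`:

```shell
# Does an 88-sector (44K) freebsd-boot partition hold a ~117K gptzfsboot?
part_bytes=$((88 * 512))        # 88 sectors of 512 bytes, from gpart show
loader_bytes=$((117 * 1024))    # ~117K, per Philip's observation
if [ "$part_bytes" -lt "$loader_bytes" ]; then
    echo "freebsd-boot too small: ${part_bytes} < ${loader_bytes} bytes"
fi
```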
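[Editor's note] Mike's question about legacy BIOS vs EFI boot blocks
can be answered from a running FreeBSD system via the
`machdep.bootmethod` sysctl, which reports "UEFI" or "BIOS". A hedged
sketch that falls back to assuming BIOS when the sysctl is unavailable
(e.g. when run on another OS); the gpart command is the one already
used in the thread:

```shell
# Report which firmware path the system booted through, and the
# matching bootcode step for a ZFS root.
method=$(sysctl -n machdep.bootmethod 2>/dev/null || echo BIOS)
case "$method" in
    UEFI) echo "UEFI boot: update loader.efi on the ESP" ;;
    *)    echo "legacy boot: gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0" ;;
esac
```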