From: Mateusz Guzik
Date: Fri, 20 Nov 2020 19:27:44 +0100
Subject: Re: zoo reboot Friday Nov 20 14:00 UTC
To: mike tancsa
Cc: Philip Paeps, "Bjoern A. Zeeb", netperf-admin@freebsd.org, netperf-users@freebsd.org, Allan Jude

The swap and boot partitions are resized, and the ada0p3 partition was removed from the pool and inserted back. It is rebuilding now:

root@zoo2:~ # zpool status
  pool: zroot
 state: DEGRADED
status: One or more devices is currently being resilvered.
        The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Nov 20 23:13:28 2020
        459G scanned at 1.00G/s, 291G issued at 650M/s, 3.47T total
        0B resilvered, 8.17% done, 01:25:48 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        zroot                                           DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            replacing-0                                 DEGRADED     0     0     0
              1517819109053923011                       OFFLINE      0     0     0  was /dev/ada0p3/old
              ada0p3                                    ONLINE       0     0     0
            ada1                                        ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            ada3p3                                      ONLINE       0     0     0
            ada4p3                                      ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            ada5p3                                      ONLINE       0     0     0
            ada6p3                                      ONLINE       0     0     0
        special
          mirror-3                                      ONLINE       0     0     0
            gptid/db15e826-1a9c-11eb-8d25-0cc47a1f2fa0  ONLINE       0     0     0
            mfid1p2                                     ONLINE       0     0     0

errors: No known data errors

One pickle: I did 'zpool export zroot' to replace the drive, since otherwise zfs protested. The subsequent zpool import was done slightly carelessly and the pool got mounted over /, meaning I lost access to the original ufs root. Should there be a need to boot from the ufs drive again, someone will have to boot single user and comment out swap in /etc/fstab, or swap will land on ada0p3 again and we will have to replace the drive again.

That said, as I understand it, we are in a position to take out the ufs drive and reboot to be back in business. The ufs drive will have to be mounted somewhere to sort out that swap entry.

On 11/20/20, mike tancsa wrote:
> On 11/20/2020 1:00 PM, Mateusz Guzik wrote:
>> So this happened after boot:
>>
>> root@zoo2:/home/mjg # swapinfo
>> Device          1K-blocks     Used    Avail Capacity
>> /dev/ada0p3    2928730500        0 2928730500     0%
>>
>> which I presume might have corrupted some of it.
>
> Oh, that makes sense now. When it was installed in the back, the drive
> posted as ada0. When we put it in zoo, it was on a port farther down,
> hence it came up as ada7. I had to manually mount / off ada7p2. I have
> already updated fstab so as not to do that again. That mystery is solved.
>
> ---Mike
>
>
>> Allan pasted some one-liners to resize the boot and swap partition.
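The swapinfo figure above matches the freebsd-zfs partition (index 3, 5857461000 sectors) in the `gpart show ada0` output quoted further down this thread exactly, which supports the theory that swap was enabled across the whole ZFS vdev after the drive renumbering. A minimal arithmetic check in sh, with both sizes copied from the thread; it assumes only that swapinfo reports 1K blocks and gpart reports 512-byte sectors:

```sh
#!/bin/sh
# Sizes copied from this thread.
swap_kblocks=2928730500      # swapinfo: /dev/ada0p3, in 1K blocks
zfs_part_sectors=5857461000  # gpart show ada0: index 3 (freebsd-zfs), in 512-byte sectors

# One 1K block is two 512-byte sectors.
part_kblocks=$((zfs_part_sectors / 2))

if [ "$swap_kblocks" -eq "$part_kblocks" ]; then
    echo "swap spanned the entire freebsd-zfs partition: $part_kblocks KB"
else
    echo "mismatch: swap=$swap_kblocks KB, partition=$part_kblocks KB"
fi
```

Run as-is, it prints `swap spanned the entire freebsd-zfs partition: 2928730500 KB`.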
>>
>> With your permission I would like to run them and then offline/online
>> the disk to have it rebuild.
>>
>> As for longer-term plans for what to do with it, I think that's a different
>> subject. Whatever new drives end up being used, I'm sure the FreeBSD
>> Foundation can reimburse you with no difficulty.
>>
>>
>> On 11/20/20, mike tancsa wrote:
>>> The current state of zoo is a bit of an evolutionary mess. I wonder if
>>> we are better off re-installing the base OS fresh on a pair of SSD
>>> drives and leaving all the user data on the current "zroot"...
>>> Considering 240G SSDs are $35 CDN, it might be easier to just install
>>> fresh on them and not have to worry about resizing etc.
>>>
>>> ---Mike
>>>
>>> On 11/20/2020 12:49 PM, Mateusz Guzik wrote:
>>>> On 11/20/20, Mateusz Guzik wrote:
>>>>> CC'ing Allan Jude
>>>>>
>>>>> So:
>>>>>
>>>>>   pool: zroot
>>>>>  state: DEGRADED
>>>>> status: One or more devices could not be opened.  Sufficient replicas exist for
>>>>>         the pool to continue functioning in a degraded state.
>>>>> action: Attach the missing device and online it using 'zpool online'.
>>>>>    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
>>>>>   scan: scrub repaired 0B in 05:17:02 with 0 errors on Tue Aug 18 15:19:00 2020
>>>>> config:
>>>>>
>>>>>         NAME                                            STATE     READ WRITE CKSUM
>>>>>         zroot                                           DEGRADED     0     0     0
>>>>>           mirror-0                                      DEGRADED     0     0     0
>>>>>             1517819109053923011                         UNAVAIL      0     0     0  was /dev/ada0p3
>>>>>             ada1                                        ONLINE       0     0     0
>>>>>           mirror-1                                      ONLINE       0     0     0
>>>>>             ada3p3                                      ONLINE       0     0     0
>>>>>             ada4p3                                      ONLINE       0     0     0
>>>>>           mirror-2                                      ONLINE       0     0     0
>>>>>             ada5p3                                      ONLINE       0     0     0
>>>>>             ada6p3                                      ONLINE       0     0     0
>>>>>         special
>>>>>           mirror-3                                      ONLINE       0     0     0
>>>>>             gptid/db15e826-1a9c-11eb-8d25-0cc47a1f2fa0  ONLINE       0     0     0
>>>>>             mfid1p2                                     ONLINE       0     0     0
>>>>>
>>>>> errors: No known data errors
>>>>>
>>>>> # dmesg | grep ada0
>>>>> Trying to mount root from ufs:/dev/ada0p2 [rw]...
>>>>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>>>>> ada0: ACS-2 ATA SATA 3.x device
>>>>> ada0: Serial Number WD-WCC137TALF5K
>>>>> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
>>>>> ada0: Command Queueing enabled
>>>>> ada0: 2861588MB (5860533168 512 byte sectors)
>>>>> ada0: quirks=0x1<4K>
>>>>> Mounting from ufs:/dev/ada0p2 failed with error 2; retrying for 3 more seconds
>>>>> Mounting from ufs:/dev/ada0p2 failed with error 2.
>>>>>   vfs.root.mountfrom=ufs:/dev/ada0p2
>>>>> GEOM_PART: Partition 'ada0p3' not suitable for kernel dumps (wrong type?)
>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>> ZFS WARNING: Unable to attach to ada0p3.
>>>>>
>>>>> # gpart show ada0
>>>>> =>        34  5860533101  ada0  GPT  (2.7T)
>>>>>           34           6        - free -  (3.0K)
>>>>>           40          88     1  freebsd-boot  (44K)
>>>>>          128     3072000     2  freebsd-swap  (1.5G)
>>>>>      3072128  5857461000     3  freebsd-zfs  (2.7T)
>>>>>   5860533128           7        - free -  (3.5K)
>>>>>
>>>>> Running a naive dd if=/dev/ada0p3 works, so I don't know what zfs
>>>>> is complaining about.
>>>>>
>>>> Also note Philip's point about the boot partition: it is only 44k here.
>>>> Is that too small now?
>>>>
>>>>> On 11/20/20, mike tancsa wrote:
>>>>>> On 11/20/2020 11:40 AM, Philip Paeps wrote:
>>>>>>> On 2020-11-21 00:04:19 (+0800), Mateusz Guzik wrote:
>>>>>>>
>>>>>>>> Oh, that's a bummer. I wonder if there is a regression in the boot
>>>>>>>> loader though.
>>>>>>>>
>>>>>>>> Does the pool mount if you boot the system from a cd/over the
>>>>>>>> network/whatever?
>>>>>>> It's worth checking if the freebsd-boot partition is large enough. I
>>>>>>> noticed during the cluster refresh that we often use 108k for
>>>>>>> freebsd-boot but recent head wants 117k. I've been bumping the
>>>>>>> bootblocks to 236k.
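On the boot partition question: ada0p1 is 88 sectors, i.e. 88 × 512 = 45056 bytes (the 44K gpart reports), which is well below both the 108k commonly used in the cluster and the 117k Philip says recent head wants. A minimal sketch for checking a boot-code file against a partition size, assuming the file is the one written into the freebsd-boot partition (on FreeBSD that is usually /boot/gptzfsboot for ZFS or /boot/gptboot for UFS); the helper name `fits_in_bootpart` is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: report whether a boot-code file fits in a
# freebsd-boot partition of the given size in 512-byte sectors.
fits_in_bootpart() {
    file=$1
    sectors=$2
    # BSD wc may pad its output with spaces; strip them before comparing.
    size=$(wc -c < "$file" | tr -d ' ')
    limit=$((sectors * 512))
    if [ "$size" -le "$limit" ]; then
        echo "fits: $size <= $limit bytes"
        return 0
    fi
    echo "too big: $size > $limit bytes"
    return 1
}

# Example (FreeBSD): the freebsd-boot partition on ada0 is 88 sectors.
# fits_in_bootpart /boot/gptzfsboot 88
```

Comparing byte counts rather than the rounded K figures avoids the k-vs-KiB ambiguity in the sizes quoted above.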
>>>>>>>
>>>>>>> So far, all the cluster machines I've upgraded have booted, though...
>>>>>>> so... I might be talking ex recto. :)
>>>>>>>
>>>>>> I put in an SSD drive and booted from it. One of the drives might have
>>>>>> gotten loose or died in the power cycles, but there is still redundancy
>>>>>> and I was able to mount the pool. Not sure why it can't find the file?
>>>>>>
>>>>>> root@zoo2:~ # diff /boot/lua/loader.lua /mnt/boot/lua/loader.lua
>>>>>> 29c29
>>>>>> < -- $FreeBSD$
>>>>>> ---
>>>>>> > -- $FreeBSD: head/stand/lua/loader.lua 359371 2020-03-27 17:37:31Z freqlabs $
>>>>>> root@zoo2:~ #
>>>>>>
>>>>>>
>>>>>> % ls -l /mnt/boot/lua/
>>>>>> total 110
>>>>>> -r--r--r--  1 root  wheel   4300 Nov 20 08:41 cli.lua
>>>>>> -r--r--r--  1 root  wheel   3288 Nov 20 08:41 color.lua
>>>>>> -r--r--r--  1 root  wheel  18538 Nov 20 08:41 config.lua
>>>>>> -r--r--r--  1 root  wheel  12610 Nov 20 08:41 core.lua
>>>>>> -r--r--r--  1 root  wheel  11707 Nov 20 08:41 drawer.lua
>>>>>> -r--r--r--  1 root  wheel   2456 Nov 20 08:41 gfx-beastie.lua
>>>>>> -r--r--r--  1 root  wheel   2235 Nov 20 08:41 gfx-beastiebw.lua
>>>>>> -r--r--r--  1 root  wheel   1958 Nov 20 08:41 gfx-fbsdbw.lua
>>>>>> -r--r--r--  1 root  wheel   2413 Nov 20 08:41 gfx-orb.lua
>>>>>> -r--r--r--  1 root  wheel   2140 Nov 20 08:41 gfx-orbbw.lua
>>>>>> -r--r--r--  1 root  wheel   3324 Nov 20 08:41 hook.lua
>>>>>> -r--r--r--  1 root  wheel   2395 Nov 20 08:41 loader.lua
>>>>>> -r--r--r--  1 root  wheel   2429 Sep 24 09:09 logo-beastie.lua
>>>>>> -r--r--r--  1 root  wheel   2203 Sep 24 09:09 logo-beastiebw.lua
>>>>>> -r--r--r--  1 root  wheel   1958 Sep 24 09:09 logo-fbsdbw.lua
>>>>>> -r--r--r--  1 root  wheel   2397 Sep 24 09:09 logo-orb.lua
>>>>>> -r--r--r--  1 root  wheel   2119 Sep 24 09:09 logo-orbbw.lua
>>>>>> -r--r--r--  1 root  wheel  14201 Nov 20 08:41 menu.lua
>>>>>> -r--r--r--  1 root  wheel   4299 Nov 20 08:41 password.lua
>>>>>> -r--r--r--  1 root  wheel   2227 Nov 20 08:41 screen.lua
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Mateusz Guzik
>>>>>
>>
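For the single-user fix described at the top of this message, the swap line in the ufs drive's /etc/fstab has to be commented out so swap is not enabled on ada0p3 again. A minimal sketch, assuming the usual six-field fstab layout (device, mount point, type, options, dump, pass) where swap entries carry `swap` in the third field; the helper name `comment_out_swap` and the /mnt/ufs mount point are hypothetical:

```sh
#!/bin/sh
# Hypothetical helper: comment out every active swap entry in an fstab file.
# Assumes the usual layout: device mountpoint fstype options dump pass.
comment_out_swap() {
    fstab=$1
    tmp="$fstab.tmp.$$"
    # Prefix '#' to non-comment lines whose 3rd field is "swap"; copy the rest unchanged.
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab" > "$tmp" && cp "$tmp" "$fstab" && rm -f "$tmp"
    # cp onto the existing file (rather than mv) keeps its owner and mode.
}

# Example, with the ufs drive's root mounted at a hypothetical /mnt/ufs:
# cp /mnt/ufs/etc/fstab /mnt/ufs/etc/fstab.bak
# comment_out_swap /mnt/ufs/etc/fstab
```

Lines that are already comments, and non-swap entries such as the / mount, pass through untouched.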
>

-- 
Mateusz Guzik