Date: Fri, 06 Sep 2024 16:22:44 -0500
From: Wes Morgan
To: Chris Ross
CC: freebsd-fs@freebsd.org
Subject: Re: Unable to replace drive in raidz1
References: <5ED5CB56-2E2A-4D83-8CDA-6D6A0719ED19@distal.com>
 <6A20ABDA-9BEA-4526-94C1-5768AA564C13@distal.com>
 <0CF1E2D7-6C82-4A8B-82C3-A5BF1ED939CF@distal.com>
 <29003A7C-745D-4A06-8558-AE64310813EA@distal.com>
 <42346193-AD06-4D26-B0C6-4392953D21A3@gmail.com>
Message-ID: <50B791D8-F0CC-431E-93B8-834D57AB3C14@gmail.com>
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Sender: owner-freebsd-fs@FreeBSD.org

On September 6, 2024 2:34:36 PM CDT, Chris Ross wrote:
>
>> On Sep 6, 2024, at 15:16, Wes Morgan wrote:
>>
>> You probably don't want that. You will have to use the glabel dev,
>> which will not be the same size as your other devices. IIRC you have
>> no control over what device node the system finds first for the pool.
>> Even if you use GPT labels, the daXpY device will still exist.
>
> Right. But if I don’t _use_ those device names, it won’t matter. If I
> use /dev/label/foo, or /dev/gpt/foo, I’ll just always use those. I
> just did that with the ufs disk I have since it moved names, now it’s
> "/dev/ufs/drive12" in /etc/fstab et al.

The labels are helpful for fstab, but ZFS doesn't need fstab. In the
early days of ZFS on FreeBSD the unpartitioned device was recommended;
maybe that's not accurate any longer, but I still follow it for a pool
that contains vdevs with multiple devices (raidz).

If you use, e.g., da0 in a pool, you cannot later replace it with a
labeled device of the same size; it won't have enough sectors, because
the label metadata takes some of them.

> I want to have some sort of label. I’d rather not have to add a
> partitioning scheme to the disk if I know I’m just going to use the
> whole disk just to get a label, but I suppose if I have to I can.
> Though I’d have to do it one disk at a time. :-)

ZFS will absolutely find the device if it is readable. The label on
every device contains enough metadata to describe the entire vdev (and
the pool, I believe), including the missing devices. It's very good at
finding them. The labelclear command was added because it was a pain to
get ZFS to give up on a disk that has been repurposed. You really don't
need the labels, but if you have trouble figuring out which disk is
which, that may be the only way for you to be sure.

>>
>>> The former da3 is off-line, out of the chassis. I replaced a disk in
>>> a full chassis, having them both online at the same time is not
>>> possible. That drive in ZFS’s mind is only faulted because I tried
>>> “zpool offline -f” on it to see if that helped.
>>
>> It sounds like you have replaced the wrong device. Check the "zpool
>> history" to see what you did.
>>
>> In your earlier message, three devices were shown in each raidz, when
>> what you should be seeing is that one raidz has an offline device
>> identified by guid and maybe "was /dev/da3" that is being replaced,
>> along with the replacement device. I don't see any of that.
>
> History attached. There is no replacement device (sub-vdev) until
> after the “zpool replace” starts, which it won’t.
>
>>> I didn’t initiate a replace until after the disks were physically
>>> changed. Although in this conversation I realize that things likely
>>> got confused by the replacement in the kernel’s mind of da3 with
>>> what used to be da4. :-/
>>
>> This is why your zpool history will be helpful. What did you actually
>> try to replace, and what did you mean to replace?
>
> All of my history since the last previous boot in May.
>
> 2024-09-05.09:40:14 zpool offline tank da3
> 2024-09-05.14:26:44 zpool import -c /etc/zfs/zpool.cache -a -N
> 2024-09-05.14:32:45 zpool import -c /etc/zfs/zpool.cache -a -N
> 2024-09-05.14:52:18 zpool offline tank da3
> 2024-09-05.14:53:51 zpool offline tank da3
> 2024-09-05.14:59:43 zpool offline -f tank da3
> 2024-09-05.15:02:53 zpool clear tank
> 2024-09-05.15:07:41 zpool online tank da3
> 2024-09-05.15:10:00 zpool add tank spare da10
> 2024-09-05.15:10:20 zpool offline -f tank da3
> 2024-09-05.15:35:23 zpool remove tank da10
> 2024-09-05.15:54:35 zpool scrub tank
> 2024-09-05.16:01:12 zpool set autoreplace=on tank
> 2024-09-05.16:01:24 zpool set autoexpand=on tank
> 2024-09-05.16:02:16 zpool add -o ashift=9 tank spare da10
> 2024-09-06.10:10:20 zpool remove tank da10
>
> So, I offline’d the disk-to-be-replaced at 09:40 yesterday, then I
> shut the system down, removed that physical device replacing it with
> a larger disk, and rebooted. I suspect the “offline”s after that are
> me experimenting when it was telling me it couldn’t start the replace
> action I was asking for.

This is probably where you made your mistake. Rebooting shifted another
device into da3. When you tried to offline it, you were probably either
targeting a device in a different raidz or one that wasn't in the pool.
The output of those original offline commands would have been
informative. You could also check dmesg and map the serial numbers to
device assignments to figure out what device moved to da3.

> The scrub I started yesterday just because the replace says something
> about an operation in progress, so I did that. It completed with no
> issues, but nothing changed w.r.t. my current problem.
>
> I’m pretty sure the problem here is that the old da3 went away, and a
> new da3 came online as a member of raidz1-1. The new disk I added
> came online as da10, for some reason. I had to resolve the issue of
> the UFS disk which used to be da10 now being da9, but that was easy
> enough. Just unexpected.

Sounds about right. In another message it seemed like the pool had
started an autoreplace. So I assume you have zfsd enabled?
That is what issues the replace command. Strange that it is not
anywhere in the pool history. There should be syslog entries for any
actions it took.

In your case, it appears that you had two missing devices - the original
"da3" that was physically removed, and the new da3 that you forced
offline. You added da10 as a spare, when what you needed to do was a
replace. Spare devices do not auto-replace without zfsd running and
autoreplace set to on.

This should all be reported in zpool status. In your original message,
there is no sign of a replacement in progress or a spare device,
assuming you didn't omit something. If the pool is only showing that a
single device is missing, and that device is to be replaced by da10,
zero out the first and last sectors (I think a zfs label is 128k?) to
wipe out any labels and use the replace command, not spare, e.g. "zpool
replace tank da3 da10", or use the missing guid as suggested elsewhere.
This should work based on the information provided.
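
For reference, a rough sketch of where the labels sit, in case zeroing
by hand is needed. It assumes the layout in the OpenZFS on-disk format
as I understand it: four labels of 256 KiB each, two at the front of the
device and two at the end, with the device size first rounded down to a
multiple of the label size. If that is right, wiping only 128k at each
end would not cover everything. The function name label_offsets is made
up for illustration; it only prints byte offsets and never touches a
disk.

```shell
#!/bin/sh
# Hypothetical helper: print the byte offset of each of the four ZFS
# vdev labels for a device of the given size in bytes.
# Assumption: labels are 256 KiB each, two at the start and two at the
# end, and the size is rounded down to a multiple of the label size.
label_offsets() {
    size=$1
    label=$((256 * 1024))
    # Round the device size down to a whole number of labels.
    aligned=$((size / label * label))
    echo 0                              # label 0
    echo "$label"                       # label 1
    echo $((aligned - 2 * label))       # label 2
    echo $((aligned - label))           # label 3
}

# Example: a device of exactly 1 GiB.
label_offsets 1073741824
```

On a real system the size in bytes would come from something like
diskinfo, and "zpool labelclear" on the new device is usually simpler
than zeroing anything by hand.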