From: Xin LI
Date: Thu, 14 Dec 2023 14:25:13 -0800
Subject: Re: unusual ZFS issue
To: Pete Wright
Cc: Lexi Winter, freebsd-fs@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Try "zpool status -x" and see if it would show something useful?
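
A minimal sketch of how that check could be scripted, assuming "zpool status -x" prints the single line "all pools are healthy" when nothing is wrong. The function name classify_zpool_status is hypothetical, and the pool name "data" is only inferred from the dataset path quoted below:

```shell
#!/bin/sh
# Sketch: classify the text printed by `zpool status -x`.
# Assumption: a clean system prints exactly "all pools are healthy";
# any other output is treated as worth a closer look.
classify_zpool_status() {
    # Reads `zpool status -x` output on stdin; prints "healthy" or "attention".
    if grep -q '^all pools are healthy$'; then
        echo healthy
    else
        echo attention
    fi
}

# Usage on the affected host (not executed here):
#   zpool status -x | classify_zpool_status
#   zpool status -v data    # verbose listing of data errors for pool "data"
```

If the result is "attention", the verbose form is usually the next thing to read, since it lists the objects with known data errors.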

Cheers,

On Thu, Dec 14, 2023 at 2:10 PM Pete Wright <pete@nomadlogic.org> wrote:


On 12/14/23 2:05 PM, Lexi Winter wrote:
> On 14 Dec 2023, at 22:02, Pete Wright <pete@nomadlogic.org> wrote:
>> On Thu, Dec 14, 2023 at 09:17:06PM +0000, Lexi Winter wrote:
>>> hi list,
>>>
>>> i’ve just hit this ZFS error:
>>>
>>> # zfs list -rt snapshot data/vm/media/disk1
>>> cannot iterate filesystems: I/O error
>>
>> hrm, i wonder if you see any errors in dmesg or /var/log/messages about a
>> device failing?
>
> nothing that looks relevant in the last few days (the problem appeared last night, Dec 13th):
>
> Dec 11 15:44:21 hemlock kernel: ix1: link state changed to DOWN
> Dec 11 15:44:21 hemlock kernel: ix1.107: link state changed to DOWN
> Dec 11 15:44:35 hemlock kernel: ix1: link state changed to UP
> Dec 11 15:44:35 hemlock kernel: ix1.107: link state changed to UP
> Dec 11 15:44:47 hemlock kernel: nfsrv_cache_session: no session IPaddr=2001:8b0:aab5:ffff::2, check NFS clients for unique /etc/hostid's
> Dec 11 15:44:47 hemlock syslogd: last message repeated 1 times
> Dec 11 17:00:48 hemlock kernel: tcp_vnet_init: WARNING: unable to initialise TCP stats
> Dec 11 17:00:48 hemlock kernel: lo0: link state changed to UP
> Dec 12 06:17:23 hemlock ntpd[25836]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): will expire in less than 16 days
> Dec 13 06:17:23 hemlock ntpd[25836]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): will expire in less than 15 days
> Dec 14 06:17:23 hemlock ntpd[25836]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): will expire in less than 14 days
> Dec 14 16:30:12 hemlock smbd[98264]: [2023/12/14 16:30:12.404883,  0] ../../source3/smbd/server.c:1741(main)
> Dec 14 16:30:12 hemlock smbd[98264]:   smbd version 4.16.11 started.
> Dec 14 16:30:12 hemlock smbd[98264]:   Copyright Andrew Tridgell and the Samba Team 1992-2022
>
> i’ve also checked the disks with smartctl and i didn’t see any errors there.  (a couple of devices have corrected read errors, but that’s expected given their age - and if it *was* a disk error i’d expect it to show up as a checksum error).
>

dang, was hoping something obvious would pop up there or with smartctl.
hopefully others here have some ideas about trying to find the root
cause before a restart.

-pete

--
Pete Wright
pete@nomadlogic.org
