Date:      Thu, 11 Nov 2021 08:27:42 -0500
From:      Chris Ross <cross+freebsd@distal.com>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: ZFS operations hanging, but no visible errors?
Message-ID:  <665A4DDB-6973-4A2D-A427-492F30D27851@distal.com>
In-Reply-To: <de784091-c465-2931-c526-bc6efe7b88be@FreeBSD.org>
References:  <B28E52F4-F475-4CF6-BE0C-F5C803AD5757@distal.com> <20211105173935.7aa53269@fabiankeil.de> <86999084-7007-4F08-A4C4-4A835A7E1C78@distal.com> <de784091-c465-2931-c526-bc6efe7b88be@FreeBSD.org>

Following up on a new hang this same system had (yesterday's freebsd-fs
mail, subject "swap_pager: cannot allocate bio"), I think the same
problem might have occurred again.  Certainly the system got stuck again.

Based on the below, my executing that dtrace command caused the system
to report "ACPI Error: AE_NO_MEMORY".  In what way is the system out of
memory here?  And does that failure running dtrace suggest that the
"out of memory" problem is the core problem causing the ZFS hang in the
first place?  My system has 128GB of RAM, which is nothing to sneeze at.
Are there parameters that I should change because the normal parameters
just don't work well with a pool or fs this large?

And, from earlier in this thread from last week:  Now that I have the
system running again, I can provide the "zpool status" output for
information.  Let me know if I've just tried something crazy here; this
is the largest ZFS filesystem I've attempted.  I have a 30T pool on
another system without issue, and with less RAM.  (The largest fs on
that pool is about 18T.)

% zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 05:05:55 with 0 errors on Sat Oct 23 04:38:36 2021
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    da3     ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	  raidz1-2  ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	    da8     ONLINE       0     0     0
	    da9     ONLINE       0     0     0

errors: No known data errors
% zfs list tank
NAME   USED  AVAIL     REFER  MOUNTPOINT
tank  14.2T  35.0T     14.2T  /tank


                               - Chris

> On Nov 7, 2021, at 03:35, Andriy Gapon <avg@freebsd.org> wrote:
>
> On 05/11/2021 18:59, Chris Ross wrote:
>> Running procstat -kk on the rsync that was hung, then killed, then SIGKILL'd shows:
>> procstat -kk 35220
>>   PID    TID COMM                TDNAME              KSTACK
>> 35220 102499 rsync               -                   mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee zone_alloc_item+0x6f abd_alloc_chunks+0x61 abd_alloc+0x102 arc_hdr_alloc_abd+0xb0 arc_hdr_alloc+0x11e arc_read+0x4f4 dbuf_issue_final_prefetch+0x108 dbuf_prefetch_impl+0x3d0 dmu_zfetch+0x558
>
> Looks like the system is out of memory.
> It seems that you already established that.
>
> -- 
> Andriy Gapon
>
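
The kernel stack quoted above can be checked mechanically for the
page-allocation wait that Andriy points to. What follows is only a
minimal sketch in Python, assuming the KSTACK column is available as one
string of `symbol+0xoffset` frames. The helper names `parse_kstack` and
`is_waiting_for_memory`, and the choice of a `vm_wait` prefix as the
indicator, are illustrative assumptions and not part of any FreeBSD tool.

```python
import re
from typing import List

# Frames in `procstat -kk` output have the form `symbol+0xoffset`.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x[0-9a-fA-F]+")


def parse_kstack(kstack: str) -> List[str]:
    """Return the function names from a KSTACK string, innermost frame first."""
    return FRAME_RE.findall(kstack)


def is_waiting_for_memory(frames: List[str]) -> bool:
    """Heuristic (an assumption): a thread sleeping under a vm_wait* frame
    is blocked waiting for free pages."""
    return any(name.startswith("vm_wait") for name in frames)


# The stack from the hung rsync, copied from the quoted message.
KSTACK = (
    "mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 "
    "vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e "
    "uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee "
    "zone_alloc_item+0x6f abd_alloc_chunks+0x61 abd_alloc+0x102 "
    "arc_hdr_alloc_abd+0xb0 arc_hdr_alloc+0x11e arc_read+0x4f4 "
    "dbuf_issue_final_prefetch+0x108 dbuf_prefetch_impl+0x3d0 "
    "dmu_zfetch+0x558"
)

frames = parse_kstack(KSTACK)
print("frames:", len(frames))
print("waiting for memory:", is_waiting_for_memory(frames))
```

Read from the outermost frame inward, the stack shows a ZFS prefetch
(dmu_zfetch) calling arc_read, which tries to allocate an ARC buffer
(abd_alloc) and ends up sleeping in vm_wait_doms.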

