Date: Thu, 11 Nov 2021 08:27:42 -0500
From: Chris Ross <cross+freebsd@distal.com>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS operations hanging, but no visible errors?
Message-ID: <665A4DDB-6973-4A2D-A427-492F30D27851@distal.com>
In-Reply-To: <de784091-c465-2931-c526-bc6efe7b88be@FreeBSD.org>
References: <B28E52F4-F475-4CF6-BE0C-F5C803AD5757@distal.com> <20211105173935.7aa53269@fabiankeil.de> <86999084-7007-4F08-A4C4-4A835A7E1C78@distal.com> <de784091-c465-2931-c526-bc6efe7b88be@FreeBSD.org>

Following up on a new hang this same system had (yesterday's freebsd-fs mail, subject "swap_pager: cannot allocate bio"), I think the same problem might have occurred again. Certainly the system got stuck again.

Based on the below, my executing that dtrace command caused the system to report "ACPI Error: AE_NO_MEMORY". In what way is the system out of memory here? And does that failure running dtrace suggest that the "out of memory" problem is the core problem causing the ZFS hang in the first place? My system has 128GB, which is nothing to sneeze at. Are there parameters that I should change because the normal parameters just don't work well with a pool or fs this large?

And, from earlier in this thread from last week: Now that I have the system running again, I can provide the "zpool status" for information. Let me know if I've just tried something crazy here; this is the largest ZFS filesystem I've attempted. I have a 30T pool on another system without issue, and with less RAM. (The largest fs on that pool is about 18T)

% zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 05:05:55 with 0 errors on Sat Oct 23 04:38:36 2021
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    da3     ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	  raidz1-2  ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	    da8     ONLINE       0     0     0
	    da9     ONLINE       0     0     0

errors: No known data errors

% zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  14.2T  35.0T  14.2T  /tank

                 - Chris

> On Nov 7, 2021, at 03:35, Andriy Gapon <avg@freebsd.org> wrote:
>
> On 05/11/2021 18:59, Chris Ross wrote:
>> Running procstat -kk on the rsync that was hung, then killed, then SIGKILL'd shows:
>> procstat -kk 35220
>>   PID    TID COMM    TDNAME    KSTACK
>> 35220 102499 rsync   -         mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee zone_alloc_item+0x6f abd_alloc_chunks+0x61 abd_alloc+0x102 arc_hdr_alloc_abd+0xb0 arc_hdr_alloc+0x11e arc_read+0x4f4 dbuf_issue_final_prefetch+0x108 dbuf_prefetch_impl+0x3d0 dmu_zfetch+0x558
>
> Looks like the system is out of memory.
> It seems that you already established that.
>
> --
> Andriy Gapon
>
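
For anyone reading the quoted stack later, here is a rough sketch of why it reads as "out of memory": the innermost frames show the thread asleep in the VM page-allocation wait path, reached from an ARC buffer allocation. The stack string is copied from the procstat -kk output quoted above. The helper names and the prefix list are hypothetical and chosen only for illustration; they are not part of procstat or any FreeBSD tool.

```python
import re

# Kernel stack copied from the quoted "procstat -kk 35220" output.
KSTACK = (
    "mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 "
    "vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e "
    "uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee "
    "zone_alloc_item+0x6f abd_alloc_chunks+0x61 abd_alloc+0x102 "
    "arc_hdr_alloc_abd+0xb0 arc_hdr_alloc+0x11e arc_read+0x4f4 "
    "dbuf_issue_final_prefetch+0x108 dbuf_prefetch_impl+0x3d0 "
    "dmu_zfetch+0x558"
)

# Assumption: an illustrative hint list based only on the frames in this
# message, not an exhaustive or official list of memory-wait functions.
MEMORY_WAIT_PREFIXES = ("vm_wait", "vm_domain_alloc_fail")


def frame_names(kstack):
    """Return function names, innermost first, with the +0x offsets stripped."""
    return [re.sub(r"\+0x[0-9a-fA-F]+$", "", frame) for frame in kstack.split()]


def memory_wait_frames(kstack):
    """Return the frames whose names match the illustrative hint list."""
    return [name for name in frame_names(kstack)
            if name.startswith(MEMORY_WAIT_PREFIXES)]


if __name__ == "__main__":
    names = frame_names(KSTACK)
    print("innermost frame:", names[0])
    print("memory-wait frames:", memory_wait_frames(KSTACK))
    print("ARC allocation in stack:", "arc_hdr_alloc_abd" in names)
```

Reading the frames from left to right goes from where the thread is sleeping back toward the caller, so the vm_wait_* frames sit directly above the UMA and ARC allocation calls that asked for the page.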