From: Chris Ross
Message-Id: <665A4DDB-6973-4A2D-A427-492F30D27851@distal.com>
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Subject: Re: ZFS operations hanging, but no visible errors?
Date: Thu, 11 Nov 2021 08:27:42 -0500
To: Andriy Gapon
Cc: freebsd-fs
References: <20211105173935.7aa53269@fabiankeil.de> <86999084-7007-4F08-A4C4-4A835A7E1C78@distal.com>

Following up on a new hang this same system had (yesterday's freebsd-fs mail, subject "swap_pager: cannot allocate bio"), I think the same problem might have occurred again. Certainly the system got stuck again.

Based on the below, my executing that dtrace command caused the system to report "ACPI Error: AE_NO_MEMORY". In what way is the system out of memory here? And does that failure running dtrace suggest that the “out of memory” problem is the core problem causing the ZFS hang in the first place? My system has 128GB, which is nothing to sneeze at.
Are there parameters that I should change because the normal parameters just don’t work well with a pool or fs this large?

And, from earlier in this thread from last week: now that I have the system running again, I can provide the "zpool status" for information. Let me know if I’ve just tried something crazy here; this is the largest ZFS filesystem I’ve attempted. I have a 30T pool on another system without issue, and with less RAM. (The largest fs on that pool is about 18T.)

% zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 05:05:55 with 0 errors on Sat Oct 23 04:38:36 2021
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da1     ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
          raidz1-2  ONLINE       0     0     0
            da7     ONLINE       0     0     0
            da8     ONLINE       0     0     0
            da9     ONLINE       0     0     0

errors: No known data errors

% zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank  14.2T  35.0T  14.2T  /tank

       - Chris

> On Nov 7, 2021, at 03:35, Andriy Gapon wrote:
>
> On 05/11/2021 18:59, Chris Ross wrote:
>> Running procstat -kk on the rsync that was hung, then killed, then SIGKILL’d shows:
>> procstat -kk 35220
>>   PID    TID COMM   TDNAME   KSTACK
>> 35220 102499 rsync  -        mi_switch+0xc1 _sleep+0x1cb vm_wait_doms+0xe2 vm_wait_domain+0x51 vm_domain_alloc_fail+0x86 vm_page_alloc_domain_after+0x7e uma_small_alloc+0x58 keg_alloc_slab+0xba zone_import+0xee zone_alloc_item+0x6f abd_alloc_chunks+0x61 abd_alloc+0x102 arc_hdr_alloc_abd+0xb0 arc_hdr_alloc+0x11e arc_read+0x4f4 dbuf_issue_final_prefetch+0x108 dbuf_prefetch_impl+0x3d0 dmu_zfetch+0x558
>
> Looks like the system is out of memory.
> It seems that you already established that.
>
> --
> Andriy Gapon
>