From nobody Thu Aug 31 23:44:36 2023 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RcHnh17XFz4rmPs; Thu, 31 Aug 2023 23:44:40 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Received: from omta002.cacentral1.a.cloudfilter.net (omta002.cacentral1.a.cloudfilter.net [3.97.99.33]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "Client", Issuer "CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RcHng5pkQz3W8s; Thu, 31 Aug 2023 23:44:39 +0000 (UTC) (envelope-from cy.schubert@cschubert.com) Authentication-Results: mx1.freebsd.org; none Received: from shw-obgw-4004a.ext.cloudfilter.net ([10.228.9.227]) by cmsmtp with ESMTP id bp28qFOK56NwhbrL5qBvzE; Thu, 31 Aug 2023 23:44:39 +0000 Received: from spqr.komquats.com ([70.66.152.170]) by cmsmtp with ESMTPA id brL3qfYCK3fOSbrL4qX3Tm; Thu, 31 Aug 2023 23:44:39 +0000 X-Authority-Analysis: v=2.4 cv=J8G5USrS c=1 sm=1 tr=0 ts=64f125e7 a=y8EK/9tc/U6QY+pUhnbtgQ==:117 a=y8EK/9tc/U6QY+pUhnbtgQ==:17 a=kj9zAlcOel0A:10 a=UttIx32zK-AA:10 a=VxmjJ2MpAAAA:8 a=YxBL1-UpAAAA:8 a=6I5d2MoRAAAA:8 a=NEAV23lmAAAA:8 a=EkcXrb_YAAAA:8 a=xDwQH7ZLWR0dE5ld-7IA:9 a=VOrSvM_a6J0YgARF:21 a=CjuIK1q_8ugA:10 a=UJ0tAi3fqDAA:10 a=gvFz51Dc2IYA:10 a=7gXAzLPJhVmCkEl4_tsf:22 a=Ia-lj3WSrqcvXOmTRaiG:22 a=IjZwj45LgO3ly-622nXo:22 a=LK5xJRSDVpKd5WXXoEvA:22 Received: from slippy.cwsent.com (slippy [10.1.1.91]) by spqr.komquats.com (Postfix) with ESMTP id 1D2C22F3; Thu, 31 Aug 2023 16:44:37 -0700 (PDT) Received: by slippy.cwsent.com (Postfix, from userid 1000) id 0EB38133; Thu, 31 Aug 2023 16:44:37 -0700 (PDT) X-Mailer: exmh version 2.9.0 11/07/2018 with nmh-1.8+dev Reply-to: Cy Schubert From: Cy Schubert X-os: FreeBSD X-Sender: cy@cwsent.com X-URL: http://www.cschubert.com/ To: Cy Schubert cc: Warner Losh , Alexander Motin , Gleb Smirnoff , Drew Gallatin , Martin Matuska , src-committers , "" , "" Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad In-reply-to: <20230831234023.917DFAB@slippy.cwsent.com> References: <202308270509.37R596B5048298@gitrepo.freebsd.org> <07faf861-9186-47d1-992a-91d483ea4e9c@app.fastmail.com> <1db726d4-32c9-e1b8-51d6-981aa51b7825@FreeBSD.org> <20230831175350.981F1D5@slippy.cwsent.com> <20230831223526.DCB701A1@slippy.cwsent.com> <20230831233228.9935BA8@slippy.cwsent.com> <20230831234023.917DFAB@slippy.cwsent.com> Comments: In-reply-to Cy Schubert message dated "Thu, 31 Aug 2023 16:40:23 -0700." List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 31 Aug 2023 16:44:36 -0700 Message-Id: <20230831234437.0EB38133@slippy.cwsent.com> X-CMAE-Envelope: MS4xfJJjAxlA45qO+eE9rO/alj7ZUX7z7UXOvVbo2ZnVA/PhgKHu4dT9OkzteujT1nyqavuor1xQ9xDCLY+38FlzcoqLy81KEC9RasCs8PB95rkNpM7Ad4ZP 5MBz7fSqKwaAoY6y8SYkPtgSwlZYNVajEQw1XxEXBiAkSuGw2nzGymhT56L/JR4Sa8uEbo1/Km/OYvoE41ut86JjnssIdGoaE6FUkthJeubEluENvxHPfLe8 8IhQ+EDKvmRhyQJz2uLFVLv1aeJotfr5BSDBSvu5JT/J0jZBYp0NALTuLzt+Tt9poVAhLvCnXz2VXmYAhSJeGfwkKF++TwJOu5j5n/MaWg34s4g8lIDeuHZd 3pdtCiRxAH6vm8leuKQZk4tU3/xuPGPE3oI7rargyxhRv3iDVV5WwUfToncEEx95b1/n3AhANABTNF/YtyIadWHd8ZQPVw== X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:16509, ipnet:3.96.0.0/15, country:US] X-Rspamd-Queue-Id: 4RcHng5pkQz3W8s In message <20230831234023.917DFAB@slippy.cwsent.com>, Cy Schubert writes: > In message om> > , Warner Losh writes: > > On Thu, Aug 31, 2023, 5:32 PM Cy Schubert wrote > = > > : > > > > > In message <20230831223526.DCB701A1@slippy.cwsent.com>, Cy Schubert > > > writes: > > > > In message , > > > Alexander > > > > Motin > > > > writes: > > > > > On 31.08.2023 13:53, Cy Schubert wrote: > > > > > > One thing that circumvents my two problems is reducing poudriere > > > bulk job > > > > s > > > > > > from 8 to 5 on my 4 core machines. > > > > > > > > > > Cy, I have no real evidences to think it is related, other than your > > > > > panics look like some memory corruptions, but could you try is patch: > > > > > https://github.com/openzfs/zfs/pull/15228 . If it won't do the > > > trick, > > > > > then I am out of ideas without additional input. > > > > > > > > So far so good. Poudriere has been running with a decent -J jobs on bot > = > > h > > > > machines for over an hour. I'll let you know if they survive the night. > > > It > > > > can take some time before the panics happen though. > > > > > > > > The problem is more likely to occur when there are a lot of small > > > package > > > > builds than large long running jobs, probably because of the parallel > > > ZFS > > > > dataset creations, deletions, and rollbacks. > > > > > > > > > > > > > > Gleb, you may try to add this too, just as a choice between impossibl > = > > e > > > > > and improbable. > > > > > > > > > > -- > > > > > Alexander Motin > > > > > > > > > > > > -- > > > > Cheers, > > > > Cy Schubert > > > > FreeBSD UNIX: Web: https://FreeBSD.org > > > > NTP: Web: https://nwtime.org > > > > > > > > e^(i*pi)+1=3D0 > > > > > > > > > > > > > > One of the two machines is hung. > > > > > > cwfw# ping bob > > > PING bob (10.1.1.7): 56 data bytes > > > ^C > > > --- bob ping statistics --- > > > 2 packets transmitted, 0 packets received, 100.0% packet loss > > > cwfw# console bob > > > [Enter `^Ec?' for help] > > > [halt sent] > > > KDB: enter: Break to debugger > > > [ thread pid 31259 tid 100913 ] > > > Stopped at kdb_break+0x48: movq $0,0xa1069d(%rip) > > > db> bt > > > Tracing pid 31259 tid 100913 td 0xfffffe00c4eca000 > > > kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef2d0 > > > uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef310 > > > intr_event_handle() at intr_event_handle+0x12b/frame 0xfffffe00c53ef380 > > > intr_execute_handlers() at intr_execute_handlers+0x63/frame > > > 0xfffffe00c53ef3b0 > > > Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe00c53ef3b0 > > > --- interrupt, rip =3D 0xffffffff806d5c70, rsp =3D 0xfffffe00c53ef480, rb > = > > p =3D > > > 0xfffffe00c53ef480 --- > > > getbinuptime() at getbinuptime+0x30/frame 0xfffffe00c53ef480 > > > arc_access() at arc_access+0x250/frame 0xfffffe00c53ef4d0 > > > arc_buf_access() at arc_buf_access+0xd0/frame 0xfffffe00c53ef4f0 > > > dbuf_hold_impl() at dbuf_hold_impl+0xf3/frame 0xfffffe00c53ef580 > > > dbuf_hold() at dbuf_hold+0x25/frame 0xfffffe00c53ef5b0 > > > dnode_hold_impl() at dnode_hold_impl+0x194/frame 0xfffffe00c53ef670 > > > dmu_bonus_hold() at dmu_bonus_hold+0x20/frame 0xfffffe00c53ef6a0 > > > zfs_zget() at zfs_zget+0x20d/frame 0xfffffe00c53ef750 > > > zfs_dirent_lookup() at zfs_dirent_lookup+0x16d/frame 0xfffffe00c53ef7a0 > > > zfs_dirlook() at zfs_dirlook+0x7f/frame 0xfffffe00c53ef7d0 > > > zfs_lookup() at zfs_lookup+0x3c0/frame 0xfffffe00c53ef8a0 > > > zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x67/frame > > > 0xfffffe00c53ef9e0 > > > vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe00c53efa30 > > > vfs_lookup() at vfs_lookup+0x457/frame 0xfffffe00c53efac0 > > > namei() at namei+0x2e1/frame 0xfffffe00c53efb20 > > > vn_open_cred() at vn_open_cred+0x505/frame 0xfffffe00c53efca0 > > > kern_openat() at kern_openat+0x287/frame 0xfffffe00c53efdf0 > > > ia32_syscall() at ia32_syscall+0x156/frame 0xfffffe00c53eff30 > > > int0x80_syscall_common() at int0x80_syscall_common+0x9c/frame 0xffff89dc > > > db> > > > > > > I'll let it continue. Hopefully the watchdog timer will pop and we get a > > > dump. > > > > > > > > > Might also be interesting to see if this moves around or is really hung > > getting the time. I suspect it's live lock given this traceback. > > It's moving around. > > > db> c > [halt sent] > KDB: enter: Break to debugger > [ thread pid 31259 tid 100913 ] > Stopped at kdb_break+0x48: movq $0,0xa1069d(%rip) > db> bt > Tracing pid 31259 tid 100913 td 0xfffffe00c4eca000 > kdb_break() at kdb_break+0x48/frame 0xfffffe00c53ef410 > uart_intr() at uart_intr+0xf7/frame 0xfffffe00c53ef450 > intr_event_handle() at intr_event_handle+0x12b/frame 0xfffffe00c53ef4c0 > intr_execute_handlers() at intr_execute_handlers+0x63/frame > 0xfffffe00c53ef4f0 > Xapic_isr1() at Xapic_isr1+0xdc/frame 0xfffffe00c53ef4f0 > --- interrupt, rip = 0xffffffff807117c4, rsp = 0xfffffe00c53ef5c0, rbp = > 0xfffffe00c53ef5c0 --- > lock_delay() at lock_delay+0x14/frame 0xfffffe00c53ef5c0 > _sx_xlock_hard() at _sx_xlock_hard+0x1a9/frame 0xfffffe00c53ef660 > _sx_xlock() at _sx_xlock+0xb0/frame 0xfffffe00c53ef6a0 > zfs_zget() at zfs_zget+0x1fa/frame 0xfffffe00c53ef750 > zfs_dirent_lookup() at zfs_dirent_lookup+0x16d/frame 0xfffffe00c53ef7a0 > zfs_dirlook() at zfs_dirlook+0x7f/frame 0xfffffe00c53ef7d0 > zfs_lookup() at zfs_lookup+0x3c0/frame 0xfffffe00c53ef8a0 > zfs_freebsd_cachedlookup() at zfs_freebsd_cachedlookup+0x67/frame > 0xfffffe00c53ef9e0 > vfs_cache_lookup() at vfs_cache_lookup+0xa6/frame 0xfffffe00c53efa30 > vfs_lookup() at vfs_lookup+0x457/frame 0xfffffe00c53efac0 > namei() at namei+0x2e1/frame 0xfffffe00c53efb20 > vn_open_cred() at vn_open_cred+0x505/frame 0xfffffe00c53efca0 > kern_openat() at kern_openat+0x287/frame 0xfffffe00c53efdf0 > ia32_syscall() at ia32_syscall+0x156/frame 0xfffffe00c53eff30 > int0x80_syscall_common() at int0x80_syscall_common+0x9c/frame 0xffff89dc > db> c It seems to have settled there for now. -- Cheers, Cy Schubert FreeBSD UNIX: Web: https://FreeBSD.org NTP: Web: https://nwtime.org e^(i*pi)+1=0