From nobody Sat Aug 19 18:13:36 2023 X-Original-To: current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4RSn1H5VR8z4mHms for ; Sat, 19 Aug 2023 18:13:39 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-yb1-xb2f.google.com (mail-yb1-xb2f.google.com [IPv6:2607:f8b0:4864:20::b2f]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4RSn1H3pgdz4Tyc; Sat, 19 Aug 2023 18:13:39 +0000 (UTC) (envelope-from mavbsd@gmail.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-yb1-xb2f.google.com with SMTP id 3f1490d57ef6-d712d86eb96so1881204276.3; Sat, 19 Aug 2023 11:13:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692468818; x=1693073618; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :sender:from:to:cc:subject:date:message-id:reply-to; bh=cviUJMFFwVe3uSD9umiwERFgrMea1UuamcwpyCOeQG4=; b=I1awdNiQh+Z7GLPArNfyJf+oIowxUo7bWDlfxVNgt0vfkfEIHNF5qJHBAvO6JFpuzC e5pKaOwEXtVuYaVq3TDaoMvO3g3YQBuit2WP//BfULOhc82KXv60nLfIZ4bpapry4shX ysNv7V/at6E/LnZcIS48SFli5QB7+nnkNMAZKfHihCAWr95zPECICz3soaFsikD9y6Bj RNqD92LQ8UJm7ytH1T60WYOo1NVtGZYLBlKqvK9tLzEAi/FC94O7y+atz5l+tEBp5lKE MuySNL2hgqeFkhNKeA6+N8wBjdAmrUeUnYuxaserUj2Do40DGRPMx7xx7Nz1xDRtdozX hIXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692468818; x=1693073618; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :sender:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=cviUJMFFwVe3uSD9umiwERFgrMea1UuamcwpyCOeQG4=; b=lXul4t/0hnPBGMCqcbpti5Q5gOqGonUuuY6HDa+lkKjsoTs/dW3Y5J6HIUhhf8zbSV XzMymwLOE6FwCVvZCv0BjfmRLSIquQBM9UxTIZ/fML55Ep2iw3GSo6U054f00Ae7pU9u pEzcI+00e5A932VMKGocYtgAVd3l9TrfAGrx0GRq1v2Y2KNVQ/wufTab81wI8SXZP2PC ciOYmMAHNaiid0Hl+ULN3V/TXjOqBEK86R+GM1oO4nqb8SMzA9+CSyQY44KZPj6laEmf 6akiOP4GwvGiVhj6RfB9llmq0LYKKmtWOv3yC6AhPse0wTdWNLT4yBr4Ta9VhIuRFIX+ ca9w== X-Gm-Message-State: AOJu0YySCFyV577kttIRez1tQ5cdk76ZRowjvMdwh5qobbD0N85MN8x4 mEEy5Sj+yCQ5ArYxdZv49Qzz8AKp1rU= X-Google-Smtp-Source: AGHT+IFwYG3Yj9IBIqR/nrd4CJ4QkGw2r6mBhAyxmx2kC0W81/KlIWZbnFGlx2ojX1A2MAlqpkwDdw== X-Received: by 2002:a25:d149:0:b0:d55:370b:aa2e with SMTP id i70-20020a25d149000000b00d55370baa2emr3191541ybg.25.1692468818276; Sat, 19 Aug 2023 11:13:38 -0700 (PDT) Received: from [192.168.1.66] (104-55-12-234.lightspeed.knvltn.sbcglobal.net. [104.55.12.234]) by smtp.gmail.com with ESMTPSA id e191-20020a2537c8000000b00d6861b49f01sm1073405yba.65.2023.08.19.11.13.37 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 19 Aug 2023 11:13:37 -0700 (PDT) Message-ID: Date: Sat, 19 Aug 2023 14:13:36 -0400 List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0 Subject: Re: ZFS deadlock in 14 Content-Language: en-US To: =?UTF-8?Q?Dag-Erling_Sm=c3=b8rgrav?= Cc: current@freebsd.org, Mateusz Guzik , Martin Matuska References: <86leeltqcb.fsf@ltc.des.no> <86h6p4s64h.fsf@ltc.des.no> <86a5utrafp.fsf@ltc.des.no> <86350kqokl.fsf@ltc.des.no> <86y1icp95t.fsf@ltc.des.no> <86ttt0p8wv.fsf@ltc.des.no> <197ead1e-210a-6be6-7e24-5c56b14bb777@FreeBSD.org> <86350hpiby.fsf@ltc.des.no> <0fd64567-59cd-9276-cc0c-17a09611f11a@FreeBSD.org> <86sf8hnp3k.fsf@ltc.des.no> <86fs4go2qb.fsf@ltc.des.no> <868ra8nfo3.fsf@ltc.des.no> From: Alexander Motin In-Reply-To: <868ra8nfo3.fsf@ltc.des.no> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 4RSn1H3pgdz4Tyc X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US] On 18.08.2023 18:34, Dag-Erling Smørgrav wrote: > Dag-Erling Smørgrav writes: >> Plot twist: c47116e909 _without_ the patches also appears to be working >> fine. The last kernel I know for sure deadlocks is b36f469a15, so I'm >> going to test cd25b0f740 and 28d2e3b5de. > > c47116e909 with cd25b0f740 and 28d2e3b5de reverted deadlocks, see > attached ddb.txt. I'm going to see if reverting only 28d2e3b5de but not > cd25b0f740 changes anything. Yes, it looks like ZIL-related deadlock: ZFS sync thread in zil_sync() is waiting for allocated ZIL zios to complete: Tracing command zfskern pid 5 tid 101124 td 0xfffffe0408cbb020 sched_switch() at sched_switch+0x5da/frame 0xfffffe04090f7900 mi_switch() at mi_switch+0x173/frame 0xfffffe04090f7920 sleepq_switch() at sleepq_switch+0x104/frame 0xfffffe04090f7960 _cv_wait() at _cv_wait+0x165/frame 0xfffffe04090f79c0 zil_sync() at zil_sync+0x9b/frame 0xfffffe04090f7aa0 dmu_objset_sync() at dmu_objset_sync+0x51b/frame 0xfffffe04090f7b70 dsl_pool_sync() at dsl_pool_sync+0x11d/frame 0xfffffe04090f7bf0 spa_sync() at spa_sync+0xc68/frame 0xfffffe04090f7e20 txg_sync_thread() at txg_sync_thread+0x2eb/frame 0xfffffe04090f7ef0 fork_exit() at fork_exit+0x82/frame 0xfffffe04090f7f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe04090f7f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Some thread requested fsync(), allocated zil zio, but stuck waiting for z_teardown_inactive_lock in attempt to get data to be written into zil, so zios were never even issued: Tracing command blacklistd pid 521 tid 101136 td 0xfffffe040d08d000 sched_switch() at sched_switch+0x5da/frame 0xfffffe040c25c710 mi_switch() at mi_switch+0x173/frame 0xfffffe040c25c730 sleepq_switch() at sleepq_switch+0x104/frame 0xfffffe040c25c770 _sleep() at _sleep+0x2d6/frame 0xfffffe040c25c810 rms_rlock_fallback() at rms_rlock_fallback+0xd0/frame 0xfffffe040c25c850 zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x2b/frame 0xfffffe040c25c880 VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x35/frame 0xfffffe040c25c8a0 vgonel() at vgonel+0x3a9/frame 0xfffffe040c25c910 vnlru_free_impl() at vnlru_free_impl+0x371/frame 0xfffffe040c25c990 vn_alloc_hard() at vn_alloc_hard+0xd3/frame 0xfffffe040c25c9b0 getnewvnode_reserve() at getnewvnode_reserve+0xa0/frame 0xfffffe040c25c9d0 zfs_zget() at zfs_zget+0x1f/frame 0xfffffe040c25ca80 zfs_get_data() at zfs_get_data+0x62/frame 0xfffffe040c25cb20 zil_lwb_commit() at zil_lwb_commit+0x32f/frame 0xfffffe040c25cb70 zil_lwb_write_issue() at zil_lwb_write_issue+0x4e/frame 0xfffffe040c25cbb0 zil_commit_impl() at zil_commit_impl+0x943/frame 0xfffffe040c25cd40 zfs_fsync() at zfs_fsync+0x8f/frame 0xfffffe040c25cd80 kern_fsync() at kern_fsync+0x18a/frame 0xfffffe040c25ce00 amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe040c25cf30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe040c25cf30 --- syscall (95, FreeBSD ELF64, fsync), rip = 0x3f979eeb074a, rsp = 0x3f979a449c38, rbp = 0x3f979a449c50 --- Third thread doing zfs rollback while holding z_teardown_inactive_lock tries to wait for transaction commit, causing deadlock: Tracing command zfs pid 65636 tid 138109 td 0xfffffe0438d721e0 sched_switch() at sched_switch+0x5da/frame 0xfffffe0439b2b950 mi_switch() at mi_switch+0x173/frame 0xfffffe0439b2b970 sleepq_switch() at sleepq_switch+0x104/frame 0xfffffe0439b2b9b0 _cv_wait() at _cv_wait+0x165/frame 0xfffffe0439b2ba10 txg_wait_synced_impl() at txg_wait_synced_impl+0xeb/frame 0xfffffe0439b2ba50 txg_wait_synced() at txg_wait_synced+0xb/frame 0xfffffe0439b2ba60 zfsvfs_teardown() at zfsvfs_teardown+0x203/frame 0xfffffe0439b2bab0 zfs_ioc_rollback() at zfs_ioc_rollback+0x12f/frame 0xfffffe0439b2bb00 zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame 0xfffffe0439b2bbc0 zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfffffe0439b2bbf0 devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfffffe0439b2bc40 vn_ioctl() at vn_ioctl+0xc2/frame 0xfffffe0439b2bcb0 devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe0439b2bcd0 kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe0439b2bd30 sys_ioctl() at sys_ioctl+0x152/frame 0xfffffe0439b2be00 amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe0439b2bf30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0439b2bf30 --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1afaddea3aaa, rsp = 0x1afad4058328, rbp = 0x1afad40583a0 --- Unfortunately I think the current code in main should still suffer from this specific deadlock. cd25b0f740 fixes some deadlocks in this area, may be that is why you are getting issues less often, but I don't believe it fixes this specific one, may be you was just lucky. Only https://github.com/openzfs/zfs/pull/15122 I believe should fix them. -- Alexander Motin