Date: Sat, 13 Jul 2024 19:42:32 -0700
From: Rick Macklem <rick.macklem@gmail.com>
To: Garrett Wollman <wollman@bimajority.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: Possible bug in zfs send or pipe implementation?
Message-ID: <CAM5tNy4pPF9mHdXM5W6gjztm4_TtFfXnOLu3cdkqvaRf3Ab5uA@mail.gmail.com>
In-Reply-To: <26259.12713.114036.564205@hergotha.csail.mit.edu>
References: <26259.12713.114036.564205@hergotha.csail.mit.edu>

On Sat, Jul 13, 2024 at 7:02 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
> I'm migrating an old file server to new hardware using syncoid. Every
> so often, the `zfs send` process gets stuck with the following
> kstacks:
>
> 7960 108449 zfs - mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f
> 7960 126072 zfs send_traverse_threa mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp
> 7960 126074 zfs send_merge_thread mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_merge_thread fork_exit fork_trampoline
> 7960 126075 zfs send_reader_thread mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_reader_thread fork_exit fork_trampoline
>
> Near as I can tell, the first thread is trying to write
> serialized data to the output pipe and is blocked. The other
> threads are stuck because the write process isn't making progress.

# ps axHl
should show you what wchans the processes are waiting on and that
might give you a clue w.r.t. what is happening?

If it is easy to build a kernel from sources and boot that, you could try
defining PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the hangs?

rick

>
> The process reading from the pipe (which is just a progress meter) is
> sitting in select() waiting for the pipe to become ready, so either
> zfs_file_write() is doing something wrong, or the pipe implementation
> has lost a selwakeup() somewhere. (Or, possibly but unlikely, the
> progress meter has lost the read end of the pipe from its read
> fd_set.) Unfortunately, neither fstat nor procstat print any useful
> information about the state of the pipe, so I can only try to deduce
> what's going on from the observable behavior.
>
> -GAWollman
>
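
For readers following the thread, the writer/reader relationship described above can be sketched in user space. This is only an illustration in Python, not the zfs or kernel code path: the helper names `fill_pipe`, `drain_pipe`, and `demo` are hypothetical, and the non-blocking write is used merely to make visible the point at which a blocking writer (such as the thread in pipe_write) would go to sleep. It assumes a POSIX system where select() works on pipe descriptors.

```python
import os
import select


def fill_pipe(write_fd, chunk=b"x" * 4096):
    """Write until the pipe buffer is full; return the number of bytes accepted."""
    os.set_blocking(write_fd, False)
    total = 0
    while True:
        try:
            total += os.write(write_fd, chunk)
        except BlockingIOError:
            # A blocking writer would sleep at this point until the
            # reader drains some data from the pipe.
            return total


def drain_pipe(read_fd, expected, timeout=1.0):
    """Read up to `expected` bytes, waiting in select() before each read."""
    received = 0
    while received < expected:
        ready, _, _ = select.select([read_fd], [], [], timeout)
        if not ready:
            # select() timed out although data is still owed. In a real
            # hang this is what a missed wakeup, or an fd missing from
            # the read set, would look like from the reader's side.
            break
        received += len(os.read(read_fd, 65536))
    return received


def demo():
    """Fill a pipe, confirm it is not writable, drain it, and check again."""
    read_fd, write_fd = os.pipe()
    try:
        buffered = fill_pipe(write_fd)
        _, writable, _ = select.select([], [write_fd], [], 0)
        was_full = not writable
        received = drain_pipe(read_fd, buffered)
        _, writable, _ = select.select([], [write_fd], [], 0)
        return buffered, was_full, received, bool(writable)
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(demo())
```

In the healthy case the reader's select() returns as soon as data is buffered, and the write end becomes writable again once the reader has drained it. The hang reported in this thread corresponds to the reader never being woken even though the writer is blocked on a full pipe.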