Date:      Sat, 13 Jul 2024 19:42:32 -0700
From:      Rick Macklem <rick.macklem@gmail.com>
To:        Garrett Wollman <wollman@bimajority.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Possible bug in zfs send or pipe implementation?
Message-ID:  <CAM5tNy4pPF9mHdXM5W6gjztm4_TtFfXnOLu3cdkqvaRf3Ab5uA@mail.gmail.com>
In-Reply-To: <26259.12713.114036.564205@hergotha.csail.mit.edu>
References:  <26259.12713.114036.564205@hergotha.csail.mit.edu>

On Sat, Jul 13, 2024 at 7:02 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
> I'm migrating an old file server to new hardware using syncoid.  Every
> so often, the `zfs send` process gets stuck with the following
> kstacks:
>
>  7960 108449 zfs                 -                   mi_switch sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl devfs_ioctl_f
>  7960 126072 zfs                 send_traverse_threa mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp
>  7960 126074 zfs                 send_merge_thread   mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_merge_thread fork_exit fork_trampoline
>  7960 126075 zfs                 send_reader_thread  mi_switch sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_reader_thread fork_exit fork_trampoline
>
> Near as I can tell, the first thread is trying to write
> serialized data to the output pipe and is blocked.  The other
> threads are stuck because the write process isn't making progress.
# ps axHl
should show you what wchans the processes are waiting on, which might
give you a clue as to what is happening.

If it is easy to build a kernel from sources and boot that, you could try
defining PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the
hangs?

rick

>
> The process reading from the pipe (which is just a progress meter) is
> sitting in select() waiting for the pipe to become ready, so either
> zfs_file_write() is doing something wrong, or the pipe implementation
> has lost a selwakeup() somewhere.  (Or, possibly but unlikely, the
> progress meter has lost the read end of the pipe from its read
> fd_set.)  Unfortunately, neither fstat nor procstat print any useful
> information about the state of the pipe, so I can only try to deduce
> what's going on from the observable behavior.
>
> -GAWollman
>
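
The reasoning above rests on one userland-visible invariant: once a writer
has filled a pipe (and so would block in write), select() on the read end
must report it readable. A minimal sketch of a check for that invariant
follows. The helper names (readable_within, fill_pipe,
full_pipe_is_readable) are made up for illustration, and the writes are
non-blocking so the test cannot hang. It does not reproduce the hang in the
report; it only shows the behaviour the progress meter depends on.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper: 1 if select() reports fd readable within
 * timeout_ms, 0 on timeout, -1 on error.
 */
static int
readable_within(int fd, int timeout_ms)
{
	fd_set rfds;
	struct timeval tv;
	int n;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;
	n = select(fd + 1, &rfds, NULL, NULL, &tv);
	if (n < 0)
		return (-1);
	return ((n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0);
}

/*
 * Hypothetical helper: write to wfd in non-blocking mode until the
 * pipe is full.  Returns the number of bytes written, or -1 on error.
 */
static long
fill_pipe(int wfd)
{
	char buf[4096] = { 0 };
	long total = 0;
	ssize_t n;
	int flags;

	flags = fcntl(wfd, F_GETFL);
	if (flags < 0 || fcntl(wfd, F_SETFL, flags | O_NONBLOCK) < 0)
		return (-1);
	for (;;) {
		n = write(wfd, buf, sizeof(buf));
		if (n > 0) {
			total += n;
			continue;
		}
		if (n < 0 && errno == EAGAIN)
			break;		/* pipe is full; a blocking writer would sleep here */
		return (-1);
	}
	return (total);
}

/*
 * Returns 1 if an empty pipe is not readable and a full pipe is,
 * 0 if either expectation fails, -1 if the pipe cannot be created.
 */
int
full_pipe_is_readable(void)
{
	int fds[2];
	int ok;

	if (pipe(fds) != 0)
		return (-1);
	ok = (readable_within(fds[0], 100) == 0);	/* empty: select times out */
	ok = ok && (fill_pipe(fds[1]) > 0);
	ok = ok && (readable_within(fds[0], 100) == 1);	/* full: must be readable */
	close(fds[0]);
	close(fds[1]);
	return (ok);
}
```

If full_pipe_is_readable() returns 1 on the affected machine, the basic
select() path works for buffered pipe data, which would point the
investigation more toward the blocking write path used by zfs send.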
