Date: Wed, 26 Nov 2025 21:58:21 -0700
From: Warner Losh <imp@bsdimp.com>
To: Anthony Pankov <anthony.pankov@yahoo.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: any way to recover from I/O hang?
Message-ID: <CANCZdfqsE5aA%2BATrqXnO6MqTWYV34i8DdSCDtJ7yc-Hp6gn-tA@mail.gmail.com>
In-Reply-To: <458400081.20251126171621@yahoo.com>
References: <458400081.20251126171621.ref@yahoo.com> <458400081.20251126171621@yahoo.com>
[-- Attachment #1 --]

On Wed, Nov 26, 2025 at 7:16 AM Anthony Pankov <anthony.pankov@yahoo.com> wrote:
>
> Recently I'm again facing a situation where some UFS (?) error on one
> partition forced me to reboot the whole server and interrupt people's work.
> As I understand it, the summary is:
>
> Reason: A process in state 'D' (uninterruptible wait/sleep) is blocked on
> an I/O operation (e.g., waiting for a network file share, a disk operation,
> or a device that is no longer responding) at the kernel level. The process
> is not awakened to handle signals until the condition it is waiting for is
> resolved.
> Solution: You cannot kill a 'D' state process with any signal, including
> SIGKILL (kill -9). The condition must be resolved (e.g., fixing the network
> connection, waiting for the I/O to time out, or potentially unmounting the
> filesystem with umount -f if it's a mounted network share). In some rare
> cases involving buggy kernel code or hardware failure, a reboot may be the
> only option.
>
> In my case there was a jailed samba sharing one UFS-formatted partition.
> The samba processes hung one by one and sharing stopped working.
> There were no messages in the log. GEOM reported no errors. The disks had
> no errors. It seems that some inconsistency had been introduced in UFS.
> My thought was to stop/kill samba, remove samba's jail, unmount the
> partition and run fsck.
>
> But 'jail -r' exited with something like 'rc.shutdown exited ... 9' and
> left the jail running. 'pkill -KILL' for the samba processes said nothing
> and did nothing. 'umount -f' said 'Device is busy'. Various utilities such
> as 'df' hung. So I was forced to shut down the server, which runs other
> important services that were working fine, to resolve the situation.
>
> I wonder: is there any way to handle such cases? Maybe 'umount -f' could
> have more power...
>

So no errors from GEOM? Or from CAM? What's the underlying storage hardware?
And what's the wchan / stack trace for the processes in 'D' state?
And do you know the location of the file that's waiting for I/O?

There are a number of things that might cause this, but they are typically
noisy about it. There are a couple of deadlock issues that might cause this,
but more information is needed to understand what might be done to mitigate
or prevent the deadlocks.

> P.S. The previous situation was when I was doing simple (as I thought)
> experiments with ggated and was forced to reboot the server when 'mount'
> hung.

I assume ggate isn't involved when this happens now, right?

Warner

> --
> Best regards,
> Anthony Pankov mailto:anthony.pankov@yahoo.com
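On gathering the data Warner asks for: on FreeBSD, `ps -axo pid,state,wchan,comm` shows each process's state and wait channel, `procstat -kk PID` prints the kernel stack of each of its threads (with function offsets), and `procstat -f PID` lists its open files. The Python script below is a hypothetical helper written for illustration, not part of FreeBSD or of this thread. It assumes the `ps` output has been saved to a file, keeps only rows whose STAT column starts with 'D', and prints the procstat commands to run next for each of them.

```python
"""Hypothetical helper (not a FreeBSD tool): list processes in
uninterruptible wait ('D') from saved `ps -axo pid,state,wchan,comm`
output and print the procstat(1) commands that would collect each
process's kernel stack trace and open files."""
import sys
from typing import List, NamedTuple


class StuckProc(NamedTuple):
    pid: int
    state: str  # full STAT string, e.g. "D" or "D+"
    wchan: str  # wait channel the process is sleeping on
    comm: str   # command name


def find_d_state(ps_output: str) -> List[StuckProc]:
    """Return rows whose STAT column begins with 'D' (uninterruptible wait)."""
    stuck = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 3)  # COMMAND keeps any embedded spaces
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        pid, state, wchan, comm = fields
        if state.startswith("D"):
            stuck.append(StuckProc(int(pid), state, wchan, comm))
    return stuck


def followup_commands(procs: List[StuckProc]) -> List[str]:
    """For each stuck process: kernel stacks with offsets (-kk), then open files (-f)."""
    cmds = []
    for p in procs:
        cmds.append(f"procstat -kk {p.pid}")
        cmds.append(f"procstat -f {p.pid}")
    return cmds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: ps -axo pid,state,wchan,comm > ps.txt; python3 dstate.py ps.txt
    with open(sys.argv[1]) as f:
        procs = find_d_state(f.read())
    for p in procs:
        print(f"{p.pid:>7} {p.state:<5} {p.wchan:<12} {p.comm}")
    for cmd in followup_commands(procs):
        print(cmd)
```

For example, run `ps -axo pid,state,wchan,comm > ps.txt` on the affected host, then `python3 dstate.py ps.txt` (the filename `dstate.py` is arbitrary). The script only parses text, so it can also be run on a copy of that output elsewhere; the listed procstat commands still have to be run on the host itself.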
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfqsE5aA%2BATrqXnO6MqTWYV34i8DdSCDtJ7yc-Hp6gn-tA>
