Date: Thu, 27 Nov 2025 13:13:17 +0300
From: Anthony Pankov <anthony.pankov@yahoo.com>
To: Warner Losh <imp@bsdimp.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: any way to recover from I/O hang?
Message-ID: <908785764.20251127131317@yahoo.com>
In-Reply-To: <CANCZdfqsE5aA%2BATrqXnO6MqTWYV34i8DdSCDtJ7yc-Hp6gn-tA@mail.gmail.com>
References: <458400081.20251126171621.ref@yahoo.com> <458400081.20251126171621@yahoo.com> <CANCZdfqsE5aA%2BATrqXnO6MqTWYV34i8DdSCDtJ7yc-Hp6gn-tA@mail.gmail.com>
Thursday, November 27, 2025, 7:58:21 AM, you wrote:

> On Wed, Nov 26, 2025 at 7:16 AM Anthony Pankov <anthony.pankov@yahoo.com>
> wrote:

>>
>> Recently I'm again facing a situation where some UFS (?) error on one
>> partition forced me to reboot whole server and interrupt people's work.
>> As I understand the brief is:
>>
>> Reason: A process in state 'D' (uninterruptible wait/sleep) is blocked on
>> an I/O operation (e.g., waiting for a network file share, a disk operation,
>> or a device that is no longer responding) at the kernel level. The process
>> is not awaken to handle signals until the condition it is waiting for is
>> resolved.
>> Solution: You cannot kill a 'D' state process with any signal, including
>> SIGKILL (kill -9). The condition must be resolved (e.g., fixing the network
>> connection, waiting for the I/O to time out, or potentially unmounting the
>> filesystem with umount -f if it's a mounted network share). In some rare
>> cases involving buggy kernel code or hardware failure, a reboot may be the
>> only option.
>>
>> In my case there was a jailed samba which share one ufs-formatted
>> partition. The samba processes hangs one by one and sharing has stopped it
>> work.
>> There was no message in log. Geom reported no errors. Disk's has no error.
>> It seems that there was some introduced inconsistency in UFS.
>> My thoughts was to stop/kill samba, remove samba's jail, unmount partition
>> and do fsck.
>>
>> But 'jail -r' exit with something 'rc.shutdown exited ... 9' and left jail
>> running. pskill -KILL for samba processes say nothing and do nothing.
>> 'umount -f' say 'Device is busy'. Various utilities such as 'df' hangs. So
>> I forced to shutdown the server which has other important and workable
>> service to resolve the situation.
>>
>> I wander is there any way to treat such a cases? May be 'umount -f' can
>> have more power...
>>

> So no errors from geom? Or from CAM? what's the underlying storage
> hardware? And what's the wchan / straceback for the processes in 'D' state?
> And do you know the location of the file that's waiting for I/O?

No errors. UFS was on gmirror and 'gmirror status' said 'OK'.
Unfortunately I was unable to do any meaningful investigation under stress.
There were a number of smbd processes in a jail which ignored kill signals
and everything else.
The first inconsistency that fsck found later was a file whose mtime
corresponds to the first report of a problem from one user. An hour later
there were more reports, and finally the samba share stopped working. As I
understand it, the UFS inconsistency grew over that hour.
Because there were multiple smbd processes, each corresponding to a
different user, there is no single file that caused the problem.

> There's a number of things that might cause this, but they typically are
> noisy about it. There's a couple of deadlock issues that might cause this,
> but getting some more information is needed to understand what might be
> done to mitigate or prevent the deadlocks.

>> P.S. Previous situation was when I do simple (as I think) experiments with
>> ggated and have forced to reboot server when 'mount' hung.
>>

> I assume ggate isn't involved when this happens now, right?

That's right. I mention it because stopping the ggate server while a ggate
client is running seems to be the simplest way to reproduce the problem.

P.S. I was really hoping that "umount -f" would wake the waiting processes
so that they could be killed.

>> --
>> Best regards,
>> Anthony Pankov mailto:anthony.pankov@yahoo.com
>>
>>
>>

--
Best regards,
Anthony
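
Regarding the wchan / stack traceback question above: a minimal sketch of how that information might be captured the next time processes get stuck, before rebooting. It assumes the output of `ps -axo pid,state,wchan,comm` has four whitespace-separated columns and that an idle wait channel is printed as a placeholder such as `-`. The function names (`parse_ps`, `stuck`, `procstat_commands`, `snapshot`) are hypothetical helpers, not part of any existing tool, and the script only prints the `procstat -kk` commands to run rather than running them.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -axo pid,state,wchan,comm` text into a list of dicts.

    Assumption: four whitespace-separated columns, header on the first line.
    Lines that do not match (for example an empty wchan column) are skipped.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        pid, state, wchan, comm = parts
        rows.append({"pid": int(pid), "state": state,
                     "wchan": wchan, "comm": comm.strip()})
    return rows


def stuck(rows):
    """Keep only processes whose state starts with 'D' (uninterruptible wait)."""
    return [r for r in rows if r["state"].startswith("D")]


def procstat_commands(rows):
    """Build the `procstat -kk <pid>` command for each stuck process.

    procstat -kk prints the kernel stack of each thread, which is the
    traceback a developer would want to see.
    """
    return ["procstat -kk %d" % r["pid"] for r in rows]


def snapshot():
    """Run ps once and return its raw text (not called automatically)."""
    return subprocess.run(["ps", "-axo", "pid,state,wchan,comm"],
                          capture_output=True, text=True, check=True).stdout
```

Calling `stuck(parse_ps(snapshot()))` as root on the host gives the list of 'D' state processes with their wait channels. Saving that list, together with the output of the printed `procstat -kk` commands, to a file on a different filesystem should preserve the evidence even if a reboot is still needed afterwards.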
