Date: Wed, 10 Jul 2024 11:31:48 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 280216] UFS deadly hangs while removing snapshot
Message-ID: <bug-280216-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280216

            Bug ID: 280216
           Summary: UFS deadly hangs while removing snapshot
           Product: Base System
           Version: Unspecified
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ant_mail@inbox.ru

I have a very sad situation with a production server that keeps forcing me to break my weekends.

The server hangs on some Friday nights and has to be brought back to life by physically powering it off and on. This began in autumn 2023. It appears as a filesystem hang: the server responds to ping, but every I/O operation hangs.

I'm running 12-STABLE, and there may be some relation to commits made during July-October 2023.

It was hard to investigate because this is a production server, and the total number of incidents is only about 7-8. Here is what I've found.

I'm using the 'snapshot' utility (package freebsd-snapshot) to make periodic snapshots. It contains the following lines of code:

    logger -p daemon.notice \
        "snapshot: removing $fs_dir/.snap/$fs_tag.$"
    system rm -f $fs_dir/.snap/$fs_tag.$i

The last messages logged by the system are:

    Jun 28 22:10:06 serv root[52374]: snapshot: rotating snapshots
    Jun 28 22:10:06 serv root[52375]: snapshot: rm /data/office/.snap/weekly.3
    Jun 29 09:47:28 serv syslogd: kernel boot file is /boot/kernel/kernel
    Jun 29 09:47:28 serv kernel: ---<<BOOT>>---

There is no evidence of any successful UFS read or write after the 'rm' started.

After the power cycle, fsck found errors on some partitions, but the problematic partition (/data/office) had no errors. Removing the snapshot afterwards (rm /data/office/.snap/weekly.3) works without any problem.

Other UFS partitions on this server are snapshotted the same way, but they have never hung.
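The last log line before the hang names the snapshot being removed but not its size. One way to gather that data on the next incident would be to log the snapshot's allocated size just before the rm. Below is a minimal sketch, assuming a POSIX sh; `snapshot_kib` and `remove_snapshot` are hypothetical helpers, not part of freebsd-snapshot. It uses `du -k` (allocated blocks) instead of the file length, on the assumption that UFS snapshot files are sparse and report roughly the whole filesystem size as their length.

```shell
#!/bin/sh
# Hypothetical helpers, NOT part of freebsd-snapshot: log the allocated size
# of a snapshot file before deleting it, so the last syslog line before a
# hang records how large that snapshot was.

# Print the allocated size of a file in KiB. du -k counts allocated blocks,
# which for a sparse file can be much smaller than its apparent length.
snapshot_kib() {
    du -k "$1" | cut -f1
}

# Log the snapshot path and allocated size, then remove the snapshot.
remove_snapshot() {
    snap=$1
    kib=$(snapshot_kib "$snap")
    # A logging failure should not prevent the removal itself.
    logger -p daemon.notice "snapshot: rm $snap (${kib} KiB allocated)" || :
    rm -f "$snap"
}

# Usage (path taken from the log above):
# remove_snapshot /data/office/.snap/weekly.3
```

Because logger runs before rm, the size line should reach syslog even if the removal then hangs, just as the "snapshot: rm" line did on Jun 28.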
UFS parameters of /data/office:

    tunefs: POSIX.1e ACLs: (-a) enabled
    tunefs: NFSv4 ACLs: (-N) disabled
    tunefs: MAC multilabel: (-l) disabled
    tunefs: soft updates: (-n) disabled
    tunefs: soft update journaling: (-j) disabled
    tunefs: gjournal: (-J) enabled
    tunefs: trim: (-t) disabled
    tunefs: maximum blocks per file in a cylinder group: (-e) 4096
    tunefs: average file size: (-f) 512000
    tunefs: average number of files in a directory: (-s) 64
    tunefs: minimum percentage of free space: (-m) 12%
    tunefs: space to hold for metadata blocks: (-k) 6408
    tunefs: optimization preference: (-o) time

What was tried: creating a new, enlarged partition, running newfs on it, and dumping and restoring the data to the new partition. After a couple of months the server hung again.

I suspect the problem arises when a snapshot gets large. That would explain why it hangs only on some Fridays: the weekly rotation removes the oldest snapshot, which is also the largest, and when its size exceeds some threshold the system hangs.

Current snapshot sizes:

    /data/office/  ufs  464GB  40.0%   44GB  3.8%  weekly.2  2024-06-07T22:11
    /data/office/  ufs  464GB  40.0%   22GB  1.9%  weekly.1  2024-06-14T22:10
    /data/office/  ufs  464GB  40.0%   18GB  1.5%  weekly.0  2024-06-21T22:11
    /data/office/  ufs  464GB  40.0%    9GB  0.8%  daily.2   2024-07-08T00:03
    /data/office/  ufs  464GB  40.0%  741MB  0.1%  daily.1   2024-07-09T00:03
    /data/office/  ufs  464GB  40.0%  784MB  0.1%  hourly.1  2024-07-09T16:01
    /data/office/  ufs  464GB  40.0%  594MB  0.0%  daily.0   2024-07-10T00:03
    /data/office/  ufs  464GB  40.0%  590MB  0.0%  hourly.0  2024-07-10T12:01

Any help is greatly appreciated.

-- 
You are receiving this mail because:
You are the assignee for the bug.