Date:      Wed, 10 Jul 2024 11:31:48 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 280216] UFS deadly hangs while removing snapshot
Message-ID:  <bug-280216-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280216

            Bug ID: 280216
           Summary: UFS deadly hangs while removing snapshot
           Product: Base System
           Version: Unspecified
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ant_mail@inbox.ru

I have a very sad situation with a production server that keeps forcing me
to give up my weekends.

The server hangs on some Friday nights and has to be brought back to life by
physically powering it off and on. This began in autumn '23.

It appears as a filesystem hang: the server responds to ping, but every I/O
operation hangs.

I'm running 12-STABLE, and there may be some relation to commits made
during July-October '23.

It was hard to investigate because this is a production server and the total
number of incidents is only about 7-8. Here is what I have found.

I'm using the 'snapshot' utility (package freebsd-snapshot) to make periodic
snapshots. It contains the following lines of code:

                logger -p daemon.notice \
                    "snapshot: removing $fs_dir/.snap/$fs_tag.$i"
                system rm -f $fs_dir/.snap/$fs_tag.$i

The last messages logged by the system are:

Jun 28 22:10:06 serv root[52374]: snapshot: rotating snapshots
Jun 28 22:10:06 serv root[52375]: snapshot: rm /data/office/.snap/weekly.3
Jun 29 09:47:28 serv syslogd: kernel boot file is /boot/kernel/kernel
Jun 29 09:47:28 serv kernel: ---<<BOOT>>---

There is no evidence that the system had any successful UFS reads or writes
after 'rm' was invoked.

After the power cycle, fsck found errors on some partitions, but the
problematic partition (/data/office) had no errors. And after the reboot
there is no problem removing the snapshot (with
rm /data/office/.snap/weekly.3).

There are other UFS partitions on this server that take UFS snapshots the
same way, but they never hang.

UFS parameters of /data/office:

tunefs: POSIX.1e ACLs: (-a)                                enabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 disabled
tunefs: soft update journaling: (-j)                       disabled
tunefs: gjournal: (-J)                                     enabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  4096
tunefs: average file size: (-f)                            512000
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             12%
tunefs: space to hold for metadata blocks: (-k)            6408
tunefs: optimization preference: (-o)                      time

What was tried:

Creating a new, larger partition, running newfs on it, then dumping and
restoring the data to the new partition. After a couple of months the server
hung again.

I suppose that the problem arises when the snapshot gets large. This would
explain why it hangs only on some Fridays: removing the oldest snapshot
means removing the largest snapshot, and when its size exceeds some
threshold, the removal hangs.
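
One way to test this hypothesis would be to log the size of each snapshot
just before it is removed, so the last log line before a hang records how
large the snapshot was. The following is only a sketch: the function names
snap_size_kb and remove_snapshot_logged are hypothetical and not part of
freebsd-snapshot, and it assumes that the allocated-block count reported by
du is a fair proxy for the snapshot size.

```shell
#!/bin/sh
# Hypothetical helper (not part of freebsd-snapshot): print the number of
# kilobytes a snapshot file occupies on disk, as reported by du.
snap_size_kb() {
    # du -k prints "<kilobytes><TAB><path>"; keep only the number.
    du -k "$1" | awk '{ print $1 }'
}

# Hypothetical wrapper: log the size, flush buffers, then remove the file.
remove_snapshot_logged() {
    snap="$1"
    size_kb=$(snap_size_kb "$snap")
    logger -p daemon.notice \
        "snapshot: removing $snap (${size_kb} KB allocated)"
    # Assumption: syncing first gives the log line a better chance of
    # reaching disk before a possible hang.
    sync
    rm -f "$snap"
}
```

For example, remove_snapshot_logged /data/office/.snap/weekly.3 could stand
in for the plain rm call for a few weeks, to see whether hangs correlate
with the logged size.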

Currently I have the following snapshot sizes:
/data/office/    ufs    464GB   40.0%     44GB    3.8%  weekly.2  2024-06-07T22:11
/data/office/    ufs    464GB   40.0%     22GB    1.9%  weekly.1  2024-06-14T22:10
/data/office/    ufs    464GB   40.0%     18GB    1.5%  weekly.0  2024-06-21T22:11
/data/office/    ufs    464GB   40.0%      9GB    0.8%  daily.2   2024-07-08T00:03
/data/office/    ufs    464GB   40.0%    741MB    0.1%  daily.1   2024-07-09T00:03
/data/office/    ufs    464GB   40.0%    784MB    0.1%  hourly.1  2024-07-09T16:01
/data/office/    ufs    464GB   40.0%    594MB    0.0%  daily.0   2024-07-10T00:03
/data/office/    ufs    464GB   40.0%    590MB    0.0%  hourly.0  2024-07-10T12:01

Any help is greatly appreciated.

-- 
You are receiving this mail because:
You are the assignee for the bug.