From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 280216] UFS deadly hangs while removing snapshot
Date: Wed, 10 Jul 2024 11:31:48 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280216

            Bug ID: 280216
           Summary: UFS deadly hangs while removing snapshot
           Product: Base System
           Version: Unspecified
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ant_mail@inbox.ru

I have a very unpleasant situation with a production server that forces me to give up my weekends.

The server hangs on some Friday nights and has to be brought back to life by physically powering it off and on. This began in autumn '23. It shows up as a file system hang: the server responds to ping, but every I/O operation hangs. I'm running 12-STABLE, and there may be some relation to commits made during July-October '23.

It has been hard to investigate because this is a production server, and the total number of incidents is only about 7-8. Here is what I have found.

I use the 'snapshot' utility (package freebsd-snapshot) to take periodic snapshots. It contains the following lines of code:

    logger -p daemon.notice \
        "snapshot: removing $fs_dir/.snap/$fs_tag.$i"
    system rm -f $fs_dir/.snap/$fs_tag.$i

The last messages logged by the system were:

    Jun 28 22:10:06 serv root[52374]: snapshot: rotating snapshots
    Jun 28 22:10:06 serv root[52375]: snapshot: rm /data/office/.snap/weekly.3
    Jun 29 09:47:28 serv syslogd: kernel boot file is /boot/kernel/kernel
    Jun 29 09:47:28 serv kernel: ---<>---

There is no evidence of any successful UFS read or write after the 'rm' was started. After the power cycle, fsck found errors on some partitions, but the problematic partition (/data/office) had none, and the snapshot could then be removed without any problem (rm /data/office/.snap/weekly.3).

Other UFS partitions on this server take snapshots the same way, but they have never hung.
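Since every hang directly follows the snapshot removal, the removal step quoted above could be instrumented to record how much space the snapshot actually occupies, so that future hangs can be correlated with snapshot size. The following is only a sketch: the helper names snap_used_kb and remove_snapshot_logged are hypothetical and are not part of the freebsd-snapshot package, and plain rm(1) stands in for the script's own "system" wrapper.

```shell
#!/bin/sh
# Hypothetical helpers (NOT part of freebsd-snapshot): log a snapshot's
# allocated size right before removing it, so hangs can be matched to size.

# snap_used_kb PATH: print the space actually allocated to PATH, in KiB.
# UFS snapshot files are sparse (their apparent length equals the size of
# the file system), so du(1) is used instead of the length shown by ls(1).
snap_used_kb() {
    du -k "$1" | awk '{ print $1 }'
}

# remove_snapshot_logged PATH: log the allocated size, then remove PATH.
# The original script calls "system rm -f ..."; plain rm(1) is used here.
remove_snapshot_logged() {
    _snap=$1
    _kb=$(snap_used_kb "$_snap")
    logger -p daemon.notice "snapshot: removing $_snap (${_kb} KB allocated)"
    rm -f "$_snap"
}
```

In the rotation loop, the logger/rm pair would then become a single call such as remove_snapshot_logged "$fs_dir/.snap/$fs_tag.$i", and the last log line before a hang would show the size of the snapshot being removed.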
UFS parameters of /data/office:

    tunefs: POSIX.1e ACLs: (-a) enabled
    tunefs: NFSv4 ACLs: (-N) disabled
    tunefs: MAC multilabel: (-l) disabled
    tunefs: soft updates: (-n) disabled
    tunefs: soft update journaling: (-j) disabled
    tunefs: gjournal: (-J) enabled
    tunefs: trim: (-t) disabled
    tunefs: maximum blocks per file in a cylinder group: (-e) 4096
    tunefs: average file size: (-f) 512000
    tunefs: average number of files in a directory: (-s) 64
    tunefs: minimum percentage of free space: (-m) 12%
    tunefs: space to hold for metadata blocks: (-k) 6408
    tunefs: optimization preference: (-o) time

What I have tried: creating a new, larger partition, running newfs on it, and dumping and restoring the data to the new partition. After a couple of months, the server hung again.

I suspect the problem arises when a snapshot gets large. This would explain why it hangs only on some Fridays: the snapshot removed during rotation is the oldest one, which is also the largest, and when its size exceeds some threshold, the removal hangs.

Current snapshot sizes:

    /data/office/  ufs  464GB  40.0%   44GB  3.8%  weekly.2  2024-06-07T22:11
    /data/office/  ufs  464GB  40.0%   22GB  1.9%  weekly.1  2024-06-14T22:10
    /data/office/  ufs  464GB  40.0%   18GB  1.5%  weekly.0  2024-06-21T22:11
    /data/office/  ufs  464GB  40.0%    9GB  0.8%  daily.2   2024-07-08T00:03
    /data/office/  ufs  464GB  40.0%  741MB  0.1%  daily.1   2024-07-09T00:03
    /data/office/  ufs  464GB  40.0%  784MB  0.1%  hourly.1  2024-07-09T16:01
    /data/office/  ufs  464GB  40.0%  594MB  0.0%  daily.0   2024-07-10T00:03
    /data/office/  ufs  464GB  40.0%  590MB  0.0%  hourly.0  2024-07-10T12:01

Any help is greatly appreciated.