From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 244048] mksnap_ffs hangs machine for several minutes (12.1 regression over 11.3)
Date: Tue, 11 Feb 2020 14:13:06 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

            Bug ID: 244048
           Summary: mksnap_ffs hangs machine for several minutes (12.1
                    regression over 11.3)
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ml@netfence.it

On several servers I manage, I take backups to an external HD and use
mksnap_ffs to create snapshots.
I've never had trouble with this up to 11.3.
Lately I started doing this on a 12.1 server and noticed that mksnap_ffs hangs
the box for several minutes (services stuck, no login allowed, already
established ssh sessions only partially work; shutdown is not feasible unless
the reset button is pressed).
N.B. This is an external drive, only mounted when needed and only accessed to
make backups, so if *it* got stuck, it should not affect the whole system.

I decided to check this and took the external HD to my desktop (11.3): it
worked perfectly.
I then upgraded my desktop to 12.1p2 and it started behaving as above:
_ mksnap_ffs works for several minutes under high I/O (the HD is 6TB);
meanwhile, the system is responsive;
_ then mksnap_ffs drastically reduces its I/O (at least as measured with top),
but keeps working for several more minutes: during this phase, I cannot open
any new program; Thunderbird gets stuck; already open Firefox windows still
work, but I cannot open any new window; audacity keeps playing the current
track, but gets stuck when moving on to the next; already open terminal
windows might partially work;
_ after several minutes mksnap_ffs exits and everything gets back to normal.

This is of course unacceptable on a production server.

I built a test machine with 12.1/amd64 with the following kernel options: KDB,
KDB_TRACE, DDB, GDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS, WITNESS_SKIPSPIN,
DEBUG_VFS_LOCKS, LOCK_PROFILING, KTR, ALQ, KTR_ENTRIES=4096.
This kernel panicked immediately after launching mksnap_ffs, with LOR #269.

I removed WITNESS and WITNESS_SKIPSPIN, ran "fsck -y" on the disk and tried
again.
This time I got a different panic:

panic: ffs_copyonwrite: bad copy block
cpuid = 0
time = 1581243816
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001beef0e0
vpanic() at vpanic+0x19d/frame 0xfffffe001beef130
panic() at panic+0x43/frame 0xfffffe001beef190
ffs_copyonwrite() at ffs_copyonwrite+0x74c/frame 0xfffffe001beef230
ffs_geom_strategy() at ffs_geom_strategy+0x8c/frame 0xfffffe001beef260
ufs_strategy() at ufs_strategy+0x83/frame 0xfffffe001beef290
VOP_STRATEGY_APV() at VOP_STRATEGY_APV+0xc9/frame 0xfffffe001beef2c0
bufstrategy() at bufstrategy+0x44/frame 0xfffffe001beef2f0
bufwrite() at bufwrite+0x230/frame 0xfffffe001beef330
ffs_snapshot() at ffs_snapshot+0x8e0/frame 0xfffffe001beef630
ffs_mount() at ffs_mount+0xb3a/frame 0xfffffe001beef7d0
vfs_domount() at vfs_domount+0x8b6/frame 0xfffffe001beef9f0
vfs_donmount() at vfs_donmount+0x7e7/frame 0xfffffe001beefa90
sys_nmount() at sys_nmount+0xf2/frame 0xfffffe001beefac0
amd64_syscall() at amd64_syscall+0x281/frame 0xfffffe001beefbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe001beefbf0
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x8002d88ba, rsp = 0x7fffffffd288, rbp = 0x7fffffffeae0 ---
KDB: enter: panic

So, I also removed INVARIANTS and INVARIANT_SUPPORT (and ran "fsck -y" twice)
in order to be able to take snapshots.
I haven't been able to collect other data yet.
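
For anyone comparing this trace against others, the call path leading to the panic can be pulled out of the KDB output with a few lines of Python. This is only a rough sketch: it assumes each frame has the "name() at name+0xOFF/frame 0xADDR" shape seen above, and the helper names parse_backtrace and call_chain are made up for illustration, not part of any FreeBSD tool.

```python
import re

# One KDB frame: "func() at func+0xOFF/frame 0xADDR" (shape assumed from the trace above).
FRAME_RE = re.compile(
    r"\b(?P<func>\w+)\(\) at (?P=func)\+(?P<offset>0x[0-9a-f]+)/frame (?P<frame>0x[0-9a-f]+)"
)

def parse_backtrace(text):
    """Return (function, offset) pairs in the order printed, innermost frame first."""
    return [
        (m.group("func"), int(m.group("offset"), 16))
        for m in FRAME_RE.finditer(text)
    ]

def call_chain(frames, skip=("db_trace_self_wrapper", "vpanic", "panic")):
    """Join the frames outermost-first, leaving out the panic machinery itself."""
    return " -> ".join(func for func, _ in reversed(frames) if func not in skip)

# A few frames copied from the report; hypothetical helper input, not a full dump.
SAMPLE = """\
panic() at panic+0x43/frame 0xfffffe001beef190
ffs_copyonwrite() at ffs_copyonwrite+0x74c/frame 0xfffffe001beef230
ffs_geom_strategy() at ffs_geom_strategy+0x8c/frame 0xfffffe001beef260
ufs_strategy() at ufs_strategy+0x83/frame 0xfffffe001beef290
"""

if __name__ == "__main__":
    print(call_chain(parse_backtrace(SAMPLE)))
```

Run on the full trace, this would show the path from sys_nmount down through ffs_snapshot and bufwrite to ffs_copyonwrite, i.e. the panic is raised while the snapshot is being created via the mount path.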