Date: Sun, 28 Nov 2021 12:22:46 +1100
From: Peter Jeremy
To: Konstantin Belousov
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject: Re: git: b19740f4ce7a - main - swap_pager: lock vnode in swapdev_strategy()
References: <202111251935.1APJZA1e094731@gitrepo.freebsd.org>
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main

On 2021-Nov-27 01:26:17 +0200, Konstantin Belousov wrote:
>commit 9c62295373f728459c19138f5aa03d9cb8422554
>Author: Konstantin Belousov
>Date:   Sat Nov 27 01:22:27 2021 +0200
>
>    swapoff_one(): only check free pages count manually turning swap off

That didn't work but I don't think the underlying bug is related to your
recent work on swap_pager - digging back through my logs, I've found
another similar panic in August last year.

Nov 28 09:40:17 rock64 syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop... 
Syncing disks, vnodes remaining... 0 0 done
Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... done
All buffers synced.
No strategy for buffer at 0xffff0000bf8dc000
vnode 0xffffa00009024a80: type VBAD
    usecount 2, writecount 0, refcount 33263 seqc users 1
    hold count flags ()
    flags (VIRF_DOOMED|VV_VMSIZEVNLOCK)
    lock type nfs: SHARED (count 1)
swap_pager: I/O error - pagein failed; blkno 241400,size 4096, error 45
panic: VOP_STRATEGY failed bp=0xffff0000bf8dc000 vp=0
cpuid = 0
time = 1638052821
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x178
panic() at panic+0x44
bufstrategy() at bufstrategy+0x80
swapdev_strategy() at swapdev_strategy+0xcc
swap_pager_getpages_locked() at swap_pager_getpages_locked+0x460
swapoff_one() at swapoff_one+0x3e4
swapoff_all() at swapoff_all+0x9c
bufshutdown() at bufshutdown+0x2ac
kern_reboot() at kern_reboot+0x240
sys_reboot() at sys_reboot+0x358
do_el0_sync() at do_el0_sync+0x4a4
handle_el0_sync() at handle_el0_sync+0x9c
--- exception, esr 0x56000000
KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at      kdb_enter+0x48: undefined       f900c11f
db> 

This is the same traceback as my previous mail.  Looking at the code path,
the test whether there's enough RAM to swap in all the data passes in both
cases: If swapoff_one() returned ENOMEM then swapoff_all() would report a
"Cannot remove swap device" error and keep going (not bother to actually
remove the swap device) - and that's not happening.

I think the important message is "No strategy for buffer at 0x..."
which comes from vop_nostrategy() and causes bufstrategy() to panic:
swapdev_strategy() => bstrategy() => BO_STRATEGY() => bufstrategy() =>
VOP_STRATEGY() => VOP_STRATEGY_APV() => vop_nostrategy() => bufdone() =>
swp_pager_async_iodone()

Presumably, stopping the network means there's no longer any way for swap
operations to complete, so the swap device has become associated with
default_vnodeops (though I haven't dug into the actual code path that
does that).

Moving up a level, does it really matter if swapoff_one() is skipped?  If
it actually returned an error (e.g. if the free memory test failed), then
that's what would happen.  By this point in the shutdown, there's no
userland left (which makes me wonder why there's anything left in swap in
any case) and only the final cleanups remain before the kernel shuts down.

What's really needed is a way to detect that the relevant swap I/O
provider has gone away and return to swapoff_all() without panicking.

-- 
Peter Jeremy
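
To make the failure path in this mail concrete, here is a minimal userland sketch of it, not kernel code. Every `toy_` name is hypothetical and exists only for illustration. The sketch assumes, as the mail presumes without having traced the code path, that a vnode whose backing provider has gone away ends up on the default strategy routine, which fails the buffer with EOPNOTSUPP and completes it. The guarded variant shows one possible shape for "detect that the provider has gone away and return an error"; the choice of ENXIO is an assumption, not something taken from the source tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_vnode;
struct toy_buf;

/* Hypothetical stand-in for a vnode's VOP_STRATEGY entry point. */
typedef int (*toy_strategy_fn)(struct toy_vnode *vp, struct toy_buf *bp);

struct toy_vnode {
	int doomed;			/* models the VIRF_DOOMED flag */
	toy_strategy_fn strategy;	/* current strategy routine */
};

struct toy_buf {
	int error;			/* models bp->b_error */
	int done;			/* set once the I/O is completed */
};

/* Models a working filesystem strategy routine: the I/O succeeds. */
int
toy_fs_strategy(struct toy_vnode *vp, struct toy_buf *bp)
{
	(void)vp;
	bp->error = 0;
	bp->done = 1;
	return (0);
}

/* Models vop_nostrategy(): fail the buffer, complete it, return the error. */
int
toy_nostrategy(struct toy_vnode *vp, struct toy_buf *bp)
{
	(void)vp;
	bp->error = EOPNOTSUPP;
	bp->done = 1;
	return (EOPNOTSUPP);
}

/* Assumption: losing the provider dooms the vnode and swaps in default ops. */
void
toy_reclaim(struct toy_vnode *vp)
{
	vp->doomed = 1;
	vp->strategy = toy_nostrategy;
}

/*
 * Hypothetical guard: refuse to issue I/O to a doomed vnode and hand the
 * caller an error it can act on, rather than treating a failed strategy
 * call as fatal.
 */
int
toy_swapin_guarded(struct toy_vnode *vp, struct toy_buf *bp)
{
	assert(vp != NULL && bp != NULL);
	if (vp->doomed) {
		bp->error = ENXIO;
		bp->done = 1;
		return (ENXIO);
	}
	return (vp->strategy(vp, bp));
}
```

In this model a caller standing in for swapoff_one() would see a non-zero return from `toy_swapin_guarded()` once `toy_reclaim()` has run, and could report the failure and move on to the next device instead of panicking.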