Date: Wed, 20 Aug 2025 12:00:21 -0400
From: Mark Johnston
To: Kristof Provost
Cc: FreeBSD Net
Subject: Re: rtentry_free panic
In-Reply-To: <163785B5-236A-4C19-8475-66E9E8912DFA@FreeBSD.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-net

On Wed, Aug 20, 2025 at 02:30:20PM +0200, Kristof Provost wrote:
> Hi,
>
> Running the pf tests I very occasionally (say 1 out of 10 runs) see panics
> freeing an rtentry.
> This mostly manifests during bricoler test runs, and usually with the KMSAN
> kernel config. I assume that’s because there’s a timing factor involved
> rather than it being an issue that’s directly detected by KMSAN/KASAN.

I've seen this before, but not in the past few months.  I'm running
with the default parallelism of 4 most of the time.

> Here’s the panic:
>
> Freed UMA keg (rtentry) was not empty (2 items). Lost 1 pages of memory.
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address = 0x2
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff81896d53
> stack pointer = 0x28:0xfffffe0098468b20
> frame pointer = 0x28:0xfffffe0098468bb0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 0 (softirq_3)
> rdi: 0000000000000000 rsi: fffffe00e08b67e0 rdx: 0000000000000000
> rcx: fffffe000c5f08d8 r8: 0000000000000000 r9: 0000000000000001
> rax: fffffe0000000000 rbx: 0000000000000000 rbp: fffffe0098468bb0
> r10: 0000000000000001 r11: 0000000000000005 r12: 0000000000000000
> r13: fffffe0155c46920 r14: 0000000000000000 r15: fffffe00e08b67e0
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1754664399
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0xa5/frame 0xfffffe0098468390
> kdb_backtrace() at kdb_backtrace+0xc6/frame 0xfffffe00984684f0
> vpanic() at vpanic+0x214/frame 0xfffffe0098468690
> panic() at panic+0xb5/frame 0xfffffe0098468750
> trap_pfault() at trap_pfault+0x7e4/frame 0xfffffe0098468870
> trap() at trap+0x765/frame 0xfffffe0098468a50
> calltrap() at calltrap+0x8/frame 0xfffffe0098468a50
> --- trap 0xc, rip = 0xffffffff81896d53, rsp = 0xfffffe0098468b20, rbp = 0xfffffe0098468bb0 ---
> uma_zfree_arg() at uma_zfree_arg+0x23/frame 0xfffffe0098468bb0
> destroy_rtentry_epoch() at destroy_rtentry_epoch+0x17a/frame 0xfffffe0098468c70
> epoch_call_task() at epoch_call_task+0x26d/frame 0xfffffe0098468d50
> gtaskqueue_run_locked() at gtaskqueue_run_locked+0x366/frame 0xfffffe0098468eb0
> gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x138/frame 0xfffffe0098468ef0
> fork_exit() at fork_exit+0xa3/frame 0xfffffe0098468f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0098468f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 0 tid 100010 ]
> Stopped at kdb_enter+0x34: movq $0,0x20d7651(%rip)
>
> We’re panicking because the V_rtzone zone has been cleaned up (in
> vnet_rtzone_destroy()). I explicitly NULL out V_rtzone too, to make this
> more obvious.
> Note that we failed to completely free all rtentries (`Freed UMA keg
> (rtentry) was not empty (2 items). Lost 1 pages of memory.`). Presumably at
> least one of those two gets freed later, and that’s the panic we see.
>
> rt_free() queues the actual delete as an epoch callback
> (`NET_EPOCH_CALL(destroy_rtentry_epoch, &rt->rt_epoch_ctx);`), and that’s
> what we see here: the zone is removed before we’re done freeing all of the
> rtentries.
>
> vnet_rtzone_destroy() is called from rtables_destroy(), but that explicitly
> calls NET_EPOCH_DRAIN_CALLBACKS() first, so I’d expect all of the pending
> cleanups to have been done at that point. The comment block above does
> suggest that there may still be nexthop entries pending deletion even after
> we drain the callbacks. I think I can see how that’d happen for
> nexthops, but I do not see how it can happen for rtentries.

Is it possible that if_detach_internal()->rt_flushifroutes() is running
after the rtentry zone has been destroyed?  That is, maybe we're
destroying interfaces too late in the jail teardown process?

> Has anyone else seen this panic or have any ideas what I’m missing?
>
> Thanks,
> Kristof
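
The ordering question above (a route being flushed by interface detach after the zone is already gone) can be modeled in userspace. This is only a rough sketch of the hypothesis, not kernel code: every name below (model_rt_free, model_drain, model_zone_destroy, simulate_teardown) is made up for illustration, "zone" merely stands in for V_rtzone, and the "deferred" counter stands in for the list of queued epoch callbacks. Whether the real teardown order actually matches the detach_late case is exactly what is unknown.

```c
/*
 * Userspace model of the suspected teardown ordering problem.
 * None of these functions are FreeBSD kernel APIs; they are
 * hypothetical stand-ins used only to illustrate the ordering.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct zone {
	int live_items;		/* items still allocated from the zone */
};

static struct zone zone_storage;
static struct zone *zone;	/* models V_rtzone; NULL once destroyed */
static int deferred;		/* models queued epoch callbacks */
static int late_frees;		/* callbacks that ran after zone destruction */

/* Models rt_free(): the free is queued, not performed inline. */
static void
model_rt_free(void)
{
	deferred++;
}

/* Models running every queued callback (a drain, or the epoch task). */
static void
model_drain(void)
{
	while (deferred > 0) {
		deferred--;
		if (zone == NULL)
			late_frees++;	/* the kernel would fault here */
		else
			zone->live_items--;
	}
}

/* Models vnet_rtzone_destroy() followed by NULLing the zone pointer. */
static void
model_zone_destroy(void)
{
	zone = NULL;
}

/*
 * Run one simulated teardown.  If detach_late is true, the second
 * route is only flushed after the zone has been destroyed (the
 * assumed problematic order).  Returns the number of deferred frees
 * that ran against a destroyed zone.
 */
int
simulate_teardown(bool detach_late)
{
	zone_storage.live_items = 2;
	zone = &zone_storage;
	deferred = 0;
	late_frees = 0;

	model_rt_free();		/* first route removed early */
	if (!detach_late)
		model_rt_free();	/* interface route flushed before the drain */
	model_drain();			/* drain before destroying the zone */
	assert(deferred == 0);
	model_zone_destroy();
	if (detach_late)
		model_rt_free();	/* interface route flushed too late */
	model_drain();			/* deferred callback eventually runs */
	return (late_frees);
}
```

In this model, simulate_teardown(false) leaves nothing pending when the zone goes away, while simulate_teardown(true) leaves one item in the zone at destroy time and runs one callback against a NULL zone, which is the same shape as the "keg was not empty" message followed by the fault in uma_zfree_arg().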