From: Mark Johnston
To: Andriy Gapon
Cc: freebsd-stable List, FreeBSD Current
Date: Tue, 13 Apr 2021 17:18:42 -0400
Subject: Re: stable/13, vm page counts do not add up

On Tue, Apr 13, 2021 at 05:01:49PM +0300, Andriy Gapon wrote:
> On 07/04/2021 23:56, Mark Johnston wrote:
> > I don't know what might be causing it then.  It could be a page leak.
> > The kernel allocates wired pages without adjusting the v_wire_count
> > counter in some cases, but the ones I know about happen at boot and
> > should not account for such a large disparity.  I do not see it on a few
> > systems that I have access to.
>
> Mark or anyone,
>
> do you have a suggestion on how to approach hunting for the potential page leak?
> It's been a long while since I worked with that code and it changed a lot.
>
> Here is some additional info.
> I had approximately 2 million unaccounted pages.
> I rebooted the system and that number became 20 thousand which is more
> reasonable and could be explained by those boot-time allocations that you mentioned.
> After 30 hours of uptime the number became 60 thousand.
>
> I monitored the number and so far I could not correlate it with any activity.
>
> P.S.
> I have not been running any virtual machines.
> I do use nvidia graphics driver.

My guess is that something is allocating pages without VM_ALLOC_WIRED
and either they're managed and something is failing to place them in
page queues, or they're unmanaged and should likely be counted as wired.
It is also possible that something is allocating wired, unmanaged pages
and unwiring them without freeing them.

For managed pages, vm_page_unwire() ensures they get placed in a queue.
vm_page_unwire_noq() does not, but it is typically only used with
unmanaged pages.

The nvidia drivers do not appear to call any vm_page_* functions, at
least based on the kld symbol tables.

So you might try using DTrace to collect stacks for these functions,
leaving it running for a while and comparing stack counts with the
number of pages leaked while the script is running.
Something like:

/* Allocations requested without VM_ALLOC_WIRED (0x20). */
fbt::vm_page_alloc_domain_after:entry
/(args[3] & 0x20) == 0/
{
	@alloc[stack()] = count();
}

fbt::vm_page_alloc_contig_domain:entry
/(args[3] & 0x20) == 0/
{
	@alloc[stack()] = count();
}

/* Unwiring that does not requeue the page. */
fbt::vm_page_unwire_noq:entry
{
	@unwire[stack()] = count();
}

/* Unwiring of unmanaged pages (VPO_UNMANAGED, 0x4). */
fbt::vm_page_unwire:entry
/args[0]->oflags & 0x4/
{
	@unwire[stack()] = count();
}

It might be that the count of leaked pages does not relate directly to
the counts collected by the script, e.g., because there is some race
that results in a leak.  But we can try to rule out some easier cases
first.

I tried to look for possible causes of the KTLS page leak mentioned
elsewhere in this thread but can't see any obvious problems.  Does your
affected system use sendfile() at all?  I also wonder if you see much
mbuf usage on the system.
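
To compare the stack counts with the number of pages leaked during a
run, the unaccounted-page figure has to be sampled before and after.
The following is a minimal sketch of that arithmetic and is not part of
the original message.  It assumes that "unaccounted" means
v_page_count minus the sum of the free, active, inactive, laundry and
wire counters under vm.stats.vm.  The helper names read_counters,
unaccounted and leaked_between are hypothetical.

```python
import subprocess

# Per-state counters under vm.stats.vm. Treating their sum as the set of
# "accounted" pages is an assumption of this sketch.
STATE_COUNTERS = (
    "v_free_count",
    "v_active_count",
    "v_inactive_count",
    "v_laundry_count",
    "v_wire_count",
)


def read_counters():
    """Read the page counters with sysctl(8); only meaningful on FreeBSD."""
    names = ("v_page_count",) + STATE_COUNTERS
    counters = {}
    for name in names:
        out = subprocess.check_output(
            ["sysctl", "-n", "vm.stats.vm." + name], text=True
        )
        counters[name] = int(out.strip())
    return counters


def unaccounted(counters):
    """Pages in v_page_count that none of the per-state counters cover."""
    return counters["v_page_count"] - sum(counters[c] for c in STATE_COUNTERS)


def leaked_between(before, after):
    """Growth in unaccounted pages between two counter snapshots."""
    return unaccounted(after) - unaccounted(before)
```

Taking one read_counters() snapshot just before starting the DTrace
script and another just after stopping it, then passing both to
leaked_between(), gives a number to set against the per-stack counts.
The counters are not read atomically, so small differences are noise.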