Date: Tue, 7 Jun 2022 13:17:06 -0400
From: Mark Johnston
To: Peter Johnson
Cc: freebsd-dtrace@freebsd.org
Subject: Re: vminfo provider for FreeBSD
List-Id: A discussion list for developers working on DTrace in FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-dtrace

On Mon, Jun 06, 2022 at 12:21:42PM -0400, Peter Johnson wrote:
> Thanks for the detailed reply.
>
> I'm coming at this from a bit of a weird angle: I use FreeBSD as the basis for
> my operating systems course, part of which asks students to write programs
> that exercise operating system functionalities like scheduling and virtual
> memory in specific ways and then use DTrace to confirm that their programs are
> doing what they're supposed to do vis-a-vis those functionalities (eg, "write
> a program that induces swapping and a D script that proves it works").

Cool!

> Currently, the best way I've figured out for them to do this in the context of
> virtual memory is to use vmstat(1), but it would be nice to have those
> statistics at a process granularity.

That makes sense. 30 minutes ago I was wishing I could check whether a
given process took the "optimized COW fault" path in vm_fault.c (see the
v_cow_optim counter). vmstat -s doesn't give a reliable answer, since
merely running that command itself causes the counter to increment.

> The Illumos documentation on the vminfo provider itself suggests as a use case
> getting more fine-grained information than the Illumos implementation of
> vmstat makes available [1]---"more fine-grained" meaning both "per process
> statistics" and, eg, "more information about individual faults".
>
> Given the reservations you note/confirm (potentially onerous overhead of SDT
> probes in VM code, lack of clear mapping between Illumos probes and FreeBSD
> codebase, scalability of arg1 especially wrt SMP systems, complexity of
> per-domain NUMA stats) I propose the following as a first step:
>
> Implement probes that fire whenever values in the "page" category of
> vmstat(1) output change; that is: a page fault occurs (flt), a page is
> reactivated (re), a page is paged in (pi), a page is paged out (po), a
> page is freed (fr), a page is scanned by the page daemon (sr).
>
> I am unfamiliar with the codebase, but it seems likely to me that all of those
> use counter(9), and so we would be able to correctly populate arg1.

Most of them use counter(9), yes. "sr" is a bit more complicated.
Basically, there is a counter in each page queue
(PQ_{ACTIVE,INACTIVE,LAUNDRY} times the number of NUMA domains) which is
updated once per "batch" of scanned pages. The global "sr" value is
computed on demand by summing the per-pagequeue counters.

> This would be a very modest amount of work (at least relative to transferring
> the entire vminfo provider as it exists in Illumos) and give a starting point
> for measuring SDT overhead.

Yep, that sounds perfectly reasonable.

> Once those proposed probes are in place, we can decide whether to implement
> other probes from the Illumos set, add new probes that we determine useful,
> optimize the SDT implementation, address SMP or NUMA considerations, etc.
>
> Thoughts?

This makes sense to me. The other thing we might consider is whether
it's worth including additional arguments (e.g., the physical vm_page_t)
in some cases. That could always be added later though.
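
To make the "sr" description above concrete, here is a minimal userspace C
sketch of that shape: one scan counter per (NUMA domain, page queue) pair,
bumped once per batch of scanned pages, with the global value summed on
demand. Every name in it (NDOMAINS, pq_scanned, pq_account_batch, sr_total)
is hypothetical and chosen for illustration; none of it is taken from the
FreeBSD sources, and locking and per-CPU concerns are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the FreeBSD implementation. */
enum { PQ_ACTIVE, PQ_INACTIVE, PQ_LAUNDRY, PQ_COUNT };

#define NDOMAINS 2	/* assumed number of NUMA domains for the sketch */

/* One scan counter per (domain, queue) pair. */
static uint64_t pq_scanned[NDOMAINS][PQ_COUNT];

/* Called once per batch of scanned pages, not once per page. */
static void
pq_account_batch(int domain, int queue, uint64_t npages)
{
	assert(domain >= 0 && domain < NDOMAINS);
	assert(queue >= 0 && queue < PQ_COUNT);
	pq_scanned[domain][queue] += npages;
}

/* The global "sr" value is computed on demand by summing every counter. */
static uint64_t
sr_total(void)
{
	uint64_t total = 0;

	for (int d = 0; d < NDOMAINS; d++)
		for (int q = 0; q < PQ_COUNT; q++)
			total += pq_scanned[d][q];
	return (total);
}
```

With this layout a per-page probe has no running global total to hand out
as arg1, because the total only exists when something calls sr_total().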

> pete
>
> [1] https://illumos.org/books/dtrace/chp-vminfo.html#chp-vminfo-3
>
> On Fri, Jun 03, 2022 at 03:47:31PM -0400, Mark Johnston wrote:
> > On Thu, Jun 02, 2022 at 01:08:27PM -0400, Peter Johnson wrote:
> > > Hi there --
> > >
> > > I would find the probes in Illumos' vminfo provider [1] really handy to have
> > > in FreeBSD and I'm happy to do the work to make it happen. The only
> > > FreeBSD-related mention of the vminfo provider I can find is an old mailing
> > > list post [2] that I interpret to mean that the existing fbt probes aren't a
> > > meaningful alternative (not to mention that using fbt probes effectively
> > > requires more understanding of the source code than is perhaps desirable given
> > > DTrace's intended purpose/audience).
> > >
> > > My first question is: would such an addition be welcome? I can make a more
> > > detailed case for its inclusion if that would be helpful/persuasive.
> >
> > I think it'd be welcome. My major reservation is that SDT probes have
> > non-zero overhead even when disabled, especially on FreeBSD as currently
> > implemented. The vminfo provider effectively adds a probe to various VM
> > counter increments, which can occur very very frequently in some
> > workloads, so I think we'd also want to
> > 1) try to measure that overhead, perhaps using some micro-benchmarks,
> > 2) possibly use the results to help motivate some long-overdue
> >    improvements to the SDT implementation.
> > I'd be interested in helping with both of these.
> >
> > It'd be helpful to see an example or two demonstrating how the vminfo
> > provider would be useful in diagnosing a particular problem.
> >
> > > If it is welcome, my plan would be to get very well-acquainted with FreeBSD's
> > > VM subsystem, identify where each of the vminfo probes described in the
> > > Illumos documentation should go, and then develop a patch to add those probes,
> > > seeking feedback from both freebsd-dtrace folks and whichever group has
> > > dominion over the VM stuff.
> > >
> > > My second question is: does this sound like a reasonable plan? It is,
> > > admittedly, almost uselessly high level, but I expect I will need more than a
> > > little familiarity with the codebase before I can get more specific.
> >
> > Looking through the provider documentation, I suspect it'll be difficult
> > to implement some of the probes on FreeBSD, as you note below. For
> > instance, I'm not sure that execfree can be implemented at all; FreeBSD
> > doesn't have any (cheap) way to determine whether a given physical page
> > belongs to an executable image. At least, I can't think of one.
> >
> > A second issue is in the description of "arg1" for vminfo probes. In
> > FreeBSD, frequently-updated counters are implemented using counter(9),
> > which provides per-CPU counters. To get the global value of such a
> > counter, one must iterate over all per-CPU elements, summing them up.
> > That's quite expensive and wasteful if you're doing it every time a
> > vminfo probe fires. I'm not sure how best to deal with that problem.
> >
> > Yet another consideration is how one might expose per-NUMA domain
> > counters. We could simply ignore that consideration and just provide
> > global values, but per-domain info can be very useful.
> >
> > FreeBSD's VM system has a number of counters, exposed in various
> > subtrees of the "vm" sysctl node. One might start by looking at the
> > existing counters to see how closely they match vminfo probes, or simply
> > define FreeBSD's vminfo provider in terms of the existing counters,
> > possibly adding new ones.
> >
> > > Given the mailing list post I mentioned above, it seems possible that some of
> > > the vminfo probes described in the Illumos documentation don't make sense in
> > > the context of FreeBSD (eg, if FreeBSD doesn't have a distinct paging daemon,
> > > then the pgrrun, rev, and scan probes aren't suited for transfer). On the
> > > other hand, there may be aspects on the FreeBSD side which would be beneficial
> > > to monitor, but for which Illumos does not define probes.
> >
> > I agree. FreeBSD does have a paging daemon, implemented in vm_pageout.c.
> >
> > > Therefore, my third question is: how important is it for a vminfo provider
> > > implementation in FreeBSD to hew closely to the Illumos implementation? Would
> > > it be acceptable to not transfer some probes that don't make sense and add
> > > some new probes that do? Documentation is obviously vital for any deviations,
> > > and I will make darn sure to make it a central part of the work.
> >
> > Having ported the ip/tcp/udp providers based on illumos documentation,
> > and having gone through some effort to make them compatible, I'm fairly
> > skeptical that it's important to maintain compatibility. Most
> > non-trivial D scripts that I've seen and written which use these
> > providers will also make use of FBT probes here and there, so some
> > porting work is needed regardless. Based on that, and on the
> > observations above, compatibility shouldn't be a priority IMHO.
> >
> > > Any and all feedback is most appreciated.
> > >
> > > Thanks.
> > >
> > > pete
> > >
> > >
> > > [1] https://illumos.org/books/dtrace/chp-vminfo.html
> > > [2] https://lists.freebsd.org/pipermail/freebsd-dtrace/2014-April/000209.html
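
The arg1 concern raised in the quoted reply (a global value means summing
every per-CPU element) can be sketched in userspace C as follows. This is
only a stand-in for the idea behind counter(9), whose real interface
includes counter_u64_add() and counter_u64_fetch(); the struct, the
function names, and the fixed NCPU below are all hypothetical, and the
explicit cpu argument replaces whatever the kernel does to find the
current CPU.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* assumed CPU count for the sketch */

/* Hypothetical stand-in for a per-CPU counter: one slot per CPU. */
struct pcpu_counter {
	uint64_t slot[NCPU];
};

/* Fast path: an increment touches only the given CPU's slot. */
static void
pcpu_counter_add(struct pcpu_counter *c, int cpu, uint64_t n)
{
	assert(cpu >= 0 && cpu < NCPU);
	c->slot[cpu] += n;
}

/* Slow path: the global value requires visiting every slot. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```

The increment is O(1) while the fetch is O(NCPU), so populating arg1 with
the global value on every probe firing would put the slow path on what is
otherwise the fast path.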