From: Mateusz Guzik
Date: Thu, 14 Sep 2023 15:46:13 +0200
Subject: Re: Continually count the number of open files
To: David Chisnall
Cc: Bakul Shah, Graham Perrin, FreeBSD CURRENT
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
In-Reply-To: <1D86A8FB-ACC6-427E-ABB0-2E1A5989170E@FreeBSD.org>
References: <291ad2de-ba0e-4bdf-786a-19614eacec49@gmail.com>
 <592123F4-E610-446E-82B4-ACC519C0BA3E@iitbombay.org>
 <1D86A8FB-ACC6-427E-ABB0-2E1A5989170E@FreeBSD.org>

On 9/13/23, David Chisnall wrote:
> On 12 Sep 2023, at 17:19, Bakul Shah wrote:
>>
>> FreeBSD
>> should add inotify.
>
> inotify is also probably not the right thing. If someone is interested in
> adding this, Apple’s fsevents API is a better inspiration. It is carefully
> designed to ensure that the things monitoring for events can’t ever block
> filesystem operations from making progress.

I'm not sure what you mean here specifically and I don't see anything
careful about what they did.

From userspace POV the API is allowed to drop events, which makes life
easy on this front and is probably the right call.

The implementation is utterly horrid. For one, the non-blocking aspect
starts with the obvious equivalent of uma_zalloc(..., M_NOWAIT) and
bailing if it fails, except if you read past that to actual
registration it can perform an alloc which can block indefinitely
while holding on to some vnodes:

        // if we haven't gotten the path yet, get it.
        if (pathbuff == NULL) {
            pathbuff = get_pathbuff();
            pathbuff_len = MAXPATHLEN;

where get_pathbuff is:
        return zalloc(ZV_NAMEI);

So the notification routine can block indefinitely in a low-memory
condition.

I tried to figure out if this is ever called without other vnodes
write-locked (as opposed to "just" refed), but their code is such a
mess that my level of curiosity was dwarfed by the difficulty of getting
a definitive answer.

Other than that it is terribly inefficient and artificially limited to
8 processes which can do anything.

That is to say it is unfit for anything but laptop-scale usage.

Perhaps you meant it does not block if the watchers decide to not
process any events, but that's almost inherently true if one allows
for lossy notifications.

> I think there’s a nice design
> possible with a bloom filter in the kernel of events that ensures that
> monitors may get spurious events but don’t miss out on anything.
> [snip]
> I think the right kernel API would walk the directory and add the vnodes to
> a bloom filter and trigger a notification on a match in the filter. You’d
> then have occasional spurious notifications but you’d have something that
> could be monitored via kqueue and could be made to not block anything else
> in the kernel.
>

I don't see how this can work.

A directory can have more inodes than you can have vnodes at any
point. So if you add vnodes to a list as you go, they may fall off it
so that you can continue adding other entries.

But perhaps you mean you could store the inode number as opposed to
holding on to the vnode? Even then, the number of entries to scan to
make it happen is so big that it is going to be impractical on anything
but laptop-scale.

What can be fast is checking if the parent dir wants notifications,
but this ignores changes to hardlinks. Except *currently* the VFS layer
does not reliably track who the parent is (and in fact it can fail to
spot one).
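
For reference, the inode-number variant of the bloom filter idea can be
sketched in a few lines of plain C. This is a userspace illustration
only, not kernel code and not anything which exists in the tree: the
inofilter_* names, the filter size and the hash count are all made up
for the example. It shows the property being claimed (a lookup may
report a spurious match but never misses an added inode). It does
nothing about the cost of walking the directory to populate it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, picked for illustration only. */
#define INOFILTER_BITS   (1u << 16)
#define INOFILTER_HASHES 3

struct inofilter {
	uint8_t bits[INOFILTER_BITS / 8];
};

/* splitmix64-style mixer; any decent 64-bit hash would do here. */
static uint64_t
ino_mix(uint64_t x)
{
	x += 0x9e3779b97f4a7c15ULL;
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
	x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
	return (x ^ (x >> 31));
}

static void
inofilter_init(struct inofilter *f)
{
	memset(f->bits, 0, sizeof(f->bits));
}

/* Record an inode number seen while walking the watched directory. */
static void
inofilter_add(struct inofilter *f, uint64_t ino)
{
	uint64_t h = ino;

	for (int i = 0; i < INOFILTER_HASHES; i++) {
		h = ino_mix(h);
		uint32_t bit = (uint32_t)(h % INOFILTER_BITS);
		f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/*
 * false: the inode was definitely never added.
 * true:  the inode was possibly added (may be a spurious match).
 */
static bool
inofilter_maybe_contains(const struct inofilter *f, uint64_t ino)
{
	uint64_t h = ino;

	for (int i = 0; i < INOFILTER_HASHES; i++) {
		h = ino_mix(h);
		uint32_t bit = (uint32_t)(h % INOFILTER_BITS);
		if ((f->bits[bit / 8] & (1u << (bit % 8))) == 0)
			return (false);
	}
	return (true);
}
```

A notification path would call inofilter_maybe_contains() with the
inode number of the modified file and post an event on a match. The
filter stays a fixed size no matter how many entries the directory has,
at the price of more spurious matches as it fills up.
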

The VFS layer contains a lot of cruft and design decisions which at
least today are questionable at best, but fixable. A big chunk of this
concerns name caching, which currently is entirely optional.

Should someone want to propose an API for file change notifications,
they need to state something which, if implemented, does not result in
unfixable drag on the layer, even if the initial implementation would
be suboptimal. Handling arbitrary hardlinks looks like a drag to me,
but I'm happy to review an implementation which avoids being a problem.

That is to say, a laptop-scale API can probably be implemented as is,
but a solution which can provide reliable events (not to be confused
with reliably notifying about all events) would require numerous
changes.

-- 
Mateusz Guzik