Date: Wed, 17 Jul 2024 09:31:26 +0100
From: David Chisnall <theraven@freebsd.org>
To: Emil Tsalapatis <freebsd-lists@etsalapatis.com>
Cc: Warner Losh <imp@bsdimp.com>, Alan Somers <asomers@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Is anyone working on VirtFS (FUSE over VirtIO)
Message-ID: <9F249E56-4053-45A3-96FC-179C01AFB084@freebsd.org>
In-Reply-To: <CABFh=a6Tm=2JJdrk9LDQ%2BM96Wndr8%2Br=C4c17K3RQ0mb4%2BN0KQ@mail.gmail.com>
References: <CABFh=a6Tm=2JJdrk9LDQ%2BM96Wndr8%2Br=C4c17K3RQ0mb4%2BN0KQ@mail.gmail.com>
> On 16 Jul 2024, at 21:20, Emil Tsalapatis <freebsd-lists@etsalapatis.com> wrote:
>
> After going over the Linux code, I think adding direct mapping doesn't require any changes outside of FUSE and virtio code. Direct mapping mainly requires code to manage the virtiofs device's memory region in the driver. This is a shared memory region between guest and host with which the driver backs FUSE inodes. The driver then includes an allocator used to map parts of an inode into the region.

That’s how I understood the spec too.

> It should be possible to pass host-guest shared pages to ARC, with the caveat that the virtiofs driver should be able to reclaim them at any time. Does the code currently allow this? Virtiofs needs this because it maps region pages to inodes, and must reuse cold region pages during an allocation if there aren't any available. Basically, the region is a separate pool of device pages that's managed directly by virtiofs.

I am not overly familiar with the buffer cache code, but I believe the code that was added to support ARC had similar requirements. The first ZFS port had pages in ARC and then exactly the same data in the buffer cache. The buffer cache was extended with a notion of pages that it didn’t own so that it could just use the pages in ARC directly.

I don’t remember if there’s existing support for ARC to remove those pages from the buffer cache. They are both kernel pages so it would be possible to just treat removing them from ARC as an accounting operation. There is, I believe, support for the pager to remove arbitrary pages and so it might be simple to just add a new kind of pager for these pages (which just tells the host to flush the pages).

>> If I understand the protocol correctly, the DAX mode is the same as the direct mmap mode in FUSE (not sure if FreeBSD’s kernel fuse bits support this?).
>
> Yeah, virtiofs DAX seems like it's similar to FUSE direct mmap, but with FUSE inodes being backed by the shared region instead. I don't think FreeBSD has direct mmap but I may be wrong there.

It would be a nice feature to have if not!

David
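
The region allocator discussed in this thread (map parts of an inode into the shared region, and reuse cold region pages when none are free) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the Linux or FreeBSD implementation: the slot count, the least-recently-used policy, and every name here (`struct region`, `region_init`, `region_lookup`, `region_map`) are hypothetical. A real driver would additionally have to ask the host to tear down the old mapping and establish the new one, and would have to invalidate any guest mappings of the reclaimed pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the shared region is split into a few fixed-size slots. */
#define REGION_SLOTS 4
#define NO_SLOT      (-1)

struct region_slot {
    int      in_use;    /* non-zero if the slot currently backs an inode chunk */
    uint64_t inode;     /* FUSE inode number backed by this slot */
    uint64_t chunk;     /* chunk index within that inode */
    uint64_t last_use;  /* logical clock value of the most recent access */
};

struct region {
    struct region_slot slots[REGION_SLOTS];
    uint64_t clock;     /* monotonically increasing access counter */
    unsigned reclaims;  /* how many times a cold slot was reused */
};

/* Mark every slot free and reset the counters. */
void region_init(struct region *r)
{
    for (size_t i = 0; i < REGION_SLOTS; i++)
        r->slots[i].in_use = 0;
    r->clock = 0;
    r->reclaims = 0;
}

/* Return the slot backing (inode, chunk), or NO_SLOT. Does not touch LRU state. */
int region_lookup(const struct region *r, uint64_t inode, uint64_t chunk)
{
    for (int i = 0; i < REGION_SLOTS; i++) {
        const struct region_slot *s = &r->slots[i];
        if (s->in_use && s->inode == inode && s->chunk == chunk)
            return i;
    }
    return NO_SLOT;
}

/*
 * Map (inode, chunk) into the region and return its slot index.
 * Order of preference: existing mapping, then a free slot, then the
 * coldest (least recently used) slot, which is reclaimed.
 */
int region_map(struct region *r, uint64_t inode, uint64_t chunk)
{
    int idx = region_lookup(r, inode, chunk);

    if (idx == NO_SLOT) {
        int coldest = 0;
        for (int i = 0; i < REGION_SLOTS; i++) {
            if (!r->slots[i].in_use) {
                idx = i;            /* free slot: no reclaim needed */
                break;
            }
            if (r->slots[i].last_use < r->slots[coldest].last_use)
                coldest = i;
        }
        if (idx == NO_SLOT) {
            /* Region is full: reuse the coldest slot. A real driver would
             * unmap the old chunk on the host side at this point. */
            idx = coldest;
            r->reclaims++;
        }
        r->slots[idx].in_use = 1;
        r->slots[idx].inode = inode;
        r->slots[idx].chunk = chunk;
    }

    assert(idx >= 0 && idx < REGION_SLOTS);
    r->slots[idx].last_use = ++r->clock;   /* record the access */
    return idx;
}
```

Calling `region_map` for more distinct (inode, chunk) pairs than there are slots forces a reclaim of whichever slot was accessed longest ago, which is the "driver can take pages back at any time" property that any cache layered on top of the region would have to tolerate.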
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?9F249E56-4053-45A3-96FC-179C01AFB084>