Date:      Wed, 17 Jul 2024 09:31:26 +0100
From:      David Chisnall <theraven@freebsd.org>
To:        Emil Tsalapatis <freebsd-lists@etsalapatis.com>
Cc:        Warner Losh <imp@bsdimp.com>, Alan Somers <asomers@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Is anyone working on VirtFS (FUSE over VirtIO)
Message-ID:  <9F249E56-4053-45A3-96FC-179C01AFB084@freebsd.org>
In-Reply-To: <CABFh=a6Tm=2JJdrk9LDQ%2BM96Wndr8%2Br=C4c17K3RQ0mb4%2BN0KQ@mail.gmail.com>
References:  <CABFh=a6Tm=2JJdrk9LDQ%2BM96Wndr8%2Br=C4c17K3RQ0mb4%2BN0KQ@mail.gmail.com>


[-- Attachment #1 --]


> On 16 Jul 2024, at 21:20, Emil Tsalapatis <freebsd-lists@etsalapatis.com> wrote:
> 
> After going over the Linux code, I think adding direct mapping doesn't require any changes outside of FUSE and virtio code. Direct mapping mainly requires code to manage the virtiofs device's memory region in the driver. This is a shared memory region between guest and host with which the driver backs FUSE inodes. The driver then includes an allocator used to map parts of an inode into the region.

That’s how I understood the spec too.
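The allocator described above can be sketched as a small userspace model. This is an illustration under stated assumptions, not the Linux or FreeBSD code. The region is assumed to be split into fixed-size 2 MiB windows (an arbitrary choice here), and each window backs one window-aligned range of one inode. `DaxRegion` and `_setup_mapping()` are hypothetical names. `_setup_mapping()` stands in for the request that asks the host to establish a mapping (FUSE_SETUPMAPPING in the virtio-fs spec).

```python
# Userspace model of a DAX window allocator (illustrative only).
WINDOW_SHIFT = 21                      # assumed 2 MiB windows
WINDOW_SIZE = 1 << WINDOW_SHIFT


class DaxRegion:
    """Shared guest/host region carved into fixed-size windows.

    Each window backs one window-aligned range of one inode.
    """

    def __init__(self, nwindows):
        self.free = list(range(nwindows))   # indices of unused windows
        self.mapped = {}                    # (ino, file_off) -> window index
        self.requests = []                  # log of hypothetical host requests

    def _setup_mapping(self, idx, ino, foff):
        # Hypothetical stand-in for asking the host to map
        # [foff, foff + WINDOW_SIZE) of `ino` into window `idx`.
        self.requests.append(("setup", idx, ino, foff))

    def get(self, ino, off):
        """Return the region offset backing byte `off` of inode `ino`.

        Raises MemoryError when no window is free; reclaim is not modelled.
        """
        foff = off & ~(WINDOW_SIZE - 1)     # align down to a window boundary
        key = (ino, foff)
        idx = self.mapped.get(key)
        if idx is None:
            if not self.free:
                raise MemoryError("DAX region exhausted")
            idx = self.free.pop(0)
            self.mapped[key] = idx
            self._setup_mapping(idx, ino, foff)
        return idx * WINDOW_SIZE + (off - foff)
```

Reclaim is deliberately left out: when every window is in use, `get()` simply fails, which is exactly the case where the driver would have to take a cold window back.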

> It should be possible to pass host-guest shared pages to ARC, with the caveat that the virtiofs driver should be able to reclaim them at any time. Does the code currently allow this? Virtiofs needs this because it maps region pages to inodes, and must reuse cold region pages during an allocation if no free ones are available. Basically, the region is a separate pool of device pages that's managed directly by virtiofs.

I am not overly familiar with the buffer cache code, but I believe the code that was added to support ARC had similar requirements. The first ZFS port kept pages in ARC and then a second copy of exactly the same data in the buffer cache. The buffer cache was extended with a notion of pages that it didn’t own so that it could just use the pages in ARC directly.

I don’t remember if there’s existing support for ARC to remove those pages from the buffer cache. They are both kernel pages, so it would be possible to treat removing them from ARC as just an accounting operation. There is, I believe, support for the pager to remove arbitrary pages, so it might be simple to add a new kind of pager for these pages (one that just tells the host to flush them).
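Reusing cold region pages, as described in the question, amounts to an eviction policy over the region. The sketch below models one plausible choice, LRU, in userspace Python. It assumes fixed-size 2 MiB windows, each backing one window-aligned range of one inode. `ReclaimingDaxRegion` and `on_evict` are hypothetical names. `on_evict` is a hook for whatever must happen before a window is reused, such as dropping cached pages that alias it (the accounting step above) and asking the host to remove the old mapping (FUSE_REMOVEMAPPING in the virtio-fs spec).

```python
from collections import OrderedDict

WINDOW_SIZE = 1 << 21                  # assumed 2 MiB windows


class ReclaimingDaxRegion:
    """DAX window allocator that reuses the coldest window when full.

    `on_evict(ino, foff)` is a hypothetical hook called before a window
    is handed to a new (inode, offset) range.
    """

    def __init__(self, nwindows, on_evict):
        self.free = list(range(nwindows))  # windows never used yet
        self.lru = OrderedDict()           # (ino, foff) -> window, coldest first
        self.on_evict = on_evict

    def get(self, ino, off):
        """Return the region offset backing byte `off` of inode `ino`."""
        foff = off - off % WINDOW_SIZE     # align down to a window boundary
        key = (ino, foff)
        if key in self.lru:
            self.lru.move_to_end(key)      # mark as most recently used
            idx = self.lru[key]
        else:
            if self.free:
                idx = self.free.pop(0)
            else:
                # No free window: take back the least recently used one.
                old_key, idx = self.lru.popitem(last=False)
                self.on_evict(*old_key)
            self.lru[key] = idx
        return idx * WINDOW_SIZE + (off - foff)
```

LRU is only one reasonable notion of "cold"; a real driver would also have to make sure nothing in the guest still maps a window before giving it to another inode.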

>> If I understand the protocol correctly, the DAX mode is the same as the direct mmap mode in FUSE (not sure if FreeBSD’s kernel fuse bits support this?).
>> 
> 
> 
> Yeah, virtiofs DAX seems like it's similar to FUSE direct mmap, but with FUSE inodes being backed by the shared region instead. I don't think FreeBSD has direct mmap but I may be wrong there.

It would be a nice feature to have if not!

David



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?9F249E56-4053-45A3-96FC-179C01AFB084>