Date:      Tue, 16 Jul 2024 16:15:24 -0400
From:      Emil Tsalapatis <freebsd-lists@etsalapatis.com>
To:        David Chisnall <theraven@freebsd.org>
Cc:        Warner Losh <imp@bsdimp.com>, Alan Somers <asomers@freebsd.org>,  FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Is anyone working on VirtFS (FUSE over VirtIO)
Message-ID:  <CABFh=a6Tm=2JJdrk9LDQ+M96Wndr8+r=C4c17K3RQ0mb4+N0KQ@mail.gmail.com>
In-Reply-To: <75944503-8599-43CF-84C5-0C10CA325761@freebsd.org>
References:  <CABFh=a4t=73NLyJFqBOs1pRuo8B_d8wOH_mavnD-Da9dU-3k8Q@mail.gmail.com> <75944503-8599-43CF-84C5-0C10CA325761@freebsd.org>

Hi,

On Mon, Jul 15, 2024 at 3:47 AM David Chisnall <theraven@freebsd.org> wrote:

> Hi,
>
> This looks great! Are there infrastructure problems with supporting the
> DAX or is it ‘just work’? I had hoped that the extensions to the buffer
> cache that allow ARC to own pages that are delegated to the buffer cache
> would be sufficient.
>
>
After going over the Linux code, I think adding direct mapping doesn't
require any changes outside of the FUSE and virtio code. Direct mapping
mainly requires code in the driver to manage the virtiofs device's memory
region. This is a memory region shared between the guest and the host that
the driver uses to back FUSE inodes. The driver also includes an allocator
that maps ranges of an inode into the region.
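
To make that concrete, here is a rough sketch of the driver-side state I
have in mind. The names and layout are made up for illustration; this is
not code from my branch.

/*
 * Illustrative sketch only: hypothetical names, not from an existing
 * driver.  The DAX window is a host-guest shared memory region exposed
 * through a PCI BAR; the driver carves it into fixed-size chunks and
 * maps file ranges onto free chunks on demand.
 */
#include <sys/param.h>
#include <sys/queue.h>

#define VTFS_DAX_CHUNK_SIZE	(2 * 1024 * 1024)	/* mapping granularity */

struct vtfs_dax_range {
	uint64_t	dr_moffset;	/* chunk offset inside the DAX window */
	uint64_t	dr_foffset;	/* file offset backed by this chunk */
	struct vnode	*dr_vp;		/* vnode using the chunk, NULL if free */
	TAILQ_ENTRY(vtfs_dax_range) dr_link;	/* free-list or LRU linkage */
};

struct vtfs_dax_window {
	uint64_t	dw_base;	/* guest-physical base of the region */
	uint64_t	dw_size;	/* region size advertised by the device */
	struct vtfs_dax_range *dw_ranges;	/* one descriptor per chunk */
	TAILQ_HEAD(, vtfs_dax_range) dw_free;	/* unused chunks */
	TAILQ_HEAD(, vtfs_dax_range) dw_lru;	/* in-use chunks, coldest first */
};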

It should be possible to pass host-guest shared pages to ARC, with the
caveat that the virtiofs driver must be able to reclaim them at any time.
Does the code currently allow this? Virtiofs needs it because it maps
region pages to inodes, and must reclaim and reuse cold region pages
during an allocation if no free ones are available. Basically, the region
is a separate pool of device pages that virtiofs manages directly.
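
The allocation path I am picturing is roughly the following, building on
the hypothetical structures above. This is only a sketch; locking and the
actual invalidation of the old owner's mapping are omitted.

/*
 * Hypothetical allocation path: hand out a free chunk if one exists,
 * otherwise steal the coldest in-use chunk.  Before reusing a stolen
 * chunk the driver would have to unmap the old owner's pages and send
 * FUSE_REMOVEMAPPING to the device; that step is not shown here.
 */
static struct vtfs_dax_range *
vtfs_dax_alloc_range(struct vtfs_dax_window *dw, struct vnode *vp,
    uint64_t foffset)
{
	struct vtfs_dax_range *dr;

	dr = TAILQ_FIRST(&dw->dw_free);
	if (dr != NULL) {
		TAILQ_REMOVE(&dw->dw_free, dr, dr_link);
	} else {
		/* No free chunks left: reclaim the coldest in-use one. */
		dr = TAILQ_FIRST(&dw->dw_lru);
		if (dr == NULL)
			return (NULL);
		TAILQ_REMOVE(&dw->dw_lru, dr, dr_link);
		/* Invalidate dr->dr_vp's pages for this chunk here. */
	}
	dr->dr_vp = vp;
	dr->dr_foffset = foffset;
	TAILQ_INSERT_TAIL(&dw->dw_lru, dr, dr_link);
	return (dr);
}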

If I understand the protocol correctly, the DAX mode is the same as the
> direct mmap mode in FUSE (not sure if FreeBSD’s kernel fuse bits support
> this?).
>
>

Yeah, virtiofs DAX seems similar to FUSE direct mmap, but with the FUSE
inodes backed by the shared region instead. I don't think FreeBSD's FUSE
has direct mmap, but I may be wrong there.
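
For reference, the mapping request the Linux client sends looks roughly
like the following, if I am reading include/uapi/linux/fuse.h correctly;
the field comments are my own, so take them with a grain of salt.

/*
 * FUSE_SETUPMAPPING request body as I understand it from the Linux
 * headers; the matching FUSE_REMOVEMAPPING tears a mapping down again.
 * A FreeBSD client would send the same message over the virtiofs queue
 * and then hand out the corresponding part of the shared region when
 * the inode is mmapped.
 */
#include <stdint.h>

#define FUSE_SETUPMAPPING_FLAG_WRITE	(1ULL << 0)
#define FUSE_SETUPMAPPING_FLAG_READ	(1ULL << 1)

struct fuse_setupmapping_in {
	uint64_t fh;		/* server-side handle of the open file */
	uint64_t foffset;	/* file offset to map */
	uint64_t len;		/* length of the mapping */
	uint64_t flags;		/* FUSE_SETUPMAPPING_FLAG_{READ,WRITE} */
	uint64_t moffset;	/* destination offset in the shared region */
};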

Emil



> David
>
> On 14 Jul 2024, at 15:07, Emil Tsalapatis <freebsd-lists@etsalapatis.com>
> wrote:
>
> 
> Hi David, Warner,
>
>     I'm glad you find this approach interesting! I've been meaning to
> update the virtio-dbg patch for a while but unfortunately haven't found
> the time in the month since I uploaded it... I'll update it soon to
> address the reviews and split the userspace device emulation code out of
> the patch to make reviewing easier (thanks, Alan, for the suggestion). If
> you have any questions or feedback, please let me know.
>
> WRT virtiofs itself, I've been working on it too but I haven't found the
> time to clean it up and upload it. I have a messy but working
> implementation here
> <https://github.com/etsal/freebsd-src/tree/virtiofs-head>. The changes to
> FUSE itself are indeed minimal because it is enough to redirect the
> messages into a virtiofs device instead of sending them to a local FUSE
> device. The virtiofs device and the FUSE device are both simple
> bidirectional queues. Not sure yet how to deal with directly mapping
> files between the host and guest, because the Linux driver uses the
> kernel's DAX interface for that, but it should be possible.
>
> Emil
>
> On Sun, Jul 14, 2024 at 3:11 AM David Chisnall <theraven@freebsd.org>
> wrote:
>
>> Wow, that looks incredibly useful.  Not needing bhyve / qemu (nested, if
>> your main development is a VM) to test virtio drivers would be a huge
>> productivity win.
>>
>> David
>>
>> On 13 Jul 2024, at 23:06, Warner Losh <imp@bsdimp.com> wrote:
>>
>> Hey David,
>>
>> You might want to check out  https://reviews.freebsd.org/D45370 which
>> has the testing framework as well as hints at other work that's been done
>> for virtiofs by Emil Tsalapatis. It looks quite interesting. Anything he's
>> done that's at odds with what I've said just shows where my analysis was
>> flawed :) This looks quite promising, but I've not had the time to look at
>> it in detail yet.
>>
>> Warner
>>
>> On Sat, Jul 13, 2024 at 2:44 AM David Chisnall <theraven@freebsd.org>
>> wrote:
>>
>>> On 31 Dec 2023, at 16:19, Warner Losh <imp@bsdimp.com> wrote:
>>>
>>>
>>> Yea. The FUSE protocol is going to be the challenge here. For this to be
>>> useful, the VirtioFS support on the FreeBSD side needs to be 100% in the
>>> kernel, since you can't have userland in the loop. This isn't so terrible,
>>> though, since our VFS interface provides a natural breaking point for
>>> converting the requests into FUSE requests. The trouble, I fear, is that
>>> a mismatch between FreeBSD's VFS abstraction layer and Linux's will cause
>>> issues (many years ago, the weakness of FreeBSD VFS caused problems for a
>>> company doing caching, though things have no doubt improved from those
>>> days). Second, there's a KVM tie-in for the direct mapped pages between the
>>> VM and the hypervisor. I'm not sure how that works on the client (FreeBSD)
>>> side (though the description also says it's mapped via a PCI BAR, so maybe
>>> the VM OS doesn't care).
>>>
>>>
>>> From what I can tell from a little bit of looking at the code, our FUSE
>>> implementation has a fairly cleanly abstracted layer (in fuse_ipc.c) for
>>> handling the message queue.  For VirtioFS, it would 'just' be necessary to
>>> factor out the bits here that do uio into something that talked to a VirtIO
>>> ring.  I don’t know what the VFS limitations are, but since the protocol
>>> for VirtioFS is the kernel <-> userspace protocol for FUSE, it seems that
>>> any functionality that works with FUSE filesystems in userspace would work
>>> with VirtioFS filesystems.
>>>
>>> The shared buffer cache bits are nice, but are optional, so could be
>>> done in a later version once the basic functionality worked.
>>>
>>> David
>>>
>>>
>>
