Date:      Mon, 12 Sep 2005 12:30:18 GMT
From:      soc-chenk <soc-chenk@FreeBSD.org>
To:        Perforce Change Reviews <perforce@freebsd.org>
Subject:   PERFORCE change 83458 for review
Message-ID:  <200509121230.j8CCUILL038025@repoman.freebsd.org>

http://perforce.freebsd.org/chv.cgi?CH=83458

Change 83458 by soc-chenk@soc-chenk_leavemealone on 2005/09/12 12:29:43

	Added Messaging chapter to IMPLEMENTATION_NOTES
	Submitted by:	soc-chenk

Affected files ...

.. //depot/projects/soc2005/fuse4bsd2/Changelog#4 edit
.. //depot/projects/soc2005/fuse4bsd2/IMPLEMENTATION_NOTES#3 edit

Differences ...

==== //depot/projects/soc2005/fuse4bsd2/Changelog#4 (text+ko) ====

@@ -1,3 +1,6 @@
+Mon Sep 12 14:28:34 CEST 2005  at node: creo.hu, nick: csaba
+  * Added Messaging chapter to IMPLEMENTATION_NOTES
+
 Mon Sep 12 11:37:04 CEST 2005  at node: creo.hu, nick: csaba
   * refined handling of cache and unbound filehandles
   

==== //depot/projects/soc2005/fuse4bsd2/IMPLEMENTATION_NOTES#3 (text+ko) ====

@@ -60,9 +60,20 @@
 ones. Let's see those differences of the modules which stem from the
 differences of the two OS.
 
+TOC:
+0) VFS API
+1) Mounting
+* 1a -- interface
+* 1b -- security
+* 1c -- anything else
+2) [vi]node operations
+3) Syncing
+4) Messaging
+5) Miscellaneous
+
 0) VFS API
 
-This is the one with the greatest impact. Linux VFS operations are
+This is the one of the greatest impact. Linux VFS operations are
 inherently (struct) file based, BSD VFS operations are inherently vnode
 based (BSD vnodes correspond to Linux inodes). That is, all file-related
 VFS operations take a (struct) file parameter in Linux.
@@ -402,7 +413,120 @@
 described above, in a way which has nothing to do with in-kernel
 buffers.
 
-4) Miscellaneous
+4) Messaging
+
+Here I give a brief comparison of the ways of implementing messaging
+between kernel and userspace in Linux and in FreeBSD.
+
+Before anything else: I don't claim superiority of either solution over
+the other. I implemented my solution from scratch, without understanding
+the respective parts of the Linux code, and without having a clear vision
+of how this would be used by the VFS. (The latter implies that I tried to
+make the design as general as possible. On one hand, this is good; on the other
+hand, it means it doesn't contain any Fuse-specific tuning.)
+
+Now, "post festum", I have taken the effort of peeking at Linux Fuse's messaging
+code, and I feel able to make this comparison. Nothing is carved in stone:
+I might change my mind and bend my code closer to that of Linux Fuse.
+Or I might make it even more different.
+
+Terminology: "up" will mean "from kernel to userspace", "down" will mean...
+you can guess. ("In" and "out" are too relative for my taste.) The basic
+vehicle of messaging is called a "request" in Linux, a "ticket" in FreeBSD.
+
+The basic mode of operation is similar. 
+
+* There is a pool of requests/tickets.
+
+* A syscall handler wants to get data from the daemon. It takes a request/ticket
+  from the pool, fills in its fields, and inserts it into the upgoing queue. If
+  buffered I/O is being done, the backing pages/buffers are attached to
+  the request/ticket, too. The handler alerts the device's read method and falls
+  asleep, waiting for the answer.
+
+* The device's read method pushes up the message to the daemon.
+
+* The daemon does whatever she should do with it, and sends back an answer.
+  The device's write method grabs the answer, finds its requester, and
+  wakes it up. If a buffered read is being done, the appropriate parts of
+  the answer are handled differently, and are copied into the attached
+  page/buffer.
+
+* The syscall handler, woken up, processes the answer, drops the request/ticket, and returns.
+
+Differences:
+
+* In Linux, one Fuse mount works with a fixed number of preallocated requests
+  (with some exceptions, when new ones are created); in FreeBSD, tickets are
+  created on demand.
+
+* In Linux, the buffers hosting the fields of a request are allocated on
+  the stack (that is, these fields are pointers to structures held in
+  variables of syscall handlers); in FreeBSD, they are allocated
+  dynamically (they are not freed when the ticket gets dropped; they are
+  kept, reused, and reallocated on demand).
+  
+* In Linux, the unique field of the request is filled with a really
+  unique value upon being taken out of the pool (i.e., the number of the
+  take-out). In FreeBSD, the unique value is owned by the ticket itself
+  (it's not changed during the ticket's lifetime), so unique values give
+  information about the number of messaging sessions going on in parallel
+  (there is a secondary field in each ticket which stores the number of
+  take-outs that ticket went through, but that's rarely used).
+
+* Messaging API: in Linux, the fields of requests are Fuse specific
+  (e.g., there are fields named "inode" and "inode2", as file
+  operations take one or two inodes). This means that syscall handlers
+  usually can fill these fields in a straightforward way
+  ("req->inode = inode;").
+
+  In FreeBSD, there are just raw message and answer buffers attached to a
+  given ticket. Syscall handlers use pointer variables of the
+  required struct types, and frontend methods for tickets set them to an appropriate
+  value (to the appropriate point in the ticket's appropriate buffer).
+  In some of the more complex cases this means a bit of manual pointer
+  arithmetic; for those complex patterns which recur
+  (mknod/creat/link), further, specific frontend methods are used.
+  In general, I didn't feel that this approach yields too much tedious
+  repetition when setting up a ticket.
+
+Interrupt handling: in Linux, when a syscall is interrupted, the
+corresponding request is "backgrounded". It's put into another queue, and
+when the daemon (unaware of the interrupt) sends its reply, then it
+gets dropped silently.
+
+By and large, the same happens in FreeBSD -- just as a special
+case of a more general mechanism.
+
+Tickets have a callback field, which can hold an arbitrary function (of
+the given type), or can be NULL. When the device write method finds the
+ticket to which a given answer should be passed, it invokes the
+callback on the incoming data (so that's what "passing" means), provided
+it's not NULL. If the callback is not NULL, then the device write method
+expects the callback to do the necessary resource management;
+but if it's NULL, the device write method takes up this role
+and does what it can do -- drops the ticket, etc.
+
+While any type of callback could be used, there are only two in use
+currently: the so-called standard one, which does what's described
+above (fetches the answer and wakes up the syscall handler), and NULL.
+
+NULL is used when we don't want to wait for the answer (in case of doing
+a RELEASE, and in FreeBSD, for FSYNC too), and to handle interruption.
+If the syscall is interrupted (that is, if it returns from sleep with an
+error), then the handler locks the answer queue and replaces the callback with
+NULL, and thus the device write routine will aptly discard the answer.
+Well, there is no guarantee of winning the race: it's possible that the device
+write routine has already taken the ticket out of the answer queue. In
+that case, the answer will be passed to the standard handler. That's not a
+problem, we can make him notice the interruption, and then he will
+drop the answer rather than waking up anyone. [[Being a bit off-topic:
+daemons are treated as female, for the sake of correctness. Let the
+standard handler be male.]] The only difference is I/O: the standard
+handler copies in data nevertheless, while this is skipped if the device
+write routine finds a NULL callback.
+
+5) Miscellaneous
 
 Now you can ask: what do I think, which VFS design is the better? Vnode
 centric BSD or file centric Linux?
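
Illustration (placed after the diff, not part of the change): the
callback dispatch described in the Messaging chapter above could be
sketched in plain userspace C roughly as follows. This is not the
fuse4bsd code. Every identifier here (struct ticket, standard_cb,
device_write_deliver, handler_interrupted) is made up for the example,
and the answer queue lock and the sleep/wakeup pair are replaced by
simple flags, so it only shows the decision logic under those
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; they mirror the prose, not real code. */
struct ticket;
typedef void ticket_cb(struct ticket *tk, const char *answer, size_t len);

struct ticket {
    unsigned long unique;  /* owned by the ticket, fixed for its lifetime */
    ticket_cb *callback;   /* NULL means nobody waits for the answer */
    int interrupted;       /* set when the sleeping handler was interrupted */
    int woken;             /* stands in for waking up the syscall handler */
    int discarded;         /* stands in for dropping the answer/ticket */
    char answer[64];       /* stands in for the ticket's answer buffer */
    size_t answer_len;
};

/* The "standard" callback: copy the data in, then wake up or drop. */
void standard_cb(struct ticket *tk, const char *answer, size_t len)
{
    assert(len <= sizeof(tk->answer));
    memcpy(tk->answer, answer, len);  /* copied in even if interrupted */
    tk->answer_len = len;
    if (tk->interrupted)
        tk->discarded = 1;            /* lost race: notice it, wake no one */
    else
        tk->woken = 1;
}

/* What the device write method does once it has found the ticket. */
void device_write_deliver(struct ticket *tk, const char *answer, size_t len)
{
    if (tk->callback != NULL)
        tk->callback(tk, answer, len); /* callback manages the resources */
    else
        tk->discarded = 1;             /* NULL callback: no copy, just drop */
}

/* Syscall handler side: sleep returned with an error. Real code would
 * hold the answer queue lock around this. */
void handler_interrupted(struct ticket *tk)
{
    tk->interrupted = 1;
    tk->callback = NULL;
}
```

Calling handler_interrupted() before device_write_deliver() models the
case where the handler wins the race (answer discarded, nothing copied).
Setting only the interrupted flag and leaving the callback in place
models the lost race, where the standard callback still copies the data
and then drops the answer instead of waking anyone.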


