Date: Fri, 12 Sep 1997 19:59:37 -0600
From: "Mike Durian" <durian@plutotech.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: fs@FreeBSD.ORG
Subject: Re: VFS/NFS client wedging problem
Message-ID: <199709130159.TAA03613@pluto.plutotech.com>
In-Reply-To: Your message of "Sat, 13 Sep 1997 00:03:16 -0000."
On Sat, 13 Sep 1997 00:03:16 -0000, Terry Lambert <tlambert@primenet.com> wrote:
>
>If you can give more architectural data about your FS, and you can
>give the FS you used as a model of how a VFS should be written, I
>might be able to give you more detailed help.
>
>This is probably something that should be taken off the general
>-hackers list, and onto fs@freebsd.org

I'm not sure what you want when you ask for the architectural data.
The thing to keep in mind is that it truly is a "virtual" filesystem,
not one that would be used for everyday work.  You cannot create or
delete files.  All the files access a common set of data on our RAID
system that we record and play in real time using standard VTR
functions.

The purpose of the filesystem is to give graphics people, those who
render or modify video data, a painless way of getting at the frames
(and eventually audio).  Using NFS, rcp, samba, AppleTalk or ftp they
can read the frame they want in the format they want, and write it
back the same way, all without installing additional software on
their workstations.

The filesystem layout is controlled from a configuration file with
lines like:

#
# This is a sample PFS config file
#
# The first component is a possible path in the PFS virtual file system.
# The rest of the line is a program to run to convert the video data
# found via the corresponding path.  Comments are started with a #
# and continue to the end of the line.
#
# Path components can consist of string literals or wildcards.
# Wildcards begin with a ${ and end with a }.  Literals are all
# other characters.  Literals must be matched exactly, but wildcards
# will expand to different values depending on the wildcard type and
# clip meta data.
#
# Wildcard      Expands to
# ${name}       Names of all clips on system
# ${hour}       number of hours in a clip
# ${minute}     number of minutes in a clip
# ${second}     number of seconds in a clip
# ${frame}      number of frames in a clip, not zero padded
# ${pad_frame}  number of frames in a clip, zero padded
#
# ${hour}, ${minute}, ${second} and ${frame} will
# be further restricted if other wildcards exist.  In a path
# ${hour}/${minute}/${second}/${frame}, ${frame} will only
# expand to the number of frames in the specified hour:minute:second.
#

# Timecode access
HMSF/tiff/${name}/hour${hour}/minute${minute}/second${second}/\
    ${hour}.${minute}.${second}.${pad_frame}.tiff \
    data_type=vframe \
    converter_type=tiff
HMSF/tga/${name}/hour${hour}/minute${minute}/second${second}/\
    ${hour}.${minute}.${second}.${pad_frame}.tga \
    data_type=vframe \
    converter_type=tga
HMSF/pluto/${name}/hour${hour}/minute${minute}/second${second}/\
    ${hour}.${minute}.${second}.${pad_frame}.plt \
    data_type=vframe \
    converter_type=pluto
HMSF/abekas/${name}/hour${hour}/minute${minute}/second${second}/\
    ${hour}.${minute}.${second}.${pad_frame}.yuv \
    data_type=vframe \
    converter_type=abekas
HMSF/sgi/${name}/hour${hour}/minute${minute}/second${second}/\
    ${hour}.${minute}.${second}.${pad_frame}.rgb \
    data_type=vframe \
    converter_type=sgi

# flat frame access
frames/tiff/${name}/${pad_frame}.tiff \
    data_type=vframe \
    converter_type=tiff
frames/tga/${name}/${pad_frame}.tga \
    data_type=vframe \
    converter_type=tga
frames/pluto/${name}/${pad_frame}.plt \
    data_type=vframe \
    converter_type=pluto
frames/abekas/${name}/${pad_frame}.yuv \
    data_type=vframe \
    converter_type=abekas
frames/sgi/${name}/${pad_frame}.rgb \
    data_type=vframe \
    converter_type=sgi

I implemented the filesystem partially in user space for a number of
reasons.  Our box has realtime constraints when playing and recording
CCIR-601 video, so I didn't want our custom filesystem (PFS, the
Pluto File System) to get in the way.
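To make the literal-versus-wildcard matching concrete, here is a
minimal sketch of how one path component could be tested against a
config pattern.  This is illustrative only, not the actual PFS code:
the function name pfs_component_match is invented, and the real
matcher would additionally restrict each wildcard type against clip
meta data, which is omitted here.

```c
#include <string.h>

/*
 * Sketch: does component "comp" match config pattern "pat"?
 * Literals must match exactly; each ${...} wildcard matches any
 * non-empty run of characters.  (Hypothetical; the real PFS also
 * checks wildcard values against clip meta data.)
 */
static int
pfs_component_match(const char *pat, const char *comp)
{
	const char *close;
	const char *p;

	while (*pat != '\0') {
		if (pat[0] == '$' && pat[1] == '{') {
			close = strchr(pat, '}');
			if (close == NULL)
				return (0);	/* malformed pattern */
			pat = close + 1;
			if (*comp == '\0')
				return (0);	/* wildcard must consume something */
			/* Try every possible non-empty span for the wildcard. */
			for (p = comp + 1; ; p++) {
				if (pfs_component_match(pat, p))
					return (1);
				if (*p == '\0')
					return (0);
			}
		}
		if (*pat != *comp)
			return (0);
		pat++;
		comp++;
	}
	return (*comp == '\0');
}
```

So a request for "01.02.03.0042.tiff" would match the pattern
"${hour}.${minute}.${second}.${pad_frame}.tiff", binding each
wildcard to the span between the literal dots.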
I thought using the scheduler in the normal way would be an easy way
to do this (the playback and record code runs at a realtime
priority).  The ability to use normal debugging techniques was also a
major factor.

So when I first mount the filesystem I create a bunch (where "bunch"
is currently defined as 1) of unix domain sockets for communicating
commands from the kernel to the user process.  If all sockets are
currently in use by commands, the next command will sleep until one
is available.  The user process is a big loop that selects on the
sockets, processes the commands on the sockets one at a time in
series, and then selects again.  So on the user side, commands do not
overlap.  There is basically a one-to-one correspondence between VFS
entry points and my commands.  Where there are vnodes and pfsnodes in
the kernel, there are pfs_states in the user process.

For me the file path is the important part, not the data.  In a way
all my files are like hard links: /pfs/frames/tiff/0.tiff and
/pfs/frames/tga/0.tga both access the same data on the raid disk,
but they need to be translated differently.  So in lookup, I not only
create a new pfsnode, but also issue a command to make sure the user
process knows the full canonical path.

I did notice a deadlock situation when using only one socket.  Both
sync and reclaim could lock the system if they were forced to sleep
waiting for the socket to become available.  So I delay those
operations.  In the sync case I sync immediately if the socket is not
in use, but delay the operation if it is.  For reclaim I free all the
resources in the kernel, but always delay the reclaim command for the
user process.  The delayed commands get executed immediately before a
socket is returned to the available list, or right before one is
pulled off the available list.  This keeps the operations in order.

I should also mention that I can't really behave like an NFS server,
since my filesystem needs to save state.
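The socket pool and the delayed-command trick can be modeled in a few
lines.  This is a toy user-space model, not the kernel code: the
names are invented, a "command" is just an int, and where the real
code would tsleep() on a busy pool this model simply refuses and
queues.  The point it demonstrates is that sync/reclaim never sleep,
and the delayed queue is drained right before a socket is handed out
or put back, which keeps commands ordered.

```c
#include <stddef.h>

#define PFS_NSOCKETS	1
#define PFS_MAXDELAYED	16

static int pfs_free[PFS_NSOCKETS];	/* free socket ids */
static int pfs_nfree;
static int pfs_delayed[PFS_MAXDELAYED];	/* delayed sync/reclaim commands */
static int pfs_ndelayed;
static int pfs_log[64];			/* order commands reached userland */
static int pfs_nlog;

static void
pfs_send(int sock, int cmd)
{
	(void)sock;
	pfs_log[pfs_nlog++] = cmd;	/* stand-in for the socket write */
}

static void
pfs_run_delayed(int sock)
{
	int i;

	for (i = 0; i < pfs_ndelayed; i++)
		pfs_send(sock, pfs_delayed[i]);
	pfs_ndelayed = 0;
}

static void
pfs_pool_init(void)
{
	int i;

	pfs_nfree = 0;
	for (i = 0; i < PFS_NSOCKETS; i++)
		pfs_free[pfs_nfree++] = i;
}

static int
pfs_sock_get(void)
{
	if (pfs_nfree == 0)
		return (-1);		/* an ordinary command would tsleep() here */
	/* Drain delayed commands right before pulling a socket off the list. */
	pfs_run_delayed(pfs_free[pfs_nfree - 1]);
	return (pfs_free[--pfs_nfree]);
}

static void
pfs_sock_put(int sock)
{
	pfs_run_delayed(sock);		/* drain before returning to the free list */
	pfs_free[pfs_nfree++] = sock;
}

/* sync/reclaim path: never sleeps; delays the command if the pool is busy. */
static void
pfs_cmd_nosleep(int cmd)
{
	int sock = pfs_sock_get();

	if (sock < 0) {
		pfs_delayed[pfs_ndelayed++] = cmd;
		return;
	}
	pfs_send(sock, cmd);
	pfs_sock_put(sock);
}
```

With one socket in the pool, a reclaim issued while some other
command holds the socket is queued, and runs as soon as that command
releases the socket, before any later command can grab it.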
Due to some video issues having to do with converting from a 4:4:4
RGB color space to a 4:2:2 YUV color space, I need to process lines
of video atomically.  Since I can't just return short counts on
writes (imagine someone writing a program that loops around a write
without adding new data until the current data gets out), I need to
process what I can and then hang onto the rest until the remainder of
the line of video shows up.  I also sometimes need to hold onto
various bits of the file header information so I can parse the data
correctly when it arrives.  This is an annoying issue for me, but not
really what I wanted to ask about.

As a final note, here are the vnop entry points I support (with
locking basically no-op'd for my pfsnodes, just the standard
boilerplate from the VOP_LOCK man page; I'm still running -current
from a month ago until I can get a working version of pmap.c):

static int pfs_abortop __P((struct vop_abortop_args *));
static int pfs_access __P((struct vop_access_args *));
static int pfs_badop __P((void));
static int pfs_close __P((struct vop_close_args *));
static int pfs_getattr __P((struct vop_getattr_args *));
static int pfs_inactive __P((struct vop_inactive_args *));
static int pfs_ioctl __P((struct vop_ioctl_args *));
static int pfs_lookup __P((struct vop_lookup_args *));
static int pfs_open __P((struct vop_open_args *));
static int pfs_pathconf __P((struct vop_pathconf_args *ap));
static int pfs_print __P((struct vop_print_args *));
static int pfs_read __P((struct vop_read_args *));
static int pfs_write __P((struct vop_write_args *));
static int pfs_readdir __P((struct vop_readdir_args *));
static int pfs_reclaim __P((struct vop_reclaim_args *));
static int pfs_setattr __P((struct vop_setattr_args *));
static int pfs_fsync __P((struct vop_fsync_args *));
static int pfs_advlock __P((struct vop_advlock_args *));
static int pfs_lock __P((struct vop_lock_args *));
static int pfs_unlock __P((struct vop_unlock_args *));
static int pfs_islocked __P((struct vop_islocked_args *));
static int pfs_truncate __P((struct vop_truncate_args *));

pfs_truncate is a no-op, but keeps atalkd happy.  Same with advlock.

So, is this the sort of detail you wanted?  Or did I completely miss
all the relevant information?

mike
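The "never return a short count, buffer the partial line" behavior
described above can be sketched like this.  Everything here is
invented for illustration (LINE_BYTES, convert_line, the function
names); the idea is just that every write is fully consumed, whole
lines are converted the moment they complete, and any trailing
fraction of a line is held over for the next write.

```c
#include <string.h>

#define LINE_BYTES 16			/* one video line; tiny for the demo */

static unsigned char pending[LINE_BYTES];
static size_t npending;			/* bytes of a partial line held over */
static int nlines;			/* whole lines converted so far */

static void
convert_line(const unsigned char *line)
{
	(void)line;			/* RGB->YUV conversion would go here */
	nlines++;
}

/* Consume len bytes; always returns len, never a short count. */
static size_t
pfs_write_bytes(const unsigned char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t want = LINE_BYTES - npending;
		size_t take = len - done < want ? len - done : want;

		memcpy(pending + npending, buf + done, take);
		npending += take;
		done += take;
		if (npending == LINE_BYTES) {	/* line complete: convert it */
			convert_line(pending);
			npending = 0;
		}
	}
	return (len);
}
```

A caller looping on write() sees every byte accepted; internally,
conversion only ever happens on complete lines, regardless of how the
writes were split up.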