From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 31 01:45:55 2003
Message-ID: <3FA22EF1.39B64387@mindspring.com>
Date: Fri, 31 Oct 2003 01:44:17 -0800
From: Terry Lambert
To: andi payn
References: <1067529247.36829.2138.camel@verdammt.falcotronic.net>
cc: freebsd-hackers@freebsd.org
Subject: Re: kevent and related stuff

andi payn wrote:
> First, let me mention that I'm not nearly as experienced coding for *BSD
> as for linux, so I may ask some stupid questions.
>
> I've been looking at the fam port, and this has brought up a whole slew
> of questions. I'm not sure if all of them are appropriate to this list,
> but I don't know who else to ask, so....
>
> First, some background: On Irix and Linux, fam works by asking the
> kernel to send it a signal whenever the specified accesses occur. On
> FreeBSD, since there is no imon interface and no dnotify fcntl, it
> instead works by periodically stating all of the files it's
> watching--which is obviously not as good. The fam FAQ suggests that
> FreeBSD users should adapt fam to use the kevent interface.

Yes.  The "file access monitor" tool is the classic argument.

> I looked into kevent, and it seems like there are a number of problems
> that lead me to suspect that this is a really stupid idea. And yet, I'd
> assume that someone on the fam team at SGI and/or one of the outside fam
> developers would know FreeBSD at least as well as me. Therefore, I'm
> guessing I'm missing something here. So, any ideas anyone can offer
> would be very helpful.
>
> So, here's the questions I have:
>
> * I think (but I'm not sure) that kevent doesn't notify at all if the
> only change to a file is its ATIME. If I'm right, this makes kevent
> completely useless for fam. Adding a NOTE_ACCESS or something similar
> would fix this. Since I'm pretty new to FreeBSD: What process do I go
> through to figure out whether anyone else wants this, whether the
> interface I've come up with is acceptable, etc.? And, once I write the
> code, do I submit it as a pr?

You add it, submit it as a PR (if send-pr works properly from your
machine), and discuss it on the lists.  If someone with a commit bit
has the time and likes the idea, it will be committed.

> * The kevent mechanism doesn't monitor directories in a sufficient way
> to make fam happy. If you change a file in a directory that you're
> watching, unlike imon or dnotify, kevent won't see anything worth
> reporting at all. This means that for directory monitoring, kevent is
> useless as-is. Again, if I wanted to patch kevent to provide this
> additional notification, would others want this?

I'm not sure that this is correct, unless you are talking about
monitoring all files in a directory by merely monitoring the
directory.  If you make a modification to the file metadata (e.g. add
a link or rename it), then you will be notified that the directory has
changed.

The argument against subhierarchy monitoring is that it will, by
definition, stop at the directory level, and it cannot be
successfully implemented for all FS types.

> * When two equivalent events appear in the queue, kevent aggregates
> them. This means that if there are two updates to a file since the last
> time you checked, you'll only see the most recent one. For some uses of
> fam (keeping a folder window up to date), this is what you want; for
> others (keeping track of how often a file is read), this is useless. The
> only solution I can think of is to add an additional flag, or some other
> way to specify that you want duplicated events.

This is the classic "edge triggered vs. level triggered" argument that
Linux people bring up every time someone suggests they implement
kqueue in Linux.

This is easily fixable: you separate the flag from the data, adding an
additional argument to KNOTE().  This also has the side effect of
removing the restriction on the PID size, which is imposed by the
limited number of bits left over for representing the PID.

This is a trivial change, and I've done it several times.

The way this works is that you establish, via definition of the udata
argument, a contract between the kernel and the user space over what
the udata means.  The additional argument to KNOTE() can then be used
by the per-event note handling code in the kernel to fill out a udata
structure with as much data as you want to give it, and to identify
the place in user space to copy it out to.
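
To make the contract idea concrete, here is a minimal user-space model
in C.  It is only a sketch under stated assumptions: struct ev_log,
struct ev_record, note_aggregate(), note_record() and EX_NOTE_WRITE are
hypothetical names invented for illustration and are not part of the
kqueue API.  In a real implementation the recording would happen in
kernel filter code, with a copyout to the address agreed on through
udata, rather than a direct store.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a NOTE_WRITE-style flag bit; hypothetical, illustration only. */
#define EX_NOTE_WRITE 0x0002u
#define EVLOG_MAX     8

/* One delivered event: the flag and its data are kept in separate fields. */
struct ev_record {
    unsigned int fflags;  /* which note fired */
    long         data;    /* per-event payload, not squeezed into flag bits */
};

/* Hypothetical contract structure that udata would point at in user space. */
struct ev_log {
    size_t           count;    /* records currently stored */
    size_t           dropped;  /* events that arrived when the log was full */
    struct ev_record rec[EVLOG_MAX];
};

/* Model of aggregation: equivalent events collapse into one flag word. */
void note_aggregate(unsigned int *pending, unsigned int fflags)
{
    *pending |= fflags;
}

/*
 * Model of the contract: flag and data arrive as separate arguments and
 * every event instance becomes its own record.  Returns 0 on success and
 * -1 if the log is full (the event is counted as dropped).
 */
int note_record(void *udata, unsigned int fflags, long data)
{
    struct ev_log *log = udata;

    assert(log != NULL);
    if (log->count == EVLOG_MAX) {
        log->dropped++;
        return -1;
    }
    log->rec[log->count].fflags = fflags;
    log->rec[log->count].data = data;
    log->count++;
    return 0;
}
```

Calling note_aggregate() twice with EX_NOTE_WRITE leaves a single
pending bit, while two note_record() calls on the same struct ev_log
leave two records, each with its own data value.  What to do when the
log fills up is a policy decision; this sketch simply counts the drop.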

For example, you could set up an accept filter to accept up to 10
connections at a time, and return the fds into the user space
structure's int [10] array and fill out the int count value with how
many were returned.

For your case, you could use it to copy out each and every event
instance, rather than aggregating the events.

> * Unlike imon and dnotify, kevent doesn't provide any kind of callback
> mechanism; instead, you have to poll the queue for events. Would it be
> useful to specify another flag/parameter that would tell the kernel to
> signal the monitoring process whenever an event is available? (It would
> certainly make the fam code easier to write--but if it's not useful
> anywhere else, that's probably not enough.)

You can SIGPOLL on the event descriptor returned by kqueue().  You can
use it in a select() or poll() call.  You can pass it to another
kqueue() as an EVFILT_READ event.

Sending signals ("callbacks") is probably the absolutely least
efficient way of getting the notification back.  The presumption here
(and it's likely a good one) is that, rather than polling, your
application will be event driven, and get the events by blocking and
waiting for them.

> * The kevent vnode stuff apparently only works on UFS. And it looks like
> it would be a major project to port it to other filesystems.

It's actually pretty trivial, so long as you know the FS you are
porting it to.  The notifications for many things should probably be
migrated to the VFS layer, instead (see Darwin's implementation).

> Would this be useful for anything other than improving fam?

Sure; it would be as useful for other FS's as it's useful in UFS now.
The primary utility (IMO) is for GUI interfaces which want to update
their displays in as close to real time as possible.

> What about a port of
> the imon kernel interface (and/or the dnotify fcntl) to FreeBSD instead?

The way Linux people feel about "edge vs.
level triggered" kqueue is about the same way FreeBSD people feel
about "dnotify"... but there's no obvious way to fix the complaints
about "dnotify".

imon is pretty useless; if it would be implemented at all, the way to
do it is in terms of kevents.  Either way, you are resolving the
kqueue "issue", so you might as well use kqueue.

> * The kqueue doesn't appear to have any maximum size. If this is true,
> the dnotify/fam problem where you get hideous errors from overflowing
> queues wouldn't be an issue, but you could instead end up wasting
> massive amounts of memory in the kernel if you didn't get around to
> reading the queue.... Which is it?

This is not strictly true; with the non-contract kqueue interface, it
will aggregate the notes on a per object basis, so it's very hard to
overflow (you'd have to monitor more things than you have memory
available to monitor).

With the contract kqueue, you would do one of two things:

1)	Block the code in the KNOTE() addition until there was room
	on the queue

2)	Have a ring buffer in user space as part of the contract, and
	if it overflows, it overflows

Both of these boil down to "what do I do when I'm getting more events
in the kernel than user space is able to process before buffer
exhaustion sets in?".

> Any answers, or pointers to where I can find these answers, would be
> greatly appreciated.

I don't pretend that my answers are authoritative; however, having
done what you want to do for two commercial companies now, I can tell
you the approach I describe (using a contract between the kernel and
user space, and separating the parameter from the flag bits) will
work.  8-).

In fact, if you check the -current archives from about two years ago
in June or July or so, you will see patches that perform this
separation, and which add support for System V IPC message queues
sending events, thereby allowing them to be selected/polled/etc. via
their kqueue descriptor.

-- Terry