From owner-freebsd-arch Thu Nov 30 15: 0: 9 2000 Delivered-To: freebsd-arch@freebsd.org Received: from falla.videotron.net (falla.videotron.net [205.151.222.106]) by hub.freebsd.org (Postfix) with ESMTP id 10BA537B402; Thu, 30 Nov 2000 15:00:00 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by falla.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4V00MKE17X64@falla.videotron.net>; Thu, 30 Nov 2000 17:59:57 -0500 (EST) Date: Thu, 30 Nov 2000 18:00:38 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <20001129231653.A1503@panzer.kdm.org> To: "Kenneth D. Merry" Cc: arch@FreeBSD.ORG, gallatin@FreeBSD.ORG Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hi, On Wed, 29 Nov 2000, Kenneth D. Merry wrote: > [ -net and -current BCCed for wider coverage, this is probably best > handled on -arch ] > > I would like to request reviews of the zero copy sockets and NFS code I've > been posting about for months: > > http://people.FreeBSD.org/~ken/zero_copy > > There are diffs posted above against -current as of early November 28th, > along with a FAQ, and change log. > > These diffs include changes in: > > - the socket code > - NFS code > - VM code > - ti(4) driver > - sendfile code > > Much of the code was written by Drew Gallatin , but I > wrote a lot of the ti(4) driver mods and cleaned things up a fair bit. > > The code is stable, and I don't know of any bugs at the moment. I have run > with it enabled on one of my main development boxes for months without any > problems. > > The way things are currently configured, it is not turned on by default. > You need two kernel options and a sysctl to turn it on. The zero copy NFS > code can be turned on with gdb, although it might be better to make that > into a sysctl. (I haven't played with the zero copy NFS code much, Drew > has done much more with that.) > > How to turn the code on is covered in the web page, above. > > Anyway, I'd like to commit this code sometime next week, if no one comes up > with any issues or problems. > > Comments, bug reports, etc., are welcome. In general, I am pro-the zero copy stuff you've been gathering/merging/updating/writing/etc. over the past several months. Looking at the sendfile portion of your changes, it's pretty obvious that they are very minimal, but I'm curious as to why you've bothered removing the "static" before the sf_buf_free(). I can see why it really has no significance in the sf_buf_alloc() case, but sf_buf_free() is attached to the mbuf's m_ext free function pointer (I'm really just curious if the motivation was strictly stylistic). Here some other notes, which I came across during a real quick read of some of the code (I am sort of in a pre-final-exam period, so I can't dedicate too much time to this for the next 2 weeks, about :-( ): in nfs/nfs_serv.c: In your first "BEGIN SUSPECT REGION" block: - You allocate an sf_buf somewhere down the line and then attempt to allocate an mbuf to which you will hope to attach the sf_buf to. If the mbuf allocation fails, you don't seem to free the sf_buf anywhere and consequently, it looks as though you may leak sf_bufs. - You only m_freem() on mb (the header mbuf) if mb->m_next != NULL, but if there is no m_next (m_next == NULL), you don't seem to free the mb mbuf (header mbuf) at all. Is this meant to be this way? (Note that it may very well be, I haven't looked at all the other surrounding code, just making sure). - In the actual MEXTADD(), you don't seem to be passing the M_RDONLY flag (which is done for sendfile buffer ext mbufs). M_RDONLY is used to indicate to the rest of the code that the m_data is not to be tampered with (trimmed, et al) -- in other words, it's read-only. Have you considered it? - Stylistic suggestion: please try to keep things 25x80. :-) [ skipped all the other NFS + ti driver changes ] jumbo.h: I would like to eventually split the cluster code out of mbuf.h and uipc_mbuf.c and change jumbo.h/uipc_jumbo.c -> cluster.h/uipc_cluster.c mbuf.h: - Make EXT_DISPOSABLE 3, instead of 300... if you decide to keep it. The reason I say this is because it seems to me that EXT_DISPOSABLE should be more of an m_flag than an ext_type, which would probably mean that we should make m_flag bigger than a short (which it is now). The reason I argue this is because EXT_DISPOSABLE seems to be more of an indication of what should be done with the contents of the mbuf. Perhaps what needs to be done instead is make the EXT_DISPOSABLE flag, have if_ti use the DRV ext type (like it should be doing) for its external buffers, and make it set EXT_DISPOSABLE|M_RDONLY during the MEXTADD. Let's not get too strict with this for now, though, it would be better to make sure everything is working perfectly until we decide what to do with this - and it can be changed easily later. tiio.h: Are you sure tiio.h belongs in src/sys/sys ? Also, have you checked whether any locking should be performed here? Considering that this is all supposed to improve performance, it would be nice if it didn't all need to run under Giant. I realize that some of this will have to wait (i.e. VM), but what about the if_ti code? Is that something that can be looked at RSN? I would strongly urge you to run some tests under real heavy network activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf resources and see if anything strange happens - you may catch a couple of leaks that may have accidently slipped through. Finally, I'd like to suggest possibly breaking up some of the diff to smaller chunks, just so it is easier to track things down if something does break. With -CURRENT changing relatively dramatically now sometimes several times in a single day, I think this would be worth it for everybody. > Thanks! > > Ken > -- > Kenneth Merry > ken@kdm.org Thank *you*! Regards, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message