From owner-freebsd-arch Thu Nov 30 19:18: 7 2000 Delivered-To: freebsd-arch@freebsd.org Received: from field.videotron.net (field.videotron.net [205.151.222.108]) by hub.freebsd.org (Postfix) with ESMTP id 94ECF37B400; Thu, 30 Nov 2000 19:18:03 -0800 (PST) Received: from modemcable213.3-201-24.mtl.mc.videotron.ca ([24.201.3.213]) by field.videotron.net (Sun Internet Mail Server sims.3.5.1999.12.14.10.29.p8) with ESMTP id <0G4V003BYD61AF@field.videotron.net>; Thu, 30 Nov 2000 22:18:01 -0500 (EST) Date: Thu, 30 Nov 2000 22:18:43 -0500 (EST) From: Bosko Milekic Subject: Re: zero copy code review In-reply-to: <14886.63486.157224.937225@grasshopper.cs.duke.edu> To: Andrew Gallatin Cc: "Kenneth D. Merry" , arch@FreeBSD.ORG, alfred@freebsd.org Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hi Andrew, On Thu, 30 Nov 2000, Andrew Gallatin wrote: [...] > Bosko Milekic writes: > > > > In general, I am pro-the zero copy stuff you've been > > gathering/merging/updating/writing/etc. over the past several months. > > Looking at the sendfile portion of your changes, it's pretty obvious that > > they are very minimal, but I'm curious as to why you've bothered removing > > the "static" before the sf_buf_free(). I can see why it really has no > > significance in the sf_buf_alloc() case, but sf_buf_free() is attached to > > the mbuf's m_ext free function pointer (I'm really just curious if the > > motivation was strictly stylistic). > > It was un-staticized because it is called by socow_iodone(), which > is the m_ext free for zero-copy transmissions. I see. But if the sendfile code still passes it as its own free routine, then shouldn't it remain staticized, strictly speaking? Although I may have missed it in the large diff, I did not see any changes to the actual registering of sf_bufs in the actual sendfile code (i.e. uipc_syscalls.c). I'm under the impression that in uipc_syscalls.c, the MEXTADD which sets up an sf_buf with an mbuf still passes sf_buf_free as its free routine. > > Here some other notes, which I came across during a real quick read > > of some of the code (I am sort of in a pre-final-exam period, so I can't > > dedicate too much time to this for the next 2 weeks, about :-( ): > > > > in nfs/nfs_serv.c: > > In your first "BEGIN SUSPECT REGION" block: > > > > - You allocate an sf_buf somewhere down the line and then attempt to > > allocate an mbuf to which you will hope to attach the sf_buf to. If the > > mbuf allocation fails, you don't seem to free the sf_buf anywhere and > > consequently, it looks as though you may leak sf_bufs. > > But the mbuf is allocated using M_WAIT. Can that fail? I haven't > kept up with the mbuf changes in -current. Yes, it can. M_WAIT just means "if nothing is available, first drain the stacks and if still nothing is available, then wait kern.ipc.mbuf_wait ticks (sysctl) and if still nothing is available, fail and set the passed in pointer to NULL and hope that the caller will deal with it." Waiting indefinetly can be dangerous in certain situations (for mbufs) but I won't get into that here. In your code, you do deal with the possibility of the MGETHDR returning NULL (you check for it) and you set ENOBUFS in that case and jump to the "errorpath" label. But, before using MGETHDR, you allocate an sf_buf (in sf) and it just so happens that the code beyond "errorpath" does not take care of freeing the sf_buf you allocated before even trying to allocate the mbuf. Another thing to note, especially if you are Pre-SMPng: sf_buf_alloc calls can block, and even indeffinately (until the allocation is succesfull). In sendfile(2), this doesn't matter as you're not allocating the sf_buf from an interrupt. It has the potential to be a problem if you start allocating sf_bufs from interrupt context. Unfortunately, I haven't yet read+fully visualized all the code in the large diff, but this is something to take into account when reviewing. > > - You only m_freem() on mb (the header mbuf) if mb->m_next != NULL, > > but if there is no m_next (m_next == NULL), you don't seem to free the mb > > mbuf (header mbuf) at all. Is this meant to be this way? (Note that it > > may very well be, I haven't looked at all the other surrounding code, > > just making sure). > > Yes. Like most of the NFS code, it is a little convoluted.. mb is a > pre-existing mbuf chain that we're attaching mbufs to. In the failure > case (where the mfreem I think you're talking about is), we backout > what we've done by freeing the mbufs we've added to mb, return > mb->next to null, and continue in the normal (copy) path. Excellent. > <... some helpful comments deleted ....> > > Many of your comments are directly related to -current, I > think I'll let Ken address them... Another one directly related to -CURRENT: I just noticed that the uipc_jumbo.c stuff does not do any locking. Perhaps it would be nice to lock the code sooner or later. I would be willing to go over it and do it but, as I said, I am really not going to be able to do much until 2 weeks from now. Furthermore, I wonder if Alfred is gutsy enough ( :-) ) to raise his voice and let us know how much this may interefere with the adding of locks to sockets in the uipc subsystem, and possibly the stack as well. Alfred, where are the potential problems? (As you've already written a portion of the latter, I assume you're very well aware)... > > I would strongly urge you to run some tests under real heavy network > > activity (possibly lower NMBCLUSTERS for this) and starve out the mbuf > > resources and see if anything strange happens - you may catch a couple of > > leaks that may have accidently slipped through. Finally, I'd like to > > suggest possibly breaking up some of the diff to smaller chunks, just so > > it is easier to track things down if something does break. With -CURRENT > > changing relatively dramatically now sometimes several times in a single > > day, I think this would be worth it for everybody. > > FWIW, the client-side nfs changes (in their 4.0-RELEASE form) are in > daily use in our lab and have been for months. We run experiments with > 8 clients running against our Slice cluster nfs file server. Each > client is close to maxed-out (60-70MB/sec per client, typically) for > hours... ;) Okay. Well, it's my understanding that the code is pretty stable; I just want to make sure that the case is the same in -CURRENT, especially when _mbufs_ are _completely_ starved. > Thank you for your feedback. And thank you for impoving the mbuf > system so much. I wasted a whole afternoon yesterday doing something > which I could have done in 5 minutes if only I had mext_refcnt in 4.0 ;) Heh; no problem, really. :-) Thanks! > Drew Cheers, Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message