Date: Mon, 11 Mar 2002 14:21:02 -0800
From: Alfred Perlstein <bright@mu.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: davidc@freebsd.org, smp@freebsd.org
Subject: Re: select fix and giant pushdown patch
Message-ID: <20020311222102.GA92565@elvis.mu.org>
In-Reply-To: <XFMail.020311120529.jhb@FreeBSD.org>
References: <20020311001131.GN26621@elvis.mu.org> <XFMail.020311120529.jhb@FreeBSD.org>
* John Baldwin <jhb@FreeBSD.org> [020311 09:05] wrote:
>
> > Index: dev/kbd/kbd.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/kbd/kbd.c,v
> > retrieving revision 1.27
> > diff -u -r1.27 kbd.c
> > --- dev/kbd/kbd.c	12 Sep 2001 08:37:06 -0000	1.27
> > +++ dev/kbd/kbd.c	10 Mar 2002 08:08:36 -0000
> > @@ -524,7 +524,7 @@
> >  #endif
> >  	clist_alloc_cblocks(&sc->gkb_q, KB_QSIZE, KB_QSIZE/2); /* XXX */
> >  	sc->gkb_rsel.si_flags = 0;
> > -	sc->gkb_rsel.si_pid = 0;
> > +	SEL_INIT(&sc->gkb_rsel);
> >  	splx(s);
> >
> >  	return 0;
>
> softc's are already zero'd so you don't need this at all.

Ok, fixed in the other places you mentioned as well.

> > Index: kern/kern_ktrace.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/kern/kern_ktrace.c,v
> > retrieving revision 1.60
> > diff -u -r1.60 kern_ktrace.c
> > --- kern/kern_ktrace.c	27 Feb 2002 19:10:50 -0000	1.60
> > +++ kern/kern_ktrace.c	10 Mar 2002 08:08:36 -0000
> > @@ -181,6 +181,8 @@
> >
> >  	if (error)
> >  		return;
> > +
> > +	mtx_lock(&Giant);
> >  	/*
> >  	 * don't let p_tracep get ripped out from under us
> >  	 */
> > @@ -200,6 +202,7 @@
> >  	vrele(vp);
> >  	FREE(kth, M_KTRACE);
> >  	p->p_traceflag &= ~KTRFAC_ACTIVE;
> > +	mtx_unlock(&Giant);
> >  }
>
> This will probably result in lots of nice lock order reversals.  That is
> why I'm working on getting things like this out from under Giant rather
> than bogusly pushing Giant down into them.

How exactly?  Every other place is covered by Giant; calling into ktrace
while holding any lock other than Giant should kill you, because ktrace may
block.  I wouldn't worry about it.

> Note that pushing Giant down into foo_drop routines or foo_free (we should
> pick a consistent name for those, btw) is probably a good idea to handle
> things like fdrop() and crfree().
> > +	si = TAILQ_FIRST(&td->td_selinfo);
> > +	while (si != NULL) {
> > +		nsi = TAILQ_NEXT(si, si_thrlist);
> > +		si->si_thread = NULL;
> > +		si = nsi;
> > +	}
> > +	TAILQ_INIT(&td->td_selinfo);
> > +}
> > +
> > +/*
>
> Maybe:
>
> 	while (!TAILQ_EMPTY(&td->td_selinfo)) {
> 		si = TAILQ_FIRST(...);
> 		nsi = TAILQ_NEXT(...);
> 		...
> 	}
>
> w/o the TAILQ_INIT().

No, actually this code needs to be fixed up. :)  *slaps chad*

It should be a TAILQ_FOREACH followed by a TAILQ_INIT. :)

	TAILQ_FOREACH(si, &td->td_selq, si_thrlist)
		si->si_thread = NULL;
	TAILQ_INIT(&td->td_selq);

The idea is that you really don't need to dequeue the selinfos, just null
out the thread pointer and clear the thread's list of selinfos. :)

> Also, calling the field td_selinfo makes it look like you have embedded a
> selinfo in the thread instead of a list of them.  Maybe something like
> td_silist or something that makes it clear it's the head of a list and not
> a selinfo itself?

Can do.

> > +	mtx_lock(&sellock);
> >  retry:
> > +
> >  	ncoll = nselcoll;
> >  	mtx_lock_spin(&sched_lock);
> >  	td->td_flags |= TDF_SELECT;
> >  	mtx_unlock_spin(&sched_lock);
> > -	PROC_UNLOCK(td->td_proc);
> > +	mtx_unlock(&sellock);
> > +
> > +	/* XXX Is there a better place for this? */
> > +	TAILQ_INIT(&td->td_selinfo);
>
> Yes, init it when the thread is created at fork() for now (Julian will
> move it when needed).  It only needs the INIT once.
>
> The PROC_LOCK's were originally due to TDF_SELECT being a p_flag and can
> definitely go away now that it is a td_flag.

I think we need an assertion that a cv can only be used with a single
mutex.

> > @@ -838,23 +864,28 @@
> >  		timo = ttv.tv_sec > 24 * 60 * 60 ?
> >  		    24 * 60 * 60 * hz : tvtohz(&ttv);
> >  	}
> > +
> > +	/* XXX What is the point of this? */
> >  	mtx_lock_spin(&sched_lock);
> >  	td->td_flags &= ~TDF_SELECT;
> >  	mtx_unlock_spin(&sched_lock);
>
> You have to have the lock to make sure no one else is writing to td_flags
> and to make sure you don't have a stale value while performing the update.

This is Chad's XXX; what he's asking is why we need to clear the TDF_SELECT
flag at all, not that it's done while holding sched_lock (which makes
sense).

I think there's actually a race here: we seem to only recheck the
TDF_SELECT flag if there was a timeout specified in the select call.  So if
there wasn't a timeout, it looks like we could lose a race where an fd is
being waited on but we don't recheck whether a selwakeup occurred on it.
Am I missing something?

Basically, I think this code should be something like:

	mtx_lock_spin(&sched_lock);
	if ((td->td_flags & TDF_SELECT) == 0) {
		mtx_unlock_spin(&sched_lock);
		goto retry;
	}
	td->td_flags &= ~TDF_SELECT;
	mtx_unlock_spin(&sched_lock);

??  Why was there extra hysterics put into the timeout code?

> Thanks for fixing select() btw.  I was also thinking about how to fix this
> as I mentioned to you earlier.  The pfind() was definitely evil.  I don't
> like having a global lock per-se, but I can't think of a better way to do
> this myself.

You can look at the Linux mechanism, but there it seems to pound heavily on
the schedlock by leaving multiple equivalents to "asleep" entries on the
wait channels.  I think Linux allows one to basically sleep on multiple
addresses asynchronously.  I'm not sure this is better, because it requires
a LOT of overhead imo:

a) The filedesc lock isn't held, so each file must be fhold()'d and a
   pointer to it retained in a private store.  Because of this, when select
   exits, instead of just having to walk a linked list NULLing out a
   pointer, they must walk the list of objects, fput()'ing them and
   removing the async sleep structures (unless that's done via linking all
   the async sleep objects together. :)

b) Each file may require an async sleep event to be queued; this can pound
   on schedlock (that is, unless they have some magical way of avoiding
   that).

> > +		TAILQ_REMOVE(&td->td_selinfo, sip, si_thrlist);
> > +		mtx_lock_spin(&sched_lock);
> > +		if (td->td_wchan == (caddr_t)&selwait) {
> > +			if (td->td_proc->p_stat == SSLEEP)
> > +				setrunnable(td);
> > +			else
> > +				cv_waitq_remove(td);
> > +		} else
> > +			td->td_flags &= ~TDF_SELECT;
> > +		mtx_unlock_spin(&sched_lock);
>
> This can probably be simplified now to something like this:
>
> 	mtx_lock_spin(&sched_lock);
> 	if (td->td_proc->p_stat == SSLEEP)
> 		setrunnable(td);
> 	mtx_unlock_spin(&sched_lock);

Huh, why?

> I would rather you commit the select stuff by itself and do the Giant
> pushdowns afterwards, though I might like to review just that part of the
> diff once the select changes are in.

Ok.  I'll post two separate diffs tonight.

> Also, the XXXKSE comment in sys/select.h about si_thread can go away now.

Ok.

-- 
-Alfred Perlstein [alfred@freebsd.org]
'Instead of asking why a piece of software is using "1970s technology,"
start asking why software is ignoring 30 years of accumulated wisdom.'
Tax deductible donations for FreeBSD: http://www.freebsdfoundation.org/