Date: Sat, 26 Jun 1999 23:10:01 -0600
From: Wes Peters <wes@softweyr.com>
To: Jesus Monroy <jesus.monroy@usa.net>
Cc: Ville-Pertti Keinonen <will@iki.fi>, hackers@FreeBSD.ORG
Subject: Re: [Re: coarse vs fine-grained locking in SMP systems]
Message-ID: <3775B229.C3981D1D@softweyr.com>
References: <19990626055629.6978.qmail@www0i.netaddress.usa.net>

Jesus Monroy wrote:
>
> Ville-Pertti Keinonen <will@iki.fi> wrote:
> > mo@servo.ccr.org (Mike O'Dell) writes:
> > > we published the best Unix SMP paper I've ever seen in Computing
> > > Systems - from the Amdahl guys who did an SMP version of the kernel
> > > by very clever hacks on SPLx() macros to make them spin locks and
> > > a bit of other clever trickery on the source. they could take a stock
> >
> > An approach like that can't possibly be sufficient if code has been
> > written with the assumption that only interrupt-like events or
> > blocking calls can change things from under it. There is quite a bit
> > of code in FreeBSD that relies on this.
> >
> Can you elaborate on this a bit more? I think I missing
> some of the finer points on what you are saying.
>
> I work on interrupt driven device drivers and I'm trying
> to see how this ties in.

Here's a good example. It's from a different system that uses a
modified BSD TCP/IP stack, but the example still holds.

This system keeps a linked list of network interfaces. A periodic
timer callback walks this list to handle timeouts for the IP stack.
The timer routine didn't bother to acquire the semaphore protecting
the list of network interfaces. When we moved the code to an SMP
system, it would occasionally (as in once or twice a year across the
entire customer base) crash with a null pointer in the list of
network interfaces.

The timer routine was being preempted by a higher-priority user
interface task removing a network interface. The timer would run off
an invalid pointer and crash the system. This never happened on the
single-processor system because the timer ran in timer context, which
is not interruptible by a normal process regardless of its priority.
On the SMP system, the timer routine had failed to protect itself
from a normal process running on the other CPU.

-- 
            "Where am I, and what am I doing in this handbasket?"

Wes Peters                                          Softweyr LLC
http://www.softweyr.com/~softweyr               wes@softweyr.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
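
The race described in the message above can be sketched in user-space C.
This is only an illustration under stated assumptions, not code from the
system Wes describes: a POSIX mutex stands in for the semaphore, threads
stand in for the timer context and the user interface task, and every
name here (struct netif, iflist, iflist_lock, netif_add,
netif_remove_head, timer_tick, run_demo) is hypothetical. The point it
tries to show is that the timer walk holds the same lock as the removal
path for the whole traversal, so a remover on another CPU cannot unlink
and free a node the timer is standing on.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interface record; the real structure is not shown in the mail. */
struct netif {
    struct netif *next;
    int idle_ticks;
};

static struct netif *iflist;    /* head of the interface list */
static pthread_mutex_t iflist_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add one interface at the head of the list. Returns 0 on success, -1 on failure. */
int netif_add(void)
{
    struct netif *ifp = calloc(1, sizeof(*ifp));
    if (ifp == NULL)
        return -1;
    pthread_mutex_lock(&iflist_lock);
    ifp->next = iflist;
    iflist = ifp;
    pthread_mutex_unlock(&iflist_lock);
    return 0;
}

/* Remove the head interface. Returns 1 if one was removed, 0 if the list was empty. */
int netif_remove_head(void)
{
    struct netif *ifp;

    pthread_mutex_lock(&iflist_lock);
    ifp = iflist;
    if (ifp != NULL)
        iflist = ifp->next;
    pthread_mutex_unlock(&iflist_lock);
    free(ifp);    /* safe: no walker can still hold a pointer to it */
    return ifp != NULL;
}

/*
 * Periodic timer work. The lock is held across the entire walk; dropping
 * it (as the buggy routine effectively did) would let the other CPU free
 * a node between the load of ifp and the load of ifp->next.
 * Returns the number of interfaces visited.
 */
int timer_tick(void)
{
    int visited = 0;

    pthread_mutex_lock(&iflist_lock);
    for (struct netif *ifp = iflist; ifp != NULL; ifp = ifp->next) {
        ifp->idle_ticks++;
        visited++;
    }
    pthread_mutex_unlock(&iflist_lock);
    return visited;
}

static void *timer_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        timer_tick();
    return NULL;
}

static void *remover_thread(void *arg)
{
    (void)arg;
    while (netif_remove_head())
        ;
    return NULL;
}

/*
 * Build a list of n interfaces, then run the timer and the remover
 * concurrently. Returns the number of interfaces left, or -1 on error.
 */
int run_demo(int n)
{
    pthread_t timer, remover;

    for (int i = 0; i < n; i++)
        if (netif_add() != 0)
            return -1;
    if (pthread_create(&timer, NULL, timer_thread, NULL) != 0)
        return -1;
    if (pthread_create(&remover, NULL, remover_thread, NULL) != 0) {
        pthread_join(timer, NULL);
        return -1;
    }
    pthread_join(timer, NULL);
    pthread_join(remover, NULL);
    return timer_tick();
}
```

On a uniprocessor kernel, raising the interrupt priority level was enough
to keep other code away from the list. With a second CPU, only an
explicit lock taken by every path that touches the list gives the same
guarantee. A real kernel would more likely use a spin lock or its own
mutex primitive here rather than a pthread mutex.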