Date: Tue, 11 Mar 2003 16:25:48 -0800 From: Terry Lambert <tlambert2@mindspring.com> To: dlt@mebtel.net Cc: freebsd-current@FreeBSD.org Subject: Re: exclusive sleep mutex netisr... Message-ID: <3E6E7E8C.D1F8DA65@mindspring.com> References: <20030311231140.GA3881@mebtel.net>
next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format. --------------F11808DF9B39AD7AD0A22524 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Derek Tattersall wrote: > I see several instances of this in /var/log/messages after cvsup'ing > Monday evening and rebuilding world and kernel. I haven't seen any > messages about this, so I figured I'd ask here. > > Message: > Mar 11 17:33:30 lorne kernel: malloc() of "64" with the following > non-sleepablelocks held: > Mar 11 17:33:30 lorne kernel: exclusive sleep mutex netisr lock r = 0 > (0xc0579160) locked @ /usr/src/sys/net/netisr.c:215 > > Can anybody supply me a clue as to what's going on here? Problem with Jonathan Lemon's commit to update the NETISR code to each have it's own queue. The problem appears to be that the ni->ni_handler code os called with the "mtx_lock(&netisr_mtx);" mutex in effect, which is actually only supposed to protect the netisrs list from deregistration or reregistration while in the traversal loop. Probably the most correct thing to so is probe, drop, call, reacquire, reprobe, and restart on non-matching value (it looks to me as if the mutex is only protecting the netisrs[] array,y way of the NULL-ness of ni_handler, rather than the elements in the queue ni_queue). It seems to me that the order of operation in netisr_register() and netisr_unregister(), and the lack of mutex protection there, too, is broken. My suggestion would be to sysctl the net.isr.enable on, to enable direct dispatch as a workaround. I like that code path better for livelock avaoidance anyway. Otherwise, here's a patch that I think makes things work as they are supposed to work (but the handler dispatch loop is probably still wrong, given that it doesn't back off the mutex acquisition and retry, and happens at clock interrupt, so a blocking mutex acquisition is probably a bad thing there). -- Terry --------------F11808DF9B39AD7AD0A22524 Content-Type: text/plain; charset=us-ascii; name="netisr.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="netisr.diff" *** netisr.c.old Tue Mar 11 12:12:01 2003 --- netisr.c Tue Mar 11 12:24:32 2003 *************** *** 75,82 **** KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))), ("bad isr %d", num)); ! netisrs[num].ni_handler = handler; netisrs[num].ni_queue = inq; } void --- 75,86 ---- KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))), ("bad isr %d", num)); ! mtx_lock(&netisr_mtx); ! KASSERT((netisrs[num].ni_handler == NULL)), ! ("isr already registered %d", num)); netisrs[num].ni_queue = inq; + netisrs[num].ni_handler = handler; + mtx_unlock(&netisr_mtx); } void *************** *** 87,99 **** KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))), ("bad isr %d", num)); ni = &netisrs[num]; ! ni->ni_handler = NULL; ! if (ni->ni_queue != NULL) { s = splimp(); IF_DRAIN(ni->ni_queue); splx(s); } } struct isrstat { --- 91,108 ---- KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))), ("bad isr %d", num)); + mtx_lock(&netisr_mtx); ni = &netisrs[num]; ! /* repeat drain of queue until queue is empty and mutex is held */ ! while (ni->ni_queue != NULL) { ! mtx_unlock(&netisr_mtx); s = splimp(); IF_DRAIN(ni->ni_queue); splx(s); + mtx_lock(&netisr_mtx); } + ni->ni_handler = NULL; + mtx_unlock(&netisr_mtx); } struct isrstat { *************** *** 199,204 **** --- 208,214 ---- struct netisr *ni; struct mbuf *m; u_int bits; + u_int obits; int i; #ifdef DEVICE_POLLING const int polling = 1; *************** *** 207,212 **** --- 217,223 ---- #endif mtx_lock(&netisr_mtx); + restart: do { bits = atomic_readandclear_int(&netisr); if (bits == 0) *************** *** 220,225 **** --- 231,238 ---- printf("swi_net: unregistered isr %d.\n", i); continue; } + obits = bits; + mtx_unlock(&netisr_mtx); if (ni->ni_queue == NULL) ni->ni_handler(NULL); else *************** *** 229,234 **** --- 242,250 ---- break; ni->ni_handler(m); } + mtx_lock(&netisr_mtx); + if (obits != atomic_readandclear_int(&netisr)) + goto restart; } } while (polling); mtx_unlock(&netisr_mtx); --------------F11808DF9B39AD7AD0A22524-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E6E7E8C.D1F8DA65>