Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 11 Mar 2003 16:25:48 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        dlt@mebtel.net
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: exclusive sleep mutex netisr...
Message-ID:  <3E6E7E8C.D1F8DA65@mindspring.com>
References:  <20030311231140.GA3881@mebtel.net>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------F11808DF9B39AD7AD0A22524
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Derek Tattersall wrote:
> I see several instances of this in /var/log/messages after cvsup'ing
> Monday evening and rebuilding world and kernel.  I haven't seen any
> messages about this, so I figured I'd ask here.
> 
> Message:
> Mar 11 17:33:30 lorne kernel: malloc() of "64" with the following
> non-sleepablelocks held:
> Mar 11 17:33:30 lorne kernel: exclusive sleep mutex netisr lock r = 0
> (0xc0579160) locked @ /usr/src/sys/net/netisr.c:215
> 
> Can anybody supply me a clue as to what's going on here?

Problem with Jonathan Lemon's commit to update the NETISR code
to each have it's own queue.

The problem appears to be that the ni->ni_handler code os called
with the "mtx_lock(&netisr_mtx);" mutex in effect, which is actually
only supposed to protect the netisrs list from deregistration or
reregistration while in the traversal loop.

Probably the most correct thing to so is probe, drop, call,
reacquire, reprobe, and restart on non-matching value (it
looks to me as if the mutex is only protecting the netisrs[]
array,y way of the NULL-ness of ni_handler,  rather than the
elements in the queue ni_queue).

It seems to me that the order of operation in netisr_register()
and netisr_unregister(), and the lack of mutex protection there,
too, is broken.

My suggestion would be to sysctl the net.isr.enable on, to enable
direct dispatch as a workaround.  I like that code path better for
livelock avaoidance anyway.

Otherwise, here's a patch that I think makes things work as they
are supposed to work (but the handler dispatch loop is probably
still wrong, given that it doesn't back off the mutex acquisition
and retry, and happens at clock interrupt, so a blocking mutex
acquisition is probably a bad thing there).

-- Terry
--------------F11808DF9B39AD7AD0A22524
Content-Type: text/plain; charset=us-ascii;
 name="netisr.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="netisr.diff"

*** netisr.c.old	Tue Mar 11 12:12:01 2003
--- netisr.c	Tue Mar 11 12:24:32 2003
***************
*** 75,82 ****
  	
  	KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
  	    ("bad isr %d", num));
! 	netisrs[num].ni_handler = handler;
  	netisrs[num].ni_queue = inq;
  }
  
  void
--- 75,86 ----
  	
  	KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
  	    ("bad isr %d", num));
! 	mtx_lock(&netisr_mtx);
! 	KASSERT((netisrs[num].ni_handler == NULL)),
! 	    ("isr already registered %d", num));
  	netisrs[num].ni_queue = inq;
+ 	netisrs[num].ni_handler = handler;
+ 	mtx_unlock(&netisr_mtx);
  }
  
  void
***************
*** 87,99 ****
  	
  	KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
  	    ("bad isr %d", num));
  	ni = &netisrs[num];
! 	ni->ni_handler = NULL;
! 	if (ni->ni_queue != NULL) {
  		s = splimp();
  		IF_DRAIN(ni->ni_queue);
  		splx(s);
  	}
  }
  
  struct isrstat {
--- 91,108 ----
  	
  	KASSERT(!(num < 0 || num >= (sizeof(netisrs)/sizeof(*netisrs))),
  	    ("bad isr %d", num));
+ 	mtx_lock(&netisr_mtx);
  	ni = &netisrs[num];
! 	/* repeat drain of queue until queue is empty and mutex is held */
! 	while (ni->ni_queue != NULL) {
! 		mtx_unlock(&netisr_mtx);
  		s = splimp();
  		IF_DRAIN(ni->ni_queue);
  		splx(s);
+ 		mtx_lock(&netisr_mtx);
  	}
+ 	ni->ni_handler = NULL;
+ 	mtx_unlock(&netisr_mtx);
  }
  
  struct isrstat {
***************
*** 199,204 ****
--- 208,214 ----
  	struct netisr *ni;
  	struct mbuf *m;
  	u_int bits;
+ 	u_int obits;
  	int i;
  #ifdef DEVICE_POLLING
  	const int polling = 1;
***************
*** 207,212 ****
--- 217,223 ----
  #endif
  
  	mtx_lock(&netisr_mtx);
+ restart:
  	do {
  		bits = atomic_readandclear_int(&netisr);
  		if (bits == 0)
***************
*** 220,225 ****
--- 231,238 ----
  				printf("swi_net: unregistered isr %d.\n", i);
  				continue;
  			}
+ 			obits = bits;
+ 			mtx_unlock(&netisr_mtx);
  			if (ni->ni_queue == NULL)
  				ni->ni_handler(NULL);
  			else
***************
*** 229,234 ****
--- 242,250 ----
  						break;
  					ni->ni_handler(m);
  				}
+ 			mtx_lock(&netisr_mtx);
+ 			if (obits != atomic_readandclear_int(&netisr))
+ 				goto restart;
  		}
  	} while (polling);
  	mtx_unlock(&netisr_mtx);

--------------F11808DF9B39AD7AD0A22524--


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3E6E7E8C.D1F8DA65>