Date:      Mon, 27 Sep 2004 15:14:31 -0400
From:      Ken Smith <kensmith@cse.Buffalo.EDU>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        sparc64@freebsd.org
Subject:   Re: panic: ipi_send: couldn't send ipi
Message-ID:  <20040927191430.GA718@electra.cse.Buffalo.EDU>
In-Reply-To: <20040925070741.GA51297@xor.obsecurity.org>
References:  <20040925070741.GA51297@xor.obsecurity.org>

On Sat, Sep 25, 2004 at 12:07:41AM -0700, Kris Kennaway wrote:
> Another panic on SMP sparc:
> 
> panic: ipi_send: couldn't send ipi
> cpuid = 1
> KDB: enter: panic
> [thread 100158]
> Stopped at      kdb_enter+0x38: ta              %xcc, 1
> db> trace
> panic() at panic+0x19c
> cpu_ipi_send() at cpu_ipi_send+0xb8
> cpu_ipi_selected() at cpu_ipi_selected+0x38
> spitfire_icache_page_inval() at spitfire_icache_page_inval+0x70
> pmap_enter() at pmap_enter+0x27c
> vm_fault() at vm_fault+0x1200
> trap_pfault() at trap_pfault+0x1e4
> trap() at trap+0x208
> -- fast data access mmu miss tar=0x4041e797 %o7=0x402c2f48 --
> userland() at 0x402c2f5c
> user trace: trap %o7=0x402c2f48
> 

Would anyone who has been hitting this panic mind trying the
following patch?  I'm a bit nervous about just "blindly" raising
the IPI_RETRIES constant to something huge; I would like to
track this down and find out more about what's going on.

This patch is a bit weird, but it should report on the console
whenever it needs to bump the number of retries.  It also starts
off with a higher default than before (1000 instead of 100); the
PR that someone cited reported numbers in the 2000 range with
that test patch.  If you find that it stops the panics, I'm
interested in what it winds up bumping the loop counter to, and
if anyone knows what sort of activity triggers the delays in IPI
delivery, I'm interested in that too.
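
For anyone who wants to see the retry behaviour without booting a
test kernel, here is a minimal userland sketch of the same loop.
It is only an illustration: try_dispatch() and
send_with_adaptive_retries() are hypothetical names that stand in
for the hardware dispatch and for cpu_ipi_send(), and a return of
-1 marks the point where the kernel would panic.

```c
#include <stdio.h>

/* Same constants as in the smp.h hunk of the patch. */
#define IPI_RETRIES_START	1000
#define IPI_RETRIES_INCREMENT	1000
#define IPI_RETRIES_MAX		10000

static int ipi_retries_max = IPI_RETRIES_START;

/*
 * Hypothetical stand-in for the hardware dispatch: the send is
 * NACKed until attempt number `needed` is reached.
 */
static int
try_dispatch(int attempt, int needed)
{
	return (attempt >= needed);
}

/*
 * Mirrors the patched loop: retry up to ipi_retries_max times and,
 * on hitting the current limit, raise it by IPI_RETRIES_INCREMENT
 * until IPI_RETRIES_MAX is reached.  Returns the attempt index that
 * succeeded, or -1 where the real code would fall through to panic.
 */
static int
send_with_adaptive_retries(int needed)
{
	int i;

	for (i = 0; i <= ipi_retries_max; i++) {
		if (try_dispatch(i, needed))
			return (i);
		if (i == ipi_retries_max &&
		    ipi_retries_max < IPI_RETRIES_MAX) {
			ipi_retries_max += IPI_RETRIES_INCREMENT;
			printf("raised ipi_retries_max to %d\n",
			    ipi_retries_max);
		}
	}
	return (-1);
}
```

In this sketch, a send that needs 2500 attempts raises the limit
twice (to 2000, then 3000) and succeeds, while a send that never
succeeds pushes the limit up to IPI_RETRIES_MAX and then gives up.
Note that the limit is a plain global that is never lowered again,
which is fine for a diagnostic patch but probably not what a final
fix would want.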

I've compile-tested this but I'm afraid I sorta loaned my test
MP machines to someone for a while so I can't try to exercise
this here at the moment.  :-(

Thanks...

Index: sys/sparc64/include/smp.h
===================================================================
RCS file: /home/ncvs/src/sys/sparc64/include/smp.h,v
retrieving revision 1.16
diff -u -r1.16 smp.h
--- sys/sparc64/include/smp.h	8 Apr 2003 06:35:08 -0000	1.16
+++ sys/sparc64/include/smp.h	27 Sep 2004 17:01:52 -0000
@@ -45,7 +45,9 @@
 #define	IPI_RENDEZVOUS	PIL_RENDEZVOUS
 #define	IPI_STOP	PIL_STOP
 
-#define	IPI_RETRIES	100
+#define	IPI_RETRIES_START	1000
+#define	IPI_RETRIES_INCREMENT	1000
+#define	IPI_RETRIES_MAX		10000
 
 struct cpu_start_args {
 	u_int	csa_count;
Index: sys/sparc64/sparc64/mp_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/sparc64/sparc64/mp_machdep.c,v
retrieving revision 1.27
diff -u -r1.27 mp_machdep.c
--- sys/sparc64/sparc64/mp_machdep.c	27 Sep 2004 16:06:38 -0000	1.27
+++ sys/sparc64/sparc64/mp_machdep.c	27 Sep 2004 19:13:00 -0000
@@ -107,6 +107,8 @@
 
 static volatile u_int	shutdown_cpus;
 
+int	ipi_retries_max = IPI_RETRIES_START;
+
 void cpu_mp_unleash(void *);
 SYSINIT(cpu_mp_unleash, SI_SUB_SMP, SI_ORDER_FIRST, cpu_mp_unleash, NULL);
 
@@ -429,7 +431,7 @@
 
 	KASSERT((ldxa(0, ASI_INTR_DISPATCH_STATUS) & IDR_BUSY) == 0,
 	    ("cpu_ipi_send: outstanding dispatch"));
-	for (i = 0; i < IPI_RETRIES; i++) {
+	for (i = 0; i <= ipi_retries_max; i++) {
 		s = intr_disable();
 		stxa(AA_SDB_INTR_D0, ASI_SDB_INTR_W, d0);
 		stxa(AA_SDB_INTR_D1, ASI_SDB_INTR_W, d1);
@@ -441,6 +443,11 @@
 		intr_restore(s);
 		if ((ldxa(0, ASI_INTR_DISPATCH_STATUS) & IDR_NACK) == 0)
 			return;
+		if (i == ipi_retries_max && ipi_retries_max < IPI_RETRIES_MAX) {
+			ipi_retries_max += IPI_RETRIES_INCREMENT;
+			printf("cpu_ipi_send: raised ipi_retries_max to %d\n",
+			    ipi_retries_max);
+		}
 	}
 	if (
 #ifdef KDB


-- 
						Ken Smith
- From there to here, from here to      |       kensmith@cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |


