Date: Mon, 27 Sep 2004 15:14:31 -0400
From: Ken Smith <kensmith@cse.Buffalo.EDU>
To: Kris Kennaway <kris@obsecurity.org>
Cc: sparc64@freebsd.org
Subject: Re: panic: ipi_send: couldn't send ipi
Message-ID: <20040927191430.GA718@electra.cse.Buffalo.EDU>
In-Reply-To: <20040925070741.GA51297@xor.obsecurity.org>
References: <20040925070741.GA51297@xor.obsecurity.org>

On Sat, Sep 25, 2004 at 12:07:41AM -0700, Kris Kennaway wrote:
> Another panic on SMP sparc:
>
> panic: ipi_send: couldn't send ipi
> cpuid = 1
> KDB: enter: panic
> [thread 100158]
> Stopped at kdb_enter+0x38: ta %xcc, 1
> db> trace
> panic() at panic+0x19c
> cpu_ipi_send() at cpu_ipi_send+0xb8
> cpu_ipi_selected() at cpu_ipi_selected+0x38
> spitfire_icache_page_inval() at spitfire_icache_page_inval+0x70
> pmap_enter() at pmap_enter+0x27c
> vm_fault() at vm_fault+0x1200
> trap_pfault() at trap_pfault+0x1e4
> trap() at trap+0x208
> -- fast data access mmu miss tar=0x4041e797 %o7=0x402c2f48 --
> userland() at 0x402c2f5c
> user trace: trap %o7=0x402c2f48
>

Would anyone who has been having this problem mind trying the
following patch? I'm a bit nervous about just "blindly" raising the
IPI_RETRIES constant to something huge; I would like to track this
down and find out a bit more about what's going on.

This patch is a bit weird, but it should report on the console when
it finds a need to bump the number of retries (and it starts off
with a higher default number than before - the PR that someone cited
reported test numbers in the 2000 range with that test patch).

If you find that it stops the panics, I'm interested in finding out
what it winds up bumping the loop counter to, and if anyone knows
what sort of activity triggers the delays in the IPI delivery I'm
interested...

I've compile-tested this, but I'm afraid I sorta loaned my test MP
machines to someone for a while, so I can't try to exercise this here
at the moment. :-(

Thanks...

Index: sys/sparc64/include/smp.h
===================================================================
RCS file: /home/ncvs/src/sys/sparc64/include/smp.h,v
retrieving revision 1.16
diff -u -r1.16 smp.h
--- sys/sparc64/include/smp.h	8 Apr 2003 06:35:08 -0000	1.16
+++ sys/sparc64/include/smp.h	27 Sep 2004 17:01:52 -0000
@@ -45,7 +45,9 @@
 #define	IPI_RENDEZVOUS	PIL_RENDEZVOUS
 #define	IPI_STOP	PIL_STOP
 
-#define	IPI_RETRIES	100
+#define	IPI_RETRIES_START	1000
+#define	IPI_RETRIES_INCREMENT	1000
+#define	IPI_RETRIES_MAX		10000
 
 struct cpu_start_args {
 	u_int	csa_count;
Index: sys/sparc64/sparc64/mp_machdep.c
===================================================================
RCS file: /home/ncvs/src/sys/sparc64/sparc64/mp_machdep.c,v
retrieving revision 1.27
diff -u -r1.27 mp_machdep.c
--- sys/sparc64/sparc64/mp_machdep.c	27 Sep 2004 16:06:38 -0000	1.27
+++ sys/sparc64/sparc64/mp_machdep.c	27 Sep 2004 19:13:00 -0000
@@ -107,6 +107,8 @@
 
 static volatile u_int shutdown_cpus;
 
+int ipi_retries_max = IPI_RETRIES_START;
+
 void cpu_mp_unleash(void *);
 SYSINIT(cpu_mp_unleash, SI_SUB_SMP, SI_ORDER_FIRST, cpu_mp_unleash, NULL);
 
@@ -429,7 +431,7 @@
 
 	KASSERT((ldxa(0, ASI_INTR_DISPATCH_STATUS) & IDR_BUSY) == 0,
 	    ("cpu_ipi_send: outstanding dispatch"));
-	for (i = 0; i < IPI_RETRIES; i++) {
+	for (i = 0; i <= ipi_retries_max; i++) {
 		s = intr_disable();
 		stxa(AA_SDB_INTR_D0, ASI_SDB_INTR_W, d0);
 		stxa(AA_SDB_INTR_D1, ASI_SDB_INTR_W, d1);
@@ -441,6 +443,11 @@
 		intr_restore(s);
 		if ((ldxa(0, ASI_INTR_DISPATCH_STATUS) & IDR_NACK) == 0)
 			return;
+		if (i == ipi_retries_max && ipi_retries_max < IPI_RETRIES_MAX) {
+			ipi_retries_max += IPI_RETRIES_INCREMENT;
+			printf("cpu_ipi_send: raised ipi_retries_max to %d\n",
+			    ipi_retries_max);
+		}
 	}
 	if (
 #ifdef KDB

-- 
                                                Ken Smith
- From there to here, from here to      |       kensmith@cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |
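
For readers without an MP sparc64 box to hand, the self-raising retry limit in the patch can be exercised in userland. The following is only a sketch under stated assumptions: fake_dispatch() is a hypothetical stand-in for the real interrupt dispatch and simply NACKs a fixed number of attempts, the names retries_max and send_with_adaptive_limit() are invented for illustration, and a return value of -1 stands in for the point where the kernel code would fall through to the panic. The constants mirror the IPI_RETRIES_* values from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Same values as IPI_RETRIES_START/INCREMENT/MAX in the patch. */
#define RETRIES_START		1000
#define RETRIES_INCREMENT	1000
#define RETRIES_MAX		10000

/* Current limit; persists across calls, like ipi_retries_max. */
static int retries_max = RETRIES_START;

/*
 * Hypothetical stand-in for the hardware dispatch (not real kernel code):
 * the first `nacks` attempts are NACKed, every later attempt is accepted.
 * Returns 1 when the attempt is accepted, 0 when it is NACKed.
 */
static int
fake_dispatch(int attempt, int nacks)
{
	return (attempt >= nacks);
}

/*
 * Retry loop shaped like the patched cpu_ipi_send(): when the loop
 * reaches the current limit and the limit is still below RETRIES_MAX,
 * raise the limit, report it, and keep going.
 * Returns the number of attempts used, or -1 once the limit cannot be
 * raised any further (where the kernel would panic).
 */
static int
send_with_adaptive_limit(int nacks)
{
	int i;

	assert(nacks >= 0);
	for (i = 0; i <= retries_max; i++) {
		if (fake_dispatch(i, nacks))
			return (i + 1);
		if (i == retries_max && retries_max < RETRIES_MAX) {
			retries_max += RETRIES_INCREMENT;
			printf("raised retries_max to %d\n", retries_max);
		}
	}
	return (-1);
}
```

Starting from the initial limit, send_with_adaptive_limit(2500) raises the limit twice (to 2000, then 3000) and succeeds on attempt 2501. A later send_with_adaptive_limit(20000) raises the limit step by step up to RETRIES_MAX and then returns -1. Because the limit is only raised when the loop actually reaches it, the final value reported on the console gives a rough upper bound on how many retries the worst delivery needed.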