From: Alan Robinson
Date: Fri, 6 Jun 2003 18:31:30 +0200
To: freebsd-ia64@freebsd.org
Message-ID: <20030606183130.A19592@fujitsu-siemens.com>
Reply-To: Alan.Robinson@fujitsu-siemens.com
Subject: DDB and SMP causes Unaligned Reference

Using the manual escape to debugger with an SMP kernel causes an
unaligned reference while stopping the other CPU; the system then hangs.

#
# CPU1 stopping CPUs: 0x00000001...
# fatal kernel trap (cpu 0):
#
#     trap vector = 0x1e (Unaligned Reference)
#     cr.iip = 0xe0000000009a1320
#     cr.ipsr = 0x1210080a6010 (mfl,ic,i,dt,dfh,rt,cpl=0,it,ri=1,bn)
#     cr.isr = 0x20200000000 (code=0,vector=0,w,ei=1)
#     cr.ifa = 0xe000000000004228
#     curthread = 0xe00000003d9bb080
#     pid = 12, comm = idle: cpu0
# CPU0 stopping CPUs: 0x00000002...
#

(The self-built kernel was built from projects/ia64 cvsup'd sources
which are now a day or two old.)

The cr.iip points into swapctx (or savectx), which was called from
interrupt() as follows:

    } else if (vector == ipi_vector[IPI_STOP]) {
        u_int32_t mybit = PCPU_GET(cpumask);

        CTR1(KTR_SMP, "IPI_STOP, cpuid=%d", PCPU_GET(cpuid));
        savectx(PCPU_GET(pcb));
        stopped_cpus |= mybit;
        while ((started_cpus & mybit) == 0)
            /* spin */;
        started_cpus &= ~mybit;
        stopped_cpus &= ~mybit;
        if (PCPU_GET(cpuid) == 0 && cpustop_restartfunc != NULL) {
            void (*f)(void) = cpustop_restartfunc;
            cpustop_restartfunc = NULL;
            (*f)();
        }

I cannot find a place where PCPU_SET(pcb) is used, nor can I find a
direct setting of pc_pcb :-(

The following two-line change in sys/ia64/ia64/mp_machdep.c fixes the
problem for me.

+static struct pcb ia64_intr_pcb[MAXCPU];

 void
 cpu_mp_start()
 {
     struct pcpu *pc;

     ap_spin = 1;

     SLIST_FOREACH(pc, &cpuhead, pc_allcpu) {
         pc->pc_current_pmap = kernel_pmap;
         pc->pc_other_cpus = all_cpus & ~pc->pc_cpumask;
+        pc->pc_pcb = &(ia64_intr_pcb[pc->pc_cpuid]);
         if (pc->pc_cpuid > 0) {
             ap_stack = malloc(KSTACK_PAGES * PAGE_SIZE, M_PMAP,
                 M_WAITOK);
             ap_pcpu = pc;
             ap_delay = 2000;
             ap_awake = 0;

I can now get the following:

# manual escape to debugger
# CPU1 stopping CPUs: 0x00000001... stopped.
# Stopped at Debugger+0x31: nop.m 0x0
# db> trace
# Debugger(0xe000000000a10a78, 0xe000000000990bf0, 0x692) at Debugger+0x30
# scgetc(0xe0000000002ae800, 0x2, 0xe0000000011f2cb0) at scgetc+0xbe0
# sckbdevent(0xe000000000b232e8, 0x0, 0xe0000000002ae800) at sckbdevent+0x640
# ukbd_interrupt(0xe000000000b232e8, 0x0) at ukbd_interrupt+0x850
# ukbd_intr(0xe000000001167a00, 0xe000000000b232e8, 0x0) at ukbd_intr+0x80
# usb_transfer_complete(0xe000000001167a00, 0xe000000000ae6110) at usb_transfer_complete+0x400
# ohci_softintr(0xe0000000010ab000) at ohci_softintr+0x240
# usb_schedsoftintr(0xe0000000010ab000, 0xe000000000ae6110, 0xe0000000006728f0, 0x30a) at usb_schedsoftintr+0x50
# ohci_intr1(0xe0000000010ab000) at ohci_intr1+0x440
# ohci_intr(0xe0000000010ab000, 0xe0000000006f97b0, 0x1024) at ohci_intr+0x80
# ithread_loop(0xe0000000011a8200, 0xe000000001118100, 0xe000000000ae6110, 0xe000000000ab5590) at ithread_loop+0x460
# fork_exit(0xe000000000a1d4d8, 0xe0000000011a8200, 0xa000000000019580) at fork_exit+0x1b0
# enter_userland() at enter_userland
# db> c
# CPU1 restarting CPUs: 0x00000001... restarted.
# manual escape to debugger
# CPU0 stopping CPUs: 0x00000002... stopped.
# Stopped at Debugger+0x31: nop.m 0x0
# db> c
# CPU0 restarting CPUs: 0x00000002... restarted.
#

Is the double continue normal?

Alan
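
The idea behind the two-line change can be sketched in plain userland C. This is only a rough model, not kernel code: every model_* name and MODEL_MAXCPU is hypothetical, the structures are cut down to the fields needed here, and the unset pointer is assumed to be NULL. The model reports an unset pointer with a return value; it does not reproduce the actual Unaligned Reference trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAXCPU 4  /* hypothetical; stands in for the kernel's MAXCPU */

/* Hypothetical, cut-down stand-in for the machine-dependent struct pcb. */
struct model_pcb {
    uint64_t saved_sp;
    uint64_t saved_ip;
};

/* Hypothetical stand-in for struct pcpu, reduced to two fields. */
struct model_pcpu {
    unsigned int      pc_cpuid;
    struct model_pcb *pc_pcb;  /* NULL until someone initialises it */
};

/* One save area per CPU, like ia64_intr_pcb[MAXCPU] in the patch. */
struct model_pcb  model_intr_pcb[MODEL_MAXCPU];
struct model_pcpu model_cpus[MODEL_MAXCPU];

/* Mirrors the added assignment: point each CPU at its own slot. */
void model_mp_start(void)
{
    for (unsigned int i = 0; i < MODEL_MAXCPU; i++) {
        model_cpus[i].pc_cpuid = i;
        model_cpus[i].pc_pcb = &model_intr_pcb[i];
    }
}

/* Stand-in for savectx(): returns -1 where the real code would fault. */
int model_savectx(struct model_pcb *pcb, uint64_t sp, uint64_t ip)
{
    if (pcb == NULL)
        return -1;
    pcb->saved_sp = sp;
    pcb->saved_ip = ip;
    return 0;
}

/* Stand-in for the IPI_STOP branch: save context into this CPU's pcb. */
int model_ipi_stop(unsigned int cpuid, uint64_t sp, uint64_t ip)
{
    assert(cpuid < MODEL_MAXCPU);
    return model_savectx(model_cpus[cpuid].pc_pcb, sp, ip);
}
```

Calling model_ipi_stop() before model_mp_start() returns -1 because the per-CPU pointer was never set; after model_mp_start() each CPU saves into its own slot of the static array, which is the property the change relies on.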