Date:      Fri, 4 Aug 2000 16:39:32 -0600 
From:      Charles Randall <crandall@matchlogic.com>
To:        freebsd-smp@freebsd.org
Subject:   4.0-R panic on Dell PowerEdge 2450
Message-ID:  <5FE9B713CCCDD311A03400508B8B301301C78A17@bdr-xcln.is.matchlogic.com>

I've run into the following panic under heavy I/O on a Dell PowerEdge 2450
(2x866 MHz P-III, 1 GB RAM, etc) running 4.0-R. There's a lot of information
here...

I can reproduce this in a few hours or less by running multiple concurrent
"sort" processes in a "while /usr/bin/true" loop on a very large file (the
disk I/O is for the temporary files in the directory pointed to by -T).

The disk controller is,

ahc0: <Adaptec aic7899 Ultra160 SCSI adapter> port 0xdc00-0xdcff mem 0xf8fff000-0xf8ffffff irq 5 at device 4.0 on pci2

and the disks are,

da0: <SEAGATE ST318404LC 0005> Fixed Direct Access SCSI-3 device 
da0: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 17366MB (35566478 512 byte sectors: 255H 63S/T 2213C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <SEAGATE ST318404LC 0005> Fixed Direct Access SCSI-3 device 
da1: 80.000MB/s transfers (40.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 17366MB (35566478 512 byte sectors: 255H 63S/T 2213C)

Here's the panic info (I had to copy this from the console by hand, so there
may be a mistake, but I did triple-check it),

--- snip ---
mp_lock = 01000001; cpuid = 1 lapic.id = 00000000
instruction pointer      = 0x8:0xc02bf983
stack pointer            = 0x10:0xff80dffc
frame pointer            = 0x10:0x0
code segment             = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, def32 1, gran 1
processor eflags         = interrupt enabled, IOPL = 0
current process          = Idle
interrupt mask           = none <- SMP: XXX
trap number              = 29
panic: unknown/reserved trap
mp_lock = 01000001; cpuid = 1 lapic.id = 00000000
boot() called on cpu#1

syncing disks... Timedout SCB handled by another timeout
--- snip ---

The machine never rebooted; it just locked up.

I was running a debug SMP kernel configured with "config -g" at the time.
I've been unable to reproduce this with the GENERIC kernel (I ran the sort
test above for more than 24 hours without problem -- that doesn't prove that
it doesn't happen, just that it doesn't happen as often).

Here's a diff between GENERIC and my CUSTOM kernel,

--- snip ---
--- GENERIC	Thu Mar  9 16:32:55 2000
+++ CUSTOM	Fri Aug  4 13:36:02 2000
@@ -22,8 +22,8 @@
 cpu		I486_CPU
 cpu		I586_CPU
 cpu		I686_CPU
-ident		GENERIC
-maxusers	32
+ident		CUSTOM
+maxusers	128
 
 #makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols
 
@@ -54,12 +54,12 @@
 options		ICMP_BANDLIM		#Rate limit bad replies
 
 # To make an SMP kernel, the next two are needed
-#options 	SMP			# Symmetric MultiProcessor Kernel
-#options 	APIC_IO			# Symmetric (APIC) I/O
+options 	SMP			# Symmetric MultiProcessor Kernel
+options 	APIC_IO			# Symmetric (APIC) I/O
 # Optionally these may need tweaked, (defaults shown):
 #options 	NCPU=2			# number of CPUs
 #options 	NBUS=4			# number of busses
-#options 	NAPIC=1			# number of IO APICs
+options 	NAPIC=2			# number of IO APICs
 #options 	NINTR=24		# number of INTs
 
 device		isa
--- snip ---

Mptable returns the following on this system,

--- snip ---

===============================================================================

MPTable, version 2.0.15

-------------------------------------------------------------------------------

MP Floating Pointer Structure:

  location:			BIOS
  physical address:		0x000fe710
  signature:			'_MP_'
  length:			16 bytes
  version:			1.4
  checksum:			0x91
  mode:				Virtual Wire

-------------------------------------------------------------------------------

MP Config Table Header:

  physical address:		0x000f0000
  signature:			'PCMP'
  base table length:		372
  version:			1.4
  checksum:			0xd6
  OEM ID:			'DELL    '
  Product ID:			'POWEREDGE A6'
  OEM table pointer:		0x00000000
  OEM table size:		0
  entry count:			38
  local APIC address:		0xfee00000
  extended table length:	128
  extended table checksum:	0

-------------------------------------------------------------------------------

MP Config Base Table Entries:

--
Processors:	APIC ID	Version	State		Family	Model	Step	Flags
		 1	 0x11	 BSP, usable	 6	 8	 3	 0x383fbff
		 0	 0x11	 AP, usable	 6	 8	 3	 0x383fbff
--
Bus:		Bus ID	Type
		 0	 PCI   
		 1	 PCI   
		 2	 PCI   
		 3	 ISA   
--
I/O APICs:	APIC ID	Version	State		Address
		 2	 0x11	 usable		 0xfec00000
		 3	 0x11	 usable		 0xfec01000
--
I/O Ints:	Type	Polarity    Trigger	Bus ID	 IRQ	APIC ID	PIN#
		ExtINT	active-hi        edge	     3	   0	      2	   0
		INT	 conforms    conforms	     3	   1	      2	   1
		INT	 conforms    conforms	     3	   3	      2	   3
		INT	 conforms    conforms	     3	   4	      2	   4
		INT	 conforms    conforms	     3	   6	      2	   6
		INT	 conforms    conforms	     3	   7	      2	   7
		INT	 conforms    conforms	     3	   8	      2	   8
		INT	 conforms    conforms	     3	   9	      2	   9
		INT	 conforms    conforms	     3	  12	      2	  12
		INT	 conforms    conforms	     3	  14	      2	  14
		INT	 conforms    conforms	     3	  15	      2	  15
		INT	 conforms    conforms	     1	 8:A	      3	   0
		INT	 conforms    conforms	     2	 4:A	      3	  15
		INT	 conforms    conforms	     2	 4:B	      3	  14
		INT	 conforms    conforms	     0	 4:A	      3	   1
		INT	 conforms    conforms	     0	 4:C	      3	   1
		INT	 conforms    conforms	     0	 4:B	      3	   2
		INT	 conforms    conforms	     0	 4:D	      3	   2
		INT	 conforms    conforms	     0	 2:A	      3	   4
		INT	 conforms    conforms	     0	 2:C	      3	   4
		INT	 conforms    conforms	     0	 2:B	      3	   5
		INT	 conforms    conforms	     0	 2:D	      3	   5
		INT	 conforms    conforms	     0	 8:A	      3	   6
		INT	 conforms    conforms	     0	 8:C	      3	   6
		INT	 conforms    conforms	     0	 8:B	      3	   7
		INT	 conforms    conforms	     0	 8:D	      3	   7
		INT	 conforms    conforms	     1	 2:B	      3	  14
		INT	 conforms    conforms	     1	 2:A	      3	  15
--
Local Ints:	Type	Polarity    Trigger	Bus ID	 IRQ	APIC ID	PIN#
		ExtINT	active-hi        edge	     3	   0	    255	   0
		NMI	active-hi        edge	     3	   0	    255	   1

-------------------------------------------------------------------------------

MP Config Extended Table Entries:

--

 bus ID: 0 address type: I/O address
 address base: 0xe000
 address range: 0x1000
--

 bus ID: 0 address type: memory address
 address base: 0xa0000
 address range: 0x20000
--

 bus ID: 0 address type: I/O address
 address base: 0x0
 address range: 0x1000
--

 bus ID: 0 address type: memory address
 address base: 0xfb000000
 address range: 0x3010000
--

 bus ID: 1 address type: I/O address
 address base: 0xc000
 address range: 0x2000
--

 bus ID: 1 address type: memory address
 address base: 0xf4000000
 address range: 0x6110000
--

 bus ID: 3 bus info: 0x01 parent bus ID: 0
-------------------------------------------------------------------------------

# SMP kernel config file options:


# Required:
options		SMP			# Symmetric MultiProcessor Kernel
options		APIC_IO			# Symmetric (APIC) I/O

# Optional (built-in defaults will work in most cases):
#options		NCPU=2			# number of CPUs
#options		NBUS=4			# number of busses
#options		NAPIC=2			# number of IO APICs
#options		NINTR=28		# number of INTs

===============================================================================
--- snip ---

Note that mptable lists NAPIC only as an optional (commented-out) setting.
However, the kernel won't boot without NAPIC=2 as I've specified.

Finally, I'm running Luoqi's patch for multiple APIC support based on "diff
-p -u -r1.250.2.2 -r1.250.2.3". I've confirmed with him that this patch is
correct,

--- snip ---
--- ./backup/pmap.c	Tue Jul 25 18:32:03 2000
+++ pmap.c	Tue Jul 25 18:33:06 2000
@@ -426,9 +426,10 @@
 		for (j = 0; j < mp_napics; j++) {
 			/* same page frame as a previous IO apic? */
 			if (((vm_offset_t)SMPpt[NPTEPG-2-j] & PG_FRAME) ==
-			    (io_apic_address[0] & PG_FRAME)) {
+                           (io_apic_address[i] & PG_FRAME)) {
 				ioapic[i] = (ioapic_t *)((u_int)SMP_prvspace
-					+ (NPTEPG-2-j)*PAGE_SIZE);
+                                       + (NPTEPG-2-j) * PAGE_SIZE
+                                       + (io_apic_address[i] & PAGE_MASK));
 				break;
 			}
 			/* use this slot if available */
@@ -436,7 +437,8 @@
 				SMPpt[NPTEPG-2-j] = (pt_entry_t)(PG_V | PG_RW |
 				    pgeflag | (io_apic_address[i] & PG_FRAME));
 				ioapic[i] = (ioapic_t *)((u_int)SMP_prvspace
-					+ (NPTEPG-2-j)*PAGE_SIZE);
+                                       + (NPTEPG-2-j) * PAGE_SIZE
+                                       + (io_apic_address[i] & PAGE_MASK));
 				break;
 			}
 		}


--- snip ---
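
For anyone trying to follow what the patch changes, here is a rough userspace
sketch of the slot-assignment loop, fed with the two I/O APIC addresses from
the mptable output above (0xfec00000 and 0xfec01000). It is only a model, not
the kernel code: slot_frame[], map_ioapics(), MAX_APICS and IOAPIC_DEMO_MAIN
are names made up for illustration, and the page-table entries are reduced to
bare page frames. The "unpatched" branch assumes the removed lines behave as
shown in the diff (always comparing against io_apic_address[0] and adding no
in-page offset).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (PAGE_SIZE - 1u)	/* low 12 bits: offset within a page */
#define PG_FRAME  (~PAGE_MASK)		/* high 20 bits: page frame */
#define MAX_APICS 8			/* hypothetical limit for this sketch */

/*
 * Userspace model of the loop touched by the patch.  slot_frame[] is a
 * hypothetical stand-in for the SMPpt[NPTEPG-2-j] entries.  patched == 0
 * mimics the removed ("-") lines, patched != 0 the added ("+") lines.
 * On return, slot[i] is the page slot chosen for I/O APIC i and off[i]
 * is the byte offset added to that slot's virtual address.
 */
static void
map_ioapics(const uint32_t *addr, int n, int patched, int *slot, uint32_t *off)
{
	uint32_t slot_frame[MAX_APICS] = {0};
	int i, j;

	assert(n > 0 && n <= MAX_APICS);
	for (i = 0; i < n; i++) {
		/* the old code always compared against the first APIC */
		uint32_t want = patched ? addr[i] : addr[0];

		slot[i] = -1;
		off[i] = 0;
		for (j = 0; j < n; j++) {
			/* same page frame as a previous IO apic? */
			if (slot_frame[j] == (want & PG_FRAME)) {
				slot[i] = j;
				off[i] = patched ? (addr[i] & PAGE_MASK) : 0;
				break;
			}
			/* use this slot if available */
			if (slot_frame[j] == 0) {
				slot_frame[j] = addr[i] & PG_FRAME;
				slot[i] = j;
				off[i] = patched ? (addr[i] & PAGE_MASK) : 0;
				break;
			}
		}
	}
}

#ifdef IOAPIC_DEMO_MAIN
int
main(void)
{
	/* I/O APIC addresses taken from the mptable output above */
	const uint32_t addr[2] = { 0xfec00000u, 0xfec01000u };
	int slot[2], patched, i;
	uint32_t off[2];

	for (patched = 0; patched <= 1; patched++) {
		map_ioapics(addr, 2, patched, slot, off);
		for (i = 0; i < 2; i++)
			printf("%s: ioapic[%d] -> slot %d, offset 0x%x\n",
			    patched ? "patched" : "unpatched", i, slot[i],
			    (unsigned)off[i]);
	}
	return (0);
}
#endif
```

Built with -DIOAPIC_DEMO_MAIN, the unpatched branch puts both I/O APICs in
slot 0, so the second one would alias the first. The patched branch gives the
second APIC its own slot. The added PAGE_MASK term only matters when an I/O
APIC address is not page-aligned, which is not the case on this machine.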

There are no clues in the system log.

Have any other 2450 users seen this? I'm going to try 4.1-R now.

Thanks,
Charles


