Date: Sun, 4 Oct 1998 18:01:45 +0200 (SAT)
From: John Hay <jhay@mikom.csir.co.za>
To: freebsd-scsi@FreeBSD.ORG
Subject: cam panic... probably tag related
Message-ID: <199810041601.SAA02050@zibbi.mikom.csir.co.za>
System is a dual 266MHz PII with an Asus motherboard with an on-board Adaptec 7880, a Seagate ST34572N as drive 0 and a Conner CFP4207S as drive 2. Everything is on the Seagate except /usr/obj, which is symlinked to the Conner. It is running a very up-to-date -current and using softupdates on all partitions.

The machine will sometimes panic during a "make world", especially with a high -j value, but it panics in such a way that it does not leave a dump. What is strange to me is that the cam code seems to catch the problem and tries to recover from it, but the machine still panics. Should cam be able to recover from it? The panic is in acquire_lock() in the softupdates code (I have added a piece of "nm -aout kernel" at the end of this email), but I don't really think it is to blame for the panic.

I have built a quirk entry for the Conner drive to limit the tags to a maximum of 24 and have now successfully done more than 20 "make world -j24"s without a panic, where previously it would panic within about 3. I have added a diff for the quirk entry at the end. As for the minimum number of tags, I just took a random number smaller than the max. :-) I'm not sure what it should be.

Here is the output on the serial console preceding the panic and also the probe info during the reboot afterwards, with unrelated things here and there removed, just in case it is useful to someone. The first 3 lines come fairly quickly after I start a make world. I understand them and just left them in for completeness.

---------------------------------------------------------------------------
(da1:ahc0:0:2:0): tagged openings now 32
(da1:ahc0:0:2:0): tagged openings now 31
(da0:ahc0:0:0:0): tagged openings now 63
... Long delay depending on how long it takes to get it to panic ...
(da1:ahc0:0:2:0): SCB 0x2d - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR == 0x9
SSTAT1 == 0xa
(da1:ahc0:0:2:0): Queuing a BDR SCB
(da1:ahc0:0:2:0): Bus Device Reset Message Sent
(da1:ahc0:0:2:0): no longer in timeout, status = 34b
ahc0: Bus Device Reset on A:2. 31 SCBs aborted
(da1:ahc0:0:2:0): tagged openings now 32
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
fault virtual address = 0x30
fault code = supervisor read, page not present
instruction pointer = 0x8:0xf0192468
stack pointer = 0x10:0xff804ca8
frame pointer = 0x10:0xff804cac
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = bio <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
syncing disks...
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000003; cpuid = 1; lapic.id = 00000000
fault virtual address = 0x30
fault code = supervisor read, page not present
instruction pointer = 0x8:0xf0192468
stack pointer = 0x10:0xff804a8c
frame pointer = 0x10:0xff804a90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = bio <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 01000003; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
dumping to dev 20401, offset 557056
dump
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000004; cpuid = 1; lapic.id = 00000000
fault virtual address = 0x30
fault code = supervisor read, page not present
instruction pointer = 0x8:0xf0192468
stack pointer = 0x10:0xff804594
frame pointer = 0x10:0xff804598
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = Idle
interrupt mask = net tty bio cam <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 01000004; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
dumping to dev 20401, offset 557056
dump device not ready
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#1
cpu_reset: Stopping other CPUs
cpu_reset: Restarting BSP
cpu_reset_proxy: Grabbed mp lock for BSP
cpu_reset_proxy: Stopped CPU 1
...
>> FreeBSD BOOT @ 0x10000: 640/65472 k of memory, serial console
Boot default: 0:sd(0,a)kernel
...
total=0x23a0d4 entry point=0x100000
Copyright (c) 1992-1998 FreeBSD Inc.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
FreeBSD 3.0-BETA #9: Fri Oct 2 16:16:08 SAST 1998
jhay@beast.mikom.csir.co.za:/usr/src/sys/compile/BEAST
Timecounter "i8254" frequency 1193561 Hz cost 3246 ns
CPU: Pentium II (686-class CPU)
Origin = "GenuineIntel" Id = 0x633 Stepping=3
Features=0x80fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,MMX>
real memory = 134217728 (131072K bytes)
avail memory = 128409600 (125400K bytes)
Programming 24 pins in IOAPIC #0
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Probing for devices on PCI bus 0:
chip0: <Host to PCI bridge (vendor=8086 device=7180)> rev 0x03 on pci0.0.0
chip1: <PCI to PCI bridge (vendor=8086 device=7181)> rev 0x03 on pci0.1.0
chip2: <Intel 82371AB PCI to ISA bridge> rev 0x01 on pci0.4.0
chip3: <Intel 82371AB USB host controller> rev 0x01 int d irq 9 on pci0.4.2
chip4: <Intel 82371AB Power management controller> rev 0x01 on pci0.4.3
ahc0: <Adaptec aic7880 Ultra SCSI adapter> rev 0x00 int a irq 19 on pci0.6.0
ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
fxp0: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x04 int a irq 18 on pci0.10.0
fxp0: Ethernet address 00:a0:c9:8d:7c:5f
fxp1: <Intel EtherExpress Pro 10/100B Ethernet> rev 0x04 int a irq 17 on pci0.11.0
fxp1: Ethernet address 00:a0:c9:8d:74:dd
vga0: <S3 ViRGE DX/GX graphics accelerator> rev 0x01 int a irq 16 on pci0.12.0
Probing for devices on PCI bus 1:
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A, console
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
lpt0 at 0x378-0x37f irq 7 on isa
lpt0: Interrupt-driven port
lp0: TCP/IP capable interface
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
npx0 on motherboard
npx0: INT 16 interface
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via pin 2
SMP: AP CPU #1 Launched!
sa0 at ahc0 bus 0 target 5 lun 0
sa0: <HP HP35470A 7 09> Removable Sequential Access SCSI2 device
sa0: 5.0MB/s transfers (5.0MHz, offset 8)
da1 at ahc0 bus 0 target 2 lun 0
da1: <CONNER CFP4207S 4.28GB 1420> Fixed Direct Access SCSI2 device
da1: 10.0MB/s transfers (10.0MHz, offset 15), Tagged Queueing Enabled
da1: 4096MB (8388608 512 byte sectors: 255H 63S/T 522C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST34572N 0784> Fixed Direct Access SCSI2 device
da0: 20.0MB/s transfers (20.0MHz, offset 15), Tagged Queueing Enabled
da0: 4340MB (8888924 512 byte sectors: 255H 63S/T 553C)
changing root device to da0s1a
...
(da0:ahc0:0:0:0): tagged openings now 64
(da0:ahc0:0:0:0): tagged openings now 63
---------------------------------------------------------------------------

Here is the "nm -aout kernel" part around the panic.

---------------------------------------------------------------------------
f0192370 F ffs_softdep.o
f0192370 F ffs_softdep_stub.o
f0192424 t _acquire_lock
f0192498 t _free_lock
f019251c t _acquire_lock_interlocked
f0192590 t _free_lock_interlocked
---------------------------------------------------------------------------

---------------------------------------------------------------------------
Index: sys/cam/cam_xpt.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/cam_xpt.c,v
retrieving revision 1.15
diff -u -r1.15 cam_xpt.c
--- cam_xpt.c 1998/10/02 21:00:50 1.15
+++ cam_xpt.c 1998/10/04 06:47:44
@@ -233,12 +233,18 @@
 #endif
 };
 
+static const char conner[] = "CONNER";
 static const char quantum[] = "QUANTUM";
 static const char sony[] = "SONY";
 static const char west_digital[] = "WDIGTL";
 
 static struct xpt_quirk_entry xpt_quirk_table[] =
 {
+	{
+		/* Sometimes gets stuck */
+		{ T_DIRECT, SIP_MEDIA_FIXED, conner, "CFP4207S*", "*" },
+		/*quirks*/0, /*mintags*/8, /*maxtags*/24
+	},
 	{
 		/* Reports QUEUE FULL for temporary resource shortages */
 		{ T_DIRECT, SIP_MEDIA_FIXED, quantum, "XP39100*", "*" },

John
--
John Hay -- John.Hay@mikom.csir.co.za

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message