From owner-freebsd-questions@FreeBSD.ORG Wed Jan 10 04:53:52 2007
X-Original-To: freebsd-questions@freebsd.org
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 804BB16A403 for ; Wed, 10 Jan 2007 04:53:52 +0000 (UTC) (envelope-from pjah@hicom.net)
Received: from ns1.hicom.net (ns1.hicom.net [208.245.180.8]) by mx1.freebsd.org (Postfix) with ESMTP id 248EA13C428 for ; Wed, 10 Jan 2007 04:53:52 +0000 (UTC) (envelope-from pjah@hicom.net)
Received: from [192.168.2.162] (pool-68-239-218-104.nwrk.east.verizon.net [68.239.218.104]) (authenticated bits=0) by ns1.hicom.net (8.13.6/8.13.6) with ESMTP id l0A4d5S9036963 for ; Tue, 9 Jan 2007 23:39:10 -0500 (EST)
Message-ID: <45A46DE6.6000806@hicom.net>
Date: Tue, 09 Jan 2007 23:39:02 -0500
From: Juergen Heberling
User-Agent: Thunderbird 1.5.0.9 (Windows/20061207)
MIME-Version: 1.0
To: freebsd-questions@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: 6.1 Freezes - Suspect SCSI Issue
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: User questions
X-List-Received-Date: Wed, 10 Jan 2007 04:53:52 -0000

Hi all,

Please suggest some way of diagnosing this problem: the system freezes after being up in production, and apparently stable, for several weeks. There is no dump, no error message, and nothing on the console, so I suspect hardware.

The full dmesg is below, but in brief it's a Supermicro X6DA3-G2 with two Xeon (Nocona) processors and an onboard AIC9410 ("SAS") controller; each channel of the SCSI "card" handles 4 drives. Each drive is mirrored with GEOM to a drive on the other channel (NOT using the hardware mirroring).
Right after placing the system into production, I had to increase the SCSI tags (via camcontrol) on the devices in one of the mirrors ("homea") because of utterly poor performance, which had already resulted in several reboots: gstat showed queue lengths of generally about 150, with spikes to 400. After I set the tags to "32", performance on the mirror was adequate. I then tried to increase the tags to "64", but "camcontrol tags da4 -v" never showed more than "54".

1. I don't understand why the value set with "camcontrol tags da4 -N 64" does not hold and never settles above "54". (And why shouldn't I try setting the tags as high as 128? I believe there are 512 tags per channel, and with 4 drives per channel that would be 128 per drive.) The following shows the initial tags setting and the subsequent "reduction" (to "50" in this case). The commands were all issued within a few minutes.

# camcontrol tags da4 -v -N 64
(pass2:ahd0:0:4:0): tagged openings now 64
(pass2:ahd0:0:4:0): dev_openings  64
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 64
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255

# camcontrol tags da4 -v
(pass2:ahd0:0:4:0): dev_openings  64
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 64
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255

# camcontrol tags da4 -v
(pass2:ahd0:0:4:0): dev_openings  50
(pass2:ahd0:0:4:0): dev_active    0
(pass2:ahd0:0:4:0): devq_openings 50
(pass2:ahd0:0:4:0): devq_queued   0
(pass2:ahd0:0:4:0): held          0
(pass2:ahd0:0:4:0): mintags       2
(pass2:ahd0:0:4:0): maxtags       255

2. Why don't I see any bus or device error messages (or any indication of a dump) in the log, and what can I do to turn error messages on?

Any suggestions would be appreciated.

Juergen

Here is my dmesg (long lines were wrapped):

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California.
All rights reserved.
FreeBSD 6.1-RELEASE #0: Sun May 7 04:42:56 UTC 2006
    root@opus.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf4a  Stepping = 10
  Features=0xbfebfbff
  Features2=0x641d
  AMD Features=0x20100000
  AMD Features2=0x1
  Logical CPUs per core: 2
real memory  = 3489071104 (3327 MB)
avail memory = 3414409216 (3256 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP): APIC ID: 7
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: on acpi0
cpu1: on acpi0
cpu2: on acpi0
cpu3: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 0.1 (no driver attached)
pci0: at device 1.0 (no driver attached)
pcib1: irq 16 at device 2.0 on pci0
pci1: on pcib1
pcib2: irq 16 at device 3.0 on pci0
pci2: on pcib2
pcib3: at device 0.0 on pci2
pci3: on pcib3
ahd0: port 0x2400-0x24ff,0x2000-0x20ff mem 0xdd200000-0xdd201fff irq 32 at device 2.0 on pci3
ahd0: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1: port 0x2c00-0x2cff,0x2800-0x28ff mem 0xdd202000-0xdd203fff irq 33 at device 2.1 on pci3
ahd1: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
pci2: at device 0.1 (no driver attached)
pcib4: at device 0.2 on pci2
pci4: on pcib4
em0: port 0x3000-0x303f mem 0xdd300000-0xdd31ffff irq 54 at device 2.0 on pci4
em0: Ethernet address: 00:30:48:68:84:32
em1: port 0x3040-0x307f mem 0xdd320000-0xdd33ffff irq 55 at device 2.1 on pci4
em1: Ethernet address: 00:30:48:68:84:33
pci2: at device 0.3 (no driver attached)
pcib5: irq 16 at device 4.0 on pci0
pci5: on pcib5
pcib6: irq 16 at device 6.0 on pci0
pci6: on pcib6
uhci0: port 0x1400-0x141f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0x1420-0x143f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0x1440-0x145f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: port 0x1460-0x147f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: mem 0xdd001000-0xdd0013ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib7: at device 30.0 on pci0
pci7: on pcib7
pci7: at device 1.0 (no driver attached)
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x14a0-0x14af at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
ppc0: port 0x378-0x37f,0x778-0x77f irq 7 drq 1 on acpi0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xd2fff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: DVDROM at ata0-master UDMA33
Waiting 5 seconds for SCSI devices to settle
da1 at ahd1 bus 0 target 0 lun 0
da1: Fixed Direct Access SCSI-3 device
da1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da3 at ahd1 bus 0 target 2 lun 0
da3: Fixed Direct Access SCSI-3 device
da3: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da3: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da5 at ahd1 bus 0 target 4 lun 0
da5: Fixed Direct Access SCSI-3 device
da5: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da0 at ahd0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da2 at ahd0 bus 0 target 2 lun 0
da2: Fixed Direct Access SCSI-3 device
da2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da2: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da4 at ahd0 bus 0 target 4 lun 0
da4: Fixed Direct Access SCSI-3 device
da4: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
da6 at ahd0 bus 0 target 6 lun 0
da6: Fixed Direct Access SCSI-3 device
da6: 320.000MB/s transfers (160.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da6: 35046MB (71775284 512 byte sectors: 255H 63S/T 4467C)
da7 at ahd1 bus 0 target 6 lun 0
da7: Fixed Direct Access SCSI-3 device
da7: 320.000MB/s transfers (160.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da7: 35046MB (71775284 512 byte sectors: 255H 63S/T 4467C)
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
GEOM_MIRROR: Device gm0 created (id=1170997708).
GEOM_MIRROR: Device gm0: provider da0 detected.
GEOM_MIRROR: Device mail created (id=4084922715).
GEOM_MIRROR: Device mail: provider da2 detected.
GEOM_MIRROR: Device homea created (id=3543800137).
GEOM_MIRROR: Device homea: provider da4 detected.
GEOM_MIRROR: Device homeb created (id=2383534711).
GEOM_MIRROR: Device homeb: provider da6 detected.
GEOM_MIRROR: Device gm0: provider da1 detected.
GEOM_MIRROR: Device gm0: provider da1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0: rebuilding provider da0.
GEOM_MIRROR: Device mail: provider da3 detected.
GEOM_MIRROR: Device mail: provider da3 activated.
GEOM_MIRROR: Device mail: provider mirror/mail launched.
GEOM_MIRROR: Device mail: rebuilding provider da2.
GEOM_MIRROR: Device homea: provider da5 detected.
GEOM_MIRROR: Device homea: provider da5 activated.
GEOM_MIRROR: Device homea: provider mirror/homea launched.
GEOM_MIRROR: Device homea: rebuilding provider da4.
GEOM_MIRROR: Device homeb: provider da7 detected.
GEOM_MIRROR: Device homeb: provider da7 activated.
GEOM_MIRROR: Device homeb: provider mirror/homeb launched.
GEOM_MIRROR: Device homeb: rebuilding provider da6.
Trying to mount root from ufs:/dev/mirror/gm0s1a ...
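To track the tag reduction described in question 1 over time, here is a minimal sketch in Python. It assumes camcontrol prints its `tags -v` report in the "(pass2:ahd0:0:4:0): dev_openings 64" format quoted in the message; the field names are taken from that output. The script name check_tags.py is hypothetical. The script only parses text and reports whether dev_openings has fallen below the value requested with -N. It does not change any CAM settings.

```python
import re
import sys

# Matches report lines such as "(pass2:ahd0:0:4:0): dev_openings 64".
# Field names come from the camcontrol output quoted in the message;
# a line like "tagged openings now 64" does not match and is skipped.
LINE_RE = re.compile(r"^\(([^)]+)\):\s+([a-z_]+)\s+(\d+)\s*$")


def parse_tags_output(text):
    """Return a {field: int} dict, e.g. {"dev_openings": 50, "maxtags": 255}."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fields[match.group(2)] = int(match.group(3))
    return fields


def openings_reduced(fields, requested):
    """True if dev_openings is below the -N value; a missing field counts as not reduced."""
    return fields.get("dev_openings", requested) < requested


def main(argv):
    if len(argv) < 2:
        print("usage: check_tags.py FILE|- [REQUESTED_TAGS]")
        return
    if argv[1] == "-":
        text = sys.stdin.read()  # e.g. piped from "camcontrol tags da4 -v"
    else:
        with open(argv[1]) as fh:
            text = fh.read()
    requested = int(argv[2]) if len(argv) > 2 else 64
    fields = parse_tags_output(text)
    print("dev_openings:", fields.get("dev_openings"),
          "devq_openings:", fields.get("devq_openings"))
    if openings_reduced(fields, requested):
        print("openings reduced below requested", requested)


if __name__ == "__main__":
    main(sys.argv)
```

For example, `camcontrol tags da4 -v | python3 check_tags.py - 64` could be run periodically (from cron, say) to log when and how far the openings for da4 shrink. That record might help correlate the reductions with load or with the freezes.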