Date: Tue, 17 Jun 2008 12:59:45 GMT
From: Salik Rafiq <chameeyass@hotmail.com>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/124670: large file operation on RAID cause many GEOM errors - crash
Message-ID: <200806171259.m5HCxjfg097712@www.freebsd.org>
Resent-Message-ID: <200806171300.m5HD07ZU028576@freefall.freebsd.org>
>Number: 124670
>Category: kern
>Synopsis: large file operation on RAID cause many GEOM errors - crash
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jun 17 13:00:07 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Salik Rafiq
>Release: 7.0 RELEASE
>Organization: Chameeya S S Ltd.
>Environment: FreeBSD ChamRAID01 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008 root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC i386
>Description:
Machine configuration:

- Celeron 800 MHz, 768 MB RAM
- 11 GB IDE disk, ad2, on the motherboard IDE connection, mounted on /
- SiI 3512 SATA PCI card with two 320 GB SATA disks, ad4 and ad6, combined into /dev/mirror/dat and mounted on /home

I have serious problems when I work with a large file or with large file copies. I have had a series of issues with the RAID: it goes down nearly every day, sometimes several times a day. Here is an extract of the messages logged while I was copying a single 1.8 GB file from one Samba share on the mirror to another Samba share on the same mirror:
Jun 17 10:22:48 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:22:48 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Jun 17 10:41:33 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:41:33 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 180 bytes
Jun 17 10:54:49 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:54:49 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 240 bytes
Jun 17 10:55:12 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:55:12 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 300 bytes
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:22 ChamRAID01 kernel: subdisk4: detached
Jun 17 10:56:22 ChamRAID01 kernel: ad4: detached
Jun 17 10:56:22 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.
Jun 17 10:56:22 ChamRAID01 kernel: g_vfs_done():mirror/dat[READ(offset=267860606976, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: subdisk6: detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: detached
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467388416, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467519488, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467650560, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467781632, length=131072)]error = 6
...
...and lots and lots more of these.

The machine crashed after this; I don't think it left a dump. When it came back up, the OS reported missing and bad blocks on the mirror disks, so I ran fsck and cleaned the mirror disks up. I didn't check the mirror status, but I suspect it was rebuilding. When the fsck finished, I tried to restart the machine with the reboot command. The machine crashed, and this time it left a core dump. When it came back up, the mirror rebuilt, and I tried the same file copy on the console instead of from my Windows machine. The same thing happened:

Jun 17 11:40:31 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 11:40:31 ChamRAID01 kernel: subdisk6: detached
Jun 17 11:40:31 ChamRAID01 kernel: ad6: detached
Jun 17 11:40:31 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 11:40:31 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 11:40:31 ChamRAID01 kernel: GEOM_MIRROR: Device dat: rebuilding provider ad4 stopped.
Jun 17 11:40:31 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=289641709568, length=131072)]error = 6
Jun 17 11:40:31 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=289641840640, length=131072)]error = 6
Jun 17 11:40:36 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=114688, length=16384)]error = 6
...

...and again LOTS of these. I had to power the machine off this time.

I'm thinking of removing the RAID and just using a single disk, with a cron job that copies the files over to the other disk each night. At least that would work in the meantime.

I have no idea what the issue is. The SiI 3512 driver, perhaps? I have NOT created the mirror in the RAID card BIOS; I am just using JBOD. I have replaced the power supply in case it was a power issue.
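Every g_vfs_done() line in these logs follows one fixed pattern: the provider, READ or WRITE, a byte offset, a length, and an errno value. A short Python sketch can pull those fields out of /var/log/messages to show where on the mirror the failing requests land. It assumes that the "305245MB" printed for ad4/ad6 in the boot messages of this report is in MiB and uses that as an approximation of the mirror size (gmirror keeps its metadata in the provider's last sector, so mirror/dat is slightly smaller); the script and function names are hypothetical. Error 6 is ENXIO ("Device not configured") on FreeBSD, which fits requests that were still being issued after the disks had detached.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List

# Matches FreeBSD g_vfs_done() failure lines such as:
#   g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
G_VFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<provider>[^\[\s]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<errno>\d+)"
)

# Only the error code seen in this report; 6 is ENXIO on FreeBSD.
ERRNO_NAMES = {6: "ENXIO (Device not configured)"}

# Assumption: "305245MB" from the ad4/ad6 probe lines is MiB; the mirror is a bit smaller.
APPROX_PROVIDER_BYTES = 305245 * 1024 * 1024


@dataclass
class GVfsError:
    provider: str
    op: str
    offset: int
    length: int
    errno: int


def parse_g_vfs_errors(lines: Iterable[str]) -> List[GVfsError]:
    """Collect every g_vfs_done() failure found in the given log lines."""
    errors = []
    for line in lines:
        for m in G_VFS_RE.finditer(line):
            errors.append(GVfsError(
                provider=m.group("provider"),
                op=m.group("op"),
                offset=int(m.group("offset")),
                length=int(m.group("length")),
                errno=int(m.group("errno")),
            ))
    return errors


def describe(err: GVfsError, provider_bytes: int = APPROX_PROVIDER_BYTES) -> str:
    """Render one failure with its offset in GiB and as a share of the provider."""
    pct = 100.0 * err.offset / provider_bytes
    name = ERRNO_NAMES.get(err.errno, f"errno {err.errno}")
    return (f"{err.op:<5} {err.provider} at {err.offset / 2**30:8.2f} GiB "
            f"({pct:5.1f}%), {err.length} bytes: {name}")


if __name__ == "__main__":
    # Hypothetical usage: python gvfs_errors.py /var/log/messages
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} LOGFILE", file=sys.stderr)
    else:
        with open(sys.argv[1], errors="replace") as log:
            for err in parse_g_vfs_errors(log):
                print(describe(err))
```

Matching with finditer() rather than splitting on timestamps means the sketch also works on log text that has been flattened onto a single line.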
Here are the boot messages:

Jun 17 10:57:51 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268309741568, length=16384)]error = 6
Jun 17 10:57:51 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268293685248, length=16384)]error = 6
Jun 17 10:57:51 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268297748480, length=32768)]error = 6
Jun 17 11:10:58 ChamRAID01 syslogd: kernel boot file is /boot/kernel/kernel
Jun 17 11:10:58 ChamRAID01 kernel: Copyright (c) 1992-2008 The FreeBSD Project.
Jun 17 11:10:58 ChamRAID01 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jun 17 11:10:58 ChamRAID01 kernel: The Regents of the University of California. All rights reserved.
Jun 17 11:10:58 ChamRAID01 kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jun 17 11:10:58 ChamRAID01 kernel: FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008
Jun 17 11:10:58 ChamRAID01 kernel: root@logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Jun 17 11:10:58 ChamRAID01 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Jun 17 11:10:58 ChamRAID01 kernel: CPU: Intel Celeron (768.42-MHz 686-class CPU)
Jun 17 11:10:58 ChamRAID01 kernel: Origin = "GenuineIntel" Id = 0x686 Stepping = 6
Jun 17 11:10:58 ChamRAID01 kernel: Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
Jun 17 11:10:58 ChamRAID01 kernel: real memory = 671023104 (639 MB)
Jun 17 11:10:58 ChamRAID01 kernel: avail memory = 642785280 (613 MB)
Jun 17 11:10:58 ChamRAID01 kernel: kbd1 at kbdmux0
Jun 17 11:10:58 ChamRAID01 kernel: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
Jun 17 11:10:58 ChamRAID01 kernel: hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27)
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: <HP HPBDD_IO> on motherboard
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: Power Button (fixed)
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: reservation of 0, a0000 (3) failed
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: reservation of 100000, 27ef0000 (3) failed
Jun 17 11:10:58 ChamRAID01 kernel: Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
Jun 17 11:10:58 ChamRAID01 kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: cpu0: <ACPI CPU> on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: acpi_throttle0: <ACPI CPU Throttling> on cpu0
Jun 17 11:10:58 ChamRAID01 kernel: acpi_button0: <Power Button> on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0x4000-0x407f,0x4080-0x40ff,0x5000-0x500f,0x6000-0x607f on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: pci0: <ACPI PCI bus> on pcib0
Jun 17 11:10:58 ChamRAID01 kernel: agp0: <VIA 82C694X (Apollo Pro 133A) host to PCI bridge> on hostb0
Jun 17 11:10:58 ChamRAID01 kernel: agp0: aperture size is 256M
Jun 17 11:10:58 ChamRAID01 kernel: pcib1: <PCI-PCI bridge> at device 1.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: pci1: <PCI bus> on pcib1
Jun 17 11:10:58 ChamRAID01 kernel: vgapci0: <VGA-compatible display> port 0x9000-0x90ff mem 0xd6000000-0xd6ffffff,0xd5000000-0xd5000fff irq 12 at device 0.0 on pci1
Jun 17 11:10:58 ChamRAID01 kernel: isab0: <PCI-ISA bridge> at device 4.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: isa0: <ISA bus> on isab0
Jun 17 11:10:58 ChamRAID01 kernel: atapci0: <VIA 82C686A UDMA66 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xa000-0xa00f at device 4.1 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: ata0: <ATA channel 0> on atapci0
Jun 17 11:10:58 ChamRAID01 kernel: ata0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: ata1: <ATA channel 1> on atapci0
Jun 17 11:10:58 ChamRAID01 kernel: ata1: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: uhci0: <VIA 83C572 USB controller> port 0xa400-0xa41f irq 10 at device 4.2 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: uhci0: [GIANT-LOCKED]
Jun 17 11:10:58 ChamRAID01 kernel: uhci0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: usb0: <VIA 83C572 USB controller> on uhci0
Jun 17 11:10:58 ChamRAID01 kernel: usb0: USB revision 1.0
Jun 17 11:10:58 ChamRAID01 kernel: uhub0: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
Jun 17 11:10:58 ChamRAID01 kernel: uhub0: 2 ports with 2 removable, self powered
Jun 17 11:10:58 ChamRAID01 kernel: pci0: <bridge> at device 4.4 (no driver attached)
Jun 17 11:10:58 ChamRAID01 kernel: pci0: <multimedia, audio> at device 4.5 (no driver attached)
Jun 17 11:10:58 ChamRAID01 kernel: xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xb800-0xb87f mem 0xd8000000-0xd800007f irq 10 at device 5.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: miibus0: <MII bus> on xl0
Jun 17 11:10:58 ChamRAID01 kernel: xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0
Jun 17 11:10:58 ChamRAID01 kernel: xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
Jun 17 11:10:58 ChamRAID01 kernel: xl0: Ethernet address: 00:50:da:38:c1:2c
Jun 17 11:10:58 ChamRAID01 kernel: xl0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: atapci1: <SiI SiI 3512 SATA150 controller> port 0xbc00-0xbc07,0xc000-0xc003,0xc400-0xc407,0xc800-0xc803,0xcc00-0xcc0f mem 0xd8001000-0xd80011ff irq 11 at device 6.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: atapci1: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: ata2: <ATA channel 0> on atapci1
Jun 17 11:10:58 ChamRAID01 kernel: ata2: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: ata3: <ATA channel 1> on atapci1
Jun 17 11:10:58 ChamRAID01 kernel: ata3: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: fdc0: [FILTER]
Jun 17 11:10:58 ChamRAID01 kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0
Jun 17 11:10:58 ChamRAID01 kernel: sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: sio0: type 16550A
Jun 17 11:10:58 ChamRAID01 kernel: sio0: [FILTER]
Jun 17 11:10:58 ChamRAID01 kernel: sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: sio1: type 16550A
Jun 17 11:10:58 ChamRAID01 kernel: sio1: [FILTER]
Jun 17 11:10:58 ChamRAID01 kernel: atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: atkbd0: <AT Keyboard> irq 1 on atkbdc0
Jun 17 11:10:58 ChamRAID01 kernel: kbd0 at atkbd0
Jun 17 11:10:58 ChamRAID01 kernel: atkbd0: [GIANT-LOCKED]
Jun 17 11:10:58 ChamRAID01 kernel: atkbd0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: pmtimer0 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xcc000-0xcc7ff,0xcd000-0xd17ff pnpid ORM0000 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: FIFO with 16/16/8 bytes threshold
Jun 17 11:10:58 ChamRAID01 kernel: ppbus0: <Parallel port bus> on ppc0
Jun 17 11:10:58 ChamRAID01 kernel: ppbus0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: lpt0: <Printer> on ppbus0
Jun 17 11:10:58 ChamRAID01 kernel: lpt0: Interrupt-driven port
Jun 17 11:10:58 ChamRAID01 kernel: ppi0: <Parallel I/O> on ppbus0
Jun 17 11:10:58 ChamRAID01 kernel: plip0: <PLIP network interface> on ppbus0
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: [GIANT-LOCKED]
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: sc0: <System console> at flags 0x100 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Jun 17 11:10:58 ChamRAID01 kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Jun 17 11:10:58 ChamRAID01 kernel: uhub1: <ALCOR Generic USB Hub, class 9/0, rev 1.10/3.12, addr 2> on uhub0
Jun 17 11:10:58 ChamRAID01 kernel: uhub1: 4 ports with 4 removable, self powered
Jun 17 11:10:58 ChamRAID01 kernel: ukbd0: <CHESEN USB Keyboard, class 0/0, rev 1.10/1.10, addr 3> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: kbd2 at ukbd0
Jun 17 11:10:58 ChamRAID01 kernel: uhid0: <CHESEN USB Keyboard, class 0/0, rev 1.10/1.10, addr 3> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: ums0: <vendor 0x062a product 0x0000, class 0/0, rev 1.10/0.00, addr 4> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: ums0: 3 buttons and Z dir.
Jun 17 11:10:58 ChamRAID01 kernel: uhid1: <No brand SP04-A1, class 0/0, rev 1.10/1.00, addr 5> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: uhid2: <No brand SP04-A1, class 0/0, rev 1.10/1.00, addr 5> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: uhid2: unexpected endpoint
Jun 17 11:10:58 ChamRAID01 kernel: device_attach: uhid2 attach returned 6
Jun 17 11:10:58 ChamRAID01 kernel: Timecounter "TSC" frequency 768417488 Hz quality 800
Jun 17 11:10:58 ChamRAID01 kernel: Timecounters tick every 1.000 msec
Jun 17 11:10:58 ChamRAID01 kernel: hptrr: no controller detected.
Jun 17 11:10:58 ChamRAID01 kernel: acd0: CDRW <PHILIPS CDRW1610A/0.010000> at ata0-slave UDMA33
Jun 17 11:10:58 ChamRAID01 kernel: ad2: 9773MB <FUJITSU MPF3102AT 0028> at ata1-master UDMA66
Jun 17 11:10:58 ChamRAID01 kernel: ad4: 305245MB <WDC WD3200AAKS-00B3A0 01.03A01> at ata2-master SATA150
Jun 17 11:10:58 ChamRAID01 kernel: ad6: 305245MB <WDC WD3200AAKS-00B3A0 01.03A01> at ata3-master SATA150
Jun 17 11:10:58 ChamRAID01 kernel: GEOM_MIRROR: Device mirror/dat launched (1/2).
Jun 17 11:10:58 ChamRAID01 kernel: GEOM_MIRROR: Device dat: rebuilding provider ad4.
Jun 17 11:10:58 ChamRAID01 kernel: Trying to mount root from ufs:/dev/ad2s1a
Jun 17 11:10:58 ChamRAID01 kernel: WARNING: / was not properly dismounted
Jun 17 11:10:58 ChamRAID01 savecore: reboot after panic: page fault
Jun 17 11:10:58 ChamRAID01 savecore: writing core to vmcore.2

Hope someone can help me out.

>How-To-Repeat:
Copy or manipulate a large zip or similar file on the RAID device. This can be done over Samba or NFS, or on the machine itself.

>Fix:
None.
>Release-Note:
>Audit-Trail:
>Unformatted:
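The How-To-Repeat step can be scripted so that it runs unattended and also checks whether the copy came out intact. The Python sketch below is an illustration under assumptions, not a tool from this report: it writes a file of random data into one directory on the mirror, copies it to another, forces both files to disk with fsync(), and compares their SHA-256 digests. The file names repro_src.bin and repro_dst.bin and the 2 GiB default size (close to the 1.8 GB file described in the report) are hypothetical choices. If the fault reproduces, the disks may detach or the kernel may panic before the script finishes, so the kernel log and any core dump remain the primary evidence.

```python
import hashlib
import os
import shutil
import sys

CHUNK = 1 << 20  # 1 MiB per read/write


def sha256_of(path: str) -> str:
    """Hash a file in CHUNK-sized pieces so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def write_random_file(path: str, size_bytes: int) -> None:
    """Write size_bytes of random data and flush it to the disk."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def copy_and_verify(src_dir: str, dst_dir: str, size_bytes: int) -> bool:
    """Create a test file, copy it, sync the copy, and compare digests."""
    src = os.path.join(src_dir, "repro_src.bin")  # hypothetical name
    dst = os.path.join(dst_dir, "repro_dst.bin")  # hypothetical name
    write_random_file(src, size_bytes)
    shutil.copyfile(src, dst)
    with open(dst, "rb+") as f:  # opened read/write so fsync() is allowed
        os.fsync(f.fileno())
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Hypothetical usage: python repro.py /home/share1 /home/share2
    if len(sys.argv) == 3:
        ok = copy_and_verify(sys.argv[1], sys.argv[2], 2 * 1024**3)
        print("copy matches" if ok else "copy MISMATCH")
    else:
        print(f"usage: {sys.argv[0]} SRC_DIR DST_DIR", file=sys.stderr)
```

Pointing both directories at the mirror reproduces the console case from the report; running the same script against the single IDE disk gives a rough comparison point.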
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200806171259.m5HCxjfg097712>
