Date: Thu, 17 Nov 2005 18:20:38 +0100
From: Johan Ström <johan@stromnet.org>
To: freebsd-stable@freebsd.org
Subject: Page fault, GEOM problem??
Message-ID: <991F35AA-151B-4AEA-82BD-5F4AEDF28424@stromnet.org>
Ok, I just got this not-so-nice error on a RELENG_6_0 box (built from sources this morning, GENERIC kernel minus the drivers I don't use):

Nov 17 15:35:43 elfi kernel: subdisk10: detached
Nov 17 15:35:43 elfi kernel: ad10: detached
Nov 17 15:35:43 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1 retry left) LBA=85720528
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Device gm0s1: provider ad10s1 disconnected.
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134356992, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134373376, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=134438912, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268591104, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268607488, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268623872, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=268640256, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=20151026176, length=2048)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[WRITE(offset=32299655680, length=8192)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=37363671552, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=38349087232, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=45453566464, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6). ad10s1[READ(offset=54459458048, length=131072)]
Nov 17 17:59:18 elfi syslogd: kernel boot file is /boot/kernel/kernel
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel: Fatal trap 12: page fault while in kernel mode
Nov 17 17:59:18 elfi kernel: fault virtual address = 0x48
Nov 17 17:59:18 elfi kernel: fault code = supervisor read, page not present
Nov 17 17:59:18 elfi kernel: instruction pointer = 0x20:0xc0506b92
Nov 17 17:59:18 elfi kernel: stack pointer = 0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: frame pointer = 0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Nov 17 17:59:18 elfi kernel: = DPL 0, pres 1, def32 1, gran 1
Nov 17 17:59:18 elfi kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Nov 17 17:59:18 elfi kernel: current process = 36 (swi4: clock sio)
Nov 17 17:59:18 elfi kernel: trap number = 12
Nov 17 17:59:18 elfi kernel: panic: page fault
Nov 17 17:59:18 elfi kernel: Uptime: 8h55m1s

ad10 and ad6, two brand new Maxtor Maxline 300GB SATA disks attached to a Promise PDC40518 SATA150 controller, make up the GEOM mirror gm0s1.

I've been running this setup in another "test" machine (MSI K8N Neo Platinum, KT333 chip I believe), and I haven't had a single problem there. I moved the disks and the controller card to my "real" server 24 hours ago, and the only apparent "problem" I seemed to have was this:

Nov 17 07:06:12 elfi kernel: xl0: transmission error: 90
Nov 17 07:06:12 elfi kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Nov 17 07:06:18 elfi kernel: xl0: watchdog timeout
Nov 17 07:06:18 elfi kernel: xl0: link state changed to DOWN
Nov 17 07:06:18 elfi kernel: vlan5: link state changed to DOWN
Nov 17 07:06:20 elfi kernel: xl0: link state changed to UP
Nov 17 07:06:20 elfi kernel: vlan5: link state changed to UP

Coming and going... These problems only appeared during the first 20-30 minutes after boot, then they disappeared completely (and yes, there was plenty of network I/O going on both during and after these messages). Sometimes I just got the first two messages and nothing "happened", but sometimes the watchdog message came and the network died for a minute or so.

Here is dmesg from the last boot (directly after the crash):

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-RELEASE #0: Thu Nov 17 00:49:29 CET 2005
johan@elfi.stromnet.org:/usr/obj/usr/src/sys/ELFI
ACPI APIC Table: <ASUS A7V333 >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(TM) XP 1900+ (1599.56-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x662 Stepping = 2
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory = 536854528 (511 MB)
avail memory = 516014080 (492 MB)
ioapic0: Changing APIC ID to 2
ioapic0 <Version 0.2> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS A7V333> on motherboard
acpi0: Power Button (fixed)
pci_link0: <ACPI PCI Link LNKA> irq 11 on acpi0
pci_link1: <ACPI PCI Link LNKB> irq 10 on acpi0
pci_link2: <ACPI PCI Link LNKC> irq 0 on acpi0
pci_link3: <ACPI PCI Link LNKD> irq 12 on acpi0
pci_link4: <ACPI PCI Link LNKE> irq 5 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 8367 (KT266/KY266x/KT333) host to PCI bridge> mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci0: <multimedia, audio> at device 5.0 (no driver attached)
fwohci0: <Texas Instruments TSB43AB21/A/AI/A-EP> mem 0xdf000000-0xdf0007ff,0xde800000-0xde803fff irq 17 at device 7.0 on pci0
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:e0:18:00:00:02:7e:fe
fwohci0: Phy 1394a available S400, 1 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:e0:18:02:7e:fe
fwe0: Ethernet address: 02:e0:18:02:7e:fe
fwe0: if_start running deferred for Giant
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
uhci0: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 19 at device 9.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 16 at device 9.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
ehci0: <VIA VT6202 USB 2.0 controller> mem 0xde000000-0xde0000ff irq 17 at device 9.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 0.95
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: <VIA VT6202 USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: VIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pci0: <display, VGA> at device 12.0 (no driver attached)
atapci0: <Promise PDC40518 SATA150 controller> port 0xb400-0xb47f,0xb000-0xb0ff mem 0xdc000000-0xdc000fff,0xdb800000-0xdb81ffff irq 17 at device 14.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
ata4: <ATA channel 2> on atapci0
ata5: <ATA channel 3> on atapci0
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xa800-0xa87f mem 0xdb000000-0xdb00007f irq 19 at device 16.0 on pci0
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:76:ef:c6:36
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <VIA 8233A UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xa400-0xa40f at device 17.1 on pci0
ata0: <ATA channel 0> on atapci1
ata1: <ATA channel 1> on atapci1
uhci2: <VIA 83C572 USB controller> port 0xa000-0xa01f at device 17.2 on pci0
uhci2: [GIANT-LOCKED]
usb3: <VIA 83C572 USB controller> on uhci2
usb3: USB revision 1.0
uhub3: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
uhci3: <VIA 83C572 USB controller> port 0x9800-0x981f irq 21 at device 17.3 on pci0
uhci3: [GIANT-LOCKED]
usb4: <VIA 83C572 USB controller> on uhci3
usb4: USB revision 1.0
uhub4: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub4: 2 ports with 2 removable, self powered
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xccfff,0xd0000-0xd07ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1599556047 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDROM <CD-ROM CDU701-F/1.0q> at ata1-master PIO4
ad6: 286188MB <Maxtor 7L300S0 BANC1G10> at ata3-master SATA150
ad10: 286188MB <Maxtor 7L300S0 BANC1G10> at ata5-master SATA150
GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
GEOM_MIRROR: Device gm0s1: rebuilding provider ad10s1.
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
/usr: mount pending error: blocks 8076 files 28
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 4508 files 2
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 120 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 180 bytes

The network card is exactly the same model as the one I used in the "test" machine, and I didn't have any problems there.

So, any ideas what this can be? If there was a disk crash (which I have a hard time believing, since I ran PowerMax, Maxtor's test program, on both of these disks 3 weeks ago and they have been running fine without a single problem since I started using them), why didn't GEOM just kick in and run on the other disk? Page faulting is not a way to react when a disk goes dead.

I hope someone can help me, or that this problem doesn't occur any more... but I suppose that is too much to hope for...

Thanks
Johan