Date: Thu, 16 Feb 2006 00:41:32 +0100 From: Paolo Maero <paolo@euresis.it> To: Wilko Bulte <wb@freebie.xs4all.nl> Cc: stable@freebsd.org, =?ISO-8859-1?Q?S=F8ren_Schmidt?= <sos@deepcore.dk> Subject: Re: Showstopper ATA bug in 6.1-PRE? (Still happening) Message-ID: <66854B6E-6D86-47BB-998F-6955401718BF@euresis.it> In-Reply-To: <20060210080350.GA5978@freebie.xs4all.nl> References: <43EA5C50.5020804@deepcore.dk> <20060208213704.GA703@freebie.xs4all.nl> <43EA6625.2070106@deepcore.dk> <20060208221056.GA1299@freebie.xs4all.nl> <43EB5393.5090502@deepcore.dk> <20060209144250.GB4874@freebie.xs4all.nl> <43EB55A1.9040405@deepcore.dk> <20060209201912.GA680@freebie.xs4all.nl> <43EBA4F7.7040407@deepcore.dk> <20060209220824.GA1499@freebie.xs4all.nl> <20060210080350.GA5978@freebie.xs4all.nl>
next in thread | previous in thread | raw e-mail | index | archive | help
On Feb 10, 2006, at 9:03 AM, Wilko Bulte wrote: > On Thu, Feb 09, 2006 at 11:08:24PM +0100, Wilko Bulte wrote.. >> On Thu, Feb 09, 2006 at 09:24:23PM +0100, Sren Schmidt wrote.. >>> Wilko Bulte wrote: >>>> On Thu, Feb 09, 2006 at 03:45:53PM +0100, Sren Schmidt wrote.. >>>>> Wilko Bulte wrote: >>>>>> On Thu, Feb 09, 2006 at 03:37:07PM +0100, Sren Schmidt wrote.. >>>>>>> Wilko Bulte wrote: >>>>>>>> On Wed, Feb 08, 2006 at 10:44:05PM +0100, Sren Schmidt wrote.. >>>>>>>>> Wilko Bulte wrote: >>>>>>>>>> On Wed, Feb 08, 2006 at 10:02:08PM +0100, Sren Schmidt >>>>>>>>>> wrote.. >>>>>>>>>>> Wilko Bulte wrote: >>>>>>>>>>>> Hi Soren, >>>>>>>>>>>> >>>>>>>>>>>> I just went to 6.1-PRE on my main machine, coming from >>>>>>>>>>>> 6.0-STABLE >>>>>>>>>>>> of roughly end of december. >>>>>>>>>>>> >>>>>>>>>>>> And I hit some stuff that really worries me: >>>>>>>>>>>> >>>>>>>>>>>> - the freshly built kernel keels over with (hand >>>>>>>>>>>> transcribed): >>>>>>>>>>>> >>>>>>>>>>>> ata3: reiniting channel SATA connect ... >>>>>>>>>>>> SATA connected >>>>>>>>>>>> sata_connect_devices 0x1 <ATA_MASTER> >>>>>>>>>>>> >>>>>>>>>>>> ad6: req=0xC35ba0c8 SETFEATURES SETTRANSFERMODE >>>>>>>>>>>> semaphore timeout >>>>>>>>>>>> !! DANGER Will RObinson !! >>>>>>>>>>>> >>>>>>>>>>>> (... is where I cannot read my own handwriting, it >>>>>>>>>>>> scrolled quite >>>>>>>>>>>> fast on >>>>>>>>>>>> the screen..) >>>>>>>>>>>> >>>>>>>>>>>> Boot device is a SATA RAID1 on a Promise 2300. >>>>>>>>>>> Hmm, that should not happen. Could you try to backstep >>>>>>>>>>> just ATA to >>>>>>>>>>> before the MFC, that is 24/1/06 and let me know if that >>>>>>>>>>> helps >>>>>>>>>>> please ? >>>>>>>>>> First impression is that the problem is gone. None of the >>>>>>>>>> previously reported errors are seen. I am running a level >>>>>>>>>> 0 dump >>>>>>>>>> from disk to disk >>>>>>>>>> to see if the box remains stable. Given that this is my >>>>>>>>>> primary >>>>>>>>>> machine >>>>>>>>>> I sure hope it will be :-) >>>>>>>>>> >>>>>>>>>>>> Another snag is that my ad10 disk on 6.0-STABLE suddenly >>>>>>>>>>>> became >>>>>>>>>>>> ad12 on >>>>>>>>>>>> 6.1-PRE >>>>>>>>>>> Hmm that is because there is only 2 ports on your promise >>>>>>>>>>> which is >>>>>>>>>>> now correctly identified, before it was errounsly found >>>>>>>>>>> as 3 ports. >>>>>>>>>> Ah, OK. I would suggest a note to the Release Note >>>>>>>>>> writers would be >>>>>>>>>> a good >>>>>>>>>> thing, devices changing location after an upgrade in the - >>>>>>>>>> stable >>>>>>>>>> branch >>>>>>>>>> is unnerving ;-) >>>>>>>>> Well, the good thing is that I can reproduce the error >>>>>>>>> here, the bad >>>>>>>>> thing is that it slipped through testing on -current... >>>>>>>>> Oh, well, I'll look into it ASAP... >>>>>>>> Thank you Soren! >>>>>>> OK, had a few this afternoon, could you try this patch and >>>>>>> let me know >>>>>>> if it helps, at least it makes the problem go away on my >>>>>>> testbed.. >>>>>> Is this relative to HEAD or RELENG_6? I cannot / will not go >>>>>> to HEAD >>>>>> with this machine (my main production box.. :-) >>>>> Doesn't matter, ATA is the same on both... >>>> >>>> OK, I was not sure if they were 100% identical. >>>> >>>> The patch at first impression seems to have eliminated the problem. >>> >>> Good seems I'm on the right track at least. >>> >>>> Interestingly enough ad10 remained ad10 with the patch applied? >>> >>> Yeah, thats intentional, I though we better not break POLA here.. >> >> I agree :-) >> >>>> I'll put some load on to see what happens. >>> >>> Let me know how that turns out, I'll clean things up a bit and >>> get it >>> committed to -current, then get permission to MFC when we are >>> sure it >>> fixes the problem... >> >> I ran a 44GB disk-to-disk dump without incidents (source on the >> RAID1, >> target on the JBOD). No problems whatsoever. >> >> Looks like things behave much better now. Tonight the machine will >> run a daily full dump to DLT tape, I'll know how that turns out >> tomorrow. > > Backup ran without problems. > > -- > Wilko Bulte wilko@FreeBSD.org > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable- > unsubscribe@freebsd.org" > Soren, I am still getting this error. I cvsup'd to RELENG_6 today (Feb. 15, 2006 at 6:00 PM CET) and I got the following files from HEAD: ata-all.c (1.264) ata-all.h (1.113) ata-chipset.c (1.158) ata-pci.c (1.115) I have then rebuilt the kernel and world and I tested the system. I have a Promise TX2300 with two 250 GB Maxtor in RAID1 with gmirror, plus two 9 GB SCSI drives in RAID1 with gmirror where I put the OS and applications and a 160 GB UATA Seagate (attached to the motherboard UATA) where I backup sensitive data. The system was able to rebuild the mirror (I always got the error with RELENG_6_0), but then I made a backup tar from the two Maxtor's mirror to the Seagate and I got the following error after having copied approx 16 GB: Feb 15 20:56:31 zoe kernel: GEOM_MIRROR: Device data1: provider ad4 disconnected. Feb 15 20:56:31 zoe kernel: GEOM_MIRROR: Device data1: provider ad4 detected. Feb 15 20:56:31 zoe kernel: GEOM_MIRROR: Device data1: rebuilding provider ad4. ... Feb 15 22:33:15 zoe kernel: GEOM_MIRROR: Device data1: rebuilding provider ad4 finished. Feb 15 22:33:15 zoe kernel: GEOM_MIRROR: Device data1: provider ad4 activated. ... Feb 15 23:04:12 zoe kernel: ad4: req=0xc2422578 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !! Feb 15 23:04:28 zoe last message repeated 4 times Feb 15 23:04:32 zoe kernel: ad4: req=0xc2422578 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !! Feb 15 23:05:04 zoe last message repeated 8 times Feb 15 23:07:08 zoe last message repeated 31 times Feb 15 23:08:56 zoe last message repeated 27 times Feb 15 23:09:00 zoe kernel: ad4: req=0xc2422578 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !! Feb 15 23:09:32 zoe last message repeated 8 times Feb 15 23:11:36 zoe last message repeated 31 times Feb 15 23:21:40 zoe last message repeated 151 times Feb 15 23:31:40 zoe last message repeated 150 times Feb 15 23:34:28 zoe last message repeated 42 times Feb 15 23:34:32 zoe kernel: ad4: req=0xc2422578 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !! Feb 15 23:35:04 zoe last message repeated 8 times Feb 15 23:37:08 zoe last message repeated 31 times Feb 15 23:40:52 zoe last message repeated 56 times Feb 15 23:40:53 zoe shutdown: power-down by paolo: Feb 15 23:40:56 zoe syslogd: exiting on signal 15 I was not able to shutdown the system cleanly. I also noticed a small problem in the code (probably not related to this error): At line 1011 of src/sys/dev/ata/ata-all.c a free() is used but ata_delayed_attach is not reset to NULL Actual (1.264): if (config_intrhook_establish(ata_delayed_attach) != 0) { printf("ata: config_intrhook_establish failed\n"); free(ata_delayed_attach, M_TEMP); } Fix: if (config_intrhook_establish(ata_delayed_attach) != 0) { printf("ata: config_intrhook_establish failed\n"); free(ata_delayed_attach, M_TEMP); + ata_delayed_attach = NULL; } This is my dmesg: Copyright (c) 1992-2006 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 6.1-PRERELEASE #0: Wed Feb 15 20:27:17 CET 2006 root@zoe.paolo.maero.net:/usr/obj/usr/src/sys/ZOE_022 ACPI APIC Table: <Intel N440BX > Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel Pentium III (696.41-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x683 Stepping = 3 Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE, MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE> real memory = 536805376 (511 MB) avail memory = 516100096 (492 MB) FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 1 cpu1 (AP): APIC ID: 0 ioapic0 <Version 1.1> irqs 0-23 on motherboard npx0: [FAST] npx0: <math processor> on motherboard npx0: INT 16 interface acpi0: <PTLTD RSDT> on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0xc08-0xc0b on acpi0 cpu0: <ACPI CPU> on acpi0 cpu1: <ACPI CPU> on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 agp0: <Intel 82443GX host to PCI bridge> mem 0xf8000000-0xfbffffff at device 0.0 on pci0 pcib1: <PCI-PCI bridge> at device 1.0 on pci0 pci1: <PCI bus> on pcib1 pcib2: <PCI-PCI bridge> at device 15.0 on pci1 pci2: <PCI bus> on pcib2 atapci0: <Promise PDC20771 SATA300 controller> port 0x3400-0x347f, 0x3000-0x30ff mem 0xf4220000-0xf4220fff,0xf4200000-0xf421ffff irq 23 at device 7.0 on pci2 ata2: <ATA channel 0> on atapci0 ata3: <ATA channel 1> on atapci0 ata4: <ATA channel 2> on atapci0 ahc0: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2400-0x24ff mem 0xf4101000-0xf4101fff irq 19 at device 12.0 on pci0 ahc0: [GIANT-LOCKED] aic7896/97: Ultra2 Wide Channel A, SCSI Id=7, 32/253 SCBs ahc1: <Adaptec aic7896/97 Ultra2 SCSI adapter> port 0x2800-0x28ff mem 0xf4102000-0xf4102fff irq 19 at device 12.1 on pci0 ahc1: [GIANT-LOCKED] aic7896/97: Ultra2 Wide Channel B, SCSI Id=7, 32/253 SCBs fxp0: <Intel 82559 Pro/100 Ethernet> port 0x2000-0x203f mem 0xf4100000-0xf4100fff,0xf4000000-0xf40fffff irq 21 at device 14.0 on pci0 miibus0: <MII bus> on fxp0 inphy0: <i82555 10/100 media interface> on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto fxp0: Ethernet address: 00:d0:b7:88:6c:c5 isab0: <PCI-ISA bridge> at device 18.0 on pci0 isa0: <ISA bus> on isab0 atapci1: <Intel PIIX4 UDMA33 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2040-0x204f at device 18.1 on pci0 ata0: <ATA channel 0> on atapci1 ata1: <ATA channel 1> on atapci1 uhci0: <Intel 82371AB/EB (PIIX4) USB controller> port 0x2060-0x207f irq 21 at device 18.2 on pci0 uhci0: [GIANT-LOCKED] usb0: <Intel 82371AB/EB (PIIX4) USB controller> on uhci0 usb0: USB revision 1.0 uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered intpm0: <Intel 82371AB Power management controller> port 0x1040-0x104f irq 9 at device 18.3 on pci0 intpm0: I/O mapped 1040 intpm0: intr IRQ 9 enabled revision 0 intpm0: [GIANT-LOCKED] intsmb0: <Intel PIIX4 SMBUS Interface> on intpm0 smbus1: <System Management Bus> on intsmb0 smb0: <SMBus generic I/O> on smbus1 intpm0: PM I/O mapped c00 pci0: <display, VGA> at device 20.0 (no driver attached) atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] fdc0: <floppy drive controller> port 0x3f4-0x3f7 irq 6 drq 2 on acpi0 fdc0: No FDOUT register! device_attach: fdc0 attach returned 6 ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0 ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/8 bytes threshold ppbus0: <Parallel port bus> on ppc0 plip0: <PLIP network interface> on ppbus0 lpt0: <Printer> on ppbus0 lpt0: Interrupt-driven port ppi0: <Parallel I/O> on ppbus0 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A fdc0: <floppy drive controller> port 0x3f4-0x3f7 irq 6 drq 2 on acpi0 fdc0: No FDOUT register! device_attach: fdc0 attach returned 6 pmtimer0 on isa0 orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcdfff, 0xce000-0xcf7ff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 ulpt0: EPSON USB Printer, rev 1.10/1.00, addr 2, iclass 7/1 ulpt0: using bi-directional mode Timecounters tick every 1.000 msec ad0: 152627MB <WDC WD1600BB-00HTA0 15.05R15> at ata0-master UDMA33 acd0: CDRW <HL-DT-ST GCE-8525B/1.02> at ata1-master UDMA33 ad4: 239372MB <Maxtor 7L250S0 BANC1E00> at ata2-master SATA150 GEOM_MIRROR: Device data1 created (id=618669616). GEOM_MIRROR: Device data1: provider ad4 detected. ad6: 239372MB <Maxtor 7L250S0 BANC1E00> at ata3-master SATA150 Waiting 5 seconds for SCSI devices to settle GEOM_MIRROR: Device data1: provider ad6 detected. GEOM_MIRROR: Device data1: provider ad6 activated. GEOM_MIRROR: Device data1: provider ad4 is stale. GEOM_MIRROR: Device data1: provider mirror/data1 launched. SMP: AP CPU #1 Launched! da0 at ahc0 bus 0 target 0 lun 0 da0: <IBM DNES-309170W SAH0> Fixed Direct Access SCSI-3 device da0: 80.000MB/s transfers (40.000MHz, offset 30, 16bit), Tagged Queueing Enabled da0: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C) da1 at ahc0 bus 0 target 1 lun 0 da1: <IBM DNES-309170W SAH0> Fixed Direct Access SCSI-3 device da1: 80.000MB/s transfers (40.000MHz, offset 30, 16bit), Tagged Queueing Enabled da1: 8748MB (17916240 512 byte sectors: 255H 63S/T 1115C) GEOM_MIRROR: Device sgm0 created (id=1390498924). GEOM_MIRROR: Device sgm0: provider da0 detected. GEOM_MIRROR: Device sgm0: provider da1 detected. GEOM_MIRROR: Device sgm0: provider da1 activated. GEOM_MIRROR: Device sgm0: provider da0 activated. GEOM_MIRROR: Device sgm0: provider mirror/sgm0 launched. Trying to mount root from ufs:/dev/mirror/sgm0s1a Regards Paolo
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?66854B6E-6D86-47BB-998F-6955401718BF>