Date: Mon, 2 Mar 2009 14:07:59 -0500
From: Elliot Schlegelmilch <elliot@schlegelmilch.org>
To: Alexander Motin <mav@mavhome.dp.ua>
Cc: FreeBSD-Current <freebsd-current@freebsd.org>
Subject: Re: SATA disks suddenly stop working
Message-ID: <20090302190759.GA95194@schlegelmilch.org>
In-Reply-To: <49AAB0A6.3040304@mavhome.dp.ua>
References: <go44ht$2i6a$1@FreeBSD.cs.nctu.edu.tw> <1235602472.00079680.1235592003@10.7.7.3> <1235658185.00079898.1235647801@10.7.7.3> <1235863381.00080963.1235851802@10.7.7.3> <49AAB0A6.3040304@mavhome.dp.ua>

Alexander Motin wrote:
[snip]
>>
>> ata2: <ATA channel 0> on atapci1
>> ata2: AHCI reset...: 2
>> ata2: SATA connect time=0ms
>> ata2: ready wait time=0ms52 (12272 MB)
>> ata2: software reset port 15...
>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>> ata2: software reset set timeout
>> ata2: software reset port 0...
>> ata2: ahci_issue_cmd timeout: 100 of 100ms, status=00000001
>> ata2: software reset set timeout
>> ata2: SIGNATURE: ffffffff
>> ata2: Unknown signature, assuming disk device
>> ata2: AHCI reset done: devices=00000001
>> ata2: [MPSAFE]
>> ata2: [ITHREAD]
>>
>> One for each channel, up to ata7.
>
> Does it happen during boot or what do you mean by unable to reattach
> drive now?

Yes, I saw the above during boot. What I mean by unable to reattach is
describing the old behavior: Sometimes my ad12 would fall off the bus,
and I could usually retrieve it by
'atacontrol detach ata6; atacontrol attach ata6;'

Now it's:

ata6: still BUSY after softreset

and attempting the detach/attach results in:

Tracing pid 12 tid 100007 td 0xffffff0001afb390
device_get_parent() at device_get_parent+0x1
ata_start() at ata_start+0x1c5
ata_reinit() at ata_reinit+0x1dd
ata_completed() at ata_completed+0x75
softclock() at softclock+0x291
intr_event_execute_handlers() at intr_event_execute_handlers+0x68
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffffffe4004ad40, rbp = 0 ---

This isn't a huge deal, and is probably a red herring, as I suspect the
disk is going bad at this point. This is running Feb 1 kernel, as I
recall. However, it can and has stayed attached for weeks at a time
before.

>> atapci0@pci0:0:31:1: class=0x01018a card=0x948115d9 chip=0x269e8086
>> rev=0x09 hdr=0x00
>>     vendor     = 'Intel Corporation'
>>     device     = '631xESB/632xESB/3100 Ultra ATA Storage Controller'
>>     class      = mass storage
>>     subclass   = ATA
>>
>> The last known kernel which works was Dec 17, but trying to rebuild a
>> kernel from that date doesn't see the SATA disks either (as the kernel
>> which sees the disks zfs doesn't work.) Or perhaps I'm csup'ing
>> incorrectly.
>
> Haven't you tried to just touched reset sequence on 15.

Do you mean a kernel on Feb 15? Was there more that happened between
15th and the 22nd or so?

> When you succeed to boot, can you try to make some experiments against
> HEAD, may be some of them fix the problem:
> 1) comment that line inside ata_ahci_issue_cmd():
>     ATA_OUTL(ctlr->r_res2, ATA_AHCI_P_FBS + offset, (port << 8) |
> 0x00000001);
>
> 2) comment these lines inside ata_sata_phy_reset():
>     if ((ATA_IDX_INL(ch, ATA_SCONTROL) & ATA_SC_DET_MASK) ==
> ATA_SC_DET_IDLE)
>         return ata_sata_connect(ch);
>
> 3) comment first that line inside ata_ahci_softreset():
>     return (-1);
>
> Thanks.
>

I'll try these patches and report back right after I freshen up my
backups. :)

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090302190759.GA95194>