Date: Thu, 25 Dec 2003 16:08:31 +0100 From: "Willem Jan Withagen" <wjw@withagen.nl> To: "Soren Schmidt" <sos@spider.deepcore.dk>, "Bruce Evans" <bde@zeta.org.au> Cc: current@freebsd.org Subject: Re: deadlock in ata_queue_request() Message-ID: <002901c3caf9$855abe50$4c1b3dd4@digiware.nl> References: <200312250907.hBP97AUG031336@spider.deepcore.dk>
next in thread | previous in thread | raw e-mail | index | archive | help
Not shure if this is the same problem, but I get many of these. And on regular basis it freezes my system beyond getting into DDB, or crtl-alt-del, only the hard reset gets me out. Console has than: ad4: READ command timeout tag=0 serv=0 - resetting ata2: resetting devices .. Normally this is followed with: done ... But once in a while the box does not return... So is this a disk going bad, or is it a bug?? This is with GENERIC-5.1-p11 The disk is: ATA channel 2: Master: ad4 <Maxtor 6Y060L0/YAR41VW0> ATA/ATAPI rev 7 Running on a Promise PDC20269 UDMA133 controller at DMA133. But the box crashed again (20 minutes up), so I'm getting a little worried. It's part of a 4 disk vinum raid5, so I've got time to go --WjW ----- Original Message ----- From: "Soren Schmidt" <sos@spider.deepcore.dk> To: "Bruce Evans" <bde@zeta.org.au> Cc: <current@freebsd.org>; <sos@freebsd.org> Sent: Thursday, December 25, 2003 10:07 AM Subject: Re: deadlock in ata_queue_request() > It seems Bruce Evans wrote: > > ata_queue_request() sleeps in an interrupt handler here: > > Yes, I have a local fix to help this, the sleep was originally left in to > make a backport to -stable easier (ie no mutexes), but this need to be > changed here. I'll get it committed asap, but it is hollidays and the > kids has alot of new toys :) > > > % void > > % ata_queue_request(struct ata_request *request) > > % { > > % /* mark request as virgin (it might be a reused one) */ > > % request->result = request->status = request->error = 0; > > % request->flags &= ~ATA_R_DONE; > > % > > % /* put request on the locked queue at the specified location */ > > % mtx_lock(&request->device->channel->queue_mtx); > > % if (request->flags & ATA_R_AT_HEAD) > > % TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain); > > % else > > % TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain); > > % mtx_unlock(&request->device->channel->queue_mtx); > > % > > % /* should we skip start ? */ > > % if (!(request->flags & ATA_R_SKIPSTART)) > > % ata_start(request->device->channel); > > % > > % /* if this was a requeue op callback/sleep already setup */ > > % if (request->flags & ATA_R_REQUEUE) > > % return; > > % > > % /* if this is not a callback and we havn't seen DONE yet -> sleep */ > > % if (!request->callback) { > > % while (!(request->flags & ATA_R_DONE)) > > % tsleep(request, PRIBIO, "atareq", hz/10); > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > % } > > % } > > > > when it is called from an interrupt handler. It is called from an interrupt > > handler as part of timeout processing: > > > > ... > > msleep(...) > > ata_queue_request(...) > > ata_via_family_setmode(...) > > ata_identify_devices(...) > > ata_reinit(...) > > ata_timeout(...) > > softclock(...) > > ithread_loop(...) > > ... > > > > The timeout was called here shortly after ad2 hung: > > > > Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left) > > Dec 25 12:28:27 besplex kernel: ata1: resetting devices .. > > Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device > > Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed > > > > ATA_R_DONE was never set and wakeup_request() was never called either, so > > softclock() was deadlocked and tsleep() never returned. > > > > The system ran surprisingly well with softclock() deadlocked. ad0 worked > > and everything that didn't use timeouts worked. Examples of things that > > didn't work because they use timeouts: > > - syscons screen updates. > > - statistics in top and systat. > > - sleep 1 in shells. > > - mbmon (shows the status, then never repeats). > > > > I tried the following to recover: > > - call wakeup(request) using ddb. This worked, but ATA_R_DONE was never > > set so ata_queue_request() just looped. > > - also ignore the ATA_R_DONE check using ddb. This un-deadlocked > > softclock(), but ata1 remained wedged. > > - then call "atacontrol reinit 1". This partly worked: > > > > Dec 25 14:31:12 besplex kernel: ata1: resetting devices .. > > Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration > > Dec 25 14:31:44 besplex kernel: done > > > > but the ata driver caused a null pointer panic an instant later. > > > > - ad2 didn't come back after a hard reset. > > - ad2 came back after a power cycle. > > > > This was on an undermydesktop. Problems resuming on laptops may be similar. > > The hardware may really be wedged. Then the software shouldn't make things > > worse by sleeping or spinning in the timeout handler. > > > > Bruce > > > > -Søren > Yes I know it works under windows!! > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?002901c3caf9$855abe50$4c1b3dd4>