Date: Thu, 25 Dec 2003 10:07:10 +0100 (CET) From: Soren Schmidt <sos@spider.deepcore.dk> To: Bruce Evans <bde@zeta.org.au> Cc: sos@FreeBSD.ORG Subject: Re: deadlock in ata_queue_request() Message-ID: <200312250907.hBP97AUG031336@spider.deepcore.dk> In-Reply-To: <20031225145427.J16564@gamplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
It seems Bruce Evans wrote: > ata_queue_request() sleeps in an interrupt handler here: Yes, I have a local fix to help this, the sleep was originally left in to make a backport to -stable easier (ie no mutexes), but this need to be changed here. I'll get it committed asap, but it is hollidays and the kids has alot of new toys :) > % void > % ata_queue_request(struct ata_request *request) > % { > % /* mark request as virgin (it might be a reused one) */ > % request->result = request->status = request->error = 0; > % request->flags &= ~ATA_R_DONE; > % > % /* put request on the locked queue at the specified location */ > % mtx_lock(&request->device->channel->queue_mtx); > % if (request->flags & ATA_R_AT_HEAD) > % TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain); > % else > % TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain); > % mtx_unlock(&request->device->channel->queue_mtx); > % > % /* should we skip start ? */ > % if (!(request->flags & ATA_R_SKIPSTART)) > % ata_start(request->device->channel); > % > % /* if this was a requeue op callback/sleep already setup */ > % if (request->flags & ATA_R_REQUEUE) > % return; > % > % /* if this is not a callback and we havn't seen DONE yet -> sleep */ > % if (!request->callback) { > % while (!(request->flags & ATA_R_DONE)) > % tsleep(request, PRIBIO, "atareq", hz/10); > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > % } > % } > > when it is called from an interrupt handler. It is called from an interrupt > handler as part of timeout processing: > > ... > msleep(...) > ata_queue_request(...) > ata_via_family_setmode(...) > ata_identify_devices(...) > ata_reinit(...) > ata_timeout(...) > softclock(...) > ithread_loop(...) > ... > > The timeout was called here shortly after ad2 hung: > > Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left) > Dec 25 12:28:27 besplex kernel: ata1: resetting devices .. > Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device > Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed > > ATA_R_DONE was never set and wakeup_request() was never called either, so > softclock() was deadlocked and tsleep() never returned. > > The system ran surprisingly well with softclock() deadlocked. ad0 worked > and everything that didn't use timeouts worked. Examples of things that > didn't work because they use timeouts: > - syscons screen updates. > - statistics in top and systat. > - sleep 1 in shells. > - mbmon (shows the status, then never repeats). > > I tried the following to recover: > - call wakeup(request) using ddb. This worked, but ATA_R_DONE was never > set so ata_queue_request() just looped. > - also ignore the ATA_R_DONE check using ddb. This un-deadlocked > softclock(), but ata1 remained wedged. > - then call "atacontrol reinit 1". This partly worked: > > Dec 25 14:31:12 besplex kernel: ata1: resetting devices .. > Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration > Dec 25 14:31:44 besplex kernel: done > > but the ata driver caused a null pointer panic an instant later. > > - ad2 didn't come back after a hard reset. > - ad2 came back after a power cycle. > > This was on an undermydesktop. Problems resuming on laptops may be similar. > The hardware may really be wedged. Then the software shouldn't make things > worse by sleeping or spinning in the timeout handler. > > Bruce > -Søren Yes I know it works under windows!!
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200312250907.hBP97AUG031336>