Date:      Thu, 25 Dec 2003 16:08:31 +0100
From:      "Willem Jan Withagen" <wjw@withagen.nl>
To:        "Soren Schmidt" <sos@spider.deepcore.dk>, "Bruce Evans" <bde@zeta.org.au>
Cc:        current@freebsd.org
Subject:   Re: deadlock in ata_queue_request()
Message-ID:  <002901c3caf9$855abe50$4c1b3dd4@digiware.nl>
References:  <200312250907.hBP97AUG031336@spider.deepcore.dk>

Not sure if this is the same problem, but I get many of these.
On a regular basis it freezes my system so badly that I cannot get into DDB or
use ctrl-alt-del; only a hard reset gets me out.
The console then shows:
    ad4: READ command timeout tag=0 serv=0 - resetting
    ata2: resetting devices ..
Normally this is followed by:
    done ...

But once in a while the box does not return...

So is this a disk going bad, or is it a bug??

This is with GENERIC-5.1-p11
The disk is:
    ATA channel 2:
    Master:  ad4 <Maxtor 6Y060L0/YAR41VW0> ATA/ATAPI rev 7
Running on a Promise PDC20269 UDMA133 controller at DMA133.

But the box crashed again (after 20 minutes of uptime), so I'm getting a little worried.
It's part of a 4-disk vinum raid5, so I've got time to go

--WjW

----- Original Message ----- 
From: "Soren Schmidt" <sos@spider.deepcore.dk>
To: "Bruce Evans" <bde@zeta.org.au>
Cc: <current@freebsd.org>; <sos@freebsd.org>
Sent: Thursday, December 25, 2003 10:07 AM
Subject: Re: deadlock in ata_queue_request()


> It seems Bruce Evans wrote:
> > ata_queue_request() sleeps in an interrupt handler here:
>
> Yes, I have a local fix to help this. The sleep was originally left in to
> make a backport to -stable easier (i.e. no mutexes), but this needs to be
> changed here. I'll get it committed asap, but it is the holidays and the
> kids have a lot of new toys :)
>
> > % void
> > % ata_queue_request(struct ata_request *request)
> > % {
> > %     /* mark request as virgin (it might be a reused one) */
> > %     request->result = request->status = request->error = 0;
> > %     request->flags &= ~ATA_R_DONE;
> > %
> > %     /* put request on the locked queue at the specified location */
> > %     mtx_lock(&request->device->channel->queue_mtx);
> > %     if (request->flags & ATA_R_AT_HEAD)
> > %         TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain);
> > %     else
> > %         TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain);
> > %     mtx_unlock(&request->device->channel->queue_mtx);
> > %
> > %     /* should we skip start ? */
> > %     if (!(request->flags & ATA_R_SKIPSTART))
> > %         ata_start(request->device->channel);
> > %
> > %     /* if this was a requeue op callback/sleep already setup */
> > %     if (request->flags & ATA_R_REQUEUE)
> > %         return;
> > %
> > %     /* if this is not a callback and we havn't seen DONE yet -> sleep */
> > %     if (!request->callback) {
> > %         while (!(request->flags & ATA_R_DONE))
> > %             tsleep(request, PRIBIO, "atareq", hz/10);
> >               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > %     }
> > % }
> >
> > when it is called from an interrupt handler.  It is called from an interrupt
> > handler as part of timeout processing:
> >
> > ...
> > msleep(...)
> > ata_queue_request(...)
> > ata_via_family_setmode(...)
> > ata_identify_devices(...)
> > ata_reinit(...)
> > ata_timeout(...)
> > softclock(...)
> > ithread_loop(...)
> > ...
> >
> > The timeout was called here shortly after ad2 hung:
> >
> > Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left)
> > Dec 25 12:28:27 besplex kernel: ata1: resetting devices ..
> > Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device
> > Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed
> >
> > ATA_R_DONE was never set and wakeup_request() was never called either, so
> > softclock() was deadlocked and tsleep() never returned.
> >
> > The system ran surprisingly well with softclock() deadlocked.  ad0 worked
> > and everything that didn't use timeouts worked.  Examples of things that
> > didn't work because they use timeouts:
> > - syscons screen updates.
> > - statistics in top and systat.
> > - sleep 1 in shells.
> > - mbmon (shows the status, then never repeats).
> >
> > I tried the following to recover:
> > - call wakeup(request) using ddb.  This worked, but ATA_R_DONE was never
> >   set so ata_queue_request() just looped.
> > - also ignore the ATA_R_DONE check using ddb.  This un-deadlocked
> >   softclock(), but ata1 remained wedged.
> > - then call "atacontrol reinit 1".  This partly worked:
> >
> > Dec 25 14:31:12 besplex kernel: ata1: resetting devices ..
> > Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration
> > Dec 25 14:31:44 besplex kernel: done
> >
> > but the ata driver caused a null pointer panic an instant later.
> >
> > - ad2 didn't come back after a hard reset.
> > - ad2 came back after a power cycle.
> >
> > This was on an undermydesktop.  Problems resuming on laptops may be similar.
> > The hardware may really be wedged.  Then the software shouldn't make things
> > worse by sleeping or spinning in the timeout handler.
> >
> > Bruce
> >
>
> -Søren
>                      Yes I know it works under windows!!
>
>
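
To make Bruce's observation concrete (everything that relied on timeouts stopped
once softclock() was stuck in the tsleep() loop), here is a small userspace
model. It is only a sketch and not kernel code: all sim_* names are hypothetical,
"ticks" stand in for time, and the wait is bounded by a budget so the program
terminates, whereas the real loop has no such bound.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the one thread that runs all timeout handlers. */
struct sim_clock {
    int ticks_left;   /* time budget for the whole simulation */
    int others_run;   /* unrelated timeouts that actually got to run */
};

/* Models the loop "while (!DONE) tsleep(...)": burns ticks until the flag
 * is set.  The budget check exists only so the simulation ends. */
static void sim_wait_for_done(struct sim_clock *clk, const bool *done)
{
    while (!*done && clk->ticks_left > 0)
        clk->ticks_left--;
}

/* Models any other timeout (screen update, "sleep 1", ...): needs one tick. */
static void sim_other_timeout(struct sim_clock *clk)
{
    if (clk->ticks_left > 0) {
        clk->ticks_left--;
        clk->others_run++;
    }
}

/* Runs the waiting handler first, then n_other unrelated timeouts, all on
 * the same thread.  Returns how many unrelated timeouts ran. */
int sim_softclock(bool done_is_set, int budget, int n_other)
{
    assert(budget >= 0 && n_other >= 0);

    struct sim_clock clk = { budget, 0 };
    bool done = done_is_set;

    sim_wait_for_done(&clk, &done);
    for (int i = 0; i < n_other; i++)
        sim_other_timeout(&clk);
    return clk.others_run;
}
```

With the flag never set, the waiting handler uses up the whole budget and no
other timeout runs; with the flag already set, all of them run. That matches
the symptom list above only in spirit, since the model ignores interrupts and
other threads.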


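The thread does not show Soren's local fix, so the following is NOT that fix.
It is a minimal userspace sketch of one way to follow Bruce's closing advice
(do not sleep in the timeout handler): a queueing function that refuses to
block when the caller says it cannot sleep, and reports completion through a
callback instead. All sk_* names, the can_sleep parameter, and the status
codes are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_R_DONE 0x01   /* hypothetical flag, plays the role of ATA_R_DONE */

struct sk_request {
    int flags;
    void (*callback)(struct sk_request *);  /* NULL means "caller will wait" */
    int callback_calls;                     /* bookkeeping for the example */
};

enum sk_status { SK_QUEUED, SK_WOULD_SLEEP };

/* Queue a request.  If there is no callback, the caller would have to sleep
 * until completion; when can_sleep is false (e.g. timeout context) that is
 * rejected up front instead of blocking. */
enum sk_status sk_queue_request(struct sk_request *req, bool can_sleep)
{
    assert(req != NULL);

    req->flags &= ~SK_R_DONE;   /* mark as fresh, it may be a reused request */
    if (req->callback == NULL && !can_sleep)
        return SK_WOULD_SLEEP;

    /* A real driver would insert the request into its queue and start the
     * channel here; the sketch only records that it was accepted. */
    return SK_QUEUED;
}

/* Completion path: set DONE and notify through the callback, if any. */
void sk_complete(struct sk_request *req)
{
    assert(req != NULL);

    req->flags |= SK_R_DONE;
    if (req->callback != NULL)
        req->callback(req);
}

/* Example callback that just counts how often it was invoked. */
void sk_count_callback(struct sk_request *req)
{
    req->callback_calls++;
}
```

A caller in timeout context would set a callback such as sk_count_callback
before queueing and return immediately; a request without a callback is only
accepted from a context that is allowed to sleep.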
