From owner-freebsd-current@FreeBSD.ORG Sat Nov 13 12:16:37 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4B7A616A4CE; Sat, 13 Nov 2004 12:16:37 +0000 (GMT) Received: from spider.deepcore.dk (cpe.atm2-0-53484.0x50a6c9a6.abnxx9.customer.tele.dk [80.166.201.166]) by mx1.FreeBSD.org (Postfix) with ESMTP id 63CFE43D46; Sat, 13 Nov 2004 12:16:36 +0000 (GMT) (envelope-from sos@DeepCore.dk) Received: from [194.192.25.143] (laptop.deepcore.dk [194.192.25.143]) by spider.deepcore.dk (8.12.11/8.12.10) with ESMTP id iADCGVUL091410; Sat, 13 Nov 2004 13:16:33 +0100 (CET) (envelope-from sos@DeepCore.dk) Message-ID: <4195FAF2.4050409@DeepCore.dk> Date: Sat, 13 Nov 2004 13:15:46 +0100 From: =?ISO-8859-1?Q?S=F8ren_Schmidt?= User-Agent: Mozilla Thunderbird 0.7.2 (X11/20040802) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Zoltan Frombach References: <26249.1100342074@critter.freebsd.dk> <4195E5DB.2070302@DeepCore.dk> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable X-mail-scanned: by DeepCore Virus & Spam killer v1.4 cc: Garance A Drosihn cc: Poul-Henning Kamp cc: freebsd-current@freebsd.org cc: Robert Watson Subject: Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 Nov 2004 12:16:37 -0000 Zoltan Frombach wrote: > I will apply this patch first thing tomorrow. But I don't see how will = I=20 > see any difference? Does it put something into a log file? Shouldn't it= ? The change is that if you see the "WARNING interrupt seen" we know that=20 it was the upper layers that used up the 5 secs of timeout, not that some of it was used by a disk being slow to respond. -S=F8ren > Zoltan >=20 > Poul-Henning Kamp wrote: >=20 >> In message <4195E1FF.5090906@DeepCore.dk>,=20 >> =3D?ISO-8859-1?Q?S=3DF8ren_Schmidt?=3D wri >> tes: >> >> >>>>> Timeout is 5 secs, which is a pretty long time in this context IMHO= =2E. >>>> >>>> >>>> Five seconds counted from when ? >>> >>> >>> Now thats the nasty part :) >>> ATA starts the timeout when the request is issued to the device, so=20 >>> theoretically the disk could take 4.9999 secs to complete the request= =20 >>> and then the timeout fires before the taskqueue gets its chance at=20 >>> it, but IMHO thats pretty unlikely... >> >> >> I find that far more likely than kernel threads being stalled for that= >> long. ATA disks doing bad-block stuff takes several seconds on some >> of the disks I've had my hands on. >> >>> Anyhow, I can just remove the warning from ATA if that makes anyone=20 >>> happy, since its just a warning and ATA doesn't do anything with it=20 >>> at all. >>> However, IMNHO this points at a problem somewhere that we should=20 >>> better understand and fix instead. >> >> >> I would prefer you reset the timer to five seconds in your interrupt >> routine so we can see exactly on which side of that the time is spent.= >=20 >=20 > It would be even better to time how long both ops take and be able to > get that via a sysctl or something (I have that on my TODO list but its= > loooong :) ). >=20 > Anyhow resetting it is easy (patch against 5.3R): >=20 > Index: ata-queue.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > RCS file: /home/ncvs/src/sys/dev/ata/ata-queue.c,v > retrieving revision 1.32.2.5 > diff -u -r1.32.2.5 ata-queue.c > --- ata-queue.c 24 Oct 2004 09:27:37 -0000 1.32.2.5 > +++ ata-queue.c 13 Nov 2004 10:44:40 -0000 > @@ -216,6 +216,9 @@ > ata_completed(request, 0); > } > else { > + if (!dumping) > + callout_reset(&request->callout, request->timeout * hz, > + (timeout_t*)ata_timeout, request); > if (request->bio && !(request->flags & ATA_R_TIMEOUT)) { > ATA_DEBUG_RQ(request, "finish bio_taskqueue"); > bio_taskqueue(request->bio, (bio_task_t *)ata_completed, > request); >=20