From owner-freebsd-current@FreeBSD.ORG Wed Nov 10 09:41:02 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91D4C16A4CE; Wed, 10 Nov 2004 09:41:02 +0000 (GMT) Received: from hotmail.com (bay2-dav9.bay2.hotmail.com [65.54.246.113]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0E97143D62; Wed, 10 Nov 2004 09:41:02 +0000 (GMT) (envelope-from tssajo@hotmail.com) Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Wed, 10 Nov 2004 01:41:01 -0800 Received: from 24.24.201.219 by BAY2-DAV9.phx.gbl with DAV; Wed, 10 Nov 2004 09:40:24 +0000 X-Originating-IP: [24.24.201.219] X-Originating-Email: [tssajo@hotmail.com] X-Sender: tssajo@hotmail.com From: "Zoltan Frombach" To: "Robert Watson" , =?iso-8859-1?Q?S=F8ren_Schmidt?= References: Date: Wed, 10 Nov 2004 01:40:26 -0800 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.2180 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Message-ID: X-OriginalArrivalTime: 10 Nov 2004 09:41:01.0213 (UTC) FILETIME=[638E74D0:01C4C709] cc: freebsd-current@freebsd.org Subject: Re: Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout - what does it mean? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Nov 2004 09:41:02 -0000 Just let me know what to do and I'll do it. I will recompile my kernel with the patch you've sent. And I will let you know what the result is. In the meantime, here is my dmesg information: # dmesg ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2491143 ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2481983 Waiting (max 60 seconds) for system process `vnlru' to stop...done Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Waiting (max 60 seconds) for system process `syncer' to stop... Syncing disks, vnodes remaining...12 2 2 0 0 done No buffers busy after final sync Uptime: 2d0h40m18s Waiting (max 60 seconds) for system process `hpt_wt' to stop...done Copyright (c) 1992-2004 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 5.3-RELEASE #1: Sat Nov 6 06:03:26 PST 2004 tss@www.frombach.com:/usr/obj/usr/src/sys/GENERIC Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2806.38-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0xf29 Stepping = 9 Features=0xbfebfbff Hyperthreading: 2 logical CPUs real memory = 1056899072 (1007 MB) avail memory = 1024679936 (977 MB) ACPI APIC Table: ioapic0: Changing APIC ID to 2 ioapic0 irqs 0-23 on motherboard npx0: [FAST] npx0: on motherboard npx0: INT 16 interface acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 cpu0: on acpi0 acpi_tz0: on acpi0 acpi_button0: on acpi0 acpi_button1: on acpi0 pcib0: port 0x10e0-0x10ff,0x1000-0x10df,0x480-0x48f,0xcf8-0xcff on acpi0 pci0: on pcib0 agp0: mem 0xd0000000-0xd7ffffff at device 0.0 on pci0 pcib1: at device 1.0 on pci0 pci1: on pcib1 pci1: at device 0.0 (no driver attached) isab0: at device 2.0 on pci0 isa0: on isab0 atapci0: port 0x4000-0x400f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 2.5 on pci0 ata0: channel #0 on atapci0 ata1: channel #1 on atapci0 ohci0: mem 0xe1104000-0xe1104fff irq 20 at device 3.0 on pci0 ohci0: [GIANT-LOCKED] usb0: OHCI version 1.0, legacy support usb0: on ohci0 usb0: USB revision 1.0 uhub0: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 3 ports with 3 removable, self powered ohci1: mem 0xe1100000-0xe1100fff irq 21 at device 3.1 on pci0 ohci1: [GIANT-LOCKED] usb1: OHCI version 1.0, legacy support usb1: on ohci1 usb1: USB revision 1.0 uhub1: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 3 ports with 3 removable, self powered ohci2: mem 0xe1101000-0xe1101fff irq 22 at device 3.2 on pci0 ohci2: [GIANT-LOCKED] usb2: OHCI version 1.0, legacy support usb2: on ohci2 usb2: USB revision 1.0 uhub2: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub2: 2 ports with 2 removable, self powered pci0: at device 3.3 (no driver attached) xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe000-0xe07f mem 0xe1103000-0xe110307f irq 17 at device 9.0 on pci0 miibus0: on xl0 bmtphy0: <3c905B 10/100 internal PHY> on miibus0 bmtphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto xl0: Ethernet address: 00:50:04:76:49:e7 fdc0: port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0 fdc0: [FAST] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A orm0: at iomem 0xc0000-0xcbfff on isa0 pmtimer0 on isa0 atkbdc0: at port 0x64,0x60 on isa0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] ppc0: parallel port not found. sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounter "TSC" frequency 2806375780 Hz quality 800 Timecounters tick every 10.000 msec ad0: 78167MB [158816/16/63] at ata0-master UDMA133 acd0: CDROM at ata1-master UDMA33 Mounting root from ufs:/dev/ad0s1a ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled And uname -a : # uname -a FreeBSD www.xxxxxxxx.com 5.3-RELEASE FreeBSD 5.3-RELEASE #1: Sat Nov 6 06:03:26 PST 2004 xxx@www.xxxxxxxx.com:/usr/obj/usr/src/sys/GENERIC i386 I run the 5.3-RELEASE GENERIC kernel (UP mode). Zoltan Subject: Re: Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout - what does it mean? On Wed, 10 Nov 2004, Søren Schmidt wrote: > Zoltan Frombach wrote: > > I just upgraded to 5.3-RELEASE a few days ago. This morning this line > > got into my system log file: > > Nov 9 06:14:03 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen > > but timeout fired LBA=2491143 > > > > I've never seen this message before. Can someone please explain what it > > means? With Thanks, > > It means that the disk has processed the write request (interrupt seen), > but that the system (the bio_taskqueue) hasn't been able to get the > result returned to the kernel. > > Your disk is not involved in this problem since it has done its part, > but the rest of the system is either busy with something else, or there > are bugs lurking that prohibits the bio_taskqueue from running. > > Either way its a WARNING not a FAILURE :) I'm still a bit skeptical that the task queue is at fault -- I run my notebook with continuous measurement of the latency to schedule tasks, generating a warning for any latency > .5 seconds, and the only time I ever see that sort of latency is during the boot process when ACPI has scheduled a task to run, but the task queue thread has not yet been allowed to run: ipfw2 initialized, divert loadable, rule-based forwarding disabled, default to deny, logging disabled taskqueue_run: warning, queue time of 0.900011314 for context 0xc045638c ad0: 19077MB [38760/16/63] at ata0-master UDMA33 It would be quite interesting to have someone run with this timing code on a system reporting the warning to see what they find. What is the threshold to generate the ATA timeout warning? My impression was that it was several seconds, which is quite a long time. The attached patch checks for .5 second or above latency. If the task queue is really getting stalled for several seconds, we should figure out what task it is. There's KTR tracing in the taskqueue code that we can use to dump a trace stream to see what task is taking so long. Robert N M Watson FreeBSD Core Team, TrustedBSD Projects robert@fledge.watson.org Principal Research Scientist, McAfee Research Index: kern/subr_taskqueue.c =================================================================== RCS file: /home/ncvs/src/sys/kern/subr_taskqueue.c,v retrieving revision 1.25 diff -u -r1.25 subr_taskqueue.c --- kern/subr_taskqueue.c 5 Oct 2004 04:16:00 -0000 1.25 +++ kern/subr_taskqueue.c 23 Oct 2004 19:18:26 -0000 @@ -36,9 +36,16 @@ #include #include #include +#include #include +#include #include +int tq_in; +SYSCTL_INT(_kern, OID_AUTO, tq_in, CTLFLAG_RD, &tq_in, 0, ""); +int tq_out; +SYSCTL_INT(_kern, OID_AUTO, tq_out, CTLFLAG_RD, &tq_out, 0, ""); + static MALLOC_DEFINE(M_TASKQUEUE, "taskqueue", "Task Queues"); static void *taskqueue_giant_ih; static void *taskqueue_ih; @@ -140,6 +147,9 @@ return 0; } + getnanotime(&task->ta_queuetime); + tq_in++; + /* * Optimise the case when all tasks have the same priority. */ @@ -172,6 +182,7 @@ taskqueue_run(struct taskqueue *queue) { struct task *task; + struct timespec tv; int owned, pending; owned = mtx_owned(&queue->tq_mutex); @@ -187,8 +198,16 @@ pending = task->ta_pending; task->ta_pending = 0; task->ta_flags |= TAF_PENDING; + tq_out++; mtx_unlock(&queue->tq_mutex); + getnanotime(&tv); + timespecsub(&tv, &task->ta_queuetime); + if (tv.tv_nsec >= 0500000000) { + printf("taskqueue_run: warning, queue time of %d.%09ld " + "for context %p\n", tv.tv_sec, tv.tv_nsec, + task->ta_func); + } task->ta_func(task->ta_context, pending); mtx_lock(&queue->tq_mutex); Index: sys/_task.h =================================================================== RCS file: /home/ncvs/src/sys/sys/_task.h,v retrieving revision 1.3 diff -u -r1.3 _task.h --- sys/_task.h 5 Oct 2004 04:16:01 -0000 1.3 +++ sys/_task.h 23 Oct 2004 19:38:16 -0000 @@ -46,6 +46,7 @@ task_fn_t *ta_func; /* task handler */ void *ta_context; /* argument for handler */ int ta_flags; /* Flags */ + struct timespec ta_queuetime; /* time enqueued */ }; #define TAF_PENDING 0x1 /* Task is being run now */