Date: Fri, 10 Nov 2000 11:35:30 -0800 (PST)
From: John Baldwin <jhb@FreeBSD.org>
To: Matthew Jacob <mjacob@feral.com>
Cc: alpha@FreeBSD.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: Does your Alpha run a SMPng kernel?
Message-ID: <XFMail.001110113530.jhb@FreeBSD.org>
In-Reply-To: <Pine.BSF.4.21.0011101005500.83354-100000@beppo.feral.com>

On 10-Nov-00 Matthew Jacob wrote:
>
>
> On Fri, 10 Nov 2000, Andrew Gallatin wrote:
>
>>
>> John Baldwin writes:
>>  >
>>  > Have any clues as to where it is hanging?  One thing that may help is
>>  > that I
>>
>> This is the "one device goes south but everything else is happy" sort
>> of problem I was complaining about last week.
>>
>> My UP1000 running today's -current just wedged while I was rcp'ing a
>> large file to it over a 100Mb link.  It's busy spewing "fxp0: device
>> timeout"  Everything but the nic seems happy.  (But since I'm running
>> with NIS and NFS, losing the nic is fatal)
>
> 'ata' timeouts occur with abandon during probing PC164 or on XP1000.

The ata timeouts are due to the interrupt either not arriving or being
ignored due to other weirdness.  Try adding some printf() calls to the ata
interrupt handler to see whether it gets called and, if so, how far it
gets.

The fxp timeouts are definitely due to missing interrupts as well.  If you
look below, neither the fxp0 nor the swi:net interrupt thread is runnable;
both are stuck in SWAIT instead.

>> The machine's state is as follows:
>>
>> Stopped at siointr1+0x17c: br zero,siointr1+0x32c <zero=0x0>
>> db> ps
>>   pid proc             addr              uid  ppid  pgrp    flag stat wmesg  wchan            cmd
>>   293 fffffe0005aaffa0 fffffe0006a44000    0   124   124  000084    3 select fffffc00006411c8 ypbind
>>   289 fffffe0005ab0280 fffffe0006a40000 1387   286   286  004184    3 sbwait fffffe0006315c08 rcp
>>   286 fffffe0005ab0560 fffffe0006a30000 1387   159   286 2004084    3 opause fffffe0006a301b0 tcsh
>>   249 fffffe0005ab13c0 fffffe0006a08000    0     1   249  004086    3 nanslp fffffc000064b398 login
>>   248 fffffe0005ab0b20 fffffe0006a26000    0     1   248  004086    3 ttyin  fffffe00007acc10 getty
>>   247 fffffe0005ab0e00 fffffe0006a22000    0     1   247  004086    3 ttyin  fffffe00007ad210 getty
>>   246 fffffe0005ab3ee0 fffffe00069a6000    0     1   246  004086    3 ttyin  fffffe0000670e10 getty
>>   241 fffffe0005ab27e0 fffffe00069e8000    0     1   241  000084    3 sbwait fffffe0006314ce8 zhm
>>   227 fffffe0005ab0840 fffffe0006a2c000    0     1   227  000184    3 select fffffc00006411c8 lpd
>>   168 fffffe0005ab16a0 fffffe0006a02000    0     1   168  000084    3 select fffffc00006411c8 sshd
>>   164 fffffe0005ab3080 fffffe00069d8000    0     1   164 2000184    3 pause  fffffe00069d81b0 sendmail
>>   161 fffffe0005ab1980 fffffe00069fe000    0     1   161  000084    3 nanslp fffffc000064b398 cron
>>   159 fffffe0005ab10e0 fffffe0006a10000    0     1   159  000084    3 select fffffc00006411c8 inetd
>>   140 fffffe0005ab3c00 fffffe00069aa000    0     1   140  000084    3 select fffffc00006411c8 amd
>>   134 fffffe0005ab1c60 fffffe00069fa000    0     1   129  000084    3 nfsidl fffffc000066faf0 nfsiod
>>   133 fffffe0005ab1f40 fffffe00069f6000    0     1   129  000084    3 nfsidl fffffc000066fae8 nfsiod
>>   132 fffffe0005ab2220 fffffe00069f2000    0     1   129  000084    3 nfsidl fffffc000066fae0 nfsiod
>>   131 fffffe0005ab2500 fffffe00069ee000    0     1   129  000084    3 nfsidl fffffc000066fad8 nfsiod
>>   124 fffffe0005ab2ac0 fffffe00069e2000    0     1   124  000084    3 select fffffc00006411c8 ypbind
>>   122 fffffe0005ab2da0 fffffe00069dc000    1     1   122  000184    3 select fffffc00006411c8 portmap
>>   119 fffffe0005ab3360 fffffe00069d4000    0     1   119  000084    3 select fffffc00006411c8 ntpd
>>   110 fffffe0005ab3920 fffffe00069b6000    0     1   110  000084    3 select fffffc00006411c8 syslogd
>>    44 fffffe0005ab3640 fffffe00069bc000    0     1    44  000084    3 mfsidl fffffe00064ebfe0 mount_mfs
>>     5 fffffe0005ab41c0 fffffe00064fc000    0     0     0  000204    3 syncer fffffc00006410f0 syncer
>>     4 fffffe0005ab44a0 fffffe00064f8000    0     0     0  100204    3 psleep fffffc000064b64c bufdaemon
>>     3 fffffe0005ab4780 fffffe00064f4000    0     0     0  000204    3 psleep fffffc000065e02c vmdaemon
>>     2 fffffe0005ab4a60 fffffe00064f0000    0     0     0  100204    3 psleep fffffc0000624990 pagedaemon
>>    25 fffffe0005ab4d40 fffffe0006304000    0     0     0  000204    6                         intr: sbc1
>>    24 fffffe0005ab5020 fffffe0006300000    0     0     0  000204    6                         swi0: tty:sio
>>    23 fffffe0005ab5300 fffffe00062fc000    0     0     0  000204    6                         intr: atkbd0
>>    22 fffffe0005ab55e0 fffffe00062f6000    0     0     0  000204    6                         intr: fdc0
>>    21 fffffe0005ab58c0 fffffe00062f2000    0     0     0  000204    6                         intr: ata1
>>    20 fffffe0005ab5ba0 fffffe00062ee000    0     0     0  000204    6                         intr: ata0
>>    19 fffffe0005ab5e80 fffffe00062ea000    0     0     0  000204    6                         intr: sym0
>>    18 fffffe0005ab6160 fffffe00062e4000    0     0     0  000204    6                         intr: dc0 fxp0
>>    17 fffffe0005ab6440 fffffe00062e0000    0     0     0  000204    6                         swi5: task queue
>>    16 fffffe0005ab6720 fffffe00062dc000    0     0     0  000204    6                         swi3: cambio
>>    15 fffffe0005ab6a00 fffffe00062d8000    0     0     0  000204    6                         swi2: camnet
>>    14 fffffe0005ab6ce0 fffffe00062d4000    0     0     0  000204    3 rndslp fffffc000066ba08 random
>>    13 fffffe0005ab6fc0 fffffe00062d0000    0     0     0  000204    6                         swi4: vm
>>    12 fffffe0005ab72a0 fffffe00062cc000    0     0     0  00020c    2                         swi6: clock
>>    11 fffffe0005ab7580 fffffe00062c8000    0     0     0  000204    6                         swi1: net
>>    10 fffffe0005ab7860 fffffe0005ac2000    0     0     0  00020c    2                         idle
>>     1 fffffe0005ab7b40 fffffe0005abe000    0     0     1  004284    3 wait   fffffe0005ab7b40 init
>>     0 fffffc000066ba18 fffffc00006ec000    0     0     0  000204    3 sched  fffffc000066ba18 swapper
>> db> c
>> fxp0: device timeout
>> fxp0: device timeout
>> fxp0: device timeout
>>
>>
>> Is there a chance that this is being caused by the kernel, say,
>> getting a clock interrupt in the middle of doing some low-level
>> should-be-atomic timing-dependent I/O operations in either the driver
>> or the I/O support routines?
>>
>> Remember that on the UP1000, everything goes through the isa interrupt
>> controller.

So you get some interrupts and then they stop.  On the x86 we do a
disable/enable interrupt scheme for ISA interrupts by dinking with the PIC,
so Matt's suggestion to use isa_enable_intr()/isa_disable_intr() might
actually be needed.  (See the INTREN() call in ithd_loop() in
i386/isa/ithread.c.)  Actually, it looks like we wait too long to enable
the interrupt sources on the x86.  However, using the
isa_enable/disable_intr() stuff that Matt suggested might help.

>> Drew

-- 

John Baldwin <jhb@FreeBSD.org> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-alpha" in the body of the message
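
For anyone following along, the disable/enable scheme discussed above can be modelled in user space. This is only a rough sketch, not the kernel code: the sim_* names are hypothetical stand-ins (they are not the real isa_enable_intr()/isa_disable_intr()), the two mask bytes are plain variables rather than PIC I/O ports, IRQ 10 in the usage note is an arbitrary pick, and a raise on a masked line is simply counted as blocked, whereas real hardware may latch it. It shows why a source that is masked when its interrupt is handed to a thread, and never unmasked afterwards, delivers one interrupt and then goes quiet.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simulated interrupt mask registers for a pair of 8259-style PICs:
 * index 0 covers IRQ 0-7, index 1 covers IRQ 8-15.  A set bit means
 * the line is masked.  These are ordinary variables, not hardware.
 */
static uint8_t sim_imr[2];
static unsigned sim_delivered;	/* raises that reached the handler */
static unsigned sim_blocked;	/* raises that found the line masked */

/* Hypothetical stand-in for masking one ISA interrupt source. */
static void
sim_isa_disable_intr(int irq)
{
	assert(irq >= 0 && irq < 16);
	sim_imr[irq >> 3] |= (uint8_t)(1u << (irq & 7));
}

/* Hypothetical stand-in for unmasking one ISA interrupt source. */
static void
sim_isa_enable_intr(int irq)
{
	assert(irq >= 0 && irq < 16);
	sim_imr[irq >> 3] &= (uint8_t)~(1u << (irq & 7));
}

static int
sim_irq_masked(int irq)
{
	assert(irq >= 0 && irq < 16);
	return ((sim_imr[irq >> 3] >> (irq & 7)) & 1);
}

/*
 * One device interrupt.  The low-level stub masks the source and hands
 * off to the interrupt thread; the thread runs the handler and, only if
 * 'reenable' is nonzero, unmasks the source again.
 */
static void
sim_raise_irq(int irq, int reenable)
{
	if (sim_irq_masked(irq)) {
		sim_blocked++;		/* never reaches the handler */
		return;
	}
	sim_isa_disable_intr(irq);	/* stub: mask while thread is pending */
	sim_delivered++;		/* thread: driver handler runs here */
	if (reenable)
		sim_isa_enable_intr(irq);
}

/* Reset the model, raise 'count' interrupts, return how many got through. */
static unsigned
sim_run(int irq, int count, int reenable)
{
	sim_imr[0] = sim_imr[1] = 0;
	sim_delivered = sim_blocked = 0;
	for (int i = 0; i < count; i++)
		sim_raise_irq(irq, reenable);
	return (sim_delivered);
}
```

Calling sim_run(10, 5, 1) from a small main() lets every raise through, while sim_run(10, 5, 0) lets only the first one through and leaves the line masked, which is the "some interrupts and then they stop" pattern in miniature.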