From owner-freebsd-current@FreeBSD.ORG Mon Jun 7 03:16:46 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D553516A4D0; Mon, 7 Jun 2004 03:16:46 +0000 (GMT) Received: from www.mmlab.cse.yzu.edu.tw (www.mmlab.cse.yzu.edu.tw [140.138.145.166]) by mx1.FreeBSD.org (Postfix) with ESMTP id 00B7243D53; Sun, 6 Jun 2004 20:16:46 -0700 (PDT) (envelope-from avatar@mmlab.cse.yzu.edu.tw) Received: by www.mmlab.cse.yzu.edu.tw (qmail, from userid 1000) id 7DEF04EFCDB; Mon, 7 Jun 2004 11:16:43 +0800 (CST) Received: from localhost (localhost [127.0.0.1]) by www.mmlab.cse.yzu.edu.tw (qmail) with ESMTP id 799514EFCDA; Mon, 7 Jun 2004 11:16:43 +0800 (CST) Date: Mon, 7 Jun 2004 11:16:43 +0800 (CST) From: Tai-hwa Liang To: Don Lewis In-Reply-To: <200406062012.i56KCMjt008540@gw.catspoiler.org> Message-ID: <040607103216B.85106@www.mmlab.cse.yzu.edu.tw> References: <200406062012.i56KCMjt008540@gw.catspoiler.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: freebsd-current@FreeBSD.org cc: dl@leo.org Subject: Re: LOR No 9 and strange other kernel messages X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2004 03:16:47 -0000 On Sun, 6 Jun 2004, Don Lewis wrote: [...] > >> Maybe your sound client software (or something else on the system) is > >> querying for the number of vchans. Even "sysctl -a" will invoke the > >> handler. > > > > I'm using the stock mpg123 0.59r to play MP3 files. Since I've never read > > the source before, I'm not sure whether it utilised vchan related sysctls > > or not. > > The code that calls pcm_inprog() and prints the "x: 2" debug message > appears to be an attempt at implementing a reader/writer lock. I'm > pretty sure the failure that triggers the debug message is harmless, > other than causing the sysctl() call to return EWOULDBLOCK. I'm glad to know that the message is harmless. However, the "x: %d" is a little too obscured to endusers IMHO. Shouldn't that be protected by #ifdef DIAGNOSTIC or something like PCM_DEBUG? > The most likely trigger for this event is if the client software is > writing data to sound device and is blocked in the write() syscall when > another thread calls sysctl(). You could find out which process is > calling sysctl() by printing curproc->p_pid in place of or in addition > to x. Ah, that explained everything. I've added the diagnostic code in sys/dev/sound/pcm/sound.c, and can 100% trigger this while mpg123 is playing and having a "sysctl -a" running in another xterm at the same time. x: 2, current pid: 39234 (sysctl) x: 2, current pid: 39234 (sysctl) It also looks like gcc will use sysctl() to retrieve physical memory size before compilation. I guess the "x: 2" I ran into was just a result of running gcc too frequently(make buildworld buildkernel). > [...] > >> Do your sound hardware and em0 share the same irq? If they do, that > > > > Nope. > > > > interrupt total rate > > irq0: clk 356127 99 > > irq1: atkbd0 6006 1 > > irq4: cbb0 em0++ 1378364 386 > > irq6: cbb1 pcm0 110211 30 > > irq7: ppc0 3 0 > > irq8: rtc 455852 127 > > irq12: psm0 16785 4 > > irq13: npx0 1 0 > > irq14: ata0 322853 90 > > irq15: ata1 55 0 > > Total 2646257 742 > > In that case, the em0 problem is likely to be something else. Does em0 > lock up even if you don't use the sound hardware? Yes. It always locked up if the network load is high(w/ ACPI enabled). > > >> might be a clue that the problems are related. Can you boot with ACPI > >> disabled, and if so, does it change the symptoms? > > > > Whoops! My first trial on booting without ACPI -- kernel panic in > > usb_get_next_event(). Apparently I have to disconnect the USB mouse > > prior to kernel booting. See final section for the backtrace. > > Sounds like bug #3. Any idea about how bug #3 can happen? I've tried to comment out the #ifdef DIAGNOSTIC code around sys/dev/usb/usb.c:752 to see if the extra checking for NULL ue helps. Unfortunately that didn't work for me: the kernel always panic at usb_get_next_event+0x5e after starting usbd. > > And, yes, the em0 didn't hang after booting to the non-ACPI environment. > > However, there're still something weird happening at that moment: > > > > 1. There're still a couple of "x: 2" dumped on console. Test case: > > "make buildworld buildkernel -DNOCLEAN" in one xterm, mpg123 in > > another xterm, and providing file downloading to another Windoze > > box. > > > > 2. While the remote box was downloading(Intel eepro 100/VE) files, > > the number of interrupt on em0 was dramatically reduced to 300 ~ 500+ > > per second. It was 3000+ when ACPI was enabled(no device polling in > > kernel). > > > > 3. The system average load is surprising low(I'm using SCHED_4BSD), > > about 0.06 ~ 0.30; however, the download was awfully slooowwwww. > > It took my brother's Windoze XP about 25 minutes to complete the > > whole downloading(3 ISO files, about 2GB in total). I'm pretty sure > > that the mediaopt on em0 was 100baseTX + full-duplex. > > I wonder if the em0 interrupt is not getting enabled or is getting > misrouted and the em0 interrupt service routing is getting triggered by > another interrupt source that is happening at a lower rate. Print out > the irq mapping with ACPI disabled to see what em0 is sharing its irq > with. This is probably something that jhb and njl will need to look at. Oops, it looks like I accidentally enabled the DEVICE_POLLING in kernel configuration file and forgot to turn kern.polling.enable on. After disabling DEVICE_POLLING and booting w/o ACPI, I can get rid of the aforementioned em0 sluggish problem. The interrupt rate on em0 is now 5700 ~ 7200 per second while downloading a couple of files from this laptop, and the average speed is 10+ MBytes. Hmm... it seems that I have to disable USB mouse and ACPI to get em0 back to work at this moment. I'm not quite sure about where the "IRQ mapping" you're talking about to retrieve from; therefore, I assume that came from dmesg: Found $PIR table, 15 entries at 0xc00fdea0 PCI-Only Interrupts: none Location Bus Device Pin Link IRQs embedded 0 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 0 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 0 0 C 0x62 3 4 5 6 7 9 10 11 12 14 15 embedded 0 0 D 0x63 3 4 5 6 7 9 10 11 12 14 15 embedded 0 2 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 0 2 B 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 0 1 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 0 1 B 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 1 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 1 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 0 30 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 0 30 B 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 0 30 C 0x62 3 4 5 6 7 9 10 11 12 14 15 embedded 0 30 D 0x63 3 4 5 6 7 9 10 11 12 14 15 embedded 2 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 2 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 slot 1 2 2 A 0x62 3 4 5 6 7 9 10 11 12 14 15 slot 1 2 2 B 0x63 3 4 5 6 7 9 10 11 12 14 15 embedded 2 3 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 2 3 B 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 2 3 D 0x63 3 4 5 6 7 9 10 11 12 14 15 slot 2 9 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 slot 2 9 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 slot 2 9 0 C 0x62 3 4 5 6 7 9 10 11 12 14 15 slot 2 9 0 D 0x63 3 4 5 6 7 9 10 11 12 14 15 embedded 9 1 A 0x61 3 4 5 6 7 9 10 11 12 14 15 embedded 9 2 A 0x62 3 4 5 6 7 9 10 11 12 14 15 embedded 9 2 B 0x62 3 4 5 6 7 9 10 11 12 14 15 embedded 2 1 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 2 8 A 0x68 3 4 5 6 7 9 10 11 12 14 15 embedded 0 29 A 0x60 3 4 5 6 7 9 10 11 12 14 15 embedded 0 29 B 0x63 3 4 5 6 7 9 10 11 12 14 15 embedded 0 29 C 0x62 3 4 5 6 7 9 10 11 12 14 15 embedded 0 29 D 0x6b 3 4 5 6 7 9 10 11 12 14 15 embedded 0 31 A 0x62 3 4 5 6 7 9 10 11 12 14 15 embedded 0 31 B 0x61 3 4 5 6 7 9 10 11 12 14 15 pcib0: at pcibus 0 on motherboard pir0: on motherboard $PIR: Links after initial probe: Link IRQ Rtd Ref IRQs 0x60 255 N 10 3 4 5 6 7 9 10 11 12 14 15 0x61 255 N 10 3 4 5 6 7 9 10 11 12 14 15 0x62 255 N 8 3 4 5 6 7 9 10 11 12 14 15 0x63 255 N 6 3 4 5 6 7 9 10 11 12 14 15 0x68 255 N 1 3 4 5 6 7 9 10 11 12 14 15 0x6b 255 N 1 3 4 5 6 7 9 10 11 12 14 15 $PIR: Found matching pin for 1.0.INTA at func 0: 4 $PIR: Found matching pin for 2.0.INTA at func 0: 4 $PIR: Found matching pin for 2.0.INTB at func 1: 6 $PIR: Found matching pin for 2.2.INTA at func 0: 11 $PIR: Found matching pin for 2.1.INTA at func 0: 4 $PIR: Found matching pin for 0.29.INTA at func 0: 4 $PIR: Found matching pin for 0.29.INTB at func 1: 10 $PIR: Found matching pin for 0.29.INTC at func 2: 11 $PIR: Found matching pin for 0.29.INTD at func 7: 11 $PIR: Found matching pin for 0.31.INTA at func 1: 255 $PIR: Found matching pin for 0.31.INTB at func 3: 6 $PIR: Links after initial IRQ discovery: Link IRQ Rtd Ref IRQs 0x60 4 Y 10 3 4 5 6 7 9 10 11 12 14 15 0x61 6 Y 10 3 4 5 6 7 9 10 11 12 14 15 0x62 11 Y 8 3 4 5 6 7 9 10 11 12 14 15 0x63 10 Y 6 3 4 5 6 7 9 10 11 12 14 15 0x68 255 N 1 3 4 5 6 7 9 10 11 12 14 15 0x6b 11 Y 1 3 4 5 6 7 9 10 11 12 14 15 $PIR: IRQs used by BIOS: 4 6 10 11 $PIR: Interrupt Weights: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ] [ 0 0 0 0 10 0 10 0 0 0 6 9 0 0 0 0 ] For the complete dmesg, please consult: http://www.mmlab.cse.yzu.edu.tw/~avatar/dmesg-noacpi.txt http://www.mmlab.cse.yzu.edu.tw/~avatar/dmesg-acpi.txt