From owner-freebsd-current@FreeBSD.ORG Mon Jun 7 04:48:38 2004 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A677E16A4CE for ; Mon, 7 Jun 2004 04:48:38 +0000 (GMT) Received: from gw.catspoiler.org (217-ip-163.nccn.net [209.79.217.163]) by mx1.FreeBSD.org (Postfix) with ESMTP id E98E143D4C for ; Mon, 7 Jun 2004 04:48:37 +0000 (GMT) (envelope-from truckman@FreeBSD.org) Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2]) by gw.catspoiler.org (8.12.11/8.12.11) with ESMTP id i574mTpP009303; Sun, 6 Jun 2004 21:48:33 -0700 (PDT) (envelope-from truckman@FreeBSD.org) Message-Id: <200406070448.i574mTpP009303@gw.catspoiler.org> Date: Sun, 6 Jun 2004 21:48:29 -0700 (PDT) From: Don Lewis To: avatar@mmlab.cse.yzu.edu.tw In-Reply-To: <040607103216B.85106@www.mmlab.cse.yzu.edu.tw> MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii cc: freebsd-current@FreeBSD.org cc: dl@leo.org Subject: Re: LOR No 9 and strange other kernel messages X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Jun 2004 04:48:38 -0000 On 7 Jun, Tai-hwa Liang wrote: > On Sun, 6 Jun 2004, Don Lewis wrote: > [...] >> >> Maybe your sound client software (or something else on the system) is >> >> querying for the number of vchans. Even "sysctl -a" will invoke the >> >> handler. >> > >> > I'm using the stock mpg123 0.59r to play MP3 files. Since I've never read >> > the source before, I'm not sure whether it utilised vchan related sysctls >> > or not. >> >> The code that calls pcm_inprog() and prints the "x: 2" debug message >> appears to be an attempt at implementing a reader/writer lock. I'm >> pretty sure the failure that triggers the debug message is harmless, >> other than causing the sysctl() call to return EWOULDBLOCK. > > I'm glad to know that the message is harmless. However, the "x: %d" is > a little too obscured to endusers IMHO. Shouldn't that be protected by > #ifdef DIAGNOSTIC or something like PCM_DEBUG? This diagnotic message should probably just go away. Also the locking should be fixed to avoid the EWOULDBLOCK error. >> The most likely trigger for this event is if the client software is >> writing data to sound device and is blocked in the write() syscall when >> another thread calls sysctl(). You could find out which process is >> calling sysctl() by printing curproc->p_pid in place of or in addition >> to x. > > Ah, that explained everything. I've added the diagnostic code in > sys/dev/sound/pcm/sound.c, and can 100% trigger this while mpg123 is > playing and having a "sysctl -a" running in another xterm at the same time. > > x: 2, current pid: 39234 (sysctl) > x: 2, current pid: 39234 (sysctl) > > It also looks like gcc will use sysctl() to retrieve physical memory size > before compilation. I guess the "x: 2" I ran into was just a result of > running gcc too frequently(make buildworld buildkernel). Gcc only looks up hw.physmem and hw.usermem, so it shouldn't trigger this message. >> > [...] >> >> Do your sound hardware and em0 share the same irq? If they do, that >> > >> > Nope. >> > >> > interrupt total rate >> > irq0: clk 356127 99 >> > irq1: atkbd0 6006 1 >> > irq4: cbb0 em0++ 1378364 386 >> > irq6: cbb1 pcm0 110211 30 >> > irq7: ppc0 3 0 >> > irq8: rtc 455852 127 >> > irq12: psm0 16785 4 >> > irq13: npx0 1 0 >> > irq14: ata0 322853 90 >> > irq15: ata1 55 0 >> > Total 2646257 742 >> >> In that case, the em0 problem is likely to be something else. Does em0 >> lock up even if you don't use the sound hardware? > > Yes. It always locked up if the network load is high(w/ ACPI enabled). > >> >> >> might be a clue that the problems are related. Can you boot with ACPI >> >> disabled, and if so, does it change the symptoms? >> > >> > Whoops! My first trial on booting without ACPI -- kernel panic in >> > usb_get_next_event(). Apparently I have to disconnect the USB mouse >> > prior to kernel booting. See final section for the backtrace. >> >> Sounds like bug #3. > > Any idea about how bug #3 can happen? I've tried to comment out the > #ifdef DIAGNOSTIC code around sys/dev/usb/usb.c:752 to see if the extra > checking for NULL ue helps. Unfortunately that didn't work for me: the > kernel always panic at usb_get_next_event+0x5e after starting usbd. I'd recommend posting a separate message about this panic. That way you are more likely to catch the attention of a USB expert. >> > And, yes, the em0 didn't hang after booting to the non-ACPI environment. >> > However, there're still something weird happening at that moment: >> > >> > 1. There're still a couple of "x: 2" dumped on console. Test case: >> > "make buildworld buildkernel -DNOCLEAN" in one xterm, mpg123 in >> > another xterm, and providing file downloading to another Windoze >> > box. >> > >> > 2. While the remote box was downloading(Intel eepro 100/VE) files, >> > the number of interrupt on em0 was dramatically reduced to 300 ~ 500+ >> > per second. It was 3000+ when ACPI was enabled(no device polling in >> > kernel). >> > >> > 3. The system average load is surprising low(I'm using SCHED_4BSD), >> > about 0.06 ~ 0.30; however, the download was awfully slooowwwww. >> > It took my brother's Windoze XP about 25 minutes to complete the >> > whole downloading(3 ISO files, about 2GB in total). I'm pretty sure >> > that the mediaopt on em0 was 100baseTX + full-duplex. >> >> I wonder if the em0 interrupt is not getting enabled or is getting >> misrouted and the em0 interrupt service routing is getting triggered by >> another interrupt source that is happening at a lower rate. Print out >> the irq mapping with ACPI disabled to see what em0 is sharing its irq >> with. This is probably something that jhb and njl will need to look at. > > Oops, it looks like I accidentally enabled the DEVICE_POLLING in kernel > configuration file and forgot to turn kern.polling.enable on. > > After disabling DEVICE_POLLING and booting w/o ACPI, I can get rid of > the aforementioned em0 sluggish problem. The interrupt rate on em0 is > now 5700 ~ 7200 per second while downloading a couple of files from this > laptop, and the average speed is 10+ MBytes. > > Hmm... it seems that I have to disable USB mouse and ACPI to get em0 > back to work at this moment. > > I'm not quite sure about where the "IRQ mapping" you're talking about to > retrieve from; therefore, I assume that came from dmesg: > > Found $PIR table, 15 entries at 0xc00fdea0 > PCI-Only Interrupts: none > Location Bus Device Pin Link IRQs > embedded 0 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 0 C 0x62 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 0 D 0x63 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 2 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 2 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 1 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 1 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 1 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 1 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 30 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 30 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 30 C 0x62 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 30 D 0x63 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > slot 1 2 2 A 0x62 3 4 5 6 7 9 10 11 12 14 15 > slot 1 2 2 B 0x63 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 3 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 3 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 3 D 0x63 3 4 5 6 7 9 10 11 12 14 15 > slot 2 9 0 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > slot 2 9 0 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > slot 2 9 0 C 0x62 3 4 5 6 7 9 10 11 12 14 15 > slot 2 9 0 D 0x63 3 4 5 6 7 9 10 11 12 14 15 > embedded 9 1 A 0x61 3 4 5 6 7 9 10 11 12 14 15 > embedded 9 2 A 0x62 3 4 5 6 7 9 10 11 12 14 15 > embedded 9 2 B 0x62 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 1 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 2 8 A 0x68 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 29 A 0x60 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 29 B 0x63 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 29 C 0x62 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 29 D 0x6b 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 31 A 0x62 3 4 5 6 7 9 10 11 12 14 15 > embedded 0 31 B 0x61 3 4 5 6 7 9 10 11 12 14 15 > pcib0: at pcibus 0 on motherboard > pir0: on motherboard > $PIR: Links after initial probe: > Link IRQ Rtd Ref IRQs > 0x60 255 N 10 3 4 5 6 7 9 10 11 12 14 15 > 0x61 255 N 10 3 4 5 6 7 9 10 11 12 14 15 > 0x62 255 N 8 3 4 5 6 7 9 10 11 12 14 15 > 0x63 255 N 6 3 4 5 6 7 9 10 11 12 14 15 > 0x68 255 N 1 3 4 5 6 7 9 10 11 12 14 15 > 0x6b 255 N 1 3 4 5 6 7 9 10 11 12 14 15 > $PIR: Found matching pin for 1.0.INTA at func 0: 4 > $PIR: Found matching pin for 2.0.INTA at func 0: 4 > $PIR: Found matching pin for 2.0.INTB at func 1: 6 > $PIR: Found matching pin for 2.2.INTA at func 0: 11 > $PIR: Found matching pin for 2.1.INTA at func 0: 4 > $PIR: Found matching pin for 0.29.INTA at func 0: 4 > $PIR: Found matching pin for 0.29.INTB at func 1: 10 > $PIR: Found matching pin for 0.29.INTC at func 2: 11 > $PIR: Found matching pin for 0.29.INTD at func 7: 11 > $PIR: Found matching pin for 0.31.INTA at func 1: 255 > $PIR: Found matching pin for 0.31.INTB at func 3: 6 > $PIR: Links after initial IRQ discovery: > Link IRQ Rtd Ref IRQs > 0x60 4 Y 10 3 4 5 6 7 9 10 11 12 14 15 > 0x61 6 Y 10 3 4 5 6 7 9 10 11 12 14 15 > 0x62 11 Y 8 3 4 5 6 7 9 10 11 12 14 15 > 0x63 10 Y 6 3 4 5 6 7 9 10 11 12 14 15 > 0x68 255 N 1 3 4 5 6 7 9 10 11 12 14 15 > 0x6b 11 Y 1 3 4 5 6 7 9 10 11 12 14 15 > $PIR: IRQs used by BIOS: 4 6 10 11 > $PIR: Interrupt Weights: > [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ] > [ 0 0 0 0 10 0 10 0 0 0 6 9 0 0 0 0 ] > > For the complete dmesg, please consult: > > http://www.mmlab.cse.yzu.edu.tw/~avatar/dmesg-noacpi.txt > http://www.mmlab.cse.yzu.edu.tw/~avatar/dmesg-acpi.txt Wierd ... in the ACPI case, em0 gets attached, gets detached, and then gets attached again. em0: port 0x8000-0x803f mem 0xc0200000-0xc020ffff,0xc0220000-0xc023ffff irq 4 at device 1.0 on pci2 em0: Reserved 0x20000 bytes for rid 0x10 type 3 at 0xc0220000 em0: Reserved 0x40 bytes for rid 0x18 type 4 at 0x8000 em0: [GIANT-LOCKED] em0: bpf attached em0: Speed:N/A Duplex:N/A [ snip ] em0: Link is up 100 Mbps Full Duplex [ snip ] em0: detached [ snip ] em0: port 0x8000-0x803f mem 0xc0200000-0xc020ffff,0xc0220000-0xc023ffff irq 4 at device 1.0 on pci2 pcib2: device em0 requested decoded memory range 0xc0220000-0xc023ffff pcib2: device em0 requested decoded I/O range 0x8000-0x803f em0: [GIANT-LOCKED] em0: bpf attached em0: Speed:N/A Duplex:N/A em0: Link is up 100 Mbps Full Duplex It looks like em0 shares irq 4 with the exact same set of devices in both cases. It does seem strange to me that it is sharing irq 4 with sio0. Typically the sio ports get exclusive use of irqs. The whole part of the dmesg starting with "em0: detached" and "pci0: driver added" is something that I haven't seen before, but I'm not running bleeding edge -CURRENT at the moment. I'd recommend starting a separate thread to discuss your em0 problem.