From owner-freebsd-hackers@FreeBSD.ORG Tue Jun 16 23:01:48 2009 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 921441065672; Tue, 16 Jun 2009 23:01:48 +0000 (UTC) (envelope-from mel.flynn+fbsd.hackers@mailing.thruhere.net) Received: from mailhub.rachie.is-a-geek.net (rachie.is-a-geek.net [66.230.99.27]) by mx1.freebsd.org (Postfix) with ESMTP id 795948FC19; Tue, 16 Jun 2009 23:01:47 +0000 (UTC) (envelope-from mel.flynn+fbsd.hackers@mailing.thruhere.net) Received: from smoochies.rachie.is-a-geek.net (mailhub.lan.rachie.is-a-geek.net [192.168.2.11]) by mailhub.rachie.is-a-geek.net (Postfix) with ESMTP id 4D9E47E837; Tue, 16 Jun 2009 15:01:46 -0800 (AKDT) From: Mel Flynn To: freebsd-hackers@freebsd.org Date: Tue, 16 Jun 2009 15:01:45 -0800 User-Agent: KMail/1.11.4 (FreeBSD/8.0-CURRENT; KDE/4.2.4; i386; ; ) References: <200906151353.06630.mel.flynn+fbsd.hackers@mailing.thruhere.net> <200906160952.24895.mel.flynn+fbsd.hackers@mailing.thruhere.net> <200906161502.42741.jhb@freebsd.org> In-Reply-To: <200906161502.42741.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200906161501.45411.mel.flynn+fbsd.hackers@mailing.thruhere.net> Cc: Subject: Re: How best to debug locking/scheduler problems X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Jun 2009 23:01:48 -0000 On Tuesday 16 June 2009 11:02:42 John Baldwin wrote: > On Tuesday 16 June 2009 1:52:23 pm Mel Flynn wrote: > > Hi John, > > > > On Tuesday 16 June 2009 04:19:57 John Baldwin wrote: > > > On Monday 15 June 2009 5:53:05 pm Mel Flynn wrote: > > > > PID TID COMM TDNAME KSTACK > > > > 4283 100215 kdeinit4 - mi_switch > > > > turnstile_wait _mtx_lock_sleep uipc_peeraddr kern_getpeername > > > > getpeername syscall Xint0x80_syscall > > > > % ps -ww 4283 > > > > PID TT STAT TIME COMMAND > > > > 4283 ?? T 0:00.38 kdeinit4: kdeinit4: kio_http http > > > > local:/tmp/ksocket-mel/klauncherxJ1635.slave-socket > > > > local:/tmp/ksocket- mel/plasmayC1653.slave-socket (kdeinit4) > > > > > > > > %ls -l /tmp/ksocket-mel/ > > > > > > > > total 2 > > > > -rw-rw-r-- 1 mel wheel 62 Jun 14 22:55 KSMserver__0 > > > > srw------- 1 mel wheel 0 Jun 14 22:55 kdeinit4__0 > > > > srwxrwxr-x 1 mel wheel 0 Jun 14 22:55 > > > > klauncherxJ1635.slave-socket > > > > > > You can use kgdb and the scripts at www.freebsd.org/~jhb/gdb. Simply > > > run 'kgdb' as root and do 'lcd /folder/with/scripts' and 'source gdb6'. > > > You can then do 'lockchain 4283' to find who holds the lock this thread > > > is blocked on and what state they are in. > > > > Looks like a deadlock: > > > > (kgdb) lockchain 4283 > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > thread 100215 (pid 4283, kdeinit4) blocked on lock 0xc64374a0 "unp_mtx" > > thread 100122 (pid 1635, klauncher) blocked on lock 0xc6806348 "unp_mtx" > > DEADLOCK > > > > Looking through the scripts now to see how I can get more info on the > > call chain and hoping I don't panic the machine ;). It is quite random to > > reproduce. > > In kgdb you can simply do 'tid 100122' followed by 'bt' and 'tid 100215' > followed by 'bt'. Cool, thanks for helping John. Of course it pretty much shows me what procstat -k shows and can't get any info on the userland part, but I can fully inspect the locks and threads. Both threads are in TDS_INHIBITED state, and blocked on: (kgdb) frame 0 #0 sched_switch (td=0xc5971240, newtd=0xc4d39900, flags=259) at /usr/src/sys/kern/sched_ule.c:1864 1864 cpuid = PCPU_GET(cpuid); print newtd->td_name $9 = "idle: cpu0\000\000\000\000\000\000\000\000\000" Is there anything you want to see to shed some light on why these threads might be deadlocked? This is a 8-current kernel, seen this issue for a while (USB interface device ohci # OHCI PCI->USB interface device ehci # EHCI PCI->USB interface (USB 2.0) device usb # USB Bus (required) #device udbp # USB Double Bulk Pipe devices device uhid # "Human Interface Devices" device ukbd # Keyboard device ulpt # Printer device umass # Disks/Mass storage - Requires scbus and da device ums # Mouse device urio # Diamond Rio 500 MP3 player # USB Serial devices device u3g # USB-based 3G modems (Option, Huawei, Sierra) device uark # Technologies ARK3116 based serial adapters device ubsa # Belkin F5U103 and compatible serial adapters device uftdi # For FTDI usb serial adapters device uipaq # Some WinCE based devices device uplcom # Prolific PL-2303 serial adapters device uslcom # SI Labs CP2101/CP2102 serial adapters device uvisor # Visor and Palm devices device uvscom # USB serial support for DDI pocket's PHS # USB Ethernet, requires miibus device aue # ADMtek USB Ethernet device axe # ASIX Electronics USB Ethernet device cdce # Generic USB over Ethernet device cue # CATC USB Ethernet device kue # Kawasaki LSI USB Ethernet device rue # RealTek RTL8150 USB Ethernet device udav # Davicom DM9601E USB # FireWire support device firewire # FireWire bus code device sbp # SCSI over FireWire (Requires scbus and da) device fwe # Ethernet over FireWire (non-standard!) device fwip # IP over FireWire (RFC 2734,3146) device dcons # Dumb console driver device dcons_crom # Configuration ROM for dcons # # SMB bus # # System Management Bus support is provided by the 'smbus' device. # Access to the SMBus device is via the 'smb' device (/dev/smb*), # which is a child of the 'smbus' device. # # Supported devices: # smb standard I/O through /dev/smb* # # Supported SMB interfaces: # iicsmb I2C to SMB bridge with any iicbus interface # bktr brooktree848 I2C hardware interface # intpm Intel PIIX4 (82371AB, 82443MX) Power Management Unit # alpm Acer Aladdin-IV/V/Pro2 Power Management Unit # ichsmb Intel ICH SMBus controller chips (82801AA, 82801AB, 82801BA) # viapm VIA VT82C586B/596B/686A and VT8233 Power Management Unit # amdpm AMD 756 Power Management Unit # amdsmb AMD 8111 SMBus 2.0 Controller # nfpm NVIDIA nForce Power Management Unit # nfsmb NVIDIA nForce2/3/4 MCP SMBus 2.0 Controller # device smbus # Bus support, required for smb below. device ichsmb # # MMC/SD # # mmc MMC/SD bus # mmcsd MMC/SD memory card # sdhci Generic PCI SD Host Controller # device mmc device mmcsd device sdhci