Date: Wed, 28 Feb 2007 14:18:33 -0500
From: "Stephane E. Potvin" <sepotvin@FreeBSD.org>
To: Robert Watson <rwatson@FreeBSD.org>
Cc: current@FreeBSD.org
Subject: Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD
Message-ID: <45E5D589.3080202@FreeBSD.org>
In-Reply-To: <20070226204916.C56223@fledge.watson.org>
References: <20070226204916.C56223@fledge.watson.org>
Robert Watson wrote:
>
> Dear all,
>
> After on-and-off development since 2005, I've now merged the UNIX domain
> socket locking patch. Special thanks to Kris Kennaway who has been
> providing stability testing, performance testing, and general support
> and feedback for this project since inception.
>
> Please let me know if you experience any problems with UNIX domain
> sockets -- these changes will affect applications that consume UNIX
> domain sockets directly, like MySQL and Postfix, as well as consumers of
> POSIX fifos, which are implemented using UNIX domain sockets in-kernel.

Since this commit, I've been observing frequent deadlocks on my laptop,
mostly when starting up GNOME. The deadlock usually happens within 5 to
10 minutes. I was able to drop into ddb once and got the following
information (there might be some typos, as I had to copy this manually):

show alllocks
Process 906 (gnome-power-manager) thread 0xc553c570 (100126)
exclusive sleep mutex unp_mtx r = 0 (0xc5573bb8) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768
Process 860 (dbus-daemon) thread 0xc4d001d0 (100095)
exclusive sleep mutex unp_mtx r = 0 (0xc5573b10) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:849
shared rw unp_global_rwlock r = 0 (0xc06d1dac) locked @ /usr/home/FreeBSD/src.CURRENT.libgcc_s/sys/kern/uipc_usrreq.c:768

show lock 0xc5573bb8
class: sleep mutex
name: unp_mtx
flags: {DEF, RECURSE, DUPOK}
state: {OWNED, CONTESTED}
owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")

show turnstile 0xc5573bb8
Lock: 0xc5573bb8 - (sleep mutex) unp_mtx
Lock Owner: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
Shared Waiters: empty
Exclusive Waiters: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
Pending Threads: empty

show lock 0xc5573b10
class: sleep mutex
name: unp_mtx
flags: {DEF, RECURSE, DUPOK}
state: {OWNED, CONTESTED}
owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")

show turnstile 0xc5573b10
Lock: 0xc5573b10 - (sleep mutex) unp_mtx
Lock Owner: 0xc4d001d0 (tid 100095, pid 860, "dbus-daemon")
Shared Waiters: empty
Exclusive Waiters: 0xc553c570 (tid 100126, pid 906, "gnome-power-manager")
Pending Threads: empty

show lock 0xc06d1dac
class: rw
name: unp_global_rwlock
state: RLOCK: 2 locks
waiters: writers

show turnstile 0xc06d1dac
Lock: 0xc06d1dac - (rw) unp_global_rwlock
Lock Owner: none
Shared Waiters: empty
Exclusive Waiters: 0xc4d00000 (tid 100096, pid 857, "gconfd-2")
    0xc4d01570 (tid 100085, pid 804, "login")
    0xc4fcaae0 (tid 100133, pid 887, "bonobo-activation-s")
    0xc48c23a0 (tid 100106, pid 897, "gaim")
    0xc4d01910 (tid 100120, pid 909, "gnome-screensaver")
    0xc553cae0 (tid 100123, pid 905, "gnome-mount")
Pending Threads: empty

bt 100095
Tracing pid 860 tid 100095 td 0xc4d001d0
sched_switch(3301966288,0,1,3226391662,3310601584,...) at 3226314602 = sched_switch+303
mi_switch(1,0,3227647346,647,3228084884,...) at 3226245932 = mi_switch+489
turnstile_wait(3310828472,3310601584,0,3310601586,3310828472,...) at 3226393861 = turnstile_wait+633
_mtx_lock_sleep(3310828472,3301966288,0,3227660663,877,...) at 3226177946 = _mtx_lock_sleep+261
_mtx_lock_flags(3310828472,0,3227660663,877,3310833112,...) at 3226177102 = _mtx_lock_flags+102
uipc_send(3310832888,0,3296484864,0,0,...) at 3226561343 = uipc_send+1058
sosend_generic(3310832888,0,3302262848,3296484864,0,...) at 3226529764 = sosend_generic+1067
sosend(3310832888,0,3302262848,0,0,...) at 3226530139 = sosend+63
soo_write(3304721288,3302262848,3297254528,0,3301966288,...) at 3226433647 = soo_write+121
dofilewrite
kern_writev
writev
syscall

bt 100126
Tracing pid 906 tid 100126 td 0xc553c570
sched_switch
mi_switch
turnstile_wait
_mtx_lock_sleep
_mtx_lock_flags
uipc_send
sosend_generic
sosend
soo_write
dofilewrite
kern_writev
writev
syscall

As you can see, threads 100095 and 100126 are each waiting for the
unp_mtx held by the other. The function uipc_send tries to lock two
unp_mtx mutexes while holding unp_global_rwlock only shared, not
exclusively. It seems that uipc_send takes the global lock for writing
only if nam is not NULL or the PRUS_EOF flag is set, and neither
condition holds in this call path. According to the comment just above
the second lock in uipc_usrreq.c, the global write lock should already
be held by the time we get there. I'm not sure where, or under what
conditions, the write lock should be acquired to fix this correctly.

I'll keep the core around in case you want me to provide more
information.

Regards,
Steph