From: Kris Kennaway
Date: Mon, 28 Feb 2005 16:04:36 -0800
To: net@FreeBSD.org
Cc: rwatson@FreeBSD.org, sparc64@FreeBSD.org
Subject: Race condition in mb_free_ext()?
Message-ID: <20050301000436.GA33346@xor.obsecurity.org>

I'm seeing an easily-provoked livelock on quad-CPU sparc64 machines running RELENG_5.
It's hard to get a good trace because the processes running on other CPUs cannot be traced from DDB, but I've been lucky a few times:

db> show alllocks
Process 15 (swi1: net) thread 0xfffff8001fb07480 (100008)
exclusive sleep mutex so_snd r = 0 (0xfffff800178432a8) locked @ netinet/tcp_input.c:2189
exclusive sleep mutex inp (tcpinp) r = 0 (0xfffff800155c3b08) locked @ netinet/tcp_input.c:744
exclusive sleep mutex tcp r = 0 (0xc0bdf788) locked @ netinet/tcp_input.c:617
db> wh 15
Tracing pid 15 tid 100008 td 0xfffff8001fb07480
sab_intr() at sab_intr+0x40
psycho_intr_stub() at psycho_intr_stub+0x8
intr_fast() at intr_fast+0x88
-- interrupt level=0xd pil=0 %o7=0xc01a0040 --
mb_free_ext() at mb_free_ext+0x28
sbdrop_locked() at sbdrop_locked+0x19c
tcp_input() at tcp_input+0x2aa0
ip_input() at ip_input+0x964
netisr_processqueue() at netisr_processqueue+0x7c
swi_net() at swi_net+0x120
ithread_loop() at ithread_loop+0x24c
fork_exit() at fork_exit+0xd4
fork_trampoline() at fork_trampoline+0x8
db>

That code is here in mb_free_ext():

	/*
	 * This is tricky.  We need to make sure to decrement the
	 * refcount in a safe way but to also clean up if we're the
	 * last reference.  This method seems to do it without race.
	 */
	while (dofree == 0) {
		cnt = *(m->m_ext.ref_cnt);
		if (atomic_cmpset_int(m->m_ext.ref_cnt, cnt, cnt - 1)) {
			if (cnt == 1)
				dofree = 1;
			break;
		}
	}

mb_free_ext+0x24: casa 0x4 , %g2, %g1
mb_free_ext+0x28: subcc %g1, %g2, %g0

which is inside the atomic_cmpset_int (i.e. it's probably spinning in the loop).

Can anyone see if there's a problem with this code, or perhaps the sparc64 implementation of atomic_cmpset_int()?

Kris
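
For readers who want to experiment with the pattern outside the kernel, here is a minimal userland sketch of the same "decrement and detect the last reference" loop, written with C11 <stdatomic.h> instead of the kernel's atomic_cmpset_int(). The helper name ref_release() is hypothetical, and this is neither the FreeBSD implementation nor a diagnosis of the livelock. It only illustrates one property the loop relies on: every retry must work from a freshly observed count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not FreeBSD code: drop one reference and
 * report whether the caller held the last one.
 */
static bool
ref_release(atomic_uint *ref_cnt)
{
	unsigned int cnt = atomic_load(ref_cnt);

	do {
		/* Releasing when no references remain is a caller bug. */
		assert(cnt > 0);
		/*
		 * On failure, atomic_compare_exchange_weak() writes the
		 * value it observed back into cnt, so the next attempt
		 * never reuses a stale count.
		 */
	} while (!atomic_compare_exchange_weak(ref_cnt, &cnt, cnt - 1));

	/* On success cnt still holds the value seen before the decrement. */
	return (cnt == 1);
}
```

With a counter initialised to 2, the first ref_release() call returns false and the second returns true, leaving the counter at 0. Because the C11 call refreshes the expected value itself, the count is never cached across iterations.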