Message-ID: <3bbf2fe10808230233u195f3530wf4e3b6e007b638d9@mail.gmail.com>
Date: Sat, 23 Aug 2008 11:33:09 +0200
From: "Attilio Rao"
Sender: asmrookie@gmail.com
To: "John Baldwin"
In-Reply-To: <200808230003.44081.jhb@freebsd.org>
Cc: kevinxlinuz, freebsd-current@freebsd.org
Subject: Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast
List-Id: Discussions about the use of FreeBSD-current

2008/8/23, John Baldwin:
> On Friday 22 August 2008 01:33:28 pm kevinxlinuz wrote:
> > Hi,
> > I'm looking in the problem ( amd64/124200: kernel panic on mutex sleepq
> > chain).It troubles me for a long time.I add a KASSERT in sleepq_broadcast()
> > to check the sleepqueue's wait channel.At last it turn out that the
> > sleepqueue's wait channel was changed before sleepq_resume_thread(). In
> > sleepq_lookup(),We can easily find sq->sq_wchan == wchan.But after a short
> > time,the sq->sq_wchan nolonger equal with wchan,so I think it was changed
> > by other threads.
>
> The sleepq chain lock is already held for all of sleepq_broadcast() by the
> caller (see wakeup() and cv_broadcastpri()). That said, I don't have any
> other good ideas for the panic you are seeing. Do you have a crash dump? It
> might be interesting to see what other thread is using that sleep queue.

Ben Close and I investigated this bug extensively and still haven't found
the source. What we know so far:

1) When inspected with DDB, the lock is owned by another thread even though
it should be held by curthread. It is as if the other thread overwrote the
mutex cookie as though the lock were free. An extra drop (and subsequent
acquire) is not very likely because of (2).

2) KTR traces don't show anything wrong.
Accesses to the sleepqueue chain lock are paired (both via the mtx_*
interface and via thread_lock). This is very strange because it rules out
wrong locking semantics.

3) The problem is reproducible even on 4BSD, without PREEMPTION, and even
with the smp sysctl disabled (it just takes more time).

4) The bug seems to be triggered by sx + wait channel when used in
sx_sleep() and the like.

I'm thinking this could be some nasty, but sort of deterministic, race
between the sx sleepqueue and the wait channel sleepqueue.
I still have to think it through, but right now I'm pretty busy, so if you
have good ideas please let me know.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein