From: Kip Macy
To: Charlie Martin
Cc: freebsd-hackers@freebsd.org
Date: Thu, 25 Aug 2011 23:56:11 +0200
Subject: Re: Where to ask about a 7.2 bug, and debugging sys/queue.h errors
In-Reply-To: <4E56BB99.6030706@sgi.com>

On Thu, Aug 25, 2011 at 11:16 PM, Charlie Martin wrote:
> We're having a crash in some internal code running on FreeBSD 7.2
> (specifically 7.2-PRERELEASE
> FreeBSD 7.2-PRERELEASE and yeah, I know it's
> quite a bit behind) in which after 18-30 hours of running load tests, the
> code panics with:
>
> panic: Bad link elm 0xffffff0044c09600 next->prev != elm
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff8019119a = db_trace_self_wrapper+0x2a
> panic() at 0xffffffff80307c72 = panic+0x182
> devfs_populate_loop() at 0xffffffff802a43a8 = devfs_populate_loop+0x548
>
>
> First question: where's the most appropriate place to ask about this kind of
> bug on a back version.

Probably -stable. I don't know how many developers are still running
7. Most are on 8 at this point.

> Second: does this remind anyone of any bugs?  Googling came up with a few
> somewhat similar things but hasn't provided much insight so far.

This panic is very common when list updates aren't adequately serialized.

> Third: I tried compiling with the sys/queue.h QUEUE_MACRO_DEBUG defined in
> order to get more useful information from the panic.  The kernel build fails
> in pmap.c when this macro is defined, giving an error saying the CTASSERT
> macro is resolving to a negative array size.  Is there any particular secret
> to using this macro (like, no one goes there any more?)

This is because you are running amd64 and the pv_entry constants
were defined assuming the default (smaller) list entry structure. I
once fixed this in a local tree, but I think I was so dismayed at the
"obviousness" of the bug I was tracking down that I neglected to
commit the pmap update. It shouldn't be too hard to calculate the
correct constants.

Cheers
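
To illustrate the serialization point: the "next->prev != elm" message describes a doubly-linked list whose forward and backward links no longer agree, which is what tends to happen when two contexts modify the same list without a common lock. The following is only a userland sketch of that invariant, not the kernel's code. The names struct node, demo_list, demo_lock, demo_insert, demo_remove and link_is_consistent are all hypothetical, and a pthread mutex stands in for whatever lock the real code path should hold.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical element type, for illustration only. */
struct node {
	int value;
	LIST_ENTRY(node) link;
};

LIST_HEAD(node_list, node);

static struct node_list demo_list = LIST_HEAD_INITIALIZER(demo_list);
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every update to demo_list takes the same lock, so updates are serialized. */
static void
demo_insert(struct node *n)
{
	pthread_mutex_lock(&demo_lock);
	LIST_INSERT_HEAD(&demo_list, n, link);
	pthread_mutex_unlock(&demo_lock);
}

static void
demo_remove(struct node *n)
{
	pthread_mutex_lock(&demo_lock);
	LIST_REMOVE(n, link);
	pthread_mutex_unlock(&demo_lock);
}

/*
 * Returns 1 if elm and its successor agree about their linkage, else 0.
 * In a sys/queue.h LIST, the successor's le_prev must point at elm's
 * le_next field; a mismatch is the condition the panic text describes.
 */
static int
link_is_consistent(struct node *elm)
{
	struct node *next = elm->link.le_next;

	if (next == NULL)
		return (1);
	return (next->link.le_prev == &elm->link.le_next);
}
```

If some code path touches the list without taking the lock, an interleaved insert or remove can leave a stale back pointer behind, and a check like link_is_consistent() is where the damage first becomes visible, often long after the racing update happened.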
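
On the CTASSERT failure, a rough sketch of the mechanism may help. A compile-time assertion of this style declares an array whose size is -1 when the condition is false, hence the "negative array size" error. If a per-page entry count was hard-coded for the small list entry, enlarging the entry with debug fields makes the page-sized structure overflow and the assertion trips. Everything below is a simplified stand-in under stated assumptions: the DEMO_ macros and demo_ structures are hypothetical, the debug fields only approximate what a tracing build might add, and the real pmap layout has more to account for.

```c
#include <stddef.h>
#include <stdint.h>

/* Compile-time assertion: a false condition yields an array of size -1. */
#define DEMO_CTASSERT(x)	DEMO_CTASSERT_(x, __LINE__)
#define DEMO_CTASSERT_(x, y)	DEMO_CTASSERT__(x, y)
#define DEMO_CTASSERT__(x, y)	typedef char demo_ctassert_ ## y[(x) ? 1 : -1]

#define DEMO_PAGE_SIZE	4096

/* Plain list linkage: two pointers. */
struct demo_link {
	void	*next;
	void	**prev;
};

/* Linkage plus per-entry debug fields (hypothetical stand-in). */
struct demo_link_dbg {
	void		*next;
	void		**prev;
	const char	*lastfile;
	int		 lastline;
	const char	*prevfile;
	int		 prevline;
};

struct demo_entry     { uint64_t va; struct demo_link     link; };
struct demo_entry_dbg { uint64_t va; struct demo_link_dbg link; };

/* Fixed header at the start of each page-sized chunk (simplified). */
struct demo_chunk_hdr {
	void		*owner;
	uint64_t	 map[3];
	uint64_t	 spare[2];
};

/* Derive the entry count from the sizes instead of hard-coding it. */
#define DEMO_NENTRIES(type) \
	((DEMO_PAGE_SIZE - sizeof(struct demo_chunk_hdr)) / sizeof(type))

struct demo_chunk {
	struct demo_chunk_hdr	hdr;
	struct demo_entry	entries[DEMO_NENTRIES(struct demo_entry)];
};

struct demo_chunk_dbg {
	struct demo_chunk_hdr	hdr;
	struct demo_entry_dbg	entries[DEMO_NENTRIES(struct demo_entry_dbg)];
};

/* Both layouts fit in a page because the counts follow the entry size. */
DEMO_CTASSERT(sizeof(struct demo_chunk) <= DEMO_PAGE_SIZE);
DEMO_CTASSERT(sizeof(struct demo_chunk_dbg) <= DEMO_PAGE_SIZE);

/*
 * By contrast, reusing the small-entry count with the larger entry would
 * not compile, which is the kind of failure described above:
 *
 * DEMO_CTASSERT(sizeof(struct demo_chunk_hdr) +
 *     DEMO_NENTRIES(struct demo_entry) * sizeof(struct demo_entry_dbg)
 *     <= DEMO_PAGE_SIZE);
 */
```

The design choice in the sketch is to compute the count from sizeof rather than fix it as a literal, so the assertion keeps holding when the entry grows. In the real pmap the bitmap width that tracks free entries would presumably need to follow the new count as well.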