From: Giovanni Trematerra
To: Attilio Rao
Cc: Alexander Motin, David Rhodus, freebsd-current@freebsd.org
Date: Mon, 29 Nov 2010 22:47:15 +0100
Subject: Re: panic: sched_priority: invalid priority 2906: nice 0, ticks 122865664 ftick 516947 ltick 517947 tick pri 2726
References: <201011291007.37044.jhb@freebsd.org> <4CF3E68C.4050300@FreeBSD.org>

On Mon, Nov 29, 2010 at 9:56 PM, Attilio Rao wrote:
> 2010/11/29 Alexander Motin:
>> On 29.11.2010 17:07, John Baldwin wrote:
>>>
>>> On Friday, November 26, 2010 4:38:49 pm David Rhodus wrote:
>>>>
>>>> I hit this panic on my NFS server.
>>>>
>>>> -DR
>>>>
>>>> coke.fun dumped core - see /var/crash/vmcore.2
>>>>
>>>> Fri Nov 26 14:50:48 UTC 2010
>>>>
>>>> FreeBSD coke.fun 9.0-CURRENT FreeBSD 9.0-CURRENT #14 r215800: Wed Nov
>>>> 24 12:35:30 UTC 2010     root@coke.fun:/usr/obj/usr/src/sys/GENERIC
>>>> i386
>>>>
>>>> panic: sched_priority: invalid priority 2906: nice 0, ticks 122865664
>>>> ftick 516947 ltick 517947 tick pri 2726
>>>
>>> I ran the numbers and assuming a hz of 1000, this requires you to have a
>>> very large value for ts_ticks (about (2726 * 24) << 10).  I suspect this is
>>> due to sched_tick() being invoked for a long idle sleep combined with the
>>> eventtimer changes.  Can you go to frame 10 and 'p td->td_sched->ts_ticks'?
>>
>> As I can see, this is VirtualBox virtual machine. So it is still a question
>> what side makes so large hole in sched_tick() on some CPUs. It could be
>> interesting to get ktr(4) dump with KTR_SPARE2 mask:
>>
>> options         KTR
>> options         ALQ
>> options         KTR_ALQ
>> options         KTR_ENTRIES=131072
>> options         KTR_COMPILE=(KTR_SPARE2)
>> options         KTR_MASK=(KTR_SPARE2)
>
> I'm sure gianni (CC'ed) got this bug
> and got some conclusions on it
> before (maybe he also has a patch).

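For reference, the numbers in the panic message quoted above can be reproduced with a little integer arithmetic. The following is only a sketch: the constants (a shift of 10, a base priority of 180, a band width of 23) and the helper names are assumptions picked because they yield exactly the reported "tick pri 2726" and "priority 2906" with hz = 1000. They are not copied from sched_ule.c.

```c
#include <stdint.h>

/* Assumed values, chosen to reproduce the panic message; hypothetical names. */
enum {
	HZ_ASSUMED        = 1000, /* hz assumed in the thread */
	TICK_SHIFT        = 10,   /* ts_ticks is assumed to be scaled by 1 << 10 */
	PRI_MIN_ASSUMED   = 180,  /* assumed base timeshare priority at nice 0 */
	PRI_RANGE_ASSUMED = 23    /* assumed width of the tick-derived band */
};

/* Round x up to the next multiple of y. */
static int64_t
roundup_to(int64_t x, int64_t y)
{
	return (((x + y - 1) / y) * y);
}

/* Tick-derived part of the priority ("tick pri" in the panic message). */
static int64_t
tick_pri(int64_t ts_ticks, int64_t ftick, int64_t ltick)
{
	int64_t total = ltick - ftick;
	int64_t divisor;

	if (total < HZ_ASSUMED)
		total = HZ_ASSUMED;
	divisor = roundup_to(total, PRI_RANGE_ASSUMED) / PRI_RANGE_ASSUMED;
	return ((ts_ticks >> TICK_SHIFT) / divisor);
}

/* Full priority: assumed base, plus the tick-derived part, plus nice. */
static int64_t
timeshare_pri(int64_t ts_ticks, int64_t ftick, int64_t ltick, int nice)
{
	return (PRI_MIN_ASSUMED + tick_pri(ts_ticks, ftick, ltick) + nice);
}
```

With the reported values, ts_ticks = 122865664 is exactly 119986 << 10, the window ltick - ftick is 1000 ticks, and the divisor comes out as 44, so the tick-derived part is 119986 / 44 = 2726. Under the same assumptions, a thread that ran for no more than the whole 1000-tick window would get at most 22, which is why a ts_ticks value this large points at an oversized tick count being handed to sched_tick().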
I got it on QEMU and assumed that QEMU was not doing a proper job of
distributing run time amongst cores (so perhaps VirtualBox has the same
problem?).

I figured out that sched_tick() is being passed a huge number of elapsed
ticks for the CPU at startup, in my case by hardclock_anycpu()
(kern_clock.c).

I don't have a proper patch, only a dirty hack: when the number of ticks
passed to sched_tick() is huge, it is capped so that we are never charged
for more than 5 s of solid run time ((hz*10)/2 ticks), which is something
that ULE can still handle.

Hope this helps.

diff -r d16464301129 sys/kern/kern_clock.c
--- a/sys/kern/kern_clock.c	Thu Sep 23 11:56:35 2010 -0400
+++ b/sys/kern/kern_clock.c	Sun Oct 03 17:53:39 2010 -0400
@@ -525,7 +525,7 @@ hardclock_anycpu(int cnt, int usermode)
 		PROC_SUNLOCK(p);
 	}
 	thread_lock(td);
-	sched_tick(cnt);
+	sched_tick((cnt < (hz*10)/2) ? cnt : (hz*10)/2);
 	td->td_flags |= flags;
 	thread_unlock(td);

--
Giovanni Trematerra
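
The cap used in the hack above can also be read in isolation. This is a minimal user-space sketch, not kernel code: clamp_sched_ticks() is a hypothetical helper name, and hz is passed as a parameter here rather than read from the kernel global.

```c
/*
 * Hypothetical helper: limit the tick count handed to the scheduler to
 * five seconds' worth of ticks, using the same expression as the diff.
 */
static int
clamp_sched_ticks(int cnt, int hz)
{
	int limit = (hz * 10) / 2;	/* 5 * hz, i.e. 5 seconds of ticks */

	return ((cnt < limit) ? cnt : limit);
}
```

With hz = 1000, a normal call such as clamp_sched_ticks(1, 1000) passes through unchanged, while a burst of 119986 ticks is reduced to 5000. The cap only hides the symptom in the scheduler; it does not explain why hardclock_anycpu() sees such a large count in the first place.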