Date:      Sat, 14 Aug 2004 22:44:58 -0700
From:      Julian Elischer <julian@elischer.org>
To:        noackjr@alumni.rice.edu
Cc:        freebsd-current@freebsd.org
Subject:   Re: Deadlocks with recent SMP current
Message-ID:  <411EF85A.30006@elischer.org>
In-Reply-To: <411E9399.3050200@alumni.rice.edu>
References:  <20040813121208.M31181@cvs.imp.ch> <20040813102922.E93695@carver.gumbysoft.com> <411D20DF.2000503@samsco.org> <411E9399.3050200@alumni.rice.edu>

Jon Noack wrote:
> On 08/13/04 15:13, Scott Long wrote:
> 
>> Doug White wrote:
>>
>>> On Fri, 13 Aug 2004, Martin Blapp wrote:
>>>
>>>> Since yesterday I'm getting complete deadlocks. This time it is
>>>> unrelated to load: the servers are not loaded at all, they just
>>>> freeze after a while. No break into DDB possible at all.
>>>
>>>
>>> Welcome to the club; I've been having them on my -current builder 
>>> since Aug 4. I'm going to set up a duplicate box and start 
>>> binary-searching for the offending commit(s).
>>>
>>> Preemption is the default, disabled.
>>>
>>> My box is a dual-600MHz P3 with 1GB RAM and running kde. A make -j3
>>> buildworld will lock it up 75% of the time. It'll survive a 
>>> nonparallel build, and it'll survive a kernel build.
>>>
>>> Haven't tried WITNESS+INVARIANTS yet since it really dogs the
>>> machine. :)
>>
>>
>> Can you try the patch below? It's really only a band-aid, but might 
>> make things usable for now. Also, are more lockups being seen under 
>> ULE or under 4BSD? There was a recent change to ULE (rev 1.120 of 
>> sched_ule.c) that seems to have aggravated the scheduler problems on 
>> my test systems.
>>
>> Scott
>>
>> Index: kern_switch.c
>> ===================================================================
>> RCS file: /usr/ncvs/src/sys/kern/kern_switch.c,v
>> retrieving revision 1.78
>> diff -u -r1.78 kern_switch.c
>> --- kern_switch.c       10 Aug 2004 00:26:25 -0000      1.78
>> +++ kern_switch.c       13 Aug 2004 20:11:27 -0000
>> @@ -345,6 +345,8 @@
>>                 return;
>>         }
>>
>> +       critical_enter();
>> +
>>         tda = kg->kg_last_assigned;
>>         if ((ke = td->td_kse) == NULL) {
>>                 if (kg->kg_idle_kses) {
>> @@ -441,6 +443,7 @@
>>                 CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
>>                         td, td->td_ksegrp, td->td_proc->p_pid);
>>         }
>> +       critical_exit();
>>  }
>>
>>  /*
> 
> 
> Here's a data point:
> My dual Pentium3 system has been up for 20+ hours with this patch. 
> Previously, it wouldn't survive for more than an hour or so (regardless 
> of load).


Try the following change instead, in maybe_preempt() in kern_switch.c:

         ctd = curthread;
+        if ((ctd->td_kse == NULL) || (ctd->td_kse->ke_thread != ctd))
+               return (0);
         pri = td->td_priority;
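
For anyone who wants to see what the added test is checking, here is a
minimal user-space sketch. It is not kernel code: the mock_thread and
mock_kse structures, guard_allows_preempt() and guard_self_check() are
hypothetical stand-ins that keep only the td_kse and ke_thread fields
named in the change. It assumes the intent is to bail out of
maybe_preempt() whenever the current thread has no KSE, or its KSE has
been handed to a different thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for struct thread and struct kse. */
struct mock_thread;

struct mock_kse {
	struct mock_thread *ke_thread;	/* thread this KSE currently belongs to */
};

struct mock_thread {
	struct mock_kse *td_kse;	/* KSE assigned to this thread, or NULL */
	int td_priority;		/* unused here; kept for context */
};

/*
 * Same condition as the proposed change: return 0 (do not go on to
 * preempt) unless the current thread has a KSE that points back at it.
 * Returning 1 only means "the guard passed", not "preemption happened".
 */
static int
guard_allows_preempt(const struct mock_thread *ctd)
{
	if ((ctd->td_kse == NULL) || (ctd->td_kse->ke_thread != ctd))
		return (0);
	return (1);
}

/* Exercises the three cases the guard distinguishes. */
static void
guard_self_check(void)
{
	struct mock_kse ke;
	struct mock_thread a = { NULL, 0 };
	struct mock_thread b = { &ke, 0 };

	ke.ke_thread = &b;
	assert(guard_allows_preempt(&a) == 0);	/* no KSE at all */
	assert(guard_allows_preempt(&b) == 1);	/* thread and KSE agree */

	ke.ke_thread = &a;
	assert(guard_allows_preempt(&b) == 0);	/* KSE now owned by another thread */
}
```

Calling guard_self_check() from a small main() runs through all three
cases; only the thread whose KSE points back at it passes the guard.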


> 
> Jon
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"



